Solution to MSTP frame desynchronization

Updated on 2021.7.13

We had discussed BACnet MSTP weakness to frame desynchronization in below:

BACnet MSTP frame lost synchronization

Attack BACnet MSTP by frame desynchronization

But what is BACRouter’s solution to this problem, let’s look for the clues from the standard.

9.5.2 Variables

SilenceTimer: A timer with nominal 5 millisecond resolution used to measure and generate silence on the medium between octets. It is incremented by a timer process and is cleared by the Receive State Machine when activity is detected and by the SendFrame procedure as each octet is transmitted.

9.5.3 Parameters:

Tframe_gap: The maximum idle time a sending node may allow to elapse between octets of a frame the node is transmitting: 20 bits times.

Tturnaround: The minimum time after the end of the stop bit of the final octet of a received frame before a node may enable its EIA-485 driver: 40 bits time.

Tpostdrive: The maximum time after the end of the stop bit of the final octet of a transmitted frame before a node must disable its EIA-485 driver: 15 bit times.

9.5.5 The SendFrame Procedure

If SilenceTimer is less than Tturnaround, wait (Tturnaround – SilenceTimer).

9.2.3 Timing

Transmitter disable: The node shall disable its EIA-485 driver within Tpostdrive after the beginning of the stop bit of the final octet of a frame in order that it not interfere with any subsequent frame transmitted by another node. This specification allows, but does not encourage, the use of a “padding” octet after the final octet of a frame in order to facilitate the use of common UART transmit interrupts for driver disable control. If a “padding” octet is used, its value shall be X’FF’. The “padding” octet is not considered part of the frame, that is, it shall be included within Tpostdrive.

(It’s unclear that whether the Tturnaround include “padding” octet, but in 135.1 testing standard, chapter 12.1.3.4 “Verify T turnaround”: If the reference master employs a “padding” octet of X’FF’ as the last octet of every frame, then the time shall be measured starting from the trailing edge of the stop bit of the octet that precedes the X’FF’ “pad” octet in the frame transmitted by the reference master)

So in a valid frame,  the maximum bus idle is Tframe_gap plus tailing bit “1” in the previous octet. it’s 29 bits time (assuming previous octet is X’FF’)

Considering “padding” octet, the minimum bus idle between 2 frames is Tturnaround – Tpostdrive + 9 (tailing bit “1” in the “padding” octet), it’s 34 bits time.

BACRouter use a revised RSM to implement previous logic:

  1. When the time between receiving 2 sequential bytes is longer than 20 bits time, the receiving frame is aborted.
  2. Idle time on the bus greater than or equal to 33 bits time means there is a new frame.
  3. To be compatible with devices not respecting to Tturnaround, any data following valid frame will be regarded as new frame.

In 115200bps, one bit time is only 8.7us. To precisely measure duration of idle line, the timer granularity of BACRouter is set to only 5us. It help to resist to frame desynchronization, and reach 98.8% bandwidth utilization on 115.2kbps because BACRouter no more waste time when 40 bits Tturnaround is over.

Attack BACnet MSTP by frame desynchronization

As pointed out by previous article “BACnet MSTP frame lost synchronization” , BACnet MSTP has a design flaw on frame synchronization, but how to utilize it to perform attack and strictly obey the standard at the same time?

We make some assumptions here:

  1. There are at least 3 devices on the bus with MAC address 1, 8,10. The device 1 is carefully designed to perform attack. Device 8 and 10 are innocent.
  2. Device 1 supports extended frame, device 8 and 10 are not.
  3. The timers of 3 devices is precise enough.

The work flow of device 1 is:

  1. When get token, send out frame A
  2. Pass token to MAC address 2
  3. When get token again, send out frame B
  4. Pass token to MAC address 2
  5. goto step 1 again.

Frame A is a valid proprietary frame (hexadecimal);

55 ff 80 ff 01 00 1d a3 02 2b 72 fe 55 ff 03 08 01 00 11 a0 ff 55 ff 21 01 08 00 09 ce d4 f3 55 ff 00 01 08 00 00 bf

Frame B is also a valid proprietary frame as:

55 ff 80 ff 01 00 1d a3 02 2b fe dc 55 ff 03 0a 01 00 11 b1 ff 55 ff 21 01 0a 00 09 fd 8a 51 55 ff 00 01 0a 00 00 8c

Every thing will go well if there is no frame desynchronization, but after hours running, if device 8 losses synchronization with frame A header (It has same effect if device 10 losses synchronization when device 1 sends frame B) , device 8 find another frame when scan Frame A’s data portion:

55 ff 03 08 01 00 11 a0 ff 55 ff 21 01 08 00 09 ce d4 f3 55 ff 00 01 08 00 00 bf

It’s a Test-Request frame send to device 8,  so device 8 try to reply it after Tturnaround with a Test-Response frame:

55 ff 04 01 08 00 11 ae ff 55 ff 21 01 08 00 09 ce d4 f3 55 ff 00 01 08 00 00 bf

but at the same time, device 1 passes token by sending:

55 ff 00 02 01 00 00 73

So the first 8 bytes of two frames are collided, so device 10 drop invalid header, find data as below:

55 ff 21 01 08 00 09 ce d4 f3 55 ff 00 01 08 00 00 bf

When device 1 finishs sending, it starts receiving data and get the same as:

55 ff 21 01 08 00 09 ce d4 f3 55 ff 00 01 08 00 00 bf

For device 10, it get a valid Not-For-Us frame header, so it enter SKIP-DATA state, there is not enough data to skip, so device 10 will wait until Tframe_abort.

For device 1, it’s a BACnet-Extended-Not-Expecting-Reply frame header, because it support extended frame, so it validate header by procedure described in Addendum 135-2012an. Because the data length is too short, so it abort the frame enter IDLE state again, then find another frame:

55 ff 00 01 08 00 00 bf

It’s a token frame passing token to device 1, so device 1 get token then sending Frame B just after Tturnaround:

55 ff 80 ff 01 00 1d a3 02 2b fe dc 55 ff 03 0a 01 00 11 b1 ff 55 ff 21 01 0a 00 09 fd 8a 51 55 ff 00 01 0a 00 00 8c

As mentioned above, Device 10 still wait 1 byte to skip previous frame (because Tturnaround < Tframe_abort), so it miss this frame header. get wrong frame as:

55 ff 03 0a 01 00 11 b1 ff 55 ff 21 01 0a 00 09 fd 8a 51 55 ff 00 01 0a 00 00 8c

It’s another Test-Request frame send to device 10, thing repeats.

From above, every devices strictly obey standard, but once frame desynchronization occurs, the whole MSTP bus is stalled forever.

Read more on Solution to mstp frame desynchronization

BACnet MSTP frame lost synchronization

There are two concepts of frame:

  1. BACnet MSTP datalink layer frame, it has at least 8 octet bytes, including: 2 preamble bytes of 0x55 and 0xff, frame_type, destination_mac, source_mac, 2 data_len bytes, crc, and omissible data portion.  We call this type of frame as “MSTP frame”.
  2. EIA-485 frame, it is consisted of bits, including start bit, data bit, parity bit, stop bit. BACnet MSTP using non-return to zero (NRZ) encoding with one start bit, eight data bits, no parity, and one stop bit. The start bit shall have a value of zero, while the stop bit shall have a value of one. The data bits shall be transmitted with the
    least significant bit first. We call this type of frame as “byte frame”.

BACnet MSTP Receive Frame Finite State Machine (hereafter refer to RSM) distinguish starting of frame by preamble bytes. If there is no DataAvailable or ReceiveError within a frame for Tframe_abort (60 bits time, but Implementations may use larger values for this timeout, not to exceed 100 milliseconds) , the frame is aborted, RSM search for next frame again.

Because preamble bytes is allowed on other portions of MSTP frame (Extended frame introduced by Addendum 135-2012an is an exception, it use COBS encoding to avoid 0x55 existing in data portion), so RSM may parse MSTP frame beginning from data portion of previous MSTP frame.

The minimum time gap between MSTP frames is Tturnaround (40 bits time), it is less than Tframe_abort, so if RSM lost sychronization of MSTP frame, it may parse the wrong MSTP frame across actual MSTP frames.

There are several cause of losing synchronization:

  1. Program defect on sending or receiving device. It could be eleminated by code review and test.
  2. Timer precision. BACnet MSTP standard only requires a 1% precise timer with a resolution of 5 ms or less. So it is hard to check out Tframe_abort. Because there is only 5 ms of error space between delay and timeout (Tusage_delay to Tusage_timeout, Treply_delay to Treply_timeout), it is very likely causing collisions for slow responding devices.
  3. Noise on bus line. Noise causes byte frame error and crc error, RSM will abort previous MSTP frame in both situations.

Some may argue that MSTP frame integrity is protected by header crc8 and data crc16. Even without considering malicious devices(Here is a attack example) and random data collision, there are still applications may send valid frame in data portion.

(Updated 25 June 2021) For example, There are some products that have implemented a packet capture function on MSTP bus, then the captured data may be transferred by BACnet service (PrivateTransfer or AtomicReadFile?). What would happen if there is electronic noise on the bus when the data is passing through a MSTP network?

It is not only possible to cause MSTP bus being blocked, but also device
malfunction if the wrong frame is a APDU or break whole BACnet
inter-network if the wrong frame is a I-Am-Router-To-Network.


Byte Frame Desynchronization

For the byte frame, losing synchronization usually is caused by noise, no termination or absence of biasing. The symptoms include regarding data bit of 1 as idle line, regarding data bit of 0 as start bit of new byte. In addition to data error, the losing synchronization of last byte in a MSTP frame will introduce measure error of idle time between MSTP frames.

Read more on Solution to mstp frame desynchronization