Tuesday, April 6, 2021

Watch Your Step: Research Into the Concrete Effects of Fault Injection on Processor State via Single-Step Debugging

by Ethan Shackelford, Associate Security Consultant at IOActive

Fault injection, also known as glitching, is a technique where some form of interference or invalid state is intentionally introduced into a system in order to alter the behavior of that system. In the context of embedded hardware and electronics generally, there are a number of forms this interference might take. Common methods for fault injection in electronics include:

  • Clock glitching (errant clock edges are forced onto the input clock line of an IC)

  • Voltage fault injection (applying voltages higher or lower than the expected voltage to IC power lines)

  • Electromagnetic glitching (Introducing EM interference)

This article will focus on voltage fault injection: specifically, the introduction of momentary voltages outside of normal operating conditions on the target device's power rails. These momentary pulses or drops in input voltage (glitches) can affect device operation, and are directed with the intention of achieving a particular effect. Commonly desired effects include "corrupting" instructions or memory in the processor and skipping instructions. Previous research has shown that these effects can be predictably achieved [1], and has provided some explanation of the EM effects (caused by the glitch) which might be responsible for the various behaviors [2].

However, a gap in published research exists in correlating glitches (and associated EM effects) with concrete changes in state at the processor level (i.e., what exactly occurs in the processor at the moment of a glitch that causes an instruction to be corrupted or skipped, an incorrect branch to be taken, etc.). This article seeks to quantify and qualify the state of a processor before, during, and after an injected fault, and to describe discrete changes in markers such as registers (general-purpose registers as well as control registers such as $pc and $lr), memory, and others.

 

Past Research and Thanks

Special thanks to the folks at Toothless Consulting, whose excellent series of blog posts [3] were my introduction to fault injection, and the inspiration for this project. Additional thanks to Chris Gerlinsky, whose research into embedded device security and in particular his talk [4] on breaking CRP on the LPC family of chips was an invaluable resource during this project.


Test Setup

The target device chosen for testing was the NXP LPC1343, an ARM Cortex-M3 microcontroller. In order to control the input target voltage and coordinate glitches, the Digilent Arty A7 development board was used, built around the Xilinx Artix 7 FPGA. Custom gateware was developed for the Arty board, in order to facilitate control and triggering of glitches based on a variety of factors. For the purposes of this article, the two main triggers used are a GPIO line which goes high/low synchronized to certain device operations, and SWD signals corresponding to a "step" event. The source code for the FPGA gateware is available here.

In order to switch between the standard voltage level (Vdd) and the glitch voltage level (Vglitch), a Maxim MAX4617 Multiplexer IC was used. It is capable of switching between inputs in as little as 10ns, and is thus suitable for producing a glitch waveform on the LPC1343 power rails with sufficient accuracy and timing.




As illustrated in the image above, the Arty A7 monitors a “trigger” line, either a GPIO output from the target or the SWD lines between the target and the debugger, depending on the mode of operation. When the expected condition is met, the A7 will drive the “glitch out” according to a provided waveform specifier, triggering a switch between Vdd and Vglitch via the Power Mux Circuit and feeding that to the target Vcore voltage line. A Segger J-Link was used to provide debug access to the target, and the SWD lines are also fed to the A7 for triggering.

In order to facilitate triggering on arbitrary SWD commands, a barebones SWD receiver was implemented on the A7. The receiver parses SWD transactions sniffed from the bus, and outputs the deserialized header and transaction data, values which can then be compared with a pre-configured target value. This allows for triggering of the glitchOut line based on any SWD data – for example, the STEP and RESUME transactions, providing a means of timing glitches for single-stepped instructions.
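The compare step described above can be modeled in a few lines of Python. This is only a software sketch of the idea, not the gateware itself: it assumes the data phase has already been sampled into a list of 33 bits, and it relies on my understanding that SWD sends the 32 data bits least-significant bit first, followed by one even-parity bit. The function names and the target value used in the usage note are hypothetical.

```python
def deserialize_wdata(bits):
    """Rebuild a 32-bit word from 33 sampled bits.

    Assumption: bits[0..31] are the data bits, LSB first, and bits[32]
    is a parity bit that equals 1 when the data has an odd number of 1s.
    """
    if len(bits) != 33:
        raise ValueError("expected 32 data bits plus 1 parity bit")
    value = 0
    for position, bit in enumerate(bits[:32]):
        value |= (bit & 1) << position
    ones = sum(bit & 1 for bit in bits[:32])
    parity_ok = (ones & 1) == (bits[32] & 1)
    return value, parity_ok


def serialize_wdata(value):
    """Inverse of deserialize_wdata, used here only to build test vectors."""
    bits = [(value >> position) & 1 for position in range(32)]
    bits.append(sum(bits) & 1)
    return bits


def should_trigger(bits, target):
    """True when the sniffed word passes parity and equals the target."""
    value, parity_ok = deserialize_wdata(bits)
    return parity_ok and value == target
```

For instance, `should_trigger(serialize_wdata(0xDEADBEEF), 0xDEADBEEF)` evaluates to `True`, while the same bits with the parity bit inverted are rejected. In hardware the same comparison would be a shift register feeding an equality check.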


 

Prior to any direct testing of glitches performed while single-stepping instructions, observing glitches during normal operation and the effects they cause is helpful to provide a base understanding, as well as to provide a platform for making assumptions which can be tested later on. To provide an environment for observing the results of glitches of varied form and duration, program execution consists of a simple loop, incrementing and decrementing two variables. At each iteration, the value of each variable is checked against a known target value, and execution will break out of the loop when either one of the conditions is met. Outside of the loop, the values are checked against expected values and those values are transmitted via UART to the attacking PC if they differ.

Binary Ninja reverse engineering software was used to provide a visual representation of the compiled C. Because the assembly presented represents the machine code produced after compiling and linking, we can be sure that it matches the behavior of the processor exactly (ignoring concepts like parallel execution, pipelining etc. for now), and lean on that information when making assumptions about timing and processor behavior with regard to injecting faults. 

 


 

Though simple, this environment provides a number of interesting targets for fault injection. Contained in the loop are memory access instructions (LDR, STR), arithmetic operations (ADDS, SUBS), comparisons, and branching operations. Additionally, the pulse of PIO2_6 provides a trigger for the glitchOut signal from the FPGA – depending on the delay applied to that signal, different areas/instructions in the overall loop may be targeted. By tracing the power consumption of the ARM core with a shunt resistor and transmission line probe, execution can be visualized. 

The following waveform shows the GPIO trigger line (blue), and the power trace coming from the LPC (purple). The GPIO line goes high for one cycle then low, signaling the start of the loop. What follows is a pattern which repeats 16 times, representing the 16 iterations of the loop. This is bounded on either side by the power trace corresponding to the code responsible for writing data to the UART, and branching back to the start of the main loop, which is fairly uniform. 

 

 

 

We now have: 

  1. A reference of the actual instructions being executed by the processor (the disassembly via Binary Ninja)
  2. A visual representation of that execution, viewable in real time as the processor executes (via the power trace)
  3. A means of taking action within the system under test which can be calibrated based on the behavior of the processor (the FPGA glitcher).

Using the above information, it is possible to vary the offset of the glitch from the trigger, and (roughly) correlate that timing to a given instruction or group of instructions being executed. For example, by triggering a glitch sometime during the sixth repetition of the pattern on the power trace, we can observe that that portion of the power trace appears to be cut off early, and the values reported over UART by the target reflect some kind of misbehavior or corruption during the sixth iteration of the loop.

 

 



So far, the methodology employed has been in line with traditional fault injection parameter search techniques: optimize for visibility into a system to determine the most effective timing and glitch duration, using some behavior baked into device operation (here, a GPIO line pulsing). This provides only coarse insight into the effects of a successfully injected fault. For the above example, we can assume that an operation at some point during the sixth iteration of the loop was altered, but any more specificity is just speculation: it may have been a skipped load instruction, a corrupted store, or a flipped compare, among many other possibilities.

To illustrate this point, the following is the parsed, sorted, and counted output of the UART traffic from the target device, after running the glitch for a few thousand iterations of the outer loop. The glitch delay and duration remained constant, but resulted in a fairly wide spread of discrete effects on the state of the variables at the end of the loop. Some entries are easy to reason about, such as the first and most common result: B is the expected value after six iterations (16 - 6 = 10), but A is 16, and thus a skipped LDR or STR instruction may have left the value 16 in the register placed there by previous operations. However, other results are harder to reason about, such as the entries containing ASCII text, or entries where the variable with the incorrect value doesn't appear to correlate to the iteration number of the loop.




This level of vagueness is acceptable in some applications of fault injection, such as breaking out of an infinite loop as is sometimes seen in secure boot bypass techniques. However, for more complex attacks where a particular operation needs to be corrupted in just the right way, greater specificity, and thus a more granular understanding, is a necessity.

And so what follows is the novel portion of the research conducted for this article: creating a methodology for targeting fault injection attacks to single instructions, leveraging debug interfaces such as SWD/JTAG for instruction isolation and timing. In addition to the research value offered by this work, the developed methodology may also have practical applications to devices in the wild under certain, not uncommon circumstances, which will be discussed in a later section.

 

A (Very) Quick Rundown of the SWD protocol

SWD is a debugging protocol developed by ARM and used for debugging many devices, including the Cortex-M3 core in the LPC1343 target board. From the ARM Debug Interface Architecture Specification ADIv5.0 to ADIv5.2:

The Arm SWD interface uses a single bidirectional data connection and a separate clock to transfer data synchronously. An operation on the wire consists of two or three phases: packet request, acknowledgement response, and data transfer.

Of course, there's more to it than that, but for the purposes of this article all we're really interested in is the data transfer, thanks to a quirk of Cortex-M3 debugging registers: halting, stepping, and continuing execution are all managed by writes to the Debug Halting Control and Status Register (DHCSR). Additionally, writes to this register are always prefixed with 0xA05F, and only the low 4 bits are used to control the debug state -- [MASKINTS, STEP, HALT, DEBUGEN] from high to low. So we can track STEP and RESUME actions by looking for SWD write transactions with the data 0xA05F0001 (RESUME) and 0xA05F000D (STEP).
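To make the bit layout concrete, here is a small Python sketch that splits a DHCSR write into its key and control bits. The constants come straight from the values quoted above; the function names are my own and purely illustrative.

```python
DBGKEY = 0xA05F  # upper halfword present on every DHCSR write

# Control bits in the low nibble, from bit 3 down to bit 0.
CONTROL_BITS = [
    ("MASKINTS", 1 << 3),
    ("STEP", 1 << 2),
    ("HALT", 1 << 1),
    ("DEBUGEN", 1 << 0),
]


def decode_dhcsr_write(word):
    """Return the names of the set control bits, or None if the key is wrong."""
    if (word >> 16) != DBGKEY:
        return None
    return [name for name, mask in CONTROL_BITS if word & mask]


def classify(word):
    """Label a sniffed data word as 'STEP', 'RESUME', or 'OTHER'."""
    if word == 0xA05F000D:
        return "STEP"
    if word == 0xA05F0001:
        return "RESUME"
    return "OTHER"
```

Decoding 0xA05F000D yields `["MASKINTS", "STEP", "DEBUGEN"]`, i.e. a step with interrupts masked and HALT cleared, while 0xA05F0001 leaves only DEBUGEN set, which lets the core run.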

 



Because of the aforementioned bidirectionality of the protocol, it isn't as easy as just matching a bit pattern: based on whether a read or write transaction is taking place, and which phase is currently underway, data may be valid on either clock edge. Beyond that, there are also turnaround periods that may or may not be inserted between phases, depending on the transaction. The simplest solution turned out to be just implementing half of the protocol, discarding the irrelevant portions and keeping only the data for comparison. The following is a Vivado ILA trace of the-little-SWD-implementation-that-could successfully parsing the STEP transaction sniffed from the SWD lines.

 


Isolating Instructions

So, by single stepping an instruction and sniffing the SWD lines from the A7, it is possible to trigger a glitch the instant (or very close to, within 10ns) the data is latched by the target board's debug machinery. Importantly, because the target requires a few trailing SWCLK cycles to complete whatever actions the debug probe requires of it, there is plenty of wiggle room between the data being latched and the actual execution of the instruction. And indeed, thanks to the power trace, there is a clear indication of the start of processor activity after the SWD transaction completes.





As can be seen above, there is a delay of somewhere in the neighborhood of 4us, an eternity at the 100MHz of the A7. By delaying the glitch to various offsets into the "bump" corresponding to instruction execution, we can finally do what we came here to do: glitch a single-stepping processor.
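To put that "eternity" in numbers, the sketch below converts a delay in seconds into FPGA clock cycles and builds a list of offsets to sweep. It assumes the glitcher counts its delay in cycles of the 100MHz clock, which is my assumption rather than a documented detail of the gateware; the function names and sweep values are hypothetical.

```python
FPGA_CLOCK_HZ = 100_000_000  # 100MHz, i.e. 10ns per cycle


def delay_to_cycles(delay_s, clock_hz=FPGA_CLOCK_HZ):
    """Convert a delay in seconds to the nearest whole clock cycle count."""
    return round(delay_s * clock_hz)


def sweep_offsets(start_s, stop_s, step_s, clock_hz=FPGA_CLOCK_HZ):
    """List the cycle offsets from start to stop (inclusive), one per step."""
    first = delay_to_cycles(start_s, clock_hz)
    last = delay_to_cycles(stop_s, clock_hz)
    step = max(1, delay_to_cycles(step_s, clock_hz))
    return list(range(first, last + 1, step))
```

A 4us gap works out to `delay_to_cycles(4e-6) == 400` cycles, so there is room to place the glitch with 10ns granularity anywhere in the window.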

In order to produce a result more interesting than "look, it works!", a simple script was written to manage the behavior of the debugger/processor via OpenOCD. The script has two modes: a "fast" mode, which single steps as fast as the debugger can keep up and is used for finding the correct timing and waveform for glitches, and a (painfully) "slow" mode, which inspects registers and the stack before and after each glitch event, highlighting any unexpected behavior for perusal. Almost immediately, we can see some interesting results glitching a load register instruction in the middle of the innermost loop -- in this case an LDR r3, [sp] which loads the previous value of the A variable into r3, to be incremented in the next instruction.

 




We can see that nothing has changed, suggesting that the operations simply didn't occur or finish -- a skipped instruction. This reliably leads to an off-by-one discrepancy in the UART output from the device: either A/B ends up 1 less/greater than it should be at the end of the loop, because one of the inc/dec operations was acting on data which is not actually associated with the state of the A variable.

Interestingly, this research shows that the effectiveness of fault injection is not limited only to instructions which access memory (LDR, STR, etc.), but can also be used to affect the execution of arithmetic operations, such as ADDS and CMP, or even branch instructions (though whether the instructions themselves are being corrupted or if the corruption is occurring on the APSR by which branches are decided requires further study). In fact, no instruction tested for this article proved impervious to single-step-glitching, though the rate of success did vary depending on the instruction.

 

 


 

We see here the CMP instruction which determines whether or not A matches the expected 0x10 being targeted. We see that the xPSR is not updated, meaning the zero flag is not set. As far as the processor is concerned, the CMP'd values did not match, and so the values of A and B are sent via UART. However, because it was the CMP instruction itself being glitched, the reported values are the correct 0x10 and 0. Interestingly, we see that r1 has been updated to 0x10, the same immediate value used in the original CMP. Referring to the ARMv7 Architecture Reference Manual, the machine code for CMP r3, 0x10 should be 0x102b (written here in the order the bytes appear in memory). Considering possible explanations for the observed behavior, one might consider an instruction like LDR or MOVS, which could have moved the value into the r1 register. And as it turns out, the machine code for MOVS r1, 0x10 is 0x1021, not too many bits away from the original 0x102b!

While that isn't the definitive answer as to the cause of the observed behavior, it's a guess well beyond the level of information available via power trace analysis and similar techniques alone. And if it is correct, we not only know what generally occurred to cause this behavior, but can even see which bits specifically in the instruction were flipped for a given glitch delay/duration.
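The bit-level comparison is easy to check by hand or in code. The sketch below builds both 16-bit Thumb encodings from their field layouts (opcode 00101 for CMP with an 8-bit immediate, 00100 for MOVS with an 8-bit immediate, then a 3-bit register and the immediate) and lists the differing bit positions. The helper names are mine.

```python
def thumb_cmp_imm8(rn, imm8):
    """Encode CMP Rn, #imm8 as a 16-bit Thumb halfword: 00101 Rn imm8."""
    return (0b00101 << 11) | ((rn & 0x7) << 8) | (imm8 & 0xFF)


def thumb_movs_imm8(rd, imm8):
    """Encode MOVS Rd, #imm8 as a 16-bit Thumb halfword: 00100 Rd imm8."""
    return (0b00100 << 11) | ((rd & 0x7) << 8) | (imm8 & 0xFF)


def memory_bytes(halfword):
    """Hex string of the halfword as it is laid out in little-endian memory."""
    return halfword.to_bytes(2, "little").hex()


def flipped_bits(original, observed):
    """Bit positions (0 = LSB) that differ between two halfwords."""
    diff = original ^ observed
    return [bit for bit in range(16) if (diff >> bit) & 1]


cmp_r3 = thumb_cmp_imm8(3, 0x10)    # halfword 0x2B10, bytes "102b"
movs_r1 = thumb_movs_imm8(1, 0x10)  # halfword 0x2110, bytes "1021"
```

Here `flipped_bits(cmp_r3, movs_r1)` returns `[9, 11]`: one bit in the register field and one in the opcode, both going from 1 to 0. If the MOVS hypothesis holds, the glitch cleared exactly two bits of the fetched instruction.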

Including all the script output for every instruction type in this article is a bit impractical, but for the curious the logs detailing registers/stack before and after each successful glitch for each instruction type will be made available in the git repo hosting the glitcher code. 
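For readers who want to build something similar, the before/after comparison at the heart of the "slow" mode can be sketched as a dictionary diff. This is an illustration of the idea only, not the script used for this article: it assumes the register values have already been read out of the debugger into plain dictionaries, and the register values in the usage note are made up.

```python
def diff_registers(before, after, expected_changes=("pc",)):
    """Compare two register snapshots taken around a single step.

    Returns (changed, unexpected): every register whose value differs,
    and the subset not listed in expected_changes.
    """
    changed = {}
    for name in sorted(set(before) | set(after)):
        old, new = before.get(name), after.get(name)
        if old != new:
            changed[name] = (old, new)
    unexpected = {
        name: values
        for name, values in changed.items()
        if name not in expected_changes
    }
    return changed, unexpected
```

With hypothetical snapshots `{"r1": 0x0, "pc": 0x100}` and `{"r1": 0x10, "pc": 0x102}`, the call flags `r1` as unexpected while treating the `pc` advance as normal. An instruction whose destination register and flags both fail to show up in `changed` is a candidate for a skip.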


Practical Applications

I know what you're thinking. 

"If you have access to a device via JTAG/SWD debugger, why fuss with all the fault injection stuff? You can make the device do anything you want! In fact, I recently read a great blog post where I learned how to take advantage of an open JTAG interface!"

However, there is a very common configuration for embedded devices in the wild for which the research presented here could prove useful. Many devices, including the STM32 series, implement a sort of "high but not the highest possible" security mode, which allows for limited debugging capabilities, but prevents reads and writes to certain areas of memory, rendering the bulk of techniques for leveraging an open JTAG connection ineffective. This is chosen over the more secure option of disabling debugging entirely because the latter leaves no option for fixing or updating device firmware (without a custom bootloader), and many OEMs may choose to err towards serviceability rather than security. In most such implementations though, single stepping is still permitted!

In such a scenario, aided by a copy of device firmware, a probing setup analogous to the one described here, or both, it may be possible to render an otherwise time-consuming and tedious attack nearly trivial, stripping away all the calibration and timing parameterization normally required for fault injection attacks. Need to bypass secure boot on a partially locked down device? No problem, just break on the CMP that checks the return value of is_secureboot_enabled().

 

Future Research

Further research is required to really categorize the applicability of this methodology during live testing, but the initial results do seem promising. Further testing will likely be performed on more realistic/practical device firmware, such as the previously mentioned secure boot scenario.

Additionally and more immediately, part two of this series of blog posts will continue to focus on developing a better understanding of what happens within an integrated circuit, and in particular a complex IC such as a CPU, when subjected to fault injection attacks. I have been putting together an 8-bit CPU out of 74 series discrete components in my spare time over the last few months, and once complete it will make the perfect target for this research: the clock is controllable/steppable externally, and each individual module (the bus, ALU, registers, etc.) is accessible by standard oscilloscope probes and other equipment.

This should allow for incredibly close examination of system state under a variety of conditions, and make transitory issues caused by faults which are otherwise difficult to observe (for example an injected fault interfering with the input lines of the ALU but not the actual input registers) quite clear to see.

Stay tuned!

 

Video Demonstration

 


References

[1] J. Gratchoff, "Proving the wild jungle jump," University of Amsterdam, Jul. 2015

[2] Y. Lu, "Injecting Software Vulnerabilities with Voltage Glitching," Feb. 2019

[3] D. Nedospasov, "NXP LPC1343 Bootloader Bypass,"

[4] C. Gerlinsky, "Breaking Code Read Protection on the NXP LPC-family Microcontrollers," Jan. 2017, https://recon.cx/2017/brussels/talks/breaking_crp_on_nxp.html

[5] A. Barenghi, G. Bertoni, E. Parrinello, G. Pelosi, "Low Voltage Fault Attacks on the RSA Cryptosystem," 2009

Tuesday, February 23, 2021

Probing and Signal Integrity Fundamentals for the Hardware Hacker, part 2: Transmission Lines, Impedance, and Stubs

by Andrew D. Zonenberg, Ph.D.
Associate Principal Security Consultant

This is the second post in our ongoing series on the troubles posed by high-speed signals in the hardware security lab.

What is a High-speed Signal?

Let's start by defining "high-speed" a bit more formally:

A signal traveling through a conductor is high-speed if transmission line effects are non-negligible.

That's nice, but what is a transmission line? In simple terms:

A transmission line is a wire of sufficient length that there is nontrivial delay between signal changes from one end of the cable to the other.

You may also see this referred to as the wire being "electrically long."

Exactly how long this is depends on how fast your signal is changing. If your signal takes half a second (500ms) to ramp from 0V to 1V, a 2 ns delay from one end of a wire to the other will be imperceptible. If your rise time is 1 ns, on the other hand, the same delay is quite significant - a nanosecond after the signal reaches its final value at the input, the output will be just starting to rise!

Using the classical "water analogy" of circuit theory, the pipe from the handle on your kitchen sink to the spout is probably not a transmission line - open the valve and water comes out pretty much instantly. A long garden hose, on the other hand, definitely is.

Propagation Delay

The exact velocity of propagation depends on the particular cable or PCB geometry, but it's usually somewhere between half and two thirds the speed of light, or roughly six to eight inches per nanosecond. You may see this specified in cable datasheets as "velocity factor." A velocity factor of 0.6 means a propagation velocity of 0.6 times the speed of light.

Let's make this a bit more concrete by running an experiment: a fast rising edge from a signal generator is connected to a splitter (the gray box in the photo below) with one output fed through a 3-inch cable to an oscilloscope input, and the other through a much longer 24-inch cable to another scope input.

Experimental setup for propagation delay demonstration

This is a 21-inch difference in length. The cable I used for this test (Mini-Circuits 086 series Hand-Flex) uses PTFE dielectric and has a velocity factor of about 0.695. Plugging these numbers into Wolfram Alpha, we get an expected skew of 2.56 ns:

Calculating expected propagation delay for the experiment
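If you'd rather not reach for Wolfram Alpha, the same calculation is a few lines of Python. The function name is mine; the only inputs are the cable length difference and the velocity factor quoted above.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458
METERS_PER_INCH = 0.0254


def propagation_delay_ns(length_in, velocity_factor):
    """Time in nanoseconds for an edge to travel length_in inches of cable."""
    velocity_m_per_s = velocity_factor * SPEED_OF_LIGHT_M_PER_S
    return length_in * METERS_PER_INCH / velocity_m_per_s * 1e9


# Skew between the 24-inch and 3-inch cables in the experiment.
skew_ns = propagation_delay_ns(24 - 3, 0.695)
```

`skew_ns` comes out to roughly 2.56 ns, in agreement with the figure above.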

Sure enough, when we set the scope to trigger on the signal coming off the short cable, we see the signal from the longer cable arrives a touch over 2.5 ns later. The math checks out!

Experimental observation of cable delay
 
As a general rule for digital signals, if the expected propagation delay of your cable or PCB trace is more than 1/10 the rise time of the signal in question, it should be treated as a transmission line.
 
Note that the baud rate/clock frequency is unimportant! In the experimental setup above, the test signal has a frequency of only 10 MHz (100 ns period) but since the edges are very fast, cable delay is easily visible and thus it should be treated as a transmission line.

This is a common pitfall with modern electronics. It's easy to look at a data bus clocked at a few MHz and think "Oh, it's not that fast." But if the I/O pins on the device under test (DUT) have sharp edges, as is typical for modern parts capable of high data rates, transmission line effects may still be important to consider.

Impedance, Reflections, and Termination

If you open an electrical engineering textbook and look for the definition of impedance you're probably going to see pages of math talking about cable capacitance and inductance, complex numbers, and other mumbo-jumbo. The good news is, the basic concept isn't that difficult to understand.

Imagine applying a voltage to one end of an infinitely long cable. Some amount of current will flow and a rising edge will begin to travel down the line. Since our hypothetical cable never ends, the system never reaches a steady state and this current will keep flowing at a constant level forever. We can then plug this voltage and current into Ohm's law (V=I * R) and solve for R. The transmission line thus acts as a resistor!

This resistance is known as the "characteristic impedance" of the transmission line, and depends on factors such as the distance from signal to ground, the shape of the conductors, and the dielectric constant of the material between them. Most coaxial cable and high-speed PCB traces are 50Ω impedance (because this is a round number, a convenient value for typical material properties at easily manufacturable dimensions, and more) although a few applications such as analog television use 75Ω and other values are used in some specialized applications. The remainder of this discussion assumes 50Ω impedance for all transmission lines unless otherwise stated.

What happens if the line forks? We have the same amount of current flowing in since the line has a 50Ω impedance at the upstream end, but at the split it sees 25Ω (two 50Ω loads in parallel). The signal thus reduces in amplitude downstream of the fork, but otherwise propagates unchanged.

There's just one problem: at the point of the split, we have X volts in the upstream direction and X/2 volts in the downstream direction! This is obviously not a stable condition, and results in a second wavefront propagating back upstream along the cable. The end result is a mirrored (negative) copy of the incident signal reflecting back.

We can easily demonstrate this effect experimentally by placing a T fitting at the end of a cable and attaching additional cables to both legs of the fitting. The two cables coming off the T are then terminated (one at a scope input and the other with a screw-on terminator). We'll get to why this is important in a bit.

Experimental setup for split in transmission line  

The cable from the scope input to the split is three inches long; using the same 0.695 velocity factor as before gives a propagation delay of 0.365 ns. So for the first 0.365 ns after the signal rises everything is the same as before. Once the edge hits the T the reduced-voltage signal starts propagating down the two legs of the T, but the same reduced voltage also reflects back upstream.

Observed waveform from split test

It takes another 0.365 ns for the reflection to reach the scope input so we'd expect to see the voltage dip at around 0.73 ns (plus a bit of additional delay in the T fitting itself) which lines up nicely with the observed waveform.

In addition to an actual T structure in the line, this same sort of negative reflection can be caused by any change to a single transmission line (different dielectric, larger wire diameter, signal closer to ground, etc.) which reduces the impedance of the line at some point.

Up to this point in our analysis, we've only considered infinitely long wires (and the experimental setups have been carefully designed to make this a good approximation). What if our line is somewhat long, but eventually ends in an open circuit? At the instant that the rising edge hits the end of the wire, current is flowing. It can't stop instantaneously as the current from further down the line is still flowing - the source side of the line has no idea that anything has changed. So the edge goes the only place it can - reflecting down the line towards the source.

Our experimental setup for this is simple: a six-inch cable with the other end unconnected.

Experimental setup for open circuit at end of a transmission line

Using the same 0.695 velocity factor, we'd expect our signal to reach the end of the cable in about 0.73 ns and the reflection to hit the scope at 1.46 ns. This is indeed what we see.

Observed waveform from open circuit test

Sure enough, we see a reflected copy of the original edge. The difference is, since the open circuit is a higher-than-matched impedance instead of lower like in the previous experiment, the reflection has a positive sign and the signal amplitude increases rather than dropping.

A closer inspection shows that this pattern of reflections repeats, much weaker, starting at around 3.0 ns. This is caused by the reflection hitting the T fitting where the signal generator connects to the scope input. Since the parallel combination of the scope input and signal generator is not an exact 50Ω impedance, we get another reflection. The signal likely continues reflecting several more times before damping out completely; however, these additional reflections are too small to show up with the current scope settings.

So if a cable ending in a low impedance results in a negative reflection, and a cable ending in a high impedance results in a positive reflection, what happens if the cable ends in an impedance equal to that of the cable - say, a 50Ω resistor? This results in a matched termination which suppresses any reflections: the incident signal simply hits the resistor and its energy is dissipated as heat. Terminating high-speed lines (via a resistor at the end, or any of several other possible methods) is critical to avoid reflections degrading the quality of the signal.
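The sign and relative size of each of these reflections can be summarized with the standard reflection coefficient, (Z_load - Z0) / (Z_load + Z0), which this post has so far only described qualitatively. The sketch below is a minimal illustration for ideal, purely resistive loads; the function names are my own.

```python
import math


def parallel(*impedances):
    """Equivalent impedance of several resistive loads in parallel."""
    return 1.0 / sum(1.0 / z for z in impedances)


def reflection_coefficient(z_load, z0=50.0):
    """Fraction of the incident voltage reflected at a load on a z0 line."""
    if math.isinf(z_load):
        return 1.0  # open circuit: full positive reflection
    return (z_load - z0) / (z_load + z0)


fork = reflection_coefficient(parallel(50.0, 50.0))  # two 50Ω legs -> 25Ω
open_end = reflection_coefficient(math.inf)
matched = reflection_coefficient(50.0)
```

The fork gives a coefficient of about -0.33 (a negative reflection), the open end gives +1, and the matched 50Ω termination gives 0, which is the "no reflection" case described above.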

One other thing to be aware of is that the impedance of a circuit may not be constant across frequency. If there is significant inductance or capacitance present, the impedance will have frequency-dependent characteristics. Many instrument inputs and probes will specify their equivalent resistance and capacitance separately, for example "1MΩ || 17 pF" so that the user can calculate the effective impedance at their frequency of interest.

Stub Effects and Probing

We can now finally understand why the classic reverse engineering technique of soldering long wires to a DUT and attaching them to a logic analyzer is ill-advised when working with high speed systems: doing so creates an unterminated "stub" in the transmission line.

Typical logic analyzer inputs have high input impedances in order to avoid loading down the signals on the DUT. For example, the Saleae Logic Pro 8 has an impedance of 1MΩ || 10 pF and the logic probe for the Teledyne LeCroy WaveRunner 8000-MS series oscilloscopes is 100 kΩ || 5 pF. Although the input capacitance does result in the impedance decreasing at higher frequencies, it remains well over 50Ω for the operating range of the probe.
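To see how quickly the capacitance takes over, the magnitude of a resistor in parallel with a capacitor is R / sqrt(1 + (2πfRC)²). The sketch below evaluates that for the two inputs mentioned above; the 100 MHz evaluation frequency is an arbitrary choice of mine for illustration, not a figure from either datasheet.

```python
import math


def parallel_rc_impedance(r_ohms, c_farads, freq_hz):
    """Impedance magnitude in ohms of R in parallel with C at freq_hz."""
    omega_rc = 2 * math.pi * freq_hz * r_ohms * c_farads
    return r_ohms / math.sqrt(1 + omega_rc * omega_rc)


# 1MΩ || 10 pF and 100 kΩ || 5 pF, both evaluated at an assumed 100 MHz.
saleae_100mhz = parallel_rc_impedance(1e6, 10e-12, 100e6)
lecroy_100mhz = parallel_rc_impedance(100e3, 5e-12, 100e6)
```

At 100 MHz these come out to roughly 159Ω and 318Ω respectively: a tiny fraction of the DC resistance, but still several times the 50Ω line impedance.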

This means that if the wire between the DUT and the logic analyzer is electrically long, the signal will reflect off the analyzer's input and degrade the signal as seen by the DUT. To see this in action, let's do an experiment on a real board which boots from an external SPI flash clocked at a moderately fast speed - about 75 MHz.
 
As a control, we'll first look at the signal using an electrically short probe. I'll be using a Teledyne LeCroy D400A-AT, a 4 GHz active differential probe. This is a very high performance probe meant for non-intrusive measurements on much faster signals such as DDR3 RAM.

Active probe on SPI flash

Probe tip seen through microscope

Looking at the scope display, we see a fairly clean square wave on the SPI SCK pin. There's a small amount of noise and some rounding of the edges, but nothing that would be expected to cause boot failures.

Observed SPI SCK waveform


The measured SPI clock frequency is 73.4 MHz and the rise time is 1.2 ns. This means that any stub longer than 120 ps round trip (60 ps one way) will start to produce measurable effects. With a velocity factor of 0.695, this comes out to about half an inch. You may well get away with something a bit longer ("measurable effects" does not mean "guaranteed boot failure"), but at some point the degradation will be sufficient to cause issues.
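The half-inch figure follows directly from the 1/10 rule of thumb, and is easy to recompute for other rise times. This is a small sketch under the same assumptions as the text (round-trip delay limited to one tenth of the rise time); the function name and the `fraction` parameter are mine.

```python
SPEED_OF_LIGHT_IN_PER_NS = 299_792_458 / 0.0254 / 1e9  # about 11.8 in/ns


def max_stub_length_in(rise_time_ns, velocity_factor, fraction=0.1):
    """Longest stub (inches) whose round-trip delay stays under
    fraction * rise_time_ns."""
    round_trip_ns = rise_time_ns * fraction
    one_way_ns = round_trip_ns / 2
    return one_way_ns * velocity_factor * SPEED_OF_LIGHT_IN_PER_NS


limit_in = max_stub_length_in(1.2, 0.695)
```

For the 1.2 ns edge measured here, `limit_in` is about 0.49 inches. A slower 10 ns edge on the same cable would tolerate a stub of roughly four inches.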

Now that we've got our control waveform, let's build a probing setup more typical of what's used in lower speed hardware reverse engineering work: a 12-inch wire from a common rainbow ribbon cable bundle, connected via a micro-grabber clip to a Teledyne LeCroy MSO-DLS-001 logic probe. The micro-grabber is about 2.5 inches in length, which when added to the wire comes to a total stub of about 14.5 inches.

(Note that the MSO-DLS-001 flying leads include a probe circuit at the tip, so the length of the flying lead itself does not count toward the stub length. When using lower end units such as the Saleae Logic that use ordinary wire leads, the logic analyzer's lead length must be considered as part of the stub.)

Experimental setup with long probe stub wire (yellow)

We'd thus expect to see a reflection at around 3.5 ns, although there's a fair bit of error in this estimate because the velocity factor of the cable is unspecified since it's not designed for high-speed use. We also expect the reflections to be a bit more "blurry" and less sharply defined than in the previous examples, since the rise time of our test signal is slower.
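The 3.5 ns figure can be reproduced with a small sketch. It assumes the same 0.695 velocity factor used earlier, which is only a guess for this ribbon cable, and the function name is hypothetical:

```c
/* Speed of light in vacuum, in inches per nanosecond (299792458 m/s). */
#define C_INCHES_PER_NS 11.8028526

/* Hypothetical helper: time (in ns) for an edge to travel down a stub of
 * the given length and for its reflection to come back to the DUT. */
static double reflection_delay_ns(double stub_inches, double velocity_factor)
{
    double round_trip_inches = 2.0 * stub_inches;
    return round_trip_inches / (velocity_factor * C_INCHES_PER_NS);
}
```

For the 14.5-inch stub, `reflection_delay_ns(14.5, 0.695)` comes out slightly above 3.5 ns. A different velocity factor shifts the result proportionally, which is why the estimate should be treated as approximate.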

Measured SPI clock waveform with long stub wire

 

There's a lot of data to unpack here so let's go over it piece by piece.

First, we do indeed see the expected reflection. For about the first half of the high period (which lasts roughly 6.8 ns at 73.4 MHz) - close to the 3.5 ns we predicted - the waveform is at a reduced voltage, then it climbs to the final value for the rest of the high period.

Second, there is significant skew between the waveform seen by the analog probe and the one seen by the logic analyzer, which is caused by the large difference in length between the paths from the DUT to the two probe inputs.

Third, this level of distortion is very likely to cause the DUT to malfunction. The two horizontal cursors are at 0.2 and 0.8 times the 1.8V supply voltage for the flash device, which are the logic low and high thresholds from the device datasheet. Any voltage between these cursors could be interpreted as either a low or a high, unpredictably, or even cause the input buffer to oscillate.

During the period in which the clock is supposed to be high, more than half the time is spent in this nondeterministic region. Worst case, if everything in this region is interpreted as a logic low, the clock will only appear to be high for about a quarter of the cycle! This would likely act like a glitch and result in failures.

Most of the low period is spent in the safe "logic low" range; however, it appears to brush against the nondeterministic region briefly. If other noise is present in the DUT, this reflection could be interpreted as a logic high and also create a glitch.

Conclusions

As electronics continue to get faster, hardware hackers can no longer afford to remain ignorant of transmission line effects. A basic understanding of these physics can go a long way in predicting when a test setup is likely to cause problems.

A Practical Approach to Attacking IoT Embedded Designs (II)

by Ruben Santamarta

In this second and final blog post on this topic, we cover some OTA vulnerabilities we identified in wireless communication protocols, primarily Zigbee and BLE.




As in the previous post, the findings described herein are intended to illustrate the type of vulnerabilities a malicious actor could leverage to attack a specified target to achieve DoS, information leakage, or arbitrary code execution.

These vulnerabilities affect numerous devices within the IoT ecosystem. IOActive worked with the semiconductor vendors to coordinate the disclosure of these security flaws, but it is worth mentioning that due to the specific nature of the IoT market, and despite the fact that patches are available, a significant number of vulnerable devices will likely never be patched.

As usual, IOActive followed a responsible disclosure process, notifying the affected vendors and coordinating with them to determine the proper time to disclose issues. In general terms, most vendors properly handled the disclosure process.

At the time of publishing this blog post, the latest versions of the affected SDKs contain fixes for the vulnerabilities. Please note that IOActive has not verified these patches.

OTA Vulnerabilities

Affected vendors

  • Nordic Semiconductor
  • Texas Instruments
  • Espressif Systems
  • Qualcomm

Nordic Semiconductor  - www.nordicsemi.com

Vulnerability

Integer overflow in ‘ble_advdata_search’

Affected Products

nRF5 SDK prior to version 16

Background 

“The nRF5 SDK is your first stop for building fully featured, reliable and secure applications with the nRF52 and nRF51 Series. It offers developers a wealth of varied modules and examples right across the spectrum including numerous Bluetooth Low Energy profiles, Device Firmware Upgrade (DFU), GATT serializer and driver support for all peripherals on all nRF5 Series devices. The nRF5 SDK will almost certainly have something for your needs in developing exciting yet robust wireless products” https://www.nordicsemi.com/Software-and-tools/Software/nRF5-SDK

Impact

A malicious actor able to send specially crafted BLE advertisements could leverage this vulnerability to execute arbitrary code in the context of a device running an nRF5-SDK-based application. This may lead to the total compromise of the affected device.

Technical Details

At line 644, the attacker-controlled byte ‘p_encoded_data[i]’ may be 0x00, in which case the subtraction underflows and ‘len’ holds 0xFFFF after the operation.

This effectively bypasses the sanity check at line 645.

File: nRF5_SDK_16.0.0_98a08e2/components/ble/common/ble_advdata.c
619: uint16_t ble_advdata_search(uint8_t const * p_encoded_data,
620:                             uint16_t        data_len,
621:                             uint16_t      * p_offset,
622:                             uint8_t         ad_type)
623: {
624:     if ((p_encoded_data == NULL) || (p_offset == NULL))
625:     {
626:         return 0;
627:     }
628: 
629:     uint16_t i = 0;
630: 
631:     while (((i < *p_offset) || (p_encoded_data[i + 1] != ad_type)) && (i < data_len))
632:     {
633:         // Jump to next data.
634:         i += (p_encoded_data[i] + 1);
635:     }
636: 
637:     if (i >= data_len)
638:     {
639:         return 0;
640:     }
641:     else
642:     {
643:         uint16_t offset = i + 2;
644:         uint16_t len    = p_encoded_data[i] - 1;      // FLAW
645:         if ((offset + len) > data_len) 		   // bypass
646:         {
647:             // Malformed. Extends beyond provided data.
648:             return 0;
649:         }
650:         *p_offset = offset;
651:         return len;
652:     }
653: }
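A minimal sketch of a more defensive search is shown below. It is not Nordic's patch: the function name `adv_search_checked` and its structure are assumptions made for illustration. The idea is to reject a zero length byte before subtracting from it, and to reject any field that extends past the end of the buffer.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hardened variant of the search routine. Returns the length
 * of the AD payload for ad_type and stores its offset in *p_offset, or
 * returns 0 when the type is not found or the data is malformed. */
static uint16_t adv_search_checked(const uint8_t *data, uint16_t data_len,
                                   uint16_t *p_offset, uint8_t ad_type)
{
    if (data == NULL || p_offset == NULL)
    {
        return 0;
    }

    uint32_t i = 0;

    /* Both the length byte and the type byte must be inside the buffer. */
    while (i + 1u < data_len)
    {
        uint8_t field_len = data[i];   /* counts the type byte plus payload */

        if (field_len == 0)
        {
            return 0;                  /* 'field_len - 1' would underflow */
        }
        if (i + 1u + field_len > data_len)
        {
            return 0;                  /* field extends beyond provided data */
        }
        if (i >= *p_offset && data[i + 1u] == ad_type)
        {
            *p_offset = (uint16_t)(i + 2u);
            return (uint16_t)(field_len - 1u);
        }

        i += (uint32_t)field_len + 1u; /* jump to the next field */
    }

    return 0;
}
```

With this shape, an advertisement whose first length byte is 0x00 is rejected outright instead of producing a length of 0xFFFF.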

Exploitation

Different scenarios are possible depending on how ‘len’ is handled by the caller. In the following example, this vulnerability leads to a classic stack-based buffer overflow at line 185.

File: nRF5_SDK_16.0.0_98a08e2/examples/ble_central/experimental/ble_app_hrs_nfc_c/ble_m.c
153: static void on_adv_report(ble_gap_evt_adv_report_t const * p_adv_report)
154: {
155:     ret_code_t  err_code;
156:     uint8_t   * p_adv_data;
157:     uint16_t    data_len;
158:     uint16_t    field_len;
159:     uint16_t    dev_name_offset = 0;
160:     char        dev_name[DEV_NAME_LEN];
161: 
162:     // Initialize advertisement report for parsing.
163:     p_adv_data = (uint8_t *)p_adv_report->data.p_data;
164:     data_len   = p_adv_report->data.len;
165: 
166:     // Search for advertising names.
167:     field_len = ble_advdata_search(p_adv_data, 
168:                                    data_len, 
169:                                    &dev_name_offset, 
170:                                    BLE_GAP_AD_TYPE_COMPLETE_LOCAL_NAME);
171:     if (field_len == 0)
172:     {
173:         // Look for the short local name if it was not found as complete.
174:         field_len = ble_advdata_search(p_adv_data, 
175:                                        data_len,
176:                                        &dev_name_offset,
177:                                        BLE_GAP_AD_TYPE_SHORT_LOCAL_NAME);
178:         if (field_len == 0)
179:         {
180:             // Exit if the data cannot be parsed.
181:             return;
182:         }
183:     }
184: 
185:     memcpy(dev_name, &p_adv_data[dev_name_offset], field_len);
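Independently of the parser, the caller can bound the copy itself. The sketch below uses a hypothetical helper, `copy_dev_name`, that is not part of the SDK: it checks that the source range lies inside the advertisement and truncates the name to the destination size.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copies at most dst_size - 1 bytes of the name field
 * into dst and NUL-terminates it. Returns the number of bytes copied, or 0
 * if the arguments are invalid or the field lies outside the advertisement. */
static size_t copy_dev_name(char *dst, size_t dst_size,
                            const uint8_t *adv, uint16_t adv_len,
                            uint16_t offset, uint16_t field_len)
{
    if (dst == NULL || dst_size == 0 || adv == NULL)
    {
        return 0;
    }
    dst[0] = '\0';

    /* The source range [offset, offset + field_len) must fit in adv. */
    if (offset > adv_len || field_len > (uint16_t)(adv_len - offset))
    {
        return 0;
    }

    size_t n = field_len;
    if (n > dst_size - 1)
    {
        n = dst_size - 1;   /* truncate rather than overflow the stack buffer */
    }

    memcpy(dst, adv + offset, n);
    dst[n] = '\0';
    return n;
}
```

A bogus length such as 0xFFFF then fails the range check, and an over-long but otherwise valid name is truncated to fit.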

Vulnerability

Incorrect DFU packet length resulting in remote code execution

Affected Products

nRF5 SDK for Mesh prior to version 4.1.0

Background 

“The nRF5 SDK for Mesh combined with the nRF52 Series is the complete solution for your Bluetooth mesh development.” https://www.nordicsemi.com/Software-and-tools/Software/nRF5-SDK-for-Mesh

Impact

A malicious actor able to initiate a DFU connection to the affected device could potentially leverage this vulnerability to execute arbitrary code in the context of the bootloader. This may lead to the total compromise of the affected device.

Technical Details

When the bootloader handles DFU messages, the length of the mesh advertising data packets is not properly checked. The vulnerable code path is as follows:

1. In ‘bootloader_init’ at line 466, the rx callback is initialized to ‘rx_cb’ by ‘transport_init’.

File: nRF5-SDK-for-Mesh-master/mesh/bootloader/src/bootloader.c
439: void bootloader_init(void)
440: {
441:     rtc_init();
442: 
443:     memset(&m_flash_fifo, 0, sizeof(fifo_t));
[SNIP]

461: 
462: #ifdef RBC_MESH_SERIAL
463:     mesh_aci_init();
464: #endif
465: 
466:     transport_init(rx_cb, RBC_MESH_ACCESS_ADDRESS_BLE_ADV);
467: 
468:     bool dfu_bank_flash_start;
469:     dfu_bank_scan(&dfu_bank_flash_start);
 

2. At line 211, the advertising packet queue is checked for DFU packets by calling ‘mesh_packet_adv_data_get’, which does not properly validate the ‘adv_data_length’ field (e.g. by checking for a minimum value [ > 3 ]). As a result, at line 217 the subtraction applied to the 8-bit 'p_adv_data->adv_data_length' may wrap to a large 32-bit value, which is stored in ‘rx_cmd.params.rx.length’.

File: nRF5-SDK-for-Mesh-master/mesh/bootloader/src/bootloader.c
209: static void rx_cb(mesh_packet_t* p_packet)
210: {
211:     mesh_adv_data_t* p_adv_data = mesh_packet_adv_data_get(p_packet);
212:     if (p_adv_data && p_adv_data->handle > RBC_MESH_APP_MAX_HANDLE)
213:     {
214:         bl_cmd_t rx_cmd;
215:         rx_cmd.type = BL_CMD_TYPE_RX;
216:         rx_cmd.params.rx.p_dfu_packet = (dfu_packet_t*) &p_adv_data->handle;
217:         rx_cmd.params.rx.length = p_adv_data->adv_data_length - 3;
218:         bl_cmd_handler(&rx_cmd);
219:     }
220: }


3. A ‘signature’ packet is then routed, without checking the length (truncated to 16-bit at ‘bl_cmd_handler’), through ‘bl_cmd_handler’-> ‘dfu_mesh_rx’ -> ‘handle_data_packet’ and finally ‘target_rx_data’, where the memory corruption may occur at line 861.

File:  nRF5-SDK-for-Mesh-master/mesh/bootloader/src/dfu_mesh.c
827: static uint32_t target_rx_data(dfu_packet_t* p_packet, uint16_t length, bool* p_do_relay)
828: {
829:     uint32_t* p_addr = NULL;
830:     uint32_t error_code = NRF_ERROR_NULL;
831: 
832:     if (m_data_req_segment == p_packet->payload.data.segment)
833:     {
834:         /* Got missing packet, stop requesting. */
835:         m_data_req_segment = DATA_REQ_SEGMENT_NONE;
836:         bl_evt_t tx_abort_evt;
837:         tx_abort_evt.type = BL_EVT_TYPE_TX_ABORT;
838:         tx_abort_evt.params.tx.abort.tx_slot = TX_SLOT_BEACON;
839:         bootloader_evt_send(&tx_abort_evt);
840:     }
841: 
842:     if (p_packet->payload.data.segment <=
843:         m_transaction.segment_count - m_transaction.signature_length / SEGMENT_LENGTH)
844:     {
845:         p_addr = addr_from_seg(p_packet->payload.data.segment, m_transaction.p_start_addr);
846:         error_code = dfu_transfer_data((uint32_t) p_addr,
847:                 p_packet->payload.data.data,
848:                 length - (DFU_PACKET_LEN_DATA - SEGMENT_LENGTH));
849:     }
850:     else /* treat signature packets at the end */
851:     {
852:         uint32_t index = p_packet->payload.data.segment -
853:             (m_transaction.segment_count - m_transaction.signature_length / SEGMENT_LENGTH) - 1;
854:         if (index >= m_transaction.signature_length / SEGMENT_LENGTH ||
855:                 m_transaction.signature_bitmap & (1 << index))
856:         {
857:             error_code = NRF_ERROR_INVALID_STATE;
858:         }
859:         else
860:         {
861:             memcpy(&m_transaction.signature[index * SEGMENT_LENGTH],
862:                     p_packet->payload.data.data,
863:                     length - (DFU_PACKET_LEN_DATA - SEGMENT_LENGTH));
864: 
865:             __LOG("Signature packet #%u\n", index);
866:             m_transaction.signature_bitmap |= (1 << index);
867:             error_code = NRF_SUCCESS;
868:         }

Vulnerability

Multiple buffer overflows when handling Advertising Bearer data packets

Affected Products

nRF5 SDK for Mesh prior to version 4.1.0

Background 

“The nRF5 SDK is your first stop for building fully featured, reliable and secure applications with the nRF52 and nRF51 Series. It offers developers a wealth of varied modules and examples right across the spectrum including numerous Bluetooth Low Energy profiles, Device Firmware Upgrade (DFU), GATT serializer and driver support for all peripherals on all nRF5 Series devices. The nRF5 SDK will almost certainly have something for your needs in developing exciting yet robust wireless products” https://www.nordicsemi.com/Software-and-tools/Software/nRF5-SDK

Impact

A malicious actor able to send malicious Advertising Bearer packets to the affected device could potentially leverage this vulnerability to execute arbitrary code. This may lead to the total compromise of the affected device.

Technical Details

The length of the Advertising Bearer data packets is not properly checked. The vulnerable code path is as follows:

1. When an AD listener is dispatched (it has been previously registered at line 1062 in 'prov_bearer_adv.c'), the only check applied to the length is the one at line 115 ( > 0 ).

File: nRF5-SDK-for-Mesh-master/mesh/prov/src/prov_bearer_adv.c
1059: AD_LISTENER(m_pb_adv_ad_listener) = {
1060:     .ad_type = AD_TYPE_PB_ADV,
1061:     .adv_packet_type = BLE_PACKET_TYPE_ADV_NONCONN_IND,
1062:     .handler = packet_in,
1063: };


File: nRF5-SDK-for-Mesh-master/mesh/bearer/src/ad_listener.c
108: void ad_listener_process(ble_packet_type_t adv_type, const uint8_t * p_payload, uint32_t payload_length, const nrf_mesh_rx_metadata_t * p_metadata)
109: {
110: #ifdef AD_LISTENER_DEBUG_MODE
111:     uint8_t frame_hash = hash_count(p_payload, payload_length);
112: #endif
113: 
114:     for (ble_ad_data_t * p_ad_data = (ble_ad_data_t *)p_payload;
115:          (uint8_t *)p_ad_data < &p_payload[payload_length] && p_ad_data->length > 0;
116:          p_ad_data = packet_ad_type_get_next((ble_ad_data_t *)p_ad_data))
117:     {
118:         NRF_MESH_SECTION_FOR_EACH(ad_listeners, const ad_listener_t, p_listener)
119:         {
120:             if ((adv_type != p_listener->adv_packet_type && (uint8_t) p_listener->adv_packet_type != ADL_WILDCARD_ADV_TYPE) ||
121:                 (p_listener->ad_type != p_ad_data->type && p_listener->ad_type != ADL_WILDCARD_AD_TYPE))
122:             {
123:                 continue;
124:             }
125: 
126:             p_listener->handler(p_ad_data->data, p_ad_data->length - BLE_AD_DATA_OVERHEAD, p_metadata);
127: 
128: #ifdef AD_LISTENER_DEBUG_MODE
129:             NRF_MESH_ASSERT(hash_count(p_payload, payload_length) == frame_hash);
130: #endif
131:          }
132:      }
133: }


2. The handler for Advertising Bearer packets does not perform any additional validation on the received ‘length’, which is then propagated to specific packet handling functions at lines 1035, 1047, and 1051.

File: nRF5-SDK-for-Mesh-master/mesh/prov/src/prov_bearer_adv.c
1020: 
1021: static void packet_in(const uint8_t * p_data, uint32_t data_len, const nrf_mesh_rx_metadata_t * p_metadata)
1022: {
1023:     NRF_MESH_ASSERT(p_data != NULL);
1024:     NRF_MESH_ASSERT(p_metadata != NULL);
1025: 
1026:     pb_adv_pdu_t * p_packet = (pb_adv_pdu_t *) p_data;
1027: 
1028:     nrf_mesh_prov_bearer_adv_t * p_pb_adv = get_bearer_from_link_id(BE2LE32(p_packet->link_id));
1029: 
1030:     switch (p_packet->pdu.control)
1031:     {
1032:         case PB_ADV_PACKET_C_TRANSACTION_START:
1033:             if (p_pb_adv != NULL && p_pb_adv->state == PROV_BEARER_ADV_STATE_LINK_OPEN)
1034:             {
1035:                 handle_transaction_start_packet(p_pb_adv, p_packet, data_len);
1036:             }
1037:             break;
1038:         case PB_ADV_PACKET_C_TRANSACTION_ACK:
1039:             if (p_pb_adv != NULL && p_pb_adv->state == PROV_BEARER_ADV_STATE_LINK_OPEN)
1040:             {
1041:                 handle_transaction_ack_packet(p_pb_adv, p_packet);
1042:             }
1043:             break;
1044:         case PB_ADV_PACKET_C_TRANSACTION_CONTINUE:
1045:             if (p_pb_adv != NULL && p_pb_adv->state == PROV_BEARER_ADV_STATE_LINK_OPEN)
1046:             {
1047:                 handle_transaction_continuation_packet(p_pb_adv, p_packet, data_len);
1048:             }
1049:             break;
1050:         case PB_ADV_PACKET_C_CONTROL:
1051:             handle_control_packet(p_pb_adv, p_packet, data_len);
1052:             break;
1053:         default:
1054:             /* Ignore */
1055:             break;
1056:     }
1057: }

3. ‘handle_transaction_start_packet’ does not perform any validation on ‘length’ before reaching lines 706 (underflow) and 707 (buffer overflow).

File: nRF5-SDK-for-Mesh-master/mesh/prov/src/prov_bearer_adv.c
684: /**** Packet handling ****/
685: 
686: static void handle_transaction_start_packet(nrf_mesh_prov_bearer_adv_t * p_pb_adv, pb_adv_pdu_t * p_packet, uint32_t length)
687: {
688:     if (p_pb_adv->buffer.state == PROV_BEARER_ADV_BUF_STATE_UNUSED)
689:     {
690:         if (p_packet->transaction_number == p_pb_adv->transaction_in)
691:         {
692:             /* finished_segments, which is used in handle_transaction_continuation_packet
693:                 is 8-bits hence we are limited to receiving a maximum of 7 segments. */
694:             p_pb_adv->buffer.length = BE2LE16(p_packet->pdu.payload.transaction.start.total_length);
695:             uint8_t SegN = transaction_total_segment_count_get(p_pb_adv->buffer.length);
696:             if (SegN > (sizeof(p_pb_adv->buffer.finished_segments)*8) -1)
697:             {
698:                 prov_bearer_adv_link_close(&p_pb_adv->prov_bearer, NRF_MESH_PROV_LINK_CLOSE_REASON_ERROR);
699:             }
700:             else
701:             {
702:                 /* New message */
703:                 p_pb_adv->buffer.fcs = p_packet->pdu.payload.transaction.start.fcs;
704:                 p_pb_adv->buffer.state = PROV_BEARER_ADV_BUF_STATE_RX;
705:                 p_pb_adv->buffer.finished_segments = 1;
706:                 uint32_t payload_length = length - PB_ADV_PACKET_OVERHEAD - PROV_BEARER_PACKET_TRANSACTION_START_OVERHEAD;
707:                 memcpy(p_pb_adv->buffer.payload, p_packet->pdu.payload.transaction.start.payload, payload_length);


4. ‘handle_transaction_continuation_packet’ does not perform any validation on ‘length’ before reaching lines 759 (underflow) and 760 (buffer overflow).

File: nRF5-SDK-for-Mesh-master/mesh/prov/src/prov_bearer_adv.c
748: static void handle_transaction_continuation_packet(nrf_mesh_prov_bearer_adv_t * p_pb_adv, pb_adv_pdu_t * p_packet, uint32_t length)
749: {
750:     if (p_pb_adv->buffer.state == PROV_BEARER_ADV_BUF_STATE_RX)
751:     {
752:         if (p_packet->transaction_number == p_pb_adv->transaction_in)
753:         {
754:             /* Check segment bitfield, to figure out if we've received this packet before: */
755:             if (!((1 << p_packet->pdu.id) & p_pb_adv->buffer.finished_segments))
756:             {
757:                 /* First time we receive this packet. */
758:                 uint32_t data_index = (PROV_BEARER_ADV_PACKET_START_PAYLOAD_MAXLEN + (p_packet->pdu.id - 1) * PROV_BEARER_ADV_PACKET_CONTINUATION_PAYLOAD_MAXLEN);
759:                 uint32_t payload_length = length - PB_ADV_PACKET_OVERHEAD - PROV_BEARER_PACKET_TRANSACTION_CONTINUATION_OVERHEAD;
760:                 memcpy(&p_pb_adv->buffer.payload[data_index], p_packet->pdu.payload.transaction.continuation.payload, payload_length); 


Vulnerability

Buffer overflow in BLE Queued Writes

Affected Products

nRF5 SDK prior to version 16

Background 

“The nRF5 SDK is your first stop for building fully featured, reliable and secure applications with the nRF52 and nRF51 Series. It offers developers a wealth of varied modules and examples right across the spectrum including numerous Bluetooth Low Energy profiles, Device Firmware Upgrade (DFU), GATT serializer and driver support for all peripherals on all nRF5 Series devices. The nRF5 SDK will almost certainly have something for your needs in developing exciting yet robust wireless products” https://www.nordicsemi.com/Software-and-tools/Software/nRF5-SDK

Impact

A malicious actor able to initiate a Queued Write request to the affected device could potentially leverage this vulnerability to execute arbitrary code. This may lead to the total compromise of the affected device.

Technical Details

‘val_offset’ and ‘val_len’ are not properly sanitized. As a result, a malicious request containing a specific combination of both values (including a large ‘val_len’ value) may lead to an integer overflow at line 135, resulting in a value that can bypass the check at line 136. Finally, at line 138, the buffer overflow occurs as ‘val_len’ is used in the memcpy operation.

File: nRF5_SDK_16.0.0_98a08e2/components/ble/nrf_ble_qwr/nrf_ble_qwr.c
101: 
102: ret_code_t nrf_ble_qwr_value_get(nrf_ble_qwr_t * p_qwr,
103:                                  uint16_t        attr_handle,
104:                                  uint8_t       * p_mem,
105:                                  uint16_t      * p_len)
106: {
107:     VERIFY_PARAM_NOT_NULL(p_qwr);
108:     VERIFY_PARAM_NOT_NULL(p_mem);
109:     VERIFY_PARAM_NOT_NULL(p_len);
110:     VERIFY_MODULE_INITIALIZED();
111: 
112:     uint16_t i          = 0;
113:     uint16_t handle     = BLE_GATT_HANDLE_INVALID;
114:     uint16_t val_len    = 0;
115:     uint16_t val_offset = 0;
116:     uint16_t cur_len    = 0;
117: 
118:     do
119:     {
120:         handle = uint16_decode(&(p_qwr->mem_buffer.p_mem[i]));
121: 
122:         if (handle == BLE_GATT_HANDLE_INVALID)
123:         {
124:             break;
125:         }
126: 
127:         i         += sizeof(uint16_t);
128:         val_offset = uint16_decode(&(p_qwr->mem_buffer.p_mem[i]));
129:         i         += sizeof(uint16_t);
130:         val_len    = uint16_decode(&(p_qwr->mem_buffer.p_mem[i]));
131:         i         += sizeof(uint16_t);
132: 
133:         if (handle == attr_handle)
134:         {
135:             cur_len = val_offset + val_len;
136:             if (cur_len <= *p_len)
137:             {
138:                 memcpy((p_mem + val_offset), &(p_qwr->mem_buffer.p_mem[i]), val_len);
139:             }
140:             else
141:             {
142:                 return NRF_ERROR_NO_MEM;
143:             }
144:         }
145: 
146:         i += val_len;
147:     }
148:     while (i < p_qwr->mem_buffer.len);
149: 
150:     *p_len = cur_len;
151:     return NRF_SUCCESS;
152: }
153: #endif
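The effect of the 16-bit addition at line 135 can be illustrated with a short sketch. The function names are hypothetical; the first mirrors the original comparison and the second performs the same comparison in a wider type so the sum cannot wrap.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mirrors lines 135-136: the sum is stored in a uint16_t, so it is
 * truncated modulo 65536 before being compared with the buffer length. */
static bool qwr_check_16bit(uint16_t val_offset, uint16_t val_len,
                            uint16_t buf_len)
{
    uint16_t cur_len = (uint16_t)(val_offset + val_len);
    return cur_len <= buf_len;
}

/* Hypothetical fix: compute the sum in 32 bits so that no truncation
 * takes place before the comparison. */
static bool qwr_check_32bit(uint16_t val_offset, uint16_t val_len,
                            uint16_t buf_len)
{
    uint32_t cur_len = (uint32_t)val_offset + (uint32_t)val_len;
    return cur_len <= buf_len;
}
```

With ‘val_offset’ = 0x0010 and ‘val_len’ = 0xFFF8, the 16-bit sum is 0x0008, which passes a check against a 64-byte buffer, while the 32-bit sum (0x10008) correctly fails it.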

Texas Instruments  - www.ti.com

Vulnerability

Z-Stack - Multiple heap overflows in ZCL parsing functions

Affected Products

SIMPLELINK-CC13X2-26X2-SDK prior to version 4.40.00.44
Other Zigbee stacks based on the Z-Stack code are also affected (e.g. Telink)

Background 

“Z-Stack is a component of the SimpleLink™ CC13x2 / CC26x2 Software Development Kit. This component enables development of Zigbee® 3.0 specification based products. Z-Stack is TI’s complete solution for developing certified Zigbee 3.0 solution on CC13x2 and CC26x2 platforms. Z-Stack contained in this release is based on Zigbee 3.0 specification with the added benefit of running on top of TI-RTOS.” https://www.ti.com/tool/Z-STACK


Impact

A malicious actor in possession of the NWK key (authenticated to the Zigbee Network) may send OTA malicious Zigbee ZCL packets to the victim’s node, which may result in the execution of arbitrary code in the context of the affected device.

Technical Details

Z-Stack parses ZCL payloads using a flawed two-step process:

1. It calculates the total length of the attributes by iterating over the incoming ZCL frame payload without checking for integer overflows.
2. Dynamic memory is allocated according to this total length; however, attributes are individually copied to the parsing structure without sanitizing their lengths.

In the following example, the first step can be mapped to the ‘while’ loop at lines 3699-3718.

File: simplelink_cc13x2_26x2_sdk_4_20_00_35/source/ti/zstack/stack/zcl/zcl.c
3674: #ifdef ZCL_WRITE
3675: /*********************************************************************
3676:  * @fn      zclParseInWriteCmd
3677:  *
3678:  * @brief   Parse the "Profile" Write, Write Undivided and Write No
3679:  *          Response Commands
3680:  *
3681:  *      NOTE: THIS FUNCTION ALLOCATES THE RETURN BUFFER, SO THE CALLING
3682:  *            FUNCTION IS RESPONSIBLE TO FREE THE MEMORY.
3683:  *
3684:  * @param   pCmd - pointer to incoming data to parse
3685:  *
3686:  * @return  pointer to the parsed command structure
3687:  */
3688: void *zclParseInWriteCmd( zclParseCmd_t *pCmd )
3689: {
3690:   zclWriteCmd_t *writeCmd;
3691:   uint8_t *pBuf = pCmd->pData;
3692:   uint16_t attrDataLen;
3693:   uint8_t *dataPtr;
3694:   uint8_t numAttr = 0;
3695:   uint8_t hdrLen;
3696:   uint16_t dataLen = 0;
3697: 
3698:   // find out the number of attributes and the length of attribute data
3699:   while ( pBuf < ( pCmd->pData + pCmd->dataLen ) )
3700:   {
3701:     uint8_t dataType;
3702: 
3703:     numAttr++;
3704:     pBuf += 2; // move pass attribute id
3705: 
3706:     dataType = *pBuf++;
3707: 
3708:     attrDataLen = zclGetAttrDataLength( dataType, pBuf );
3709:     pBuf += attrDataLen; // move pass attribute data
3710: 
3711:     // add padding if needed
3712:     if ( PADDING_NEEDED( attrDataLen ) )
3713:     {
3714:       attrDataLen++;
3715:     }
3716: 
3717:     dataLen += attrDataLen;
3718:   }
3719: 

‘dataLen’ is intended to hold the total length of the attributes in the message. Each length is individually calculated in ‘zclGetAttrDataLength’.

File:  simplelink_cc13x2_26x2_sdk_4_20_00_35/source/ti/zstack/stack/zcl/zcl.c
3258: uint16_t zclGetAttrDataLength( uint8_t dataType, uint8_t *pData )
3259: {
3260:   uint16_t dataLen = 0;
3261: 
3262:   if ( dataType == ZCL_DATATYPE_LONG_CHAR_STR || dataType == ZCL_DATATYPE_LONG_OCTET_STR )
3263:   {
3264:     dataLen = BUILD_UINT16( pData[0], pData[1] ) + 2; // long string length + 2 for length field
3265:   }
3266:   else if ( dataType == ZCL_DATATYPE_CHAR_STR || dataType == ZCL_DATATYPE_OCTET_STR )
3267:   {
3268:     dataLen = *pData + 1; // string length + 1 for length field
3269:   }
3270:   else
3271:   {
3272:     dataLen = zclGetDataTypeLength( dataType );
3273:   }
3274: 
3275:   return ( dataLen );


There is neither an overflow check for ‘dataLen’ nor a bounds check in the last iteration of the loop against ‘pBuf’ before adding the value to ‘dataLen’. According to this logic, an attacker can create a ZCL payload containing a specific combination of attributes that may force the ‘dataLen’ integer to be wrapped, holding a value lower than the actual total length.

Example:

#define MALICIOUS_PAYLOAD "\x00\x01\x43\x06\x00\x41\x41\x41\x41\x41\x41\x00\x02\x43\x10\x00\x41\x41\x41\x41\x41\x41\x41\x41\x41\x41\x41\x41\x41\x41\x41\x41\x00\x03\x43\xF0\xFF"    

Attribute 1: 
type: long octet string: length : 0x6 (+2)
Attribute 2: 
type: long octet string: length:  0x10 (+2)
Attribute 3:
type: long octet string: length: 0xFFF0 (+2)
Total length (truncated to 16-bit as in ‘dataLen’) = 0xC
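The truncated total can be checked with a small standalone sketch that walks the example payload the same way the first loop does. This is an illustration only: the function name is hypothetical, it handles long octet strings exclusively, and padding is ignored on the assumption that it only applies to odd lengths (all three attribute lengths here are even).

```c
#include <stddef.h>
#include <stdint.h>

/* The example payload from the text: 37 bytes, not counting the NUL that
 * the string literal appends. */
static const uint8_t zcl_payload[] =
    "\x00\x01\x43\x06\x00\x41\x41\x41\x41\x41\x41"
    "\x00\x02\x43\x10\x00\x41\x41\x41\x41\x41\x41\x41\x41"
    "\x41\x41\x41\x41\x41\x41\x41\x41"
    "\x00\x03\x43\xF0\xFF";

/* Hypothetical helper: sums the attribute lengths into a 16-bit
 * accumulator, as 'dataLen' does. Every attribute is assumed to be a long
 * octet string (2-byte ID, 1-byte type, 2-byte little-endian length). */
static uint16_t total_attr_len_u16(const uint8_t *data, size_t data_len)
{
    size_t pos = 0;
    uint16_t total = 0;

    while (pos + 5 <= data_len)
    {
        size_t len_pos = pos + 3;   /* skip attribute ID and data type */

        /* String length plus 2 bytes for the length field itself. */
        uint16_t attr_len =
            (uint16_t)((data[len_pos] | (data[len_pos + 1] << 8)) + 2);

        total = (uint16_t)(total + attr_len);   /* wraps modulo 65536 */
        pos = len_pos + attr_len;
    }

    return total;
}
```

Running `total_attr_len_u16(zcl_payload, sizeof(zcl_payload) - 1)` adds 0x8, 0x12 and 0xFFF2, which is 0x1000C before truncation and 0xC after it, matching the total given above.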

Back in ‘zclParseInWriteCmd’, ‘dataLen’ is used to allocate the buffer where the attributes’ data will be copied. As there is no sanity check on the consistency of this memory allocation (line 3723 is allocating less memory than expected due to the ‘dataLen’ overflow), this operation may result in a memory corruption at line 3740 (memcpy) as ‘attrDataLen’ may be higher than the buffer allocated at ‘dataPtr’. 

3720:   // calculate the length of the response header
3721:   hdrLen = sizeof( zclWriteCmd_t ) + ( numAttr * sizeof( zclWriteRec_t ) );
3722: 
3723:   writeCmd = (zclWriteCmd_t *)zcl_mem_alloc( hdrLen + dataLen );
3724:   if ( writeCmd != NULL )
3725:   {
3726:     uint8_t i;
3727:     pBuf = pCmd->pData;
3728:     dataPtr = (uint8_t *)( (uint8_t *)writeCmd + hdrLen );
3729: 
3730:     writeCmd->numAttr = numAttr;
3731:     for ( i = 0; i < numAttr; i++ )
3732:     {
3733:       zclWriteRec_t *statusRec = &(writeCmd->attrList[i]);
3734: 
3735:       statusRec->attrID = BUILD_UINT16( pBuf[0], pBuf[1] );
3736:       pBuf += 2;
3737:       statusRec->dataType = *pBuf++;
3738: 
3739:       attrDataLen = zclGetAttrDataLength( statusRec->dataType, pBuf );
3740:       zcl_memcpy( dataPtr, pBuf, attrDataLen);
3741:       statusRec->attrData = dataPtr;
3742: 
3743:       pBuf += attrDataLen; // move pass attribute data
3744: 
3745:       // advance attribute data pointer
3746:       if ( PADDING_NEEDED( attrDataLen ) )
3747:       {
3748:         attrDataLen++;
3749:       }
3750: 
3751:       dataPtr += attrDataLen;
3752:     }
3753:   }
3754: 
3755:   return ( (void *)writeCmd );
3756: }

Most of the parsing routines in ‘zclCmdTable’ are affected.

File:  simplelink_cc13x2_26x2_sdk_4_20_00_35/source/ti/zstack/stack/zcl/zcl.c
270: static CONST zclCmdItems_t zclCmdTable[] 

Additionally, there is an integer overflow in the way ‘zclLL_ProcessInCmd_GetGrpIDsRsp’ (also ‘zclLL_ProcessInCmd_GetEPListRsp’ and ‘zclLL_ProcessInCmd_DeviceInfoRsp’) parses the incoming message, as ‘cnt’ is not properly sanitized before allocating the buffer (line 1214). As a result, the 8-bit ‘rspLen’ wraps around, holding a value smaller than the space that ‘cnt’ records actually require. Later on, ‘cnt’ is used as the test expression in the ‘for’ loop (line 1227), so the loop ends up triggering memory corruption at line 1231.

File:  simplelink_cc13x2_26x2_sdk_4_20_00_35/source/ti/zstack/stack/zcl/zcl_ll.c
1205: static ZStatus_t zclLL_ProcessInCmd_GetGrpIDsRsp( zclIncoming_t *pInMsg,
1206:                                                   zclLL_AppCallbacks_t *pCBs )
1207: {
1208:   ZStatus_t status = ZFailure;
1209: 
1210:   if ( pCBs->pfnGetGrpIDsRsp )
1211:   {
1212:     zclLLGetGrpIDsRsp_t *pRsp;
1213:     uint8_t cnt = pInMsg->pData[ZLL_CMDLEN_GET_GRP_IDS_RSP-1];
1214:     uint8_t rspLen = sizeof( zclLLGetGrpIDsRsp_t ) + ( cnt * sizeof( grpInfoRec_t ) );
1215: 
1216:     pRsp = (zclLLGetGrpIDsRsp_t *)zcl_mem_alloc( rspLen );
1217:     if ( pRsp )
1218:     {
1219:       uint8_t *pBuf = pInMsg->pData;
1220:       uint8_t i;
1221: 
1222:       pRsp->total = *pBuf++;
1223:       pRsp->startIndex = *pBuf++;
1224:       pRsp->cnt = *pBuf++;
1225:       pRsp->grpInfoRec = (grpInfoRec_t *)(pRsp+1);
1226: 
1227:       for ( i = 0; i < cnt; i++ )
1228:       {
1229:         grpInfoRec_t *pRec = &(pRsp->grpInfoRec[i]);
1230: 
1231:         pRec->grpID = BUILD_UINT16( pBuf[0], pBuf[1] );
1232:         pBuf += 2;
1233: 
1234:         pRec->grpType = *pBuf++;
1235:       }
1236: 
1237:       status = pCBs->pfnGetGrpIDsRsp( &(pInMsg->msg->srcAddr), pRsp );
1238: 
1239:       zcl_mem_free( pRsp );
1240:     }
1241:   }
1242: 
1243:   return ( status );
1244: }
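The truncation of ‘rspLen’ can be reproduced in isolation. In the sketch below, the function names are hypothetical and the structure sizes are passed in as parameters, since the real sizes of ‘zclLLGetGrpIDsRsp_t’ and ‘grpInfoRec_t’ are not shown in the listing.

```c
#include <stddef.h>
#include <stdint.h>

/* Mirrors line 1214: the size computation is stored in a uint8_t, so
 * anything above 255 is truncated modulo 256. */
static uint8_t rsp_len_truncated(size_t hdr_size, size_t rec_size, uint8_t cnt)
{
    return (uint8_t)(hdr_size + (size_t)cnt * rec_size);
}

/* The same computation kept in size_t, which is what the allocation would
 * need in order to hold 'cnt' records. */
static size_t rsp_len_full(size_t hdr_size, size_t rec_size, uint8_t cnt)
{
    return hdr_size + (size_t)cnt * rec_size;
}
```

Assuming, purely for illustration, an 8-byte header and 3-byte records, a ‘cnt’ of 255 requires 773 bytes, but the 8-bit result is only 5. The loop would then write 255 records into a 5-byte allocation.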

Vulnerability

EasyLink – memory corruption in ‘rxDoneCallback’

Affected Products

SIMPLELINK-CC13X2-26X2-SDK prior to version 4.40.00.44

Background 

“The EasyLink API should be used in application code. The EasyLink API is intended to abstract the RF Driver in order to give a simple API for customers to use as is or extend to suit their application use cases.” http://software-dl.ti.com/simplelink/esd/simplelink_cc13x0_sdk/4.10.01.01/exports/docs/proprietary-rf/proprietary-rf-users-guide/easylink/easylink-api-reference.html


Impact

A remote attacker may send a specially crafted OTA EasyLink packet to the victim’s device, which may result in either a DoS condition or the execution of arbitrary code.

Technical Details

EasyLink does not properly validate the length of the received packet. At line 533, one byte is read from the attacker-controlled buffer ('pDataEntry->data') and then used to calculate the number of bytes that will be copied (at line 545) to the static buffer 'rxPacket.payload' (fixed at 128 bytes).

File: simplelink_cc13x2_26x2_sdk_4_20_00_35/source/ti/easylink/EasyLink.c
503: //Callback for Async Rx complete
504: static void rxDoneCallback(RF_Handle h, RF_CmdHandle ch, RF_EventMask e)
505: {
506:     EasyLink_Status status = EasyLink_Status_Rx_Error;
507:     //create rxPacket as a static so that the large payload buffer it is not
508:     //allocated from the stack
509:     static EasyLink_RxPacket rxPacket;
510:     rfc_dataEntryGeneral_t *pDataEntry;
511:     pDataEntry = (rfc_dataEntryGeneral_t*) rxBuffer;
512: 
513:     if (e & RF_EventLastCmdDone)
514:     {
515:         //Release now so user callback can call EasyLink API's
516:         Semaphore_post(busyMutex);
517:         asyncCmdHndl = EASYLINK_RF_CMD_HANDLE_INVALID;
518: 
519:         //Check command status
520:         if (EasyLink_cmdPropRxAdv.status == PROP_DONE_OK)
521:         {
522:             //Check that data entry status indicates it is finished with
523:             if (pDataEntry->status != DATA_ENTRY_FINISHED)
524:             {
525:                 status = EasyLink_Status_Rx_Error;
526:             }
527:             else if ( (rxStatistics.nRxOk == 1) ||
528:                     //or filer disabled and ignore due to addr mistmatch
529:                     ((EasyLink_cmdPropRxAdv.pktConf.filterOp == 1) &&
530:                      (rxStatistics.nRxIgnored == 1)) )
531:             {
532:                 //copy length from pDataEntry
533:                 rxPacket.len = *(uint8_t*)(&pDataEntry->data) - addrSize;
534:                 if(useIeeeHeader)
535:                 {
536:                     hdrSize = EASYLINK_HDR_SIZE_NBYTES(EASYLINK_IEEE_HDR_NBITS);
537:                 }
538:                 else
539:                 {
540:                     hdrSize = EASYLINK_HDR_SIZE_NBYTES(EASYLINK_PROP_HDR_NBITS);
541:                 }
542:                 //copy address from packet payload (as it is not in hdr)
543:                 memcpy(&rxPacket.dstAddr, (&pDataEntry->data + hdrSize), addrSize);
544:                 //copy payload
545:                 memcpy(&rxPacket.payload, (&pDataEntry->data + hdrSize + addrSize), rxPacket.len);
546:                 rxPacket.rssi = rxStatistics.lastRssi;
547:                 rxPacket.absTime = rxStatistics.timeStamp;
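
The missing check can be illustrated in isolation. The following is a minimal sketch, not the fix that TI shipped: 'copy_payload_checked' and 'MAX_PAYLOAD_LEN' are hypothetical names, and the 128-byte limit is taken from the description above. It rejects a length byte that is smaller than the address size (which would wrap on subtraction) or that exceeds the destination buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical constant: the advisory states the payload buffer is 128 bytes. */
#define MAX_PAYLOAD_LEN 128u

/*
 * Copies the payload only when the received length byte is consistent.
 * entry[0] plays the role of the length byte read at line 533.
 * Returns the payload length on success, or -1 if the packet should be dropped.
 */
static int copy_payload_checked(uint8_t *dst, const uint8_t *entry,
                                size_t hdr_size, size_t addr_size)
{
    assert(dst != NULL && entry != NULL);

    size_t total = entry[0];            /* attacker-controlled length byte */

    if (total < addr_size) {            /* subtraction would wrap around */
        return -1;
    }

    size_t payload_len = total - addr_size;
    if (payload_len > MAX_PAYLOAD_LEN) { /* would overflow the destination */
        return -1;
    }

    memcpy(dst, entry + hdr_size + addr_size, payload_len);
    return (int)payload_len;
}
```

With a one-byte header and a one-byte address, a length byte of 5 copies 4 bytes, while a length byte of 255 or 0 is rejected instead of reaching 'memcpy'.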


Espressif Systems - www.espressif.com

Vulnerability

Protocomm ‘transport_simple_ble_read’ information leak

Affected Products

ESP-IDF prior to v4.0.2 https://github.com/espressif/esp-idf

Background 

“Espressif provides basic hardware and software resources to help application developers realize their ideas using the ESP32 series hardware. The software development framework by Espressif is intended for development of Internet-of-Things (IoT) applications with Wi-Fi, Bluetooth, power management and several other system features.”  https://docs.espressif.com/projects/esp-idf/en/latest/esp32/get-started/

This bug was awarded a $2,229 bounty as part of the ESP32 bug bounty program (https://www.espressif.com/en/news/bug-bounty), which was donated by Espressif, on my behalf, to a Spanish animal rescue organization.

Impact

A remote attacker may send a specially crafted BLE packet to the victim’s device, which may result in either a DoS condition or an information leak.

Technical Details

When handling a BLE READ request from the client, ‘offset’ is not properly sanitized before copying data to the response (line 128). As a result, a malicious client may leak sensitive information from the device by setting an overly large ‘offset’ parameter in the READ request.

File: esp-idf-v4.0.1/components/protocomm/src/transports/protocomm_ble.c
107: static void transport_simple_ble_read(esp_gatts_cb_event_t event, esp_gatt_if_t gatts_if, esp_ble_gatts_cb_param_t *param)
108: {
109:     static const uint8_t *read_buf = NULL;
110:     static uint16_t read_len = 0;
111:     esp_gatt_status_t status = ESP_OK;
112: 
113:     ESP_LOGD(TAG, "Inside read w/ session - %d on param %d %d",
114:              param->read.conn_id, param->read.handle, read_len);
115:     if (!read_len && !param->read.offset) {
116:         ESP_LOGD(TAG, "Reading attr value first time");
117:         status = esp_ble_gatts_get_attr_value(param->read.handle, &read_len,  &read_buf);
118:     } else {
119:         ESP_LOGD(TAG, "Subsequent read request for attr value");
120:     }
121: 
122:     esp_gatt_rsp_t gatt_rsp = {0};
123:     gatt_rsp.attr_value.len = MIN(read_len, (protoble_internal->gatt_mtu - 1));
124:     gatt_rsp.attr_value.handle = param->read.handle;
125:     gatt_rsp.attr_value.offset = param->read.offset;
126:     gatt_rsp.attr_value.auth_req = ESP_GATT_AUTH_REQ_NONE;
127:     if (gatt_rsp.attr_value.len && read_buf) {
128:         memcpy(gatt_rsp.attr_value.value,
129:                 read_buf + param->read.offset,
130:                 gatt_rsp.attr_value.len);
131:     }
132:     read_len -= gatt_rsp.attr_value.len;
133:     esp_err_t err = esp_ble_gatts_send_response(gatts_if, param->read.conn_id,
134:                                                 param->read.trans_id, status, &gatt_rsp);
135:     if (err != ESP_OK) {
136:         ESP_LOGE(TAG, "Send response error in read");
137:     }
138: }
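
A bounds check on the offset could look like the sketch below. It is an illustration under assumptions rather than the patch Espressif applied: 'read_chunk_checked' and its parameters are hypothetical names, and the attribute length is assumed to be known to the caller. The copy is limited to the bytes that actually remain between the offset and the end of the attribute value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Copies up to out_cap bytes of attr[offset .. attr_len) into out.
 * Returns the number of bytes copied; 0 when the offset lies at or
 * beyond the end of the attribute value.
 */
static size_t read_chunk_checked(uint8_t *out, size_t out_cap,
                                 const uint8_t *attr, size_t attr_len,
                                 size_t offset)
{
    assert(out != NULL && attr != NULL);

    if (offset >= attr_len) {       /* client-supplied offset is out of range */
        return 0;
    }

    size_t n = attr_len - offset;   /* bytes remaining after the offset */
    if (n > out_cap) {
        n = out_cap;                /* clamp to the response capacity */
    }

    memcpy(out, attr + offset, n);
    return n;
}
```

For a 10-byte attribute and a 4-byte response capacity, an offset of 8 yields 2 bytes and an offset of 60000 yields none, so no memory outside the attribute value is read.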

Qualcomm - www.qualcomm.com

Vulnerability

‘Api_ParseInfoElem’ improper handling of IEEE80211_ELEMID_RSN length may lead to a remote DoS

Affected Products

Qualcomm WIFI_QCA Middleware 

Background 

“The QCA4004 is an intelligent platform for the Internet of Things that contains a low-power Wi-Fi connectivity solution on a single chip. It includes a number of TCP/IP-based connectivity protocols along with SSL, allowing a low-cost, low-complexity system to obtain full-featured internet connectivity and reliable information exchange.”  https://www.qualcomm.com/products/qca4004 

Impact

An attacker able to send crafted 802.11 management frames to the affected device may leverage this vulnerability to cause a DoS, as unmapped or invalid memory may be accessed when RSN IEs are parsed after a SCAN operation has been invoked.

Technical Details

The vulnerable code path is as follows: 

1. When parsing the RSN IE, its length (‘ie_len’) is not properly sanitized against ‘len’ before calling ‘security_ie_parse’.

File: middleware/wifi_qca/common_src/api_interface/api_ioctl.c
695: A_STATUS
696: Api_ParseInfoElem(void *pCxt, WMI_BSS_INFO_HDR *bih, int32_t len, A_SCAN_SUMMARY *pSummary)
697: {
698:     uint8_t *buf;
699:     uint8_t *pie, *pieEnd, *pieTemp;
700:     uint8_t ie_result[2];
701:     uint16_t ie_len;

 SNIP

740:             case IEEE80211_ELEMID_RSN:
741:                 /*******************/
742:                 /* parse RSN IE    */
743:                 /*******************/
744:                 ie_len = pie[1];   /* init ie_len - sizeof wpa_oui */ 
745:                 pieTemp = &pie[2]; /* init pieTemp beyond wpa_oui */
746: 
747:                 if (A_LE_READ_2(pieTemp) != RSN_VERSION)
748:                 {
749:                     break;
750:                 }
751:                 ie_len -= 2;
752:                 pieTemp += 2;
753:                 ie_result[0] = ie_result[1] = 0;
754: 
755:                 security_ie_parse(pieTemp, ie_len, &ie_result[0], IEEE80211_ELEMID_RSN);

2. ‘security_ie_parse’ does not perform any additional validation on the received ‘ie_len’. ‘ie_len’ is then decremented at lines 637, 644, and 658 without any check for alignment or underflow. Because ‘ie_len’ is an unsigned 8-bit value that is reduced by 4 on every iteration, a value that is not a multiple of 4 when the loop is entered wraps around instead of reaching zero, which makes the second condition in the ‘for’ loop (line 647) always true.
As a result, this ‘for’ loop only depends on the first condition (where ‘cnt’ is also controlled by the potential attacker). This situation may force the ‘for’ loop to run beyond the original buffer’s bounds, potentially hitting unmapped or invalid memory.
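
The wraparound can be reproduced in isolation. The sketch below is illustrative only: both function names are hypothetical, and the loops count iterations rather than parse cipher entries. The first function mirrors the loop condition at line 647; the second shows one possible stricter condition, which is an assumption and not the vendor's fix.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the vulnerable condition: stops only if ie_len becomes exactly 0. */
static unsigned iterations_unchecked(uint8_t ie_len, uint16_t cnt)
{
    unsigned n = 0;
    for (uint16_t i = 0; (i < cnt) && (ie_len > 0); i++) {
        ie_len -= 4;    /* wraps modulo 256 when ie_len is below 4 */
        n++;
    }
    assert(n <= cnt);
    return n;
}

/* Stricter variant: requires a full 4-byte entry to remain before each step. */
static unsigned iterations_checked(uint8_t ie_len, uint16_t cnt)
{
    unsigned n = 0;
    for (uint16_t i = 0; (i < cnt) && (ie_len >= 4); i++) {
        ie_len -= 4;    /* cannot wrap: ie_len is at least 4 here */
        n++;
    }
    assert(n <= cnt);
    return n;
}
```

With 'ie_len' set to 6 and 'cnt' set to 100, the unchecked loop runs all 100 iterations (6, 2, 254, 250, and so on never reach zero), whereas the checked loop stops after a single iteration. With 'ie_len' set to 8, both stop after two.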

File: middleware/wifi_qca/common_src/api_interface/api_ioctl.c
629: static void security_ie_parse(uint8_t *pie, uint8_t ie_len, uint8_t *pResult, uint8_t ie_type)
630: {
631:     uint16_t cnt;
632:     uint16_t i;
633:     uint8_t wepKeyLen;
634:     /* skip mcast cipher */
635:     if (ie_len >= 4)
636:     {
637:         ie_len -= 4;
638:         pie += 4;
639:     }
640:     /* examine ucast cipher(s) */
641:     if (ie_len > 2)
642:     {
643:         cnt = A_LE_READ_2(pie);
644:         ie_len -= 2;
645:         pie += 2;
646: 
647:         for (i = 0; ((i < cnt) && (ie_len > 0)); i++)
648:         {
649:             if (ie_type == IEEE80211_ELEMID_RSN)
650:             {
651:                 pResult[0] |= rsn_cipher_parse(pie, &wepKeyLen);
652:             }
653:             else
654:             {
655:                 pResult[0] |= wpa_cipher_parse(pie, &wepKeyLen);
656:             }
657: 
658:             ie_len -= 4;
659:             pie += 4;
660:         }
661:     }