To begin at the beginning.
A password hash is a transformation of a password using what we call a "one-way" function. The one-way property means it must be essentially impossible to construct the inverse function and recover the original. ROT-13 (rotate by half the alphabet), for example, would be a very, very bad password hash function: it gives fairly recognizable results like "Cnffjbeq123!" (that's "Password123!"), and applying it a second time gives the password straight back. Functions like MD5 or SHA1 certainly meet the one-way criterion. Encryption functions like DES have also been used to build password hashes (for example, LAN Manager hashes), but they fell out of favor a while ago. There are a load of technical, cryptographic details which neither you nor I need to know, but essentially we'll end up guessing loads of passwords, performing the same one-way function on each guess, and checking whether we've arrived at the correct answer.
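A minimal Python sketch of the difference: ROT-13 undoes itself, while for SHA1 the only way back is to hash guesses and compare. The candidate list is invented for the example.

```python
import codecs
import hashlib

password = "Password123!"

# ROT-13 is its own inverse: applying it twice gives back the original.
rot = codecs.encode(password, "rot13")
print(rot)                           # Cnffjbeq123!
print(codecs.encode(rot, "rot13"))   # Password123!

# A one-way function offers no such shortcut, so we guess and check:
# hash each candidate and compare it with the target hash.
target = hashlib.sha1(password.encode()).hexdigest()

def crack(target_hex, candidates):
    for candidate in candidates:
        if hashlib.sha1(candidate.encode()).hexdigest() == target_hex:
            return candidate
    return None

print(crack(target, ["letmein", "Summer2020", "Password123!"]))  # Password123!
```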
Now if we wanted to optimize our guessing procedure, we could create a giant lookup table of all possible passwords and all possible corresponding hashes. (As an aside, this might actually be slower than computing a really fast hash like MD5, because disk access is slower than doing a small number of operations on the CPU.) So at some point, a long time ago, UNIX systems started adding a salt to the password hash. A salt is a random value stored along with the hash, which also goes into the hash computation. This gives two benefits:
1. If the salt is long enough and random enough, it is no longer feasible to build a lookup table - it just gets too big.
2. You have to attack each hash on its own; i.e. run a separate computation for each (salt, password candidate) pair.
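A minimal Python sketch of both benefits, using a toy construction (SHA-256 over the salt followed by the password) that is *not* sha512crypt or any real scheme; it only shows why salts defeat lookup tables and force per-salt work.

```python
import hashlib
import os

def salted_hash(salt: bytes, password: str) -> str:
    # Toy construction for illustration only: SHA-256 over salt || password.
    # Real schemes (sha512crypt, bcrypt) add many iterations and are much slower.
    return hashlib.sha256(salt + password.encode()).hexdigest()

# Benefit 1: the same password under two random salts gives unrelated hashes,
# so a single precomputed table cannot cover both.
salt_a, salt_b = os.urandom(16), os.urandom(16)
print(salted_hash(salt_a, "Password1") == salted_hash(salt_b, "Password1"))  # False

# Benefit 2: every candidate has to be hashed again for every salt.
def crack_all(stored, candidates):
    """stored: list of (salt, hexdigest) pairs; returns {hexdigest: password}."""
    found = {}
    for salt, digest in stored:
        for candidate in candidates:  # this inner loop repeats for each salt
            if salted_hash(salt, candidate) == digest:
                found[digest] = candidate
                break
    return found
```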
So we arrive at something like sha512crypt, which is many iterations of SHA512 with a large random salt. Each guess is fairly slow to compute, which makes our job harder. A sha512crypt hash looks like this:
```
$6$52450745$k5ka2p8bFuSmoVT1tzOyyuaREkkKBcCNqoDKzYiJL9RaE8yMnPgh2XzzF0NDrUhgrcLwg78xs1w5pJiypEdFX/
```
Now, you can attack this by putting it in a file and running John the Ripper (https://www.openwall.com/john/) like this:
```
D:\john\run> john.exe sha512crypt.txt
```
or with hashcat (https://hashcat.net/hashcat/) like this:
```
hashcat.exe -a0 -m 1800 D:\hashcrack\john\run\sha512crypt.txt D:\hashcrack\dict\Top95Thousand-probable.txt -r D:\hashcrack\rules\best1222.rule -O -w3
```
So the problem now becomes "just" choosing all the magic parameters, which are explicit with hashcat and implicit with John the Ripper. Because I don't like typing the same thing over and over again, I wrote a helper script called "hashcrack" (https://github.com/blacktraffic/hashcrack).
Briefly, hashcrack tries to guess the hash type and then runs some sensible attacks based on the speed of the hash. It does this by invoking hashcat, John the Ripper, a bunch of conversion/extraction scripts that come with John the Ripper, or occasionally impacket or SQLite. For example, here it is running against a Responder db file from the test suite:
```
D:\hashcrack>python hashcrack.py -i tests\Responder.db
Running under win32
Reading file: D:\hashcrack\tests\Responder.db
Couldn't parse first line of file - trying as latin1
[at this point it pulls out the NetNTLMv1 and v2 hashes into .tmp files - "SELECT fullhash FROM responder where type like 'NTLMv2%'" ]
RUN: hashcat.exe -a0 -m 5600 D:\hashcrack\tests\Responder.db.tmp D:\hashcrack\dict\\\Top95Thousand-probable.txt -r D:\hashcrack\rules\\\best22405.rule --loopback -O --bitmap-max=26 -w3 --session hc
...
IEUSER::IEWIN7:ed9cc20456c23e34:939e00dfea66e08e8b...03100000000000000000000000000:Passw0rd!
```
It is very much a glorified helper script, but it does at least tell you what command it decided to run in case you do need to tweak it further, and it does its best to unpack whatever it is to hashcat format.
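hashcrack's actual detection code isn't shown here; the following is a hypothetical Python sketch of prefix-based type guessing, mapping to the hashcat modes used in this post (plus MD5 = 0, SHA1 = 100 and NetNTLMv1 = 5500). Bare hex strings are genuinely ambiguous, so it returns a list of candidates rather than a single answer.

```python
import re

# (prefix, name, hashcat mode) for self-describing crypt-style hashes.
PREFIXES = [
    ("$6$", "sha512crypt", 1800),
    ("$2a$", "bcrypt", 3200),
    ("$2b$", "bcrypt", 3200),
    ("$2y$", "bcrypt", 3200),
    ("$P$", "phpass", 400),
]

def guess_hash_type(h: str):
    """Return a list of (name, mode) candidates for a single hash string."""
    h = h.strip()
    for prefix, name, mode in PREFIXES:
        if h.startswith(prefix):
            return [(name, mode)]
    if re.fullmatch(r"[0-9a-fA-F]{32}", h):
        # 32 hex digits is ambiguous: NTLM and raw MD5 look identical.
        return [("NTLM", 1000), ("MD5", 0)]
    if re.fullmatch(r"[0-9a-fA-F]{40}", h):
        return [("SHA1", 100)]
    if "::" in h:
        # Responder-style USER::DOMAIN:... line; v1 vs v2 needs a closer look.
        return [("NetNTLMv2", 5600), ("NetNTLMv1", 5500)]
    return []

print(guess_hash_type("$P$984478476IagS59wHZvyQMArzfx58u."))  # [('phpass', 400)]
```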
## First, catch your hash
Obtaining and recognizing hashes and then turning them into the correct format for John the Ripper or hashcat is not always a trivial exercise. At a minimum, we have the following main types.
### Windows
* Stored passwords, such as NTLM - How Attackers Dump Active Directory Database Credentials - https://adsecurity.org/?p=2398
* NetNTLMv1/v2 - These can be captured using Responder, or leaked from various applications if you can get them to connect to your fake SMB share.
* Kerberoasted hashes - These can be obtained by Kerberoasting an AD domain with Invoke-Kerberoast, Rubeus, or similar tools.
* Domain cached credentials - These can be captured using password dumping tools on the local machine, or by taking the three registry hives (security, system, and SAM) and unpacking them with impacket’s “secretsdump” (https://github.com/SecureAuthCorp/impacket).
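Responder keeps its captures in an SQLite database, and the hashcrack log earlier shows the query it uses (`SELECT fullhash FROM responder where type like 'NTLMv2%'`). A Python sketch of that extraction with the standard sqlite3 module; the in-memory table and its rows are placeholders standing in for a real Responder.db, whose schema has more columns than the two assumed here.

```python
import sqlite3

def extract_hashes(conn: sqlite3.Connection, type_prefix: str) -> list:
    """Return captured hashes whose type starts with type_prefix (e.g. 'NTLMv2')."""
    rows = conn.execute(
        "SELECT fullhash FROM responder WHERE type LIKE ?", (type_prefix + "%",)
    )
    return [row[0] for row in rows]

# In-memory stand-in for Responder.db; the rows are placeholders.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE responder (type TEXT, fullhash TEXT)")
conn.executemany(
    "INSERT INTO responder VALUES (?, ?)",
    [("NTLMv2-SSP", "USER1::DOMAIN:placeholder-v2"),
     ("NTLMv1-SSP", "USER2::DOMAIN:placeholder-v1")],
)
# hashcat expects one hash per line in its input file.
print("\n".join(extract_hashes(conn, "NTLMv2")))
# For a real capture: conn = sqlite3.connect("Responder.db")
```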
### UNIX
For Linux, these would typically be in /etc/shadow, and on modern operating systems the format will be sha512crypt or bcrypt, starting with "$6$" or "$2a$"/"$2b$"/"$2y$" respectively. For AIX, you might find DEScrypt hashes in /etc/security/passwd, but the common theme is crypt formats (https://www.man7.org/linux/man-pages/man3/crypt.3.html).
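A Python sketch that pulls crackable entries out of shadow-style text, assuming the usual colon-separated layout with the hash in the second field. The sample lines are made up, apart from the sha512crypt hash shown earlier.

```python
def parse_shadow(text):
    """Yield (user, hash, scheme_id) for entries that hold a crackable crypt hash."""
    for line in text.splitlines():
        fields = line.split(":")
        if len(fields) < 2:
            continue
        user, pw = fields[0], fields[1]
        # Empty, '!' or '*' means locked or no usable password: skip.
        if not pw or pw[0] in "!*":
            continue
        # Modular crypt format $id$[params$]salt$hash; no leading '$' means
        # traditional DES crypt.
        scheme = pw.split("$")[1] if pw.startswith("$") else "DES"
        yield user, pw, scheme

SAMPLE = """root:$6$52450745$k5ka2p8bFuSmoVT1tzOyyuaREkkKBcCNqoDKzYiJL9RaE8yMnPgh2XzzF0NDrUhgrcLwg78xs1w5pJiypEdFX/:18000:0:99999:7:::
daemon:*:18000:0:99999:7:::
alice:!$6$abc$lockedplaceholder:18000:0:99999:7:::
bob:$2b$12$placeholderplaceholderplaceholderplaceholder:18000:0:99999:7:::"""

for user, _hash, scheme in parse_shadow(SAMPLE):
    print(user, scheme)   # root 6, then bob 2b
```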
### Database
Generally, as a DBA, you can query the password hashes out of databases such as PostgreSQL, MySQL, Oracle, MSSQL, etc.
### Documents
Password protected ZIP, Word, Excel, and PDF files - various scripts that come with John the Ripper can extract password hashes from these files.
### Web Applications
You might see these if you can connect directly to the backend database or if you can exploit SQL injection. This can be anything the developers thought was a good idea, from MD5 to bcrypt. Using the phpass format as an example, it might be something like `$P$984478476IagS59wHZvyQMArzfx58u.`
## Information implied by format
So where you get the hashes from gives you some clues about how to proceed. All Windows hashes should meet the domain's password policy - unless the admins have given special dispensation. Web or database passwords could be any length in theory. WPA2 passwords must be at least eight characters according to the standard.
For unsalted hashes, such as plain MD5, or NTLM, or SHA1, you can attack a thousand hashes as quickly as one, so you may as well load them all up. Otherwise, you may need to pick hashes of particular interest.
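For an unsalted hash the per-hash cost disappears: hash each guess once and check it against every target with a set lookup. A Python sketch, using raw MD5 as a stand-in for any unsalted hash (Python's hashlib does not always provide MD4, which NTLM needs).

```python
import hashlib

def crack_unsalted(target_hexes, candidates):
    """Hash each candidate once and look it up in a set of all target hashes."""
    targets = {h.lower() for h in target_hexes}
    found = {}
    for candidate in candidates:
        digest = hashlib.md5(candidate.encode()).hexdigest()
        if digest in targets:          # O(1) lookup however many targets there are
            found[digest] = candidate
    return found

targets = [hashlib.md5(p.encode()).hexdigest() for p in ("abc", "123456")]
print(crack_unsalted(targets, ["abc", "password", "123456"]))
```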
Again with the quicker hashes, you may as well just throw everything at it and see what comes out - you can always refine your approach based on what you crack. With slower ones, it's worth thinking about it or trying to find "cribs" (potential passwords, or parts of passwords) elsewhere on the system.
## Benefits and constraints of using a GPU
You can of course do this all on CPU if you want, but for most hash types, GPU is much quicker. One exception is bcrypt, which can be quicker on CPU because it was designed to be awkward for GPUs (each bcrypt computation needs its own 4 KB of state that it reads and rewrites constantly). I'm not going to talk about FPGAs here, but they are a good approach if you really need one (e.g. https://www.usenix.org/system/files/conference/woot14/woot14-malvoni.pdf).
Because a lot of people like playing games with nice graphics, fairly cheap parallel processors are readily available. But there are certain limitations on how we can get them to process our workload. For the two most popular password cracking programs, hashcat and John the Ripper, this means expressing the search space in terms of a dictionary and a set of rules (or transformations) or a mask, which is essentially a set of possible character values to search. For the faster hashes, like NTLM and SHA1, you will need to supply a list of rules and a big enough wordlist in order to keep the GPU busy - I gather this is something to do with how quickly you can send data over the bus, but whatever the reason, using a short wordlist and no rules will not make best use of the GPU (hashcat will warn you about this).
Meanwhile, mask attacks are quick if you give a sufficiently large search space.
Hashcat expresses masks in a slightly odd way: ?a represents all printable ASCII characters, ?d digits, ?l and ?u lower- and upper-case letters respectively, and ?s special characters (punctuation and space). You may also see ?b, which means any byte (0x00 to 0xff).
Thus a search using `?u?l?l?l?l?l?l?l?s` will eventually match `Password!`, but will also try everything from `Aaaaaaaa!` through to `Zzzzzzzz}`.
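To get a feel for the size of a mask before running it, multiply the charset sizes together. A Python sketch covering only the built-in charsets listed above (custom ?1-?4 charsets are left out), with a rough time estimate using the ~50 billion NTLM guesses per second quoted later for a 1080 Ti.

```python
# Sizes of hashcat's built-in charsets.
CHARSET_SIZES = {"l": 26, "u": 26, "d": 10, "s": 33, "a": 95, "b": 256}

def mask_keyspace(mask: str) -> int:
    """Candidates generated by a mask; literal characters contribute one choice."""
    total, i = 1, 0
    while i < len(mask):
        if mask[i] == "?" and i + 1 < len(mask):
            key = mask[i + 1]
            if key != "?":                 # "??" is an escaped literal '?'
                if key not in CHARSET_SIZES:
                    raise ValueError(f"?{key} is not handled in this sketch")
                total *= CHARSET_SIZES[key]
            i += 2
        else:
            i += 1                         # literal character: one choice
    return total

ks = mask_keyspace("?u?l?l?l?l?l?l?l?s")
print(ks, "candidates")
print(ks / 50e9, "seconds at ~50 billion guesses/s")
```

With -i (increment mode), hashcat also runs each shorter prefix of the mask, so the total work is the sum over the lengths.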
In the same way, the rules language describes how to transform a base word into the final guess, in terms of deleting or swapping characters or adding them to the start or end of the word. Like a CPU, it only implements a finite set of common operations, and we have to work with what's there. If we assume "password" is the base:
* `$1$2$3` means append 123, so we get "password123"
* `^3^2^1` means prepend 123 (because we prepend 3, then prepend 2 to "3password" to get "23password", and so on)
* `so0` means swap ALL 'o' characters to '0' (zero), so it becomes "passw0rd"
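One of the limitations of the `s` rule is that we can't swap just some of the o characters to 0s; it's all or none. Position-specific rules such as `oNX` (overwrite the character at position N with X) exist, but they only help if you know where the character is. (Research shows that humans tend to add complexity to the end of passwords, so suffixes are more likely to be helpful than prefixes.)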
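The following Python sketch implements a small subset of the rule language (`:`, `l`, `u`, `c`, `$X`, `^X`, `sXY`, `iNX`, `oNX`) to make the semantics concrete. It is an illustration, not hashcat's implementation: the real engine has many more functions and its own edge-case behavior (this sketch simply leaves the word unchanged when a position is out of range).

```python
def apply_rule(word: str, rule: str) -> str:
    """Apply a hashcat-style rule, supporting only a small subset of functions."""
    i = 0
    while i < len(rule):
        op = rule[i]
        if op in " :":                 # spaces separate functions; ':' does nothing
            i += 1
        elif op in "luc":              # lower-case, upper-case, capitalize
            word = {"l": word.lower, "u": word.upper, "c": word.capitalize}[op]()
            i += 1
        elif op == "$":                # append character
            word += rule[i + 1]
            i += 2
        elif op == "^":                # prepend character
            word = rule[i + 1] + word
            i += 2
        elif op == "s":                # substitute ALL occurrences of X with Y
            word = word.replace(rule[i + 1], rule[i + 2])
            i += 3
        elif op in "io":               # insert / overwrite at position N
            n, ch = int(rule[i + 1], 36), rule[i + 2]   # N is 0-9, then A-Z
            if op == "i" and n <= len(word):
                word = word[:n] + ch + word[n:]
            elif op == "o" and n < len(word):
                word = word[:n] + ch + word[n + 1:]
            i += 3
        else:
            raise ValueError(f"rule function {op!r} not supported in this sketch")
    return word

for rule in ["$1$2$3", "^3^2^1", "so0", "u $1 $1"]:
    print(rule, "->", apply_rule("password", rule))
```

The same function reproduces the debug-file examples in the next section, e.g. `apply_rule("sstechinc", "i11")` gives `s1stechinc`.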
## Dictionary and Rules
A "normal" run of hashcat might look like this, where we have a dictionary of common passwords in decreasing order of frequency and a rules file. I have given the hashcrack command, and the "RUN: " is exactly how it decides to run hashcat.
The file I'm running it against is the last couple of million of the Have I Been Pwned NTLM dataset (https://www.troyhunt.com/pwned-passwords-now-as-ntlm-hashes/). I've written elsewhere about cracking the first 500 million, but for now, it's just some convenient test data.
```
./hashcrack.py -i hashes.txt -t ntlm -d /root/dict/Top2Billion_probable.txt -r rules/InsidePro-PasswordsPro.rule
RUN: ./hashcat64.bin -a0 -m 1000 hashes.txt /root/dict/Top2Billion_probable.txt -r /root/hashcrack/rules/InsidePro-PasswordsPro.rule --loopback -O -w4
```
If we look at the debug file produced by doing --debug-mode=4 --debug-file=dbg.log, we can see the dictionary word on the left, the rules applied in the middle, and the thing we found on the right.
```
stellarfinance:u $1 $1:STELLARFINANCE11 (upper case all, append '1' append '1')
sstechinc:i11:s1stechinc (insert '1' at position 1)
ssgalactic:i2.:ss.galactic (insert '.' at position 2)
```
We can give a directory as the dictionary argument, which means it tries every file in the directory one after the other. And we can give two rule parameters, which means it combines each rule in the first with every rule in the second. This means it gets big fast, but this can be useful if you have orthogonal rulesets (e.g. one for dealing with passphrases and one for endings like "123", "123!" etc.). Combining insertions.rule and a normal ruleset got me `t++19882008` with the t coming from the normal rules, and the ++ coming from insertions.rule.
```
19882008:^t i1+ i1+:t++19882008
```
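Passing -r twice combines every rule from the first file with every rule from the second, so the rule count multiplies. A Python sketch of that combination, with a tiny applier for just `^X` and `iNX` to reproduce the `t++19882008` example; the rule lists are illustrative, not the contents of insertions.rule.

```python
from itertools import product

def combine_rules(first, second):
    """Join every rule in `first` with every rule in `second` (like passing -r twice)."""
    return [f"{a} {b}" for a, b in product(first, second)]

def apply(word, rule):
    """Minimal applier: only ^X (prepend) and iNX (insert X at position N)."""
    i = 0
    while i < len(rule):
        if rule[i] == " ":
            i += 1
        elif rule[i] == "^":
            word, i = rule[i + 1] + word, i + 2
        elif rule[i] == "i":
            n = int(rule[i + 1], 36)
            word, i = word[:n] + rule[i + 2] + word[n:], i + 3
        else:
            raise ValueError(f"{rule[i]!r} not supported in this sketch")
    return word

normal = ["^t", "^T"]              # illustrative stand-ins for a normal ruleset
insertions = ["i1+ i1+", "i1-"]    # illustrative stand-ins for insertions.rule
combined = combine_rules(normal, insertions)
print(len(combined))                      # 4 = 2 * 2
print(apply("19882008", "^t i1+ i1+"))    # t++19882008
```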
## Masks and Maskfiles
You can specify a search space using masks. For example, the following will try all printable ASCII passwords of length 1-6 (the -i flag makes hashcat increment through the mask lengths):
```
./hashcrack.py -i hashes.txt -t ntlm --mask ?a?a?a?a?a?a
RUN: ./hashcat64.bin -a3 -m 1000 hashes.txt ?a?a?a?a?a?a -i -O -w4
```
You can also specify this in a file, where you can define the character classes ?1, ?2, ?3, ?4 and then use them in the final field of the line. The following will search through default passwords and variants:
```
Pp,@aA4,s5$,o0,?1?2?3?3w?4rd
sS5,oO0,?1upp?2rt
Ll,3eE,1iI,?1?2tm?2?3n
Ll,3eE,1iI,?a,?1?2tm?2?3n?4
Cc,3eE,aA4@,?1h?3ng?2m?2
Cc,3eE,aA4@,?1h?3ng?2it
...
```
And then it will try each one, with the custom charsets given:
```
$ ./hashcrack.py -i hashes.txt -t ntlm --mask maskfiles/defaultpass.hcmask
RUN: ./hashcat64.bin -a3 -m 1000 hashes.txt /root/hashcrack/maskfiles/defaultpass.hcmask -O -w4 --session hc
```
This guesses things like P@ssword, p@ssword, Password, password, and so on.
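A Python sketch of how one line in this format expands, assuming the simplified grammar visible above: up to four comma-separated custom charsets followed by the mask. It ignores `\,` escapes and built-in placeholders inside charsets (such as the `?a` in the fourth line), which real hashcat supports.

```python
from itertools import product

def expand_hcmask_line(line: str):
    """Yield every candidate for one .hcmask line (simplified grammar)."""
    *charsets, mask = line.split(",")
    choices = []
    i = 0
    while i < len(mask):
        if mask[i] == "?" and i + 1 < len(mask):
            key = mask[i + 1]
            if key == "?":
                choices.append("?")                       # escaped literal '?'
            elif key in "1234":
                choices.append(charsets[int(key) - 1])    # custom charset ?1-?4
            else:
                raise ValueError(f"?{key} not handled in this sketch")
            i += 2
        else:
            choices.append(mask[i])                       # literal character
            i += 1
    for combo in product(*choices):
        yield "".join(combo)

cands = list(expand_hcmask_line("sS5,oO0,?1upp?2rt"))
print(len(cands), cands[:3])
```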
## Combining lists
But what if I want to make like the Large Hadron Collider and just smash things together and see what happens? Well, there's mode -a1 in hashcat, combinator and combinator3 from hashcat-utils (https://github.com/hashcat/hashcat-utils), and the PRINCE preprocessor (https://github.com/hashcat/princeprocessor).
Below, the file last1-5.txt contains commonly observed suffixes from my data, but you can also run two password dictionaries against each other. Found passwords will be the concatenation of one entry from the first dictionary and one from the second. In this case, "samtron" is in Top95Thousand-probable.txt and "_r89" is in last1-5.txt.
```
$ ./hashcrack.py -i hashes.txt -d /root/dict/Top95Thousand-probable.txt -e /root/dict/last1-5.txt -t ntlm
RUN: ./hashcat64.bin -a1 -m 1000 hashes.txt /root/dict/Top95Thousand-probable.txt /root/dict/last1-5.txt -O -w4 --session hc
8dd1b62216b2703737ad28b59b1bad1d:samtron_r89
8bdc261caed3145d2a9f4f9de8ab31e2:greentreejkvl
9e3e0d23ddb9be5a9498b4c9b4366336:ruby@bds
244e2d25960ca0b8747efd0a1ab3c2f6:shashank.n87
8c9df56a1769a1d8ed3a43989d25cd6f:conway7o4s
b35f0b7e18945d4f1e79b6338a51d519:GarfieldH170
1ff1fdb36d4b3c79cdc5a6d4d01230cb:canuckh2oz
```
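Mode -a1 is a cartesian product of the two lists, so the candidate count is the product of their sizes. A Python sketch, with words borrowed from the cracks above; the short lists are illustrative, not the real dictionaries.

```python
from itertools import product

def combinator(left, right):
    """Mimic hashcat's -a1: each left word concatenated with each right word."""
    for a, b in product(left, right):
        yield a + b

words = ["samtron", "greentree", "ruby"]
suffixes = ["_r89", "jkvl", "@bds"]
cands = list(combinator(words, suffixes))
print(len(cands))   # 9 = 3 * 3
```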
## Phrases
I extracted a bunch of short phrases from Google's n-gram corpus (http://storage.googleapis.com/books/ngrams/books/datasetsv3.html) to play with, but there are other publicly available lists of passphrases. You will need slightly different rules, because you might want to capitalize things after spaces, swap space to underscore, or such. Try https://github.com/initstring/passphrase-wordlist/tree/master/hashcat-rules
```
$ python hashcrack.py -i hashes.txt -t ntlm -d c:\Users\jamie\Desktop\words\1-4grams.txt -r rules\passphrases.rule
RUN: hashcat.exe -a0 -m 1000 D:\hashcrack\hashes.txt c:\Users\jamie\Desktop\words\1-4grams.txt -r D:\hashcrack\rules\passphrases.rule --loopback -O --bitmap-max=26 -w3 --session hc
25d9bebab099e8ef6e0ee0c496a2c917:ambitiouspeople
f08eacad22f93cf1ad34ad34aaa119e6:industrialwelding
a820d36ec57c4ef2c4426c242f50248e:simplysatisfied
5dcbdc8902e458f07bc06eebd8273a6e:WarmFlame
241d1c802fd187f35b3e2b9789b81b6e:JohnBelfield
76cdc9d7543af6effda11f8ccef75669:EyeCannot
17c9ea43a66e21f14a1ed106d06755d4:perhaps_forget
4c2ba3c55cbdcaf1bc83f94777a3b6dd:imaginative_mind
```
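The kinds of transformation a passphrase ruleset performs can be sketched in Python as below. This is not the linked rule file, just a few obvious variants (joined, CamelCase, underscores) that match the shapes of the cracks above.

```python
def phrase_variants(phrase: str):
    """A few common ways people turn a phrase into a password."""
    words = phrase.lower().split()
    variants = [
        "".join(words),                          # ambitiouspeople
        "".join(w.capitalize() for w in words),  # WarmFlame
        "_".join(words),                         # perhaps_forget
        " ".join(words),                         # spaces kept
        "".join(words).capitalize(),             # Ambitiouspeople
    ]
    # Keep the order but drop duplicates (e.g. for one-word phrases).
    return list(dict.fromkeys(variants))

print(phrase_variants("warm flame"))
```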
## Markov Models
As with all machine learning/statistical approaches, this works best if your training data is representative of the stuff you're trying to crack. Which you don't know, because you haven't cracked it yet, but it's a reasonable guess that it might follow the same form as fragments of English text. Travco wrote a nice quick program to do this: https://github.com/travco/rephraser.
In this case, 'corpus.txt' should be a lot of English sentences (or from whatever language you think might be in use).
```
$ python3 ../rephraser/rephraser.py --corpus corpus.txt --model ./wiki1M_model.keyvi | ./hashcat64.bin -a0 -m 1000 hashes.txt -r /root/hashcrack/rules/passphrase.rule -r ../rules/best1222.rule --loopback -O -w4
57871172a2cd8ada7c7794fcb4a1820b:ToStayAtHome07
def7c2415d3d240ff0c4821858b26402:ToBeReset
affbfba14f0b54b5f2ff5db1873d3401:AnABootCamp
33398e1f2542b9b973b5b9a726caf347:TOUNTHEBIG
50d1e344e7d32c5c354ab8a97119c8c5:Abbyishard
7efe3e0e20145e33a35d83e07d69e7bc:Andistillmay
```
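rephraser's own model is more sophisticated; as a toy illustration of the idea, here is a Python sketch of a word-bigram model that learns which word follows which in a corpus and then enumerates likely phrases, keeping only the top few successors at each step. The three-sentence corpus is made up.

```python
from collections import Counter, defaultdict

def train_bigrams(corpus: str):
    """Count which word follows which, sentence by sentence."""
    follows = defaultdict(Counter)
    for sentence in corpus.lower().split("."):
        words = sentence.split()
        for a, b in zip(words, words[1:]):
            follows[a][b] += 1
    return follows

def generate(follows, start: str, length: int, top: int = 2):
    """Enumerate phrases of `length` words, following the `top` likeliest successors."""
    if length == 1:
        return [[start]]
    phrases = []
    for nxt, _count in follows[start].most_common(top):
        for rest in generate(follows, nxt, length - 1, top):
            phrases.append([start] + rest)
    return phrases

corpus = "I want to stay at home. We had to stay at school. Time to stay in bed."
model = train_bigrams(corpus)
for words in generate(model, "to", 4):
    print("".join(w.capitalize() for w in words))   # e.g. ToStayAtHome
```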
## Leetification
This is the term I use for swapping things like o->0, s->5, e->3 and so on, as people sometimes do in their passwords, for example "Pa55w0rd". You can do this in rules if you want to swap all occurrences, but I suspected there might be a lot of people who did not consistently do this. Hashcrack can do this using the -3 flag:
```
$ ./hashcrack.py -i hashes.txt -3 -t ntlm
```
which basically does this:
```
$ python3 ../scripts/leetify.py /root/dict/Top2Billion_probable.txt | ./hashcat64.bin -a0 -m 1000 hashes.txt -r /root/hashcrack/rules/best22405.rule --loopback -O -w4 --session hc
```
That script does the leetification recursively, producing every mix of substituted and unsubstituted characters:
```
$ echo foo > foo
$ python3 scripts/leetify.py foo
foo
fo0
f0o
f00
```
Which gets us things like the following, where some, but not all, of the characters have been leetified.
```
46d31b126dec3444a31ebaa8c5aae69e:S3LECt1ON
```
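A Python sketch of recursive leetification that reproduces the `foo` output above. The substitution table is an assumption (the o, s and e swaps from the text, plus i->1 and a->4); the real leetify.py may use a different table.

```python
LEET = {"o": "0", "s": "5", "e": "3", "i": "1", "a": "4"}  # assumed table

def leetify(word: str):
    """Yield every mix of original and substituted characters, original first.

    A word with k substitutable characters produces 2**k variants.
    """
    if not word:
        yield ""
        return
    first, rest = word[0], word[1:]
    options = [first]
    if first.lower() in LEET:
        options.append(LEET[first.lower()])
    tails = list(leetify(rest))        # compute the suffix variants once
    for head in options:
        for tail in tails:
            yield head + tail

for variant in leetify("foo"):
    print(variant)   # foo, fo0, f0o, f00
```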
## Measuring rule quality
You can use debug output to see how often a rule is firing by adding --debug-mode=4 --debug-file=foo to the hashcat command line.
You can then use this data to count how many times each rule fires and plot a nice graph.
Obviously, this doesn't account for the fact that two different rules may carry out the same transformation: `^"$"` and `$"^"` both enclose the candidate password in quotes, but only the first one in your list will get counted with this approach.
Below is the output of this process using a bunch of random rules tested against the HIBP NTLM corpus, with the "best" rules being toward the left of the graph.
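A Python sketch of the counting step for a --debug-mode=4 file (`base:rule:plain` per line, as in the earlier debug output). It assumes the base word and the plaintext contain no colons; the rule may, since `:` is itself hashcat's do-nothing rule.

```python
from collections import Counter

def rule_counts(lines):
    """Count how often each rule produced a crack in --debug-mode=4 output."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if line.count(":") < 2:
            continue
        _base, rest = line.split(":", 1)      # base word: up to the first colon
        rule, _plain = rest.rsplit(":", 1)    # plaintext: after the last colon
        counts[rule] += 1
    return counts

# Usage against a real debug file:
# with open("foo") as f:
#     print(rule_counts(f).most_common(20))
```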
## Measuring password quality
Another thing we can do is visualize how quickly passwords got cracked with any particular method, using hashcat's status output and counting the cumulative number of passwords at each "tick."
```
hashcat64.exe -a0 -m 1000 C:\Users\jamie\Desktop\hashcrack\defcon2010-ntlm.txt C:\Users\jamie\Desktop\hashcrack\dict\\\Top32Million-probable.txt -r C:\Users\jamie\Desktop\hashcrack\rules\\\l33tpasspro.rule --loopback -O --bitmap-max=26 -w3 --session hc --status >> graphme
d:\hashcrack>python graph-by-quality.py hashcat-5.1.0\graphme
```
Below we can see we cracked 20-30% of the passwords relatively quickly, with the rest remaining fairly stubbornly uncracked as time goes on:
This can be used for evaluating the overall strength of the passwords – obviously, a steep initial curve means that lots of things cracked very quickly, and the defenders want to make the graph as flat as possible.
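graph-by-quality.py isn't reproduced here; the following is a hypothetical Python sketch of its parsing step. It assumes each status block contains a line of the form `Recovered........: 123/456 (26.97%) Digests`, as hashcat 5.x prints, and turns those lines into a cracked-fraction series ready for plotting.

```python
import re

RECOVERED = re.compile(r"^Recovered\.*:\s*(\d+)/(\d+)")

def cracked_fraction_series(status_text: str):
    """Fraction of hashes cracked at each status 'tick' in captured hashcat output."""
    series = []
    for line in status_text.splitlines():
        m = RECOVERED.match(line.strip())
        if m:
            done, total = int(m.group(1)), int(m.group(2))
            series.append(done / total)
    return series

# Usage: with open("graphme") as f: series = cracked_fraction_series(f.read())
```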
## Representation and non-ASCII
If you're trying to crack passwords containing non-ASCII characters, you need to know how the underlying system encodes and stores those characters. For example, if you try to crack the NTLM hash for "Gü":
```
python hashcrack.py -i tst.txt --mask ?b?b?b?b?b?b?b?b -t 900 (MD4)
f343fdedf1447a61694603de4e0d132e:$HEX[4700fc00]
python hashcrack.py -i tst.txt --mask ?b?b?b?b?b?b?b?b -t 1000 (NTLM)
f343fdedf1447a61694603de4e0d132e:$HEX[47fc]
```
The bytes 47 fc are "Gü" in CP-1252, as documented here: https://en.wikipedia.org/wiki/Windows-1252
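NTLM is MD4 over the UTF-16LE encoding of the password, which explains the two results above. A Python sketch of the byte-level view; the remark that hashcat's -m 1000 widens each candidate byte to UTF-16LE itself is inferred from the output above rather than taken from hashcat's source.

```python
text = "Gü"

utf16 = text.encode("utf-16-le")   # what NTLM actually feeds to MD4
cp1252 = text.encode("cp1252")     # one byte per character

print(utf16.hex())    # 4700fc00, the bytes raw MD4 (-m 900) had to find
print(cp1252.hex())   # 47fc, the candidate reported by -m 1000
print(bytes.fromhex("47fc").decode("cp1252"))  # Gü
```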
```
$ echo -n Motörhead > motorhead.txt
$ echo -n Motörhead | sha1sum.exe | cut -f 1 -d' ' > mhash.txt
$ od -t x1 motorhead.txt
0000000 4d 6f 74 c3 b6 72 68 65 61 64
```
If we crack it like this, we do get a result:
```
python hashcrack.py -i mhash.txt -d motorhead.txt
```
However, it appears my DOS window doesn't display UTF-8 properly:
```
ac05c7c87e3514e7f36a482c65c419e5fe58c6cb:Mot├Ârhead
```
but it does save it properly in the potfile when viewed with a sensible editor:
```
ac05c7c87e3514e7f36a482c65c419e5fe58c6cb:Motörhead
```
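The garbled console output is what you get when UTF-8 bytes are displayed one at a time in a legacy code page. A Python sketch, assuming the console uses code page 850 (in code page 437, byte 0xb6 would appear as ╢ rather than Â):

```python
raw = "Motörhead".encode("utf-8")
print(raw.hex(" "))          # 4d 6f 74 c3 b6 72 68 65 61 64, as in the od output

# A cp850 console renders each UTF-8 byte as a separate character.
print(raw.decode("cp850"))   # Mot├Ârhead
```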
For more info, see:
* https://www.nixu.com/blog/cracking-non-english-character-passwords-using-hashcat
## Slower Hashes
Let's take a look at three different hash types with fairly different designs - NTLM, WordPress (phpass), and bcrypt. Firstly, NTLM is not salted, so we can attack many hashes in parallel for free, which is a truly awful thing for defenders. The other two are salted, meaning each hash needs to be attacked separately, making cracking millions of hashes millions of times harder.
The other main difference is relative speed. Using a 1080 Ti as a reference platform, I can get a speed of 50 billion guesses per second against NTLM, around 8 million against WordPress, and about 20 thousand against bcrypt. However, bcrypt comes with a cost parameter, which can be raised to make the hash slower to compute, and the hashcat example hash is unusually generous in using a cost factor of five. Most modern implementations use 10 or 12; at cost 10 we can only make about one thousand guesses per second, and each further step of the cost factor doubles the work again.
The purpose of this cost factor is that defenders want to make the computation take as long as possible without actually annoying the user – so, ideally, it would be a couple of hundred milliseconds or so. As computers get faster, we increase the cost factor and hopefully make things prohibitively expensive for attackers. That is, providing we can stop our users from picking passwords like "Summer2020".
If you are attacking a hash like bcrypt, you need to start with the most likely passwords and possibly only attack the hashes you think are most useful; remember, computing one bcrypt guess is 50 million times slower than computing one NTLM guess. And because bcrypt is salted while NTLM is not, attacking 1000 bcrypt hashes is 50 billion times slower than attacking 1000 NTLM hashes.
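The arithmetic above as a small Python sketch. The speeds are the 1080 Ti figures from this section; the 10 million word candidate list is hypothetical. Pure doubling predicts 625 H/s at cost 10 from the cost-5 figure, and real throughput doesn't scale exactly with the cost factor, so treat the output as a rough estimate.

```python
def bcrypt_speed(speed_at_ref: float, ref_cost: int, cost: int) -> float:
    """Each +1 on the bcrypt cost factor doubles the work (2**cost rounds)."""
    return speed_at_ref / 2 ** (cost - ref_cost)

def attack_seconds(candidates: int, speed: float, n_hashes: int, salted: bool) -> float:
    """Unsalted: one pass covers every hash. Salted: one pass per hash."""
    passes = n_hashes if salted else 1
    return candidates * passes / speed

NTLM_SPEED = 50e9       # guesses/s on a 1080 Ti, from the text
BCRYPT_SPEED = 1_000    # roughly cost 10, from the text
words = 10_000_000      # hypothetical wordlist size

ntlm = attack_seconds(words, NTLM_SPEED, 1000, salted=False)
bcrypt = attack_seconds(words, BCRYPT_SPEED, 1000, salted=True)
print(ntlm, bcrypt, bcrypt / ntlm)   # ~0.0002 s, 10 million s (about 116 days), ~5e10
print(bcrypt_speed(20_000, 5, 10))   # 625.0 by pure doubling
```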
Just so you can see what the various hashes look like, here's a demo of cracking the "hashcat" example WordPress (phpass) hash:
```
D:\hashcrack>python hashcrack.py --hash "$P$984478476IagS59wHZvyQMArzfx58u." -d hashcat.txt
Running under win32
Reading file: C:\Users\jamie\AppData\Local\Temp\\zc2lxx1b.hash.tmp
Autodetected phpass
Cracking hash type 400
RUN: hashcat.exe -a0 -m 400 C:\Users\jamie\AppData\Local\Temp\\zc2lxx1b.hash.tmp D:\hashcrack\hashcat.txt --loopback -O --bitmap-max=26 -w3 --session hc
...
$P$984478476IagS59wHZvyQMArzfx58u.:hashcat
```
Cracking the "hashcat" example bcrypt hash:
```
D:\hashcrack>python hashcrack.py --hash "$2a$05$LhayLxezLhK1LhWvKxCyLOj0j1u.Kj0jZ0pEmm134uzrQlFvQJLF6" -d hashcat.txt
Reading file: C:\Users\jamie\AppData\Local\Temp\\_kwbmbla.hash.tmp
Autodetected bcrypt
Cracking hash type 3200
Selected rules: best1222.rule, dict Top95Thousand-probable.txt, inc 0
Using dict and rules
CWD: D:\hashcrack\hashcat-5.1.0
RUN: hashcat.exe -a0 -m 3200 C:\Users\jamie\AppData\Local\Temp\\_kwbmbla.hash.tmp D:\hashcrack\hashcat.txt -r D:\hashcrack\rules\\\best1222.rule --loopback -O --bitmap-max=26 -w3 --session hc
$2a$05$LhayLxezLhK1LhWvKxCyLOj0j1u.Kj0jZ0pEmm134uzrQlFvQJLF6:hashcat
```
And then, an example of a more realistic bcrypt hash with significantly slower speeds:
```
Hash.Target......: $2y$10$WRTjKNVhj..Le.aoy1EZTufJP.5Q1V319sDL7v3cvgvd...sYKrQK
Speed.#1.........: 1021 H/s (327.25ms) @ Accel:16 Loops:32 Thr:12 Vec:1
```
## Go Forth and Crunch
That is a bit of a whirlwind tour of hashcat and how to use it, together with various other add-ons. For further advice I'd say:
* NVIDIA 1080/2080 gives a reasonable bang for your buck right now.
* Learn the main modes and how you can use them to build the password guesses you're thinking of.
* Think of various hypotheses for what passwords might look like and test them out.
* Make use of cribs such as company name, location, current season, current year, and things like that.
* Cool your rig adequately.
* Have fun.
## References
* Example hashcat hashes for each type: https://hashcat.net/wiki/doku.php?id=example_hashes
* Description of hashcat rules: https://hashcat.net/wiki/doku.php?id=rule_based_attack
* Hashcat: https://hashcat.net/hashcat/
* Hashcat help forum: https://hashcat.net/forum/
* John the Ripper: https://www.openwall.com/john/
* Hashcrack, my helper script: https://github.com/blacktraffic/hashcrack
which has an accidental name clash with the amazing book:
* Hash Crack: Password Cracking Manual, 3rd edition: https://www.amazon.co.uk/Hash-Crack-Password-Cracking-Manual/dp/1793458618
* Terahash, makers of shiny hardware: https://terahash.com/#appliances
TL;DR: Modern mobile OSes encrypt data by default; nevertheless, the defense-in-depth paradigm dictates that developers must encrypt sensitive data regardless of the protections offered by the underlying OS. This is yet another case study of data stored unencrypted and, most importantly, a reminder to developers not to leave their apps’ data unencrypted. In this case study, physical access to an unlocked phone, a trusted computer, or unencrypted iPhone backups is required to exfiltrate the data, which does not include authentication data and cannot be used to control or track the vehicle in any way.
## Introduction
“While modern mobile operating systems allow encrypting mobile devices, which users can use to protect themselves, it is ultimately the developer’s responsibility to make sure that their software is thoroughly safeguarded. To this end, developers should provide reliable mobile app data encryption that leaves no user data without protection.” — Dmitriy Smetana.[1]
Physical theft is not the only attack vector that threatens the data stored on a mobile phone. Imagine, for instance, a shared computer at home or in the office where a phone has been authenticated and trusted. When the phone is connected and authenticated, a malicious actor with access to this computer would be able to extract its apps’ data. The likelihood is low in the real world, though.
One day during the pandemic I was wondering whether my car’s mobile app was encrypting its data, so I decided to analyze it.
The following navigation-equipped cars were used for this analysis:
* X5 xDrive40i (2020)
* 120i (2020)
* X1 sDrive20iA X Line (2018)
BMW Connected is a mobile app compatible with 2014 and newer navigation-equipped vehicles (BMW ConnectedDrive[2]). It allows the user to monitor and remotely control some features such as:
* Lock/Unlock
* Location tracking
* Lights
* Horn
* Climate control
* Destinations (navigation system)
* Doors and windows status
* Fuel level
* Mileage
BMW Connected App Demonstration
The latest version of the app available on the App Store was:
* BMW Connected for iOS v10.6.2.1807
I installed the app on two iPhones, neither of which was jailbroken:
* iPhone XS Max (iOS 13.4.1)
* iPhone 8 Plus (iOS 13.3.1)
Then, I found unencrypted data using a few basic tools.
You’ll see how easy it was to extract and decode the stored data.
## Data Stored Unencrypted
The cars were added and authenticated within the app.
For both installations, the same behavior was observed: data was stored base64-encoded but unencrypted in .plist files. I used the plistutil command to decode these files, then piped the output through other command-line tools to strip empty lines and spaces.
Once I had the base64 strings, I decoded them with the base64 tool and, finally, formatted and colorized the JSON output with the jq tool. This exposed the following data:
* Favorite locations (FavoritesCollection.plist)
* Directions sent to the vehicle (TripStorage.plist)
* Status of doors and windows (VehicleHub.Suite.plist)
* Mileage and remaining fuel (VehicleHub.Suite.plist)
* History of remote actions (VehicleHub.Suite.plist)
* Car color and picture (VehicleHub.Suite.plist)
* Next maintenance due dates (VehicleHub.Suite.plist)
* VIN and model
* Owner’s first and last name and last logged date (group.de.bmw.connected.plist)
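The plistutil | base64 | jq pipeline can be sketched in Python with the standard plistlib, base64 and json modules. The key names in the demo plist are hypothetical, not the app's actual keys. Values stored as plist `<data>` come back from plistlib already base64-decoded, so the sketch handles both that case and base64 text strings.

```python
import base64
import json
import plistlib

def decode_plist_values(plist_bytes: bytes) -> dict:
    """Load a plist (XML or binary) and JSON-decode every value that holds encoded JSON."""
    decoded = {}
    for key, value in plistlib.loads(plist_bytes).items():
        if isinstance(value, bytes):
            raw = value                      # <data> entries arrive already decoded
        elif isinstance(value, str):
            try:
                raw = base64.b64decode(value, validate=True)
            except ValueError:               # binascii.Error: not base64 text
                continue
        else:
            continue
        try:
            decoded[key] = json.loads(raw)
        except ValueError:                   # base64, but not JSON
            pass
    return decoded

# Demo plist with hypothetical keys.
sample = plistlib.dumps({
    "favorites": json.dumps([{"name": "Home"}]).encode(),
    "trip": base64.b64encode(b'{"dest": "Office"}').decode(),
    "label": "hello world",
})
print(json.dumps(decode_plist_values(sample), indent=2))
```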
## Weak Password and PIN Policies
On registration, I noticed the password policy only required eight characters from at least two of the following three charsets:
* Letters (abc = ABC)
* Numbers
* Special characters
Such a policy might seem good enough; however, making the password case-insensitive significantly decreases its complexity. During testing, it was possible to log in with any of the following passwords:
* Qwerty12
* QWERTY12
* QwErTy12
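Some quick Python arithmetic on what case-insensitivity costs, for eight-character passwords drawn from letters and digits only (special characters would narrow the gap somewhat). The `stored_hash` function is a hypothetical illustration of one way a backend ends up accepting all three variants, by normalizing case before hashing; how BMW's backend actually does it is not known.

```python
import hashlib

def keyspace(alphabet_size: int, length: int = 8) -> int:
    return alphabet_size ** length

case_sensitive = keyspace(26 + 26 + 10)    # a-z, A-Z, 0-9
case_insensitive = keyspace(26 + 10)       # letters collapse to one case
print(case_sensitive / case_insensitive)   # roughly 77x fewer candidates

def stored_hash(password: str) -> str:
    # Hypothetical: normalizing case before hashing makes all variants equal.
    return hashlib.sha256(password.casefold().encode()).hexdigest()

print(stored_hash("Qwerty12") == stored_hash("QWERTY12") == stored_hash("QwErTy12"))  # True
```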
Also, the app permits users to select an easy-to-guess PIN, which is used to unlock the car or access the app if the smartphone does not implement FaceID, TouchID, or a passcode. The existing PIN code policy allows users to choose weak combinations, such as consecutive numbers (e.g. "1234") or the same number (e.g. "0000").
However, the most commonly used feature for authentication is either FaceID or TouchID.
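A hypothetical Python sketch of the kind of check that would reject the PINs mentioned above: all-identical digits and ascending or descending runs. A real blocklist would likely cover more patterns (dates, keypad shapes); this is only the minimum.

```python
def is_weak_pin(pin: str) -> bool:
    """Flag repeated digits (0000) and ascending/descending runs (1234, 4321)."""
    if not (pin.isascii() and pin.isdigit()) or len(pin) < 4:
        return True
    digits = [int(c) for c in pin]
    steps = {b - a for a, b in zip(digits, digits[1:])}
    # A single constant step of 0, +1 or -1 means repeated or consecutive digits.
    return len(steps) == 1 and steps <= {0, 1, -1}

for pin in ["1234", "0000", "4321", "7392"]:
    print(pin, is_weak_pin(pin))
```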
## Recommendations
The takeaways are very simple:
* For end-users:
  * Only authenticate your phone on trusted computers.
  * Avoid connecting your phone to, and trusting, shared workstations.
  * Use complex passwords and PIN codes.
* For developers:
  * Do not put your complete trust in the operating system.
  * Encrypt sensitive data on your own.
## Responsible Disclosure
One of IOActive’s missions is to act responsibly when it comes to vulnerability disclosure.
The following is the communication timeline with BMW Group:
* May 2, 2020: IOActive’s assessment of the BMW Connected App started.
* May 15, 2020: IOActive sent a vulnerabilities report to BMW Group following its guidelines.[4]
* May 20, 2020: BMW Group replied. They forwarded the report internally to the responsible departments.
* May 26, 2020: IOActive asked BMW Group for any updates or questions and let them know about our intention to write a blog post.
* May 28, 2020: BMW Group asked us to wait for a new app release prior to publishing a blog post and requested details to include the researcher on BMW Group's Hall of Fame site.
* Aug 07, 2020: BMW Group and IOActive had a call to discuss the technical information that would be published in a blog post.
* Aug 13, 2020: BMW Group sent an email explaining how they would fix the issues.
* Aug 19, 2020: IOActive sent a draft of the blog post to BMW Group for review.
* Aug 24, 2020: BMW Group suggested some modifications.
* Sep 08, 2020: IOActive sent the second version of the draft blog post to BMW Group for review.
* Sep 11, 2020: BMW Group agreed with the final content.
* Sep 22, 2020: IOActive published this blog post.
## The Fix
BMW Group’s security statement:
“Thanks to the notification of Alejandro Hernandez at IOActive via our responsible disclosure channel, we were able to change the way the app’s data cache is handled. Our app development team added an encryption step that makes use of the secure enclave of Apple devices, at which we generate a key that is used for storing the favorites and vehicle meta data that Alejandro was able to extract. We appreciate Alejandro for sharing his research with us and would like to thank him for reaching out to us.”
## Acknowledgments
I would like to give special thanks to my friend Juan José Romo, who lent me two brand new cars for testing.
Also, I’d like to thank Richard Wimmer and Hendrik Schweppe of BMW Group for their quick response and cooperation in fixing the issues presented here.
There are not many occasions when you can build a chain of exploits without harming a single buffer, so it is interesting when you find yourself in one of those rare situations. As the title clearly indicates, this blog post will comprehensively describe the entire process that would allow a malicious actor to root Sierra Wireless AirLink® devices.
Let’s do this!
A couple of years ago the guys at Talos did a great job and killed many bugs in AirLink devices. As usual, before buying a device I always analyze the firmware first in order to get an overall impression of what I may face. Sierra Wireless has a nice website where it is possible to download firmware, so I chose my target (the RV50) and proceeded.
## Analyzing the Firmware
After unpacking the firmware, we are presented with the following list of files:
The first notable thing is that well-known image formats, such as ‘rootfs.sqfs.uboot’, ‘uImage.recovery’ or ‘zImage’ are detected as ‘data’ so there should be something going on. As expected, a quick look at those files shows that they are definitely encrypted. Hopefully the only ‘clean’ binary that is present in the firmware (‘swinstaller’) will help us to figure out the scheme.
As you can see, it seems that, as we initially guessed, the important files are all encrypted. So, the next step is to spend some time digging through a C++ binary to understand the encryption algorithm. Some of the strings clearly pointed to ‘libtomcrypt’ as the encryption library, which definitely will help to reconstruct some of the symbols and logic in order to facilitate this sometimes tedious task.
They are using AES in CTR mode without any apparent hardcoded key or IV, so there must be some logic that generates them at runtime. After reverse engineering the binary, we can break the encryption scheme into two parts: the values needed to derive the IV and the key, and the process for deriving them.
### 1. Values
There are two different values that are required to properly derive the IV and the key for AirLink devices:
### 1.1 Custom ‘seed’
This 8-byte hardcoded value can be found in the ‘swinstaller’ binary, in most cases close to the ‘sha256’/‘aes’ strings.
Please note that it may vary across devices and versions.
1.2 Custom ‘version’
This value can be found in the ‘manifest.txt’ file and corresponds to the ‘ALEOS_VERSION’ value, highlighted in the image below.
As in the previous case, it will obviously be different across versions.
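Reading the version programmatically is straightforward if, as the ‘manifest.txt!ALEOS_VERSION=KEY’ note in the decrypter suggests, the manifest holds one KEY=VALUE pair per line. That line format is our assumption, and `manifest_version()` is a hypothetical helper that takes the file contents already loaded into a string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Extracts the ALEOS_VERSION value from manifest.txt contents, assuming
 * one KEY=VALUE pair per line. Copies at most outsz-1 bytes into out and
 * returns the number of bytes copied, or -1 if the key is not present. */
int manifest_version(const char *manifest, char *out, size_t outsz)
{
    static const char key[] = "ALEOS_VERSION=";
    const size_t klen = sizeof(key) - 1;
    const char *line = manifest;

    assert(manifest != NULL && out != NULL && outsz > 0);
    while (line != NULL && *line != '\0') {
        if (strncmp(line, key, klen) == 0) {
            const char *val = line + klen;
            size_t vlen = strcspn(val, "\r\n");   /* stop at end of line */
            if (vlen >= outsz)
                vlen = outsz - 1;
            memcpy(out, val, vlen);
            out[vlen] = '\0';
            return (int)vlen;
        }
        line = strchr(line, '\n');
        if (line != NULL)
            line++;                               /* skip the newline */
    }
    return -1;
}
```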
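2. Deriving the IV/Key
This simplified, non-canonical pseudo-code gives an overall idea of how they are generated. Here rounds_sha256() stands for five chained SHA-256 passes (one over the input, then four over the previous digest), as done by generate_materials() in the decrypter:
```
a = "\x00"*32                      # 32-byte slot at the start of the buffer
b = version + seed                 # e.g. "4.13.0.017" + 8-byte seed
copy(a, rounds_sha256(b), 32)      # the slot now holds the key
materials = rounds_sha256(a + b)   # hash key + version + seed
key = a[0:32]                      # 32-byte AES key
iv  = materials[0:32]              # AES-CTR only consumes the first 16 bytes
```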
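The derivation can also be sketched as a standalone C function. To keep it self-contained and runnable without a crypto library, the hash primitive is passed in as a function pointer, and `toy_hash()` is a deliberately trivial, non-cryptographic stand-in (it is NOT SHA-256). With a real SHA-256 plugged in, for instance a wrapper around libtomcrypt's `hash_memory()` as used in the decrypter, this should follow the same steps as `init_keys()`. All names here (`hash_fn`, `rounds_hash()`, `derive_key_iv()`, `MAX_VERSION_LEN`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DIGEST_LEN      32
#define SEED_LEN        8
#define MAX_VERSION_LEN 64

/* Signature of the hash primitive; plug a real SHA-256 in here. */
typedef void (*hash_fn)(const unsigned char *in, size_t len,
                        unsigned char out[DIGEST_LEN]);

/* NOT SHA-256: a toy FNV-1a based stand-in so the sketch runs without a
 * crypto library. It only exercises the chaining logic below. */
void toy_hash(const unsigned char *in, size_t len, unsigned char out[DIGEST_LEN])
{
    uint32_t h = 2166136261u;               /* FNV-1a offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= in[i];
        h *= 16777619u;                     /* FNV-1a prime */
    }
    for (int i = 0; i < DIGEST_LEN; i++) {  /* stretch the state to 32 bytes */
        h ^= (uint32_t)i;
        h *= 16777619u;
        out[i] = (unsigned char)(h >> 24);
    }
}

/* One pass over the input, then (rounds - 1) passes over the previous
 * digest, mirroring generate_materials(..., 5) in the decrypter. */
static void rounds_hash(hash_fn h, const unsigned char *in, size_t len,
                        int rounds, unsigned char out[DIGEST_LEN])
{
    unsigned char prev[DIGEST_LEN];
    h(in, len, out);
    for (int i = 1; i < rounds; i++) {
        memcpy(prev, out, DIGEST_LEN);      /* avoid aliasing in and out */
        h(prev, DIGEST_LEN, out);
    }
}

/* key = rounds(version || seed); iv = rounds(key || version || seed).
 * Returns 0 on success, -1 if the version string is too long. */
int derive_key_iv(hash_fn h, const char *version,
                  const unsigned char seed[SEED_LEN],
                  unsigned char key[DIGEST_LEN], unsigned char iv[DIGEST_LEN])
{
    unsigned char buf[DIGEST_LEN + MAX_VERSION_LEN + SEED_LEN] = {0};
    size_t vlen;

    assert(h != NULL && version != NULL && seed != NULL);
    vlen = strlen(version);
    if (vlen > MAX_VERSION_LEN)
        return -1;

    /* Layout: [32-byte slot][version][seed] */
    memcpy(buf + DIGEST_LEN, version, vlen);
    memcpy(buf + DIGEST_LEN + vlen, seed, SEED_LEN);

    rounds_hash(h, buf + DIGEST_LEN, vlen + SEED_LEN, 5, key); /* key */
    memcpy(buf, key, DIGEST_LEN);                              /* fill slot */
    rounds_hash(h, buf, DIGEST_LEN + vlen + SEED_LEN, 5, iv);  /* IV */
    return 0;
}
```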
The full logic to decrypt AirLink firmware files is implemented in the following file:
```c
// For research purposes only
//
// Sierra Wireless' Airlink Firmware Decrypter (Ruben Santamarta @ IOActive)
// @IOActiveLabs https://labs.ioactive.com
//
// Dependencies:
//   libtomcrypt https://github.com/libtom
//
// Compile:
//   $ gcc decrypter.c -o decrypter -Isrc/headers libtomcrypt.a
//
// Example:
//   KEY is the ALEOS_VERSION at manifest.txt (manifest.txt!ALEOS_VERSION=KEY)
//   $ ./decrypter -d KEY aes /file/path/RV50/rootfs.sqfs.uboot /file/path/RV50/rootfs.sqfs.uboot.decrypted 4096 1

/* Example output for RV50 firmware - ALEOS_VERSION=4.13.0.017

* Sierra Wireless' Airlink Firmware Decrypter (Ruben Santamarta @ IOActive) *

- Initializing materials...
Hashing at keyBuff+32 for 18 bytes...
round 1
round 2
round 3
round 4

Copying 32 bytes from the hashed material to keyBuff

Now hashing the entire keyBuff [50 bytes]...
round 1
round 2
round 3
round 4

***=> IV: "\x11\x5F\x24\x07\x50\x3C\x68\xD2\x28\x26\xBA\x18\x4B\x12\x54\xF1\x2C\x20\x36\x01\x45\x86\x42\x99\x05\x6D\x43\x3C\xC5\x80\xCA\x94"
***=> Key: "\x7D\x69\x78\x59\x55\x35\xF9\xAA\x4F\x8E\xBE\xE4\xE8\xD2\xEE\xFA\x86\x35\xD1\x6A\x58\x81\x53\x78\x6D\xFF\x2E\xB5\xBC\x88\x21\x11"

[+] Decrypting firmware to decrypted.bin...
[+] Done
*/

#include <tomcrypt.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK_MAX 8192

typedef struct _product_key {
    unsigned char seed[8];
    char *name;
} product_key;

// SEED TABLE (ALEOS VERSION 4.13.0.017)
// Extracted from the 'swinstaller' binary (different from product/version)
product_key seed_table[] = {
    {"\x60\x22\xD5\xCD\x3C\x09\xCD\xAB", "ES450"},
    {"\x5D\x5C\xAA\x26\x2D\x0B\xDE\x5A", "RV50"},
    {"\xFB\x76\x0D\xCE\xC1\x2C\xC8\x16", "LX60"},
    {"\xCB\x4E\x4A\x5F\x07\x89\x0B\xDE", "RV55"},
    {"\x1C\xDF\x8D\x14\xB3\x61\xCF\x12", "MP70"},
    {"\x60\x22\xD5\xCD\x3C\x09\xCD\xAB", "GX450"},
    {0}
};

// Number of valid product IDs (the last entry is the {0} terminator)
#define PRODUCT_COUNT ((int)(sizeof(seed_table) / sizeof(seed_table[0])) - 1)

int generate_materials(unsigned char *inBuff, int len, void *dest, size_t *a4, int a5);
int init_keys(char *keyString, int len, int product, unsigned char **key, unsigned char **IV);

// key = SHA-256 x5 (version + seed)
// IV  = SHA-256 x5 (key + version + seed)
int init_keys(char *keyString, int len, int product, unsigned char **key, unsigned char **IV)
{
    unsigned char *keyBuff;
    unsigned char keyHash[64] = {0};
    unsigned char ivHash[64] = {0};
    size_t retLen;
    size_t keylen, totalen;

    printf("\n- Initializing materials...\n");
    *key = (unsigned char *)calloc(0x40, 1);
    *IV = (unsigned char *)calloc(0x40, 1);
    keylen = len;
    totalen = keylen + 40;    // 32-byte hash slot + version + 8-byte seed
    keyBuff = (unsigned char *)calloc(totalen, 1);
    if (*key == NULL || *IV == NULL || keyBuff == NULL) {
        printf("Out of memory\n");
        exit(-1);
    }

    // Copy key string: "\x00"*32 + key
    memcpy(keyBuff + 32, keyString, keylen);
    // Copy remaining materials: "\x00"*32 + key + seed
    memcpy(keyBuff + 32 + keylen, seed_table[product].seed, 8);

    // First pass: hash only version + seed (skipping the 32-byte slot)
    retLen = 32;
    printf("Hashing at keyBuff+32 for %zu bytes...\n", totalen - 32);
    if (generate_materials(keyBuff + 32, (int)(totalen - 32), keyHash, &retLen, 5) != 0) {
        printf("Error generating key material\n");
        exit(-1);
    }

    // Store the first digest in the slot: keyHash + version + seed
    printf("Copying 32 bytes from the hashed material to keyBuff\n");
    memcpy(keyBuff, keyHash, 0x20);

    // Second pass: hash the entire buffer to obtain the IV
    retLen = 32;
    printf("\nNow hashing the entire keyBuff [%zu bytes]...\n", totalen);
    if (generate_materials(keyBuff, (int)totalen, ivHash, &retLen, 5) != 0) {
        printf("Error generating IV material\n");
        exit(-1);
    }

    memcpy(*IV, ivHash, 0x20);
    memcpy(*key, keyHash, 0x20);
    free(keyBuff);

    printf("***=> IV: \"");
    for (int i = 0; i < 32; i++) {
        printf("\\x%02X", ivHash[i]);
    }
    printf("\"\n");
    printf("***=> Key: \"");
    for (int i = 0; i < 32; i++) {
        printf("\\x%02X", keyHash[i]);
    }
    printf("\"\n");
    return 1;
}

// Hashes inBuff with SHA-256, then re-hashes the digest until a5 passes have
// been made in total. The final digest goes to dest and its length to *a4
// (*a4 holds the expected digest size on entry).
int generate_materials(unsigned char *inBuff, int len, void *dest, size_t *a4, int a5)
{
    size_t *v7 = a4;
    unsigned long n;          // hash_memory() takes an unsigned long *
    unsigned char *outBuff;
    int v13;
    int i;
    int v15;

    outBuff = (unsigned char *)calloc(0x100, 1);
    if (outBuff == NULL) {
        printf("Out of memory\n");
        exit(-1);
    }
    v13 = find_hash("sha256");
    n = 128;
    v15 = hash_memory(v13, inBuff, len, outBuff, &n);
    if (v15 != CRYPT_OK || *v7 > n) {
        printf("Error hashing memory\n");
        exit(-1);
    }
    memcpy(dest, outBuff, n);
    *v7 = n;
    for (i = 1; i < a5; ++i) {
        printf("round %d\n", i);
        v15 = hash_memory(v13, dest, *v7, outBuff, &n);
        if (v15 != CRYPT_OK) {
            break;
        }
        memcpy(dest, outBuff, n);
        *v7 = n;
    }
    printf("\n");
    free(outBuff);
    return (v15 == CRYPT_OK) ? 0 : -1;
}

int usage(char *name)
{
    int x;
    printf("\nUsage: %s -d version cipher('aes') infile outfile chunk_size product(ID)\nSupported products:\n", name);
    for (x = 0; seed_table[x].name != NULL; x++) {
        printf("ID: [%d] Description: %s\n", x, seed_table[x].name);
    }
    printf("\n$ %s -d 4.12.0.p31 aes /file/path/RV50/rootfs.sqfs.uboot /file/path/RV50/rootfs.sqfs.uboot.decrypted 4096 1\n", name);
    exit(1);
}

void register_algs(void)
{
    // register_cipher()/register_hash() return the table index, or -1 on error
    if (register_cipher(&aes_desc) == -1) {
        printf("Error registering AES\n");
        exit(-1);
    }
    if (register_hash(&sha256_desc) == -1) {
        printf("Error registering SHA256\n");
        exit(-1);
    }
}

int main(int argc, char *argv[])
{
    unsigned char *plaintext = NULL;
    unsigned char *inbuf;
    size_t n, decrypt = 0;
    symmetric_CTR ctr;
    int cipher_idx, product, err;
    char *infile, *outfile, *cipher;
    FILE *fdin, *fdout;
    size_t amount;
    unsigned char *cKey;
    unsigned char *cIV;

    // argv[1]..argv[7] are all required (argv[7] is the product ID)
    if (argc < 8) {
        return usage(argv[0]);
    }
    register_algs();
    cipher = argv[3];
    infile = argv[4];
    outfile = argv[5];
    amount = atoi(argv[6]);
    product = atoi(argv[7]);

    // Each chunk must fit in the fixed-size I/O buffers
    if (amount == 0 || amount > CHUNK_MAX) {
        printf("chunk_size must be between 1 and %d\n", CHUNK_MAX);
        exit(-1);
    }
    // The product ID indexes seed_table, so reject out-of-range values
    if (product < 0 || product >= PRODUCT_COUNT) {
        return usage(argv[0]);
    }

    if (!strcmp(argv[1], "-d")) {
        plaintext = (unsigned char *)calloc(CHUNK_MAX, 1);
        decrypt = 1;
    } else {
        printf("\n[!] decryption only");
        exit(0);
    }
    inbuf = (unsigned char *)calloc(CHUNK_MAX, 1);
    if (inbuf == NULL || plaintext == NULL) {
        printf("Out of memory\n");
        exit(-1);
    }

    printf("\n* Sierra Wireless' Airlink Firmware Decrypter (Ruben Santamarta @ IOActive) * \n");
    init_keys(argv[2], strlen(argv[2]), product, &cKey, &cIV);

    fdin = fopen(infile, "rb");
    if (fdin == NULL) {
        perror("Can't open input for reading");
        exit(-1);
    }
    fdout = fopen(outfile, "wb");
    if (fdout == NULL) {
        perror("Can't open output for writing");
        exit(-1);
    }

    cipher_idx = find_cipher(cipher);
    if (cipher_idx == -1) {
        printf("Invalid cipher entered on command line.\n");
        exit(-1);
    }

    if (decrypt) {
        // AES-256 (32-byte key) in CTR mode; only the first 16 bytes
        // (one AES block) of cIV are used as the initial counter
        if ((err = ctr_start(cipher_idx,
                             cIV,
                             cKey,
                             32,
                             0,
                             CTR_COUNTER_LITTLE_ENDIAN, &ctr)) != CRYPT_OK) {
            printf("ctr_start error: %s\n", error_to_string(err));
            exit(-1);
        }
        printf("\n[+] Decrypting firmware to %s...", outfile);
        do {
            n = fread(inbuf, 1, amount, fdin);
            if ((err = ctr_decrypt(inbuf, plaintext, n, &ctr)) != CRYPT_OK) {
                printf("ctr_decrypt error: %s\n", error_to_string(err));
                exit(-1);
            }
            if (fwrite(plaintext, 1, n, fdout) != n) {
                printf("Error writing to file.\n");
                exit(-1);
            }
        } while (n == amount);
        ctr_done(&ctr);
        printf("\n[+] Done\n");
    }

    fclose(fdin);
    fclose(fdout);
    free(inbuf);
    free(plaintext);
    free(cKey);
    free(cIV);
    return 0;
}
```
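The decrypter passes `CTR_COUNTER_LITTLE_ENDIAN` to `ctr_start()`, which means the 16-byte counter block is incremented with byte 0 as the least significant byte. The sketch below spells out that rule in plain C. It rests on our reading of libtomcrypt, namely that the first keystream block is the encryption of the IV itself, so the k-th 16-byte block of the file uses IV + k; please verify this against the libtomcrypt version you link. Note also that only the first 16 bytes of the 32-byte IV end up in the counter. `ctr_increment_le()` and `ctr_for_block()` are hypothetical helpers, not part of the tool.

```c
#include <assert.h>
#include <stddef.h>

#define CTR_BLOCK 16   /* AES block size in bytes */

/* Adds 1 to the counter, byte 0 being the least significant byte
 * (the CTR_COUNTER_LITTLE_ENDIAN layout). Wraps to zero after all 0xFF. */
void ctr_increment_le(unsigned char ctr[CTR_BLOCK])
{
    for (size_t i = 0; i < CTR_BLOCK; i++) {
        ctr[i]++;
        if (ctr[i] != 0)        /* no carry into the next byte */
            break;
    }
}

/* Counter block for the k-th 16-byte block of a file (k = 0 is the first):
 * IV + k, added little-endian with carry propagation. */
void ctr_for_block(const unsigned char iv[CTR_BLOCK], unsigned long long k,
                   unsigned char out[CTR_BLOCK])
{
    unsigned int carry = 0;

    assert(iv != NULL && out != NULL);
    for (size_t i = 0; i < CTR_BLOCK; i++) {
        unsigned int sum = iv[i] + (unsigned int)(k & 0xFF) + carry;
        out[i] = (unsigned char)sum;
        carry = sum >> 8;
        k >>= 8;
    }
}
```

If only part of a large image is needed, the counter returned by `ctr_for_block(iv, offset / 16, ...)` could be loaded with libtomcrypt's `ctr_setiv()` before decrypting from that block-aligned offset.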
At this point, it is possible to decrypt all of the files, including the filesystem image, so we can start hunting.
Remote Command Injection (Preauth)
Initial analysis showed that the main web interface looks solid enough after all those killed bugs. I decided to take a look at one of the main features of these AirLink devices: the ALEOS Application Framework (AAF).
It is worth mentioning that this set of features is not enabled by default, so the administrator needs to enable AAF through the web interface. Once activated, the framework extends the regular capabilities of these devices, allowing external developers to create their own embedded applications. On the device side, it is mainly implemented in Lua, so I decided to take a look at the code (‘/usr/readyagent/lua’ folder). Something immediately caught my attention: when AAF is enabled, a custom Lua RPC scheduler is exposed at LAN_IP:1999/TCP.
File: ‘/usr/readyagent/lua/rpc/sched.lua’
Following the code, we find that this RPC server deserializes arbitrary function names and arguments, which may be attacker controllable.
File: ‘/usr/readyagent/lua/rpc/sched.lua’
The first request (line 55) receives ‘t’, ‘seqnum’ and the number of bytes of serialized data to be received from the client. Then, at line 162, our data is deserialized using the ‘luatobin’ format.
File: ‘/usr/readyagent/lua/rpc/proxy.lua’
These values will be handled by ‘common.execute,’ which allows any function to be executed.
File: ‘/usr/readyagent/lua/rpc/common.lua’
A malicious actor can leverage this vulnerability to invoke arbitrary Lua functions, such as ‘os.execute’. As a result, an attacker on a network adjacent to an AirLink device with AAF enabled gains the ability to execute arbitrary commands with the privileges of the ‘rauser’ account.
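The core problem is a dispatcher that resolves whatever function name the client sends. The Lua sources are not reproduced here, so, as an illustration only, the C sketch below models the same pattern with a hypothetical function table (`run_shell()` stands in for ‘os.execute’) and contrasts it with an allowlist. None of these names are Sierra Wireless code, and the allowlist is not their fix; according to their advisory, current ALEOS versions only enable the RPC server when the AAF user password is defined.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*rpc_handler)(const char *arg);

/* Hypothetical handlers standing in for functions reachable over RPC. */
static int get_status(const char *arg) { (void)arg; return 0; }
static int run_shell(const char *arg) { (void)arg; return 1; } /* think os.execute */

struct rpc_entry { const char *name; rpc_handler fn; };

/* Every function the runtime knows about, dangerous ones included. */
static const struct rpc_entry all_functions[] = {
    {"status.get", get_status},
    {"os.execute", run_shell},
};

/* Only these names may be invoked remotely. */
static const char *const allowlist[] = {"status.get"};

static rpc_handler lookup(const char *name)
{
    for (size_t i = 0; i < sizeof all_functions / sizeof all_functions[0]; i++)
        if (strcmp(all_functions[i].name, name) == 0)
            return all_functions[i].fn;
    return NULL;
}

/* Vulnerable pattern: any name sent by the client is resolved and called. */
int dispatch_unrestricted(const char *name, const char *arg, int *result)
{
    assert(name != NULL && result != NULL);
    rpc_handler fn = lookup(name);
    if (fn == NULL)
        return -1;
    *result = fn(arg);
    return 0;
}

/* Safer pattern: only allowlisted names are resolved at all. */
int dispatch_allowlisted(const char *name, const char *arg, int *result)
{
    for (size_t i = 0; i < sizeof allowlist / sizeof allowlist[0]; i++)
        if (strcmp(allowlist[i], name) == 0)
            return dispatch_unrestricted(name, arg, result);
    return -1;
}
```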
Local Privilege Escalation to Root
At this point I could execute arbitrary commands without requiring any authentication, but ‘rauser’ is still a low-privileged account. The next step was to find a way to escalate privileges to root.
The main web interface is not running as root, but we can still update the firmware, reboot the device, etc., so there must be some logic that allows these ‘root’ operations to be requested from a different privilege level. By reverse engineering the different binaries involved, I eventually found the IPC mechanism: a message queue called ‘/urmG’.
File: ‘/lib/libSWIALEOS41.so.1.0’
Any process can access this message queue:
-rw-rw-rw- 1 root 0 80 Sep 9 00:34 urmG
Basically, the root process ‘/usr/sbin/UpdateRebootMgr’ reads a message from this queue that contains the action that has to be performed on the requester’s behalf. Depending on the action, ‘UpdateRebootMgr’ will run the binary in charge of that action, while also passing the command line received from the low-privileged process through the message queue.
For instance, ‘RequestUpdate’ is a binary that sends messages to the ‘UpdateRebootMgr’ root process through the ‘/urmG’ message queue. When ‘UpdateRebootMgr’ processes a certain message, it will invoke ‘FW_UPLOAD_CMD’ using the command line passed in the ‘-o’ argument.
File: ‘/usr/sbin/atfw_rm_update’
…
RequestUpdate -c aleos -o "--aleos $LOCAL_FW" -w
…
Pay attention to this sequence:
1. File: ‘/usr/sbin/UpdateRebootMgr’
2. File: ‘/lib/libSWIALEOS41.so.1’
3. File: ‘/usr/sbin/UpdateRebootMgr’
This looks promising. Let’s see what is inside ‘ALEOS::swsystemtools::runSystem’.
File: ‘/lib/libswsystemtools.so’
‘ALEOS::swsystemtools::isSafeString’ looks like the kind of function that should prevent this injection from happening; however, it fails: when the first character is a ‘-’, it is possible to bypass the ‘find_first_of’ check that would otherwise detect some command injection characters.
As a result, it is possible to perform a classic command injection through the ‘/urmG’ message queue to escalate privileges to root.
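The decompiled code of isSafeString is not reproduced here, so the following is only a model of the behavior described above, under the assumption that a leading ‘-’ causes the metacharacter scan to be skipped. Interestingly, the legitimate payload used by ‘atfw_rm_update’ (‘--aleos $LOCAL_FW’) also starts with ‘-’, so an injected argument can simply follow the same pattern. The function names and the `SHELL_META` character set are hypothetical; `strpbrk()` plays the role of `std::string::find_first_of`.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Characters that would let a string break out of a shell command line.
 * Hypothetical set; the real one in libswsystemtools.so may differ. */
static const char SHELL_META[] = ";|&`$()<>\n";

/* Model of the flawed check: a leading '-' short-circuits the scan, so an
 * option-looking string such as "--aleos x; id" is accepted. */
bool is_safe_string_flawed(const char *s)
{
    assert(s != NULL);
    if (s[0] == '-')
        return true;                        /* scan skipped entirely */
    return strpbrk(s, SHELL_META) == NULL;  /* C counterpart of find_first_of */
}

/* Fixed model: the metacharacter scan always runs. */
bool is_safe_string_fixed(const char *s)
{
    assert(s != NULL);
    return strpbrk(s, SHELL_META) == NULL;
}
```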
This chain of exploits can be used from an adjacent network to gain root access, without any authentication, on any AirLink device that has AAF enabled. AAF is not enabled by default, which mitigates the attack to some extent.
There are some security boundaries these vulnerabilities break in a Sierra Wireless AirLink device:
According to the documentation, the ‘root’ user is proprietary to Sierra Wireless.
The main firmware file is signed and certain key files in the package are encrypted. This attack allows malicious firmware to be installed on the device, thus gaining persistence.
There is also an interesting feature, although it is unlikely to be exploited. AirLink customers can temporarily enable a remote support option, which adds a hardcoded root hash to ‘/etc/shadow’; this hash appears to be identical across devices. A rooted AirLink device might therefore be used to trick Sierra Wireless support staff into connecting to it remotely in order to capture the password.
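Checking whether two devices share the same root hash only requires comparing the second field of their ‘/etc/shadow’ entries. A minimal sketch, with a hypothetical helper `shadow_hash_for()` that takes the file contents already loaded into a string:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copies the password-hash field (the 2nd ':'-separated field) of the
 * /etc/shadow entry for `user` into out. Returns the field length, or -1
 * if the user is missing or the field does not fit in out. */
int shadow_hash_for(const char *shadow, const char *user, char *out, size_t outsz)
{
    size_t ulen;
    const char *line = shadow;

    assert(shadow != NULL && user != NULL && out != NULL && outsz > 0);
    ulen = strlen(user);
    while (line != NULL && *line != '\0') {
        if (strncmp(line, user, ulen) == 0 && line[ulen] == ':') {
            const char *field = line + ulen + 1;
            size_t flen = strcspn(field, ":\n");
            if (flen >= outsz)
                return -1;
            memcpy(out, field, flen);
            out[flen] = '\0';
            return (int)flen;
        }
        line = strchr(line, '\n');
        if (line != NULL)
            line++;                 /* move to the next entry */
    }
    return -1;
}
```

A ‘$6$’ prefix on the extracted field would indicate a sha512crypt hash.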
Conclusion
IOActive notified Sierra Wireless about these vulnerabilities in January 2020, which resulted in the following advisories:
-----
Sierra Wireless thanks IOActive for the responsible disclosure of these vulnerabilities.
In current versions of ALEOS, the RPC server is enabled only when the AAF user password is defined.
Sierra Wireless recommends that customers enable the AAF user only for devices that are being used for AAF development and debugging. The AAF user is not required for AAF applications to be deployed and run.
Deployed devices must not have the AAF user password enabled.
Sierra Wireless recommends upgrading to the latest ALEOS version for your gateway. For devices running ALEOS 4.13 today, Sierra Wireless recommends upgrading to ALEOS 4.14.0 once it is available.