Having fun visualizing Swift’s sort()

Peter Naur, a renowned Danish computer scientist, said that programming is not merely the creation of a program, but an activity by which the programmers form a theory of the matters at hand. If you haven’t read his famous paper, I encourage you to reserve some time to do that. In my opinion, complexity prevents us from reaching a complete theory of a particular program, and many software engineering communities are exploring ways to avoid or reduce it (for example, by applying functional programming techniques to app development).

I’ve been recently exploring the domain of program visualization, that is, software on top of a debugger, or even isolated programs, whose only purpose is to help the programmer understand their code better. I think that better tools in this area would help programmers form a better theory of a complex program, and may reduce the number of bugs that are introduced and the time we spend debugging. To give you a more concrete example, I think that Apple’s recent memory graph debugger is a very neat abstraction that helps immensely when you are trying to understand the resource ownership graph of your program. It beats raw source code exploration and documentation by any measure.

The other day, I decided to have some fun with Swift arrays. Swift arrays have standard sorting routines to sort them, either in-place or by returning a new instance. Imagine that you want to understand which sorting algorithm is implemented by the standard library, and what steps are followed for a particular array. What alternatives do you have?

  • As the standard library is open source, you could simply read the implementation and try to understand the algorithms by yourself.
  • You could use sort(by:) and print the elements involved in each comparison.

The second bullet is what basic debugging is about, and I don’t know about you but I’ll probably need to have a pen and paper close by to draw the array, the elements, and understand how the sorting is happening, just like I did when I was studying sorting algorithms at university.

Can we do better? Any code that is heavily algorithmic certainly benefits from visualization. That’s why websites like https://visualgo.net are so popular among students and interested professionals. In order to show the contents of the array at each step, we may start searching GitHub for a good Swift image library. But there’s a poor man’s technique that can create images and even animated GIFs using plain print  statements in Swift if we follow the PPM format (the trade-off is that the image will be much bigger than a typical JPG or PNG). With tools like ImageMagick, you can even create an animated GIF from a sequence of PPM images.

A very simple Swift program that sorts a random array of 64 elements is this:

The sorting closure is invoked with each pair of elements that the algorithm needs to compare so we could expand it to access the partially sorted array and create a PPM image that depicts its current state. The problem we will face is that the samplearray is being modified in place, so any simultaneous access to it is going to be prevented by Swift’s runtime module that tracks exclusivity (read the memory ownership manifesto for more information). One way to avoid this is to use Swift 3’s compatibility model, which downgrades this exception into a warning, or use raw UnsafeMutableBufferPointers:

The program works as follows: Line 11 prints the required header for a PPM image in binary format, including the frame width, height, and “255”, which is the maximum intensity value.

Lines 12 to 27 and 14 to 26 iterate through the array and print each value in two possible ways:

  • If the element is currently selected by the algorithm, print it in red.
  • If not, print it in a shade of gray whose intensity depends on the element’s value (so that we can see how the array is being sorted).

Using ImageMagick to assemble a GIF from the sequence of images generated by the above program, the result is this:

sort

Red squares represent the elements that are selected at each stage of the algorithm, and the different shades of gray represent the current value stored in each slot. The algorithm at the beginning is quicksort, a well-known sorting algorithm with a bad worst case whose time complexity is quadratic in the size of the array. To avoid this bad worst case, when the depth of the recursion tree is greater than a certain threshold the algorithm switches to heapsort, whose time complexity is O(n*logn) in the worst case. By inspecting the source code we can see that the threshold value is 2*floor(log(N)). Finally, when subarrays are small enough, the algorithm switches to insertion sort, which is very efficient for short arrays.

Swift’s sort is thus a hybrid algorithm that combines several well-known algorithms to avoid bad worst-case performance. The official name of this hybrid algorithm is introsort (invented in 1997), and the libc++ team is implementing the same algorithm for Clang. The number of frames in the GIF animation, 407, gives us the idea that the time it took to sort this random array of 64 elements is roughly proportional to O(n*logn), which is asymptotically optimal for general sorting algorithms that compare elements. However, there are instances that trigger bad performance from Swift’s sorting algorithm. Can you think of one? Do you want to play evil and use this technique to show what happens in those worst cases? 🙂

Advertisements
Posted in Algorithmics | Tagged , , | Leave a comment

Reverse Engineering macOS High Sierra Supplemental Update

Reported by Matheus Mariano, a Brazilian software developer, a programming error was discovered in Apple’s most recent operating system, High Sierra, that exposed passwords of encrypted volumes as password hints. A serious bug that quickly made the headlines in technology websites everywhere.

disk-utility-password-prompt-800x367

The dreaded password hint bug: Here, “dontdisplaythis” is the actual password.

 

Apple was prompt to provide macOS High Sierra Supplemental Update to customers via the App Store, and ensured that every distribution of High Sierra in their servers included this update.

I decided to apply a binary diffing technique to the update to learn more about the root cause of this bug and hypothesize about how the defect could have been prevented.

Inspecting the 51MB package, we can see that there are changes in the Disk Utility and Keychain Access apps, and also in related frameworks and command line tools:

Screen Shot 2017-10-08 at 11.53.25 AM

This post will focus only on the password hint bug, so our first step is to extract Applications/Utilities/Disk Utility.app/Contents/MacOS/Disk Utility  and to compare it with the same binary from a stock macOS 10.13 High Sierra. For this, I have written an Emacs extension that launches IDA whenever I load a Mach-O file in a buffer, generates a SQL database with information about the decompiled functions, loads the patched binary, and finally outputs a diff generated by Diaphora. This technique is useful for deconstructing binaries that have been updated by a minor patch release because there are usually just a few changes and common heuristics work well.

The diff between both versions of the Disk Utility binary revealed no differences in the decompilation:

VirtualBox_Windows 10_08_10_2017_13_20_16

That usually means that the only substantial changes reside in one of the linked frameworks. The most interesting one for this investigation is StorageKit, a private Apple framework that exposes APFS functionality to Disk Utility. It has two parts: a client library and a daemon, storagekitd. The client connects to the daemon using an Apple standard XPC mechanism. The daemon executes the operations (represented as subclasses of NSOperation) that the client demands. Here’s an interesting usage of StorageKit inside Disk Utility:

VirtualBox_Windows 10_08_10_2017_13_42_24

Reference to a StorageKit structure from controller code in Disk Utility.

This is part of the code that runs when you add a new APFS volume from the Disk Utility interface (concretely, the controller responsible for managing the new volume sheet).

Diffing StorageKit provided much more interesting results:

VirtualBox_Windows 10_08_10_2017_14_00_13

​​​[SKHelperClient addChildVolumeToAPFSContainer:name:caseSensitive:minSize:maxSize:password:passwordHint:progressBlock:completionBlock:] was one of the functions modified by the supplemental update. Inspecting the differences in decompilation revealed the actual bug:

VirtualBox_Windows 10_08_10_2017_14_15_16

In the picture above, the old, vulnerable, StorageKit is diff’d against the updated one. Removed lines removed are depicted in red, added lines in green, and changes in yellow. The above function basically creates an instance of NSMutableDictionary (Cocoa’s representation of a hash table) and fills it with information about the volume. This dictionary is passed to addChildVolumeToAPFSContainer:optionsDictionary:handlingProgressForOperationUUID:completionBlock: as the optionsDictionary argument.

The most interesting keys in the dictionary are kSKAPFSDiskPasswordOption and kSKAPFSDiskPasswordHintOption, which are responsible for storing the password and the password hint, respectively. The bug is that the same variable, which contains the password, (represented in the decompilation as the same virtual register, v50) was used as value for both keys in the dictionary, meaning that the clear password was incorrectly sent as a password hint via XPC. In reconstructed Objective-C code, the bug would be something like this:

NSMutableDictionary *optionsDictionary = [NSMutableDictionary alloc] init];
[...]
optionsDictionary[kSKAPFSDiskPasswordOption] = password;

optionsDictionary[kSKAPFSDiskPasswordHintOption] = password;

Here’s the corrected function from the supplemental update:

Updated

Note that the correct variables for the password and the password hint are set.

This is an example of a common category of bugs where code with a common structure is copied and pasted but the developer forgets to make every required modification and consequently there’s a fatal change in behavior. If you are curious, this blog post shows you more examples of “Last Line Effect” bugs in open source software.

It’s important to emphasize that, although this particular dictionary is not stored anywhere (it’s simply used to pack the information that is sent to storagekitd), the fact that the password was sent incorrectly as password hint meant that storagekitd trusted its client and stored it as clear text, thinking it was a password hint.

Why did the bug not reproduce when using the command line?

This is a common question. Apparently, Disk Utility and command line diskutil use different code paths.  StorageKit does not appear as a direct dependency of diskutil, or in the transitive closure of its dependencies. Here’s otool -L output:

/usr/lib/libcsfde.dylib (compatibility version 1.0.0, current version 1.0.0)/usr/lib/libcsfde.dylib (compatibility version 1.0.0, current version 1.0.0) /usr/lib/libCoreStorage.dylib (compatibility version 1.0.0, current version 1.0.0) /System/Library/Frameworks/Foundation.framework/Versions/C/Foundation (compatibility version 300.0.0, current version 1443.14.0) /System/Library/Frameworks/IOKit.framework/Versions/A/IOKit (compatibility version 1.0.0, current version 275.0.0) /System/Library/PrivateFrameworks/DiskManagement.framework/Versions/A/DiskManagement (compatibility version 1.0.0, current version 1.0.0) /System/Library/Frameworks/DiscRecording.framework/Versions/A/DiscRecording (compatibility version 1.0.0, current version 1.0.0) /usr/lib/libncurses.5.4.dylib (compatibility version 5.4.0, current version 5.4.0) /System/Library/Frameworks/DiskArbitration.framework/Versions/A/DiskArbitration (compatibility version 1.0.0, current version 1.0.0) /usr/lib/libicucore.A.dylib (compatibility version 1.0.0, current version 59.1.0) /usr/lib/libobjc.A.dylib (compatibility version 1.0.0, current version 228.0.0) /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1252.0.0) /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation (compatibility version 150.0.0, current version 1443.13.0)

This duplication in what’s more or less the same functionality, while sometimes justified, certainly increases the opportunity for bugs.

How could this have been prevented?

There’s two engineering practices that help with bugs like this (but do not eradicate them completely):

Unit testing

Unit testing is the practice of creating software tests that exercise a single unit in a computer program, where “unit” is typically a class or module. Effective unit testing requires sensing outputs reliably and asserting that they are expected, so side effects from functions complicate unit testing a bit. In this particular bug, the side effect is the communication with the XPC service, so separating the logic that creates the dictionary from the part that communicates with the service would help. When a software design is not easily testable, companies rely excessively on manual testing, which is not a very effective way of testing, given the high number of combinations that is typical in modern software (did the QA engineer test setting a password *and* a password hint?, easily forgettable on a tight deadline).

Code review

Code review is the practice of reviewing code before or after it lands the main development branch in a software project. Code reviews should always be small, so that the reviewer’s attention is focused and can suggest better improvements and even spot bugs like this. A “last line” bug can easily be ignored if it’s part of a huge code review.

Conclusion

An unfortunate bug in macOS High Sierra stained a bit its generally well-received debut, and from this root-case analysis we can learn what happened exactly and how good software development practices (including testable design and strict code reviews) can help reduce the chance that this kind of problems happen again in the future.

 

Posted in Reverse Engineering | 16 Comments

Welcome to Cocoa Engineering!

NSLog(@"Welcome to Cocoa Engineering!")

This is the first post of a new blog that I’m starting about Cocoa engineering. Cocoa is the name of Apple’s object-oriented API for writing OS X and iOS applications, so, as you can imagine, I’ll write here about how to make applications for iPhone and iPad, describing its common patterns, giving API guidance, and some platform trivia.

I hope you enjoy my blog!

Posted in No category | Leave a comment