Dusk: An Investigation Into Soap

February 23, 2020

I absolutely love Dusk, an indie retro FPS dripping with 90s charm. I’ll save my gushing about how good this game is for another post, for brevity’s sake - because what I wanted to talk about here was my quest for soap.

Soap?

On every level of Dusk, there are a number of small props that can be interacted with in the physics engine. One such prop, which appears exactly once in every level, is a simple bar of soap.

I've seen worse AirBnBs. Image © New Blood Interactive

The soap is fairly benign looking at first, appearing in the very first level beside a toilet, a roll of toilet paper, and a book - all of which are interactable. In the second level, it can be found neatly positioned on a shelf inside a shower cubicle. Once the game starts taking you to more fanciful locales, the soap’s repeated appearance becomes a little suspect.

Eventually, if you pick up the soap and happen to throw it at an enemy, you’ll be surprised to see that the enemy is swiftly replaced with a fine red mist.

If only cleaning the kitchen was as straightforward. Image © New Blood Interactive

As it turns out, the bar of soap is a bit of an easter egg, and by far the most damaging weapon in the game¹. As an easter egg, the bar of soap features as an achievement for the Steam version of the game, which challenges players to pick up the bar of soap on every level in the game.

The elusive achievement. Image © New Blood Interactive/Valve

I’m not usually one for being an achievement hunting completionist, but I enjoyed Dusk so much, and the 100% was so close and seemingly attainable², I decided to try and finish off my collection. The problem is that Dusk doesn’t tell you which soaps you still need to find, and I hadn’t been keeping track of which ones I had seen while I played through the game. I didn’t really fancy immediately replaying the entire game again searching for soaps!

Finding Obvious Traces of Soap

After a quick google, it seems that most people take the sensible approach of just following a guide, loading up every level, and rushing to the soap before moving to the next. Instead of following in their footsteps, I needed to find where the soap collection stats were stored!

The obvious first step is to track down all of the locations on the filesystem that Dusk stores settings and files in, and have a look for any obviously named files. There are three locations I’ve found:

The actual game data directory, e.g. C:\Program Files (x86)\Steam\steamapps\common\Dusk
A directory in the user’s AppData, e.g. C:\Users\george\AppData\LocalLow\David Szymanski\Dusk
In the registry, under Computer\HKEY_CURRENT_USER\Software\David Szymanski\Dusk

The AppData directory seems to consist of logging output from the Unity game engine. There’s nothing of interest in here, and seemingly nothing worth digging into deeper.

The registry entries are a little more interesting, but it is fairly self-explanatory what each key represents. The majority of the keys correspond to basic game settings, primarily graphics related. The other keys appear to be more default storage from the Unity game engine, e.g. unity.cloud_userid. There’s nothing in there that seems achievement related!

A snapshot of the registry entries present for Dusk shows that it's just simple config values.

That leaves only one place to look, in the main game directory. A cursory glance around the file tree, skipping over executables, libraries, and game assets, reveals only two directories that seem relevant:

saves
config

Soapy Saves?

Unlike many classic games, Dusk doesn’t force you to assign a “save slot” before starting a game, so I believe it’s possible to complete the game and earn the soap achievement without ever saving the game. This would suggest that the saves don’t store any global game state, and are only there to store current level state.

Running strings against a save file reveals that the saves are largely ASCII, and while not readily human readable, it’s fairly easy to see what the general idea of the file is:

$ strings saves/u | head -10
~2CrossbowAmmoPickup (14)(-72.9, -17.4, 1.9)tr4n5orm=
Weapon Pickup{~(tenarmor (4)(-71.3, -53.0, 56.0)tr4n5orm=
Health Pickup{~-treasurechest (4)(-76.7, -53.3, 56.7)tr4n5orm8
Untagged{~+treasurechest (4)(-76.7, -53.3, 56.7)he41th
HB{~+treasurechest (4)(-76.7, -53.3, 56.7)onf1re
{~+treasurechest (4)(-76.7, -53.3, 56.7)myn4m3
{~(woodbarrel (26)(-61.5, 5.5, -36.3)myn4m3
Lift Barrel{~*woodbarrel (26)(-61.5, 5.5, -36.3)tr4n5orm<
PickUpObject{~(woodbarrel (26)(-61.5, 5.5, -36.3)he41th
B{~(woodbarrel (26)(-61.5, 5.5, -36.3)onf1re

I will admit I have no idea why the save file is seemingly written in leetspeak - but it appears that the save files are concerned solely with entity locations in the level and their attributes. We can search a save file for mentions of soap:

$ strings saves/z | grep -i soap
Touch{~!Soap(-330.4, -94.3, 64.9)tr4n5orm<
Soap(-330.4, -94.3, 64.9)he41th
Soap(-330.4, -94.3, 64.9)onf1re
Soap(-330.4, -94.3, 64.9)myn4m3
        Lift Soap{~)pinetree (12)(-60.5, -95.0, 55.3)tr4n5orm8

This shows that save files aren’t totally useless for tracking our soap adventures, as we can at least tell that we’ve moved the soap in a given save. Unfortunately, I haven’t saved in every level, so I think this would only give me a partial view of my progress.
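The strings and grep search above can also be scripted, which makes it easier to run over every save at once. This is a minimal sketch: the helper names are my own, the regular expression only approximates what strings considers printable, and it assumes the saves sit together in one directory as they do in the game folder.

```python
import re
from pathlib import Path

# Approximates `strings`: runs of at least 4 printable ASCII bytes.
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")


def soap_strings(data: bytes) -> list[str]:
    """Return printable runs in `data` that mention soap (case-insensitive)."""
    runs = (m.group().decode("ascii") for m in PRINTABLE_RUN.finditer(data))
    return [run for run in runs if "soap" in run.lower()]


def scan_saves(save_dir: str) -> dict[str, list[str]]:
    """Map each save file name to its soap-related strings (hypothetical helper)."""
    return {
        path.name: soap_strings(path.read_bytes())
        for path in sorted(Path(save_dir).iterdir())
        if path.is_file()
    }
```

Calling scan_saves("saves") from the game directory would list, per save file, the same lines that grep printed above. A save with an empty list simply has no soap entity recorded in it.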

Configurable Soap?

The config directory contains 2 INI files, one for the main game, and one for the multiplayer component. These configuration files are helpfully commented (although the key names themselves are quite self-explanatory anyway), and there’s nothing in here that seems relevant. Strangely, the INI files seem to duplicate a lot of the configuration stored in the registry.

The final file is definitely the most interesting, simply named scores. Again, I’m looking at a binary file with some ASCII inside, although the ASCII is a much smaller proportion of the file than what I saw in the save files:

$ strings config/scores | head
~       31lowtech
31completionist
31startingenemies
31startingsecrets
{~      31seconds
>n#HsA{~        31minutes
31levelbeaten
5levelbeaten
4startingenemies
4startingsecrets

These key names seem to correlate with the level completion stats and the various awards available (e.g. “completionist” for 100% kills and secrets in a level).

There’s no mention of soap though!

The Experimental Approach to Soap

Having not found any obviously relevant soap records, I decided to run a simple experiment:

Save a backup copy of the game directory.
Remove anything apparently personal from the data directory (or reinstall the game).
Start the game fresh, and start the first level, but don’t grab the soap.
Close the game and copy the game directory out of the way.
Re-launch the game, start the first level, grabbing the soap this time.
Close the game and diff the game directory against the copy taken in step 4.

I fiddled with the steps a few times, including making changes like actually completing the level with and without the soap. The only difference I could find each time was in the scores file - this must be where the soap residue can be found! I had to dig deeper…
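The comparison in the final step can be done with any diff tool, but a small script makes it repeatable. The sketch below hashes every file in two copies of the game directory and reports the ones that differ. The function names are hypothetical, and it reads each file fully into memory, which is fine for a quick experiment but slow for large asset files.

```python
import hashlib
from pathlib import Path


def snapshot(root: str) -> dict[str, str]:
    """Map each file's path (relative to root) to the SHA-256 of its contents."""
    base = Path(root)
    return {
        str(path.relative_to(base)): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(base.rglob("*"))
        if path.is_file()
    }


def changed_files(before: str, after: str) -> list[str]:
    """List files added, removed, or modified between two directory copies."""
    old, new = snapshot(before), snapshot(after)
    return sorted(
        name for name in old.keys() | new.keys() if old.get(name) != new.get(name)
    )
```

In my case, a call such as changed_files("Dusk-no-soap", "Dusk") (directory names made up for illustration) would have returned only the scores file.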

Settling the Scores with Soap

It was time to work out what exactly was hidden within all that binary in the scores file. My first attempts started with continuing the experimental approach, looking at the diff between different score files when only making the smallest change - for example, completing the first level 1 second faster than previously. This was noisier than anticipated - huge chunks of the scores file would change despite only the slightest adjustment in my attempt (I would later learn that this is just the scores serialiser not having a fixed order of records).

An excerpt of the different scores file copies, with increasingly barbaric names reflecting what had changed in-game.

I wanted to work out exactly what the scores file format was, so I started examining the file more closely in a hex editor (here I use xxd, for an easier demo), to try and get a feel for any patterns.

$ xxd scores | head -15
00000000: 7e0c 336c 6576 656c 6265 6174 656e 0a00  ~.3levelbeaten..
00000010: 0000 ff56 08a8 e203 0000 007b 7e09 3331  ...V.......{~.31
00000020: 6c6f 7774 6563 680a 0000 00ff 5608 a8e2  lowtech.....V...
00000030: 0100 0000 7b7e 0f33 3163 6f6d 706c 6574  ....{~.31complet
00000040: 696f 6e69 7374 0a00 0000 ff56 08a8 e201  ionist.....V....
00000050: 0000 007b 7e11 3331 7374 6172 7469 6e67  ...{~.31starting
00000060: 656e 656d 6965 730a 0000 00ff 5608 a8e2  enemies.....V...
00000070: 0100 0000 7b7e 1133 3173 7461 7274 696e  ....{~.31startin
00000080: 6773 6563 7265 7473 0a00 0000 ff56 08a8  gsecrets.....V..
00000090: e201 0000 007b 7e09 3331 7365 636f 6e64  .....{~.31second
000000a0: 730a 0000 00ff 6bd7 3e6e 2348 7341 7b7e  s.....k.>n#HsA{~
000000b0: 0933 316d 696e 7574 6573 0a00 0000 ff6b  .31minutes.....k
000000c0: d73e 6e00 0000 407b 7e0d 3331 6c65 7665  .>n...@{~.31leve
000000d0: 6c62 6561 7465 6e0a 0000 00ff 5608 a8e2  lbeaten.....V...
000000e0: 1f00 0000 7b7e 0c35 6c65 7665 6c62 6561  ....{~.5levelbea

Some things were clear quite quickly³:

There are ASCII strings in there that quite obviously relate to level completion stat keys (e.g. “kills”, “levelbeaten”, “completionist”).
Each ASCII string stat key is preceded by at least one ASCII numeric digit.
Between each string there is the binary string 0x7B7E ({~ in ASCII), and between this apparent separator and the numeric digits there is always a single seemingly random byte.

So far so good. It’s not immediately obvious what it all means, but from this I could at least ascertain that the scores file seems to be a flat series of records with a fixed separator sequence and string keys. Each key was fairly obviously related to the stats shown in game.
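To make the "flat series of records" idea concrete, here is a minimal sketch that splits the first two records from the xxd output on the separator. The function name is my own, and the naive split is an assumption that only holds while the bytes 0x7B 0x7E never appear inside a value.

```python
SEPARATOR = b"{~"  # 0x7B 0x7E, as seen between records in the hex dump

# First two records from the xxd output above
sample = bytes.fromhex(
    "7e 0c 336c6576656c62656174656e 0a000000 ff 5608a8e2 03000000 7b"
    "7e 09 33316c6f7774656368 0a000000 ff 5608a8e2 01000000 7b"
)


def split_records(data: bytes) -> list[bytes]:
    """Split raw file contents into records, dropping the outer ~ and {."""
    body = data.removeprefix(b"~").removesuffix(b"{")
    return body.split(SEPARATOR)


for record in split_records(sample):
    print(record)
```

Each printed record begins with the "seemingly random" byte, followed by the digits and the key name, which makes the repeated structure much easier to see than in the hex dump.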

In the hex editor I used, a data interpretation sidebar showed the interpretation of the data at the cursor in a range of common C data types. With a bit of poking around and comparison between the in-game display of kill counts and times, I was also able to identify:

The ASCII digit(s) prefixed onto the stat key are a level ID. All records with the same digit prefix are stats from the given level.
The last 4 bytes of the seconds and minutes records are little endian single precision floats, matching the corresponding parts of the completion time.
The name record holds the level name that the numeric level ID corresponds to. The name string starts at offset 10 (counting from zero) after the end of the key string.
All the remaining records hold a little endian 32 bit integer value in the final 4 bytes of the record.
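The interpretations in the list above are easy to check with Python's struct module. The byte values in this sketch are taken from the level 3 records shown later in this post. Treating the integers as unsigned is my assumption, as nothing in the file tells us either way.

```python
import struct

# Final 4 bytes of three level 3 records (minutes, seconds, kills)
minutes_raw = bytes.fromhex("0000e040")
seconds_raw = bytes.fromhex("874e9b41")
kills_raw = bytes.fromhex("1a000000")

minutes = struct.unpack("<f", minutes_raw)[0]  # little endian 32 bit float
seconds = struct.unpack("<f", seconds_raw)[0]
kills = struct.unpack("<I", kills_raw)[0]      # little endian 32 bit int (assumed unsigned)

print(f"time {int(minutes)}:{seconds:06.3f}, kills {kills}")
```

The decoded values line up with what the level select screen shows for that level: a time of a little over 7 minutes 19 seconds, and 26 kills.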

By this point I had managed to work out the key structure behind the values in the scores file. Unfortunately, none of the readable ASCII in the file obviously mentions soap, but there was still a lot of binary between the parts I had deciphered that I had no explanation for, and that might have held the secret…

Escaping the Hex Editor, Soap not Included

The hex editor was a useful tool to quickly view the file’s contents, but visually identifying patterns was difficult, as there was no way to align apparent records with similar records. I decided that to delve deeper, I’d need to start writing some code to help visualise what exactly I was looking at⁴. For rapid prototyping and testing purposes, I didn’t want or need the structure of a compiled, statically typed language. I chose to use Python, as I felt its tools for working with binary are a lot nicer than Perl (which I use most often at work).

I decided to treat 0x7B as a record end marker, and 0x7E as record start (I represented these visually as $ and ^, for a more familiar representation for regex users). I then split the input data on record start markers, and defined a simple ScoreRecord class with fields for each of the known (and unknown) parts of the record. The constructor accepted a raw bytes object of the record, and then split that further to parse the known values.

While implementing this splitting, I noticed an extra detail: the byte at offset 9 after the end of the name key (counting from zero), i.e. the byte immediately before the level name, gave the length of the string. This information wasn’t particularly useful, as I was already able to cut the string based on the record end indicator - but it does mean that the scores format is flexible enough to allow the record separators in level names. I tested this theory by adding a check in the parser that compares the length indicated by this byte against the number of bytes remaining before the record end marker.

meaning: ^  |??|"46" |"name"       |???                        | len |"Dusk"      |$
    raw: 7E |06|34 36|6E 61 6D 65  |0B 00 00 00 FF EE F1 E9 FD | 04  |44 75 73 6B |7B

To demonstrate my approach, here’s an excerpt from an early version of the ScoreRecord class:

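import struct
import sys


def printerr(msg):
    # Report parser warnings on stderr (helper not shown in the original excerpt)
    print(msg, file=sys.stderr)


class ScoreRecord:
    def __init__(self, raw):
        self.raw = raw
        # { appears to be the record end marker; strip exactly one, so that a
        # value which itself ends in 0x7B is left intact
        if not raw.endswith(b'{'):
            raise ValueError("Missing record end byte? " + str(raw))
        raw = raw[:-1]
        labelstart = False
        label = ""
        self.rest = b""  # stays empty if nothing follows the label
        self.pfix = raw[0] # What is this first byte???
        raw = raw[1:]
        levelind = bytearray()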

        for i, b in enumerate(raw):
            if not labelstart and chr(b).isalpha():
                labelstart = True
                label += chr(b)
            elif labelstart and chr(b).isalpha():
                label += chr(b)
            elif labelstart and not chr(b).isalpha():
                self.rest = raw[i:]
                break
            else:
                levelind.append(b)

        self.level = bytes(levelind)
        self.label = label
        # apparently the last 4 bytes are the business end???
        if label in ['seconds', 'minutes']:
            self.value = struct.unpack('<f', self.rest[-4:])[0]
        elif label in ['name']:
            # 9th byte is potentially the string length...
            # though not really necessary with the end record marker?
            self.lenValue = self.rest[9]
            self.value = self.rest[10:10+self.lenValue].decode("utf-8")
            if len(self.value) != self.lenValue:
                printerr("Length indicator assumption is wrong?")
        else:
            self.value = int.from_bytes(self.rest[-4:], byteorder='little')

    def __str__(self):
        return ("[" + str(self.level) + "] " + self.label + ": "
                + str(self.value) + " {" + str(self.pfix) + "} "
                + str(self.rest))

With this, I was able to start printing out the records with their values and the remaining binary, sorted and/or filtered by various properties. For example, the first thing I did to help confirm what I’d learnt so far, was to print the formatted records by level identifier:

b'3'
        [b'3'] completionist: 1 {14} b'\n\x00\x00\x00\xffV\x08\xa8\xe2\x01\x00\x00\x00'
        [b'3'] kills: 26 {6} b'\n\x00\x00\x00\xffV\x08\xa8\xe2\x1a\x00\x00\x00'
        [b'3'] levelbeaten: 3 {12} b'\n\x00\x00\x00\xffV\x08\xa8\xe2\x03\x00\x00\x00'
        [b'3'] minutes: 7.0 {8} b'\n\x00\x00\x00\xffk\xd7>n\x00\x00\xe0@'
        [b'3'] name: Head cheese {5} b'\x12\x00\x00\x00\xff\xee\xf1\xe9\xfd\x0bHead cheese'
        [b'3'] ninja: 1 {6} b'\n\x00\x00\x00\xffV\x08\xa8\xe2\x01\x00\x00\x00'
        [b'3'] seconds: 19.41334342956543 {8} b'\n\x00\x00\x00\xffk\xd7>n\x87N\x9bA'
        [b'3'] secrets: 6 {8} b'\n\x00\x00\x00\xffV\x08\xa8\xe2\x06\x00\x00\x00'
        [b'3'] startingenemies: 26 {16} b'\n\x00\x00\x00\xffV\x08\xa8\xe2\x1a\x00\x00\x00'
        [b'3'] startingsecrets: 6 {16} b'\n\x00\x00\x00\xffV\x08\xa8\xe2\x06\x00\x00\x00'
b'31'
        [b'31'] completionist: 1 {15} b'\n\x00\x00\x00\xffV\x08\xa8\xe2\x01\x00\x00\x00'
        [b'31'] kills: 1 {7} b'\n\x00\x00\x00\xffV\x08\xa8\xe2\x01\x00\x00\x00'
        [b'31'] levelbeaten: 31 {13} b'\n\x00\x00\x00\xffV\x08\xa8\xe2\x1f\x00\x00\x00'
        [b'31'] lowtech: 1 {9} b'\n\x00\x00\x00\xffV\x08\xa8\xe2\x01\x00\x00\x00'
        [b'31'] minutes: 2.0 {9} b'\n\x00\x00\x00\xffk\xd7>n\x00\x00\x00@'
        [b'31'] name: THE GAUNTLET {6} b'\x13\x00\x00\x00\xff\xee\xf1\xe9\xfd\x0cTHE GAUNTLET'
        [b'31'] seconds: 15.205111503601074 {9} b'\n\x00\x00\x00\xffk\xd7>n#HsA'
        [b'31'] secrets: 1 {9} b'\n\x00\x00\x00\xffV\x08\xa8\xe2\x01\x00\x00\x00'
        [b'31'] startingenemies: 1 {17} b'\n\x00\x00\x00\xffV\x08\xa8\xe2\x01\x00\x00\x00'
        [b'31'] startingsecrets: 1 {17} b'\n\x00\x00\x00\xffV\x08\xa8\xe2\x01\x00\x00\x00'

If you read the code above, you’ll notice there’s one particularly obvious messy part: the splitting of the start of the raw data to find the numeric level ID and the label string. I soon realised that, with a length indicator appearing just before the name string, perhaps the single mystery byte immediately after the record start was also a string length indicator!

        identifierLen = raw[0]
        identifier = raw[1:1+identifierLen].decode()

        for i, c in enumerate(identifier):
            if not c.isdecimal():
                break
        self.level = identifier[0:i]
        self.label = identifier[i:]
        self.rest  = raw[1+identifierLen:]

Much nicer!

Cleaning up Binary

At this point, I took some time out of dissecting the file to refactor and improve the code quality, before making the script more complex. To make the remaining binary (which I named “the middle chunk”) easier to explain, I was going to have to improve the output formatting so that it could be interpreted by eye. In particular, I wanted to be able to compare the chunk of binary after the record label string, but before the actual value, across both record types and levels.

I came up with a column format that made the breakdown clear (although leads to somewhat long lines!):

parsed> ^  |s:|46   |kills                                              |? (the "middle chunk")     |7           |$  |
hex---> 7E |07|34 36|6B 69 6C 6C 73                                     |0A 00 00 00 FF 56 08 A8 E2 |07 00 00 00 |7B |
dec---> 126|7 |52 54|107 105 108 108 115                                |10 0 0 0 255 86 8 168 226  |7 0 0 0     |123|


parsed> ^  |s:|47   |completionist                                      |?                          |1           |$  |
hex---> 7E |0F|34 37|63 6F 6D 70 6C 65 74 69 6F 6E 69 73 74             |0A 00 00 00 FF 56 08 A8 E2 |01 00 00 00 |7B |
dec---> 126|15|52 55|99 111 109 112 108 101 116 105 111 110 105 115 116 |10 0 0 0 255 86 8 168 226  |1 0 0 0     |123|

It was immediately clear that there was a lot of repetition, but not every unexplained blob had the same binary, and the pattern was not clear.

What could it have been? Every mystery “middle chunk” of binary was 9 bytes long, an odd number in more ways than one.

Most records started with what could be interpreted as a newline character, but this wasn’t universal, and didn’t fit with the string length indicator at the front of each record.
Could the 9 bytes be a string? Seemed unlikely:
- There were almost never any printable characters
- All other strings in the scores file are variable length
Could it be some kind of checksum? Also seemed unlikely:
- Why would you need one for a singleplayer highscores file?
- It’s a very large field!

Perhaps instead of being a single 9 byte field, it was instead two 4 byte numbers and a single extra byte field. This seemed somewhat convincing, as it fits with the format of other values in the scores file, and there were a lot of null bytes, which we might expect to see in real world numbers.

I started by looking for duplication in the middle chunk values, and found an interesting pattern when I edited the script to deduplicate these:

----- middle chunk breakdown -----

Processed 307 records:
b'\x1b\x00\x00\x00\xff\xee\xf1\xe9\xfd': 1 [name]
b'\x0f\x00\x00\x00\xff\xee\xf1\xe9\xfd': 1 [name]
b'\x1e\x00\x00\x00\xff\xee\xf1\xe9\xfd': 1 [name]
b'\x0b\x00\x00\x00\xff\xee\xf1\xe9\xfd': 1 [name]
b'\x14\x00\x00\x00\xff\xee\xf1\xe9\xfd': 1 [name]
b'\x12\x00\x00\x00\xff\xee\xf1\xe9\xfd': 2 [name]
b'\x0e\x00\x00\x00\xff\xee\xf1\xe9\xfd': 2 [name]
b'\x10\x00\x00\x00\xff\xee\xf1\xe9\xfd': 2 [name]
b'\x16\x00\x00\x00\xff\xee\xf1\xe9\xfd': 2 [name]
b'\x17\x00\x00\x00\xff\xee\xf1\xe9\xfd': 3 [name]
b'\x18\x00\x00\x00\xff\xee\xf1\xe9\xfd': 3 [name]
b'\x15\x00\x00\x00\xff\xee\xf1\xe9\xfd': 3 [name]
b'\x13\x00\x00\x00\xff\xee\xf1\xe9\xfd': 3 [name]
b'\x11\x00\x00\x00\xff\xee\xf1\xe9\xfd': 4 [name]
b'\x19\x00\x00\x00\xff\xee\xf1\xe9\xfd': 4 [name]
b'\n\x00\x00\x00\xffk\xd7>n': 66 [seconds,minutes]
b'\n\x00\x00\x00\xffV\x08\xa8\xe2': 208 [levelbeaten,completionist,startingenemies,startingsecrets,ninja,secrets,kills,lowtech,pacifist]

In this example, I logged the bytes object of each “middle chunk” alongside the number of times it appeared, and the labels of the records it appeared in. It appeared that the middle chunk depended on the data type of the record:

seconds and minutes used a float
name stored a string
the remainder were all integers
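The grouping behind that breakdown is only a few lines. The sketch below is a reconstruction rather than my original script: the function names are illustrative, and it takes plain (label, rest) pairs, where rest is the data following the label, as in ScoreRecord.rest.

```python
from collections import defaultdict


def middle_chunk_breakdown(records):
    """Group record labels by their 9 byte middle chunk.

    `records` is an iterable of (label, rest) pairs, where `rest` is the
    bytes that follow the label string.
    """
    groups = defaultdict(list)
    for label, rest in records:
        groups[bytes(rest[:9])].append(label)
    return groups


def print_breakdown(groups):
    # Rarest chunks first, mirroring the ordering of the output above
    for chunk, labels in sorted(groups.items(), key=lambda kv: len(kv[1])):
        unique = ",".join(dict.fromkeys(labels))  # dedupe, keep first-seen order
        print(f"{chunk!r}: {len(labels)} [{unique}]")
```

Keying the dictionary on the raw bytes keeps the comparison exact, and sorting by count pushes the one-off chunks (the name records) to the top, where they are easiest to spot.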

Note that I’m not an expert at the game, and I have not 100% completed it - so my analysis was imperfect. For example, I have not got the “lowtech” award on every level, so that record is only present for levels where I’ve achieved it. Regardless, the pattern was interesting, and I think there was enough evidence of the data type link without an exhaustive view of the data.

At this point, I noticed that bytes 1 through 4 (0-indexed) were always 0x00 0x00 0x00 0xFF, regardless of the record type and the bytes either side changing. This lent more weight to the theory of the middle chunk being two 32 bit values, and a single 8 bit value, at least. I had some further thoughts on the meaning of these 4 bytes:

Is it a fixed record separator?
- That would be redundant, given the fixed length of numeric fields, and length indicators for strings.
Is it a difficulty indicator? Are scores recorded for each difficulty level?
- There’s no indication on the UI that this is information that can be rendered.
- A quick test playing through the first level on a different difficulty setting proved this was not the case.
Is it some kind of version number?
- This would be strange to store alongside each record - why would the game support a scores file with content spanning multiple versions?

None of those explanations were convincing enough. Faced with a dead end for these values, I looked more closely at the final 4 bytes of the middle chunk.

The final 4 bytes were more promising: they were always the same for any record of the same value type:

0xEE 0xF1 0xE9 0xFD (string records)
0x6B 0xD7 0x3E 0x6E (float records)
0x56 0x08 0xA8 0xE2 (int records)

So perhaps the final 4 bytes of the middle chunk, or maybe even the final 8 of 9 bytes, were simply a data type indicator? I could see that having type information in the record itself might be useful, as it would allow you to write a scores parser without having to store a type mapping in the parser code for every possible label. On the other hand, storing a whole 32 or even 64 bits just to mark the selection of one of three datatypes seemed… lazy, at best, crazy, at worst.

An Interesting but Far-Fetched Observation

At a loss, I started poking once more at the final 4 bytes in the hex editor, just to see their interpretation as different types. To my surprise, I discovered the following interpretation of the final 4 bytes of the integer records:

0x56 0x08 0xA8 0xE2 ==[as big endian uint]==> 1443408098
1443408098 ==[as unixtime]==> 2015-09-28 02:41:38

The final 4 bytes are a timestamp in 2015!

David Szymanski, the developer of Dusk, released his previous game A Wolf in Autumn on Steam on 2015-10-27, almost exactly a month after the date in the scores file. Is this the date when Dusk development started? Or is it an easter egg related to the development of A Wolf in Autumn?

Unfortunately, things fall apart once we look at the other values from string and float records:

0x6B 0xD7 0x3E 0x6E => 1809268334 => 2027-05-02T14:32:14+00:00 (from float values)
0xEE 0xF1 0xE9 0xFD => 4008831485 => 2097-01-12T12:18:05+00:00 (from strings)

These are much less convincing timestamps, unless David is trying to embed some kind of hint to the timescale for his future releases?

So perhaps these are all random, but I think the coincidence is too great in the first timestamp.
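For anyone who wants to reproduce the conversions without a hex editor, here is a short sketch. It interprets each 4 byte group as a big endian unsigned integer and adds it to the Unix epoch, in UTC. The dictionary and function names are my own.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# The final 4 bytes of the middle chunk, per value type
TYPE_BYTES = {
    "int": bytes.fromhex("5608a8e2"),
    "float": bytes.fromhex("6bd73e6e"),
    "string": bytes.fromhex("eef1e9fd"),
}


def as_unixtime(raw: bytes) -> datetime:
    """Interpret 4 bytes as a big endian unsigned Unix timestamp (UTC)."""
    seconds = int.from_bytes(raw, byteorder="big", signed=False)
    return EPOCH + timedelta(seconds=seconds)


for kind, raw in TYPE_BYTES.items():
    print(kind, int.from_bytes(raw, "big"), as_unixtime(raw).isoformat())
```

Adding a timedelta to the epoch, rather than calling datetime.fromtimestamp, avoids platform limits on large timestamps such as the one that lands in 2097.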

Soap on a Rope: The First Byte

I’d looked at the final 8 bytes of the middle chunk, and had some success deciphering them, but the first byte of the middle chunk remained.

The first byte was the only value that changed in the whole middle chunk in string values.

The first byte was always the same for float and integer values.

Only a single string value (the level name) is stored for every level.

The game only needs a single boolean to store the soap state for every level!

COULD THE SOAP BE IN THE STRINGS?

The First Byte is the Cleanest

To find out, I had to work out why the value of the first byte sometimes matched. I treated the first byte as an integer, and sorted each level name into a group based upon the first byte:

Processed 33 records:
27: 1 [The Infernal Machine (20)]
15: 1 [Neobabel (8)]
30: 1 [The Dweller In Darkness (23)]
11: 1 [Dusk (4)]
20: 1 [The Ratacombs (13)]
18: 2 [Head cheese (11) ; The Foundry (11)]
14: 2 [Sawdust (7) ; THE DIG (7)]
16: 2 [CREATIONS (9) ; Blasphemy (9)]
22: 2 [The Escher Labs (15) ; City of Shadows (15)]
23: 3 [Down on the Farm (16) ; Through The Gate (16) ; Brimstone Ghetto (16)]
24: 3 [Old Time Religion (17) ; Dead of the Night (17) ; Into the Thresher (17)]
21: 3 [The Cutty Mine (14) ; The Dim Slough (14) ; BLOOD AND BONE (14)]
19: 3 [THE GRAINERY (12) ; THE GAUNTLET (12) ; Fire and Ice (12)]
17: 4 [Steamworks (10) ; Ghost Town (10) ; The Unseen (10) ; Homecoming (10)]
25: 4 [The Erebus Reactor (18) ; The Iron Cathedral (18) ; Crypt of the Flesh (18) ; As Above, So Below (18)]

I’ve included the length of each string beside it, to make the correlation very clear. The first byte is related to the length of the string.

I then simplified and sorted the output to try and explain the numbers:

byte value: str len | diff
        11:       4 | 7
        14:       7 | 7
        15:       8 | 7
        16:       9 | 7
        17:      10 | 7
        18:      11 | 7
        19:      12 | 7
        20:      13 | 7
        21:      14 | 7
        22:      15 | 7
        23:      16 | 7
        24:      17 | 7
        25:      18 | 7
        27:      20 | 7
        30:      23 | 7

If we consider the first byte of the integer and float records, the same pattern is followed:

byte value: 10
value length: 4

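Note that the value length of a string record is actually 1 greater than the string length, to account for the length byte. The difference between the byte value and the value length is therefore 6 for string records, exactly as it is for integer and float records (10 - 4 = 6).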

It’s so close to being a length indicator… but the numbers don’t quite match reality. For example, if we break down the shortest string example, the name record for the final level of the game, “Dusk”:

parsed> ^ |s:|46   |name      :s|X (11)|"middle chunk" - 1      |s:|Dusk       :s|$ |
hex---> 7E|06|34 36|6E 61 6D 65 |0B    |00 00 00 FF EE F1 E9 FD |04| 44 75 73 6B |7B|
                                  ^______|__|__|__|__|__|__|__|___|___|__|__|__|___^
                                  0      1  2  3  4  5  6  7  8   9  10 11 12 13  14

In this breakdown, I’ve marked the first byte of the middle chunk as X, in the context of the entire record, and with the remaining 8 bytes of the middle chunk marked beside it. I’ve also annotated the hex with a byte counter from X, to help illustrate the issue - it doesn’t fit!

If we treat X in the same way that we do the string length indicators, then that gives us a region that stops right in the middle of the string value, which is obviously incorrect.

The first byte is always followed by three null bytes. If we instead treat these null bytes as part of a (ridiculously large) rest-of-record length indicator, it does make sense:

parsed> ^ |s:|46   |name      :s|X (11)      |???            |s:|Dusk       :s|$ |
hex---> 7E|06|34 36|6E 61 6D 65 |0B 00 00 00 |FF EE F1 E9 FD |04| 44 75 73 6B |7B|
                                           ^___|__|__|__|__|___|___|__|__|__|___^
                                           0   1  2  3  4  5   6   7  8  9 10  11

I thought this was the strongest evidence for the meaning of anything in the “middle chunk” up to that point. It left 5 unexplained bytes:

The first is always 0xFF, so its purpose is unclear, and it’s not very interesting.
The remaining 4 I covered already, and the best explanation I could find was the easter egg date explanation.
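Putting the pieces together, the layout as I understood it at this point can be written down as a small parser. This is a sketch of my interpretation, not a confirmed specification: in particular, it assumes the 32 bit length counts every byte after the length field, up to and including the record end marker, which is what the "Dusk" example suggests.

```python
import struct


def parse_record(raw: bytes) -> dict:
    """Parse one record, from its leading ~ (0x7E) to its trailing { (0x7B)."""
    assert raw[0] == 0x7E, "missing record start marker"
    key_len = raw[1]
    key = raw[2:2 + key_len].decode("ascii")
    pos = 2 + key_len
    (rest_len,) = struct.unpack_from("<I", raw, pos)  # 32 bit rest-of-record length
    rest = raw[pos + 4:pos + 4 + rest_len]
    return {
        "key": key,
        "rest_len": rest_len,
        "marker": rest[0],        # always 0xFF in the files I examined
        "type_bytes": rest[1:5],  # the unexplained 4 bytes
        "value": rest[5:-1],      # payload, without the end marker
        "end": rest[-1],
    }


# The name record for the final level, as broken down above
dusk = bytes.fromhex("7e 06 3436 6e616d65 0b000000 ff eef1e9fd 04 4475736b 7b")
print(parse_record(dusk))
```

For the "Dusk" record, the length field of 11 covers the 0xFF byte, the 4 type-like bytes, the 5 byte value (length byte plus string), and the end marker, so the slice ends exactly on 0x7B.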

Verifying with Fuzzing

Fuzzing is a technique used to identify defects in software that could be exploited, by sending random data⁵ to the target software and monitoring the outcome. I decided to use a similar approach to see if I could use the game to verify any of my discoveries about the middle chunk and the likely 32 bit length indicator.

My first test was to take a simple scores file created after only winning the first level, and then completely overwrite all 9 bytes of the middle chunk in one of the records with random data. I would then load the game and navigate to the level select menu to see whether my high scores were still displaying as expected. I chose to use the “minutes” field (at random) as the victim:

The outcome of overwriting the entire middle chunk with random data in a single record. Note that here I've selected the secret level, E1MS, which the UI is displaying as E1M1. Image © New Blood Interactive

Loading the level select screen after making this edit to the scores file caused a noticeable hang in the UI, but not a crash. Once the level select screen did load, it was clear that this had broken the majority of score loading. Every level was showing as unlocked (despite having only played the first), secrets and time were showing as “N/A”, and every level was showing as me having earned every award. Even worse - every level was now showing as being “E1M1: Head Cheese”!

The delay combined with the broken scores display confirmed that the middle chunk is meaningful. This would make sense if the assumption about the first 4 bytes being a length indicator is correct. The game’s score file parser is likely:

Parsing the record label/level indicator using the string length byte
Reading in the next 4 bytes, and parsing that as the $length of the rest of the record
Reading in $length bytes into a buffer.
Parsing that buffer into the internal record representation.

With a corrupted length value, step 3 is now forcing Dusk to read the entire remainder of the file into the buffer, and this then fails to parse in step 4. Error handling then resolves the issue, but leaves the internal state with default values, which the UI renderer then does its best to display.
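To illustrate why a bad length is so destructive, here is a sketch of a reader that follows those four steps. It is purely a guess at the general approach, written in Python for convenience. It is not the game's code, and the function name and error handling are my own.

```python
import io
import struct


def read_records(stream):
    """Yield (key, rest) pairs; a guess at the game's approach, not its code."""
    while True:
        start = stream.read(1)
        if not start:
            return                                       # clean end of file
        if start != b"~":
            raise ValueError(f"expected record start, got {start!r}")
        key_len = stream.read(1)[0]                      # step 1: length-prefixed key
        key = stream.read(key_len).decode("ascii", errors="replace")
        (length,) = struct.unpack("<I", stream.read(4))  # step 2: rest-of-record length
        rest = stream.read(length)                       # step 3: may swallow the file
        if len(rest) < length or not rest.endswith(b"{"):
            raise ValueError(f"record {key!r} is truncated or misaligned")
        yield key, rest                                  # step 4 would interpret `rest`


# Two records taken from the xxd output earlier in the post
good = bytes.fromhex(
    "7e 09 33316d696e75746573 0a000000 ff 6bd73e6e 00000040 7b"
    "7e 0d 33316c6576656c62656174656e 0a000000 ff 5608a8e2 1f000000 7b"
)
print([key for key, _ in read_records(io.BytesIO(good))])
```

If the 4 length bytes of the first record are replaced with a huge value, the read in step 3 consumes everything up to the end of the file, and no later record is ever seen. That would be consistent with the hang, and with every level falling back to default values.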

Happy that the length record’s use had been proven, I then decided to reset those 4 bytes back to their original values. I left the final 5 bytes overwritten with random values, and loaded up the level select screen once more:

After restoring the 4 bytes that made up the record length value, the majority of the stats then rendered correctly. Image © New Blood Interactive

This almost fixed the scores display, and fixed the short pause on load. The only remaining issues, as shown in the screenshot, were:

The time value continued to show as N/A.
The “ninja” award I had earned was no longer displayed.

This indicated that there was still something important in the remaining 5 bytes. The most likely candidate for the source of the problems in my mind was the first of those - the value that was always 0xFF, which seemed like it could be a separator. Sure enough, restoring this value fixed the score display:

Fixing the first 5 bytes, i.e. the length value and the fixed separator, fixed the display of scores. Image © New Blood Interactive

I could not see why having a separator was necessary, but I could at least accept that as its purpose. Mystery still surrounded the final 4 bytes - the potential easter egg timestamps that were tied to the data type.

If these were data type indicators, corrupting them should have broken the time display, but this was not the case. I wondered if this was perhaps some error handling in the code that would fall back to a sensible default if the type value was invalid. In order to test this, I tried overwriting the 4 type bytes with a copy of those 4 bytes from a record of a different data type. Again, there were no errors.

I also tried setting all 4 bytes to nulls, and all 4 bytes to 0xFF, with no effect. I then tried the same set of tests on the name string record too, again with no effect.

I was at a dead end - these 4 bytes were seemingly completely ignored by the game!
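For anyone wanting to repeat this kind of test, the three overwrites (nulls, 0xFF, and bytes borrowed from another record) can be generated with a small helper. This is a minimal sketch, not the script used for the investigation: the function names and the offsets are hypothetical, and the sample bytes are a stand-in rather than real scores data.

```python
def patch_bytes(data: bytes, offset: int, replacement: bytes) -> bytes:
    """Return a copy of data with replacement written at offset (size unchanged)."""
    end = offset + len(replacement)
    if offset < 0 or end > len(data):
        raise ValueError("patch falls outside the file")
    return data[:offset] + replacement + data[end:]


def type_field_variants(data: bytes, offset: int, donor_offset: int) -> dict:
    """Build the three test files for one 4-byte field at `offset`."""
    donor = data[donor_offset : donor_offset + 4]
    return {
        "nulls": patch_bytes(data, offset, b"\x00" * 4),
        "all_ff": patch_bytes(data, offset, b"\xff" * 4),
        "donor": patch_bytes(data, offset, donor),
    }


# Made-up stand-in bytes; a real run would read the scores file instead.
sample = bytes(range(16))
variants = type_field_variants(sample, offset=4, donor_offset=12)
for name, blob in variants.items():
    print(name, blob.hex())
```

Each variant would then be written over a backup copy of the scores file before loading the level select screen, so that only one field changes per test.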

I’m Just a Poor Boy, There is no Soap for Me

If you’ve made it this far, you’ll realise that the investigation into soap had long since failed, replaced instead by curiosity over the format of the high scores file used by the game.

I’m quite pleased with the progress I made into deciphering the Dusk scores file format. There are certainly some unanswered questions as to why some parts of the record data existed⁶. Perhaps the file format has organically grown over time, as production software likes to do, and certain structural parts had to be maintained for backwards compatibility (or weren’t worth development time to remove!).

What I unfortunately had to admit defeat on was the quest to find where I had missed soap pickups. I had no more ideas on where to look for stored progress, and the only outstanding unexplained data in the scores file never changed on a per-level basis, which would only make sense had I not been able to find the soap on any level (or if I had already found all the soaps).

I assume that the soap achievement progress is actually tracked by the Steamworks API - potentially using the documented INT statistics type as a bitfield. I don’t feel like trying to mess with the Steamworks API - that could be quite expensive if I get locked out of my Steam library.
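To illustrate what "an INT statistic as a bitfield" would mean, here is a purely speculative sketch in which bit i of the integer records whether the soap was picked up on level i. Nothing here is confirmed: the level list, the bit order, and the function name `missing_levels` are all assumptions, and a single 32-bit integer could only hold 32 such flags.

```python
def missing_levels(stat_value: int, level_names: list) -> list:
    """Return the names whose bit is unset, with bit i standing for level_names[i]."""
    return [
        name
        for i, name in enumerate(level_names)
        if not (stat_value >> i) & 1
    ]


# Hypothetical four-level example: bits 0, 1 and 3 are set, bit 2 is not.
levels = ["E1M1", "E1M2", "E1M3", "E1M4"]
stat = 0b1011
missing = missing_levels(stat, levels)
print(missing)
```

If the progress really were stored like this, reading one integer would be enough to list the soaps still to find, which is exactly the information the scores file turned out not to contain.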

I think if I want to clean up my achievements list, I’m going to have to go back and find the soap by hand. When a game is this good, it shouldn’t be too much of a struggle.

Update 2020/08/11

Just a quick follow-up to say two things:

I’ve now uploaded the Python script and all of the git history to demonstrate how the investigation developed, if anyone’s interested or wants to see the breakdown against their own scores file. Check it out on GitHub.
I asked David Szymanski about the achievement progress and the potential easter egg dates on Twitter, and I got some answers! Sadly, it turns out that the date was just a coincidence, and the soap progress is only stored on the Steamworks API. Oh well!

¹ Thankfully, the soap’s elusivity and difficulty to use in comparison to a dedicated weapon means it is far from game breaking, and instead can make for a fun self-inflicted challenge.
² The Battlezone remaster is an example of the opposite situation, where I love the game, but the final achievement is both secret (for no good reason) and incredibly difficult.
³ Or at least they seem clear in hindsight. Perhaps this would have been easier to write as I worked!
⁴ The avid reader and/or git history inspector may notice that the timeline of early discoveries here has been skewed slightly in the interest of narrative structure and as a result of poor note taking during the exciting first steps… As it turns out, writing as you go is as useful for random video game experiments as it is for penetration testing!
⁵ Usually with a bias towards values that represent edge cases and invalid inputs, in order to trigger bugs.
⁶ I did consider while investigating the scores file whether the file format was some standard for serialisation provided by either Unity or by the .NET runtime. If the scores file were based upon a standard file format, then I could explain the seemingly useless information as optional features that Dusk didn’t need, but weren’t easy to remove. The only such default serialisation I could find was this spec provided by Microsoft for C# serialisation - but this seemed to enforce the presence of a header segment with version information, which the scores files lacked.