slot machine write-up (Google CTF 2017 Finals)

“slot machine” was a hardware task in the reverse-engineering category on Google CTF Finals 2017, which took place in Zurich back in October 2017.

All teams got the same ATtiny-based slot machine game, which contained 2 stub flags. The objective was to reverse-engineer it by disassembling it (in literal real-life sense) and dumping the firmware, figure out how flags can be obtained without access to the circuit, and do that with the machine at the organizers’ table, which contained the real flags.

You get:

Slot machine game
Printed schematics with some hints and instructions (instructions, schematics)
Screwdriver
Small prototyping breadboard
USBasp programmer
Several jumper cables

The set of provided tools suggests several things.

USBasp programmer means that we’re probably going to deal with AVR family microcontroller. For instance, popular Arduino Uno is based on AVR microcontroller.

The programmer has the ability to download the firmware (unless it was read-protected through so-called fuse bits), so that’s probably exactly what we’re going to do.

Let’s inspect the game first.

overview

closeup

The slot machine has an LCD screen, like those commonly used in various DIY Arduino projects, and can be found cheaply on AliExpress.

Out of 10 buttons, only 2 from the upper row appear functional and, according to the schematics, the lower row is not connected at all. Pressing the leftmost button cyclically changes the current bet from 1 to 5. Pressing the second button plays the game. At the start of the game, you have 100 credits.

The game works like a typical slot machine. At the center of the screen, there are three reels, which display random symbols each time game is played: nothing, “bar”, lemon, and “7”. Having three matching symbols in a row will give you some credits based on your bet.

However, at that point, the conditions to get the flags were not yet known.

So, without much hesitation, I disassembled the slot machine. What was inside is a printed circuit board with ATtiny88 microcontroller.

pcb

ATtiny is one of the less powerful microcontrollers in AVR family. Although all AVR microcontrollers share the same basic instruction set (there are some instruction extensions, though), they have different set of integrated peripherals, like timers, IO ports, SPI interfaces, and so on.

The programmer uses simple protocol called ISP (in-system programming) over Serial Peripheral Interface to talk to the microcontroller.

In order to figure out how to connect the programmer to the MCU, we need to consult with the datasheet.

pinout_colored

MISO (master in, slave out), MOSI (master out, slave in), SCK (serial clock), RST (reset), VCC (+5 volts) and GND (ground) must be connected to the corresponding ouputs of the programmer.

Since these outputs are not exposed on the PCB, microcontroller IC had to be removed from its socket and inserted into the breadboard.

breadboard-1

Once it’s done, it’s easy to dump the firmware using avrdude program. It wasn’t read-protected after all, so there were no problems dumping it.

avrdude -c usbasp -p t88 -U flash:r:slot_machine.bin:r

You can download the file here.

Reverse-engineering

Processor selection

AVR processors are relatively well-supported by IDA disassembler. Although there’s no Hexrays decompiler support, disassembler works mostly fine, but there are still some annoying problems.

One the of the major obstacles is CPU selection dialog.

ida_avr_device_name

Although all processors in the AVR family have the same basic instruction set, as it has been mentioned above, they have different set of integrated peripherals and amount of available memory.

Selecting device with mismatching memory size will result in either truncated code section or buggy data section references (or both).

The set of peripherals affects two twings: IO registers and interrupt vectors.

The registers tend to occupy the same location across all AVR family. That is, locations corresponding to missing peripheral will be described as “reserved” in the datasheet, and remaining registers will be “fragmented” instead of “compacted”.

However, interrupt vector layouts are very different. There are no unused vector numbers (which is likely caused by the fact that it would waste precious flash memory otherwise). For example, compare interrupt vectors on ATtiny88 and ATmega328p.

It wouldn’t be much of a problem if IDA actually had definitions for all the AVR processors. Unfortunately, as you can see on the screenshot, it isn’t the case. So the best option seems to select the best matching processor (which one is the best matching is also not obvious), and fix incorrect things (like interrupt names) by hand, consulting with the processor reference manual.

.data section

Another problem lies in how .data section is initialized.

To understand the problem, let’s first discuss C and compilers in general.

.data section is where initialized data in your program goes to. In C, that would be all non-zero initialized variables with static lifetime (i.e. global variables and static variables inside functions). There’s also .rodata section for things like string constants (AFAIK there’s no memory protection in AVR, so there’s no dedicated .rodata either).

How do these variables get from the executable file to the RAM?

On typical desktop Linux, this is handled by the ELF parser with the aid of MMU: ELF parser maps the corresponding regions of the ELF file with copy-on-write access rights. This operation is transparent, and the description of the .data region - where it lies in the binary and where it should be mapped - can be easily parsed from the ELF file.

There comes a little caveat. There’s usually no ELF parser in embedded devices. And no MMU, either. There’s just a blob of code in the flash storage, where the only thing you know for certain is entry point address.

How is .data section handled in this case?

In the avr-libc (which is a commonly used C library for AVR), there’s a small piece of code that’s executed right after reset, just before calling user’s main function.

What this code does is it copies initial values for .data section from the persistent flash memory to RAM. These initial values are stored simply appended at the end of .text (code) section.

Let’s take a look at the assembly code.

For the people unfamiliar with AVR assembly, a couple of points:

Data and code space are separate.
AVR executes code from flash memory directly. Instructions are word-addressable, 1 word = 2 octets (16 bits total).
AVR cannot execute code from RAM at all. It’s a Harvard architecture processor (purists will still find a couple of places where it deviates from pure Harvard architecture, though).
Flash memory can be read-accessed using special lpm instruction. Besides .data initialization, it’s often for string constants and various tables, because there’s much more flash memory available than RAM on (probably) all AVR devices. There are even special versions of typical libc functions that operate on data located in the program memory instead of RAM: pgmspace.h, snprintf_P. Unlike program counter, lpm instruction is byte-addressable.
Flash memory can also be programmed by the MCU itself using spm instructions. It’s often used by bootloaders, like the one flashed on Arduino Uno that programs sketches received via serial interface from the USB chip. It’s completely irrelevant for our discussion here, but I think it’s worth mentioning.
Since all registers are 8 bit in size, but memory address width is 16-bit, register pairs are used to access arbitrary memory addresses. They are X (r26-r27), Y (r28-r29) and Z (r30-r31).

ROM:034D __do_copy_data:
ROM:034D                 ldi     r17, 1          ; Load Immediate
ROM:034E                 ldi     r26, 0          ; X = 0x0100 (RAM:0100)
ROM:034F                 ldi     r27, 1          ; Load Immediate
ROM:0350                 ldi     r30, 0xA6       ; Z = 0x19A6 (ROM:0CD3)
ROM:0351                 ldi     r31, 0x19       ; Load Immediate
ROM:0352                 rjmp    loc_355         ; Relative Jump
ROM:0353 ; ---------------------------------------------------------------------------
ROM:0353
ROM:0353 loc_353:                                ; CODE XREF: __RESET+10↓j
ROM:0353                 lpm     r0, Z+          ; Load Program Memory
ROM:0354                 st      X+, r0          ; Store Indirect
ROM:0355
ROM:0355 loc_355:                                ; CODE XREF: __RESET+B↑j
ROM:0355                 cpi     r26, 0xDE       ; Compare with Immediate
ROM:0356                 cpc     r27, r17        ; RAM:01DE
ROM:0357                 brne    loc_353         ; Branch if Not Equal

In this fragment, data is copied from Z = 0x19A6 (byte offset for the program memory) to X = 0x0100 (byte offset for RAM) until X becomes 0x01DE.

In order to see the data in IDA, you’ll need to copy the data manually. The following IDA Python script will do that for you:

DATA_START = 0x0100
DATA_LENGTH = 0x01DE - DATA_START
DATA_PROGMEM_BYTE_OFFSET = 0x19A6

IDA_VIRTUAL_RAM_BEGIN = 0x100000

assert DATA_PROGMEM_BYTE_OFFSET % 2 == 0, "fix this script a bit"

for i in xrange(0, DATA_LENGTH, 2):
    # in IDA, AVR ROM memory has 16-bit words, so
    # 1) Byte() returns 16-bit values
    # 2) byte offset must be divided by two to get 16-bit word offset
    val = Byte((DATA_PROGMEM_BYTE_OFFSET + i)//2)
    
    val0 = val & 0xFF
    val1 = val >> 8

    PatchByte(IDA_VIRTUAL_RAM_BEGIN + i + DATA_START, val0)
    PatchByte(IDA_VIRTUAL_RAM_BEGIN + i + 1 + DATA_START, val1)

Actual reverse-engineering

Now that we’re set, let’s finally dive into the code.

Of course, I’m unable to reproduce the exact sequence of actions I did to figure out the program flow. Neither would it be practical due to rather chaotic nature of reverse-engineering process.

However, one of the first things I noticed certainly was the flag strings:

RAM:01A7	{FLAG_II}
RAM:01B1	{FLAG_I}

We can start from there.

Here comes another problem with IDA and AVR assembly. AVR is 8-bit processor having (up to) 16-bit addresses. So when constant address is passed to the function, it’s stored in register pair using two load instructions. IDA can’t recognize this pattern, and cross-references for data don’t work as is.

I had to resort to searching for individual bytes of the addresses using “Search -> immediate value…”.

There were two fragments of the code mentioning addresses of the two flags:

ROM:0A32                 ldi     r24, 0xB1       ; Load Immediate
ROM:0A33                 ldi     r25, 1          ; Load Immediate
ROM:0A34                 rjmp    loc_BB4         ; Relative Jump

They both lead to function that display “Congratulations!” and the flag on the screen.

The flag addresses don’t seem to appear anywhere else, so we’re likely looking in the right place.

Tracing the code graph upwards, I found the following code:

ROM:0A14                 lds     r24, unk_1001DE ; Load Direct
ROM:0A16                 cpi     r24, 9          ; Compare with Immediate
ROM:0A17                 brne    dispatch_start  ; Branch if Not Equal
ROM:0A18                 sts     unk_1001DE, r1 ; Store Direct to SRAM
ROM:0A1A                 movw    r30, r2         ; Copy Register Word
ROM:0A1B                 ldi     r22, 0x4C ; 'L' ; Load Immediate
ROM:0A1C                 ldi     r23, 1          ; RAM:014C
ROM:0A1D                 ldi     r18, 0x43 ; 'C' ; Load Immediate
ROM:0A1E                 ldi     r19, 1          ; RAM:0143
ROM:0A1F                 ldi     r24, 1          ; Load Immediate
ROM:0A20                 ldi     r25, 1          ; Load Immediate
ROM:0A21
ROM:0A21 loc_A21:                                ; CODE XREF: main+201↓j
ROM:0A21                 ld      r20, Z+         ; Load Indirect
ROM:0A22                 movw    r26, r22        ; Copy Register Word
ROM:0A23                 ld      r21, X+         ; Load Indirect
ROM:0A24                 movw    r22, r26        ; Copy Register Word
ROM:0A25                 cpse    r20, r21        ; Compare: Skip if Equal
ROM:0A26                 ldi     r25, 0          ; Load Immediate
ROM:0A27                 movw    r26, r18        ; Copy Register Word
ROM:0A28                 ld      r21, X+         ; Load Indirect
ROM:0A29                 movw    r18, r26        ; Copy Register Word
ROM:0A2A                 cpse    r20, r21        ; Compare: Skip if Equal
ROM:0A2B                 ldi     r24, 0          ; Load Immediate
ROM:0A2C                 cp      r30, r4         ; Compare
ROM:0A2D                 cpc     r31, r5         ; Compare with Carry
ROM:0A2E                 brne    loc_A21         ; Branch if Not Equal
ROM:0A2F                 tst     r25             ; Test for Zero or Minus
ROM:0A30                 brne    flag1           ; Branch if Not Equal
ROM:0A31                 rjmp    loc_BE4         ; Relative Jump
ROM:0A32 ; ---------------------------------------------------------------------------
ROM:0A32
ROM:0A32 flag1:                                  ; CODE XREF: main+203↑j
ROM:0A32                 ldi     r24, 0xB1       ; Load Immediate
ROM:0A33                 ldi     r25, 1          ; arg1 = RAM:01B1
ROM:0A34                 rjmp    loc_BB4         ; Relative Jump

unk_1001DE is incremented each time any button is pressed. So every time user presses buttons 9 times in any order, check_secret_sequence loop executes.

The loops compares some array that lies on the stack with RAM:014C and RAM:0143. These two static arrays contain [4, 3, 2, 1, 0, 1, 2, 3, 4] and [0, 1, 0, 1, 4, 3, 4, 3, 2] correspondingly. Interesting…

Intuition suggested that it’s some secret key combination, and that array on the stack contains last 9 pressed buttons.

And the intuition was indeed correct, pressing buttons in order (0 being the leftmost button, and so on) specified by the second array gave me the first flag.

Predicting the random generator

The second secret combination is more insteresting.

It opens so-called developer menu that allows user to advance pseudorandom number generator (PRNG) state by almost any arbitrary number.

developer_menu

I bet most of you have heard the story about hackers that predicted the internal state of actual real-life slot machine PRNG to get advantage and win money reliably (though I haven’t actually heard the story until after solving this task :)).

The second part of the task is pretty much the reenactment of that story.

Although the initial state of PRNG seems to be truly random, the PRNG itself is rather weak. It’s just linear congruential generator with 32 bits of state, driven by equation state = state * 0x41C64E6D + 31337.

Even though I couldn’t trace the exact code path that leads to the second flag, intuition once again suggested that in order to obtain the second flag, triple “7” must be rolled.

rand() function is called 9 times per every game, once for each symbol on the screen. Symbols are determined in column-major order, that is, the left reel comes first (from top to bottom), then the middle reel, and finally the right reel.

Each symbol has some fixed probability. It can be expressed as the following pseudo-code:

int next_symbol(void) {
    uint16_t val = rand();
    if (res >= 0xFD70) return SEVEN;
    if (res >= 0xF0A3) return LEMON;
    if (res >= 0xBAE0) return BAR;
    return EMPTY;
}

It leads to the following idea: play a couple of games, take notice of which symbols are rolled, then brute-force the internal state of the generator. Although the brute-force will be probabilistic, the longer the sequence is, the higher chance there is only one such initial state. In practice, playing 5 games is more than enough to reliably find the state value.

With current internal state known, it’s now easy to find the number that must be added to the current state in order to get the state where triple “7” will be rolled immediately.

Brute-force program and most of the PRNG reverse-engineering work was done by my teammate vient.

You can check out the program here.