primepwn write-up (34C3 CTF)
We are given an x86_64
ELF binary and remote server address. The goal is to gain remote execution and read the flag.
Reverse engineering
Let’s open the binary with your favourite disassembler and see what is going on. I’ll be using IDA Pro.
Function main
is rather simple: it calls the function test_prime
, then returns.
Function test_prime
shows more complicated logic:
__int64 test_prime()
{
unsigned int i; // [rsp+Ch] [rbp-14h]
size_t v2; // [rsp+18h] [rbp-8h]
if ( mmap((void *)0x1337000, 0x1000uLL, 7, 50, -1, 0LL) != (void *)0x1337000 )
{
perror("error on mmap");
exit(1);
}
v2 = fread((void *)0x1337000, 1uLL, 4096uLL, _bss_start);
for ( i = 0; (signed int)i < v2; ++i )
{
if ( !(unsigned __int8)is_prime(*(_BYTE *)((signed int)i + 0x1337000LL)) )
{
printf("Byte %d (value: %u) is not prime.\n", i, *(unsigned __int8 *)((signed int)i + 0x1337000LL));
exit(0);
}
}
puts("All bytes are prime!");
return jump_to((__int64 (*)(void))0x1337000);
}
Basically, it allocates 0x1000
RWX bytes at address 0x1337000
, reads 0x1000
bytes from stdin
to that buffer, checks something and jumps directly to the buffer’s beginning.
From the line puts("All bytes are prime!");
we can assume that it checks that all bytes are prime. Let’s take a look at the function is_prime
:
signed __int64 __fastcall is_prime(unsigned __int8 a1)
{
signed int i; // [rsp+10h] [rbp-4h]
if ( a1 <= 1u )
return 0LL;
for ( i = 2; i <= 255 && a1 > i; ++i )
{
if ( !(a1 % i) )
return 0LL;
}
return 1LL;
}
It does indeed check that the passed argument (uint8_t
) is prime.
In conclusion, the binary is pretty simple: it executes the passed shellcode at the known address but all bytes should be prime numbers.
Creating shellcode
Prime bytes definitely add a challenging part to the task. Since we have RWX memory at known address, the good approach will be to try to make two-stage shellcode: the first stage will be “prime” and will write a normal shellcode (second stage) to known address and then run it. That way we only need to make a write-what-where primitive in prime bytes.
Let’s take a look at the x86_64
opcode table to see what instructions we can use. I used this table and an assembler to check my assumptions. After that, the following table was born:
2 add r8, r/m8
3 add r, r/m
5 add eax, imm32 49 05 add rax, imm32
7 -
b or r, r/m
d or eax, imm32 49 0d or rax, imm32
11 adc r/m, r
13 adc r, r/m
17 -
1d sbb eax, imm32 49 1d sbb rax, imm32
1f -
25 and eax, imm32 49 25 and rax, imm32
29 sub r/m, r
2b sub r, r/m
2f -
35 xor eax, imm32 49 35 xor rax, imm32
3b cmp
3d cmp
43 REX.XB
47 REX.RXB
49 REX.WB
4f REX.WRXB
53 push rbx
59 pop rcx
61 -
65 GS
67 addr32 override
6b imul r, r/m, imm8
6d -
71 jno
7f jg
83 add/or/adc/sbb/and/sub/xor/cmp r/m, imm8
89 mov r/m, r
8b mov r, r/m
95 xchg eax, ebp 49 95 xchg r13,rax
97 xchg eax, edi 49 97 xchg r15,rax
9d popf
a3 movabs off, rAX
a7 cmps
ad lods eax, [rsi]
b3 mov bl,imm8
b5 mov ch,imm8
bf mov edi, imm32
c1 rol/ror/rcl/rcr/shl/shr/sal/sar r/m, imm8
c5 -
c7 mov r/m, imm32
d3 rol/ror/rcl/rcr/shl/shr/sal/sar r/m, CL
df
e3 jecxz
e5 -
e9 jmp
ef -
f1 -
fb -
First of all, we see useful operations like add eax, imm32
, and eax, imm32
, xor eax, imm32
, etc. Using and
we can zero the EAX
in two ops: and eax, 0x02020202; and eax, 0x05050505
. After that there are several ways to get any number in EAX
. I proceeded with opcode xor
because it occured that you can take 3 prime bytes, xor them and get any byte in range [0-255]
.
Okay, we can get any number in EAX
, but what registers can we move that number to? We have no mov r, r
opcode but we have xchg eax, ebp
and xchg eax, edi
.
Note: we also have
mov edi, imm32
opcode but it limits us to prime-bytes values ofEDI
.
Now we can set EAX
, EBP
and EDI
to arbitrary values. What if we already can do something like mov r/m, r
? We have this opcode in our list, 89
. After a bit of poking it occures that mov dword ptr [rdi], eax
assembles to 89 07
that are prime bytes. Looks like we have our desired write-what-where primitive.
The scheme is:
- set addr in
EAX
- move
EAX
toEDI
- set value in
EAX
- move
EAX
to[RDI]
Note: we have a lot of “set number in
EAX
” operations, each one takes 2 ANDs to zero theEAX
and then 3 XORs to make a number. To get rid of ANDs we trackEAX
value through the process of building the shellcode, then we can XOREAX
with any number, so we XOR it withN ^ EAX
to getN
inEAX
.
From that point we simply take some shellcode, for example, shellcraft.sh()
from pwntools
, write it DWORD by DWORD to our memory after the first stage, and pass the control. Since we do not have a NOP, we can use some other harmless one-byte instruction to fill the rest of memory like xchg edi, eax
(97
). Victory!
How NOT to create the shellcode
Here I’ll explain the more complicated solution I came up with on the CTF.
After finding a way to set up arbitrary values in EAX
, EDI
and EBP
I just started to search randomly for other useful unstructions to proceed with and that’s how I found the mov dword ptr [rdi], ebx
. Somehow I didn’t try to change EBX
to EAX
but proceeded with EBX
version. But it gives us a problem: we don’t have the control over the EBX
register… or do we?
We have mov dword ptr [rdi], ebx
, and we have mov r/m, r
and mov r, r/m
instructions in general, that means, we have mov ebx, dword ptr [rdi]
. We also have add r, r/m
. These two instructions give us the following scheme: search for different numbers in the binary itself (since we know base address), place one in the EBX
, then add other numbers until we get the needed one. For these purposes I take first 0xA70
bytes from binary. There we can find numbers like that: ['0x0', '0x1', '0x2', '0x3', '0x4', '0x5', ..., '0xe28', '0x1b90', '0x441f', '0x8be8', '0xb807', '0xb81b', '0xc308', '0xc3f3', '0xfffc', '0x10001', '0x10102', '0x1be00', '0x20000', ..., '0xf66c35d', '0xf66f4ff', '0x1000be00', '0x10070190', '0x100e4100', '0x100e4200', '0x100e4218', '0x10615567', '0x14ff41ff', '0x19e8c789', '0x1f0fc35d', '0x2008f315', '0x2009ce05', '0x20297525', ..., '0xffffffb0', '0xffffffc0', '0xffffffd0', '0xffffffe0']
. There are numbers almost similar to powers of 2, we need only ~log(N) additions to produce number N with them.
The other problem is how to efficiently find the optimal set of numbers whose sum gives the needed one. I believe that this problem somehow relates to subset sum problem and so is NP-complete. I use a greedy approach that takes maximum number that does not exceeds our goal, then keeps adding the largest possible numbers until we get the needed one. One step of algorithm is to take a number from priority queue, try to add each number from the binary, store sums that are less then the goal in priority queue. It seems that this algorithm gives a suboptimal result in constant time (constant since we have the same set of numbers and I make constant number of steps, 1000). For example, for the number 0xCAFEBABE
we get the following set of numbers to sum up: (3351726080, 50087367, 3670080, 131074, 65794, 7056, 3624, 504, 3)
.
To recap, we use the following algorithm:
- Set up address of zero in
EDI
- Move
[RDI]
toEBX
- Set up address of the addition term in
EDI
- Add
[RDI]
toEBX
- Repeat steps 3 and 4 if necessary
- Profit
The write-what-where then looks like this:
- Set up value in
EBX
- Set up address in
EDI
- Move
EBX
to[RDI]
Now we can use this primitive as described in previous part “Creating shellcode”. The source code can be found here.
That’s all!