refrain was a misc task at 0CTF/TCTF Quals 2019 where you had to reconstruct input data from a program trace.

A single file was given:

$ ls -alh perf.data
-rw-r--r-- 1 wgh wgh 17M Mar 23 09:19 perf.data

Also, the description said “Ubunu16.04+latex” without explanation what it really means and why it matters.

First look

The file name perf.data suggested that it’s some performance data captured by perf tool.

$ /usr/lib/linux-tools/4.8.0-54-generic/perf script --header -f
========
# captured on: Sun Mar 24 10:01:15 2019
# hostname : nuc
# os release : 4.10.0-42-generic
# perf version : 4.10.17
# arch : x86_64
# nrcpus online : 4
# nrcpus avail : 4
# cpudesc : Intel(R) Core(TM) i7-7567U CPU @ 3.50GHz
# cpuid : GenuineIntel,6,142,9
# total memory : 32835380 kB
# cmdline : /usr/lib/linux-hwe-tools-4.10.0-42/perf record -e intel_pt// convert -font Courier text:- image.png
# event : name = intel_pt//, , id = { 13, 14, 15, 16 }, type = 7, size = 112, config = 0x300c600, { sample_period, sample_freq } = 1, sample_type = IP|TID|TIME|CPU|IDENTIFIER, read_format = ID, disabled = 1, inherit = 1, enable_on_exec = 1
, sample_id_all = 1, exclude_guest = 1
# event : name = dummy:u, , id = { 17, 18, 19, 20 }, type = 1, size = 112, config = 0x9, { sample_period, sample_freq } = 1, sample_type = IP|TID|TIME|CPU|IDENTIFIER, read_format = ID, inherit = 1, exclude_kernel = 1, exclude_hv = 1, sampl
e_id_all = 1, context_switch = 1
# event : name = dummy:u, , id = { 21, 22, 23, 24 }, type = 1, size = 112, config = 0x9, { sample_period, sample_freq } = 1, sample_type = IP|TID|TIME|CPU|IDENTIFIER, read_format = ID, disabled = 1, inherit = 1, exclude_kernel = 1, exclude
_hv = 1, mmap = 1, comm = 1, enable_on_exec = 1, task = 1, sample_id_all = 1, mmap2 = 1, comm_exec = 1
# HEADER_CPU_TOPOLOGY info available, use -I to display
# HEADER_NUMA_TOPOLOGY info available, use -I to display
# pmu mappings: cpu = 4, intel_pt = 7, intel_bts = 6, breakpoint = 5, software = 1, tracepoint = 2, msr = 8
# contains AUX area data (e.g. instruction trace)
# HEADER_CACHE info available, use -I to display

There’re two important pieces of information:

  • What kind of performance events are captured, namely, so called intel_pt.
  • The command line of program being profiled: convert -font Courier text:- image.png.

Intel PT (Process Trace) is an interesting processor extension that allows one to capture performance events such as control flow branches in highly compressed format. Basically, entire trace of program execution, from dynamic loader up to process exit is recorded in a single 17MB file. That’s quite astounding. You can read more information about it in Linux kernel documentation.

The convert program is part of ImageMagick suite, and the commandline instructs it to render a string supplied in standard input in Courier font and save it as png file. Although we couldn’t be 100% sure, the input had to be the flag.

Reading the trace

You can easily get the trace in text format by running the perf script command. The trace will only include branches (conditional and unconditional jumps, calls and returns). Notably, it doesn’t include register values and memory contents. Mere 17MB file expands into 33 millions lines of text, weighing 8GB.

Installing debug symbols helps, as perf is able to read them and display function symbols. In the end, ImageMagick and libfreetype debug symbols were the most helpful.

For example, here’s the part of the trace that corresponds to ImageMagick loop reading input char by char (source):

7fd5dea220cb ReadBlobString+0xeb (/usr/lib/x86_64-linux-gnu/libMagickCore-6.Q16.so.2.0.0) =>     7fd5dea209f0 ReadBlob (/usr/lib/x86_64-linux-gnu/libMagickCore-6.Q16.so.2.0.0)
7fd5dea20a5e ReadBlob+0x6e (/usr/lib/x86_64-linux-gnu/libMagickCore-6.Q16.so.2.0.0) =>     7fd5dea20a88 ReadBlob (/usr/lib/x86_64-linux-gnu/libMagickCore-6.Q16.so.2.0.0)
7fd5dea20a8f ReadBlob+0x9f (/usr/lib/x86_64-linux-gnu/libMagickCore-6.Q16.so.2.0.0) =>     7fd5dea20aab ReadBlob (/usr/lib/x86_64-linux-gnu/libMagickCore-6.Q16.so.2.0.0)
7fd5dea20ad5 ReadBlob+0xe5 (/usr/lib/x86_64-linux-gnu/libMagickCore-6.Q16.so.2.0.0) =>     7fd5dea0bfd0 fileno@plt (/usr/lib/x86_64-linux-gnu/libMagickCore-6.Q16.so.2.0.0)
7fd5dea0bfd0 fileno@plt+0x0 (/usr/lib/x86_64-linux-gnu/libMagickCore-6.Q16.so.2.0.0) =>     7fd5de13fad0 __GI___fileno (/lib/x86_64-linux-gnu/libc-2.23.so)
7fd5de13fadf __GI___fileno+0xf (/lib/x86_64-linux-gnu/libc-2.23.so) =>     7fd5dea20ada ReadBlob (/usr/lib/x86_64-linux-gnu/libMagickCore-6.Q16.so.2.0.0)
7fd5dea20ae2 ReadBlob+0xf2 (/usr/lib/x86_64-linux-gnu/libMagickCore-6.Q16.so.2.0.0) =>     7fd5dea0c4a0 read@plt (/usr/lib/x86_64-linux-gnu/libMagickCore-6.Q16.so.2.0.0)
7fd5dea0c4a0 read@plt+0x0 (/usr/lib/x86_64-linux-gnu/libMagickCore-6.Q16.so.2.0.0) =>     7fd5de4a44f0 __GI___libc_read (/lib/x86_64-linux-gnu/libpthread-2.23.so)
7fd5de4a44fe __read_nocancel+0x5 (/lib/x86_64-linux-gnu/libpthread-2.23.so) => ffffffffa9ad6690 [unknown] ([kernel.kallsyms])
           0 [unknown] ([unknown]) => ffffffffa9ad6690 [unknown] ([kernel.kallsyms])
           0 [unknown] ([unknown]) => ffffffffa9ad6690 [unknown] ([kernel.kallsyms])
           0 [unknown] ([unknown]) => ffffffffa9ad6690 [unknown] ([kernel.kallsyms])
           0 [unknown] ([unknown]) => ffffffffa9ad6690 [unknown] ([kernel.kallsyms])
           0 [unknown] ([unknown]) =>     7fd5de4a4500 __read_nocancel (/lib/x86_64-linux-gnu/libpthread-2.23.so)
7fd5de4a4508 __read_nocancel+0xf (/lib/x86_64-linux-gnu/libpthread-2.23.so) =>     7fd5dea20ae7 ReadBlob (/usr/lib/x86_64-linux-gnu/libMagickCore-6.Q16.so.2.0.0)
7fd5dea20aea ReadBlob+0xfa (/usr/lib/x86_64-linux-gnu/libMagickCore-6.Q16.so.2.0.0) =>     7fd5dea20a98 ReadBlob (/usr/lib/x86_64-linux-gnu/libMagickCore-6.Q16.so.2.0.0)
7fd5dea20a9e ReadBlob+0xae (/usr/lib/x86_64-linux-gnu/libMagickCore-6.Q16.so.2.0.0) =>     7fd5dea20c60 ReadBlob (/usr/lib/x86_64-linux-gnu/libMagickCore-6.Q16.so.2.0.0)
7fd5dea20c63 ReadBlob+0x273 (/usr/lib/x86_64-linux-gnu/libMagickCore-6.Q16.so.2.0.0) =>     7fd5dea20a72 ReadBlob (/usr/lib/x86_64-linux-gnu/libMagickCore-6.Q16.so.2.0.0)
7fd5dea20a80 ReadBlob+0x90 (/usr/lib/x86_64-linux-gnu/libMagickCore-6.Q16.so.2.0.0) =>     7fd5dea220d0 ReadBlobString (/usr/lib/x86_64-linux-gnu/libMagickCore-6.Q16.so.2.0.0)

The fact that the trace included so much data made things somewhat complicated. We just didn’t know where to look. Potential interesting places included:

  • ImageMagick reading data from standard input.
  • libfreetype rendering symbol as a bitmap.
  • ImageMagick rasterizing the glyph it got from libfreetype (src).
  • ImageMagick writing to a PNG image.

The trace fragment corresponding to the input wasn’t very useful. Indeed, the code didn’t check character values, so no branches depended on them, and the trace fragments were the same for every character, but we could infer the length of the flag by counting calls to ReadBlob:

$ zcat perf_script.txt.gz | grep -F '=>     7fd5dea209f0 ReadBlob' | wc -l
40

This is larger than actual length of the flag by 2 because of one extra call resulting in EOF, and one stray call further on. We captured traces of echo -n 'flag{AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA}' | perf record -e intel_pt// convert -font Courier text:- image.png with varying A count to confirm that our hypothesis was indeed correct.

flag_AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA-1

(the image is cropped, the full size is A4-ish)

Our next idea was that the trace contained some functions that take codepoint/glyph as an argument, and there was a 1-to-1 relationship between the codepoint and the trace of the function. So we could create artificially constructed fingerprints of all possible characters (by calling convert under perf ourselves with different inputs), and match them with the trace of the real flag.

The idea proved to be correct, but we got stuck for several hours. We tried to read the source code, match it to the trace, find some useful conditional branches, and so on.

After a while, we noticed these interesting trace fragments:

7fd5dea0d420 FT_Get_Char_Index@plt+0x0 (/usr/lib/x86_64-linux-gnu/libMagickCore-6.Q16.so.2.0.0) =>     7fd5dd382ea0 FT_Get_Char_Index (/usr/lib/x86_64-linux-gnu/libfreetype.so.6.12.1)
7fd5dd382ebc FT_Get_Char_Index+0x1c (/usr/lib/x86_64-linux-gnu/libfreetype.so.6.12.1) =>     7fd5dd3e8b30 t1_cmap_unicode_char_index (/usr/lib/x86_64-linux-gnu/libfreetype.so.6.12.1)
7fd5dd3e8b3e t1_cmap_unicode_char_index+0xe (/usr/lib/x86_64-linux-gnu/libfreetype.so.6.12.1) =>     7fd5dd3ecaf0 ps_unicodes_char_index (/usr/lib/x86_64-linux-gnu/libfreetype.so.6.12.1)
7fd5dd3ecb18 ps_unicodes_char_index+0x28 (/usr/lib/x86_64-linux-gnu/libfreetype.so.6.12.1) =>     7fd5dd3ecb46 ps_unicodes_char_index (/usr/lib/x86_64-linux-gnu/libfreetype.so.6.12.1)
7fd5dd3ecb4d ps_unicodes_char_index+0x5d (/usr/lib/x86_64-linux-gnu/libfreetype.so.6.12.1) =>     7fd5dd3ecb20 ps_unicodes_char_index (/usr/lib/x86_64-linux-gnu/libfreetype.so.6.12.1)
7fd5dd3ecb27 ps_unicodes_char_index+0x37 (/usr/lib/x86_64-linux-gnu/libfreetype.so.6.12.1) =>     7fd5dd3ecb57 ps_unicodes_char_index (/usr/lib/x86_64-linux-gnu/libfreetype.so.6.12.1)
7fd5dd3ecb5e ps_unicodes_char_index+0x6e (/usr/lib/x86_64-linux-gnu/libfreetype.so.6.12.1) =>     7fd5dd3ecb32 ps_unicodes_char_index (/usr/lib/x86_64-linux-gnu/libfreetype.so.6.12.1)
7fd5dd3ecb4d ps_unicodes_char_index+0x5d (/usr/lib/x86_64-linux-gnu/libfreetype.so.6.12.1) =>     7fd5dd3ecb20 ps_unicodes_char_index (/usr/lib/x86_64-linux-gnu/libfreetype.so.6.12.1)
7fd5dd3ecb27 ps_unicodes_char_index+0x37 (/usr/lib/x86_64-linux-gnu/libfreetype.so.6.12.1) =>     7fd5dd3ecb57 ps_unicodes_char_index (/usr/lib/x86_64-linux-gnu/libfreetype.so.6.12.1)
7fd5dd3ecb5e ps_unicodes_char_index+0x6e (/usr/lib/x86_64-linux-gnu/libfreetype.so.6.12.1) =>     7fd5dd3ecb32 ps_unicodes_char_index (/usr/lib/x86_64-linux-gnu/libfreetype.so.6.12.1)
7fd5dd3ecb4d ps_unicodes_char_index+0x5d (/usr/lib/x86_64-linux-gnu/libfreetype.so.6.12.1) =>     7fd5dd3ecb20 ps_unicodes_char_index (/usr/lib/x86_64-linux-gnu/libfreetype.so.6.12.1)
7fd5dd3ecb4d ps_unicodes_char_index+0x5d (/usr/lib/x86_64-linux-gnu/libfreetype.so.6.12.1) =>     7fd5dd3ecb20 ps_unicodes_char_index (/usr/lib/x86_64-linux-gnu/libfreetype.so.6.12.1)
7fd5dd3ecb27 ps_unicodes_char_index+0x37 (/usr/lib/x86_64-linux-gnu/libfreetype.so.6.12.1) =>     7fd5dd3ecb57 ps_unicodes_char_index (/usr/lib/x86_64-linux-gnu/libfreetype.so.6.12.1)
7fd5dd3ecb5e ps_unicodes_char_index+0x6e (/usr/lib/x86_64-linux-gnu/libfreetype.so.6.12.1) =>     7fd5dd3ecb32 ps_unicodes_char_index (/usr/lib/x86_64-linux-gnu/libfreetype.so.6.12.1)
7fd5dd3ecb4d ps_unicodes_char_index+0x5d (/usr/lib/x86_64-linux-gnu/libfreetype.so.6.12.1) =>     7fd5dd3ecb20 ps_unicodes_char_index (/usr/lib/x86_64-linux-gnu/libfreetype.so.6.12.1)
7fd5dd3ecb27 ps_unicodes_char_index+0x37 (/usr/lib/x86_64-linux-gnu/libfreetype.so.6.12.1) =>     7fd5dd3ecb57 ps_unicodes_char_index (/usr/lib/x86_64-linux-gnu/libfreetype.so.6.12.1)
7fd5dd3ecb5e ps_unicodes_char_index+0x6e (/usr/lib/x86_64-linux-gnu/libfreetype.so.6.12.1) =>     7fd5dd3ecb32 ps_unicodes_char_index (/usr/lib/x86_64-linux-gnu/libfreetype.so.6.12.1)
7fd5dd3ecb4d ps_unicodes_char_index+0x5d (/usr/lib/x86_64-linux-gnu/libfreetype.so.6.12.1) =>     7fd5dd3ecb20 ps_unicodes_char_index (/usr/lib/x86_64-linux-gnu/libfreetype.so.6.12.1)
7fd5dd3ecb44 ps_unicodes_char_index+0x54 (/usr/lib/x86_64-linux-gnu/libfreetype.so.6.12.1) =>     7fd5dd3ecb70 ps_unicodes_char_index (/usr/lib/x86_64-linux-gnu/libfreetype.so.6.12.1)
7fd5dd3ecb77 ps_unicodes_char_index+0x87 (/usr/lib/x86_64-linux-gnu/libfreetype.so.6.12.1) =>     7fd5dd382ebf FT_Get_Char_Index (/usr/lib/x86_64-linux-gnu/libfreetype.so.6.12.1)
7fd5dd382ec2 FT_Get_Char_Index+0x22 (/usr/lib/x86_64-linux-gnu/libfreetype.so.6.12.1) =>     7fd5dd382ec6 FT_Get_Char_Index (/usr/lib/x86_64-linux-gnu/libfreetype.so.6.12.1)
7fd5dd382ec7 FT_Get_Char_Index+0x27 (/usr/lib/x86_64-linux-gnu/libfreetype.so.6.12.1) =>     7fd5dea1882e RenderFreetype (/usr/lib/x86_64-linux-gnu/libMagickCore-6.Q16.so.2.0.0)

The function ps_unicodes_char_index performs a binary search in the character table, and it takes code point as an argument. Looks like what we need.

Trace of a binary search isn’t terribly complicated, so, in theory, it might have been possible to infer values without constructing fingerprints of all possible arguments. However, the fingerprint approach is arguably easier, and we had already written a script extracting fingerprints of function calls, so we decided to follow the fingerprint way.

As a fingerprint, we used just a hash of execution trace in text format. We obviously needed to de-ASLR the addresses to make them reproducible.

A bit of retrospective why we took so long to get there. One mistake was that we didn’t install libfreetype symbols as soon as possible, and concentrated efforts on ImageMagick code, skimming over symbol-less libfreetype code. We didn’t notice at first that the trace somehow included intructions from libgomp executed by other processors, which sometimes were interleaved with instructions being fingerprinted. At some point, we were accidentally analyzing the trace of the real flag instead of flag{AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA}, and this was probably why we didn’t notice the patterns in ps_unicodes_char_index function calls and continued looking elsewhere.

Reconstructing the flag

We decided to build a dictionary by running convert under perf, giving it different letters, and collecting fingerprints to use them for matching later. As we were still unsure whether we were picking trace fragments right, we took somewhat strange approach - we generated the string abbcccddddeeeee... - a string with 1 a, 2 bs etc., each next symbol repeating one more time than previous one. Character set included english letters (both upper and lower case), numbers and some punctuation with total 94 characters which means that last character (tilde) repeated 94 times.

import string

alph = string.letters+string.digits+string.punctuation

res = ''
for i, c in enumerate(alph):
    res += c * (i + 1)
print(res)

We than fed this string to convert:

# python gens.py | perf record -e intel_pt// convert -font Courier text:- image.png

And then we ran perf script on resulting file perf.data

# /usr/lib/linux-tools/4.8.0-54-generic/perf  script | gzip > /mnt/manyletters.txt.gz

This led to rather large trace - manyletters.txt.gz took 1.5G in gzipped form, without compression it would be a lot larger. We stored most traces produced by perf script in gzipped form due to their large size.

We then ran fingerprint extraction on that trace:

# zcat /mnt/manyletters.txt.gz |python3 extract_trace.py | sort | uniq -c | sort -h
... a lot of warnings from perf skipped here ...
      1 28 65988bed4fd669483c4f7deea3f4d73a23e190fd828f1dc09bffe33da05bdd75
      1 29 d9c0739d7a51376d25373a42d2f86f4a809bf578f772fd853eac2c530980e259
      1 61 f85967910083f6d050cebeea7990c344edca856e7238d1428c09da14eef7c7dc
      1 7fab6f49da68
      1 =>     7fab6f49da6d
      1 libMagickCore base: 0x00007fab6f45b000
      1 libfreetype6_base base: 0x00007fab6ddf5000
      4 24 ce8ad6caa1ab6c244852df9f1c362eaf048a627d65c9a36e05b667e882c79c63
      8 25 8c0ed9a104fc92776592e7270298bdb8461d549017fbe55a215de8bbac53f29e
     12 19 6d0e75661cbeccedbbbaae7e00b2ee47895a29fec91047a4928dd8affd2a1828
     16 26 895b906331c514c8ca735be3e81279b42541ed4389d31e8e8c3d05648434a629
     20 27 382114e0f6b570b38b5791962e275036a3348368862158a7bc4be165a1a66965
     24 23 09313b4ec7dbfbff501b6b5de74ca7290ffbd051dcd221c8346f04bd9460b82a
     28 27 abb61fedfe606159064ca3ad37d66cb0cb4a2fa8b4660844ae3dab152347d71b
...
    368 25 1d64a9e980fb425b5c557a5066e720cc295c1702747def102af1c31e856ab34c
    372 22 05fd9986c890dc62828735add3c94f3555ceddf48724c95434e235a80892e6b5
    376 23 bd0939942ec1741048cea9cdd2f3e1d7333c98e0a2bc8156b202449b9ec69f42

Except for a couple of outliers, we got a pretty good picture here: for each n from 1 to 94 we have a fingerprint repeating n * 4 times. This provided a rather convincing proof of our assumptions and we were now sure we could build a dictionary by mapping a fingerprint to a character with index = number of repetitions divided by 4. We still don’t know why exactly the function was called 4 times for each character, not just 1. It was very unlikely that something other than our test characters generated exactly n * c fingerprints, though.

We tested our dictionary by generating a trace of convert rendering a known string, and checking that the string is properly reconstructed.

It should be noted that before getting a working dictionary we got one that worked with our own test strings but did not work with trace from the task. By that point we did not use “latex” part of task description in any way. We had a hypothesis that installation of latex could change character rendering in some way, so we installed latex. We installed texlive-full to be sure we don’t miss something which turned out to occupy ~3.5G disk space. After installing it we noticed that rendered text indeed stated to look a bit different and our fingerprints also changed.

After that the dictionary worked correctly with perf.data from the challenge and we finally got a flag.

/usr/lib/linux-tools/4.8.0-54-generic/perf  script -f | gzip > orig.gz # as noted earlier, we stored outputs of perf script in gzip due to their size

zcat orig.gz | python3 extract_trace.py | cut -d ' ' -f 2 | python2 decodeinput.py 
*****flag{1df9e1d99ff7ea50bbe782492430b223}flag{1df9e1d99ff7ea50bbe782492430b223}flag{1df9e1d99ff7ea50bbe782492430b223}flag{1df9e1d99ff7ea50bbe782492430b223}

flag_AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

Files