I was working on reverse engineering a format for a side project (which I may post about later), and I figured I’d write about the basic process here. I was talking to some coworkers about my side project, and I got blank stares once I got to this part.

To demonstrate the technique, I’m going to use a nice, known format that I won’t get in trouble for picking apart. Applying this to a closed format is an exercise left for the reader 😉
So let’s take a look at an h.264 encoded MP4.

These are both widely documented, and if you’re familiar with either of these, you’re unlikely to find anything new here. The point is to demonstrate my method of reverse engineering a format.

Let’s start out by assuming we don’t have a specification or any documentation on the format: it’s a black box.

I worked with three video files.
Video A – A short video
Video B – The first half of video A
Video C – A long, unrelated video

I picked these to have something similar to a known plaintext attack. Here we have two videos with the same initial content and different lengths, and another video. This gives us some basic separation between what is affected by file length and what is affected by file content.

OK, now we need to get these files open in some way where we can see the binary representation. I’ve been using Notepad++ with the Hex Editor plugin, but you can use whatever you want.

Typically (but not always), the first few bytes of a file are essentially a signature for the file type, and they may also contain an internal type. This is where having two similar files is helpful. Here is the beginning of one of my files:

00 00 00 18 66 74 79 70 6d 70 34 32 00 00 00 00 ....ftypmp42....
6d 70 34 32 6d 70 34 31 00 00 12 02 6d 6f 6f 76 mp42mp41....moov
00 00 00 6c 6d 76 68 64 00 00 00 00 cb f3 0d b9 ...lmvhd....Ëó.¹
cb f3 0d b9 00 01 5f 90 00 09 37 2f 00 01 00 00 Ëó.¹.._...7/....

The left section is the hexadecimal representation, and the right is the same bytes rendered as text. Anything displaying as a dot doesn’t have a printable representation.
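If you don’t have a hex editor handy, a dump like the one above is easy to produce yourself. This is only a rough sketch: the function name `hexdump` is my own, and unlike my editor it shows every byte outside plain printable ASCII as a dot.

```python
def hexdump(data: bytes, start: int = 0, width: int = 16) -> str:
    """Render bytes as 'offset  hex pairs  text', one row per `width` bytes."""
    rows = []
    for i in range(0, len(data), width):
        chunk = data[i:i + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Printable ASCII is 0x20..0x7e; everything else is shown as a dot.
        text = "".join(chr(b) if 0x20 <= b <= 0x7e else "." for b in chunk)
        rows.append(f"{start + i:08x}  {hex_part:<{width * 3 - 1}}  {text}")
    return "\n".join(rows)


# First 16 bytes of video A, copied from the dump above.
header = bytes.fromhex("00000018667479706d70343200000000")
print(hexdump(header))
# 00000000  00 00 00 18 66 74 79 70 6d 70 34 32 00 00 00 00  ....ftypmp42....
```

To dump a real file, read it with `open(path, "rb").read()` and pass a slice of the result.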

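From even this, it looks like this is a big-endian 32-bit format. What that means is that data is going to be stored in groups of four bytes (four of the two-digit pairs your editor shows, or eight hex digits if your editor doesn’t use pairs). The text is one hint that it’s big-endian, since it’s stored as “mp42” and not “24pm”. The first int is a stronger hint: 00 00 00 18 read big-endian is 0x18, a small and sensible number, while read little-endian it would be 0x18000000.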
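A quick way to sanity-check a byte-order guess is to decode the same four bytes both ways and see which number looks believable. Here is a small sketch using Python’s standard `struct` module on the first int of the file:

```python
import struct

first_int = bytes.fromhex("00000018")  # first four bytes of video A

big = struct.unpack(">I", first_int)[0]     # ">" means big-endian
little = struct.unpack("<I", first_int)[0]  # "<" means little-endian

print(big)     # 24
print(little)  # 402653184
```

A value of 24 is plausible for a small header field; 402653184 is far larger than my short video file, which makes little-endian the less likely reading.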

You can also see that the string “mp42” is there twice, and “mp41” is there once. This looks like a signature (or header) to me.

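Here are the first 64 bytes of video B. Four ints differ from video A: the one at offset 18 (00 00 0c 96), the two at offsets 2c and 30 (both cb f3 0d df), and the one at offset 38 (00 05 63 98).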

00 00 00 18 66 74 79 70 6d 70 34 32 00 00 00 00 
6d 70 34 32 6d 70 34 31 00 00 0c 96 6d 6f 6f 76
00 00 00 6c 6d 76 68 64 00 00 00 00 cb f3 0d df 
cb f3 0d df 00 01 5f 90 00 05 63 98 00 01 00 00
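Spotting differences by eye gets tedious quickly, so here is a sketch of how the comparison could be automated. The bytes are copied from the two dumps above; the helper name `differing_ints` is just something I made up for this example.

```python
video_a = bytes.fromhex(
    "00000018667479706d70343200000000"
    "6d7034326d703431000012026d6f6f76"
    "0000006c6d76686400000000cbf30db9"
    "cbf30db900015f900009372f00010000"
)
video_b = bytes.fromhex(
    "00000018667479706d70343200000000"
    "6d7034326d70343100000c966d6f6f76"
    "0000006c6d76686400000000cbf30ddf"
    "cbf30ddf00015f900005639800010000"
)


def differing_ints(a: bytes, b: bytes, word: int = 4):
    """Return (offset, hex of a, hex of b) for every `word`-byte group that differs."""
    diffs = []
    for off in range(0, min(len(a), len(b)), word):
        wa, wb = a[off:off + word], b[off:off + word]
        if wa != wb:
            diffs.append((off, wa.hex(), wb.hex()))
    return diffs


for off, wa, wb in differing_ints(video_a, video_b):
    print(f"{off:#04x}: {wa} vs {wb}")
# 0x18: 00001202 vs 00000c96
# 0x2c: cbf30db9 vs cbf30ddf
# 0x30: cbf30db9 vs cbf30ddf
# 0x38: 0009372f vs 00056398
```

Everything before offset 0x18 is identical in both files, which is what suggests the signature ends there.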

Now we know where the signature ends. Remember, the data is stored in 32-bit pieces (let’s call them integers, or ints), so for video A the first data gets broken down like this:

00 00 00 18  ....
66 74 79 70  ftyp
6d 70 34 32  mp42
00 00 00 00  ....
6d 70 34 32  mp42
6d 70 34 31  mp41
00 00 12 02  ....

Based on this, we can assume that the last int is _not_ a part of the signature, since it is the first value that differs between videos A and B. Assuming we know nothing else about the format, we can assume this signature will tell a decoder or player what format they can expect, and perhaps a fallback – the mp41.

Now onto the next section. Once you start to see any kind of variable data, you’ll want to start by trying to determine the way the data is laid out. Basically, we want to figure out how the computer answers the question of “What is this?” for any particular piece of data.

I’ve found that most (not all) formats have their data stored in a more generic data structure. That way over time they can add things without necessarily breaking older readers/players. Because of that, the first thing I look for to determine the general structure of a file is pointers, offsets, and lengths.

  • Pointer – An address referring to a specific place in a file
  • Offset – Similar to a pointer, but relative to the current location in the file
  • Length – The size of a block of data; adding it to the block’s starting location gives the location where the block ends
My rule of thumb is this:
  • If it’s relatively small, it’s likely a length
  • If not, check that address in the file and see if it looks significant. If it does, it’s probably a pointer
  • If it doesn’t, add the current offset and check there. If that looks significant, it’s probably an offset
  • If it doesn’t, this is likely some other variable, unrelated to the structure of the file
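The rule of thumb can be sketched as code, with some loud caveats. “Looks significant” is a judgment call; here I stand in for it with a made-up test (`looks_significant`) that checks whether four printable characters follow the location. The `small` threshold of 0x100 is equally arbitrary, and the sample buffer is synthetic, not taken from my video files.

```python
import struct


def looks_significant(data, pos):
    """Hypothetical test: are there 4 printable ASCII bytes at pos+4 (a type name)?"""
    tag = data[pos + 4:pos + 8]
    return len(tag) == 4 and all(0x20 <= b <= 0x7e for b in tag)


def classify(data, field_pos, small=0x100):
    """Apply the rule of thumb to the big-endian int stored at field_pos."""
    value = struct.unpack_from(">I", data, field_pos)[0]
    if value < small:
        return "length"
    if looks_significant(data, value):
        return "pointer"
    if looks_significant(data, field_pos + value):
        # Relative to where we are; a large length behaves the same way.
        return "offset or length"
    return "other"


# Synthetic buffer with two [length][type] headers and two stray ints.
buf = bytearray(0x300)
struct.pack_into(">I4s", buf, 0x10, 0x120, b"moov")  # next header is at 0x10 + 0x120
struct.pack_into(">I4s", buf, 0x130, 0x40, b"uuid")
struct.pack_into(">I", buf, 0x20, 0x130)             # absolute address of the uuid header
struct.pack_into(">I", buf, 0x30, 0x250)             # points at nothing interesting

print(classify(buf, 0x10))   # offset or length
print(classify(buf, 0x130))  # length
print(classify(buf, 0x20))   # pointer
print(classify(buf, 0x30))   # other
```

In practice I do this by eye, but writing it down makes the order of the checks explicit.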
In our example above, the next two ints are:
00 00 12 02   ....
6d 6f 6f 76   moov

Since we’re at the beginning of a file, I’ll check the location 1202. (If this were little-endian, it would be location 02 12 00 00.) Here’s what I have around the location 1202 in my file:

000011e0  00 00 00 3a 75 64 74 61 00 00 00 17 a9 54 49 4d  ...:udta....©TIM
000011f0  00 0b 00 00 30 30 3a 30 30 3a 30 30 3a 30 30 00  ....00:00:00:00.
00001200  00 00 0e a9 54 53 43 00 02 00 00 33 30 00 00 00  ...©TSC....30...
00001210  0d a9 54 53 5a 00 01 00 00 31 00 00 1e 13 75 75  .©TSZ....1....uu
00001220  69 64 be 7a cf cb 97 a9 42 e8 9c 71 99 94 91 e3  id¾zÏË-©Bèœq™"`a

It looks like we’re in the middle of a section, and the location is even in the middle of an int, so it seems unlikely this is a pointer. Maybe an offset or a length then? The int itself sits at location 18, so let’s check what’s at 00001202 + 18 = 0000121a:

00001210  0d a9 54 53 5a 00 01 00 00 31 00 00 1e 13 75 75  .©TSZ....1....uu
00001220  69 64 be 7a cf cb 97 a9 42 e8 9c 71 99 94 91 e3  id¾zÏË-©Bèœq™"`a

That seems to make more sense. So 00 00 12 02 is probably a length. Let’s look at the next int: 6d 6f 6f 76 ascii: moov. This has to be a type, it’s a string and the idea of a type called “moov” in a video makes sense to me.
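The same check can be done in a few lines, as a sketch. The bytes are the two rows of the dump above (which start at file offset 0x1210); the variable names are mine.

```python
import struct

# The two rows shown above, which start at file offset 0x1210.
row_base = 0x1210
rows = bytes.fromhex(
    "0da954535a000100003100001e137575"
    "6964be7acfcb97a942e89c71999491e3"
)

field_pos = 0x18  # where the int 00 00 12 02 sits in the file
value = 0x1202    # that int, read big-endian
target = field_pos + value

# Read a 4-byte big-endian int followed by 4 raw bytes at the target.
length, kind = struct.unpack_from(">I4s", rows, target - row_base)
print(hex(target), hex(length), kind)  # 0x121a 0x1e13 b'uuid'
```

Landing exactly on a small-ish number followed by four readable characters is what makes the guess feel right.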

Woo! We have a section header. The 12 02 bytes starting at the length int itself make up the “moov” section; the length evidently counts the 8-byte header too, since 18 + 1202 lands exactly on the next header. Now, if we go to 00 00 12 1a, we can see that this is another [length][type] pair, the next type being “uuid”. Just to see if the theory holds up, let’s check out 0000121a + 00001e13 = 0000302d:

00003010  20 20 20 20 20 20 20 20 20 0a 3c 3f 78 70 61 63            .<?xpac
00003020  6b 65 74 20 65 6e 64 3d 22 77 22 3f 3e 00 00 00   ket end="w"?>...
00003030  01 6d 64 61 74 00 00 00 00 00 09 f6 85 00 00 03   .mdat......ö…...

This looks similar. The only difference is that 00 00 00 01 cannot be a length, since a section can’t be shorter than its own 8-byte header. Scrolling down a bit shows a wildly different looking type of data, so I think it’s safe to assume “mdat” means something like “movie data”, meaning what follows is our movie data. Since mp4s can be streamed, it seems reasonable to guess that this is our h264 encoded video and that nothing follows it.
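Hopping from header to header like this is easy to automate. Below is a minimal sketch, not a real MP4 parser: `box` and `walk` are names I invented, and the sample bytes are built in memory to mimic the layout seen so far. The sample treats the signature as one more [length][type] section, since its first int, 0x18, happens to be exactly where the “moov” header starts.

```python
import struct


def box(kind: bytes, payload: bytes = b"") -> bytes:
    """Build a [length][type][payload] section; the length counts the 8-byte header."""
    return struct.pack(">I", 8 + len(payload)) + kind + payload


def walk(data: bytes, start: int = 0, end=None):
    """Follow [length][type] headers from start, returning (offset, length, type)."""
    end = len(data) if end is None else end
    found, pos = [], start
    while pos + 8 <= end:
        length, kind = struct.unpack_from(">I4s", data, pos)
        found.append((pos, length, kind))
        if length < 8 or pos + length > end:
            break  # not a plain length (like the 00 00 00 01 above); inspect by hand
        pos += length
    return found


sample = (
    box(b"ftyp", b"mp42" + bytes(4) + b"mp42mp41")
    + box(b"moov", bytes(16))
    + struct.pack(">I", 1) + b"mdat" + bytes(20)
)
for offset, length, kind in walk(sample):
    print(f"{offset:#x} {length:#x} {kind.decode()}")
# 0x0 0x18 ftyp
# 0x18 0x18 moov
# 0x30 0x1 mdat
```

Stopping at the first implausible length, rather than guessing, keeps the walker honest about what it doesn’t understand.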

So right now, from what we can see, the structure based on video A looks like this:

|---------------------------|
| Signature (outer wrapper) |
| |---------------|         |
| | moov section  |         |
| |---------------|         |
| |---------------|         |
| | uuid section  |         |
| |---------------|         |
| |---------------|         |
| | mdat section  |         |
| | |------------||         |
| | | video data ||         |
| | |------------||         |
| |---------------|         |
|---------------------------|

A quick check of the other two files shows the same superstructure.

Hmm, OK, so let’s dig deeper into those moov and uuid sections and see what we can come up with.

moov section

The next 8 bytes seem easy enough to figure out:

00 00 00 6c 6d 76 68 64  ...lmvhd

OK, we have a section called “mvhd” of length 6c. Simple enough, let’s go forward 6c:

00 00 06 c0 74 72 61 6b  ...Àtrak

A section called “trak” of length 6c0. Continuing on 6c0, we find:

00 00 05 28 74 72 61 6b  ...(trak

That’s interesting, another “trak” section. That must mean that not all of these are IDs, but some (or all) of them might just be descriptors that don’t need to be unique. Let’s keep going another 528:

00 00 00 3a 75 64 74 61  ...:udta

A “udta” section. Another 3a bytes later, we’re back here:

00 00 1e 13 75 75 69 64  ....uuid

So that’s our “moov” section. Looking at the other two files shows the exact same format. So, from what we can guess so far, our format looks like this:

|---------------------|
| moov section header |
| |--------------|    |
| | mvhd section |    |
| |--------------|    |
| |--------------|    |
| | trak section |    |
| |--------------|    |
| |--------------|    |
| | trak section |    |
| |--------------|    |
| |--------------|    |
| | udta section |    |
| |--------------|    |
|---------------------|

We need to go deeper. Let’s go all the way down each subsection of the moov section.

At a glance, it doesn’t look like the mvhd section contains any more structural information, so let’s move on for now.

The first trak section is more interesting. The first 8 bytes in the trak section look like this:

00 00 00 5c 74 6b 68 64  ...\tkhd

Simple enough. Again, the tkhd section doesn’t appear to have any structural information in it. So let’s move another 5c bytes to this:

00 00 00 24 65 64 74 73  ...$edts

The next 8 bytes look like another subsection header, so let’s go down another level:

00 00 00 1c 65 6c 73 74  ....elst

This appears to be the only section in edts, since 1c + 8 = 24. Moving on, we have this:
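That “1c + 8 = 24” check generalizes: a section probably contains subsections if the bytes after its 8-byte header split cleanly into [length][type] pieces that end exactly where the parent ends. Here is a sketch of that test. The function name `children` is mine, and the sample bytes are built by hand to match the edts and elst lengths above, with zeros standing in for the real contents.

```python
import struct


def children(data: bytes, parent_pos: int):
    """Return (list of (type, length), fits), where fits means the
    child sections fill the parent exactly."""
    parent_len = struct.unpack_from(">I", data, parent_pos)[0]
    end = parent_pos + parent_len
    pos = parent_pos + 8  # skip the parent's own [length][type] header
    kids = []
    while pos + 8 <= end:
        length, kind = struct.unpack_from(">I4s", data, pos)
        if length < 8 or pos + length > end:
            return kids, False
        kids.append((kind, length))
        pos += length
    return kids, pos == end


# edts (0x24 bytes) holding a single elst (0x1c bytes); payload zeroed out.
edts = struct.pack(">I4s", 0x24, b"edts") + struct.pack(">I4s", 0x1c, b"elst") + bytes(0x14)
print(children(edts, 0))  # ([(b'elst', 28)], True)

# A section whose contents are plain data fails the test.
leaf = struct.pack(">I4s", 0x14, b"vmhd") + bytes(0xc)
print(children(leaf, 0))  # ([], False)
```

A section full of ordinary data could pass this test by coincidence, so I treat a True here as a strong hint rather than proof.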

00 00 09 5c 6d 64 69 61  ...\mdia

Again, this clearly has some subsections. The next 8 bytes are:

00 00 00 20 6d 64 68 64  ... mdhd

mdhd doesn’t look like it has subsections. 20 bytes on, we have:

00 00 00 44 68 64 6c 72  ...Dhdlr

This hdlr section actually has some human readable text! It does not, however, have any subsections. 44 bytes later:

00 00 00 14 76 6d 68 64  ....vmhd

No subsections. The next section is:

00 00 00 24 64 69 6e 66  ...$dinf

This one has a subsection:

00 00 00 1c 64 72 65 66  ....dref

Looks like this one has a subsection too:

00 00 00 0c 75 72 6c 20  ....url

Kind of a strange name, url[space], but it fits our expectation of a section header. By now you might be losing track of where we are. Currently we’re still in moov->trak->mdia, inside the first trak section.

At this point, this post is getting far too long and repetitive, so I’ll just summarize the rest of this moov section. Using the same method we’ve been using so far, we find the moov section has the following format:

|----------------------------------|
| moov header                      |
| |---------------|                |
| | mvhd section  |                |
| |---------------|                |
| |------------------------------| |
| | trak section                 | |
| | |------------------|         | |
| | | tkhd section     |         | |
| | |------------------|         | |
| | |------------------|         | |
| | | edts section     |         | |
| | | |--------------| |         | |
| | | | elst section | |         | |
| | | |--------------| |         | |
| | |------------------|         | |
| | |--------------------------| | |
| | | mdia section             | | |
| | | |--------------|         | | |
| | | | mdhd section |         | | |
| | | |--------------|         | | |
| | | |--------------|         | | |
| | | | hdlr section |         | | |
| | | |--------------|         | | |
| | | |--------------|         | | |
| | | | vmhd section |         | | |
| | | |--------------|         | | |
| | | |----------------------| | | |
| | | | dinf section         | | | |
| | | | |------------------| | | | |
| | | | | dref section     | | | | |
| | | | | |--------------| | | | | |
| | | | | | url section  | | | | | |
| | | | | |--------------| | | | | |
| | | | |------------------| | | | |
| | | |----------------------| | | |
| | | |----------------------| | | |
| | | | stbl section         | | | |
| | | | |------------------| | | | |
| | | | | stsd section     | | | | |
| | | | | |--------------| | | | | |
| | | | | | avc1 section | | | | | |
| | | | | |--------------| | | | | |
| | | | |------------------| | | | |
| | | | |------------------| | | | |
| | | | | stts section     | | | | |
| | | | |------------------| | | | |
| | | | |------------------| | | | |
| | | | | ctts section     | | | | |
| | | | |------------------| | | | |
| | | | |------------------| | | | |
| | | | | stss section     | | | | |
| | | | |------------------| | | | |
| | | | |------------------| | | | |
| | | | | sdtp section     | | | | |
| | | | |------------------| | | | |
| | | | |------------------| | | | |
| | | | | stsc section     | | | | |
| | | | |------------------| | | | |
| | | | |------------------| | | | |
| | | | | stsz section     | | | | |
| | | | |------------------| | | | |
| | | | |------------------| | | | |
| | | | | stco section     | | | | |
| | | | |------------------| | | | |
| | | |----------------------| | | |
| | |--------------------------| | |
| |------------------------------| |
| |------------------------------| |
| | trak section                 | |
| | |------------------|         | |
| | | tkhd section     |         | |
| | |------------------|         | |
| | |------------------|         | |
| | | edts section     |         | |
| | | |--------------| |         | |
| | | | elst section | |         | |
| | | |--------------| |         | |
| | |------------------|         | |
| | |--------------------------| | |
| | | mdia section             | | |
| | | |--------------|         | | |
| | | | mdhd section |         | | |
| | | |--------------|         | | |
| | | |------------------|     | | |
| | | | hdlr section     |     | | |
| | | | |--------------| |     | | |
| | | | | soun section | |     | | |
| | | | |--------------| |     | | |
| | | |------------------|     | | |
| | | |----------------------| | | |
| | | | minf section         | | | |
| | | | |------------------| | | | |
| | | | | smhd             | | | | |
| | | | |------------------| | | | |
| | | | |------------------| | | | |
| | | | | dinf             | | | | |
| | | | | |--------------| | | | | |
| | | | | | dref         | | | | | |
| | | | | | |----------| | | | | | |
| | | | | | | url      | | | | | | |
| | | | | | |----------| | | | | | |
| | | | |------------------| | | | |
| | | | |------------------| | | | |
| | | | | stbl             | | | | |
| | | | | |--------------| | | | | |
| | | | | | stsd         | | | | | |
| | | | | | |----------| | | | | | |
| | | | | | | mp4a     | | | | | | |
| | | | | | |----------| | | | | | |
| | | | | |--------------| | | | | |
| | | | | |--------------| | | | | |
| | | | | | stts         | | | | | |
| | | | | |--------------| | | | | |
| | | | | |--------------| | | | | |
| | | | | | stsc         | | | | | |
| | | | | |--------------| | | | | |
| | | | | |--------------| | | | | |
| | | | | | stsz         | | | | | |
| | | | | |--------------| | | | | |
| | | | | |--------------| | | | | |
| | | | | | stco         | | | | | |
| | | | | |--------------| | | | | |
| | | | |------------------| | | | |
| | | |----------------------| | | |
| | |--------------------------| | |
| |------------------------------| |
| |----------|                     |
| | udta     |                     |
| |----------|                     |
|----------------------------------|

Ok, that took a while. It’s fairly clear to me that some of these could change based on the specifics of what audio and video codecs you’ve used to encode.
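Doing all of that by hand is a good way to learn the layout, but the descent itself can be sketched as a short recursive function. Treat this as an illustration under assumptions: the names `box`, `parse` and `render` are mine, the sample is a tiny hand-built tree rather than my real file, and a section only counts as a container if its contents split cleanly into subsections with printable type names. Any section that stores extra fields between its header and its subsections would show up as a leaf here, so a real file still needs manual checking.

```python
import struct


def box(kind: bytes, payload: bytes = b"") -> bytes:
    """Build a [length][type][payload] section for the sample data."""
    return struct.pack(">I", 8 + len(payload)) + kind + payload


def parse(data: bytes, start: int, end: int):
    """Return [(type, length, children)] if start..end splits cleanly, else None."""
    nodes, pos = [], start
    while pos < end:
        if pos + 8 > end:
            return None
        length, kind = struct.unpack_from(">I4s", data, pos)
        printable = all(0x20 <= b <= 0x7e for b in kind)
        if length < 8 or pos + length > end or not printable:
            return None
        # If the contents do not parse as subsections, treat this as a leaf.
        kids = parse(data, pos + 8, pos + length) or []
        nodes.append((kind.decode("ascii"), length, kids))
        pos += length
    return nodes


def render(nodes, depth: int = 0):
    """Turn the parsed tree into indented lines."""
    lines = []
    for kind, length, kids in nodes:
        lines.append(f"{'  ' * depth}{kind} ({length:#x})")
        lines.extend(render(kids, depth + 1))
    return lines


sample = box(b"moov",
             box(b"mvhd", bytes(4))
             + box(b"trak",
                   box(b"tkhd", bytes(4))
                   + box(b"edts", box(b"elst", bytes(4)))))
print("\n".join(render(parse(sample, 0, len(sample)))))
# moov (0x3c)
#   mvhd (0xc)
#   trak (0x28)
#     tkhd (0xc)
#     edts (0x14)
#       elst (0xc)
```

An indented listing like this carries the same information as the box drawing, and it is much easier to compare across files.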

The uuid and mdat sections do not contain any subsections. That is, the h264 blob inside of our mdat is still a black box, but we’ll get to that later. For now, I think this post has explained the basics of the process in reverse engineering a file format. Not all of them will be this simple, but most of them are easy enough to figure out.

Here are a couple of other basic ways I’ve seen data structured in a file:

  • An XML file, with none of this fancy pointer stuff. These are the easiest to figure out, since they are self-describing
  • A pointer to a directory which has a set of fixed-size entries containing the headers instead of a forward-read style of header
    • See the DOOM .wad format, which is widely documented
    • This is better when you need random access to some part of the file, instead of needing to read the file serially, like a video
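To make the directory style concrete, here is a sketch modeled on the WAD layout as it is commonly documented: a 12-byte header (a 4-character tag, an entry count, and a pointer to the directory), followed somewhere later by 16-byte directory entries (position, size, 8-character name), all little-endian. The helper names `build_wad` and `read_directory` are mine, and the sample file is synthetic.

```python
import struct


def build_wad(lumps: dict) -> bytes:
    """Pack {name: bytes} into a minimal WAD-style blob (illustrative only)."""
    body, entries = b"", []
    for name, payload in lumps.items():
        # Data begins right after the 12-byte header.
        entries.append((12 + len(body), len(payload), name.encode("ascii")))
        body += payload
    directory = b"".join(struct.pack("<ii8s", pos, size, name)
                         for pos, size, name in entries)
    header = struct.pack("<4sii", b"PWAD", len(entries), 12 + len(body))
    return header + body + directory


def read_directory(data: bytes) -> dict:
    """Follow the header's pointer to the directory and slice out each entry."""
    _tag, count, dir_pos = struct.unpack_from("<4sii", data, 0)
    result = {}
    for i in range(count):
        pos, size, name = struct.unpack_from("<ii8s", data, dir_pos + 16 * i)
        result[name.rstrip(b"\0").decode("ascii")] = data[pos:pos + size]
    return result


wad = build_wad({"MAP01": b"abc", "THINGS": b"12345"})
print(read_directory(wad))  # {'MAP01': b'abc', 'THINGS': b'12345'}
```

Notice that the reader jumps straight to the directory and then to each entry, without touching the data in between. That is the random access the forward-read style doesn’t give you.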

I’m planning on writing up the h264 part of the file as well at some point. But I think we’re done for now. Because mp4 isn’t really a black box to me, I know that there can be more sections and subsections than the ones we found here. The point wasn’t to show anything about the mp4 format specifically, but to show how easy it can be to piece together the structure of a file with nothing more than the file.

Have fun tearing apart files!
