Deep Dive on the PE Format
Introduction
PE (Portable Executable) files are executables that follow the PE format defined by Microsoft. On Windows, every executable has to follow this format, including standalone .exe files, .dll libraries, kernel-mode drivers (.sys)…
Here’s an overview of the different components of a PE file.
Note: I’m trying to use C built-in types where possible to represent the data types (with the sizes they have on Windows compilers, where unsigned long is 32 bits wide). You can convert them to Windows types with the following table:
built-in | Windows |
---|---|
unsigned char | BYTE |
unsigned short | WORD |
unsigned long | DWORD |
unsigned long long | ULONGLONG |
DOS Header
The DOS Header is mostly there for backwards compatibility. It no longer plays an important role in modern Windows PE files, but it still has to be present because its e_lfanew field gives the file offset of the NT headers.
It’s a 64-byte structure defined as follows:
//0x40 bytes (sizeof)
typedef struct _IMAGE_DOS_HEADER {
unsigned short e_magic; // 00: Magic Number ("MZ")
unsigned short e_cblp; // 02: Bytes on last page of file
unsigned short e_cp; // 04: Pages in file
unsigned short e_crlc; // 06: Relocations
unsigned short e_cparhdr; // 08: Size of header in paragraphs
unsigned short e_minalloc; // 0a: Minimum extra paragraphs needed
unsigned short e_maxalloc; // 0c: Maximum extra paragraphs needed
unsigned short e_ss; // 0e: Initial (relative) SS value
unsigned short e_sp; // 10: Initial SP value
unsigned short e_csum; // 12: Checksum
unsigned short e_ip; // 14: Initial IP value
unsigned short e_cs; // 16: Initial (relative) CS value
unsigned short e_lfarlc; // 18: File address of relocation table
unsigned short e_ovno; // 1a: Overlay number
unsigned short e_res[4]; // 1c: Reserved words
unsigned short e_oemid; // 24: OEM identifier (for e_oeminfo)
unsigned short e_oeminfo; // 26: OEM information; e_oemid specific
unsigned short e_res2[10]; // 28: Reserved words
unsigned long e_lfanew; // 3c: Offset to NT headers
} IMAGE_DOS_HEADER;
Note: “image” simply refers to the binary file.
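To make the role of e_lfanew concrete, here is a minimal sketch that reads it from a file already loaded in memory. The helper names (read_u16_le, read_u32_le, pe_read_lfanew) are hypothetical, and the code only assumes the offsets listed in the struct above (0x00 for e_magic, 0x3c for e_lfanew).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reads a little-endian 16-bit value from a byte buffer. */
static uint16_t read_u16_le(const uint8_t *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Reads a little-endian 32-bit value from a byte buffer. */
static uint32_t read_u32_le(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Hypothetical helper: returns 0 and stores e_lfanew in *out when the buffer
 * starts with a plausible DOS header, -1 otherwise.
 */
static int pe_read_lfanew(const uint8_t *file, size_t size, uint32_t *out) {
    assert(file != NULL && out != NULL);
    if (size < 0x40) return -1;                        /* the header is 0x40 bytes */
    if (read_u16_le(file) != 0x5A4D) return -1;        /* "MZ" read as little-endian */
    uint32_t lfanew = read_u32_le(file + 0x3C);
    if (lfanew > size || size - lfanew < 4) return -1; /* the NT signature must fit */
    *out = lfanew;
    return 0;
}
```

Reading the fields byte by byte instead of casting the buffer to the struct avoids alignment and endianness surprises when the parser runs on another platform.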
DOS Stub
The DOS Stub is a small DOS-compatible program that prints This program cannot be run in DOS mode and exits without trying to run the PE file. Its sole purpose is to display this error message when someone tries to run a PE file in a DOS environment.
Note: The DOS Stub is sometimes referred to as e_program and stored alongside the _IMAGE_DOS_HEADER.
The DOS header & Stub in Ghidra
NT Headers
The NT header is defined like this:
//0x108 bytes (sizeof)
struct _IMAGE_NT_HEADERS64 {
unsigned long Signature; // 0x0
IMAGE_FILE_HEADER FileHeader; // 0x4
IMAGE_OPTIONAL_HEADER64 OptionalHeader; // 0x18
};
NT Headers -> Signature
The Signature is always located at base + _IMAGE_DOS_HEADER.e_lfanew. It is a 4-byte value, similar to a magic number.
For a PE file its value is 0x00004550 (PE\0\0). Other values at this location identify older, non-PE executable formats (NE for 16-bit Windows executables, for instance), but I’ve yet to see examples of them.
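As a small illustration, the check below looks for the PE signature at a given e_lfanew. The function name pe_has_nt_signature is hypothetical, and the sketch assumes the whole file is already in memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 when "PE\0\0" sits at offset e_lfanew, 0 otherwise. */
static int pe_has_nt_signature(const uint8_t *file, size_t size, uint32_t e_lfanew) {
    /* 0x00004550 stored in little-endian order */
    static const uint8_t sig[4] = { 'P', 'E', 0, 0 };
    assert(file != NULL);
    if (e_lfanew > size || size - e_lfanew < sizeof sig) return 0;
    return memcmp(file + e_lfanew, sig, sizeof sig) == 0;
}
```

Comparing bytes with memcmp keeps the check independent of the host's endianness.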
NT Headers -> FileHeader
//0x14 bytes (sizeof)
struct _IMAGE_FILE_HEADER {
unsigned short Machine; // 0x0
unsigned short NumberOfSections; // 0x2
unsigned long TimeDateStamp; // 0x4
unsigned long PointerToSymbolTable; // 0x8
unsigned long NumberOfSymbols; // 0xc
unsigned short SizeOfOptionalHeader; // 0x10
unsigned short Characteristics; // 0x12
};
Attribute | Description |
---|---|
Machine | Specifies the target machine type (0x14C = Intel 386 / x86, 0x8664 = x64, 0xAA64 = ARM64…) |
NumberOfSections | The number of sections in the binary, which is also the number of entries in the Section Table |
TimeDateStamp | Timestamp of the file’s creation, in seconds since 1970-01-01 00:00 UTC |
PointerToSymbolTable | The file offset of the COFF symbol table (0 if the binary is stripped) |
NumberOfSymbols | The number of entries in the symbol table (0 if the binary is stripped) |
SizeOfOptionalHeader | The size of the OptionalHeader struct (0 for object files since they don’t have an OptionalHeader) |
Characteristics | Flags indicating special attributes of the binary |
A few Characteristics flags are important, like:
- 0x0100: IMAGE_FILE_32BIT_MACHINE, the machine is based on a 32-bit-word architecture
- 0x0200: IMAGE_FILE_DEBUG_STRIPPED, debugging information has been removed from the image file
- 0x1000: IMAGE_FILE_SYSTEM, the image is a system file, not a user program
- 0x2000: IMAGE_FILE_DLL, the file is not a plain executable but a DLL
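The flags are plain bit masks, so testing them is a bitwise AND. The sketch below covers only the four flags listed above; the function name describe_file_characteristics is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define IMAGE_FILE_32BIT_MACHINE  0x0100
#define IMAGE_FILE_DEBUG_STRIPPED 0x0200
#define IMAGE_FILE_SYSTEM         0x1000
#define IMAGE_FILE_DLL            0x2000

/*
 * Hypothetical helper: writes the name of each known flag that is set,
 * one per line, and returns how many were found.
 */
static int describe_file_characteristics(uint16_t characteristics, FILE *out) {
    static const struct { uint16_t mask; const char *name; } flags[] = {
        { IMAGE_FILE_32BIT_MACHINE,  "IMAGE_FILE_32BIT_MACHINE"  },
        { IMAGE_FILE_DEBUG_STRIPPED, "IMAGE_FILE_DEBUG_STRIPPED" },
        { IMAGE_FILE_SYSTEM,         "IMAGE_FILE_SYSTEM"         },
        { IMAGE_FILE_DLL,            "IMAGE_FILE_DLL"            },
    };
    int count = 0;
    assert(out != NULL);
    for (size_t i = 0; i < sizeof flags / sizeof flags[0]; i++) {
        if (characteristics & flags[i].mask) {
            fprintf(out, "%s\n", flags[i].name);
            count++;
        }
    }
    return count;
}
```

Bits that are not in the table (0x0002, IMAGE_FILE_EXECUTABLE_IMAGE, for example) are simply ignored by this sketch.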
NT Headers -> OptionalHeader
Every executable has this header. It’s called “optional” because some files (object files) do not have it; for executables, it is required.
The struct does not have a fixed size across executables, hence the presence of SizeOfOptionalHeader in the FileHeader struct.
Here’s the definition of the OptionalHeader struct:
struct _IMAGE_OPTIONAL_HEADER64 {
unsigned short Magic; // 0x0
unsigned char MajorLinkerVersion; // 0x2
unsigned char MinorLinkerVersion; // 0x3
unsigned long SizeOfCode; // 0x4
unsigned long SizeOfInitializedData; // 0x8
unsigned long SizeOfUninitializedData; // 0xc
unsigned long AddressOfEntryPoint; // 0x10
unsigned long BaseOfCode; // 0x14
unsigned __int64 ImageBase; // 0x18
unsigned long SectionAlignment; // 0x20
unsigned long FileAlignment; // 0x24
unsigned short MajorOperatingSystemVersion; // 0x28
unsigned short MinorOperatingSystemVersion; // 0x2a
unsigned short MajorImageVersion; // 0x2c
unsigned short MinorImageVersion; // 0x2e
unsigned short MajorSubsystemVersion; // 0x30
unsigned short MinorSubsystemVersion; // 0x32
unsigned long Win32VersionValue; // 0x34
unsigned long SizeOfImage; // 0x38
unsigned long SizeOfHeaders; // 0x3c
unsigned long CheckSum; // 0x40
unsigned short Subsystem; // 0x44
unsigned short DllCharacteristics; // 0x46
unsigned __int64 SizeOfStackReserve; // 0x48
unsigned __int64 SizeOfStackCommit; // 0x50
unsigned __int64 SizeOfHeapReserve; // 0x58
unsigned __int64 SizeOfHeapCommit; // 0x60
unsigned long LoaderFlags; // 0x68
unsigned long NumberOfRvaAndSizes; // 0x6c
IMAGE_DATA_DIRECTORY DataDirectory[xx]; // 0x70
};
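Note: xx is IMAGE_NUMBEROF_DIRECTORY_ENTRIES, which the Windows headers define as 16. The number of entries actually present in a given file is stored in NumberOfRvaAndSizes and can be lower, which is why this struct does not have a fixed size (hence the need for _IMAGE_FILE_HEADER.SizeOfOptionalHeader).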
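The relation between the struct size and the number of directory entries can be written down directly. This is a sketch under the assumption that the fixed part of the 64-bit header ends at 0x70 (the offset of DataDirectory above) and that each entry is 8 bytes; the names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum {
    OPTIONAL_HEADER64_FIXED_SIZE = 0x70, /* offset of DataDirectory in the struct above */
    DATA_DIRECTORY_ENTRY_SIZE    = 8     /* two 32-bit fields: VirtualAddress and Size */
};

/* Hypothetical helper: expected SizeOfOptionalHeader for a 64-bit image. */
static uint32_t optional_header64_size(uint32_t number_of_rva_and_sizes) {
    /* assumption of this sketch: at most 16 entries, as discussed above */
    assert(number_of_rva_and_sizes <= 16);
    return OPTIONAL_HEADER64_FIXED_SIZE +
           number_of_rva_and_sizes * DATA_DIRECTORY_ENTRY_SIZE;
}
```

With 16 entries this gives 0xF0, and 4 (Signature) + 0x14 (FileHeader) + 0xF0 is the 0x108 shown as the size of _IMAGE_NT_HEADERS64.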
Attribute | Description |
---|---|
Magic | Determines whether the executable is 64-bit or 32-bit (0x20B = 64, 0x10B = 32). This field, rather than _IMAGE_FILE_HEADER.Machine, decides the layout of the OptionalHeader |
MajorLinkerVersion | self-explanatory |
MinorLinkerVersion | self-explanatory |
SizeOfCode | The size of the .text section, or the sum of all code sections if there are multiple |
SizeOfInitializedData | The size of the initialized data section (.data), or the sum if there are multiple |
SizeOfUninitializedData | The size of the uninitialized data section (.bss), or the sum if there are multiple |
AddressOfEntryPoint | The address (RVA) of the entry function of the executable. It is optional for DLLs and must be equal to 0 if not specified |
BaseOfCode | The address (RVA) of the start of the code section (.text) |
ImageBase | The preferred virtual address at which to map the executable. The traditional defaults are 0x00400000 for standalone executables and 0x10000000 for DLLs. In practice, this field is rarely honored since ASLR usually picks another base |
SectionAlignment | The alignment of sections once loaded in memory. Defaults to the page size of the target architecture and must be greater than or equal to FileAlignment |
FileAlignment | Same as SectionAlignment, but this time to align section data on disk. Default is 512, must be a power of 2 between 512 and 64K. If SectionAlignment is less than the architecture’s page size, then FileAlignment must match SectionAlignment |
MajorOperatingSystemVersion | self-explanatory |
MinorOperatingSystemVersion | self-explanatory |
MajorImageVersion | self-explanatory |
MinorImageVersion | self-explanatory |
MajorSubsystemVersion | self-explanatory |
MinorSubsystemVersion | self-explanatory |
Win32VersionValue | Reserved, must be 0 |
SizeOfImage | The complete size of the image as loaded in memory, including all headers. Must be a multiple of SectionAlignment |
SizeOfHeaders | The combined size of the MS-DOS stub, PE header, and section headers, rounded up to a multiple of FileAlignment |
CheckSum | A basic form of integrity check. It is validated at load time for all drivers, DLLs loaded at boot time and DLLs loaded into critical Windows processes |
Subsystem | Determines which subsystem the image needs in order to run (see the official documentation for the list of values) |
DllCharacteristics | Defines characteristics and behavior of the image (see the official documentation for the list of values) |
SizeOfStackReserve | The size of the stack to reserve, i.e. its maximum size |
SizeOfStackCommit | The size of the stack to commit initially |
SizeOfHeapReserve | The size of the heap to reserve, i.e. its maximum size |
SizeOfHeapCommit | The size of the heap to commit initially |
LoaderFlags | Reserved, must be 0 |
NumberOfRvaAndSizes | The number of entries in DataDirectory |
DataDirectory | Array of IMAGE_DATA_DIRECTORY |
Note: DllCharacteristics is notably where we can retrieve information about the security mitigations in place for the image, like NX, SEH… Also, despite being named DllCharacteristics, even non-DLL images have it.
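Here is a short sketch of how those security bits could be decoded. The flag values are the IMAGE_DLLCHARACTERISTICS_* constants from the Windows headers; the struct pe_mitigations and the function pe_decode_mitigations are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define IMAGE_DLLCHARACTERISTICS_HIGH_ENTROPY_VA 0x0020
#define IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE    0x0040
#define IMAGE_DLLCHARACTERISTICS_NX_COMPAT       0x0100
#define IMAGE_DLLCHARACTERISTICS_NO_SEH          0x0400
#define IMAGE_DLLCHARACTERISTICS_GUARD_CF        0x4000

/* Hypothetical summary of the mitigation-related bits. */
struct pe_mitigations {
    bool aslr;              /* image can be relocated at load time */
    bool high_entropy_aslr; /* image can handle a high-entropy 64-bit address space */
    bool nx;                /* image is compatible with non-executable data pages */
    bool no_seh;            /* image does not use structured exception handling */
    bool cfg;               /* image supports Control Flow Guard */
};

static void pe_decode_mitigations(uint16_t dll_characteristics,
                                  struct pe_mitigations *out) {
    assert(out != NULL);
    out->aslr = (dll_characteristics & IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE) != 0;
    out->high_entropy_aslr =
        (dll_characteristics & IMAGE_DLLCHARACTERISTICS_HIGH_ENTROPY_VA) != 0;
    out->nx = (dll_characteristics & IMAGE_DLLCHARACTERISTICS_NX_COMPAT) != 0;
    out->no_seh = (dll_characteristics & IMAGE_DLLCHARACTERISTICS_NO_SEH) != 0;
    out->cfg = (dll_characteristics & IMAGE_DLLCHARACTERISTICS_GUARD_CF) != 0;
}
```

For instance, a value of 0x8160 decodes to ASLR, high-entropy ASLR and NX enabled, with the NO_SEH and Control Flow Guard bits cleared.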
Data Directories
As said, DataDirectory holds at most 16 entries. Here is the corresponding type of each entry:
Index | Value |
---|---|
0 | Export Directory |
1 | Import Directory |
2 | Resource Directory |
3 | Exception Directory |
4 | Certificate Directory |
5 | Base Relocation Directory |
6 | Debug Data |
7 | Architecture (reserved) |
8 | Global PTR |
9 | TLS Table |
10 | Load Config Table |
11 | Bound Import |
12 | IAT |
13 | Delay Import Descriptor |
14 | CLR Runtime Header |
15 | (reserved) |
Each entry is an IMAGE_DATA_DIRECTORY structure:
struct _IMAGE_DATA_DIRECTORY {
unsigned long VirtualAddress;
unsigned long Size;
};
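The sketch below fetches one entry of the array from the raw bytes of a 64-bit OptionalHeader, using the offsets shown earlier (0x6c for NumberOfRvaAndSizes, 0x70 for DataDirectory). The names data_directory and pe64_get_data_directory are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct data_directory {
    uint32_t virtual_address;
    uint32_t size;
};

/* Reads a little-endian 32-bit value from a byte buffer. */
static uint32_t read_u32_le(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Hypothetical helper. opt points at the first byte of the 64-bit
 * OptionalHeader and opt_size is SizeOfOptionalHeader.
 * Returns 0 on success, -1 when the entry does not exist.
 */
static int pe64_get_data_directory(const uint8_t *opt, size_t opt_size,
                                   uint32_t index, struct data_directory *out) {
    assert(opt != NULL && out != NULL);
    if (opt_size < 0x70) return -1;            /* fixed part is truncated */
    uint32_t count = read_u32_le(opt + 0x6C);  /* NumberOfRvaAndSizes */
    if (index >= count) return -1;
    size_t entry = 0x70 + (size_t)index * 8;
    if (entry + 8 > opt_size) return -1;       /* entry lies outside the header */
    out->virtual_address = read_u32_le(opt + entry);
    out->size = read_u32_le(opt + entry + 4);
    return 0;
}
```

Checking the index against NumberOfRvaAndSizes rather than against 16 matters for files that declare fewer entries.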
Data Directories ⇾ Export Directory
The export directory, usually stored in the .edata section (though linkers often merge it into .rdata), contains all the information needed about the symbols that a DLL (or an executable) exports.
The Export Directory plays a crucial role when a DLL is loaded: the loader uses the export table to retrieve details about the functions and variables exported by that DLL.
Note: an ordinal is just a fancy way to talk about an index into the Export Address Table (offset by Base), and RVAs are offsets from the base address of the loaded image.
typedef struct _IMAGE_EXPORT_DIRECTORY {
unsigned long Characteristics;
unsigned long TimeDateStamp;
unsigned short MajorVersion;
unsigned short MinorVersion;
unsigned long Name; // Pointer (RVA) to DLL name
unsigned long Base; // Starting ordinal number
unsigned long NumberOfFunctions; // Number of entries in the Export Address Table
unsigned long NumberOfNames; // Number of entries in the Export Name Table (and in the Ordinal Table)
unsigned long AddressOfFunctions; // Pointer (RVA) to Export Address Table
unsigned long AddressOfNames; // Pointer (RVA) to Export Name Table
unsigned long AddressOfNameOrdinals; // Pointer (RVA) to the Ordinal Table
} IMAGE_EXPORT_DIRECTORY;
To understand better how exporting works, let’s take a higher level example:
ExportTable = [0x200, 0x100, 0x500];
ExportNameTable = ["func_A", "func_B", "func_C"];
ExportNameOrdinalTable = [1, 0, 2];
/*
"func_A" is ordinal 1 and is stored @ Dll_base + ExportTable[1] = Dll_base + 0x100
"func_B" is ordinal 0 and is stored @ Dll_base + ExportTable[0] = Dll_base + 0x200
"func_C" is ordinal 2 and is stored @ Dll_base + ExportTable[2] = Dll_base + 0x500
*/
You can see an example function that gets an exported function by name in this ReactOS file.
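The three-table lookup from the example above can also be sketched in C. This works on a toy, already-parsed view of the tables rather than on the on-disk layout, and the names export_tables and find_export_rva are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy, already-parsed view of the three export tables (not the on-disk layout). */
struct export_tables {
    const uint32_t *functions;     /* Export Address Table: one RVA per function */
    size_t number_of_functions;
    const char *const *names;      /* Export Name Table */
    const uint16_t *name_ordinals; /* Ordinal Table, parallel to names */
    size_t number_of_names;
};

/* Returns the RVA of the export called name, or 0 when it is not found. */
static uint32_t find_export_rva(const struct export_tables *t, const char *name) {
    assert(t != NULL && name != NULL);
    for (size_t i = 0; i < t->number_of_names; i++) {
        if (strcmp(t->names[i], name) != 0) continue;
        uint16_t index = t->name_ordinals[i];          /* index into functions */
        if (index >= t->number_of_functions) return 0; /* malformed table */
        return t->functions[index];
    }
    return 0;
}
```

The name table is sorted in real files, so a loader can use a binary search; the linear scan here is only meant to keep the sketch short.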
Data Directories ⇾ Import Directory
The import directory stores an array of _IMAGE_IMPORT_DESCRIPTOR, each entry representing a library that the image imports from. The array ends with an entry filled with zeros.
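struct _IMAGE_IMPORT_DESCRIPTOR {
union {
unsigned long Characteristics; // 0x0: 0 for the terminating descriptor
unsigned long OriginalFirstThunk; // 0x0: RVA of the list of imported names
};
unsigned long TimeDateStamp; // 0x4
unsigned long ForwarderChain; // 0x8
unsigned long Name; // 0xc: RVA of the library name
unsigned long FirstThunk; // 0x10: RVA of the list of imported addresses
};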
The 3 important attributes of this structure are:
- Name: the address (RVA) of the library’s name, ex: KERNEL32.dll
- OriginalFirstThunk: the address (RVA) of the list of the names of all the functions imported from the library.
- FirstThunk: the address (RVA) of the list of the addresses of all those functions, filled in by the loader. It is stored in the same order as the OriginalFirstThunk list.
In order to retrieve the address of an imported function, you need to loop through all the libraries (IMAGE_IMPORT_DESCRIPTOR) and, for each one, loop through the OriginalFirstThunk list until you find the corresponding name. The address is then the entry at the same index in the FirstThunk list.
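The double loop described above might look like the following sketch. It works on a toy, already-parsed view (names and addresses as plain C arrays) and ignores imports by ordinal; the struct imported_library and the function find_import_address are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy, already-parsed view of one import descriptor (not the on-disk layout). */
struct imported_library {
    const char *name;                /* string behind Name */
    const char *const *lookup_names; /* list behind OriginalFirstThunk, NULL-terminated */
    const uint64_t *addresses;       /* list behind FirstThunk, same order */
};

/* Returns the resolved address of function, or 0 when no library imports it. */
static uint64_t find_import_address(const struct imported_library *libs,
                                    size_t lib_count, const char *function) {
    assert(libs != NULL && function != NULL);
    for (size_t i = 0; i < lib_count; i++) {
        for (size_t j = 0; libs[i].lookup_names[j] != NULL; j++) {
            if (strcmp(libs[i].lookup_names[j], function) == 0)
                return libs[i].addresses[j]; /* same index in both lists */
        }
    }
    return 0;
}
```

The important part is the shared index: the position found in the name list is reused as-is in the address list.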
Section Table
The number of entries in this table is specified by the _IMAGE_FILE_HEADER.NumberOfSections attribute. Each entry in this table is a “section header” with the associated struct:
//0x28 bytes (sizeof)
struct _IMAGE_SECTION_HEADER
{
unsigned char Name[8]; //0x0
union
{
unsigned long PhysicalAddress; //0x8
unsigned long VirtualSize; //0x8
} Misc; //0x8
unsigned long VirtualAddress; //0xc
unsigned long SizeOfRawData; //0x10
unsigned long PointerToRawData; //0x14
unsigned long PointerToRelocations; //0x18
unsigned long PointerToLinenumbers; //0x1c
unsigned short NumberOfRelocations; //0x20
unsigned short NumberOfLinenumbers; //0x22
unsigned long Characteristics; //0x24
};
Attribute | Description |
---|---|
Name | Name of the section (.text, .code…) |
PhysicalAddress / VirtualSize | The total size of the section when loaded into memory |
VirtualAddress | The address of the first byte of the section, relative to the image base address |
SizeOfRawData | The size of the section’s data on disk, rounded up to a multiple of FileAlignment |
PointerToRawData | The file offset of the first page of the section on disk |
PointerToRelocations | The file offset of the beginning of the relocation entries for the section. This is set to zero for executable images or if there are no relocations |
PointerToLinenumbers | The file offset of the beginning of the line-number entries for the section. This should be zero because it is tied to COFF debugging information, which is considered deprecated |
NumberOfRelocations | The number of relocation entries for the section, set to zero for executable images |
NumberOfLinenumbers | The number of line-number entries, which should also be zero because COFF line numbers are deprecated |
Characteristics | Characteristics of the section, like RWX permissions (see the official documentation for the list of values) |
After the Section Table come the contents of the sections themselves, each one located on disk through its PointerToRawData.
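Since most addresses in the headers are RVAs, a common use of the section table is turning an RVA into a file offset. Here is a minimal sketch working on an already-parsed subset of each section header; the names section_info and rva_to_file_offset are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Subset of _IMAGE_SECTION_HEADER needed for the conversion. */
struct section_info {
    uint32_t virtual_address;     /* VirtualAddress */
    uint32_t size_of_raw_data;    /* SizeOfRawData */
    uint32_t pointer_to_raw_data; /* PointerToRawData */
};

/*
 * Converts an RVA into a file offset.
 * Returns 0 on success, -1 when no section has file bytes for this RVA.
 */
static int rva_to_file_offset(const struct section_info *sections, size_t count,
                              uint32_t rva, uint32_t *offset) {
    assert(sections != NULL && offset != NULL);
    for (size_t i = 0; i < count; i++) {
        const struct section_info *s = &sections[i];
        if (rva < s->virtual_address) continue;
        uint32_t delta = rva - s->virtual_address;
        if (delta >= s->size_of_raw_data) continue; /* past the bytes stored on disk */
        *offset = s->pointer_to_raw_data + delta;
        return 0;
    }
    return -1;
}
```

An RVA that falls in the zero-filled tail of a section (beyond SizeOfRawData) has no counterpart on disk, which is why the function can fail even for a valid in-memory address.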
Conclusion
We should now have a better understanding of the PE format and its internals. For example, we now know that the entry point of the image is stored in _IMAGE_OPTIONAL_HEADER64.AddressOfEntryPoint, at offset e_lfanew + 0x28 in the file (0xa8 when e_lfanew is 0x80)… As you can see from my conclusion, this was far from being the most interesting stuff out there, but I think writing this little doc-like paper gives me a better understanding than simply reading the official doc, and that will be useful for future projects.