Deep Dive on the PE Format

Introduction

PE (Portable Executable) files are executable files that follow the PE format specified by Microsoft. On Windows, every executable has to follow this format, including standalone programs (.exe), libraries (.dll), and kernel-mode drivers (.sys).

Here’s the overview of the different components of a PE file.

Note: I’m trying to use C built-in types when possible to represent the data types. You can convert to Windows types with the following table (sizes assume the Windows data model, where unsigned long is 32 bits wide):

built-in Windows
unsigned char BYTE
unsigned short WORD
unsigned long DWORD
unsigned long long ULONGLONG

DOS Header

The DOS header exists mostly for backwards compatibility. It plays no important role in modern Windows PE files, but it still has to be present because its e_lfanew field gives the file offset of the NT headers.

It’s a 64-byte structure defined as follows:

//0x40 bytes (sizeof)
typedef struct _IMAGE_DOS_HEADER {
    unsigned short  e_magic;      // 00: Magic Number ("MZ")
    unsigned short  e_cblp;       // 02: Bytes on last page of file
    unsigned short  e_cp;         // 04: Pages in file
    unsigned short  e_crlc;       // 06: Relocations
    unsigned short  e_cparhdr;    // 08: Size of header in paragraphs
    unsigned short  e_minalloc;   // 0a: Minimum extra paragraphs needed
    unsigned short  e_maxalloc;   // 0c: Maximum extra paragraphs needed
    unsigned short  e_ss;         // 0e: Initial (relative) SS value
    unsigned short  e_sp;         // 10: Initial SP value
    unsigned short  e_csum;       // 12: Checksum
    unsigned short  e_ip;         // 14: Initial IP value
    unsigned short  e_cs;         // 16: Initial (relative) CS value
    unsigned short  e_lfarlc;     // 18: File address of relocation table
    unsigned short  e_ovno;       // 1a: Overlay number
    unsigned short  e_res[4];     // 1c: Reserved words
    unsigned short  e_oemid;      // 24: OEM identifier (for e_oeminfo)
    unsigned short  e_oeminfo;    // 26: OEM information; e_oemid specific
    unsigned short  e_res2[10];   // 28: Reserved words
    long            e_lfanew;     // 3c: File offset of the NT headers (LONG in winnt.h)
} IMAGE_DOS_HEADER;

Note: “image” simply refers to the binary file.
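
To make the role of e_lfanew concrete, here is a minimal sketch in C that validates the DOS header of a file already loaded into memory and extracts e_lfanew. The helper names (read_u16, read_u32, dos_get_lfanew) are made up for this example. It uses the fixed-width types from <stdint.h> rather than unsigned short / unsigned long, so that it behaves the same on platforms where unsigned long is 64 bits wide.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a little-endian 16-bit value from a byte buffer. */
static uint16_t read_u16(const unsigned char *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Read a little-endian 32-bit value from a byte buffer. */
static uint32_t read_u32(const unsigned char *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: returns 0 and stores e_lfanew in *out on success,
 * -1 if the buffer cannot hold a DOS header or does not start with "MZ". */
int dos_get_lfanew(const unsigned char *buf, size_t len, uint32_t *out) {
    if (len < 0x40) return -1;               /* shorter than the DOS header */
    if (read_u16(buf) != 0x5A4D) return -1;  /* e_magic: 'M' 'Z' in little endian */
    *out = read_u32(buf + 0x3c);             /* e_lfanew sits at offset 0x3c */
    return 0;
}
```

A caller would typically read the whole file into a buffer first, then pass that buffer and its length to dos_get_lfanew before touching anything else.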

DOS Stub

The DOS stub is a small DOS-compatible program that prints "This program cannot be run in DOS mode." and exits without trying to run the PE file. Its sole purpose is to display this error message when someone tries to run a PE file in a DOS environment.

Note: The DOS stub is sometimes referred to as e_program and stored alongside the _IMAGE_DOS_HEADER.

The DOS header & Stub in Ghidra

NT Headers

For 64-bit images, the NT headers are defined like this:

//0x108 bytes (sizeof)
struct _IMAGE_NT_HEADERS64 {
    unsigned long Signature;                 // 0x0
    IMAGE_FILE_HEADER FileHeader;            // 0x4
    IMAGE_OPTIONAL_HEADER64 OptionalHeader;  // 0x18
};

NT Headers -> Signature

The Signature is always located at base + _IMAGE_DOS_HEADER.e_lfanew. It is a 4-byte value, similar to a magic number.

For a PE file, its value is 0x00004550 (PE\0\0). Other values show up for older executable formats that reuse the MZ header (for example NE for 16-bit Windows executables), but I’ve yet to see examples of them.
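
As an illustration, the check described above can be sketched in C as follows. The function name pe_signature_ok is hypothetical, and the sketch assumes the whole file has already been read into a buffer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 if buf looks like a PE file, that is, it
 * starts with "MZ" and has "PE\0\0" at the offset stored in e_lfanew. */
int pe_signature_ok(const unsigned char *buf, size_t len) {
    uint32_t lfanew;

    if (len < 0x40 || buf[0] != 'M' || buf[1] != 'Z')
        return 0;

    /* e_lfanew is a little-endian 32-bit value at offset 0x3c. */
    lfanew = (uint32_t)buf[0x3c] | ((uint32_t)buf[0x3d] << 8) |
             ((uint32_t)buf[0x3e] << 16) | ((uint32_t)buf[0x3f] << 24);

    /* The 4 signature bytes must lie inside the buffer. */
    if (lfanew > len || len - lfanew < 4)
        return 0;

    return memcmp(buf + lfanew, "PE\0\0", 4) == 0;
}
```

The bounds check matters because e_lfanew comes straight from the file and cannot be trusted on malformed input.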

NT Headers -> FileHeader

//0x14 bytes (sizeof)
struct _IMAGE_FILE_HEADER {
    unsigned short Machine;                 // 0x0
    unsigned short NumberOfSections;        // 0x2
    unsigned long TimeDateStamp;            // 0x4
    unsigned long PointerToSymbolTable;     // 0x8
    unsigned long NumberOfSymbols;          // 0xc
    unsigned short SizeOfOptionalHeader;    // 0x10
    unsigned short Characteristics;         // 0x12
};

Attribute Description
Machine Specifies the target machine type (0x14c = x86, 0x8664 = x64, 0xaa64 = ARM64…)
NumberOfSections The number of sections in the binary; it determines the size of the Section Table
TimeDateStamp Timestamp of the file’s creation, in seconds since January 1, 1970 (UTC)
PointerToSymbolTable The file offset of the COFF symbol table (0 if the binary has none)
NumberOfSymbols The number of entries in the symbol table (0 if the binary has none)
SizeOfOptionalHeader The size of the OptionalHeader struct (0 for object files since they don’t have an OptionalHeader)
Characteristics Flags indicating special attributes of the binary

A few Characteristics flags are important, such as:

  • 0x0100: IMAGE_FILE_32BIT_MACHINE the target machine uses a 32-bit word architecture
  • 0x0200: IMAGE_FILE_DEBUG_STRIPPED debugging information has been removed from the image file
  • 0x1000: IMAGE_FILE_SYSTEM the file is a system file, not a user program
  • 0x2000: IMAGE_FILE_DLL the file is not a plain executable but a DLL
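
Since Characteristics is a bit field, each flag is tested with a bitwise AND. Here is a small sketch in C; the constant values are the ones listed above, while has_flag and print_characteristics are names chosen for this example only.

```c
#include <stdint.h>
#include <stdio.h>

#define IMAGE_FILE_32BIT_MACHINE  0x0100
#define IMAGE_FILE_DEBUG_STRIPPED 0x0200
#define IMAGE_FILE_SYSTEM         0x1000
#define IMAGE_FILE_DLL            0x2000

/* Returns 1 if the given flag is set in Characteristics, 0 otherwise. */
int has_flag(uint16_t characteristics, uint16_t flag) {
    return (characteristics & flag) != 0;
}

/* Prints one line for each of the four flags above that is set. */
void print_characteristics(uint16_t characteristics) {
    if (has_flag(characteristics, IMAGE_FILE_32BIT_MACHINE))
        puts("IMAGE_FILE_32BIT_MACHINE");
    if (has_flag(characteristics, IMAGE_FILE_DEBUG_STRIPPED))
        puts("IMAGE_FILE_DEBUG_STRIPPED");
    if (has_flag(characteristics, IMAGE_FILE_SYSTEM))
        puts("IMAGE_FILE_SYSTEM");
    if (has_flag(characteristics, IMAGE_FILE_DLL))
        puts("IMAGE_FILE_DLL");
}
```

For instance, a Characteristics value of 0x2022 has IMAGE_FILE_DLL set, while 0x0022 does not.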

NT Headers -> OptionalHeader

Every executable image has this header. It is called “optional” because some files (object files) do not have it, but for images it is required.

Its size is not fixed across executables, hence the presence of SizeOfOptionalHeader in the FileHeader struct.

Here’s the definition of the OptionalHeader struct:


struct _IMAGE_OPTIONAL_HEADER64 {
    unsigned short Magic;                        // 0x0
    unsigned char MajorLinkerVersion;            // 0x2
    unsigned char MinorLinkerVersion;            // 0x3
    unsigned long SizeOfCode;                    // 0x4
    unsigned long SizeOfInitializedData;         // 0x8
    unsigned long SizeOfUninitializedData;       // 0xc
    unsigned long AddressOfEntryPoint;           // 0x10
    unsigned long BaseOfCode;                    // 0x14
    unsigned long long ImageBase;                // 0x18
    unsigned long SectionAlignment;              // 0x20
    unsigned long FileAlignment;                 // 0x24
    unsigned short MajorOperatingSystemVersion;  // 0x28
    unsigned short MinorOperatingSystemVersion;  // 0x2a
    unsigned short MajorImageVersion;            // 0x2c
    unsigned short MinorImageVersion;            // 0x2e
    unsigned short MajorSubsystemVersion;        // 0x30
    unsigned short MinorSubsystemVersion;        // 0x32
    unsigned long Win32VersionValue;             // 0x34
    unsigned long SizeOfImage;                   // 0x38
    unsigned long SizeOfHeaders;                 // 0x3c
    unsigned long CheckSum;                      // 0x40
    unsigned short Subsystem;                    // 0x44
    unsigned short DllCharacteristics;           // 0x46
    unsigned long long SizeOfStackReserve;       // 0x48
    unsigned long long SizeOfStackCommit;        // 0x50
    unsigned long long SizeOfHeapReserve;        // 0x58
    unsigned long long SizeOfHeapCommit;         // 0x60
    unsigned long LoaderFlags;                   // 0x68
    unsigned long NumberOfRvaAndSizes;           // 0x6c
    IMAGE_DATA_DIRECTORY DataDirectory[xx];      // 0x70
};

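Note: in winnt.h, xx is IMAGE_NUMBEROF_DIRECTORY_ENTRIES (16). In a given file, the number of entries actually present is given by NumberOfRvaAndSizes, which is usually 16 but can differ. That is why this struct does not have a fixed size (hence the need for _IMAGE_FILE_HEADER.SizeOfOptionalHeader).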
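
The offsets in the struct above allow a bit of useful arithmetic. The following C sketch computes the size one would expect for a 64-bit optional header from its number of data directories, and the file offset of the AddressOfEntryPoint field. Both function names are invented for this example, and the second one assumes the standard layout: a 4-byte signature followed by a 0x14-byte file header.

```c
#include <stdint.h>

/* Offset of DataDirectory inside _IMAGE_OPTIONAL_HEADER64, which is also the
 * size of everything that precedes it. */
#define OPTIONAL_HEADER64_FIXED_SIZE 0x70u
/* Size of one _IMAGE_DATA_DIRECTORY: two 32-bit fields. */
#define DATA_DIRECTORY_ENTRY_SIZE 8u

/* Expected SizeOfOptionalHeader for a given NumberOfRvaAndSizes. */
uint32_t optional_header64_size(uint32_t number_of_rva_and_sizes) {
    return OPTIONAL_HEADER64_FIXED_SIZE +
           number_of_rva_and_sizes * DATA_DIRECTORY_ENTRY_SIZE;
}

/* File offset of the AddressOfEntryPoint field:
 * e_lfanew + Signature (4) + FileHeader (0x14) + field offset (0x10). */
uint32_t entry_point_field_offset(uint32_t e_lfanew) {
    return e_lfanew + 4u + 0x14u + 0x10u;
}
```

With the usual 16 entries, the first function gives 0xF0. Together with the 4-byte signature and the 0x14-byte file header, that adds up to the 0x108 bytes of _IMAGE_NT_HEADERS64.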

Attribute Description
Magic Determines whether the executable is 64-bit or 32-bit (0x20B = PE32+ / 64-bit, 0x10B = PE32 / 32-bit). It has priority over _IMAGE_FILE_HEADER.Machine
MajorLinkerVersion self-explanatory
MinorLinkerVersion self-explanatory
SizeOfCode The size of the .text section, or the sum of all code sections if there are multiple
SizeOfInitializedData The size of the initialized data section (typically .data), or the sum of all such sections if there are multiple
SizeOfUninitializedData The size of the uninitialized data section (typically .bss), or the sum of all such sections if there are multiple
AddressOfEntryPoint The RVA of the entry function of the executable. It is optional for DLLs and must be 0 if not specified
BaseOfCode The RVA of the start of the code section (.text)
ImageBase The preferred virtual address at which to map the executable; it must be a multiple of 64K. The 32-bit defaults are 0x00400000 for standalone executables and 0x10000000 for DLLs; 64-bit linkers typically default to 0x140000000 and 0x180000000. In practice, this field is rarely honored as-is since ASLR usually relocates the image
SectionAlignment The alignment of sections once loaded in memory. It must be greater than or equal to FileAlignment and defaults to the page size of the architecture
FileAlignment Same as SectionAlignment, but for the raw section data on disk. Default is 512; it must be a power of 2 between 512 and 64K. If SectionAlignment is less than the architecture’s page size, then FileAlignment must match SectionAlignment
MajorOperatingSystemVersion self-explanatory
MinorOperatingSystemVersion self-explanatory
MajorImageVersion self-explanatory
MinorImageVersion self-explanatory
MajorSubsystemVersion self-explanatory
MinorSubsystemVersion self-explanatory
Win32VersionValue Reserved, must be 0.
SizeOfImage The size of the image once loaded in memory, including all headers. It must be a multiple of SectionAlignment
SizeOfHeaders The combined size of the MS-DOS stub, PE header, and section headers, rounded up to a multiple of FileAlignment
CheckSum A basic form of integrity check. It is validated at load time for all drivers, any DLL loaded at boot time, and any DLL loaded into a critical Windows process
Subsystem The subsystem required to run the image (for example Windows GUI or Windows console); the official documentation lists the possible values
DllCharacteristics Flags describing characteristics and behavior of the image; the official documentation lists the possible values
SizeOfStackReserve The size of the stack to reserve (its maximum size)
SizeOfStackCommit The size of the stack to commit initially
SizeOfHeapReserve The size of the local heap space to reserve
SizeOfHeapCommit The size of the local heap space to commit initially
LoaderFlags Reserved, must be 0.
NumberOfRvaAndSizes The number of entries in DataDirectory
DataDirectory Array of IMAGE_DATA_DIRECTORY entries

Note: DllCharacteristics is notably where we can retrieve information about the security mitigations in place for the image, such as NX compatibility, ASLR, or SEH usage. Despite its name, non-DLL images have it too.
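
As a rough sketch of how those security flags can be read, the C function below turns a DllCharacteristics value into a short summary string. The constant values are the ones defined in winnt.h; the function name describe_mitigations and the short labels ("ASLR", "NX", "CFG"...) are choices made for this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define IMAGE_DLLCHARACTERISTICS_HIGH_ENTROPY_VA 0x0020
#define IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE    0x0040
#define IMAGE_DLLCHARACTERISTICS_NX_COMPAT       0x0100
#define IMAGE_DLLCHARACTERISTICS_NO_SEH          0x0400
#define IMAGE_DLLCHARACTERISTICS_GUARD_CF        0x4000

/* Writes a space-separated summary such as "ASLR NX" into out (n bytes).
 * The result is an empty string when none of the flags is set. */
void describe_mitigations(uint16_t dc, char *out, size_t n) {
    static const struct { uint16_t flag; const char *name; } table[] = {
        { IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE,    "ASLR" },
        { IMAGE_DLLCHARACTERISTICS_HIGH_ENTROPY_VA, "HIGH_ENTROPY_VA" },
        { IMAGE_DLLCHARACTERISTICS_NX_COMPAT,       "NX" },
        { IMAGE_DLLCHARACTERISTICS_NO_SEH,          "NO_SEH" },
        { IMAGE_DLLCHARACTERISTICS_GUARD_CF,        "CFG" },
    };
    size_t i, used = 0;

    if (n == 0) return;
    out[0] = '\0';
    for (i = 0; i < sizeof table / sizeof table[0]; i++) {
        int w;
        if ((dc & table[i].flag) == 0) continue;
        w = snprintf(out + used, n - used, "%s%s",
                     used ? " " : "", table[i].name);
        if (w < 0 || (size_t)w >= n - used) break;  /* output truncated: stop */
        used += (size_t)w;
    }
}
```

Flags that are not in the table, such as the terminal server bit 0x8000, are simply ignored by this sketch.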

Data Directories

As said, DataDirectory holds at most 16 entries. Here is what each index corresponds to:

Index Value
0 Export Directory
1 Import Directory
2 Resource Directory
3 Exception Directory
4 Certificate Directory
5 Base Relocation Directory
6 Debug Data
7 Architecture (reserved)
8 Global PTR
9 TLS Table
10 Load Config Table
11 Bound Import
12 IAT
13 Delay Import Descriptor
14 CLR Runtime Header
15 (reserved)

Each entry is an IMAGE_DATA_DIRECTORY structure:

struct _IMAGE_DATA_DIRECTORY {
    unsigned long   VirtualAddress;
    unsigned long   Size;
};

Data Directories ⇾ Export Directory

The export directory, typically stored in the .edata section, contains all the information needed about the symbols that a DLL (or an executable) exports.

The export directory plays a crucial role when loading a DLL: the loader uses the export table to retrieve details about the functions and variables exported by the loaded DLL.

Note: an ordinal is just a fancy way to talk about an index into the Export Address Table (offset by Base), and RVAs are offsets from the image base address.

typedef struct _IMAGE_EXPORT_DIRECTORY {
  unsigned long Characteristics;
  unsigned long TimeDateStamp;
  unsigned short MajorVersion;
  unsigned short MinorVersion;
  unsigned long Name;                      // Pointer (RVA) to DLL name
  unsigned long Base;                      // Starting ordinal number
  unsigned long NumberOfFunctions;         // Number of entries in the Export Address Table
  unsigned long NumberOfNames;             // Number of entries in the Export Name Table (and Ordinal Table)
  unsigned long AddressOfFunctions;        // Pointer (RVA) to Export Address Table
  unsigned long AddressOfNames;            // Pointer (RVA) to Export Name Table
  unsigned long AddressOfNameOrdinals;     // Pointer (RVA) to the Ordinal Table
} IMAGE_EXPORT_DIRECTORY;

To understand better how exporting works, let’s take a higher level example:

ExportTable            = [0x200, 0x100, 0x500];

ExportNameTable        = ["func_A", "func_B", "func_C"];
ExportNameOrdinalTable = [1, 0, 2];

/*
"func_A" is ordinal 1 and is stored @ Dll_base + ExportTable[1] = Dll_base + 0x100
"func_B" is ordinal 0 and is stored @ Dll_base + ExportTable[0] = Dll_base + 0x200
"func_C" is ordinal 2 and is stored @ Dll_base + ExportTable[2] = Dll_base + 0x500
*/

You can see an example function that gets an exported function by name in this ReactOS file.
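
The lookup from the higher level example can also be written as a short C sketch. It works on a simplified in-memory model where the name table already holds C strings instead of RVAs; the struct export_tables type and the find_export_rva function are hypothetical and not part of any Windows header.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified model of the three export tables (hypothetical type). */
struct export_tables {
    const uint32_t *functions;      /* Export Address Table: one RVA per entry */
    const char *const *names;       /* Export Name Table, resolved to strings */
    const uint16_t *name_ordinals;  /* Ordinal Table: index into functions[] */
    size_t number_of_names;         /* NumberOfNames */
    size_t number_of_functions;     /* NumberOfFunctions */
};

/* Returns the RVA of the named export, or 0 if the name is not found. */
uint32_t find_export_rva(const struct export_tables *t, const char *name) {
    size_t i;
    for (i = 0; i < t->number_of_names; i++) {
        uint16_t index;
        if (strcmp(t->names[i], name) != 0) continue;
        /* The name at position i pairs with the ordinal at position i. */
        index = t->name_ordinals[i];
        if (index >= t->number_of_functions) return 0;  /* corrupt table */
        return t->functions[index];
    }
    return 0;
}
```

Filled with the tables from the example, looking up "func_A" yields 0x100, which the caller then adds to the DLL base address. A real loader can use a binary search here because the name table is sorted; the linear scan just keeps the sketch short.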

Data Directories ⇾ Import Directory

The import directory stores an array of _IMAGE_IMPORT_DESCRIPTOR, each entry representing a library that the image imports from. The array ends with an entry filled with zeros.

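struct _IMAGE_IMPORT_DESCRIPTOR {
    union {
        unsigned long Characteristics;     // 0 for the terminating null descriptor
        unsigned long OriginalFirstThunk;  // Pointer (RVA) to the Import Lookup Table
    };
    unsigned long TimeDateStamp;
    unsigned long ForwarderChain;
    unsigned long Name;                    // Pointer (RVA) to the library name
    unsigned long FirstThunk;              // Pointer (RVA) to the Import Address Table
};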

The 3 important attributes of this structure are:

  • Name: The RVA of the name of the library, ex: KERNEL32.dll
  • OriginalFirstThunk: The RVA of the Import Lookup Table, which lists every function imported from the library (by name or by ordinal).
  • FirstThunk: The RVA of the Import Address Table, which the loader fills with the address of every imported function. It is stored in the same order as for OriginalFirstThunk

In order to retrieve the address of an imported function, you need to loop through all the libraries (IMAGE_IMPORT_DESCRIPTOR) and, for each one, loop through OriginalFirstThunk until you find the corresponding name. The entry at the same index in FirstThunk holds the address.
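
The nested loop described above could look like the following C sketch. It deliberately works on a simplified model: real thunks are RVAs or ordinals, whereas here each library carries a NULL-terminated list of names and a parallel list of addresses. The struct import_library type and the find_import_address function are assumptions made for this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified model of one import descriptor (hypothetical type). */
struct import_library {
    const char *name;                   /* Name, e.g. "KERNEL32.dll" */
    const char *const *function_names;  /* OriginalFirstThunk, NULL-terminated */
    const uint64_t *addresses;          /* FirstThunk, same order as the names */
};

/* Returns the address of the imported function, or 0 if no library has it. */
uint64_t find_import_address(const struct import_library *libs, size_t count,
                             const char *function) {
    size_t i, j;
    for (i = 0; i < count; i++) {
        /* Walk the name list until its NULL terminator. */
        for (j = 0; libs[i].function_names[j] != NULL; j++) {
            if (strcmp(libs[i].function_names[j], function) == 0)
                return libs[i].addresses[j];  /* same index in FirstThunk */
        }
    }
    return 0;
}
```

The key point is the shared index: position j in the name list and position j in the address list describe the same function.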

Section Table

The number of entries in this table is given by the _IMAGE_FILE_HEADER.NumberOfSections field. The table immediately follows the optional header. Each entry is a “section header” with the associated struct:

//0x28 bytes (sizeof)
struct _IMAGE_SECTION_HEADER
{
    unsigned char Name[8];                                     //0x0
    union
    {
        unsigned long PhysicalAddress;                         //0x8
        unsigned long VirtualSize;                             //0x8
    } Misc;                                                    //0x8
    unsigned long VirtualAddress;                              //0xc
    unsigned long SizeOfRawData;                               //0x10
    unsigned long PointerToRawData;                            //0x14
    unsigned long PointerToRelocations;                        //0x18
    unsigned long PointerToLinenumbers;                        //0x1c
    unsigned short NumberOfRelocations;                        //0x20
    unsigned short NumberOfLinenumbers;                        //0x22
    unsigned long Characteristics;                             //0x24
};

Attribute Description
Name Name of the section (.text, .data…), padded with null bytes up to 8 bytes
PhysicalAddress / VirtualSize The total size of the section when loaded into memory
VirtualAddress The offset (RVA) of the first byte of the section, relative to the image base address
SizeOfRawData The size of the section’s data on disk, rounded up to a multiple of FileAlignment
PointerToRawData The file offset of the first page of the section on disk
PointerToRelocations The file offset of the beginning of relocation entries for the section. This is set to zero for executable images or if there are no relocations.
PointerToLinenumbers The file offset of the beginning of line-number entries for the section. This should be zero because it is tied to COFF debugging information, which is deprecated
NumberOfRelocations The number of relocation entries for the section, set to zero for executable images
NumberOfLinenumbers The number of line-number entries, should also be set to zero because it is deprecated
Characteristics Characteristics of the section, like read/write/execute permissions

After the Section Table come the sections themselves, each one located at the file offset given by its PointerToRawData.
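
Two calculations follow from the section headers: where the Section Table starts in the file, and how an RVA maps back to a file offset. The C sketch below shows both under simplifying assumptions. The struct section_info type keeps only the three fields it needs, and both function names are invented for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* The section header fields needed for the translation (hypothetical type). */
struct section_info {
    uint32_t virtual_address;      /* VirtualAddress */
    uint32_t size_of_raw_data;     /* SizeOfRawData */
    uint32_t pointer_to_raw_data;  /* PointerToRawData */
};

/* File offset of the first section header:
 * e_lfanew + Signature (4) + FileHeader (0x14) + SizeOfOptionalHeader. */
uint32_t section_table_offset(uint32_t e_lfanew,
                              uint16_t size_of_optional_header) {
    return e_lfanew + 4u + 0x14u + size_of_optional_header;
}

/* Translates an RVA into a file offset. Returns 0 and fills *out on success,
 * -1 if no section stores that RVA on disk. */
int rva_to_file_offset(const struct section_info *sections, size_t count,
                       uint32_t rva, uint32_t *out) {
    size_t i;
    for (i = 0; i < count; i++) {
        uint32_t delta;
        if (rva < sections[i].virtual_address) continue;
        delta = rva - sections[i].virtual_address;
        /* Bytes past SizeOfRawData only exist in memory, not in the file. */
        if (delta >= sections[i].size_of_raw_data) continue;
        *out = sections[i].pointer_to_raw_data + delta;
        return 0;
    }
    return -1;
}
```

This kind of translation is what lets a tool follow the RVAs found in the data directories while working on the file on disk rather than on a loaded image.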

Conclusion

We should now have a better understanding of the PE format and its internals. For example, we now know that the entry point address of the image is stored in _IMAGE_OPTIONAL_HEADER64.AddressOfEntryPoint, at file offset e_lfanew + 0x28 (that is 0xa8 when e_lfanew is 0x80)… As you can see from my conclusion, that was far from being the most interesting stuff out there, but I think writing this little doc-like paper gives me a better understanding than simply reading the official doc, and that will be useful for future projects.