DLL file specification
Such a record has a symbol name that is the name of a section, such as .text. The auxiliary record provides information about the section to which it refers; thus, it duplicates some of the information in the section header. A separate auxiliary format, the CLR token definition, is used to associate a token with the COFF symbol table's namespace.
The COFF string table immediately follows the COFF symbol table. Its position is found by taking the symbol table address in the COFF header and adding the number of symbols multiplied by the size of a symbol. At the beginning of the COFF string table are 4 bytes that contain the total size in bytes of the string table. This size includes the size field itself, so the value in this location would be 4 if no strings were present. Following the size are null-terminated strings that are pointed to by symbols in the COFF symbol table.
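As a rough illustration of the offset arithmetic above, the following Python sketch locates the string table and reads one string from it. The 18-byte symbol record size comes from the COFF format; the function names and the toy byte layout are assumptions made up for this example, not part of any real tool.

```python
import struct

SYMBOL_SIZE = 18  # size in bytes of one COFF symbol table record


def string_table_offset(pointer_to_symbol_table, number_of_symbols):
    """File offset of the COFF string table, which follows the symbol table."""
    return pointer_to_symbol_table + number_of_symbols * SYMBOL_SIZE


def read_string(data, table_offset, string_offset):
    """Return the null-terminated string at string_offset within the table.

    string_offset is counted from the start of the table, so valid
    offsets begin at 4, just past the size field.
    """
    (total_size,) = struct.unpack_from("<I", data, table_offset)
    if not 4 <= string_offset < total_size:
        raise ValueError("string offset outside the string table")
    start = table_offset + string_offset
    end = data.index(b"\x00", start)
    return data[start:end].decode("ascii")


# Toy file (hypothetical): 10 filler bytes, two empty symbols, then the table.
strings = b".debug$S\x00long_symbol_name\x00"
blob = (bytes(10) + bytes(2 * SYMBOL_SIZE)
        + struct.pack("<I", 4 + len(strings)) + strings)
table = string_table_offset(10, 2)
print(table, read_string(blob, table, 4))
```

Because the size field counts itself, an empty table stores the value 4, and the range check above rejects every offset in that case.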
Attribute certificates can be associated with an image by adding an attribute certificate table. The attribute certificate table is composed of a set of contiguous, quadword-aligned attribute certificate entries. Zero padding is inserted between the original end of the file and the beginning of the attribute certificate table to achieve this alignment.
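Each attribute certificate entry contains the following fields: dwLength (the length of the entry), wRevision (the certificate version number), wCertificateType (the type of content in bCertificate), and bCertificate (the certificate itself). The virtual address value from the Certificate Table entry in the Optional Header Data Directory is a file offset to the first attribute certificate entry.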
Subsequent entries are accessed by advancing that entry's dwLength bytes, rounded up to an 8-byte multiple, from the start of the current attribute certificate entry. This continues until the sum of the rounded dwLength values equals the Size value from the Certificate Table entry in the Optional Header Data Directory.
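If the sum of the rounded dwLength values does not equal the Size value, then either the attribute certificate table or the Size field is corrupted. The first certificate starts at the file offset given by the Certificate Table entry. To advance through all the attribute certificate entries, add the current entry's dwLength value to the current entry's offset, round the result up to the nearest 8-byte multiple to find the next entry, and repeat until the end of the table is reached.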
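The walk described above can be sketched in Python as follows. This is a minimal sketch, assuming the entry header is a 4-byte dwLength followed by 2-byte wRevision and wCertificateType fields; the helper names and the toy table are invented for the example, and the check that dwLength is at least 8 is an extra sanity test rather than something the text requires.

```python
import struct


def round_up_8(n):
    """Round n up to the next multiple of 8 (quadword alignment)."""
    return (n + 7) & ~7


def walk_certificates(data, table_offset, table_size):
    """Yield (offset, dwLength, wRevision, wCertificateType) per entry."""
    offset = table_offset
    end = table_offset + table_size
    while offset < end:
        dw_length, w_revision, w_type = struct.unpack_from("<IHH", data, offset)
        if dw_length < 8:  # assumption: dwLength covers the 8-byte header
            raise ValueError("dwLength smaller than the entry header")
        yield offset, dw_length, w_revision, w_type
        offset += round_up_8(dw_length)
    if offset != end:
        raise ValueError("rounded dwLength values do not sum to Size")


def make_entry(payload, revision=0x0200, cert_type=0x0002):
    """Build a toy entry, zero-padded to a quadword boundary."""
    raw = struct.pack("<IHH", 8 + len(payload), revision, cert_type) + payload
    return raw + bytes(round_up_8(len(raw)) - len(raw))


# Hypothetical image: 40 bytes of file content, then a two-entry table.
table = make_entry(b"12345") + make_entry(b"abcdefgh")
image = bytes(40) + table
entries = list(walk_certificates(image, 40, len(table)))
print(entries)
```

Passing a Size value that does not match the rounded lengths raises ValueError, which mirrors the corruption check in the text.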
Alternatively, you can enumerate the certificate entries by calling the Win32 ImageEnumerateCertificates function in a loop. Attribute certificate table entries can contain any certificate type, as long as the entry has the correct dwLength value, a unique wRevision value, and a unique wCertificateType value.
Note that some values are not currently supported. If the bCertificate content does not end on a quadword boundary, the attribute certificate entry is padded with zeros, from the end of bCertificate to the next quadword boundary.
As stated in the preceding section, the certificates in the attribute certificate table can contain any certificate type. Certificates that ensure a PE file's integrity may include a PE image hash. A PE image hash or file hash is similar to a file checksum in that the hash algorithm produces a message digest that is related to the integrity of a file. However, a checksum is produced by a simple algorithm and is used primarily to detect whether a block of memory on disk has gone bad and the values stored there have become corrupted.
A file hash is similar to a checksum in that it also detects file corruption. However, unlike most checksum algorithms, it is very difficult to modify a file without changing the file hash from its original unmodified value. A file hash can thus be used to detect intentional and even subtle modifications to a file, such as those introduced by viruses, hackers, or Trojan horse programs.
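The PE image hash omits certain fields of the file, such as the checksum in the optional header and the Certificate Table entry in the optional header data directories. This is because the act of adding a certificate changes these fields and would cause a different hash value to be calculated. The Win32 ImageGetDigestStream function provides a data stream from a target PE file that can be passed to hash functions. This data stream remains consistent when certificates are added to or removed from a PE file. Based on the parameters that are passed to ImageGetDigestStream, other data from the PE image can be omitted from the hash computation.
The delay-load import tables were added to the image to support a uniform mechanism for applications to delay the loading of a DLL until the first call into that DLL.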
The layout of the tables matches that of the traditional import tables. The delay-load directory table is the counterpart to the import directory table. It can be retrieved through the Delay Import Descriptor entry in the optional header data directories list.
The tables that are referenced in this data structure are organized and sorted just as their counterparts are for traditional imports. As yet, no attribute flags are defined. The linker sets this field to zero in the image. This field can be used to extend the record by indicating the presence of new fields, or it can be used to indicate behaviors to the delay or unload helper functions.
The name of the DLL to be delay-loaded resides in the read-only data section of the image. It is referenced through the szName field. The handle of the DLL to be delay-loaded is in the data section of the image. The phmod field points to the handle. The supplied delay-load helper uses this location to store the handle to the loaded DLL. The delay import address table (IAT) is referenced by the delay import descriptor through the pIAT field. The delay-load helper updates these pointers with the real entry points so that the thunks are no longer in the calling loop.
The delay import name table (INT) contains the names of the imports that might require loading. They are ordered in the same fashion as the function pointers in the IAT. The optional delay unload import address table consists of initialized data in the read-only section that is an exact copy of the original IAT that referred the code to the delay-load thunks.
Typical COFF sections contain code or data that linkers and Microsoft Win32 loaders process without special knowledge of the section contents. The contents are relevant only to the application that is being linked or executed. However, some COFF sections have special meanings when found in object files or image files. Tools and loaders recognize these sections because they have special flags set in the section header, because special locations in the image optional header point to them, or because the section name itself indicates a special function of the section.
Even if the section name itself does not indicate a special function of the section, the section name is dictated by convention, so the authors of this specification can refer to a section name in all cases. The reserved sections and their attributes are described in the table below, followed by detailed descriptions for the section types that are persisted into executables and the section types that contain metadata for extensions.
Some of the sections listed here are marked "object only" or "image only" to indicate that their special semantics are relevant only for object files or image files, respectively. A section that is marked "image only" might still appear in an object file as a way of getting into the image file, but the section has no special meaning to the linker, only to the image file loader.
This section describes the packaging of debug information in object and image files. The next section describes the format of the debug directory, which can be anywhere in the image.
Subsequent sections describe the "groups" in object files that contain debug information. The default for the linker is that debug information is not mapped into the address space of the image. Image files contain an optional debug directory that indicates what form of debug information is present and where it is. This directory consists of an array of debug directory entries whose location and size are indicated in the image optional header. The debug directory can be in a discardable .debug section.
Each debug directory entry identifies the location and size of a block of debug information. The specified RVA can be zero if the debug information is not covered by a section header (that is, it resides in the image file and is not mapped into the run-time address space).
If it is mapped, the RVA is its address. Debug information of the FPO type describes nonstandard stack frames; those functions that do not have FPO information are assumed to have normal stack frames. A debug entry of the REPRO type indicates that the PE file was built to be deterministic: if the input does not change, the output PE file is guaranteed to be bit-for-bit identical no matter when or where the PE is produced. The raw data of this debug entry may be empty, or may contain a calculated hash value preceded by a four-byte value that represents the hash value length. Object files can contain .debug sections that hold debug information.
The linker recognizes these .debug sections. Precompiled debug types, for example, are shared types among all of the objects that were compiled by using the precompiled header that was generated with this object.
When it generates a PDB file, the linker gathers all relevant debug data from the .debug sections of the object files, processes that data along with the linker-generated debugging information into the PDB file, and creates a debug directory entry to refer to it.
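The .drectve section of an object file holds linker directives. The linker removes a .drectve section after processing the information, so the section does not appear in the image file that is being linked. The directive string is a series of linker options that are separated by spaces. Each option contains a hyphen, the option name, and any appropriate attribute. If an option contains spaces, the option must be enclosed in quotes. The export data section, named .edata, contains information about symbols that other images can access through dynamic linking. An overview of the general structure of the export section is described below.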
The tables described are usually contiguous in the file in the order shown (though this is not required). Only the export directory table and export address table are required to export symbols as ordinals.
An ordinal is an export that is accessed directly by its export address table index. The name pointer table, ordinal table, and export name table all exist to support use of export names. When another image file imports a symbol by name, the Win32 loader searches the name pointer table for a matching string.
If a matching string is found, the associated ordinal is identified by looking up the corresponding member in the ordinal table (that is, the member of the ordinal table with the same index as the string pointer found in the name pointer table).
The resulting ordinal is an index into the export address table, which gives the actual location of the desired symbol. Every export symbol can be accessed by an ordinal. When another image file imports a symbol by ordinal, it is unnecessary to search the name pointer table for a matching string. Direct use of an ordinal is therefore more efficient. However, an export name is easier to remember and does not require the user to know the table index for the symbol.
The export symbol information begins with the export directory table, which describes the remainder of the export symbol information. The export directory table contains address information that is used to resolve imports to the entry points within this image.
The export address table contains the address of exported entry points and exported data and absolutes. An ordinal number is used as an index into the export address table. Each entry in the export address table is a field that uses one of two formats: an export RVA or a forwarder RVA. If the address specified is not within the export section (as defined by the address and length that are indicated in the optional header), the field is an export RVA, which is an actual address in code or data. Otherwise, the field is a forwarder RVA, which names a symbol in another DLL.
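A forwarder RVA exports a definition from some other image, making it appear as if it were being exported by the current image. Thus, the symbol is simultaneously imported and exported. For example, in Kernel32.dll in Windows XP, the export named "HeapAlloc" is forwarded to the string "NTDLL.RtlAllocateHeap." This allows applications to use the Windows XP-specific module Ntdll.dll without actually containing import references to it. The application's import table refers only to Kernel32.dll. Therefore, the application is not specific to Windows XP and can run on any Win32 system.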
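The rule for telling the two entry formats apart is a range test against the export section. A minimal Python sketch, with a hypothetical function name and made-up addresses:

```python
def classify_export(entry_rva, export_dir_rva, export_dir_size):
    """Classify one export address table entry.

    An entry that points inside the export section (the range given by
    the export data directory) is a forwarder RVA; anything else is an
    ordinary export RVA into code or data.
    """
    inside = export_dir_rva <= entry_rva < export_dir_rva + export_dir_size
    return "forwarder" if inside else "export"


# Hypothetical image whose export section covers RVAs 0x2000 to 0x23FF.
for rva in (0x1000, 0x2100, 0x2400):
    print(hex(rva), classify_export(rva, 0x2000, 0x400))
```

For a forwarder, the RVA points at a null-terminated string inside the export section that names the target DLL and symbol.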
The export name pointer table is an array of addresses (RVAs) into the export name table. The pointers are 32 bits each and are relative to the image base. The pointers are ordered lexically to allow binary searches. The export ordinal table is an array of 16-bit unbiased indexes into the export address table.
Ordinals are biased by the Ordinal Base field of the export directory table. In other words, the ordinal base must be subtracted from the ordinals to obtain true indexes into the export address table.
The export name pointer table and the export ordinal table form two parallel arrays that are separated to allow natural field alignment. These two tables, in effect, operate as one table, in which the Export Name Pointer column points to a public exported name and the Export Ordinal column gives the corresponding ordinal for that public name. A member of the export name pointer table and a member of the export ordinal table are associated by having the same position index in their respective arrays.
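Thus, when the export name pointer table is searched and a matching string is found at position i, the symbol's unbiased ordinal is the member at position i of the export ordinal table, its RVA is the member of the export address table at that ordinal, and its biased ordinal is the unbiased ordinal plus the Ordinal Base.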
When searching for a symbol by biased ordinal, subtract the Ordinal Base to obtain the unbiased ordinal; the symbol's RVA is the member of the export address table at that index, and its name, if it has one, is found by searching the export ordinal table for the unbiased ordinal and taking the name at the same position. The export name table contains the actual string data that was pointed to by the export name pointer table. The strings in this table are public names that other images can use to import the symbols. These public export names are not necessarily the same as the private symbol names that the symbols have in their own image file and source code, although they can be.
Every exported symbol has an ordinal value, which is just the index into the export address table. Use of export names, however, is optional. Some, all, or none of the exported symbols can have export names. For exported symbols that do have export names, corresponding entries in the export name pointer table and export ordinal table work together to associate each name with an ordinal. The structure of the export name table is a series of null-terminated ASCII strings of variable length.
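To make the relationship between the three export tables concrete, here is a small Python model of the two lookups. It is only a sketch: the ExportTables class, its field names, and the sample values are invented for illustration, and the tables are held as plain lists rather than parsed from an image.

```python
from bisect import bisect_left


class ExportTables:
    """In-memory model of the export address, name, and ordinal tables."""

    def __init__(self, ordinal_base, addresses, names, ordinals):
        self.ordinal_base = ordinal_base
        self.addresses = addresses  # export address table (RVAs)
        self.names = names          # export names, sorted lexically
        self.ordinals = ordinals    # unbiased indexes, parallel to names

    def by_name(self, name):
        """Return (rva, biased_ordinal) for an exported name."""
        i = bisect_left(self.names, name)  # binary search over sorted names
        if i == len(self.names) or self.names[i] != name:
            raise KeyError(name)
        ordinal = self.ordinals[i]
        return self.addresses[ordinal], ordinal + self.ordinal_base

    def by_ordinal(self, biased_ordinal):
        """Return (rva, name_or_None) for a biased ordinal."""
        ordinal = biased_ordinal - self.ordinal_base
        if not 0 <= ordinal < len(self.addresses):
            raise KeyError(biased_ordinal)
        name = None
        if ordinal in self.ordinals:
            name = self.names[self.ordinals.index(ordinal)]
        return self.addresses[ordinal], name


# Hypothetical DLL: three exports, two of which have names.
tables = ExportTables(5, [0x1000, 0x1010, 0x1020], ["Alpha", "Beta"], [2, 0])
print(tables.by_name("Alpha"), tables.by_ordinal(5), tables.by_ordinal(6))
```

The third export in the sample has no name, so it can be reached only by ordinal, which matches the point that export names are optional.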
All image files that import symbols, including virtually all executable (EXE) files, have an .idata section. The import information begins with the import directory table, which describes the remainder of the import information. The import directory table contains address information that is used to resolve fixup references to the entry points within a DLL image.
The import directory table consists of an array of import directory entries, one entry for each DLL to which the image refers. The last directory entry is empty (filled with null values), which indicates the end of the directory table. An import lookup table is an array of 32-bit numbers for PE32 or 64-bit numbers for PE32+. Each entry uses a bit-field format. The collection of these entries describes all imports from a given DLL.
The last entry is set to zero (NULL) to indicate the end of the table. The structure and content of the import address table are identical to those of the import lookup table, until the file is bound.
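The bit-field format of an import lookup table entry is not reproduced in the text above, so the following Python sketch relies on the layout as I understand it from the PE format: the most significant bit selects import by ordinal, the low 16 bits then hold the ordinal, and otherwise the low 31 bits hold the RVA of a hint/name entry. The function name is hypothetical.

```python
def decode_import_lookup(entry, pe32_plus=False):
    """Decode one import lookup table entry into (kind, value).

    Layout assumed from the PE format: bit 31 (PE32) or bit 63 (PE32+)
    is the ordinal flag; a zero entry terminates the table.
    """
    if entry == 0:
        return ("end", None)
    flag_bit = 63 if pe32_plus else 31
    if (entry >> flag_bit) & 1:
        return ("ordinal", entry & 0xFFFF)
    return ("name", entry & 0x7FFFFFFF)


# Hypothetical PE32 lookup table: one name import, one ordinal import, end.
for raw in (0x00002345, 0x80000010, 0):
    print(hex(raw), decode_import_lookup(raw))
```

A reader of the table would call this in a loop and stop at the first "end" result.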
During binding, the entries in the import address table are overwritten with the addresses of the symbols that are being imported. These addresses are the actual memory addresses of the symbols, although technically they are still called "virtual addresses."
The .pdata section contains an array of function table entries that are used for exception handling. It is pointed to by the exception table entry in the image data directory. The entries must be sorted according to the function addresses (the first field in each structure) before being emitted into the final image. The target platform determines which of the three function table entry format variations is used.
The base relocation table contains entries for all base relocations in the image. The Base Relocation Table field in the optional header data directories gives the number of bytes in the base relocation table.
The base relocation table is divided into blocks. Each block represents the base relocations for a 4K page. Each block must start on a 32-bit boundary. The loader is not required to process base relocations that are resolved by the linker, unless the load image cannot be loaded at the image base that is specified in the PE header. Each block starts with a Page RVA field and a Block Size field, each 4 bytes long. The Block Size field is then followed by any number of Type or Offset field entries. Each entry is a WORD (2 bytes): the high 4 bits hold the relocation type, and the low 12 bits hold an offset from the block's Page RVA. To apply a base relocation, the difference is calculated between the preferred base address and the base where the image is actually loaded.
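A short Python sketch of block parsing and fixup application follows. It assumes the block header is a 4-byte Page RVA followed by a 4-byte Block Size that counts the header itself, and it handles only the 32-bit HIGHLOW relocation type; the function names and the toy image are assumptions for illustration.

```python
import struct

IMAGE_REL_BASED_ABSOLUTE = 0  # padding entry, skipped
IMAGE_REL_BASED_HIGHLOW = 3   # add the full 32-bit delta


def parse_block(data, offset):
    """Return (page_rva, [(type, page_offset), ...], next_block_offset)."""
    page_rva, block_size = struct.unpack_from("<II", data, offset)
    count = (block_size - 8) // 2  # WORD entries after the 8-byte header
    words = struct.unpack_from("<%dH" % count, data, offset + 8)
    entries = [(w >> 12, w & 0x0FFF) for w in words]
    return page_rva, entries, offset + block_size


def apply_highlow(image, page_rva, entries, delta):
    """Apply HIGHLOW fixups in place; image is a bytearray indexed by RVA."""
    for rel_type, page_offset in entries:
        if rel_type == IMAGE_REL_BASED_ABSOLUTE:
            continue
        if rel_type != IMAGE_REL_BASED_HIGHLOW:
            raise NotImplementedError(rel_type)
        at = page_rva + page_offset
        (value,) = struct.unpack_from("<I", image, at)
        struct.pack_into("<I", image, at, (value + delta) & 0xFFFFFFFF)


# Toy block: one HIGHLOW fixup at page offset 4, plus one padding entry.
block = struct.pack("<IIHH", 0x1000, 12, (IMAGE_REL_BASED_HIGHLOW << 12) | 4, 0)
image = bytearray(0x2000)
struct.pack_into("<I", image, 0x1004, 0x00401234)
page_rva, entries, next_offset = parse_block(block, 0)
apply_highlow(image, page_rva, entries, 0x00500000 - 0x00400000)
print(hex(struct.unpack_from("<I", image, 0x1004)[0]))
```

Here the preferred base is taken to be 0x00400000 and the actual load base 0x00500000, so the stored address moves by 0x00100000.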
If the image is loaded at its preferred base, the difference is zero and thus the base relocations do not have to be applied.
Thread local storage (TLS) is a special storage class that Windows supports in which a data object is not an automatic (stack) variable, yet is local to each individual thread that runs the code.
Thus, each thread can maintain a different value for a variable declared by using TLS. This implementation enables TLS data to be defined and initialized similarly to ordinary static variables in a program. Statically declared TLS data objects can be used only in statically loaded image files. The Address of Index field of the TLS directory points to a location where the program expects to receive the TLS index. The run-time library defines a memory image of the TLS directory; the linker looks for this memory image and uses the data there to create the TLS directory.
Other compilers that support TLS and work with the Microsoft linker must use this same technique. When a thread is created, the loader communicates the address of the thread's TLS array by placing the address of the thread environment block (TEB) in the FS register.
This behavior is Intel x86-specific. The loader assigns the value of the TLS index to the place that was indicated by the Address of Index field. The code uses the TLS index and the TLS array location (multiplying the index by 4 and using it as an offset to the array) to get the address of the TLS data area for the given program and module.
Each thread has its own TLS data area, but this is transparent to the program, which does not need to know how data is allocated for individual threads. The TLS array is an array of addresses that the system maintains for each thread. The TLS index indicates which member of the array to use. The index is a number meaningful only to the system that identifies the module. The program can provide one or more TLS callback functions to support additional initialization and termination for TLS data objects.
A typical use for such a callback function would be to call constructors and destructors for objects. Although there is typically no more than one callback function, a callback is implemented as an array to make it possible to add additional callback functions if desired. If there is more than one callback function, each function is called in the order in which its address appears in the array.
A null pointer terminates the array. It is perfectly valid to have an empty list (no callback supported), in which case the callback array has exactly one member, a null pointer. The Reserved parameter should be set to zero.
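Reading the null-terminated callback array described above is a simple loop. The Python sketch below assumes the array has already been located in a byte buffer and that each member is a 4-byte address (8 bytes for 64-bit images); the function name and sample addresses are hypothetical.

```python
import struct


def read_tls_callbacks(data, offset, pointer_size=4):
    """Return the callback addresses stored before the null terminator."""
    fmt = "<I" if pointer_size == 4 else "<Q"
    callbacks = []
    while True:
        (address,) = struct.unpack_from(fmt, data, offset)
        if address == 0:  # a null pointer terminates the array
            return callbacks
        callbacks.append(address)
        offset += pointer_size


# Toy arrays: two callbacks, and the valid empty list of one null member.
array = struct.pack("<III", 0x00401000, 0x00401040, 0)
print([hex(a) for a in read_tls_callbacks(array, 0)])
print(read_tls_callbacks(struct.pack("<I", 0), 0))
```

The callbacks come back in array order, which is also the order in which they are called.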
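The Reason parameter can take the same values as the reason code that is passed to a DLL entry point: DLL_PROCESS_ATTACH, DLL_THREAD_ATTACH, DLL_THREAD_DETACH, and DLL_PROCESS_DETACH.
Current versions of the Microsoft linker and Windows XP and later versions of Windows use a new version of the load configuration structure for 32-bit x86-based systems that include reserved SEH technology.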
This provides a list of safe structured exception handlers that the operating system uses during exception dispatching. Otherwise, the operating system terminates the application. This helps prevent the "x86 exception handler hijacking" exploit that has been used in the past to take control of the operating system. The Microsoft linker automatically provides a default load configuration structure to include the reserved SEH data. If the user code already provides a load configuration structure, it must include the new reserved SEH fields.
The data directory entry for a pre-reserved SEH load configuration structure must specify a particular size of the load configuration structure because the operating system loader always expects it to be a certain value. In that regard, the size is really only a version check. For compatibility with Windows XP and earlier versions of Windows, the size must be 64 for x86 images. The GuardFlags field of the load configuration structure holds Control Flow Guard flags. Among them are a flag that indicates that the delayload import table is in its own .didat section, and a flag that indicates that the module contains suppressed export information.
This also implies that the address-taken IAT table is also present in the load config. Another value is a mask for the subfield that contains the stride of Control Flow Guard function table entries (that is, the additional count of bytes per table entry).
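Extracting the stride subfield is a mask-and-shift. In this Python sketch the two constant names and values are the ones I recall from the Windows SDK winnt.h header, and the assumption that each function table entry is a 4-byte RVA followed by the extra stride bytes is mine; treat both as something to check against the header.

```python
# Values as recalled from winnt.h; verify against your SDK.
IMAGE_GUARD_CF_FUNCTION_TABLE_SIZE_MASK = 0xF0000000
IMAGE_GUARD_CF_FUNCTION_TABLE_SIZE_SHIFT = 28


def cf_table_stride(guard_flags):
    """Additional bytes per Control Flow Guard function table entry."""
    masked = guard_flags & IMAGE_GUARD_CF_FUNCTION_TABLE_SIZE_MASK
    return masked >> IMAGE_GUARD_CF_FUNCTION_TABLE_SIZE_SHIFT


def cf_table_entry_size(guard_flags):
    """Entry size, assuming a 4-byte RVA plus the stride bytes."""
    return 4 + cf_table_stride(guard_flags)


# Hypothetical GuardFlags values: stride of 1, and no stride at all.
print(cf_table_entry_size(0x10014500), cf_table_entry_size(0x00000500))
```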
Additionally, the Windows SDK winnt.h header defines a macro for the number of bits to right-shift the GuardFlags value to right-justify the Control Flow Guard function table stride.
Resources are indexed by a multiple-level binary-sorted tree structure. The general design can incorporate many levels. By convention, however, Windows uses three levels: Type, Name, and Language. A series of resource directory tables relates all of the levels in the following way: Each directory table is followed by a series of directory entries that give the name or identifier (ID) for that level (Type, Name, or Language level) and an address of either a data description or another directory table.
If the address points to a data description, then the data is a leaf in the tree. If the address points to another directory table, then that table lists directory entries at the next level down. A leaf's Type, Name, and Language IDs are determined by the path that is taken through directory tables to reach the leaf. The first table determines Type ID, the second table pointed to by the directory entry in the first table determines Name ID, and the third table determines Language ID.
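The way a leaf inherits its Type, Name, and Language IDs from the path through the tables can be shown with a short recursive walk in Python. The layout used here is the one I understand from the PE format: a 16-byte table header whose last two WORDs count the Name and ID entries, followed by 8-byte entries whose second DWORD has its high bit set when it points to another table, with offsets relative to the start of the resource section. The helper names and the toy section are invented for the example.

```python
import struct

SUBDIR = 0x80000000  # high bit: the offset refers to another directory table


def walk_resources(section, table_offset=0, path=()):
    """Yield (path, data_entry_offset) for every leaf under a table."""
    names, ids = struct.unpack_from("<HH", section, table_offset + 12)
    for i in range(names + ids):
        entry_at = table_offset + 16 + 8 * i
        key, target = struct.unpack_from("<II", section, entry_at)
        # For Name entries, key is an offset to a string, not a numeric ID.
        if target & SUBDIR:
            yield from walk_resources(section, target & ~SUBDIR, path + (key,))
        else:
            yield path + (key,), target


def table(entries):
    """Build a toy directory table that contains only ID entries."""
    header = struct.pack("<IIHHHH", 0, 0, 0, 0, 0, len(entries))
    return header + b"".join(struct.pack("<II", k, t) for k, t in entries)


# Three levels: Type 3, Name 1, Language 1033, leaf data entry at offset 72.
section = (table([(3, SUBDIR | 24)])
           + table([(1, SUBDIR | 48)])
           + table([(1033, 72)]))
leaves = list(walk_resources(section))
print(leaves)
```

Each yielded path lists the IDs in table order, so the first element is the Type, the second the Name, and the third the Language.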
Each resource directory table starts with a fixed-size header. This data structure should be considered the heading of a table because the table actually consists of the directory entries that follow it. The directory entries make up the rows of a table.
Each resource directory entry is either a Name entry or an ID entry. Whether the entry is a Name or ID entry is indicated by the resource directory table, which indicates how many Name and ID entries follow it (remember that all the Name entries precede all the ID entries for the table). All entries for the table are sorted in ascending order: the Name entries by case-sensitive string and the ID entries by numeric value. The resource directory string area consists of Unicode strings, which are word-aligned.
These strings are stored together after the last Resource Directory entry and before the first Resource Data entry. This minimizes the impact of these variable-length strings on the alignment of the fixed-size directory entries.
Each resource directory string consists of a length field followed by the Unicode string data. Each Resource Data entry describes an actual unit of raw data in the Resource Data area.
A Resource Data entry gives the address and size of the data, along with its code page. CLR metadata is stored in the .cormeta section. It is used to indicate that the object file contains managed code. The format of the metadata is not documented, but can be handed to the CLR interfaces for handling metadata. The valid exception handlers of an object are listed in the .sxdata section of that object.
It contains the COFF symbol index of each valid handler, using 4 bytes per index. The COFF archive format provides a standard mechanism for storing collections of object files. These collections are commonly called libraries in programming documentation.
The first 8 bytes of an archive consist of the file signature. The rest of the archive consists of a series of archive members. The first and second members are "linker members."
In a dump of an image file, you can see that the .idata section contains two lists: one of function pointers and one of function names. The ImageHlp library can help with the mapping between file offsets and memory addresses.
The dllimport and dllexport storage-class attributes are Microsoft-specific extensions to the C and C++ languages. You can use them to export and import functions, data, and objects to or from a DLL.
These attributes explicitly define the DLL's interface to its client, which can be the executable file or another DLL. Declaring functions as dllexport eliminates the need for a module-definition (.def) file, at least with respect to the specification of exported functions. If a class is marked __declspec(dllexport), any specializations of class templates in the class hierarchy are implicitly marked as __declspec(dllexport).
When you use code or data from another DLL, you're importing it. When any PE file loads, one of the jobs of the Windows loader is to locate all the imported functions and data and make those addresses available to the file being loaded. I'll save the detailed discussion of data structures used to accomplish this for Part 2 of this article, but it's worth going over the concepts here at a high level. You don't have to do anything to make the addresses of the imported APIs available to your code.
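The loader takes care of it all. The alternative is explicit linking, in which you make sure that the target DLL is loaded and then look up the addresses of its APIs yourself, typically with LoadLibrary and GetProcAddress. The loader also makes sure that any additional DLLs needed by the file being loaded are loaded as well. Likewise, if you import from GDI32.DLL, the DLLs that it depends on are loaded too. When implicitly linking, the resolution process for the main EXE file and all its dependent DLLs occurs when the program first starts. If there are any problems (for example, a referenced DLL that can't be found), the process is aborted.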
When you delayload against a DLL, the linker emits something that looks very similar to the data for a regular imported DLL. However, the operating system ignores this data.
Instead, the first time a call to one of the delayloaded APIs occurs, special stubs added by the linker cause the DLL to be loaded (if it's not already in memory), followed by a call to GetProcAddress to locate the called API.
Additional magic makes it so that subsequent calls to the API are just as efficient as if the API had been imported normally. Within a PE file, there's an array of data structures, one per imported DLL. Each of these structures gives the name of the imported DLL and points to an array of function pointers.
The array of function pointers is known as the import address table (IAT). This last point is particularly important: once a module is loaded, the IAT contains the address that is invoked when calling imported APIs. No matter how many source files you scatter calls to a given API through, all the calls go through the same function pointer in the IAT. Let's examine what the call to an imported API looks like. There are two cases to consider: the efficient way and inefficient way.
In the best case, a call to an imported API is an indirect CALL through a DWORD-sized pointer. If you're not familiar with x86 assembly language, this is a call through a function pointer: whatever value is stored at the given address is where the CALL instruction sends control, and that address lies within the IAT. The less efficient call to an imported API is a direct CALL to a small stub. The stub is a JMP through the same kind of pointer, which again is an entry within the IAT. In a nutshell, the less efficient imported API call uses five bytes of additional code, and takes longer to execute because of the extra JMP.
You're probably wondering why the less efficient method would ever be used. There's a good explanation. Left to its own devices, the compiler can't distinguish between imported API calls and ordinary functions within the same module.
As such, the compiler emits a CALL instruction of the form CALL XXXXXXXX, where XXXXXXXX is an actual code address that the linker fills in later. Note that this last CALL instruction isn't through a function pointer. Rather, it's an actual code address. To keep everything consistent, the linker needs a chunk of code to substitute for XXXXXXXX. The simplest way to do this is to make the call point to a JMP stub, like you just saw. Where does the JMP stub come from? Surprisingly, it comes from the import library for the imported function.
If you were to examine an import library, and examine the code associated with the imported API name, you'd see that it's a JMP stub like the one just described. What this means is that by default, in the absence of any intervention, imported API calls will use the less efficient form. Logically, the next question to ask is how to get the optimized form. The answer comes in the form of a hint you give to the compiler: the __declspec(dllimport) function modifier tells the compiler that the function resides in another DLL, so the compiler can generate the call through the IAT directly.
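So what does this mean in your everyday life? If you're writing exported functions and providing a .H file for them, remember to use the __declspec(dllimport) modifier with the function declarations. The Windows system header files do the same through the DECLSPEC_IMPORT macro, which is defined in WINNT.H and used in files such as WinBase.H.
Now let's dig into the actual format of PE files. I'll start from the beginning of the file, and describe the data structures that are present in every PE file. Afterwards, I'll describe the more specialized data structures (such as imports or resources) that reside within a PE's sections. All of the data structures that I'll discuss are defined in WINNT.H, unless otherwise noted.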
In many cases, there are matching 32-bit and 64-bit data structures. These structures are almost always identical, except for some widened fields in the 64-bit versions. You should only need to use the 32- or 64-bit specific versions of the structures if you're working with a PE file with size characteristics that are different from those of the platform you're compiling for.
Every PE file begins with a small MS-DOS executable. The need for this stub executable arose in the early days of Windows, before a significant number of consumers were running it. When executed on a machine without Windows, the program could at least print out a message saying that Windows was required to run the executable. The differences are so minor that I'll consider them to be the same for the purposes of this discussion.