Geoff Chappell - Software Analyst
Late 1987 brought the formal release of Microsoft’s first extension of DOS into a pre-emptive multi-tasking operating system. That this got very little attention at the time is not insignificant for the history of personal computing but is not the point of this note. The extension has two implementations that differ slightly in that each is specialised to one type of display adapter. These files are named CGA.386 and EGA.386 on the distribution media but are renamed to WIN386.386 when installed. The file format was novel, or at least looks to have been, even after many decades. It was retained for these and other *.386 files in the remaining Windows/386 releases and then seems to have been discarded. This file format’s presentation is very much the point of this note.
Through the whole history of Windows/386, the only files that are known to have this format are:
The four from Windows/386 version 2.10 are also distributed with version 2.11 but they are identical, byte for byte.
Files in this .386 format were surely produced by some sort of linker and the definitive format would be discoverable from studying this linker. What Microsoft used as this linker, however, is not known—well, not to me. Unlike all others of Microsoft’s formats for executables, these have no two-character signature such as MZ, NE, LX, LE or PE.
Instead, if only for now, all that is known of the file format is what relatively little is needed for getting a file loaded. This is done by files that are named *.3EX on the distribution media but are renamed to WIN386.EXE when installed. Let me stress that it is not the business of this note to infer any meaning from observation of the files’ content, only to present what can be deduced with reasonable confidence from the corresponding loaders.
The WIN386.386 file begins with a header of 20h bytes:
|Offset||Size||Description|
|04h||three dwords||each is a size; total must fit available memory|
|18h||word||offset of entry point, from start of loaded program|
|1Ch||byte||low six bits must be 0Ah|
|1Eh||word||0800h bit must be set|
The entry point executes as 16-bit code with real-mode addressing. Registers are undefined, except for cs:ip, necessarily, and ss:sp. This stack is prepared such that execution can return as if from a far call. The loader treats such a return as an error. Although no parameters are passed in registers, the code at this entry point assumes that some data in the loaded program has been prepared by the loader using the symbol table (see below). This data includes a GDT, IDT, TSS and page directory for the program’s protected-mode execution. None of these details are shared through the file format.
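The checks that the loader applies to this first header can be sketched in a few lines of Python. This is only an illustration of the fields tabulated above, not the loader's own code: the function name `check_first_header` and the dictionary keys are invented here, and because the units of the three sizes are not stated, the sketch returns their total for the caller to compare rather than guessing at the comparison.

```python
import struct

FIRST_HEADER_SIZE = 0x20  # the first header is 20h bytes


def check_first_header(data: bytes) -> dict:
    """Extract the documented fields of the first header and apply the two
    bit tests that are described for the loader. Names are illustrative."""
    if len(data) < FIRST_HEADER_SIZE:
        raise ValueError("file is shorter than the 20h-byte first header")

    # 04h: three dwords, each a size (little-endian, as for all x86 data)
    sizes = struct.unpack_from("<3I", data, 0x04)
    # 18h: word, offset of entry point from start of loaded program
    (entry_offset,) = struct.unpack_from("<H", data, 0x18)
    # 1Ch: byte whose low six bits must be 0Ah
    if data[0x1C] & 0x3F != 0x0A:
        raise ValueError("low six bits of byte at 1Ch are not 0Ah")
    # 1Eh: word in which the 0800h bit must be set
    (flags,) = struct.unpack_from("<H", data, 0x1E)
    if not flags & 0x0800:
        raise ValueError("0800h bit of word at 1Eh is clear")

    return {
        "sizes": sizes,
        "total_size": sum(sizes),  # caller compares against available memory
        "entry_offset": entry_offset,
    }
```

A caller would pass the first 20h bytes (or the whole file) and compare `total_size` with whatever memory it has available.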
This first header is followed immediately by a second, which is 2Ch bytes:
|Offset||Size||Description|
|14h||dword||offset of object table from start of file|
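Finding the object table is then a single read. The sketch below assumes that the offset 14h in the table is relative to the start of the second header, which places the dword at file offset 34h. The function name is hypothetical.

```python
import struct

FIRST_HEADER_SIZE = 0x20   # first header, 20h bytes
SECOND_HEADER_SIZE = 0x2C  # second header, 2Ch bytes, follows immediately


def object_table_offset(data: bytes) -> int:
    """Return the file offset of the object table, read from the dword at
    offset 14h of the second header (assumed relative to that header)."""
    if len(data) < FIRST_HEADER_SIZE + SECOND_HEADER_SIZE:
        raise ValueError("file is too short to hold both headers")
    (offset,) = struct.unpack_from("<I", data, FIRST_HEADER_SIZE + 0x14)
    return offset
```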
The object table, for want of a better name, is an array of 20h-byte entries.
|Offset||Size||Description||Applicable Types|
|00h||word||type||types 0002h and 0003h only|
|02h||byte||flags||type 0002h only|
|08h||dword||offset of contents from start of file||types 0002h and 0003h only|
|0Ch||dword||size, in bytes, of contents in file||types 0002h and 0003h only|
|10h||dword||size, in bytes, of loaded object||type 0002h only|
|14h||dword||linear address of loaded object||type 0002h only|
Only one object table entry is sought for each of the types 0002h and 0003h. Object table entries before the first of type 0002h are ignored. Presence of an object table entry of type 0002h is simply assumed. Absence of an object table entry of type 0003h is explicitly an error.
The object table entry of type 0002h must have the 3Bh bits all clear in the flags. The object of type 0002h is the whole of the program’s loaded image, as linked from segments (or sections) such as _TEXT and _DATA. In particular, it may contain code and data, and the code can be both 16-bit and 32-bit. Nothing in the object table entry distinguishes these different types of content.
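The selection rules for object table entries can be put as a short Python sketch. Several things here are assumptions made for illustration only: the number of entries is passed in as a hypothetical `count` parameter, since this note does not say where the loader gets it; the type 0003h entry is looked for only after the first type 0002h entry; and a missing type 0002h entry raises an error, where the loader simply assumes the entry is present. The names `ObjectEntry`, `read_entry` and `find_objects` are invented.

```python
import struct
from typing import NamedTuple, Tuple

ENTRY_SIZE = 0x20      # each object table entry is 20h bytes
TYPE_IMAGE = 0x0002    # the program's loaded image
TYPE_SYMBOLS = 0x0003  # the symbol table


class ObjectEntry(NamedTuple):
    type: int            # 00h word
    flags: int           # 02h byte (meaningful for type 0002h only)
    file_offset: int     # 08h dword
    file_size: int       # 0Ch dword
    loaded_size: int     # 10h dword (type 0002h only)
    linear_address: int  # 14h dword (type 0002h only)


def read_entry(data: bytes, offset: int) -> ObjectEntry:
    type_, flags = struct.unpack_from("<HB", data, offset)
    rest = struct.unpack_from("<4I", data, offset + 0x08)
    return ObjectEntry(type_, flags, *rest)


def find_objects(data: bytes, table_offset: int,
                 count: int) -> Tuple[ObjectEntry, ObjectEntry]:
    """Pick one entry of type 0002h and one of type 0003h.
    `count` is a hypothetical parameter, not a documented field."""
    if table_offset + count * ENTRY_SIZE > len(data):
        raise ValueError("object table runs past end of file")
    image = symbols = None
    for i in range(count):
        entry = read_entry(data, table_offset + i * ENTRY_SIZE)
        if image is None:
            # entries before the first of type 0002h are ignored
            if entry.type == TYPE_IMAGE:
                image = entry
            continue
        if entry.type == TYPE_SYMBOLS and symbols is None:
            symbols = entry
    if image is None:
        # the loader assumes presence; this sketch reports it instead
        raise ValueError("no object table entry of type 0002h")
    if image.flags & 0x3B:
        raise ValueError("3Bh bits are not all clear in type 0002h flags")
    if symbols is None:
        raise ValueError("no object table entry of type 0003h")
    return image, symbols
```

The two entries returned give the file offsets and sizes needed to read the loaded image and the symbol table respectively.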
The object of type 0003h is a symbol table.
Very many items are labelled symbolically, not just items of code and data but sections and constants. Each symbol has a name and a value. For an item that occupies memory, the symbol evaluates to the loaded item’s linear address.
|Offset||Size||Description|
|08h||varies||name as null-terminated string|
The loader knows of many symbols (nearly a hundred) to find by name in this symbol table. It uses this knowledge to seed the program’s loaded image with information determined in real mode, especially about the BIOS and DOS, and with addresses either in the loader or of items that have been prepared by the loader. These addresses are depended on for the program’s protected-mode execution, for its temporary returns to virtual-8086 execution, and for an eventual exit to the loader. All this is here treated as an interface between the loader and the program, not itself as part of the file format except for depending heavily on the symbol table.
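As a rough illustration of this seeding, suppose the symbol table has already been reduced to a mapping from names to values. For a symbol that labels an item in memory, the value is a linear address, so the item's position within the loaded image would be that address less the linear address recorded for the type 0002h object. The following sketch writes one dword on that assumption. The function name, the mapping and the symbol name used in the usage note are all hypothetical, and nothing here reproduces the loader's actual routine.

```python
import struct
from typing import Dict


def seed_dword(image: bytearray, image_base: int,
               symbols: Dict[str, int], name: str, value: int) -> None:
    """Store a 32-bit value at the item labelled by `name`.
    `image_base` is the linear address of the loaded object (offset 14h
    in the type 0002h entry); `symbols` maps names to linear addresses."""
    address = symbols[name]        # KeyError if the symbol is missing
    offset = address - image_base  # assumed: position within the image
    if not 0 <= offset <= len(image) - 4:
        raise ValueError("symbol %r lies outside the loaded image" % name)
    struct.pack_into("<I", image, offset, value)
```

For example, with a made-up symbol `EXAMPLE_ITEM` at linear address 1004h in an image based at 1000h, `seed_dword(image, 0x1000, {"EXAMPLE_ITEM": 0x1004}, "EXAMPLE_ITEM", v)` stores `v` at offset 4 of the image.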