CURRENT WORK ITEM - PREVIEW ONLY

Binary Search Bug in MmGetSystemRoutineAddress

The MmGetSystemRoutineAddress function has a defect in its coding of a binary search through a module’s exported names. The coding error was first corrected, chronologically speaking, for the version 5.2 from Windows Server 2003 SP1, and is also corrected in the version 5.1 from Windows XP SP3. On affected versions, calling this function to get the address of any routine whose name is alphabetically lower than ExAcquireFastMutex will crash Windows with a bugcheck, if the routine happens not to be exported from whichever kernel or HAL happens to be present.

Though this coding error seems not to have got documented by Microsoft, its capability of causing a bugcheck certainly has been well circulated among driver developers. I learnt of it from visiting the OSR Online site, which presents it as an “Interoffice Memorandum” Re: MmGetSystemRoutineAddress is BROKEN from 2007, apparently to summarise discussions dating back to 2006. The function is the sort of thing that you might easily go a whole career without ever thinking to use, depending on what kernel-mode drivers you write. Indeed, Windows did not even offer this function for the five years of early history before Windows 2000, and even when the function was introduced it wasn’t documented immediately. Yet when the function is needed, it’s likely needed very much and will certainly be needed to work properly.

Defective Code

The coding error is actually in a subroutine, named MiFindExportedRoutineByName, which exists only as a helper to the MmGetSystemRoutineAddress function. The following representation in C will be very like what Microsoft had for this subroutine’s source code when the function was introduced for Windows 2000. I assume that definitions for using the export directory were available from an early version of the same header file, NTIMAGE.H, that Microsoft eventually made public in the Windows XP DDK. The code depends on a structure and an exported function that remain undocumented. Microsoft’s source code would presumably pick up the definition and declaration from yet more header files.

#define     _NTSYSTEM_
#include    <ntddk.h>
#include    <ntimage.h>         // not supplied with Windows 2000 DDK

typedef struct _LDR_DATA_TABLE_ENTRY {
    LIST_ENTRY InLoadOrderLinks;
    LIST_ENTRY InMemoryOrderLinks;
    LIST_ENTRY InInitializationOrderLinks;
    PVOID DllBase;
    /*  etc  */
} LDR_DATA_TABLE_ENTRY;

NTSYSAPI
PVOID
NTAPI
RtlImageDirectoryEntryToData (
    PVOID,
    BOOLEAN,
    ULONG,
    ULONG *);

#define RvaToPtr(base,rva) ((PCHAR) (base) + (rva))

PVOID
MiFindExportedRoutineByName (
    LDR_DATA_TABLE_ENTRY *Ldr,
    PANSI_STRING Name)
{
    PVOID base;
    IMAGE_EXPORT_DIRECTORY *exp;
    ULONG size;
    ULONG *names;
    USHORT *ords;
    ULONG low, high, n;
    int cmp;
    USHORT ord;
    ULONG *funcs;

    base = Ldr -> DllBase;
    exp = (IMAGE_EXPORT_DIRECTORY *) RtlImageDirectoryEntryToData (
            base, TRUE, IMAGE_DIRECTORY_ENTRY_EXPORT, &size);

    names = (ULONG *) RvaToPtr (base, exp -> AddressOfNames);
    ords = (USHORT *) RvaToPtr (base, exp -> AddressOfNameOrdinals);

    low = 0;
    high = exp -> NumberOfNames - 1;
    while (high >= low) {
        n = (low + high) >> 1;
        cmp = strcmp (Name -> Buffer, (PSTR) RvaToPtr (base, names [n]));
        if (cmp < 0) {
            high = n - 1;
        }
        else if (cmp > 0) {
            low = n + 1;
        }
        else {
            break;
        }
    }
    if (high < low) return NULL;

    ord = ords [n];
    if (ord >= exp -> NumberOfFunctions) return NULL;

    funcs = (ULONG *) RvaToPtr (base, exp -> AddressOfFunctions);
    return (PVOID) RvaToPtr (base, funcs [ord]);
}

If you fancy that bugs are best spotted by reading source code, then please indulge me by spotting what’s wrong with this code before reading on. If you think the defect stands out, then consider whether that’s because you have been alerted to its presence.

The main problem is the use of unsigned variables for the low and high indices in the binary search. Different programmers have very different attitudes to the use of signed verus unsigned integral types. To me, perhaps from having started my programming with 8086 assembly language, the indices into the export directory’s array of RVAs for names look intrinsically unsigned, as much as do the RVAs themselves. Yet some algorithms are more easily coded if indices are signed. A binary search is one such case. The coding above is natural if using signed indices. But with unsigned indices, the subtraction when the string comparison is negative risks an underflow, which will not be caught by the unsigned comparison that governs the loop. If the indices are to be unsigned, then something must be done to guard against the underflow. Alternatively, the function should switch to signed indices.

Confused Signage

This seems as good a place as any for a diversion on the potential usefulness of reading binary code even when source code is available. Really, this article exists mainly as an excuse to wheel out this hobby horse.

In binary code, it can’t help but stand out that the loop’s governing condition is not tested on entry but is tested twice at exit. Indeed, the reader of binary code who translates to C, if only mentally, would most naturally interpret this code in terms of a do loop:

do {
    n = (low + high) >> 1;
    cmp = strcmp (Name -> Buffer, (PSTR) RvaToPtr (base, names [n]));
    if (cmp < 0) {
        high = n - 1;
    }
    else if (cmp > 0) {
        low = n + 1;
    }
    else {
        break;
    }
} while (high >= low);
if (high < low) return NULL;

and immediately wonder at the clumsiness. Any programmer who could write the preceding code, truly intending that the loop condition not be tested on entry, might have noticed the duplication of comparisons at the end and have simplified to:

for (;;) {
    n = (low + high) >> 1;
    cmp = strcmp (Name -> Buffer, (PSTR) RvaToPtr (base, names [n]));
    if (cmp < 0) {
        high = n - 1;
    }
    else if (cmp > 0) {
        low = n + 1;
    }
    else {
        break;
    }
    if (high < low) return NULL;
}

Since this evidently did not happen, there is a suggestion that the programmer actually did intend to test the loop condition on entry, in a while loop, and thought it meaningful to do so. See that this is distinct from a programmer who codes a while or for loop knowing full well that the governing condition is satisfied trivially on entry. Many programmers do that, sometimes thinking it’s clever but more often just as a preference in style (for while and for loops over do loops). Here, the inference is that the programmer may not have realised that the governing condition is satisfied trivially on entry, which suggests in turn that the programmer may not have been fully alert to the implications of using unsigned rather than signed integers. That the condition is not tested on entry but is tested twice on exit is not of itself an error, but it is a cue for the reverse engineer or code reviewer to look deeper.

You can get to the same cue, of course, from the source code—but not nearly as easily and reliably. Just to know whether comparisons are unsigned or signed requires the source-code reviewer to check the types of both operands. The careful reviewer will aim to do that, always, but it does require sustained attention and is therefore prone to get overlooked. Only by seeing that the comparison is unsigned and that one operand is zero will the source-code reviewer realise that the while loop’s condition is satisfied trivially on entry and may as well not be there. Even then, nothing will seem strange unless the reviewer realises the implication that the condition is redundantly tested twice on exit. Only now has the source-code reviewer caught up, to begin inferring that the code may not be exactly what its programmer intended.

Note, by the way, that the triviality of the loop condition on entry is arguably a coding lapse by itself, specifically for allowing an underflow when computing the initial value of the high index for the search. This is unimportant in practice because the code will execute only for the kernel and HAL: if the export directory for either of these has no names, then this underflow will be the least of anyone’s problems. Still, solid code would defend against this.

Attempted Fix

That something was wrong with the initial code’s binary-search algorithm, and specifically with whether to use signed or unsigned indices, appears to have been realised for Windows XP. The significant editing is highlighted. See that the code has changed in a few places for various reasons.

PVOID
MiFindExportedRoutineByName (
    PVOID Base,
    PANSI_STRING Name)
{
    IMAGE_EXPORT_DIRECTORY *exp;
    ULONG size;
    ULONG *names;
    USHORT *ords;
    ULONG low, high, n;
    int cmp;
    USHORT ord;
    ULONG *funcs;

    exp = (IMAGE_EXPORT_DIRECTORY *) RtlImageDirectoryEntryToData (
            Base, TRUE, IMAGE_DIRECTORY_ENTRY_EXPORT, &size);
    if (exp == NULL) return NULL;

    names = (ULONG *) RvaToPtr (Base, exp -> AddressOfNames);
    ords = (USHORT *) RvaToPtr (Base, exp -> AddressOfNameOrdinals);

    low = 0;
    high = exp -> NumberOfNames - 1;
    while (high >= low) {
        n = (low + high) >> 1;
        cmp = strcmp (Name -> Buffer, (PSTR) RvaToPtr (Base, names [n]));
        if (cmp < 0) {
            high = n - 1;
        }
        else if (cmp > 0) {
            low = n + 1;
        }
        else {
            break;
        }
    }
    if ((LONG) high < (LONG) low) return NULL;

    ord = ords [n];
    if (ord >= exp -> NumberOfFunctions) return NULL;

    funcs = (ULONG *) RvaToPtr (Base, exp -> AddressOfFunctions);
    return (PVOID) RvaToPtr (Base, funcs [ord]);
}

Notably, the interface between MmGetSystemRoutineAddress and MiFindExportedRoutineByName is tidied, so that the latter is given just what it needs from the undocumented structure that holds data about loaded modules. Less notably, someone has thought it better to check that the module actually does have an export directory. But what stands out are those casts to LONG when testing the loop’s exit. They stand out so much that my first thought when representing this function as source code was that there must be an inlined subroutine so that the casting can be implicit. For example:

__forceinline
static
ULONG
SearchExportedNames (
    PCSTR Name,
    PVOID Base,
    ULONG *ExportedNames,
    ULONG *Low,
    ULONG *High)
{
    ULONG n;
    int cmp;
    while (*High >= *Low) {
        n = (*Low + *High) >> 1;
        cmp = strcmp (Name, (PSTR) RvaToPtr (Base, ExportedNames [n]));
        if (cmp < 0) {
            *High = n - 1;
        }
        else if (cmp > 0) {
            *Low = n + 1;
        }
        else {
            break;
        }
    }
    return n;
}

PVOID
__stdcall
MiFindExportedRoutineByName (
    PVOID Base,
    PANSI_STRING Name)
{
    IMAGE_EXPORT_DIRECTORY *exp;
    ULONG size;
    ULONG *names;
    USHORT *ords;
    LONG low, high, n;
    USHORT ord;
    ULONG *funcs;

    exp = (IMAGE_EXPORT_DIRECTORY *) RtlImageDirectoryEntryToData (
            Base, TRUE, IMAGE_DIRECTORY_ENTRY_EXPORT, &size);
    if (exp == NULL) return NULL;

    names = (ULONG *) RvaToPtr (Base, exp -> AddressOfNames);
    ords = (USHORT *) RvaToPtr (Base, exp -> AddressOfNameOrdinals);

    low = 0;
    high = exp -> NumberOfNames - 1;
    n = SearchExportedNames (Name -> Buffer, Base, names, &low, &high);
    if (high < low) return NULL;

    ord = ords [n];
    if (ord >= exp -> NumberOfFunctions) return NULL;

    funcs = (ULONG *) RvaToPtr (Base, exp -> AddressOfFunctions);
    return (PVOID) RvaToPtr (Base, funcs [ord]);
}

Here, the inlined subroutine uses only unsigned indices and its caller uses only signed indices. The mismatch might easily be overlooked, both when writing the code and in any amount of reviewing the source code. Though the stricter type checking of C++ would make an error (C2664) of the mismatch, the only alert from the C compiler is a warning (C4057) at level 4, which would be noticed by hardly anyone. Even this mismatch of argument types is avoidable through a further variation:

__forceinline
static
LONG
SearchExportedNames (
    PCSTR Name,
    PVOID Base,
    ULONG *ExportedNames,
    LONG *Low,
    LONG *High)
{
    ULONG low = *Low, n, high = *High;
    int cmp;
    while (high >= low) {
        n = (low + high) >> 1;
        cmp = strcmp (Name, (PSTR) RvaToPtr (Base, ExportedNames [n]));
        if (cmp < 0) {
            high = n - 1;
        }
        else if (cmp > 0) {
            low = n + 1;
        }
        else {
            break;
        }
    }
    *Low = low;
    *High = high;
    return n;
}

Compile with this for the inlined subroutine, whether as C or C++, and the only warnings even at level 4 are from Microsoft’s own header files. The hypothesis for this last variation would be that the declaration of local variables as unsigned in the inlined subroutine is just one careless mistake by a programmer who did intend to use signed arithmetic. It’s even imaginable that the programmer started with code that uses unsigned indices throughout, realised the algorithm depends on signed arithmetic for correct operation, set about fixing it by changing ULONG to LONG for all indices, but overlooked one occurrence. We’ve all done something similar in a quick edit, and been grateful to have the compiler pick it up. This is just an editing oversight that the compiler won’t have caught.

All the preceding variations are plausible. If you compile any of them with /Oxs optimisation (and /Gz to have __stdcall as the default calling convention) using the compiler and headers from the DDK version 2600.1106 for Windows XP SP1, then the assembly-language listing you get is an exact match with the binary code in the NTOSKRNL.EXE build 5.1.2600.1106 from Windows XP SP1.1

Of course, the variations with the inlined subroutine are undeniably contrived, if not downright ugly. Though the binary search is entirely in the subroutine, the caller is left to test for failure by inspecting how the subroutine has changed the indices that were given as the search’s range. Yet removal of a binary search algorithm to an inlined subroutine does have a precedent elsewhere in the kernel, specifically in the LookupEntryPoint subroutine which the kernel uses when loading NTDLL.DLL. Despite the contrivance then, the inlined subroutine is not so easy to reject, not least for its merit of making the mixing of signed and unsigned relatively easy to explain as one or another simple oversight.

Whatever the coding for these early builds of Windows XP, note again that the reviewer of binary code has potential advantages over the source-code reviewer. You could not sensibly think yourself competent as a reverse engineer if you could read the binary code for MiFindExportedRoutineByName in these builds but miss the mixing of signed and unsigned arithmetic. It stands out so much when reading assembly-language mnemonics because the conditional jump for a comparison is a different instruction if the arithmetic is signed rather than unsigned. The difference in treatment for the same variables in successive comparisons is an immediate alert. If the difference comes from implicit casts in source code, then it will be spotted only with careful attention. Even with explicit casts, the potential is that source code will have comments: the programmer who inserted those casts may have left an explanation, which a reviewer might easily accept too readily.

First Fix

Remember that the main problem is that the binary search algorithm is coded with unsigned arithmetic when signed is needed, with a side-effect that the high index for the search is computed without allowing for an underflow in the unusual case that the module has no exported names. The straightforward fix is to change to signed arithmetic, which is what Microsoft did for Windows Server 2003 SP1:

PVOID
MiFindExportedRoutineByName (
    PVOID Base,
    PANSI_STRING Name)
{
    IMAGE_EXPORT_DIRECTORY *exp;
    ULONG size;
    ULONG *names;
    USHORT *ords;
    LONG low, high, n;
    int cmp;
    USHORT ord;
    ULONG *funcs;

    exp = (IMAGE_EXPORT_DIRECTORY *) RtlImageDirectoryEntryToData (
            Base, TRUE, IMAGE_DIRECTORY_ENTRY_EXPORT, &size);
    if (exp == NULL) return NULL;

    names = (ULONG *) RvaToPtr (Base, exp -> AddressOfNames);
    ords = (USHORT *) RvaToPtr (Base, exp -> AddressOfNameOrdinals);

    low = 0;
    high = exp -> NumberOfNames - 1;
    while (high >= low) {
        n = (low + high) >> 1;
        cmp = strcmp (Name -> Buffer, (PSTR) RvaToPtr (Base, names [n]));
        if (cmp < 0) {
            high = n - 1;
        }
        else if (cmp > 0) {
            low = n + 1;
        }
        else {
            break;
        }
    }
    if (high < low) return NULL;

    ord = ords [n];
    if (ord >= exp -> NumberOfFunctions) return NULL;

    funcs = (ULONG *) RvaToPtr (Base, exp -> AddressOfFunctions);
    return (PVOID) RvaToPtr (Base, funcs [ord]);
}

Using the compiler and headers from the DDK version 3790.1830 for Windows Server 2003 SP1, the assembly-language listing you get for the x86 architecture is an exact match with the binary code in the NTOSKRNL.EXE build 5.2.3790.1830 from Windows Server 2003 SP1, subject only to rearrangement of a few branches which I presume is an outcome of profile-guided optimisation (PGO). In the 64-bit kernel from both SP1 and SP2 of Windows Server 2003, MiFindExportedRoutineByName is itself inlined into MmGetSystemRoutineAddress, as are several other functions, and matching exactly with binary code is impractical enough that I have not attempted it.

Second Fix

A coding error that can cause MmGetSystemRoutineAddress to crash Windows clearly needed to be fixed not just in the first service pack of version 5.2 to follow the bug’s first successful fix but also in the next service pack of version 5.1, i.e., for Windows XP SP3. Compile the following with /Oxs optimisation using the compiler and headers from the DDK version 3790.1830 for Windows Server 2003 SP1 and the binary code you get is exactly what’s in the NTOSKRNL.EXE build 5.1.2600.5512 from Windows XP SP3, subject to rearrangement of a few branches because of PGO.

PVOID
MiFindExportedRoutineByName (
    PVOID Base,
    PANSI_STRING Name)
{
    IMAGE_EXPORT_DIRECTORY *exp;
    ULONG size;
    ULONG *names;
    USHORT *ords;
    ULONG low, high, n;
    int cmp;
    USHORT ord;
    ULONG *funcs;

    exp = (IMAGE_EXPORT_DIRECTORY *) RtlImageDirectoryEntryToData (
            Base, TRUE, IMAGE_DIRECTORY_ENTRY_EXPORT, &size);
    if (exp == NULL) return NULL;

    names = (ULONG *) RvaToPtr (Base, exp -> AddressOfNames);
    ords = (USHORT *) RvaToPtr (Base, exp -> AddressOfNameOrdinals);

    low = 0;
    high = exp -> NumberOfNames - 1;
    while (high >= low) {
        n = (low + high) >> 1;
        cmp = strcmp (Name -> Buffer, (PSTR) RvaToPtr (Base, names [n]));
        if (cmp < 0) {
            if (n == 0) return NULL;
            high = n - 1;
        }
        else if (cmp > 0) {
            low = n + 1;
        }
        else {
            break;
        }
    }
    if ((LONG) high < (LONG) low) return NULL;

    ord = ords [n];
    if (ord >= exp -> NumberOfFunctions) return NULL;

    funcs = (ULONG *) RvaToPtr (Base, exp -> AddressOfFunctions);
    return (PVOID) RvaToPtr (Base, funcs [ord]);
}

This sees the retention of unsigned arithmetic for the algorithm but with a check for underflow as a new sort of failure for the search. It also sees the retention of a signed comparison after the loop, which suggests to me that this fix was devised by editing the pre-existing code for version 5.1 without reference to the chronologically earlier fix for version 5.2.


[1] Windows XP SP1 is specially convenient for present purposes since MiFindExportedRoutineByName is contiguous as binary code. In most builds, the binary code is rearranged and scattered because of Profile Guided Optimization (PGO).