Lookaside Lists

Software doesn’t have to be very complex before its performance-aware programmer wonders whether some of its dynamic memory allocations come and go with such frequency that they might better be freed not by returning them to the operating system’s memory manager but instead by caching them for quick reuse. This is especially attractive when the allocations for a particular purpose are all the same size, so that the cache for that purpose need be nothing more sluggish than a single-linked list. In a simple implementation, allocations are sought from the cache first and only then from the memory manager, and freed memory goes only to the cache. This works well enough when the demand is persistent, but more sophistication is needed when demand has significant peaks, between which the cached allocations become a large amount of memory that is unavailable for other use.

The Windows kernel certainly counts as complex software, especially in combination with its ecology of drivers that execute in kernel mode, and so even version 3.10 provides some sort of caching of fixed-size memory allocations. That first draft, based around exported functions such as ExInitializeZone and macros such as ExAllocateFromZone, is perhaps better described as batching its memory allocations. It also is perhaps better passed over. Though the code that supports this early feature is retained to this day, the relevant functions and macros were documented as obsolete as early as the Device Driver Kit (DDK) for Windows 2000 in favour of the lookaside lists that had been introduced for Windows NT 4.0.

Supporting Structures

The functions that operate on general lookaside lists work through several structures:

The last of these was introduced for version 6.0, apparently to unify the others and supersede them. All are documented as opaque, much of the point being that all manipulation of them (if not all useful inspection of them) can be done through the documented functions. If you looked only at the sizes, you might think the structures have seen significant change:

Version NPAGED_LOOKASIDE_LIST PAGED_LOOKASIDE_LIST LOOKASIDE_LIST_EX
Size (x86) Size (x64) Size (x86) Size (x64) Size (x86) Size (x64)
4.0 to 5.0 0x50   0x60      
5.1 to early 5.2 0x0100   0x0100      
late 5.2 0xC0 0x80 0xC0 0x80    
6.0 and higher 0xC0 0x80 0xC0 0x80 0x48 0x60

It happens, however, that the variation is all explained by changes in alignment requirements of one member, starting at 0x80 bytes when introduced for Windows XP and then reduced to 0x40 for Windows Server 2003 SP1.

General Lookaside List

All three public structures for lookaside lists are based in some sense on what was originally the GENERAL_LOOKASIDE but has since Windows Vista been named the GENERAL_LOOKASIDE_POOL.

Version GENERAL_LOOKASIDE GENERAL_LOOKASIDE_POOL
Size (x86) Size (x64) Size (x86) Size (x64)
4.0 to 5.0 0x48      
5.1 to early 5.2 0x80 0x80    
6.0 and higher 0x80 0x80 0x48 0x60

The plain GENERAL_LOOKASIDE became larger in version 5.1 because the structure was made cache-aligned. The GENERAL_LOOKASIDE_POOL was introduced, without cache alignment, for Windows Vista in a reworking of lookaside lists that are dedicated to per-processor management of small pool allocations. In Windows Vista and since, each KPRCB for a processor has 0x40 or 0x60 of these lookaside lists in two or three arrays. Cache alignment of each structure in the array would waste a lot of space. Thus did the unaligned layout from version 4.0 become restored but need a new name.

Offset (x86) Offset (x64) Definition Versions
0x00 0x00
SLIST_HEADER ListHead;
4.0 to 5.2
union {
    SLIST_HEADER ListHead;
    SINGLE_LIST_ENTRY SingleListHead;
};
6.0 and higher
0x08 0x10
USHORT Depth;
4.0 and higher
0x0A  
USHORT Pad;
4.0 only
0x12
USHORT MaximumDepth;
5.0 and higher
0x0C 0x14
ULONG TotalAllocates;
4.0 and higher
0x10  
ULONG AllocateMisses;
4.0 only
0x18
union {
    ULONG AllocateMisses;
    ULONG AllocateHits;
};
5.0 and higher
0x14 0x1C
ULONG TotalFrees;
4.0 and higher
0x18  
ULONG FreeMisses;
4.0 only
0x20
union {
    ULONG FreeMisses;
    ULONG FreeHits;
};
5.0 and higher
0x1C 0x24
POOL_TYPE Type;
4.0 and higher
0x20 0x28
ULONG Tag;
4.0 and higher
0x24 0x2C
ULONG Size;
4.0 and higher
0x28 0x30
PALLOCATE_FUNCTION Allocate;
4.0 to 5.2
union {
    PALLOCATE_FUNCTION_EX AllocateEx;
    PALLOCATE_FUNCTION Allocate;
};
6.0 and higher
0x2C 0x38
PFREE_FUNCTION Free;
4.0 to 5.2
union {
    PFREE_FUNCTION_EX FreeEx;
    PFREE_FUNCTION Free;
};
6.0 and higher
0x30 0x40
LIST_ENTRY ListEntry;
4.0 and higher
0x38 0x50
ULONG LastTotalAllocates;
4.0 and higher
0x3C  
ULONG LastAllocateMisses;
4.0 only
0x54
union {
    ULONG LastAllocateMisses;
    ULONG LastAllocateHits;
};
5.0 and higher
0x40 0x58
ULONG Future [2];
4.0 and higher

Though the word at offset 0x0A is defined as Pad for version 4.0, it is in fact used as the MaximumDepth all along.

The pool lookaside lists that are dedicated to per-processor management of small pool allocations date from version 5.0. They inherit a slightly different implementation from version 4.0 (see below) in which AllocateMisses, FreeMisses and LastAllocateMisses are instead AllocateHits, FreeHits and LastAllocateHits.

Initialisation of a lookaside list allows the specification of callback functions. In lookaside lists that are initialised by ExInitializeLookasideListEx, these callbacks are slightly different, such that Allocate and Free are instead AllocateEx and FreeEx.

The Non-Paged Lookaside List

The NPAGED_LOOKASIDE_LIST is for non-paged memory that can be allocated and freed even at DISPATCH_LEVEL. It elaborates on the GENERAL_LOOKASIDE by adding a spin lock that provided early versions with synchronisation for access to the ListHead.

Offset (x86) Offset (x64) Definition Versions
0x00 0x00
GENERAL_LOOKASIDE L;
4.0 and higher
0x48 (4.0 to 5.0);
0x80
 
KSPIN_LOCK Lock;
4.0 to 5.0
KSPIN_LOCK Lock__ObsoleteButDoNotDelete;
5.1 and higher

The spin lock never was defined for 64-bit Windows, which never has needed external synchronisation for operations on an SLIST_HEADER. The spin lock became unnecessary for 32-bit Windows too, once Windows XP required the cmpxchg8b instruction. Though the spin lock could not simply be dropped from the definition (lest a new driver with a new NPAGED_LOOKASIDE_LIST find itself running on an older Windows version that would expect to initialise and use the spin lock), its explicit retention has been nothing but waste since space for the spin lock is covered by the contemporaneous introduction of cache alignment.

The Paged Lookaside List

Memory that is managed in a paged lookaside list is allocated and freed at no higher than APC_LEVEL. The PAGED_LOOKASIDE_LIST is a GENERAL_LOOKASIDE plus a mutex instead of a spin lock.

Offset (x86) Offset (x64) Definition Versions
0x00 0x00
GENERAL_LOOKASIDE L;
4.0 and higher
0x48 (4.0 to 5.0);
0x80
 
FAST_MUTEX Lock;
4.0 to 5.0
FAST_MUTEX Lock__ObsoleteButDoNotDelete;
5.1 and higher

Again the mutex never was defined for 64-bit Windows and its explicit retention for 32-bit Windows XP and higher is nothing but waste (because of the structure’s expansion for cache alignment).

The Extended Lookaside List

The LOOKASIDE_LIST_EX, which to some extent replaces both the NPAGED_LOOKASIDE_LIST and PAGED_LOOKASIDE_LIST, is accessed only through functions that are new for Windows Vista and therefore can throw away what were anyway the redundant provisions for backwards compatibility with Windows 2000 and earlier.

Offset (x86) Offset (x64) Definition Versions
0x00 0x00
GENERAL_LOOKASIDE_POOL L;
6.0 and higher

Perhaps more importantly for programming practice is that the LOOKASIDE_LIST_EX also throws away the other structures’ cache alignment. Drivers that want a cache-aligned lookaside list for performance must arrange the alignment themselves or revert to the older structures (and functions).

The Original Pool Lookaside List

If only for completeness, it’s perhaps as well to pick up some history. The lookaside lists that the kernel uses internally to optimise small pool allocations were originally reductions of the public structure. Microsoft’s name for the reduced structure is not known. Names and types of members are inferred by assuming that the GENERAL_LOOKASIDE for version 5.0 incorporated them straightforwardly.

Offset (x86) Definition
0x00
SLIST_HEADER ListHead;
0x08
USHORT Depth;
0x0A
USHORT MaximumDepth;
0x0C
ULONG TotalAllocates;
0x10
ULONG AllocateHits;
0x14
ULONG TotalFrees;
0x18
ULONG FreeHits;
0x1C
ULONG LastTotalAllocates;
0x20
ULONG LastAllocateHits;
0x24
KSPIN_LOCK Lock;

The version 4.0 kernel’s .data section has two arrays of these, for nonpaged and paged allocations respectively. The eight structures in each array cache successively larger allocations from 0x20 bytes to 0x0100.