HV_UINT128

From one perspective, this page’s existence to point out one small thing about one small structure (well, union) is undeniably petty. From another, it’s the simplest imaginable example that the real-world practice of reverse engineering Windows, even by its top practitioners, is deeply deficient as scholarly research.

Of course, almost all study of Windows is much more aptly described as hacking than as research. If hacking is what you aim for or think is all the world needs for knowing about the extant operating system for the computing to which our ready and safe access is increasingly just assumed, then please do write off this page as petty.

The HV_UINT128 (formally _HV_UINT128) is a small type in the tradition of the age-old LARGE_INTEGER. It packages, as something a little like an integer, a value that is too large for the compiler of the time to treat as an actual integer. When the LARGE_INTEGER was devised, evidently well ahead of the release of Windows NT 3.1, Microsoft’s compiler did not yet have a 64-bit integral type, but the new Windows kernel needed to be able to work with physical (memory) addresses and byte offsets (into disk storage) that exceed the 4GB range of a 32-bit integer. For a semblance of a 64-bit integer, Microsoft defined an aggregate of two 32-bit integers, one as the LowPart, the other as the HighPart, and made a union of their structure. The union was perhaps primarily in anticipation of eventually having a 64-bit integer to define as the QuadPart, but its immediate use was the artifice of union with a double as the means of getting 8-byte alignment for the 64-bit pseudo-integer (the compiler being still many years from having a __declspec for specifying unusual alignment requirements). The HV_UINT128 is similar in that it composes a 128-bit type from two 64-bit integers with the intention that the whole may be treated vaguely like an integer.
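As a rough illustration of the union trick just described, here is a sketch in standard C11 of a 64-bit pseudo-integer that is built from two 32-bit parts and borrows a double purely for its alignment. The names PSEUDO_INT64, AlignmentDonor and both functions are invented for this sketch and are not Microsoft’s. That a double is 8 bytes with 8-byte alignment is an assumption: it holds for Microsoft’s compilers and for the usual 64-bit ABIs, but not everywhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a LARGE_INTEGER-like type; not Microsoft's definition. */
typedef union PSEUDO_INT64 {
    struct {
        uint32_t LowPart;   /* low 32 bits */
        int32_t  HighPart;  /* high 32 bits, carrying the sign */
    };
    double AlignmentDonor;  /* never used as a number; present only for its alignment */
} PSEUDO_INT64;

/* Assumes an 8-byte double, so that the union is exactly as big as the two parts. */
static_assert(sizeof(PSEUDO_INT64) == 8, "pseudo-integer should occupy 8 bytes");

/* The union takes its alignment from its most strictly aligned member. */
size_t pseudo_int64_alignment(void)
{
    return _Alignof(PSEUDO_INT64);
}

/* Reassemble the value that the two parts represent. A modern 64-bit type is
   used for the result only so that the sketch has something to return. */
int64_t pseudo_int64_value(const PSEUDO_INT64 *p)
{
    return (int64_t)p->HighPart * 4294967296LL + (int64_t)p->LowPart;
}
```

With LowPart set to 0xFFFFFFFF and HighPart to -1, pseudo_int64_value returns -1, which is the sense in which the pair behaves vaguely like one signed integer.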

The first publication of the HV_UINT128 by Microsoft that I know of is in a header named HVGDK.H which Microsoft distributed in the Windows Driver Kit (WDK) for Windows 7. It was then a structure:

typedef struct DECLSPEC_ALIGN (16) _HV_UINT128 {

    UINT64 Low64;
    UINT64 High64;

} HV_UINT128, *PHV_UINT128;
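To see what the DECLSPEC_ALIGN (16) contributes, here is a small comparison in standard C11, which has no __declspec. It assumes that the macro expands to Microsoft’s __declspec(align(16)) and imitates the effect with _Alignas on the first member. Both type names and both functions are invented for the comparison.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Two 64-bit members with no alignment request: natural alignment only. */
typedef struct PlainPair {
    uint64_t Low64;
    uint64_t High64;
} PlainPair;

/* Same members, but 16-byte alignment is requested through the first member.
   This stands in for DECLSPEC_ALIGN (16), which standard C cannot spell. */
typedef struct AlignedPair {
    _Alignas(16) uint64_t Low64;
    uint64_t High64;
} AlignedPair;

/* The alignment request changes neither the size nor the member offsets. */
static_assert(sizeof(PlainPair) == 16 && sizeof(AlignedPair) == 16,
              "both structures should occupy 16 bytes");
static_assert(offsetof(AlignedPair, High64) == 8,
              "High64 should follow Low64 directly");

size_t plain_pair_alignment(void)   { return _Alignof(PlainPair); }
size_t aligned_pair_alignment(void) { return _Alignof(AlignedPair); }
```

The layout of the two members is the same either way. What differs is only where the compiler may place the structure as a whole.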

The Windows 7 WDK looks to be the last in which programmers in general got to see HVGDK.H from Microsoft. For the Windows 8 WDK, documentation was not included but was separately downloadable to merge with Visual Studio 2012. The reference pages for the Hypervisor mention the HV_UINT128 in documentation of larger (actually interesting) types. Throughout this documentation, programmers are directed to HVGDK.H as the header to #include for use of these types in their code, but the WDK itself has no such header. The oversight may have been not that the header was missing but that the documentation was retained: it’s gone from what Microsoft supplied for merging with Visual Studio 2013.

Ask Google today, 24th November 2022, to search the web for mention of “hvgdk.h” and what you get first is nothing from Microsoft but is instead an attempted reconstruction of an updated HVGDK.H by Alex Ionescu for an (unofficial) Hyper-V Development Kit. Of course this can only have been worth Alex’s trouble because he too sees that Microsoft’s original programmatic support has been withdrawn. Others of Google’s search results confirm that programmers, though surely not many, have indeed wondered where Microsoft’s header has disappeared to. The answer has even picked up some sense of pursuit and rumour, what with talk of updated versions in something called the Singularity OS (whose availability is restricted in ways that do not ordinarily count as publication) and at the other end of the history a report that HVGDK.H dates from the “WDK 6.0” (not that I can find it in my archived copy). Whatever. Even if Microsoft’s HVGDK.H had wider distribution than just the Windows 7 WDK, the thing evidently has been unavailable for long enough that someone who wants programmatic interaction with modern versions of Hyper-V has found it worthwhile to write and publish an “unofficial” replacement. We might even call the effort public-spirited.

So, what’s my point of criticism?

First, the facts of the matter. The reconstructed HVGDK.H reproduces the definition from Microsoft’s HVGDK.H such as it’s known from the Windows 7 WDK. And what else would anyone do without evidence of a change! But as long ago as Windows 8, Microsoft had not only moved HV_UINT128 to a new header, named HVGDK_MINI.H, but had changed it from a struct to a union and given it a third member. The C-language definition ever since must be something very like:

typedef union DECLSPEC_ALIGN (16) _HV_UINT128 {
    
    struct {
        UINT64 Low64;
        UINT64 High64;
    };

    UINT32 Dword [4];

} HV_UINT128;

This is knowable from type information in a statically linked library named CLFSMGMT.LIB which Microsoft distributes with the Software Development Kit (SDK), starting as long ago as Windows Vista. This type information even tells that the two opening braces in the definition are two lines apart in Windows 8 through to the 1511 release of Windows 10 (by which time they’re on lines 82 and 84) but then become three lines apart. The distance to the definition of the next type (HV_UINT256) strongly suggests another two lines of white space around the new member.

Second, my appraisal. Does it matter that Alex’s reconstruction misses this change? Almost certainly not, especially if we assess the reconstruction less for whether it’s thorough as reverse engineering than for whether it’s useful to its target audience of programmers and security researchers. Definitions in the reconstructed header can stray in all sorts of ways from Microsoft’s without any programmer being much put out, even for types that are much more important than HV_UINT128. Not having Microsoft’s extra member for alternatively accessing the two qwords as four dwords is in practice neither here nor there.
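For a sense of what the extra member offers, the following sketch mirrors the union in portable C11 and reads the same 16 bytes both ways. The type name HvUint128Sketch and both helper functions are invented here, and _Alignas stands in for DECLSPEC_ALIGN (16). That the dwords map to the low and high halves of each qword in this order assumes a little-endian machine, which is my assumption about the target, not something the type information states.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical portable mirror of the union; not Microsoft's header text. */
typedef union HvUint128Sketch {
    struct {
        _Alignas(16) uint64_t Low64;  /* stands in for DECLSPEC_ALIGN (16) */
        uint64_t High64;
    };
    uint32_t Dword[4];                /* the same 16 bytes as four dwords */
} HvUint128Sketch;

static_assert(sizeof(HvUint128Sketch) == 16, "both views cover the same 16 bytes");

/* Read dword i (0 to 3) through the union's third member. */
uint32_t dword_via_union(const HvUint128Sketch *v, unsigned i)
{
    return v->Dword[i & 3];
}

/* Compute the same dword from the qword members alone.
   Dwords 0 and 1 come from Low64, dwords 2 and 3 from High64 (little-endian). */
uint32_t dword_via_shifts(const HvUint128Sketch *v, unsigned i)
{
    uint64_t q = (i & 2) ? v->High64 : v->Low64;
    return (uint32_t)(q >> ((i & 1) * 32));
}
```

On a little-endian machine the two helpers agree for every index, which is the sense in which the member is a convenience that a programmer can do without.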

Neither does it matter that the reconstruction was written and presented without scouring all of Microsoft’s published output for obscure references to every type that has anything to do with Hyper-V. Type information in libraries for static linking surely does count as obscure. Even I tend to look for it only when pursuing some detail about Windows versions that predate Microsoft’s inclusion of type information in public symbol files.

My gripe is not with deficiency from overlooking a rich source of information. That happens. None of us can be on top of finding everything. It is instead with deficiency in citing sources. Readers, whether programmers, reviewers or other researchers, have a reasonable interest in knowing where the information came from. What passes for the literature on reverse engineering Windows has too much mystery to it. No matter how useful or desperately needed may be the information, even if it’s correct down to every extractable detail, nobody really gains from its appearing as if by magic.

Different readers will have different expectations of what counts as reliable or legitimate. Different writers will draw their lines differently about what needs to be disclosed. I, for instance, don’t spell out my methods of reading type information in symbol files or libraries, or of any other technique to my reverse engineering, any more than I would expect that a paper in higher mathematics will labour over how its author does basic algebra. I write for an advanced readership and I think that if anything about the technique looks like magic to a less advanced reader, then it’s for them to learn more and practise harder. There’s no mystery if what’s missing is skill, experience or perseverance. Conversely, I do not appear skillful to my readers if they suspect that my achievement owes anything to accessing secret sources. It’s one thing if readers have to work hard to keep up. It’s another if they can’t verify my work for not knowing what sources to check against or if they waste their time retreading my analyses for not knowing what I already covered.

But these, and much more, are just standard arguments for openness in research. How they apply to published research into Windows internals is that it’s nowhere near sufficient to say just that you got your information from public symbols, meaning mainly to contrast with having got your information from private symbols or source code. Do so if your aim is to discourage the attention of copyright lawyers. Cite just public symbols if the only ones you mean are those of the binary that’s studied. Otherwise, good citation practice is to specify which other symbols your work relied on.

Alex’s reconstructed HVGDK.H itself makes no more informative disclosure than “Changes made based on symbols” and then “Changes made based on new symbol source”. Not only are possibly obscure symbol sources left unspecified but “based on” leaves the reader to differentiate what in the published research has come directly from symbols and what has been invented to fill gaps in what can be known from symbols. The separate page of introduction says more, and is indeed much better about citation practice than is almost any other work I’ve seen in this field, and yet “modified with”, “inferred data” and “certain user-mode binaries” can hardly be thought specific enough to advance the research. I’ll even venture that the reconstruction would already be more accurate as reverse engineering had its sources been better cited, if only for spurring readers to look wider for other sources with some assurance that they won’t just be reworking the seams that Alex has already mined.

Everyone who works in this field, me included, would do better to lift their game on this.