New and Updated in May 2019

A gauche attempt on Twitter to suggest that the poor performance of a relatively new Windows feature—described by Microsoft as “a highly-optimized platform security feature”—may have explanations other than an incompetent choice of algorithm led me to think that I might usefully get up to date about the feature and, especially, that I should put my money where my mouth is about how observation is not explanation.

One of the things that consistently surprises me about that side of the software industry that examines technical shortcomings is that we easily enough prize observation but see little commercial imperative in converting our observation of a fault into an understanding of it. Much of the reason, of course, is that when the fault is in somebody else’s software, the cost of understanding it falls to that somebody else. Fair enough, too! Indeed, the disciplined observation that’s typically required to get that somebody else’s attention (even after you’ve been brushed off countless times by several levels of obtuse screeners) and to pinpoint for them where to start looking is not cost-free either. But I think this all disguises something: the cost of understanding, as most of us perceive it, is higher than it would be if we were more skilled at the understanding. Perhaps because we tend to think of this work as something to push off to others or to brush away, we tend not to cultivate the necessary investigative skills or to maintain the large body of background knowledge that gives those skills more to feed on. Anything that we’re not well practised at inevitably seems more difficult and costly than it really is.

Now, this is not the place to talk of the particular observations that I investigated for an explanation—or for me to write up the explanation. To me, explaining what caused the O(n^2) in CreateProcess that showed up as crazy amounts of time spent in a routine that is “too large to be easily reverse engineered” was incidental work that very plausibly took me less time than the author, Bruce Dawson, spent on making the observations (and writing them up). This is not at all to say that Bruce’s observations weren’t worth making or that they weren’t made well: disciplined observation is no small skill. It’s also not to say that an explanation isn’t worth having. Microsoft will easily enough fix its bug. Sufficiently interested “security researchers” will then pick apart what’s changed and get some idea of the problem’s cause. But I, for one, do count it as valuable that this time we have an explanation in advance. A relatively slight coding error is credibly the cause of numerous reports over roughly five years that this highly promoted security feature is a poor performer despite what its manufacturer claims is high optimisation. Our society needs more resources for independent evaluation of what our software does. Our industry ought to be better at delivering this.

Of course, I too have no end of excuses about how I could do this or that study but can’t regard it as practical in competition with everything else that seems more important. I very likely never will write up this passing curiosity of Control Flow Guard’s poor performance at something that doesn’t directly affect me. I did, however, take it as a spur to catch up on the feature overall as an important development in Windows which my busy life had somehow left neglected. In the process I had to draw on rather many wide-ranging notes on memory management in the Windows kernel, both old and new, some in good shape, most not, including many that I started piecing together in 2016 but did not publish before moving on to other things. Let’s see how far I can get with a publication effort this time.

Kernel

It will be some time—a few weekends, anyway—before this work settles. It will be even longer before I return to the documentation of the Image File Execution Options, including the Global Flags, that I started. That was supposed to be new for April.