CL.EXE Command Line Parsing

The parsing of command-line text into tokens varies slightly with the source of the text. The actual command line is parsed by the C Run-Time (CRT) Library, specifically by the __getmainargs function in MSVCR70.DLL, according to the Microsoft-specific rules for parsing command-line arguments for a C program’s main function. Command-line contributions from environment variables and command files are parsed by the compiler’s own code, which behaves similarly, but not identically, to the CRT. The main difference is that wildcard expansion is a CRT feature and so applies only to tokens on the actual command line.

White Space

In general, command-line text is broken into tokens at each occurrence of white space. On the actual command line, the white-space characters are specifically spaces and tabs. For command-line text from environment variables and command files, white space is understood in the sense of the CRT function _ismbcspace (so that it also includes carriage returns, newlines, vertical tabs and form feeds).
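The difference between the two white-space sets can be sketched in a few lines of Python. This is only an illustration: the names ACTUAL_WS, OTHER_WS and split_plain are invented here, quoting is ignored entirely, and the second set assumes that _ismbcspace accepts the single-byte characters 0x09 to 0x0D and 0x20.

```python
# White space on the actual command line: space and tab only.
ACTUAL_WS = " \t"

# White space for environment variables and command files, assuming the
# _ismbcspace range 0x09-0x0D plus 0x20 (tab, LF, VT, FF, CR, space).
OTHER_WS = " \t\n\v\f\r"


def split_plain(text, whitespace):
    """Split text into tokens at runs of the given white-space characters.

    Hypothetical helper for illustration; quotes and backslashes are not
    treated specially here.
    """
    tokens = []
    current = ""
    for ch in text:
        if ch in whitespace:
            if current:              # a run of white space ends the token
                tokens.append(current)
                current = ""
        else:
            current += ch
    if current:                      # flush the last token, if any
        tokens.append(current)
    return tokens
```

With these definitions, split_plain("/c\na.c", ACTUAL_WS) leaves the text as one token because the new-line is not white space there, while split_plain("/c\na.c", OTHER_WS) produces the two tokens /c and a.c.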

Quotes and Backslashes

To allow white space within a parsed token, there is a facility for enclosure by double-quotes. The parsing treats both the double-quote and backslash as special characters.

Where a double-quote is preceded by an odd number of backslashes, it is a literal double-quote. What passes into the token is one backslash for each whole pair of backslashes in the text, plus one double-quote.

Where a double-quote is preceded by an even number of backslashes, including by none, it is non-literal. What passes into the token is again one backslash for each pair, but the double-quote itself is discarded. Its only effect is to signify that until the next non-literal double-quote (if any), white space does not terminate the token but is instead part of the token. The matching non-literal double-quote is also discarded.
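The two rules for double-quotes can be put together as a small tokenizer. What follows is a minimal sketch in Python, not the code of either the CRT or the compiler. The function name tokenize is invented for illustration, and two behaviours are assumptions that the rules above do not state: backslashes that are not followed by a double-quote are copied literally, and a pair of non-literal double-quotes with nothing between them produces an empty token.

```python
def tokenize(text, whitespace=" \t"):
    """Split text into tokens using the double-quote and backslash rules.

    Hypothetical helper: a sketch of the rules as described, not the
    actual CRT or compiler implementation.
    """
    tokens = []
    current = []          # pieces of the token being built
    started = False       # True once the current token has begun
    in_quotes = False     # True between non-literal double-quotes
    i, n = 0, len(text)
    while i < n:
        # Count the run of backslashes (possibly none) at this position.
        j = i
        while j < n and text[j] == "\\":
            j += 1
        count = j - i
        if j < n and text[j] == '"':
            # One backslash passes through for each whole pair.
            current.append("\\" * (count // 2))
            if count % 2 == 1:
                current.append('"')          # odd: literal double-quote
            else:
                in_quotes = not in_quotes    # even: non-literal, discarded
            started = True
            i = j + 1
        elif count:
            # Assumption: backslashes not before a double-quote are literal.
            current.append("\\" * count)
            started = True
            i = j
        elif text[i] in whitespace and not in_quotes:
            if started:                      # white space ends the token
                tokens.append("".join(current))
                current = []
                started = False
            i += 1
        else:
            current.append(text[i])          # includes quoted white space
            started = True
            i += 1
    if started:
        tokens.append("".join(current))
    return tokens
```

For example, tokenize(r'/Fo"my dir\\" a.c') gives the two tokens /Fomy dir\ and a.c, because the two backslashes before the closing double-quote reduce to one and leave the double-quote non-literal. By contrast, tokenize(r'a\"b') gives the single token a"b.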

Wildcard Expansion

Tokens from the actual command line, but not from environment variables or command files, are subject to wildcard expansion. The eligible tokens are those that contain a * or ? but do not begin with a non-literal double-quote. Each such token is interpreted tentatively as a pathname. If at least one file is found to match the pathname, then the original token with wildcards is replaced by potentially many tokens, one for each matching file, none now containing wildcards. If no file matches, the token is left unchanged.
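The eligibility test and the replacement can be sketched as follows. This is a simplified illustration in Python, not the CRT’s algorithm. The names expand_token and began_with_quote are invented here, wildcards are assumed to appear only in the last component of the pathname, matching is done by a plain case-insensitive pattern rather than by the file system’s own rules, and the results are sorted only to make the sketch deterministic.

```python
import os
import re


def expand_token(token, began_with_quote):
    """Return the list of tokens that replaces one command-line token.

    Hypothetical helper. began_with_quote says whether the token, as
    written, began with a non-literal double-quote.
    """
    # Only tokens with * or ? that did not begin with a non-literal
    # double-quote are eligible for expansion.
    if began_with_quote or not any(ch in token for ch in "*?"):
        return [token]

    # Assumption: wildcards occur only in the final path component.
    head, tail = os.path.split(token)
    pattern = "".join(
        ".*" if ch == "*" else "." if ch == "?" else re.escape(ch)
        for ch in tail
    )
    matcher = re.compile(pattern, re.IGNORECASE)

    try:
        names = os.listdir(head or ".")
    except OSError:
        return [token]               # directory not readable: no matches

    matches = sorted(name for name in names if matcher.fullmatch(name))
    if not matches:
        return [token]               # no file found: keep the token as is
    return [os.path.join(head, name) for name in matches]
```

In a directory that contains a.c, b.c and c.h, the token *.c would be replaced by a.c and b.c, while the same text given as "*.c" would be left alone because it began with a non-literal double-quote.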