NICTA Embedded Systems Public Seminar
Parsing All of C by Taming the Preprocessor
Dr Robert Grimm, NYU, New York, USA
Time/Venue
Monday 15 August, 12 noon
NICTA, Neville Roach Laboratory, Level 1 Seminar Room West, 223 Anzac Parade (Building L5), Kensington NSW 2052
Abstract
Given the continuing popularity of C for building large-scale programs, such as Linux, Apache, and Bind, it is critical to provide effective tool support, including, for example, code browsing, bug finding, and automated refactoring. Common to all such tools is a need to parse C. But C programs contain not only the C language proper but also preprocessor invocations for file inclusion (#include), conditional compilation (#if, #ifdef, and so on), and macro definition/expansion (#define). Worse, the preprocessor is a textual substitution system, which is oblivious to C constructs and operates on individual tokens. At the same time, the preprocessor is indispensable for improving C's expressivity, abstracting over software/hardware dependencies, and deriving variations from the same code base. The x86 version of the Linux kernel, for example, depends on about 7,600 header files for file inclusion, 7,000 configuration variables for conditional compilation, and 520,000 macros for code expansion.
In this talk, I present a new tool for parsing all of C, including arbitrary preprocessor use. Our tool, which is called SuperC, is based on a systematic analysis of all interactions between lexing, preprocessing, and parsing to ensure completeness. It first lexes and preprocesses source code while preserving conditionals. It then parses the result using a novel variant of LR parsing, which automatically splits parsers when encountering a conditional and joins them again when reaching the same input in the same state. The result is a well-formed AST, containing static choice nodes for conditionals. While the parsing algorithm and engine are new, neither grammar nor LR parser table generator need to change. I discuss the results of our problem analysis, the parsing algorithm itself, the pragmatics of building a real-world tool, and a demonstration on the x86 version of the Linux kernel.
This is joint work with Paul Gazzillo.
Biography:
Robert Grimm received his Ph.D. in 2002 from the University of Washington for his work on system support for pervasive applications. He then became a faculty member in the Department of Computer Science at New York University, where he continues to teach. His research covers both systems and languages, including stream processing and multilingual programming, respectively.

