Hello everyone, Google Summer of Code provided me the opportunity to work with Automake during the summers. Here is the final report on work done during the same.
___________________________________________ [GSoC'18] Parse Makefile.am using Abstract Syntax Tree - Final Report ___________________________________________ Branch:- experimental/gsoc/ast http://git.savannah.gnu.org/cgit/automake.git/log/?h=experimental/gsoc/ast Commit History:- cf633fb61 to 5aebec2cc 1. Introduction ============ The goal of the project was to generate an abstract syntax tree after parsing the input Makefile.am . It has separate lexer, parser and tree generation module so that new features can be added incrementally. For debugging purpose the AST generated is displayed as image using graph description language. The current implementation can parse the following type of statements: * Primaries : All the primaries are identified. * Multiline statements : Statements on multiple lines ending with backslash and its combination with other type of statements * Conditional statements : Single or nested conditional statements [ if-else-endif block ] * Subdirs directive : The parser parses the Makefile.am file in the said subdirectory and generates the tree in that subdirectory. * Include directive : The parser recursively parses the included file and merge the generated tree to the parent tree by serializing at child process and deserializing at parent process. 2. Technical Description =================== * Lexer :- It converts the input file line by line into tokens using perl regex. Whenever called by parser, it returns an array of tokens. Each token has a name and if required, a value. * Bison Grammar :- The grammar of Automake is written in GNU Bison. It is used to generate state description and state transition diagram. Writing in Bison is easy and exact syntax can be captured. Adding new features and debugging them becomes easy as no change is required in parser. The tool is only required at maintainer end to generate the parsing table. * Parsing Table :- It is an array of hashes. Each array index indicate the state and hash consist of key value pair (Key for token acceptable for that state and value is the next transition state). For reduction of grammar and generation of tree, a special 'reduce' key is used in Hash. It points to array containing number of tokens to reduce and corresponding function to call in Tree module. The start state is 0 and acceptance state is stored in variable with value decided by converter. * Converter :- It takes as input the state description and state transition file generated from Bison and outputs the parsing table in a format understood by perl. * Parser and AST :- It parses the input file using tokens provided by lexer and the parsing table provided by converter. LR parsing algorithm similar to this [https://goo.gl/images/a4NWva] is implemented. At each reduction of the grammar, the node is generated for the tree by calling its corresponding function in tree module specified in parsing table. After successful parsing, the parser outputs the generated tree in DOT format. * Graph Description Language and dot utility :- For visualization of abstract syntax tree, it is printed in the graph description language. It provide an easy interface to display directed and undirected graphs. As the size of input file increases, the tree increases in width significantly as compared to height as all the statements identified during parsing are attached to single master node. Unflatten utility is used to fix this. At last, dot utility is used to convert the graph to image file. * Testing :- Test cases for features implemented are provided in âtâ directory A simple script is executed which generates AST for the test files. Currently visual inspection of tree is required for validation. 3. Future Work ============= * Make rules :- Currently only basic make rules are identified by parser. Grammar needs to be extended to handle complex make rules. * Using AST to generate Makefile.in :- Current project was to generate AST from input file. This AST can be used to generate Makefile.in. Converting AST to the output file is another big task. It would require to implement how different variables are handled, what to output for corresponding statement and how to integrate with the current code base. 4. Conclusion ============ Learnt and enjoyed a lot. Applying the compiler theory learnt in class to practical work with a language I was knowing only some basics was a wonderfull experience. It also helped me to understand and follow good coding practices. Most of the automake features are implemented. I will be working to fix the grammar to identify complex make rules. With that, AST can be generated for most of the files which are identified by Automake. As this is a new feature, in the current state it cannot be merged in codebase. When generation of Makefile.in from AST be implemented can it be merged. I would like to continue working with that part, but some guidance in that direction will be required. Sorry for delay in presenting the report as I was not well. --- Vishal Gupta