On Thursday, 13 August 2020 07:49:52 CEST Giacinto Cifelli wrote:
> Hi all,
>
> I am wondering if it is possible to interpret a c-preprocessor (the second
> preprocessor, not the one expanding trigraphs and removing "\\\n") or an m4
> grammar through bison, and in case if it has already been done.
> I think this kind of tool does not produce a type-2 Chomsky grammar,
> rather a type-1 or even type-0.

The common classification of languages like C is, I think, "attributed
context-free language", and that is Chomsky type 2.

If you just need to handle the preprocessor part, then all you need is a lexer
with the state stack enabled. A parser (e.g. Bison) only becomes relevant if
you also need to process the aspects that come after the preprocessor.

> Any idea how to build something like an AST from it?
>
> The purpose would be to use in a text editor, to know how to format for
> example a block between #if/#endif (according to the condition, for example
> could be greyed out if false),

Just to give you a basic idea of how this can be done, e.g. with Flex, *very*
roughly (i.e. you have to complete it yourself):

/* yyextra and yyscanner as used below require a reentrant scanner */
%option reentrant

/* enable functions yy_push_state(), yy_pop_state(), yy_top_state() */
%option stack

/* inclusive scanner conditions */
%s PREPROC_BODY_USE

/* exclusive scanner conditions */
%x PREPROC_DEFINE PREPROC_DEFINE_BODY PREPROC_IF PREPROC_BODY_EAT

DIGIT [0-9]
ID    [a-zA-Z_][a-zA-Z0-9_]*

%%

    /* #define <name> <body> */

<*>"#define"[ \t]* {
    yy_push_state(PREPROC_DEFINE, yyscanner);
    yyextra->token = PreprocessorToken(yytext);
    return PREPROC_TOKEN_TYPE;
}

<PREPROC_DEFINE>{ID} {
    yy_pop_state(yyscanner);
    yy_push_state(PREPROC_DEFINE_BODY, yyscanner);
    yyextra->macro_name = yytext;
    yyextra->token = PreprocessorToken(yytext);
    return PREPROC_TOKEN_TYPE;
}

    /* the macro body is the rest of the line */
<PREPROC_DEFINE_BODY>[^\n]* {
    yy_pop_state(yyscanner);
    yyextra->token = PreprocessorToken(yytext);
    yyextra->macro_table[yyextra->macro_name] = yytext;
    return PREPROC_TOKEN_TYPE;
}

    /* #if <condition> <body> #endif */

<*>"#if"[ \t]* {
    yy_push_state(PREPROC_IF, yyscanner);
    yyextra->token = PreprocessorToken(yytext);
    return PREPROC_TOKEN_TYPE;
}

<PREPROC_IF>{ID} {
    yy_pop_state(yyscanner);
    if (evaluate(yyextra->macro_table[yytext]))
        yy_push_state(PREPROC_BODY_USE, yyscanner);
    else
        yy_push_state(PREPROC_BODY_EAT, yyscanner);
    yyextra->token = PreprocessorToken(yytext);
    return PREPROC_TOKEN_TYPE;
}

    /* must come before the PREPROC_BODY_EAT rule: when two matches have the
       same length, Flex picks the rule listed first */
<*>.*"#endif" {
    yy_pop_state(yyscanner);
    yyextra->token = PreprocessorToken(yytext);
    return PREPROC_TOKEN_TYPE;
}

<PREPROC_BODY_EAT>.* { /* eat up code block filtered out by preprocessor */ }

    /* Language keywords */

if|else|const|switch|case|int|unsigned {
    yyextra->token = KeywordToken(yytext);
    return KEYWORD_TOKEN_TYPE;
}

    /* String literal */

\"[^"]*\" {
    yyextra->token = StringLiteralToken(yytext);
    return STRING_LITERAL_TYPE;
}

    /* Number literal */

{DIGIT}+("."{DIGIT}+)? {
    yyextra->token = NumberLiteralToken(yytext);
    return NUMBER_LITERAL_TYPE;
}

    /* Other tokens */

<*>. {
    yyextra->token = OtherToken(yytext);
    return OTHER_TOKEN_TYPE;
}

%%

Best regards,
Christian Schoenebeck
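
For the editor use case (greying out a false #if/#endif block), the push/pop idea from the scanner above can also be sketched in plain C, without Flex. This is only a rough illustration under stated assumptions: the names pp_state, pp_feed_line, match_directive and lookup are made up for this example and are not part of Flex or any library; only "#define NAME <integer>", "#if NAME" and "#endif" are recognised; an undefined macro or one with no integer body counts as 0.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_MACROS 64
#define MAX_DEPTH  32
#define NAME_LEN   64

struct macro { char name[NAME_LEN]; long value; };

struct pp_state {
    struct macro macros[MAX_MACROS];
    int n_macros;
    int active[MAX_DEPTH]; /* stack: one flag per open #if block */
    int depth;             /* number of flags currently on the stack */
    int overflow;          /* #if levels opened beyond MAX_DEPTH */
};

/* Value of a macro; undefined names evaluate to 0 (assumption of this sketch). */
static long lookup(const struct pp_state *s, const char *name)
{
    for (int i = s->n_macros - 1; i >= 0; i--)
        if (strcmp(s->macros[i].name, name) == 0)
            return s->macros[i].value;
    return 0;
}

/* A line is active if no #if is open or the innermost open #if is active. */
static int is_active(const struct pp_state *s)
{
    return s->depth == 0 || s->active[s->depth - 1];
}

/* Returns the text after directive kw, or NULL if line does not start with it. */
static const char *match_directive(const char *line, const char *kw)
{
    size_t n = strlen(kw);
    if (strncmp(line, kw, n) != 0)
        return NULL;
    if (line[n] != '\0' && line[n] != ' ' && line[n] != '\t' && line[n] != '\n')
        return NULL; /* e.g. "#ifdef" must not match "#if" */
    return line + n;
}

/* Feed one source line; returns 1 if it should be shown normally, 0 if greyed out. */
static int pp_feed_line(struct pp_state *s, const char *line)
{
    char name[NAME_LEN] = "";
    const char *rest;
    int shown;

    assert(s != NULL && line != NULL);
    shown = is_active(s);
    while (*line == ' ' || *line == '\t')
        line++;

    if ((rest = match_directive(line, "#if")) != NULL) {
        if (s->depth == MAX_DEPTH) {
            s->overflow++;            /* too deep: inherit the innermost state */
            return shown;
        }
        (void)sscanf(rest, "%63s", name);
        /* push: a nested block is active only if its parent is active too */
        s->active[s->depth++] = shown && lookup(s, name) != 0;
        return shown;
    }
    if (match_directive(line, "#endif") != NULL) {
        if (s->overflow > 0)
            s->overflow--;
        else if (s->depth > 0)
            s->depth--;               /* pop */
        return is_active(s);
    }
    if (shown && (rest = match_directive(line, "#define")) != NULL) {
        long value = 0;
        int n = sscanf(rest, "%63s %ld", name, &value);
        if (n >= 1 && s->n_macros < MAX_MACROS) {
            strcpy(s->macros[s->n_macros].name, name);
            s->macros[s->n_macros].value = (n == 2) ? value : 0;
            s->n_macros++;
        }
    }
    return shown;
}
```

Calling pp_feed_line() once per line, in order, on a zero-initialised struct pp_state gives one shown/greyed flag per line. #else, #elif, #ifdef and real expression evaluation are left out here and would need to be added in the same style.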