Re: C preprocessor

2020-08-14 Thread Ervin Hegedüs
Hi all,

On Fri, Aug 14, 2020 at 11:09:06AM +0200, Hans Åberg wrote:
> 
> > On 13 Aug 2020, at 07:49, Giacinto Cifelli wrote:
> > 
> > I am wondering if it is possible to interpret a C preprocessor (the second
> > preprocessor, not the one expanding trigraphs and removing "\\\n") or an m4
> > grammar through bison, and if so, whether it has already been done.
> > I think this kind of tool does not produce a type-2 Chomsky grammar,
> > rather a type-1 or even type-0.
> > Any idea how to build something like an AST from it?
> 
> There is a Yaccable C-grammar:
>   http://www.quut.com/c/ANSI-C-grammar-y.html

IMHO C macros are not part of the C grammar.


a.




Re: C preprocessor

2020-08-14 Thread Hans Åberg


> On 13 Aug 2020, at 07:49, Giacinto Cifelli wrote:
> 
> I am wondering if it is possible to interpret a C preprocessor (the second
> preprocessor, not the one expanding trigraphs and removing "\\\n") or an m4
> grammar through bison, and if so, whether it has already been done.
> I think this kind of tool does not produce a type-2 Chomsky grammar,
> rather a type-1 or even type-0.
> Any idea how to build something like an AST from it?

There is a Yaccable C-grammar:
  http://www.quut.com/c/ANSI-C-grammar-y.html





Re: C preprocessor

2020-08-14 Thread Christian Schoenebeck
On Thursday, 13 August 2020 07:49:52 CEST Giacinto Cifelli wrote:
> Hi all,
> 
> I am wondering if it is possible to interpret a C preprocessor (the second
> preprocessor, not the one expanding trigraphs and removing "\\\n") or an m4
> grammar through bison, and if so, whether it has already been done.
> I think this kind of tool does not produce a type-2 Chomsky grammar,
> rather a type-1 or even type-0.

I think the common classification of languages like C is "attributed
context-free language", which falls under Chomsky type 2.

If you just need to handle the preprocessor part, then all you need is a lexer
with a start-condition stack enabled. A parser (e.g. Bison) only becomes
relevant if you also need to process the aspects that come after the
preprocessor.
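
To make the "stack is enough" point concrete, here is a minimal sketch in
plain C of the bookkeeping an editor would need to decide whether a line is
live or greyed out. Everything in it is an assumption made for illustration:
CondStack, feed_line() and macro_is_true() are hypothetical names, only the
form "#if NAME" is handled, and lines are expected without trailing
whitespace.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_DEPTH 64

/* Hypothetical condition stack: one entry per currently open #if. */
typedef struct {
    bool active[MAX_DEPTH]; /* is the branch at this depth live? */
    int depth;              /* number of open #if blocks */
} CondStack;

/* Hypothetical macro lookup: in this sketch only "ON" counts as true. */
static bool macro_is_true(const char *name) {
    return strcmp(name, "ON") == 0;
}

/* A line is live only if every enclosing branch is live. */
static bool all_active(const CondStack *s) {
    for (int i = 0; i < s->depth; i++)
        if (!s->active[i])
            return false;
    return true;
}

/* Feed one source line. Returns true if the line should be shown as live.
   Directive lines themselves are always reported as live. */
static bool feed_line(CondStack *s, const char *line) {
    while (*line == ' ' || *line == '\t')
        line++;                          /* skip leading indentation */
    if (strncmp(line, "#if ", 4) == 0) {
        assert(s->depth < MAX_DEPTH);
        s->active[s->depth++] = macro_is_true(line + 4);
        return true;
    }
    if (strcmp(line, "#else") == 0) {
        assert(s->depth > 0);
        s->active[s->depth - 1] = !s->active[s->depth - 1];
        return true;
    }
    if (strcmp(line, "#endif") == 0) {
        assert(s->depth > 0);
        s->depth--;
        return true;
    }
    return all_active(s);
}
```

Start with a zero-initialized CondStack and call feed_line() once per line;
lines for which it returns false are the ones to grey out. Nesting comes for
free because each #if pushes an entry and each #endif pops one.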

> Any idea how to build something like an AST from it?
> 
> The purpose would be to use in a text editor, to know how to format for
> example a block between #if/#endif (according to the condition, for example
> could be greyed out if false),

Just to give you a basic idea of how this can be done, e.g. with Flex, *very*
roughly (i.e. you have to complete it yourself):


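/* enable functions yy_push_state(), yy_pop_state(), yy_top_state() */
%option stack
/* the actions below use yyscanner and yyextra, which exist only in a
   reentrant scanner */
%option reentrant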

/* inclusive scanner conditions */
%s PREPROC_BODY_USE
/* exclusive scanner conditions */
%x PREPROC_DEFINE PREPROC_DEFINE_BODY PREPROC_IF PREPROC_BODY_EAT

DIGIT  [0-9]
ID     [a-zA-Z_][a-zA-Z0-9_]*

%%

 /* #define NAME BODY */

<*>"#define"[ \t]* {
    yy_push_state(PREPROC_DEFINE, yyscanner);
    yyextra->token = PreprocessorToken(yytext);
    return PREPROC_TOKEN_TYPE;
}

<PREPROC_DEFINE>{ID} {
    yy_pop_state(yyscanner);
    yy_push_state(PREPROC_DEFINE_BODY, yyscanner);
    yyextra->macro_name = yytext;
    yyextra->token = PreprocessorToken(yytext);
    return PREPROC_TOKEN_TYPE;
}

 /* macro body: everything up to the end of the line
    (an empty body would need its own rule) */
<PREPROC_DEFINE_BODY>[^\n]* {
    yy_pop_state(yyscanner);
    yyextra->token = PreprocessorToken(yytext);
    yyextra->macro_table[yyextra->macro_name] = yytext;
    return PREPROC_TOKEN_TYPE;
}


 /* #if NAME ... #endif */

<*>"#if"[ \t]* {
    yy_push_state(PREPROC_IF, yyscanner);
    yyextra->token = PreprocessorToken(yytext);
    return PREPROC_TOKEN_TYPE;
}

<PREPROC_IF>{ID} {
    yy_pop_state(yyscanner);
    if (evaluate(yyextra->macro_table[yytext]))
        yy_push_state(PREPROC_BODY_USE, yyscanner);
    else
        yy_push_state(PREPROC_BODY_EAT, yyscanner);
    yyextra->token = PreprocessorToken(yytext);
    return PREPROC_TOKEN_TYPE;
}

 /* must come before the eat-up rule: when two rules match the same
    length, flex picks the one listed first */
<*>.*"#endif" {
    yy_pop_state(yyscanner);
    yyextra->token = PreprocessorToken(yytext);
    return PREPROC_TOKEN_TYPE;
}

<PREPROC_BODY_EAT>.* /* eat up code block filtered out by preprocessor */

 /* Language keywords */

if|else|const|switch|case|int|unsigned {
    yyextra->token = KeywordToken(yytext);
    return KEYWORD_TOKEN_TYPE;
}

 /* String literal */

\"[^"]*\" {
    yyextra->token = StringLiteralToken(yytext);
    return STRING_LITERAL_TYPE;
}

 /* Number literal */

{DIGIT}+("."{DIGIT}+)? {
    yyextra->token = NumberLiteralToken(yytext);
    return NUMBER_LITERAL_TYPE;
}

 /* Other tokens */

<*>. {
    yyextra->token = OtherToken(yytext);
    return OTHER_TOKEN_TYPE;
}

%%
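
The scanner above leaves macro_table and evaluate() undefined. As a rough
idea of what could sit behind them, here is a small sketch in plain C.
It is only an assumption about one possible shape: MacroTable,
macro_define() and macro_lookup() are hypothetical names, strings are
borrowed rather than copied, and evaluate() understands nothing but a
single integer literal as the macro body. Treating an undefined name as 0
matches what #if does in C; treating an empty body as false is a
simplification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define MAX_MACROS 128

/* Hypothetical macro entry: name and replacement text. */
typedef struct {
    const char *name;
    const char *body;
} Macro;

/* Hypothetical fixed-size macro table with linear lookup. */
typedef struct {
    Macro items[MAX_MACROS];
    int count;
} MacroTable;

/* Define a macro, or overwrite its body if it already exists.
   The strings are borrowed; the caller must keep them alive. */
static void macro_define(MacroTable *t, const char *name, const char *body) {
    for (int i = 0; i < t->count; i++) {
        if (strcmp(t->items[i].name, name) == 0) {
            t->items[i].body = body;
            return;
        }
    }
    assert(t->count < MAX_MACROS);
    t->items[t->count].name = name;
    t->items[t->count].body = body;
    t->count++;
}

/* Return the macro body, or NULL if the name is not defined. */
static const char *macro_lookup(const MacroTable *t, const char *name) {
    for (int i = 0; i < t->count; i++)
        if (strcmp(t->items[i].name, name) == 0)
            return t->items[i].body;
    return NULL;
}

/* Sketch of evaluate(): parse the body as one integer literal.
   Undefined (NULL) and non-numeric bodies count as false. */
static bool evaluate(const char *body) {
    if (body == NULL)
        return false;
    return strtol(body, NULL, 0) != 0; /* base 0 accepts 10, 0x10, 010 */
}
```

With this, the PREPROC_DEFINE_BODY action would call macro_define() and
the PREPROC_IF action would call evaluate(macro_lookup(...)). A real
implementation would have to copy yytext, since flex reuses that buffer,
and would need a proper expression evaluator for conditions such as
"A && !B".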


Best regards,
Christian Schoenebeck