May I please ping this? I am just about ready with the follow-up patch that fixes PR87299, but it depends on this one. Thanks!

https://gcc.gnu.org/pipermail/gcc-patches/2023-June/623364.html
-Lewis

On Fri, Jun 30, 2023 at 6:59 PM Lewis Hyatt <lhy...@gmail.com> wrote:
>
> In order to support processing #pragma in preprocess-only mode (-E or
> -save-temps for gcc/g++), we need a way to obtain the #pragma tokens from
> libcpp. In full compilation modes, this is accomplished by calling
> pragma_lex (), which is a symbol that must be exported by the frontend, and
> which is currently implemented for C and C++. Neither of those frontends
> initializes its parser machinery in preprocess-only mode, and consequently
> pragma_lex () does not work in this case.
>
> Address that by adding a new function c_init_preprocess () for the frontends
> to implement, which arranges for pragma_lex () to work in preprocess-only
> mode, and adjusting pragma_lex () accordingly.
>
> In preprocess-only mode, the preprocessor is accustomed to controlling the
> interaction with libcpp, and it only knows about tokens that it has called
> into libcpp itself to obtain. Since it still needs to see the tokens
> obtained by pragma_lex () so that they can be streamed to the output, also
> add a new libcpp callback, on_token_lex (), that ensures the preprocessor
> sees these tokens too.
>
> Currently, there is one place where we are already supporting #pragma in
> preprocess-only mode, namely the handling of `#pragma GCC diagnostic'. That
> was done by directly interfacing with libcpp, rather than making use of
> pragma_lex (). Now that pragma_lex () works, that code is no longer
> necessary; remove it.
>
> gcc/c-family/ChangeLog:
>
>         * c-common.h (c_init_preprocess): Declare new function.
>         * c-opts.cc (c_common_init): Call it.
>         * c-pragma.cc (pragma_diagnostic_lex_normal): Rename to...
>         (pragma_diagnostic_lex): ...this.
>         (pragma_diagnostic_lex_pp): Remove.
>         (handle_pragma_diagnostic_impl): Call pragma_diagnostic_lex () in
>         all modes.
>         (c_pp_invoke_early_pragma_handler): Adapt to support pragma_lex ()
>         usage.
>         * c-pragma.h (pragma_lex_discard_to_eol): Declare new function.
>
> gcc/c/ChangeLog:
>
>         * c-parser.cc (pragma_lex): Support preprocess-only mode.
>         (pragma_lex_discard_to_eol): New function.
>         (c_init_preprocess): New function.
>
> gcc/cp/ChangeLog:
>
>         * parser.cc (c_init_preprocess): New function.
>         (maybe_read_tokens_for_pragma_lex): New function.
>         (pragma_lex): Support preprocess-only mode.
>         (pragma_lex_discard_to_eol): New function.
>
> libcpp/ChangeLog:
>
>         * include/cpplib.h (struct cpp_callbacks): Add new callback
>         on_token_lex.
>         * macro.cc (cpp_get_token_1): Support new callback.
> ---
>
> Notes:
>     Hello-
>
>     In r13-1544, I added support for processing `#pragma GCC diagnostic' in
>     preprocess-only mode. Because pragma_lex () doesn't work in that mode, in
>     that patch I called into libcpp directly to obtain the tokens needed to
>     process the pragma. As part of the review, Jason noted that it would
>     probably be better to make pragma_lex () usable in preprocess-only mode,
>     and we decided just to add a comment about that for the time being, and
>     to go ahead and implement that in the future, if it became necessary to
>     support other pragmas during preprocessing.
>
>     I think now is a good time to proceed with that plan, because I would like
>     to fix PR87299, which is about another pragma (#pragma GCC target) not
>     working in preprocess-only mode. This patch makes the necessary changes
>     for pragma_lex () to work in preprocess-only mode.
>
>     I have also added a new callback, on_token_lex (), to libcpp. This is so
>     the preprocessor can see and stream out all the tokens that pragma_lex ()
>     gets from libcpp, since it won't otherwise see them. This seemed the
>     simplest approach to me. Another possibility would be to add a wrapper
>     function in c-family/c-lex.cc, which would call
>     cpp_get_token_with_location(), and then also stream the token in
>     preprocess-only mode, and then change all calls into libcpp in that file
>     to use the wrapper function. The libcpp callback seemed cleaner to me
>     FWIW.
>
>     There are no new tests added here, since it's just a change of
>     implementation covered by existing tests. Bootstrap + regtest all
>     languages looks good on x86-64 Linux.
>
>     Please let me know what you think? Thanks!
>
>     -Lewis
>
>  gcc/c-family/c-common.h  |  3 +++
>  gcc/c-family/c-opts.cc   |  1 +
>  gcc/c-family/c-pragma.cc | 56 ++++++----------------------------------
>  gcc/c-family/c-pragma.h  |  2 ++
>  gcc/c/c-parser.cc        | 34 ++++++++++++++++++++++++
>  gcc/cp/parser.cc         | 50 +++++++++++++++++++++++++++++++++++
>  libcpp/include/cpplib.h  |  4 +++
>  libcpp/macro.cc          |  3 +++
>  8 files changed, 105 insertions(+), 48 deletions(-)
>
> diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
> index b5ef5ff6b2c..78fc5248ba6 100644
> --- a/gcc/c-family/c-common.h
> +++ b/gcc/c-family/c-common.h
> @@ -990,6 +990,9 @@ extern void c_parse_file (void);
>
>  extern void c_parse_final_cleanups (void);
>
> +/* This initializes for preprocess-only mode. */
> +extern void c_init_preprocess (void);
> +
>  /* These macros provide convenient access to the various _STMT nodes. */
>
>  /* Nonzero if a given STATEMENT_LIST represents the outermost binding
> diff --git a/gcc/c-family/c-opts.cc b/gcc/c-family/c-opts.cc
> index af19140e382..4961af63de8 100644
> --- a/gcc/c-family/c-opts.cc
> +++ b/gcc/c-family/c-opts.cc
> @@ -1232,6 +1232,7 @@ c_common_init (void)
>    if (flag_preprocess_only)
>      {
>        c_finish_options ();
> +      c_init_preprocess ();
>        preprocess_file (parse_in);
>        return false;
>      }
> diff --git a/gcc/c-family/c-pragma.cc b/gcc/c-family/c-pragma.cc
> index 0d2b333cebb..73d59df3bf4 100644
> --- a/gcc/c-family/c-pragma.cc
> +++ b/gcc/c-family/c-pragma.cc
> @@ -840,11 +840,11 @@ public:
>
>  };
>
> -/* When compiling normally, use pragma_lex () to obtain the needed tokens.
> -   This will call into either the C or C++ frontends as appropriate. */
> +/* This will call into either the C or C++ frontends as appropriate to get
> +   tokens from libcpp for the pragma. */
>
>  static void
> -pragma_diagnostic_lex_normal (pragma_diagnostic_data *result)
> +pragma_diagnostic_lex (pragma_diagnostic_data *result)
>  {
>    result->clear ();
>    tree x;
> @@ -866,46 +866,6 @@ pragma_diagnostic_lex_normal (pragma_diagnostic_data *result)
>    result->valid = true;
>  }
>
> -/* When preprocessing only, pragma_lex () is not available, so obtain the
> -   tokens directly from libcpp. We also need to inform the token streamer
> -   about all tokens we lex ourselves here, so it outputs them too; this is
> -   done by calling c_pp_stream_token () for each.
> -
> -   ??? If we need to support more pragmas in the future, maybe initialize
> -   this_parser with the pragma tokens and call pragma_lex () instead? */
> -
> -static void
> -pragma_diagnostic_lex_pp (pragma_diagnostic_data *result)
> -{
> -  result->clear ();
> -
> -  auto tok = cpp_get_token_with_location (parse_in, &result->loc_kind);
> -  c_pp_stream_token (parse_in, tok, result->loc_kind);
> -  if (!(tok->type == CPP_NAME || tok->type == CPP_KEYWORD))
> -    return;
> -  const unsigned char *const kind_u = cpp_token_as_text (parse_in, tok);
> -  result->set_kind ((const char *)kind_u);
> -  if (result->pd_kind == pragma_diagnostic_data::PK_INVALID)
> -    return;
> -
> -  if (result->needs_option ())
> -    {
> -      tok = cpp_get_token_with_location (parse_in, &result->loc_option);
> -      c_pp_stream_token (parse_in, tok, result->loc_option);
> -      if (tok->type != CPP_STRING)
> -        return;
> -      cpp_string str;
> -      if (!cpp_interpret_string_notranslate (parse_in, &tok->val.str, 1, &str,
> -                                             CPP_STRING)
> -          || !str.len)
> -        return;
> -      result->option_str = (const char *)str.text;
> -      result->own_option_str = true;
> -    }
> -
> -  result->valid = true;
> -}
> -
>  /* Handle #pragma GCC diagnostic. Early mode is used by frontends (such as C++)
>     that do not process the deferred pragma while they are consuming tokens; they
>     can use early mode to make sure diagnostics affecting the preprocessor itself
> @@ -916,10 +876,7 @@ handle_pragma_diagnostic_impl ()
>    static const bool want_diagnostics = (is_pp || !early);
>
>    pragma_diagnostic_data data;
> -  if (is_pp)
> -    pragma_diagnostic_lex_pp (&data);
> -  else
> -    pragma_diagnostic_lex_normal (&data);
> +  pragma_diagnostic_lex (&data);
>
>    if (!data.kind_str)
>      {
> @@ -1808,7 +1765,10 @@ c_pp_invoke_early_pragma_handler (unsigned int id)
>  {
>    const auto data = &registered_pp_pragmas[id - PRAGMA_FIRST_EXTERNAL];
>    if (data->early_handler)
> -    data->early_handler (parse_in);
> +    {
> +      data->early_handler (parse_in);
> +      pragma_lex_discard_to_eol ();
> +    }
>  }
>
>  /* Set up front-end pragmas. */
> diff --git a/gcc/c-family/c-pragma.h b/gcc/c-family/c-pragma.h
> index 9cc95ab3ee3..198fa7723e5 100644
> --- a/gcc/c-family/c-pragma.h
> +++ b/gcc/c-family/c-pragma.h
> @@ -263,7 +263,9 @@ extern tree maybe_apply_renaming_pragma (tree, tree);
>  extern void maybe_apply_pragma_scalar_storage_order (tree);
>  extern void add_to_renaming_pragma_list (tree, tree);
>
> +/* These are to be implemented in each frontend that needs them. */
>  extern enum cpp_ttype pragma_lex (tree *, location_t *loc = NULL);
> +extern void pragma_lex_discard_to_eol ();
>
>  /* Flags for use with c_lex_with_flags. The values here were picked
>     so that 0 means to translate and join strings. */
> diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
> index 24a6eb6e459..aaf6d704fe6 100644
> --- a/gcc/c/c-parser.cc
> +++ b/gcc/c/c-parser.cc
> @@ -13355,6 +13355,11 @@ c_parser_pragma (c_parser *parser, enum pragma_context context, bool *if_p)
>  enum cpp_ttype
>  pragma_lex (tree *value, location_t *loc)
>  {
> +  if (flag_preprocess_only)
> +    /* Arrange for the preprocessor to see the tokens we're about to read,
> +       since it won't see them later. */
> +    cpp_get_callbacks (parse_in)->on_token_lex = c_pp_stream_token;
> +
>    c_token *tok = c_parser_peek_token (the_parser);
>    enum cpp_ttype ret = tok->type;
>
> @@ -13373,9 +13378,29 @@ pragma_lex (tree *value, location_t *loc)
>        c_parser_consume_token (the_parser);
>      }
>
> +  cpp_get_callbacks (parse_in)->on_token_lex = nullptr;
>    return ret;
>  }
>
> +void
> +pragma_lex_discard_to_eol ()
> +{
> +  if (flag_preprocess_only)
> +    /* Arrange for the preprocessor to see the tokens we're about to read,
> +       since it won't see them later. */
> +    cpp_get_callbacks (parse_in)->on_token_lex = c_pp_stream_token;
> +
> +  cpp_ttype type;
> +  do
> +    {
> +      type = c_parser_peek_token (the_parser)->type;
> +      gcc_assert (type != CPP_EOF);
> +      c_parser_consume_token (the_parser);
> +    } while (type != CPP_PRAGMA_EOL);
> +
> +  cpp_get_callbacks (parse_in)->on_token_lex = nullptr;
> +}
> +
>  static void
>  c_parser_pragma_pch_preprocess (c_parser *parser)
>  {
> @@ -24756,6 +24781,15 @@ c_parse_file (void)
>    the_parser = NULL;
>  }
>
> +void
> +c_init_preprocess (void)
> +{
> +  /* Create a parser for use by pragma_lex during preprocessing. */
> +  the_parser = ggc_alloc<c_parser> ();
> +  memset (the_parser, 0, sizeof (c_parser));
> +  the_parser->tokens = &the_parser->tokens_buf[0];
> +}
> +
>  /* Parse the body of a function declaration marked with "__RTL".
>
>     The RTL parser works on the level of characters read from a
> diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
> index 5e2b5cba57e..b2f2e222d81 100644
> --- a/gcc/cp/parser.cc
> +++ b/gcc/cp/parser.cc
> @@ -765,6 +765,15 @@ cp_lexer_new_main (void)
>    return lexer;
>  }
>
> +/* Create a lexer and parser to be used during preprocess-only mode.
> +   This will be filled with tokens to parse when needed by pragma_lex (). */
> +void
> +c_init_preprocess ()
> +{
> +  gcc_assert (!the_parser);
> +  the_parser = cp_parser_new (cp_lexer_alloc ());
> +}
> +
>  /* Create a new lexer whose token stream is primed with the tokens in
>     CACHE. When these tokens are exhausted, no new tokens will be read. */
>
> @@ -49683,11 +49692,42 @@ cp_parser_pragma (cp_parser *parser, enum pragma_context context, bool *if_p)
>    return ret;
>  }
>
> +/* Helper for pragma_lex in preprocess-only mode; in this mode, we have not
> +   populated the lexer with any tokens (the tokens rather being read by
> +   c-ppoutput.c's machinery), so we need to read enough tokens now to handle
> +   a pragma. */
> +static void
> +maybe_read_tokens_for_pragma_lex ()
> +{
> +  const auto lexer = the_parser->lexer;
> +  if (!lexer->buffer->is_empty ())
> +    return;
> +
> +  /* Arrange for the preprocessor to see the tokens we're about to read,
> +     since it won't see them later. */
> +  cpp_get_callbacks (parse_in)->on_token_lex = c_pp_stream_token;
> +
> +  /* Read the rest of the tokens comprising the pragma line. */
> +  cp_token *tok;
> +  do
> +    {
> +      tok = vec_safe_push (lexer->buffer, cp_token ());
> +      cp_lexer_get_preprocessor_token (C_LEX_STRING_NO_JOIN, tok);
> +      gcc_assert (tok->type != CPP_EOF);
> +    } while (tok->type != CPP_PRAGMA_EOL);
> +  lexer->next_token = lexer->buffer->address ();
> +  lexer->last_token = lexer->next_token + lexer->buffer->length () - 1;
> +  cpp_get_callbacks (parse_in)->on_token_lex = nullptr;
> +}
> +
>  /* The interface the pragma parsers have to the lexer. */
>
>  enum cpp_ttype
>  pragma_lex (tree *value, location_t *loc)
>  {
> +  if (flag_preprocess_only)
> +    maybe_read_tokens_for_pragma_lex ();
> +
>    cp_token *tok = cp_lexer_peek_token (the_parser->lexer);
>    enum cpp_ttype ret = tok->type;
>
> @@ -49710,6 +49750,16 @@ pragma_lex (tree *value, location_t *loc)
>    return ret;
>  }
>
> +void
> +pragma_lex_discard_to_eol ()
> +{
> +  /* We have already read all the tokens, so we just need to discard
> +     them here. */
> +  const auto lexer = the_parser->lexer;
> +  lexer->next_token = lexer->last_token;
> +  lexer->buffer->truncate (0);
> +}
> +
>
>  /* External interface. */
>
> diff --git a/libcpp/include/cpplib.h b/libcpp/include/cpplib.h
> index aef703f8111..8b63204df0e 100644
> --- a/libcpp/include/cpplib.h
> +++ b/libcpp/include/cpplib.h
> @@ -784,6 +784,10 @@ struct cpp_callbacks
>       cpp_buffer containing the translation if translating. */
>    char *(*translate_include) (cpp_reader *, line_maps *, location_t,
>                                const char *path);
> +
> +  /* Called when cpp_get_token() / cpp_get_token_with_location()
> +     have produced a token. */
> +  void (*on_token_lex) (cpp_reader *, const cpp_token *, location_t);
>  };
>
>  #ifdef VMS
> diff --git a/libcpp/macro.cc b/libcpp/macro.cc
> index dada8fea835..ebbc1618a71 100644
> --- a/libcpp/macro.cc
> +++ b/libcpp/macro.cc
> @@ -3135,6 +3135,9 @@ cpp_get_token_1 (cpp_reader *pfile, location_t *location)
>        }
>      }
>
> +  if (pfile->cb.on_token_lex)
> +    pfile->cb.on_token_lex (pfile, result,
> +                            location ? *location : result->src_loc);
>    return result;
>  }
>
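
For anyone skimming the thread, here is a rough standalone sketch of the callback idea in the quoted patch: a token source that reports each token to an optional hook, and a pragma-lexing helper that installs the hook only while it is reading. None of this is GCC code. ToyReader, ToyStreamer, toy_pragma_lex and toy_discard_to_eol are hypothetical names made up for illustration, tokens are plain strings, and "<eol>" merely stands in for CPP_PRAGMA_EOL. The real frontends go through their parser's token buffer, which this toy ignores.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for the libcpp reader: hands out tokens one at a
// time and reports each one to an optional hook, loosely mirroring what the
// patch adds at the end of cpp_get_token_1.
struct ToyReader {
  std::vector<std::string> tokens;
  std::size_t next = 0;
  std::function<void(const std::string &)> on_token_lex;  // empty = no hook

  const std::string &get_token() {
    assert(next < tokens.size());  // toy analogue of asserting "not EOF"
    const std::string &tok = tokens[next++];
    if (on_token_lex)
      on_token_lex(tok);
    return tok;
  }
};

// Hypothetical stand-in for the preprocessor's output streamer.
struct ToyStreamer {
  std::vector<std::string> seen;
};

// Toy analogue of pragma_lex: in preprocess-only mode, install the hook so
// the streamer sees the token, read one token, then clear the hook again.
std::string toy_pragma_lex(ToyReader &r, ToyStreamer &out, bool preprocess_only) {
  if (preprocess_only)
    r.on_token_lex = [&out](const std::string &t) { out.seen.push_back(t); };
  std::string tok = r.get_token();
  r.on_token_lex = nullptr;
  return tok;
}

// Toy analogue of pragma_lex_discard_to_eol: consume the rest of the pragma
// line, still letting the streamer observe what is skipped.
void toy_discard_to_eol(ToyReader &r, ToyStreamer &out, bool preprocess_only) {
  if (preprocess_only)
    r.on_token_lex = [&out](const std::string &t) { out.seen.push_back(t); };
  std::string tok;
  do
    tok = r.get_token();
  while (tok != "<eol>");
  r.on_token_lex = nullptr;
}
```

In this toy, a handler that calls toy_pragma_lex twice and then toy_discard_to_eol leaves every token of the pragma line in ToyStreamer::seen, while tokens read after the hook is cleared are not recorded. That is the property the on_token_lex callback is meant to give the preprocess-only output path.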