May I please ping this?
I am just about ready with the followup patch that fixes PR87299, but
it depends on this one. Thanks!
https://gcc.gnu.org/pipermail/gcc-patches/2023-June/623364.html

-Lewis

On Fri, Jun 30, 2023 at 6:59 PM Lewis Hyatt <lhy...@gmail.com> wrote:
>
> In order to support processing #pragma in preprocess-only mode (-E or
> -save-temps for gcc/g++), we need a way to obtain the #pragma tokens from
> libcpp. In full compilation modes, this is accomplished by calling
> pragma_lex (), which is a symbol that must be exported by the frontend, and
> which is currently implemented for C and C++. Neither of those frontends
> initializes its parser machinery in preprocess-only mode, and consequently
> pragma_lex () does not work in this case.
>
> Address that by adding a new function c_init_preprocess () for the frontends
> to implement, which arranges for pragma_lex () to work in preprocess-only
> mode, and adjusting pragma_lex () accordingly.
>
> In preprocess-only mode, the preprocessor is accustomed to controlling the
> interaction with libcpp, and it only knows about tokens that it has called
> into libcpp itself to obtain. Since it still needs to see the tokens
> obtained by pragma_lex () so that they can be streamed to the output, also
> add a new libcpp callback, on_token_lex (), that ensures the preprocessor
> sees these tokens too.
>
> Currently, there is one place where we are already supporting #pragma in
> preprocess-only mode, namely the handling of `#pragma GCC diagnostic'.  That
> was done by directly interfacing with libcpp, rather than making use of
> pragma_lex (). Now that pragma_lex () works, that code is no longer
> necessary; remove it.
>
> gcc/c-family/ChangeLog:
>
>         * c-common.h (c_init_preprocess): Declare new function.
>         * c-opts.cc (c_common_init): Call it.
>         * c-pragma.cc (pragma_diagnostic_lex_normal): Rename to...
>         (pragma_diagnostic_lex): ...this.
>         (pragma_diagnostic_lex_pp): Remove.
>         (handle_pragma_diagnostic_impl): Call pragma_diagnostic_lex () in
>         all modes.
>         (c_pp_invoke_early_pragma_handler): Adapt to support pragma_lex ()
>         usage.
>         * c-pragma.h (pragma_lex_discard_to_eol): Declare new function.
>
> gcc/c/ChangeLog:
>
>         * c-parser.cc (pragma_lex): Support preprocess-only mode.
>         (pragma_lex_discard_to_eol): New function.
>         (c_init_preprocess): New function.
>
> gcc/cp/ChangeLog:
>
>         * parser.cc (c_init_preprocess): New function.
>         (maybe_read_tokens_for_pragma_lex): New function.
>         (pragma_lex): Support preprocess-only mode.
>         (pragma_lex_discard_to_eol): New function.
>
> libcpp/ChangeLog:
>
>         * include/cpplib.h (struct cpp_callbacks): Add new callback
>         on_token_lex.
>         * macro.cc (cpp_get_token_1): Support new callback.
> ---
>
> Notes:
>     Hello-
>
>     In r13-1544, I added support for processing `#pragma GCC diagnostic' in
>     preprocess-only mode. Because pragma_lex () doesn't work in that mode, in
>     that patch I called into libcpp directly to obtain the tokens needed to
>     process the pragma. As part of the review, Jason noted that it would
>     probably be better to make pragma_lex () usable in preprocess-only mode,
>     and we decided just to add a comment about that for the time being, and to
>     go ahead and implement that in the future, if it became necessary to
>     support other pragmas during preprocessing.
>
>     I think now is a good time to proceed with that plan, because I would like
>     to fix PR87299, which is about another pragma (#pragma GCC target) not
>     working in preprocess-only mode. This patch makes the necessary changes
>     for pragma_lex () to work in preprocess-only mode.
>
>     I have also added a new callback, on_token_lex (), to libcpp. This is so
>     the preprocessor can see and stream out all the tokens that pragma_lex ()
>     gets from libcpp, since it won't otherwise see them.  This seemed the
>     simplest approach to me. Another possibility would be to add a wrapper
>     function in c-family/c-lex.cc, which would call
>     cpp_get_token_with_location(), and then also stream the token in
>     preprocess-only mode, and then change all calls into libcpp in that file
>     to use the wrapper function.  The libcpp callback seemed cleaner to me
>     FWIW.
>
>     There are no new tests added here, since it's just a change of
>     implementation covered by existing tests. Bootstrap + regtest all
>     languages looks good on x86-64 Linux.
>
>     Please let me know what you think? Thanks!
>
>     -Lewis
>
>  gcc/c-family/c-common.h  |  3 +++
>  gcc/c-family/c-opts.cc   |  1 +
>  gcc/c-family/c-pragma.cc | 56 ++++++----------------------------------
>  gcc/c-family/c-pragma.h  |  2 ++
>  gcc/c/c-parser.cc        | 34 ++++++++++++++++++++++++
>  gcc/cp/parser.cc         | 50 +++++++++++++++++++++++++++++++++++
>  libcpp/include/cpplib.h  |  4 +++
>  libcpp/macro.cc          |  3 +++
>  8 files changed, 105 insertions(+), 48 deletions(-)
>
> diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
> index b5ef5ff6b2c..78fc5248ba6 100644
> --- a/gcc/c-family/c-common.h
> +++ b/gcc/c-family/c-common.h
> @@ -990,6 +990,9 @@ extern void c_parse_file (void);
>
>  extern void c_parse_final_cleanups (void);
>
> +/* This initializes for preprocess-only mode.  */
> +extern void c_init_preprocess (void);
> +
>  /* These macros provide convenient access to the various _STMT nodes.  */
>
>  /* Nonzero if a given STATEMENT_LIST represents the outermost binding
> diff --git a/gcc/c-family/c-opts.cc b/gcc/c-family/c-opts.cc
> index af19140e382..4961af63de8 100644
> --- a/gcc/c-family/c-opts.cc
> +++ b/gcc/c-family/c-opts.cc
> @@ -1232,6 +1232,7 @@ c_common_init (void)
>    if (flag_preprocess_only)
>      {
>        c_finish_options ();
> +      c_init_preprocess ();
>        preprocess_file (parse_in);
>        return false;
>      }
> diff --git a/gcc/c-family/c-pragma.cc b/gcc/c-family/c-pragma.cc
> index 0d2b333cebb..73d59df3bf4 100644
> --- a/gcc/c-family/c-pragma.cc
> +++ b/gcc/c-family/c-pragma.cc
> @@ -840,11 +840,11 @@ public:
>
>  };
>
> -/* When compiling normally, use pragma_lex () to obtain the needed tokens.
> -   This will call into either the C or C++ frontends as appropriate.  */
> +/* This will call into either the C or C++ frontends as appropriate to get
> +   tokens from libcpp for the pragma.  */
>
>  static void
> -pragma_diagnostic_lex_normal (pragma_diagnostic_data *result)
> +pragma_diagnostic_lex (pragma_diagnostic_data *result)
>  {
>    result->clear ();
>    tree x;
> @@ -866,46 +866,6 @@ pragma_diagnostic_lex_normal (pragma_diagnostic_data *result)
>    result->valid = true;
>  }
>
> -/* When preprocessing only, pragma_lex () is not available, so obtain the
> -   tokens directly from libcpp.  We also need to inform the token streamer
> -   about all tokens we lex ourselves here, so it outputs them too; this is
> -   done by calling c_pp_stream_token () for each.
> -
> -   ???  If we need to support more pragmas in the future, maybe initialize
> -   this_parser with the pragma tokens and call pragma_lex () instead?  */
> -
> -static void
> -pragma_diagnostic_lex_pp (pragma_diagnostic_data *result)
> -{
> -  result->clear ();
> -
> -  auto tok = cpp_get_token_with_location (parse_in, &result->loc_kind);
> -  c_pp_stream_token (parse_in, tok, result->loc_kind);
> -  if (!(tok->type == CPP_NAME || tok->type == CPP_KEYWORD))
> -    return;
> -  const unsigned char *const kind_u = cpp_token_as_text (parse_in, tok);
> -  result->set_kind ((const char *)kind_u);
> -  if (result->pd_kind == pragma_diagnostic_data::PK_INVALID)
> -    return;
> -
> -  if (result->needs_option ())
> -    {
> -      tok = cpp_get_token_with_location (parse_in, &result->loc_option);
> -      c_pp_stream_token (parse_in, tok, result->loc_option);
> -      if (tok->type != CPP_STRING)
> -       return;
> -      cpp_string str;
> -      if (!cpp_interpret_string_notranslate (parse_in, &tok->val.str, 1, &str,
> -                                            CPP_STRING)
> -         || !str.len)
> -       return;
> -      result->option_str = (const char *)str.text;
> -      result->own_option_str = true;
> -    }
> -
> -  result->valid = true;
> -}
> -
>  /* Handle #pragma GCC diagnostic.  Early mode is used by frontends (such as C++)
>     that do not process the deferred pragma while they are consuming tokens; they
>     can use early mode to make sure diagnostics affecting the preprocessor itself
> @@ -916,10 +876,7 @@ handle_pragma_diagnostic_impl ()
>    static const bool want_diagnostics = (is_pp || !early);
>
>    pragma_diagnostic_data data;
> -  if (is_pp)
> -    pragma_diagnostic_lex_pp (&data);
> -  else
> -    pragma_diagnostic_lex_normal (&data);
> +  pragma_diagnostic_lex (&data);
>
>    if (!data.kind_str)
>      {
> @@ -1808,7 +1765,10 @@ c_pp_invoke_early_pragma_handler (unsigned int id)
>  {
>    const auto data = &registered_pp_pragmas[id - PRAGMA_FIRST_EXTERNAL];
>    if (data->early_handler)
> -    data->early_handler (parse_in);
> +    {
> +      data->early_handler (parse_in);
> +      pragma_lex_discard_to_eol ();
> +    }
>  }
>
>  /* Set up front-end pragmas.  */
> diff --git a/gcc/c-family/c-pragma.h b/gcc/c-family/c-pragma.h
> index 9cc95ab3ee3..198fa7723e5 100644
> --- a/gcc/c-family/c-pragma.h
> +++ b/gcc/c-family/c-pragma.h
> @@ -263,7 +263,9 @@ extern tree maybe_apply_renaming_pragma (tree, tree);
>  extern void maybe_apply_pragma_scalar_storage_order (tree);
>  extern void add_to_renaming_pragma_list (tree, tree);
>
> +/* These are to be implemented in each frontend that needs them.  */
>  extern enum cpp_ttype pragma_lex (tree *, location_t *loc = NULL);
> +extern void pragma_lex_discard_to_eol ();
>
>  /* Flags for use with c_lex_with_flags.  The values here were picked
>     so that 0 means to translate and join strings.  */
> diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
> index 24a6eb6e459..aaf6d704fe6 100644
> --- a/gcc/c/c-parser.cc
> +++ b/gcc/c/c-parser.cc
> @@ -13355,6 +13355,11 @@ c_parser_pragma (c_parser *parser, enum pragma_context context, bool *if_p)
>  enum cpp_ttype
>  pragma_lex (tree *value, location_t *loc)
>  {
> +  if (flag_preprocess_only)
> +    /* Arrange for the preprocessor to see the tokens we're about to read,
> +       since it won't see them later.  */
> +    cpp_get_callbacks (parse_in)->on_token_lex = c_pp_stream_token;
> +
>    c_token *tok = c_parser_peek_token (the_parser);
>    enum cpp_ttype ret = tok->type;
>
> @@ -13373,9 +13378,29 @@ pragma_lex (tree *value, location_t *loc)
>        c_parser_consume_token (the_parser);
>      }
>
> +  cpp_get_callbacks (parse_in)->on_token_lex = nullptr;
>    return ret;
>  }
>
> +void
> +pragma_lex_discard_to_eol ()
> +{
> +  if (flag_preprocess_only)
> +    /* Arrange for the preprocessor to see the tokens we're about to read,
> +       since it won't see them later.  */
> +    cpp_get_callbacks (parse_in)->on_token_lex = c_pp_stream_token;
> +
> +  cpp_ttype type;
> +  do
> +    {
> +      type = c_parser_peek_token (the_parser)->type;
> +      gcc_assert (type != CPP_EOF);
> +      c_parser_consume_token (the_parser);
> +    } while (type != CPP_PRAGMA_EOL);
> +
> +  cpp_get_callbacks (parse_in)->on_token_lex = nullptr;
> +}
> +
>  static void
>  c_parser_pragma_pch_preprocess (c_parser *parser)
>  {
> @@ -24756,6 +24781,15 @@ c_parse_file (void)
>    the_parser = NULL;
>  }
>
> +void
> +c_init_preprocess (void)
> +{
> +  /* Create a parser for use by pragma_lex during preprocessing.  */
> +  the_parser = ggc_alloc<c_parser> ();
> +  memset (the_parser, 0, sizeof (c_parser));
> +  the_parser->tokens = &the_parser->tokens_buf[0];
> +}
> +
>  /* Parse the body of a function declaration marked with "__RTL".
>
>     The RTL parser works on the level of characters read from a
> diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
> index 5e2b5cba57e..b2f2e222d81 100644
> --- a/gcc/cp/parser.cc
> +++ b/gcc/cp/parser.cc
> @@ -765,6 +765,15 @@ cp_lexer_new_main (void)
>    return lexer;
>  }
>
> +/* Create a lexer and parser to be used during preprocess-only mode.
> +   This will be filled with tokens to parse when needed by pragma_lex ().  */
> +void
> +c_init_preprocess ()
> +{
> +  gcc_assert (!the_parser);
> +  the_parser = cp_parser_new (cp_lexer_alloc ());
> +}
> +
>  /* Create a new lexer whose token stream is primed with the tokens in
>     CACHE.  When these tokens are exhausted, no new tokens will be read.  */
>
> @@ -49683,11 +49692,42 @@ cp_parser_pragma (cp_parser *parser, enum pragma_context context, bool *if_p)
>    return ret;
>  }
>
> +/* Helper for pragma_lex in preprocess-only mode; in this mode, we have not
> +   populated the lexer with any tokens (the tokens rather being read by
> +   c-ppoutput.cc's machinery), so we need to read enough tokens now to handle
> +   a pragma.  */
> +static void
> +maybe_read_tokens_for_pragma_lex ()
> +{
> +  const auto lexer = the_parser->lexer;
> +  if (!lexer->buffer->is_empty ())
> +    return;
> +
> +  /* Arrange for the preprocessor to see the tokens we're about to read,
> +     since it won't see them later.  */
> +  cpp_get_callbacks (parse_in)->on_token_lex = c_pp_stream_token;
> +
> +  /* Read the rest of the tokens comprising the pragma line.  */
> +  cp_token *tok;
> +  do
> +    {
> +      tok = vec_safe_push (lexer->buffer, cp_token ());
> +      cp_lexer_get_preprocessor_token (C_LEX_STRING_NO_JOIN, tok);
> +      gcc_assert (tok->type != CPP_EOF);
> +    } while (tok->type != CPP_PRAGMA_EOL);
> +  lexer->next_token = lexer->buffer->address ();
> +  lexer->last_token = lexer->next_token + lexer->buffer->length () - 1;
> +  cpp_get_callbacks (parse_in)->on_token_lex = nullptr;
> +}
> +
>  /* The interface the pragma parsers have to the lexer.  */
>
>  enum cpp_ttype
>  pragma_lex (tree *value, location_t *loc)
>  {
> +  if (flag_preprocess_only)
> +    maybe_read_tokens_for_pragma_lex ();
> +
>    cp_token *tok = cp_lexer_peek_token (the_parser->lexer);
>    enum cpp_ttype ret = tok->type;
>
> @@ -49710,6 +49750,16 @@ pragma_lex (tree *value, location_t *loc)
>    return ret;
>  }
>
> +void
> +pragma_lex_discard_to_eol ()
> +{
> +  /* We have already read all the tokens, so we just need to discard
> +     them here.  */
> +  const auto lexer = the_parser->lexer;
> +  lexer->next_token = lexer->last_token;
> +  lexer->buffer->truncate (0);
> +}
> +
>
>  /* External interface.  */
>
> diff --git a/libcpp/include/cpplib.h b/libcpp/include/cpplib.h
> index aef703f8111..8b63204df0e 100644
> --- a/libcpp/include/cpplib.h
> +++ b/libcpp/include/cpplib.h
> @@ -784,6 +784,10 @@ struct cpp_callbacks
>       cpp_buffer containing the translation if translating.  */
>    char *(*translate_include) (cpp_reader *, line_maps *, location_t,
>                               const char *path);
> +
> +  /* Called when cpp_get_token() / cpp_get_token_with_location()
> +     have produced a token.  */
> +  void (*on_token_lex) (cpp_reader *, const cpp_token *, location_t);
>  };
>
>  #ifdef VMS
> diff --git a/libcpp/macro.cc b/libcpp/macro.cc
> index dada8fea835..ebbc1618a71 100644
> --- a/libcpp/macro.cc
> +++ b/libcpp/macro.cc
> @@ -3135,6 +3135,9 @@ cpp_get_token_1 (cpp_reader *pfile, location_t *location)
>         }
>      }
>
> +  if (pfile->cb.on_token_lex)
> +    pfile->cb.on_token_lex (pfile, result,
> +                           location ? *location : result->src_loc);
>    return result;
>  }
>
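
For anyone skimming the patch, the hook pattern it relies on (install
on_token_lex, pull tokens up to the end of the pragma line, clear the hook
again) can be modelled outside of GCC. The following is only a toy sketch:
Reader, Token, TokType, ScopedTokenHook and discard_to_eol are hypothetical
stand-ins invented for illustration, not libcpp or frontend API. It also
swaps the patch's explicit set / reset-to-nullptr pairs for an RAII guard,
which is my assumption about one way to keep the two in sync, not what the
patch does.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for libcpp's token types; not GCC names.
enum class TokType { Name, String, PragmaEol, Eof };

struct Token {
  TokType type;
  std::string text;
};

// Toy reader playing the role of cpp_reader.
class Reader {
public:
  explicit Reader(std::vector<Token> toks) : toks_(std::move(toks)) {}

  // Analogue of the on_token_lex callback; empty means "not installed".
  std::function<void(const Token &)> on_token_lex;

  // Analogue of cpp_get_token_1 (): report each produced token to the hook.
  const Token &get_token() {
    static const Token eof{TokType::Eof, ""};
    const Token *result = &eof;
    if (pos_ < toks_.size())
      result = &toks_[pos_++];
    if (on_token_lex)
      on_token_lex(*result);
    return *result;
  }

private:
  std::vector<Token> toks_;
  std::size_t pos_ = 0;
};

// Installs a hook for the lifetime of the guard and clears it afterwards.
class ScopedTokenHook {
public:
  ScopedTokenHook(Reader &reader, std::function<void(const Token &)> cb)
      : reader_(reader) {
    reader_.on_token_lex = std::move(cb);
  }
  ~ScopedTokenHook() { reader_.on_token_lex = nullptr; }
  ScopedTokenHook(const ScopedTokenHook &) = delete;
  ScopedTokenHook &operator=(const ScopedTokenHook &) = delete;

private:
  Reader &reader_;
};

// Consume the rest of a pragma line, streaming the text of every token
// into OUT through the hook.  Returns the number of tokens consumed,
// including the end-of-line marker.
std::size_t discard_to_eol(Reader &reader, std::vector<std::string> &out) {
  ScopedTokenHook hook(reader,
                       [&out](const Token &t) { out.push_back(t.text); });
  std::size_t count = 0;
  TokType type;
  do {
    type = reader.get_token().type;
    assert(type != TokType::Eof);  // a pragma line must end with PragmaEol
    ++count;
  } while (type != TokType::PragmaEol);
  return count;
}
```

In this model, tokens read after discard_to_eol () returns are no longer
reported to the hook, which corresponds to the preprocessor going back to
fetching and streaming tokens itself.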
