Re: [PATCH v4 4/8] diagnostics: Support obtaining source code lines from generated data buffers

David Malcolm via Gcc-patches Tue, 15 Aug 2023 09:15:46 -0700

On Wed, 2023-08-09 at 18:14 -0400, Lewis Hyatt wrote:
> This patch enhances location_get_source_line(), which is the primary
> interface provided by the diagnostics infrastructure to obtain the line of
> source code corresponding to a given location, so that it understands
> generated data locations in addition to normal file-based locations. This
> involves changing the argument to location_get_source_line() from a plain
> file name, to a source_id object that can represent either type of location.
> 
> gcc/ChangeLog:
> 
>         * input.cc (class data_cache_slot): New class.
>         (file_cache::lookup_data): New function.
>         (diagnostics_file_cache_forcibly_evict_data): New function.
>         (file_cache::forcibly_evict_data): New function.
>         (file_cache::evicted_cache_tab_entry): Generalize (via a template)
>         to work for both file_cache_slot and data_cache_slot.
>         (file_cache::add_file): Adapt for new interface to
>         evicted_cache_tab_entry.
>         (file_cache::add_data): New function.
>         (data_cache_slot::create): New function.
>         (file_cache::file_cache): Support the new m_data_slots member.
>         (file_cache::~file_cache): Likewise.
>         (file_cache::lookup_or_add_data): New function.
>         (file_cache::lookup_or_add): New function that calls either
>         lookup_or_add_data or lookup_or_add_file as appropriate.
>         (location_get_source_line): Change the FILE_PATH argument to a
>         source_id SRC, and use it to support obtaining source lines from
>         generated data as well as from files.
>         (location_compute_display_column): Support generated data using the
>         new features of location_get_source_line.
>         (dump_location_info): Likewise.
>         * input.h (location_get_source_line): Adjust prototype. Add a new
>         convenience overload taking an expanded_location.
>         (class cache_data_source): Declare.
>         (class data_cache_slot): Declare.
>         (class file_cache): Declare new members.
>         (diagnostics_file_cache_forcibly_evict_data): Declare.
> ---
>  gcc/input.cc | 171 ++++++++++++++++++++++++++++++++++++++++-----------
>  gcc/input.h  |  23 +++++--
>  2 files changed, 153 insertions(+), 41 deletions(-)
> 
> diff --git a/gcc/input.cc b/gcc/input.cc
> index 9377020b460..790279d4273 100644
> --- a/gcc/input.cc
> +++ b/gcc/input.cc
> @@ -207,6 +207,28 @@ private:
>    void maybe_grow ();
>  };
>  
> +/* This is the implementation of cache_data_source for generated
> +   data that is already in memory.  */
> +class data_cache_slot final : public cache_data_source


It occurred to me: why are we caching accessing a buffer that's already
in memory - but we're also caching the line-splitting information, and
providing the line-splitting algorithm with a consistent interface to
the data, right?

[...snip...]

> @@ -397,6 +434,15 @@ diagnostics_file_cache_forcibly_evict_file (const char 
> *file_path)
>    global_dc->m_file_cache->forcibly_evict_file (file_path);
>  }
>  
> +void
> +diagnostics_file_cache_forcibly_evict_data (const char *data,
> +                                           unsigned int data_len)
> +{
> +  if (!global_dc->m_file_cache)
> +    return;
> +  global_dc->m_file_cache->forcibly_evict_data (data, data_len);

Maybe we should rename diagnostic_context's m_file_cache to
m_source_cache?  (and class file_cache for that matter?)  But if so,
that can/should be a followup/separate patch.

[...snip...]
 
> @@ -525,10 +582,22 @@ file_cache_slot::create (const 
> file_cache::input_context &in_context,
>    return true;
>  }
>  
> +void
> +data_cache_slot::create (const char *data, unsigned int data_len,
> +                        unsigned int highest_use_count)
> +{
> +  reset ();
> +  on_create (highest_use_count + 1,
> +            total_lines_num (source_id {data, data_len}));
> +  m_data_begin = data;
> +  m_data_end = data + data_len;
> +}
> +
>  /* file_cache's ctor.  */
>  
>  file_cache::file_cache ()
> -: m_file_slots (new file_cache_slot[num_file_slots])
> +  : m_file_slots (new file_cache_slot[num_file_slots]),
> +    m_data_slots (new data_cache_slot[num_file_slots])

Should "num_file_slots" be renamed to "num_slots"?

I assume you're using the same value for both kinds of slot since the
file_cache::evicted_cache_tab_entry template uses this.  I suppose the
number could be passed in as an argument to that function if we wanted
to have different sizes for the two kinds, but I don't think it
matters.

[...snip...]

> @@ -912,26 +1000,22 @@ cache_data_source::read_line_num (size_t line_num,
>     If the function fails, a NULL char_span is returned.  */
>  
>  char_span
> -location_get_source_line (const char *file_path, int line)
> +location_get_source_line (source_id src, int line)
>  {
> -  const char *buffer = NULL;
> -  ssize_t len;
> -
> -  if (line == 0)
> -    return char_span (NULL, 0);
> -
> -  if (file_path == NULL)
> -    return char_span (NULL, 0);
> +  const char_span fail (nullptr, 0);
> +  if (!src || line <= 0)
> +    return fail;

Looking at source_id's operator bool, are there effectively three kinds
of source_id?

(a) file names
(b) generated buffer
(c) NULL == m_filename_or_buffer

What does (c) mean?  Is it a "something's gone wrong/error" state?  Or
is this more a special-case of (a)? (in that the m_len for such a case
would be zero)

Should source_id's 2-param ctor have an assert that the ptr is non-
NULL?

[...snip...]

The patch is OK for trunk as-is, but note the question about the
source_id ctor above.

Thanks
Dave

Re: [PATCH v4 4/8] diagnostics: Support obtaining source code lines from generated data buffers

Reply via email to