Re: [PATCH 3/5] Implement token range tracking within libcpp and the C FE (v2)

David Malcolm Fri, 25 Sep 2015 07:32:21 -0700

On Fri, 2015-09-25 at 11:13 +0200, Dodji Seketeli wrote:
> Hello,
> 
> David Malcolm <dmalc...@redhat.com> a écrit:
> 
> > This is an updated version of:
> >   https://gcc.gnu.org/ml/gcc-patches/2015-09/msg00736.html
> >
> > Changes in V2 of the patch:
> >   * c_lex_with_flags: don't write through the range ptr if it's NULL
> >   * don't add any fields to the C++ frontend's cp_token for now
> >   * libcpp/lex.c: prevent usage of stale/uninitialized data in
> >     _cpp_temp_token and _cpp_lex_direct.
> >
> > This patch adds source *range* information to libcpp's cpp_token, and to
> > c_token in the C frontend.
> >
> > As noted before, to minimize churn, I kept the existing location_t
> > fields, though in theory these are always just equal to the start of
> > the source range.
> 
> Funny; I first overlooked this comment of you, and then when reading the
> patch, I asked myself "why keep the existing location_t" ?  I mean, in
> here:
> 
>      /* A preprocessing token.  This has been carefully packed and should
>     -   occupy 16 bytes on 32-bit hosts and 24 bytes on 64-bit hosts.  */
>     +   occupy 16 bytes on 32-bit hosts and 24 bytes on 64-bit hosts.
>     +   FIXME: the above comment is no longer true with this patch.  */
>      struct GTY(()) cpp_token {
>        source_location src_loc;       /* Location of first char of token.  */
>     +  source_range src_range;        /* Source range covered by the token.  
> */
>        ENUM_BITFIELD(cpp_ttype) type : CHAR_BIT;  /* token type */
>        unsigned short flags;          /* flags - see above */
>  
> You could just change the type of src_loc and make it be a source_range.


Interesting idea.

For the general case of expressions, I want a location to mean a
caret/point plus a range that contains it, but for tokens, the
caret/point is always at the start of the range.  So maybe a src_range
would do here.

That said, in patches 3 and 4 of this kit I'm experimenting with
representation; as I said in the blurb for patch 3: "See the
cover-letter for this patch kit which describes how we might go back to
using just a location_t, and stashing the range inside the location_t.
I'm doing it this way for now to allow for more flexibility as I
benchmark and explore implementation options."

So patches 3 and 4 are rather more experimental than patches 1,2 and 5,
as I find out what the different representations do to the time&memory
consumption of the compiler.

I like the idea of "source_location" and "location_t" becoming a
representation of "(point/caret + range)" rather than just a
point/caret, since this means that we can pass location_t around as
before, but then we can extract ranges from them, and all of the
existing diagnostics ought to "automagically" gain underlines "for
free".  I'm experimenting with ways to try to do that efficiently, with
strategies for packing things into the 32-bits compactly; see e.g.:
 https://gcc.gnu.org/ml/gcc-patches/2015-09/msg01826.html

If so, then cpp_token's src_loc would remain a source_location;
source_location itself becomes richer.

> Source range could take a converting constructor, that takes a
> source_location, so that the existing code that does "cpp_token.src_loc
> = a_source_location;" keeps working.
> 
> But then, in the previous patch of the series, I see:
> 
> +/* A range of source locations.
> +
> +   Ranges are closed:
> +   m_start is the first location within the range,
> +   m_finish is the last location within the range.
> +
> +   We may need a more compact way to store these, but for now,
> +   let's do it the simple way, as a pair.  */
> +struct GTY(()) source_range
> +{
> +  source_location m_start;
> +  source_location m_finish;
> +
> +  void debug (const char *msg) const;
> +
> +  /* We avoid using constructors, since various structs that
> +     don't yet have constructors will embed instances of
> +     source_range.  */
> +
> 
> But what if we define a default constructor for that one (as well as one
> that takes a source_location)?  Wouldn't that work for the embedding
> case that you talk about in that comment?

Perhaps, but I worry that it would lead to a cascade, where suddenly
we'd need constructors for various other types.  I can try it, I guess.

> The reason why I am asking all this is, memory consumption.  Maybe
> you've measured it and it's not relevant, but otherwise, if we could do
> away with duplicating the start location and still miminize the churn,
> maybe we should try.

(nods)   I'm experimenting with some different approaches here.

Thanks
Dave

[BTW, I'm about to disappear on a vacation from tomorrow until October
6th, and away from computers]

Re: [PATCH 3/5] Implement token range tracking within libcpp and the C FE (v2)

Reply via email to