C++ FEs

David Malcolm Wed, 16 Sep 2015 13:22:22 -0700

On Tue, 2015-09-15 at 12:48 +0200, Jakub Jelinek wrote:
> On Tue, Sep 15, 2015 at 12:33:58PM +0200, Richard Biener wrote:
> > On Tue, Sep 15, 2015 at 12:20 PM, Jakub Jelinek <ja...@redhat.com> wrote:
> > > On Tue, Sep 15, 2015 at 12:14:22PM +0200, Richard Biener wrote:
> > >> > diff --git a/gcc/cp/parser.h b/gcc/cp/parser.h
> > >> > index 760467c..c7558a0 100644
> > >> > --- a/gcc/cp/parser.h
> > >> > +++ b/gcc/cp/parser.h
> > >> > @@ -61,6 +61,8 @@ struct GTY (()) cp_token {
> > >> >    BOOL_BITFIELD purged_p : 1;
> > >> >    /* The location at which this token was found.  */
> > >> >    location_t location;
> > >> > +  /* The source range at which this token was found.  */
> > >> > +  source_range range;
> > >>
> > >> Is it just me or does location now feel somewhat redundant with range?  
> > >> Can't we
> > >> compress that somehow?
> > >
> > > For a token I'd expect it is redundant, I don't see how it would be useful
> > > for a single preprocessing token to have more than start and end 
> > > locations.
> > > But generally, for expressions, 3 locations make sense.
> > > If you have
> > > abc + def
> > > ~~~~^~~~~
> > > then having a range is useful.  In any case, I'm surprised that the 
> > > ranges aren't encoded in
> > > location_t (the data structures behind it, where we already stick also
> > > BLOCK pointer).
> > 
> > Probably lack of encoding space ... I suppose upping location_t to
> > 64bits coud solve
> > some of that (with its own drawback on increasing size of core structures).
> 
> What I had in mind was just add
>   source_location start, end;
> to location_adhoc_data struct and use !IS_ADHOC_LOC locations to represent
> just plain locations without block and without range (including the cases
> where the range has both start and end equal to the locus) and IS_ADHOC_LOC
> locations for the cases where either we have non-NULL block, or we have
> some other range, or both.  But I haven't spent any time on that, so just
> wondering if such an encoding has been considered.


I've been attempting to implement that.

Am I right in thinking that the ad-hoc locations never get purged? i.e.
that once we've registered an ad-hoc location, then is that slot within
location_adhoc_data_map is forever associated with that (locus, block)
pair?  [or in the proposed model, the (locus, src_range, block)
triple?].

If so, it may make more sense to put the ranges into ad-hoc locations,
but only *after tokenization*: in this approach, the src_range would be
a field within the tokens (like in patch 07/22), in the hope that the
tokens are short-lived  (which AIUI is the case for libcpp and C, though
not for C++), presumably also killing the "location" field within
tokens.  We then stuff the range into the location_t when building trees
(maybe putting a src_range into c_expr to further delay populating
location_adhoc_data_map).

That way we avoid bloating the location_adhoc_data_map during lexing,
whilst preserving the range information, and we can stuff the ranges
into the 32-bit location_t within tree/gimple etc (albeit paying a cost
within the location_adhoc_data_map).

Thoughts?  Hope this sounds sane.
Dave

Re: [PATCH 07/22] Implement token range tracking within libcpp and C/C++ FEs

Reply via email to