On Tue, 2015-09-15 at 12:48 +0200, Jakub Jelinek wrote: > On Tue, Sep 15, 2015 at 12:33:58PM +0200, Richard Biener wrote: > > On Tue, Sep 15, 2015 at 12:20 PM, Jakub Jelinek <ja...@redhat.com> wrote: > > > On Tue, Sep 15, 2015 at 12:14:22PM +0200, Richard Biener wrote: > > >> > diff --git a/gcc/cp/parser.h b/gcc/cp/parser.h > > >> > index 760467c..c7558a0 100644 > > >> > --- a/gcc/cp/parser.h > > >> > +++ b/gcc/cp/parser.h > > >> > @@ -61,6 +61,8 @@ struct GTY (()) cp_token { > > >> > BOOL_BITFIELD purged_p : 1; > > >> > /* The location at which this token was found. */ > > >> > location_t location; > > >> > + /* The source range at which this token was found. */ > > >> > + source_range range; > > >> > > >> Is it just me or does location now feel somewhat redundant with range? > > >> Can't we > > >> compress that somehow? > > > > > > For a token I'd expect it is redundant, I don't see how it would be useful > > > for a single preprocessing token to have more than start and end > > > locations. > > > But generally, for expressions, 3 locations make sense. > > > If you have > > > abc + def > > > ~~~~^~~~~ > > > then having a range is useful. In any case, I'm surprised that the > > > ranges aren't encoded in > > > location_t (the data structures behind it, where we already stick also > > > BLOCK pointer). > > > > Probably lack of encoding space ... I suppose upping location_t to > > 64bits coud solve > > some of that (with its own drawback on increasing size of core structures). > > What I had in mind was just add > source_location start, end; > to location_adhoc_data struct and use !IS_ADHOC_LOC locations to represent > just plain locations without block and without range (including the cases > where the range has both start and end equal to the locus) and IS_ADHOC_LOC > locations for the cases where either we have non-NULL block, or we have > some other range, or both. But I haven't spent any time on that, so just > wondering if such an encoding has been considered.
I've been attempting to implement that. Am I right in thinking that the ad-hoc locations never get purged? i.e. that once we've registered an ad-hoc location, then is that slot within location_adhoc_data_map is forever associated with that (locus, block) pair? [or in the proposed model, the (locus, src_range, block) triple?]. If so, it may make more sense to put the ranges into ad-hoc locations, but only *after tokenization*: in this approach, the src_range would be a field within the tokens (like in patch 07/22), in the hope that the tokens are short-lived (which AIUI is the case for libcpp and C, though not for C++), presumably also killing the "location" field within tokens. We then stuff the range into the location_t when building trees (maybe putting a src_range into c_expr to further delay populating location_adhoc_data_map). That way we avoid bloating the location_adhoc_data_map during lexing, whilst preserving the range information, and we can stuff the ranges into the 32-bit location_t within tree/gimple etc (albeit paying a cost within the location_adhoc_data_map). Thoughts? Hope this sounds sane. Dave