On Tue, Nov 05, 2024 at 10:42:10AM +0100, Richard Biener wrote:
> > Actually, I think cpp_token isn't that big a deal; it should be short-lived
> > unless huge macros are used.
> > cp_token in the C++ FE is more important, the FE uses a vector of those
> > and there is one cp_token per token read from libcpp.
> > Unfortunately, I'm afraid there is nothing that can be done there,
> > the struct currently has 29 bits of various flags, then a 32-bit location_t
> > and then a union with a single pointer in it, so it fits nicely in 16 bytes.
> > Now it will be 24 bytes, with 35 spare bits for flags.
> > And the vector is live across the whole parsing (the pointer to it is cleared
> > at the end of parsing, so GC can collect it).
> 
> So cp_token[] could be split into two arrays to avoid the 32-bit padding with
> the enlarged location_t.  Maybe that's even more cache efficient if one
> 32-bit field is often the only one accessed when sweeping over a chain
> of tokens.
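The size arithmetic above can be checked with a minimal C sketch. It assumes an LP64 host, and every type in it (token_flags, token_loc32, token_loc64, token_payload, token_buffer) is a hypothetical stand-in modeled on the description of cp_token, not GCC's real definition. It shows the current 16-byte layout, the 24-byte layout with a 64-bit location_t, and the split two-array layout at 4 + 16 = 20 bytes per token.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins, NOT GCC's cp_token.  Sizes assume LP64.  */
typedef uint32_t loc32_t;   /* today's 32-bit location_t */
typedef uint64_t loc64_t;   /* the enlarged location_t */

/* 8 + 8 + 13 = 29 flag bits, packed into one 4-byte unit.  */
struct token_flags {
  unsigned int type : 8;
  unsigned int keyword : 8;
  unsigned int misc : 13;
};

/* Current layout: 4 (flags) + 4 (location) + 8 (pointer) = 16 bytes.  */
struct token_loc32 {
  struct token_flags f;
  loc32_t location;
  union { void *value; } u;
};

/* Enlarged location: 4 (flags) + 4 (padding) + 8 + 8 = 24 bytes.
   The 3 unused flag bits plus the 32 padding bits are the 35 spare bits.  */
struct token_loc64 {
  struct token_flags f;
  loc64_t location;
  union { void *value; } u;
};

/* Split alternative: the 8-byte-aligned fields live in one array ...  */
struct token_payload {
  loc64_t location;
  union { void *value; } u;
};

/* ... and the 32-bit flags in a parallel array indexed the same way,
   so no padding is needed: 4 + 16 = 20 bytes per token.  */
struct token_buffer {
  struct token_flags *flags;      /* 4 bytes per token */
  struct token_payload *payload;  /* 16 bytes per token */
  size_t count;
};
```

The saving is 4 bytes per token, and a sweep that only reads the flags touches a 4-byte-stride array instead of a 24-byte one; the cost is that a token is no longer a single addressable object.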

Not without rewriting the whole parser; cp_token is the basic structure it
uses everywhere, and most of the code doesn't really care whether the tokens
are in a vector or elsewhere, it just peeks at the next token, the 2nd token,
or something similar.
And having to rewrite all cp_token -> accesses into inline function calls
or macros that would, e.g., given a pointer to the 32-bit cp_token_flags,
look up the corresponding location/u.value/u.tree_check_value would be a
nightmare; there are about 950 such cases.  What is worse, not all cp_tokens
live in the parser->lexer->buffer vector; some of them live in automatic
variables.
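To illustrate why such a rewrite would be invasive, here is a minimal C sketch assuming the split two-array layout proposed above. The names token_flags, token_payload, token_buffer, token_in_buffer and token_location are hypothetical, not GCC code. Every former tok->location access would become an accessor that maps the flags pointer back to an index in the buffer, and that mapping simply does not exist for a token copied into an automatic variable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical split token store (illustrative names, not GCC's).  */
typedef uint64_t loc64_t;

struct token_flags { unsigned int type : 8; unsigned int misc : 21; };
struct token_payload { loc64_t location; void *value; };

struct token_buffer {
  struct token_flags *flags;      /* one 4-byte entry per token */
  struct token_payload *payload;  /* parallel array, same index */
  size_t count;
};

/* Does TOK point into BUF's flags array?  Addresses are compared as
   integers because relational comparison of pointers to unrelated
   objects is undefined in ISO C.  */
static int
token_in_buffer (const struct token_buffer *buf,
                 const struct token_flags *tok)
{
  uintptr_t p = (uintptr_t) tok;
  uintptr_t lo = (uintptr_t) buf->flags;
  return p >= lo && p < lo + buf->count * sizeof *buf->flags;
}

/* What a former `tok->location` would have to become: recover the
   index from the flags pointer, then read the parallel array.  */
static loc64_t
token_location (const struct token_buffer *buf,
                const struct token_flags *tok)
{
  /* A token held in an automatic variable has no slot in
     BUF->payload, so there is nothing to look up.  */
  assert (token_in_buffer (buf, tok));
  return buf->payload[tok - buf->flags].location;
}
```

With a three-token buffer, token_location (&buf, &buf.flags[1]) returns the second payload's location, while a copy of that flags word in a local variable fails token_in_buffer: the automatic-variable problem in miniature.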

        Jakub
