On Tue, Nov 05, 2024 at 10:42:10AM +0100, Richard Biener wrote:
> > Actually, I think cpp_token isn't that big deal, that should be short-lived
> > unless using huge macros.
> > cp_token in the C++ FE is more important, the FE uses a vector of those
> > and there is one cp_token per token read from libcpp.
> > Unfortunately, I'm afraid there is nothing that can be done there,
> > the struct has currently 29 bits of various flags, then 32 bit location_t
> > and then union with a single pointer in it, so nicely 16 bytes.
> > Now it will be 24 bytes, with 35 spare bits for flags.
> > And the vector is live across the whole parsing (pointer to it cleared at
> > the end of parsing, so GC collect can use it).
>
> So cp_token[] could be split into two arrays to avoid the 32bit padding with
> the enlarged location_t.  Maybe that's even more cache efficient if one
> 32bit field is often the only accessed one when sweeping over a chain
> of tokens.

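For reference, the size arithmetic quoted above can be sketched roughly as
follows.  The structs are hypothetical stand-ins, not the real cp_token
definition, and the numbers assume a typical LP64 target (8-byte pointers,
8-byte alignment for 64-bit integers):

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for illustration; NOT the real cp_token definition.
struct token_loc32 {
  uint32_t flags : 29;      // 29 bits of assorted flags
  uint32_t location;        // 32-bit location_t
  union { void *value; } u; // union holding a single pointer
};

struct token_loc64 {
  uint32_t flags : 29;      // same 29 flag bits
  uint64_t location;        // enlarged 64-bit location_t, 8-byte aligned
  union { void *value; } u;
};

// Split layout along the lines suggested in the quoted text: flags in one
// array, location and pointer in a parallel array.
struct split_flags { uint32_t flags : 29; };
struct split_rest {
  uint64_t location;
  union { void *value; } u;
};

// Per-token cost of the split layout (one element from each array).
constexpr std::size_t split_bytes_per_token =
    sizeof(split_flags) + sizeof(split_rest);

// Bits available for flags in the 64-bit-location layout before the
// 8-byte-aligned location field begins, minus the 29 already used.
constexpr std::size_t spare_flag_bits =
    offsetof(token_loc64, location) * 8 - 29;
```

Under those assumptions sizeof (token_loc32) is 16 and sizeof (token_loc64)
is 24, and the split layout would cost 20 bytes per token.  That only covers
the memory side of the trade-off, not the cost of changing the accesses.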
Not without rewriting the whole parser.  cp_token is the basic structure it
uses everywhere, and most of the code doesn't really care whether the tokens
are in a vector or somewhere else; it just peeks at a token, or at the 2nd
token, or similar.  Having to rewrite all the cp_token -> accesses into
inline function calls or macros that would, e.g. for the 32-bit pointer to
cp_token_flags, look up the corresponding
location/u.value/u.tree_check_value would be a nightmare; there are about
950 such cases.

What is worse, not all cp_tokens live in the parser->lexer->buffer vector;
some of them live in automatic variables.

	Jakub