On Sun, Nov 3, 2024 at 11:23 PM Lewis Hyatt <lhy...@gmail.com> wrote:
>
> Hello-
>
> There is no shortage of PRs complaining about things that go wrong when the
> line_maps data structure in libcpp starts to run into its limits. Being
> restricted to a 32-bit location_t to cover all source locations means it has
> the following limitations, among others:
>
>     -Column numbers larger than 2^12 are not tracked at all, so diagnostics
>      can only refer to the whole line, and things like 
> -Wmisleading-indentation
>      stop working with large column numbers. [PR89549]
>
>     -Tokens longer than 32 characters cannot be stored compactly in the line
>      map and have to be stored as ad-hoc locations with the associated
>      memory and locality overhead.
>
>     -Sufficiently large sources can exhaust the available location_t space.
>
> The current limitations are not encountered too much, but it does happen
> often enough, especially in testsuites or large PCH compiles that include,
> e.g., every header in a large project.
>
> Currently, the main workaround is to use the option -flarge-source-locations.
> This sets the range bits in the line_maps to 0, effectively meaning that
> every source code token is stored as an ad-hoc location. This does
> significantly increase the number of locations that can be stored, but it
> loses all the benefits of the line_maps infrastructure, and it does not help
> with the 4096 limit for column numbers.
>
> It seemed to me like there's no way to improve on that other than letting
> location_t be 64 bits instead of 32 bits. I thought I would take a look at
> what that would entail... the attached 15 (mostly small) patches are the
> result. It was unfortunately necessary to touch a lot of different parts of
> the compiler, since location_t is used pretty pervasively. This patch set
> adds a configure option --enable-large-source-locations that makes
> location_t be 64 bits. The option is off by default.
>
> There are decisions to be made about what to do with all those new bits. I
> made some choices in this patch, but it's easy to change if different
> tradeoffs are preferable. Here is what I did:
>
>     - Only 63 bits are used, the high bit is not used so that two location_t
>       can be safely subtracted and stored in an int64_t.
>
>     - As in the 32-bit case, half of the available locations are reserved
>       for ad-hoc locations, and the other half for macros and ordinary
>       locations.
>
>     - The limit on the maximum column number is removed, so it's now set to
>       2^31-1. (GCC right now pervasively assumes line and column numbers
>       within a file can be represented by a 32-bit signed int. That could be
>       changed too, but it is not part of this patch and would be another
>       similarly large change.)
>
>     - The default range bits is changed from 5 to 7. This means that
>       tokens up to 128 chars in length get ordinary locations, while longer
>       ones get ad-hoc locations. I think that's a big improvement from the
>       current max of 32 and will result in a lot fewer ad-hoc locations
>       being generated. There are probably enough bits to go around, though,
>       that this could be made even larger too.
>
> All that stuff is configured in line-map.h and is easy to change. The bulk
> of the patch is dealing with the fallout from making location_t's type
> configurable; in particular, C++ modules needed a lot of tweaks, since they
> deal with line_map internals, and I had to add a new field to RTL which
> necessitated a lot of small changes. I was not previously familiar with
> RTL... I think I generally grokked the idea there and I think the changes I
> made are safe, but I am pretty sure some of them are unnecessary so would
> appreciate feedback on that too.
>
> I tried to test on a variety of platforms. For each of them, I did three
> bootstrap + regtest (without the patch, with the patch and without
> --enable-large-source-locations, and with the patch and with
> --enable-large-source-locations) with --enable-checking=yes,extra,rtl and
> verified no change in the testsuite other than the expected new PASSes. Here
> is what I have tested so far:
>
> all languages: x86_64, aarch64 (cfarm185), ppc64le (cfarm135)
> c/c++ only: sparc 32-bit (cfarm216)
>
> The testing on sparc 32-bit was very productive and in particular identified
> a few issues with padding and alignment that have not been apparent until
> now. That platform differs from x86 32-bit in that uint64_t has a larger
> alignment than a pointer does.
>
> Also it's worth noting, even if there is no interest in supporting this
> feature, patches 2, 3, 4, 5, and 6 I think are worth accepting regardless;
> these are small bug fixes that came up in testing and are independent of
> whether location_t is changed or not... they could be triggered by other
> changes in the future too.
>
> Regarding memory usage implications of this change. It does make a lot of
> data structures larger. Here is what it does on x86-64. The base sizes of some
> types change as follows:
>
> location_t:              From 4 to 8 (+100%)
> line_map_ordinary:       From 24 to 32 (+33%)
> line_map_macro:          From 32 to 40 (+25%)
> cpp_token:               From 24 to 32 (+33%)
> cpp_macro:               From 48 to 56 (+17%)
> tree_exp:                From 32 to 40 (+25%)
> gimple:                  From 40 to 48 (+20%)

All GENERIC decls (tree_decl_minimal) will grow by 8 bytes (and we have 4 bytes
unused padding).  On the GIMPLE side it's probably a "hooray - more space for
flags!".  It does feel a bit backwards because the savings we got by making
the TREE_BLOCK part of the location.

I do think that there's no good way around 64bit location_t though, so
thanks for
tackling this.  What I'm less sure is whether we really want to make this
configurable - I think we're past the point of considering a 32bit host to be
a platform to compile arbitrary large sources.

So - thanks for proposing this.  I'll propose to make the change unconditionally
though - it's quite difficult to optimize structure layout when bits in it are
of unknown size and alignment.

Richard.

> The compiled size of stdc++.h.gch increases from 210MB to 219MB (+4%).
>
> Compiling testsuite/g++.dg/modules/xtreme-header.h into a module, I see:
>
> size of the output module: From 21.9MB to 21.6MB (-1%)
> peak memory usage: From 211MB to 235MB (+11%)
>
> The reason the module gets smaller is that it streams ordinary locations
> more efficiently than adhoc locations, and with the 64-bit location_t, the
> number of adhoc locations was reduced from 189,891 to 101,571.
>
> I'm curious to hear your thoughts on this one please. I think it could be
> nice potentially to get it into GCC 15, and consider changing the default in
> GCC 16 or later, if you think it's worth the trouble. If it seems too
> invasive a change and not worth it, I'll certainly understand, and I have
> enjoyed the tour through the compiler internals in any case.
>
> FWIW I feel like this issue will need to be dealt with some time or other. I
> think users generally expect not to run into arbitrary limits, and compiling
> a lot of headers together into one module is going to become possibly more
> common as modules become available by default. I think any attempt to deal
> with it will have to share at least a lot of the features of this one.
>
> -Lewis
>
> 01/15: Support for 64-bit location_t: libcpp parts
> 02/15: libcpp: Fix potential unaligned access in cpp_buffer
> 03/15: tree-cfg: Fix call to next_discriminator_for_locus()
> 04/15: tree-phinodes: Use 4 instead of 2 as the minimum number of phi args
> 05/15: c++: Fix tree_contains_struct for TRAIT_EXPR
> 06/15: gimple: Handle tail padding when computing gimple_ops_offset
> 07/15: Support for 64-bit location_t: toplev parts
> 08/15: Support for 64-bit location_t: Analyzer parts
> 09/15: Support for 64-bit location_t: Frontend parts
> 10/15: Support for 64-bit location_t: C++ modules parts
> 11/15: Support for 64-bit location_t: RTL parts
> 12/15: Support for 64-bit location_t: Backend parts
> 13/15: Support for 64-bit location_t: Internal parts
> 14/15: Support for 64-bit location_t: Testsuite parts
> 15/15: Support for 64-bit location_t: Configury parts
>
> 64 files changed, 841 insertions(+), 386 deletions(-)

Reply via email to