On Thu, 2015-09-24 at 10:15 +0200, Richard Biener wrote:
> On Thu, Sep 24, 2015 at 2:25 AM, David Malcolm <dmalc...@redhat.com> wrote:
> > On Wed, 2015-09-23 at 15:36 +0200, Richard Biener wrote:
> >> On Wed, Sep 23, 2015 at 3:19 PM, Michael Matz <m...@suse.de> wrote:
> >> > Hi,
> >> >
> >> > On Tue, 22 Sep 2015, David Malcolm wrote:
> >> >
> >> >> The drawback is that it could bloat the ad-hoc table. Can the ad-hoc
> >> >> table ever get smaller, or does it only ever get inserted into?
> >> >
> >> > It only ever grows.
> >> >
> >> >> An idea I had is that we could stash short ranges directly into the 32
> >> >> bits of location_t, by offsetting the per-column-bits somewhat.
> >> >
> >> > It's certainly worth an experiment: let's say you restrict yourself to
> >> > tokens less than 8 characters, you need an additional 3 bits (using one
> >> > value, e.g. zero, as the escape value). That leaves 20 bits for the line
> >> > numbers (for the normal 8 bit columns), which might be enough for most
> >> > single-file compilations. For LTO compilation this often won't be
> >> > enough.
> >> >
> >> >> My plan is to investigate the impact these patches have on the time and
> >> >> memory consumption of the compiler,
> >> >
> >> > When you do so, make sure you're also measuring an LTO compilation with
> >> > debug info of something big (firefox). I know that we already had issues
> >> > with the size of the linemap data in the past for these cases (probably
> >> > when we added columns).
> >>
> >> The issue we have with LTO is that the linemap gets populated in quite
> >> random order and thus we repeatedly switch files (we've mitigated this
> >> somewhat for GCC 5). We also considered dropping column info
> >> (and would drop range info) as diagnostics are from optimizers only
> >> with LTO and we keep locations merely for debug info.
> >
> > Thanks. Presumably the mitigation you're referring to is the
> > lto_location_cache class in lto-streamer-in.c?
> >
> > Am I right in thinking that, right now, the LTO code doesn't support
> > ad-hoc locations? (presumably the block pointers only need to exist
> > during optimization, which happens after the serialization)
>
> LTO code does support ad-hoc locations but they are "restored" only
> when reading function bodies and stmts (by means of COMBINE_LOCATION_DATA).
>
> > The obvious simplification would be, as you suggest, to not bother
> > storing range information with LTO, falling back to just the existing
> > representation. Then there's no need to extend LTO to serialize ad-hoc
> > data; simply store the underlying locus into the bit stream. I think
> > that this happens already: lto-streamer-out.c calls expand_location and
> > stores the result, so presumably any ad-hoc location_t values made by
> > the v2 patches would have dropped their range data there when I ran the
> > test suite.
>
> Yep. We only preserve BLOCKs, so if you don't add extra code to
> preserve ranges they'll be "dropped".
>
> > If it's acceptable to not bother with ranges for LTO, one way to do the
> > "stashing short ranges into the location_t" idea might be for the
> > bits-per-range of location_t values to be a property of the line_table
> > (or possibly the line map), set up when the struct line_maps is created.
> > For non-LTO it could be some tuned value (maybe from a param?); for LTO
> > it could be zero, so that we have as many bits as before for line/column
> > data.
>
> That could be a possibility (likewise for column info?)
>
> Richard.
>
> > Hope this sounds sane
> >
> > Dave
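To make the bit budget discussed above concrete, here is a minimal C sketch of stashing short ranges inside the location value itself. It is a simplified model, not libcpp's actual encoding: real ordinary locations are offsets from a line map's start location rather than raw line numbers, and every name here (`loc_t`, `struct loc_table`, `loc_encode`, `loc_decode`, `line_bits`) is hypothetical. It assumes the top bit marks an ad-hoc location, following libcpp's IS_ADHOC_LOC convention, which leaves 31 bits; with 8 column bits and 3 range bits (range value 0 reserved as the escape), 20 bits remain for the line, matching Michael's arithmetic. Following the per-line_table idea, `range_bits` is fixed when the table is created, and setting it to 0 (as suggested for LTO) drops ranges and gives those bits back to the line number.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t loc_t;           /* hypothetical stand-in for location_t */

#define ADHOC_BIT   0x80000000u   /* top bit set: value indexes ad-hoc table */
#define COLUMN_BITS 8             /* "the normal 8 bit columns" */

/* Ad-hoc entry: the underlying locus plus a range too long to stash inline.  */
struct adhoc_entry
{
  loc_t locus;
  unsigned range_len;
};

/* Hypothetical table; range_bits is fixed at creation (e.g. 3 normally,
   or 0 for LTO, where ranges are simply dropped).  */
struct loc_table
{
  unsigned range_bits;
  struct adhoc_entry *adhoc;      /* append-only: it only ever grows */
  size_t adhoc_used, adhoc_alloc;
};

unsigned line_bits (const struct loc_table *t)
{
  /* 31 usable bits, minus the columns, minus the inline range field.  */
  return 31 - COLUMN_BITS - t->range_bits;
}

static loc_t pack (const struct loc_table *t, unsigned line, unsigned col,
                   unsigned r)
{
  return ((loc_t) line << (COLUMN_BITS + t->range_bits))
         | ((loc_t) col << t->range_bits) | r;
}

loc_t loc_encode (struct loc_table *t, unsigned line, unsigned col,
                  unsigned len)
{
  unsigned max_inline = (1u << t->range_bits) - 1; /* 0 is the escape value */
  assert (line < (1u << line_bits (t)) && col < (1u << COLUMN_BITS));
  if (t->range_bits == 0 || len == 0)
    return pack (t, line, col, 0);                /* no range kept */
  if (len <= max_inline)
    return pack (t, line, col, len);              /* short range: inline */
  /* Long range: fall back to a new entry in the ad-hoc table.  */
  if (t->adhoc_used == t->adhoc_alloc)
    {
      t->adhoc_alloc = t->adhoc_alloc ? 2 * t->adhoc_alloc : 16;
      t->adhoc = realloc (t->adhoc, t->adhoc_alloc * sizeof *t->adhoc);
      if (!t->adhoc)
        abort ();
    }
  t->adhoc[t->adhoc_used].locus = pack (t, line, col, 0);
  t->adhoc[t->adhoc_used].range_len = len;
  return ADHOC_BIT | (loc_t) t->adhoc_used++;
}

void loc_decode (const struct loc_table *t, loc_t loc,
                 unsigned *line, unsigned *col, unsigned *len)
{
  if (loc & ADHOC_BIT)
    {
      /* Ad-hoc: look up the underlying locus and the stored range.  */
      const struct adhoc_entry *e = &t->adhoc[loc & ~ADHOC_BIT];
      loc = e->locus;
      *len = e->range_len;
    }
  else
    *len = loc & ((1u << t->range_bits) - 1);
  *line = loc >> (COLUMN_BITS + t->range_bits);
  *col = (loc >> t->range_bits) & ((1u << COLUMN_BITS) - 1);
}
```

For example, with `range_bits` set to 3, a 5-character token on line 1000 is encoded inline, whereas a 40-character range falls back to a new ad-hoc entry. Since that table is append-only, every such fallback costs memory for the rest of the compilation, which is why the inline encoding matters for short tokens.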
I did some crude benchmarking of the patchkit, using these scripts:
  https://github.com/davidmalcolm/gcc-benchmarking
(specifically, bb0222b455df8cefb53bfc1246eb0a8038256f30), using the "big-code.c" and "kdecore.cc" files Michael posted as:
  https://gcc.gnu.org/ml/gcc-patches/2013-09/msg00062.html
and "influence.i", a preprocessed version of SPEC2006's 445.gobmk engine/influence.c (as an example of a moderate-sized pure C source file).

This doesn't yet cover very large autogenerated C files, and the .cc file is only being measured to see the effect on the ad-hoc table (and tokenization).

"control" was r227977.
"experiment" was the same revision with the v2 patchkit applied.

Recall that this patchkit:
  * captures ranges for tokens as an extra field within tokens in libcpp and the C FE,
  * adds ranges to the ad-hoc location lookaside, storing them for all tree nodes within the C FE that have a location_t, and
  * passes them around within c_expr for all C expressions (including those that don't have a location_t).

Both control and experiment were built with:

  --enable-checking=release \
  --disable-bootstrap \
  --disable-multilib \
  --enable-languages=c,ada,c++,fortran,go,java,lto,objc,obj-c++

The script measures:

(a) wallclock time for "xgcc -S", so it's measuring the driver, parsing, optimization, etc., rather than attempting to directly measure parsing. This is without -ftime-report, since Mikhail indicated in this post that it's sufficiently expensive to skew timings:
  https://gcc.gnu.org/ml/gcc/2015-07/msg00165.html

(b) memory usage: by performing a separate build with -ftime-report and extracting the "TOTAL" ggc value (actually 3 builds, but the value is the same each time).

Is this a fair way to measure things? It could be argued that by measuring totals I'm hiding the extra parsing cost within the overall cost.
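The summary below reports, per file and option set, the minimum wallclock time over 10 iterations plus a ratio between control and experiment. The gcc-benchmarking scripts are not reproduced here; what follows is a hypothetical C sketch of those two computations (`min_wallclock` and `format_comparison` are illustrative names). The ratio convention (larger value divided by smaller, a word giving the direction, 2 decimals for time and 4 for memory) is inferred from the reported numbers: for example, 0.012662 -> 0.011932 gives 0.012662 / 0.011932, about 1.06, reported as "1.06x faster".

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Wallclock seconds for one call of FN (ARG), using a monotonic clock.  */
static double time_once (void (*fn) (void *), void *arg)
{
  struct timespec start, end;
  if (clock_gettime (CLOCK_MONOTONIC, &start) != 0)
    abort ();
  fn (arg);
  if (clock_gettime (CLOCK_MONOTONIC, &end) != 0)
    abort ();
  return (double) (end.tv_sec - start.tv_sec)
         + (double) (end.tv_nsec - start.tv_nsec) / 1e9;
}

/* Minimal wallclock time over ITERATIONS runs of FN (ARG).  */
double min_wallclock (void (*fn) (void *), void *arg, int iterations)
{
  assert (iterations > 0);
  double best = time_once (fn, arg);
  for (int i = 1; i < iterations; i++)
    {
      double t = time_once (fn, arg);
      if (t < best)
        best = t;
    }
  return best;
}

/* Write e.g. "1.06x faster" into BUF.  The larger value is divided by the
   smaller so the ratio is always >= 1; the word gives the direction.
   METRIC "time" uses 2 decimals and slower/faster; anything else (memory)
   uses 4 decimals and larger/smaller.  */
void format_comparison (char *buf, size_t n, const char *metric,
                        double old_val, double new_val)
{
  int is_time = strcmp (metric, "time") == 0;
  int digits = is_time ? 2 : 4;
  if (new_val >= old_val)
    snprintf (buf, n, "%.*fx %s", digits, new_val / old_val,
              is_time ? "slower" : "larger");
  else
    snprintf (buf, n, "%.*fx %s", digits, old_val / new_val,
              is_time ? "faster" : "smaller");
}
```

In a real harness, `fn` might wrap a `system ()` call that runs "xgcc -S" on one input (again hypothetical). Taking the minimum rather than the mean is a common choice for wallclock benchmarks, since scheduling and cache noise tend to add time rather than remove it.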
Full logs can be seen at:
  https://dmalcolm.fedorapeople.org/gcc/2015-09-25/bmark-v2.txt
(v2 of the patchkit)

I also investigated a version of the patchkit with the token tracking rewritten to build ad-hoc ranges for *every token*, without attempting any kind of optimization (e.g. for short ranges). A log of this can be seen at:
  https://dmalcolm.fedorapeople.org/gcc/2015-09-25/bmark-v2-plus-adhoc-ranges-for-tokens.txt
(v2 of the patchkit, with token tracking rewritten to build ad-hoc ranges for *every token*)

The nice thing about this approach is that lots of token-related diagnostics gain underlining of the relevant token "for free", simply from the location_t, without having to patch each one individually. Without any optimization, however, the memory consumed by this approach is clearly larger: up to 1.4346x for influence.i at -O0, versus 1.0481x for plain v2.

A summary comparing the two logs:

Minimal wallclock time (s) over 10 iterations

                    Control -> v2                         Control -> v2+adhocloc+at+every+token
kdecore.cc -g -O0   10.306548 -> 10.268712: 1.00x faster  10.247160 -> 10.444528: 1.02x slower
kdecore.cc -g -O1   27.026285 -> 27.220654: 1.01x slower  27.280681 -> 27.622676: 1.01x slower
kdecore.cc -g -O2   43.791668 -> 44.020270: 1.01x slower  43.904934 -> 44.248477: 1.01x slower
kdecore.cc -g -O3   47.471836 -> 47.651101: 1.00x slower  47.645985 -> 48.005495: 1.01x slower
kdecore.cc -g -Os   31.678652 -> 31.802829: 1.00x slower  31.741484 -> 32.033478: 1.01x slower
empty.c -g -O0      0.012662 -> 0.011932: 1.06x faster    0.012888 -> 0.013143: 1.02x slower
empty.c -g -O1      0.012685 -> 0.012558: 1.01x faster    0.013164 -> 0.012790: 1.03x faster
empty.c -g -O2      0.012694 -> 0.012846: 1.01x slower    0.012912 -> 0.013175: 1.02x slower
empty.c -g -O3      0.012654 -> 0.012699: 1.00x slower    0.012596 -> 0.012792: 1.02x slower
empty.c -g -Os      0.013057 -> 0.012766: 1.02x faster    0.012691 -> 0.012885: 1.02x slower
big-code.c -g -O0   3.292680 -> 3.325748: 1.01x slower    3.292948 -> 3.303049: 1.00x slower
big-code.c -g -O1   15.701810 -> 15.765014: 1.00x slower  15.714116 -> 15.759254: 1.00x slower
big-code.c -g -O2   22.575615 -> 22.620187: 1.00x slower  22.567406 -> 22.605435: 1.00x slower
big-code.c -g -O3   52.423586 -> 52.590075: 1.00x slower  52.421460 -> 52.703835: 1.01x slower
big-code.c -g -Os   21.153980 -> 21.253598: 1.00x slower  21.146266 -> 21.260138: 1.01x slower
influence.i -g -O0  0.148229 -> 0.149518: 1.01x slower    0.148672 -> 0.156262: 1.05x slower
influence.i -g -O1  0.387397 -> 0.389930: 1.01x slower    0.387734 -> 0.396655: 1.02x slower
influence.i -g -O2  0.587514 -> 0.589604: 1.00x slower    0.588064 -> 0.596510: 1.01x slower
influence.i -g -O3  1.273561 -> 1.280514: 1.01x slower    1.274599 -> 1.287596: 1.01x slower
influence.i -g -Os  0.526045 -> 0.527579: 1.00x slower    0.526827 -> 0.535635: 1.02x slower

Maximal ggc memory (kb)

                    Control -> v2                               Control -> v2+adhocloc+at+every+token
kdecore.cc -g -O0   650337.000 -> 654435.000: 1.0063x larger    650337.000 -> 711775.000: 1.0945x larger
kdecore.cc -g -O1   931966.000 -> 940144.000: 1.0088x larger    931951.000 -> 989384.000: 1.0616x larger
kdecore.cc -g -O2   1125325.000 -> 1133514.000: 1.0073x larger  1125318.000 -> 1182384.000: 1.0507x larger
kdecore.cc -g -O3   1221408.000 -> 1229596.000: 1.0067x larger  1221410.000 -> 1278658.000: 1.0469x larger
kdecore.cc -g -Os   867140.000 -> 871235.000: 1.0047x larger    867141.000 -> 928700.000: 1.0710x larger
empty.c -g -O0      1189.000 -> 1192.000: 1.0025x larger        1189.000 -> 1193.000: 1.0034x larger
empty.c -g -O1      1189.000 -> 1192.000: 1.0025x larger        1189.000 -> 1193.000: 1.0034x larger
empty.c -g -O2      1189.000 -> 1192.000: 1.0025x larger        1189.000 -> 1193.000: 1.0034x larger
empty.c -g -O3      1189.000 -> 1192.000: 1.0025x larger        1189.000 -> 1193.000: 1.0034x larger
empty.c -g -Os      1189.000 -> 1192.000: 1.0025x larger        1189.000 -> 1193.000: 1.0034x larger
big-code.c -g -O0   166584.000 -> 172731.000: 1.0369x larger    166584.000 -> 172726.000: 1.0369x larger
big-code.c -g -O1   279793.000 -> 285940.000: 1.0220x larger    279793.000 -> 285935.000: 1.0220x larger
big-code.c -g -O2   400058.000 -> 406194.000: 1.0153x larger    400058.000 -> 406189.000: 1.0153x larger
big-code.c -g -O3   903648.000 -> 909750.000: 1.0068x larger    903906.000 -> 910001.000: 1.0067x larger
big-code.c -g -Os   357060.000 -> 363010.000: 1.0167x larger    357060.000 -> 363005.000: 1.0166x larger
influence.i -g -O0  9273.000 -> 9719.000: 1.0481x larger        9273.000 -> 13303.000: 1.4346x larger
influence.i -g -O1  12968.000 -> 13414.000: 1.0344x larger      12968.000 -> 16998.000: 1.3108x larger
influence.i -g -O2  16386.000 -> 16768.000: 1.0233x larger      16386.000 -> 20352.000: 1.2420x larger
influence.i -g -O3  35508.000 -> 35763.000: 1.0072x larger      35508.000 -> 39346.000: 1.1081x larger
influence.i -g -Os  14287.000 -> 14669.000: 1.0267x larger      14287.000 -> 18253.000: 1.2776x larger

Thoughts?

Dave