On Tue, Oct 13, 2015 at 5:32 PM, David Malcolm <dmalc...@redhat.com> wrote:
> On Thu, 2015-09-24 at 10:15 +0200, Richard Biener wrote:
>> On Thu, Sep 24, 2015 at 2:25 AM, David Malcolm <dmalc...@redhat.com> wrote:
>> > On Wed, 2015-09-23 at 15:36 +0200, Richard Biener wrote:
>> >> On Wed, Sep 23, 2015 at 3:19 PM, Michael Matz <m...@suse.de> wrote:
>> >> > Hi,
>> >> >
>> >> > On Tue, 22 Sep 2015, David Malcolm wrote:
>> >> >
>> >> >> The drawback is that it could bloat the ad-hoc table. Can the ad-hoc
>> >> >> table ever get smaller, or does it only ever get inserted into?
>> >> >
>> >> > It only ever grows.
>> >> >
>> >> >> An idea I had is that we could stash short ranges directly into the 32
>> >> >> bits of location_t, by offsetting the per-column-bits somewhat.
>> >> >
>> >> > It's certainly worth an experiment: let's say you restrict yourself to
>> >> > tokens less than 8 characters, you need an additional 3 bits (using one
>> >> > value, e.g. zero, as the escape value). That leaves 20 bits for the line
>> >> > numbers (for the normal 8 bit columns), which might be enough for most
>> >> > single-file compilations. For LTO compilation this often won't be enough.
>> >> >
>> >> >> My plan is to investigate the impact these patches have on the time and
>> >> >> memory consumption of the compiler,
>> >> >
>> >> > When you do so, make sure you're also measuring an LTO compilation with
>> >> > debug info of something big (firefox). I know that we already had issues
>> >> > with the size of the linemap data in the past for these cases (probably
>> >> > when we added columns).
>> >>
>> >> The issue we have with LTO is that the linemap gets populated in quite
>> >> random order and thus we repeatedly switch files (we've mitigated this
>> >> somewhat for GCC 5). We also considered dropping column info
>> >> (and would drop range info) as diagnostics are from optimizers only
>> >> with LTO and we keep locations merely for debug info.
>> >
>> > Thanks. Presumably the mitigation you're referring to is the
>> > lto_location_cache class in lto-streamer-in.c?
>> >
>> > Am I right in thinking that, right now, the LTO code doesn't support
>> > ad-hoc locations? (presumably the block pointers only need to exist
>> > during optimization, which happens after the serialization)
>>
>> LTO code does support ad-hoc locations but they are "restored" only
>> when reading function bodies and stmts (by means of COMBINE_LOCATION_DATA).
>>
>> > The obvious simplification would be, as you suggest, to not bother
>> > storing range information with LTO, falling back to just the existing
>> > representation. Then there's no need to extend LTO to serialize ad-hoc
>> > data; simply store the underlying locus into the bit stream. I think
>> > that this happens already: lto-streamer-out.c calls expand_location and
>> > stores the result, so presumably any ad-hoc location_t values made by
>> > the v2 patches would have dropped their range data there when I ran the
>> > test suite.
>>
>> Yep. We only preserve BLOCKs, so if you don't add extra code to
>> preserve ranges they'll be "dropped".
>>
>> > If it's acceptable to not bother with ranges for LTO, one way to do the
>> > "stashing short ranges into the location_t" idea might be for the
>> > bits-per-range of location_t values to be a property of the line_table
>> > (or possibly the line map), set up when the struct line_maps is created.
>> > For non-LTO it could be some tuned value (maybe from a param?); for LTO
>> > it could be zero, so that we have as many bits as before for line/column
>> > data.
>>
>> That could be a possibility (likewise for column info?)
>>
>> Richard.
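To make the "stash short ranges into the location_t" idea concrete, here is a rough toy model in Python. It is only a sketch under stated assumptions, not libcpp's actual encoding: the field widths follow Michael's example (3 range bits, 8 column bits, the remainder for line numbers), the top bit is assumed to be reserved for flagging ad-hoc table entries, and make_layout, pack_location and unpack_location are hypothetical names. The range width is a parameter of the layout, so an LTO-style configuration can set it to zero and get the bits back for line/column data.

```python
# Toy model of packing (line, column, token length) into the low 31 bits
# of a 32-bit value.  All names and the exact layout are hypothetical.
ADHOC_BIT = 1 << 31  # assumed: top bit marks an index into the ad-hoc table


def make_layout(range_bits, column_bits=8):
    """Return (range_bits, column_bits, line_bits) for a 31-bit payload."""
    line_bits = 31 - range_bits - column_bits
    return range_bits, column_bits, line_bits


def pack_location(line, column, length, layout):
    """Pack a location; range field 0 is the escape value ("no inline range")."""
    range_bits, column_bits, line_bits = layout
    if not (0 <= line < (1 << line_bits) and 0 <= column < (1 << column_bits)):
        raise OverflowError("line/column does not fit; would need another encoding")
    # Only lengths 1 .. 2**range_bits - 1 fit inline; everything else escapes.
    rng = length if 0 < length < (1 << range_bits) else 0
    return (line << (column_bits + range_bits)) | (column << range_bits) | rng


def unpack_location(value, layout):
    """Inverse of pack_location: return (line, column, inline_range)."""
    range_bits, column_bits, _ = layout
    rng = value & ((1 << range_bits) - 1)
    column = (value >> range_bits) & ((1 << column_bits) - 1)
    line = value >> (column_bits + range_bits)
    return line, column, rng


if __name__ == "__main__":
    normal = make_layout(range_bits=3)  # 20 bits left for line numbers
    lto = make_layout(range_bits=0)     # no inline ranges, 23 bits for lines
    print(normal, unpack_location(pack_location(100, 5, 7, normal), normal))
    print(lto, unpack_location(pack_location(100, 5, 7, lto), lto))
```

With 3 range bits a token of 8 or more characters falls back to the escape value, so its range would have to live somewhere else (e.g. the ad-hoc table); with 0 range bits every token does.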
>>
>> > Hope this sounds sane
>> > Dave
>
> I did some crude benchmarking of the patchkit, using these scripts:
>   https://github.com/davidmalcolm/gcc-benchmarking
> (specifically, bb0222b455df8cefb53bfc1246eb0a8038256f30),
> using the "big-code.c" and "kdecore.cc" files Michael posted as:
>   https://gcc.gnu.org/ml/gcc-patches/2013-09/msg00062.html
> and "influence.i", a preprocessed version of SPEC2006's 445.gobmk
> engine/influence.c (as an example of a moderate-sized pure C source
> file).
>
> This doesn't yet cover very large autogenerated C files, and the .cc
> file is only being measured to see the effect on the ad-hoc table (and
> tokenization).
>
> "control" was r227977.
> "experiment" was the same revision with the v2 patchkit applied.
>
> Recall that this patchkit captures ranges for tokens as an extra field
> within tokens within libcpp and the C FE, and adds ranges to the ad-hoc
> location lookaside, storing them for all tree nodes within the C FE that
> have a location_t, and passing them around within c_expr for all C
> expressions (including those that don't have a location_t).
>
> Both control and experiment were built with
>   --enable-checking=release \
>   --disable-bootstrap \
>   --disable-multilib \
>   --enable-languages=c,ada,c++,fortran,go,java,lto,objc,obj-c++
>
> The script measures:
>
> (a) wallclock time for "xgcc -S" so it's measuring the driver, parsing,
> optimization, etc, rather than attempting to directly measure parsing.
> This is without -ftime-report, since Mikhail indicated it's sufficiently
> expensive to skew timings in this post:
>   https://gcc.gnu.org/ml/gcc/2015-07/msg00165.html
>
> (b) memory usage: by performing a separate build with -ftime-report,
> extracting the "TOTAL" ggc value (actually 3 builds, but it's the same
> each time).
>
> Is this a fair way to measure things? It could be argued that by
> measuring totals I'm hiding the extra parsing cost in the overall cost.

Overall cost is what matters. Time to build the libstdc++ PCHs
would be interesting as well ;) (and their size)

One could have argued you should have used -fsyntax-only.

> Full logs can be seen at:
>   https://dmalcolm.fedorapeople.org/gcc/2015-09-25/bmark-v2.txt
> (v2 of the patchkit)
>
> I also investigated a version of the patchkit with the token tracking
> rewritten to build ad-hoc ranges for *every token*, without attempting
> any kind of optimization (e.g. for short ranges).
> A log of this can be seen at:
>   https://dmalcolm.fedorapeople.org/gcc/2015-09-25/bmark-v2-plus-adhoc-ranges-for-tokens.txt
> (v2 of the patchkit, with token tracking rewritten to build ad-hoc
> ranges for *every token*).
> The nice thing about this approach is that lots of token-related
> diagnostics gain underlining of the relevant token "for free" simply
> from the location_t, without having to individually patch them. Without
> any optimization, the memory consumed by this approach is clearly
> larger.
>
> A summary comparing the two logs:
>
> Minimal wallclock time (s) over 10 iterations
>
>                     Control -> v2                         Control -> v2+adhocloc+at+every+token
> kdecore.cc -g -O0   10.306548 -> 10.268712: 1.00x faster  10.247160 -> 10.444528: 1.02x slower
> kdecore.cc -g -O1   27.026285 -> 27.220654: 1.01x slower  27.280681 -> 27.622676: 1.01x slower
> kdecore.cc -g -O2   43.791668 -> 44.020270: 1.01x slower  43.904934 -> 44.248477: 1.01x slower
> kdecore.cc -g -O3   47.471836 -> 47.651101: 1.00x slower  47.645985 -> 48.005495: 1.01x slower
> kdecore.cc -g -Os   31.678652 -> 31.802829: 1.00x slower  31.741484 -> 32.033478: 1.01x slower
> empty.c -g -O0      0.012662 -> 0.011932: 1.06x faster    0.012888 -> 0.013143: 1.02x slower
> empty.c -g -O1      0.012685 -> 0.012558: 1.01x faster    0.013164 -> 0.012790: 1.03x faster
> empty.c -g -O2      0.012694 -> 0.012846: 1.01x slower    0.012912 -> 0.013175: 1.02x slower
> empty.c -g -O3      0.012654 -> 0.012699: 1.00x slower    0.012596 -> 0.012792: 1.02x slower
> empty.c -g -Os      0.013057 -> 0.012766: 1.02x faster    0.012691 -> 0.012885: 1.02x slower
> big-code.c -g -O0   3.292680 -> 3.325748: 1.01x slower    3.292948 -> 3.303049: 1.00x slower
> big-code.c -g -O1   15.701810 -> 15.765014: 1.00x slower  15.714116 -> 15.759254: 1.00x slower
> big-code.c -g -O2   22.575615 -> 22.620187: 1.00x slower  22.567406 -> 22.605435: 1.00x slower
> big-code.c -g -O3   52.423586 -> 52.590075: 1.00x slower  52.421460 -> 52.703835: 1.01x slower
> big-code.c -g -Os   21.153980 -> 21.253598: 1.00x slower  21.146266 -> 21.260138: 1.01x slower
> influence.i -g -O0  0.148229 -> 0.149518: 1.01x slower    0.148672 -> 0.156262: 1.05x slower
> influence.i -g -O1  0.387397 -> 0.389930: 1.01x slower    0.387734 -> 0.396655: 1.02x slower
> influence.i -g -O2  0.587514 -> 0.589604: 1.00x slower    0.588064 -> 0.596510: 1.01x slower
> influence.i -g -O3  1.273561 -> 1.280514: 1.01x slower    1.274599 -> 1.287596: 1.01x slower
> influence.i -g -Os  0.526045 -> 0.527579: 1.00x slower    0.526827 -> 0.535635: 1.02x slower
>
>
> Maximal ggc memory (kb)
>
>                     Control -> v2                               Control -> v2+adhocloc+at+every+token
> kdecore.cc -g -O0   650337.000 -> 654435.000: 1.0063x larger    650337.000 -> 711775.000: 1.0945x larger
> kdecore.cc -g -O1   931966.000 -> 940144.000: 1.0088x larger    931951.000 -> 989384.000: 1.0616x larger
> kdecore.cc -g -O2   1125325.000 -> 1133514.000: 1.0073x larger  1125318.000 -> 1182384.000: 1.0507x larger
> kdecore.cc -g -O3   1221408.000 -> 1229596.000: 1.0067x larger  1221410.000 -> 1278658.000: 1.0469x larger
> kdecore.cc -g -Os   867140.000 -> 871235.000: 1.0047x larger    867141.000 -> 928700.000: 1.0710x larger
> empty.c -g -O0      1189.000 -> 1192.000: 1.0025x larger        1189.000 -> 1193.000: 1.0034x larger
> empty.c -g -O1      1189.000 -> 1192.000: 1.0025x larger        1189.000 -> 1193.000: 1.0034x larger
> empty.c -g -O2      1189.000 -> 1192.000: 1.0025x larger        1189.000 -> 1193.000: 1.0034x larger
> empty.c -g -O3      1189.000 -> 1192.000: 1.0025x larger        1189.000 -> 1193.000: 1.0034x larger
> empty.c -g -Os      1189.000 -> 1192.000: 1.0025x larger        1189.000 -> 1193.000: 1.0034x larger
> big-code.c -g -O0   166584.000 -> 172731.000: 1.0369x larger    166584.000 -> 172726.000: 1.0369x larger
> big-code.c -g -O1   279793.000 -> 285940.000: 1.0220x larger    279793.000 -> 285935.000: 1.0220x larger
> big-code.c -g -O2   400058.000 -> 406194.000: 1.0153x larger    400058.000 -> 406189.000: 1.0153x larger
> big-code.c -g -O3   903648.000 -> 909750.000: 1.0068x larger    903906.000 -> 910001.000: 1.0067x larger
> big-code.c -g -Os   357060.000 -> 363010.000: 1.0167x larger    357060.000 -> 363005.000: 1.0166x larger
> influence.i -g -O0  9273.000 -> 9719.000: 1.0481x larger        9273.000 -> 13303.000: 1.4346x larger
> influence.i -g -O1  12968.000 -> 13414.000: 1.0344x larger      12968.000 -> 16998.000: 1.3108x larger
> influence.i -g -O2  16386.000 -> 16768.000: 1.0233x larger      16386.000 -> 20352.000: 1.2420x larger
> influence.i -g -O3  35508.000 -> 35763.000: 1.0072x larger      35508.000 -> 39346.000: 1.1081x larger
> influence.i -g -Os  14287.000 -> 14669.000: 1.0267x larger      14287.000 -> 18253.000: 1.2776x larger
>
> Thoughts?

The compile-time and memory-usage impact for the adhocloc at every
token patchkit is quite big. Remember that gaining 1% in compile-time
is hard and 20-40% memory increase for influence.i looks too much.

I also wonder why you see differences in memory usage change for
different -O levels. I think we should have a pretty "static" line table
after parsing? Thus rather than percentages I'd like to see absolute
changes (which I'd expect to be the same for all -O levels).

Richard.

> Dave
>
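
As a quick illustration of the "absolute changes" view, the ggc figures for influence.i from the memory table above can be turned into deltas with a few lines of Python. This is only a sketch over the numbers already quoted in this mail; the dictionary layout and the function name absolute_deltas are made up for the example.

```python
# Maximal ggc memory (kb) for influence.i, copied from the table in this mail.
# Each entry is (control, experiment).
V2 = {
    "-O0": (9273, 9719),
    "-O1": (12968, 13414),
    "-O2": (16386, 16768),
    "-O3": (35508, 35763),
    "-Os": (14287, 14669),
}
V2_EVERY_TOKEN = {
    "-O0": (9273, 13303),
    "-O1": (12968, 16998),
    "-O2": (16386, 20352),
    "-O3": (35508, 39346),
    "-Os": (14287, 18253),
}


def absolute_deltas(rows):
    """Return {opt_level: experiment - control} in kb."""
    return {opt: experiment - control for opt, (control, experiment) in rows.items()}


if __name__ == "__main__":
    for name, rows in (("v2", V2), ("v2+adhocloc+at+every+token", V2_EVERY_TOKEN)):
        print(name)
        for opt, delta in absolute_deltas(rows).items():
            print(f"  influence.i -g {opt}: +{delta} kb")
```

Seen this way the increase for influence.i stays within a narrow band across -O levels (roughly 3.8 to 4.0 MB for the every-token variant), whereas the percentages vary mostly because the control totals differ.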