On Wed, Jun 2, 2021 at 11:15 PM Andrew MacLeod <amacl...@redhat.com> wrote: > > As mentioned earlier, I abstracted the on-entry cache at the beginning > of stage1. This was to make it easier to port future changes back to > GCC11 so we could provide alternate representations to deal with memory > issues, or what have you. > > This patch introduces a sparse representation of the cache which is > used when the number of basic blocks gets too large. > > I commandeered the bitmap code since its efficient and has been working > a long time, and added 2 routines to get and set 4 bits (quads) at a > time. This allows me to use a bitmap like its a sparse array which can > contain a value between 0 and 15, and is conveniently pre-initialized to > values of 0 at no cost :-) This is then used as an index into a small > local table to store ranges for the name. Its limiting in that an > ssa-name will not be able to have more than 15 unique ranges, but this > happens in less than 8% of all cases in the data I collected, and most > of those are switches.. any ranges after the 15 slots are full revert to > VARYING. The values for VARYING and UNDEFINED are pre-populated, and > for pointers, I also pre-populate [0,0] and ~[0, 0]. > > This also adds --param=evrp-sparse-threshold= which allows the > threshold between the full vector and this new sparse representation to > be changed. It defaults to a value of 800. I've done various performance > runs, and this seems to be a reasonably balanced number. In fact its a > 28% improvement in EVRP compile time over 390 files from a gcc bootstrap > with minimal impact on missed opportunities. > > I've also tried to see if using less than 15 values has any significant > effect (The lookup is linear when setting), but it does not appear to. > > I've also bootstrapped with the sparse threshold at 0 to ensure there > aren't any issues. > > My thoughts are I would put this into trunk, and assuming nothing comes > up over the next couple of days, port it back to GCC11 to resolve > 100299 and other excessive memory consumption PRs there as well. given > that its reusing bitmap code for the sparse representation, it seems > like it would be low risk. > > Are we OK with the addition of the bitmap_get_quad and bitmap_set_quad > routines in bitmap.c? It seems like they might be useful to others. > They are simple tweaks of bitmap_set_bit and bitmap_bit_p.. just dealing > with 4 bits at a time. I could make them local if this is a problem, > but i don't have access to the bitmap internals there.
I think _quad is a bit too specific - it's aligned chunks so maybe void bitmap_set_aligned_chunk (bitmap, unsigned int chunk, unsigned int chunk_size, BITMAP_WORD chunk_value); and BITMAP_WORD bitmap_get_aligned_chunk (bitmap, unsigned int chunk, unsigned chunk_size); and assert that chunk_size is power-of-two and fits in BITMAP_WORD? (also note using unsigned ints and BITMAP_WORD for the data type) I've been using two-bit representations in a few places (but mostly setting/testing the respective bits independently), I suppose for example static dep_state query_loop_dependence (class loop *loop, im_mem_ref *ref, dep_kind kind) { unsigned first_bit = 6 * loop->num + kind * 2; if (bitmap_bit_p (&ref->dep_loop, first_bit)) return dep_independent; else if (bitmap_bit_p (&ref->dep_loop, first_bit + 1)) return dep_dependent; could use a chunk size of 2 and a single bitmap query. Incidentially this specific code uses 6 bits, so it's not fully aligned ... /* We use six bits per loop in the ref->dep_loop bitmap to record the dep_kind x dep_state combinations. */ enum dep_kind { lim_raw, sm_war, sm_waw }; enum dep_state { dep_unknown, dep_independent, dep_dependent }; ... but there's also at most a single bit set. Anyway, I'm OK with adding API to access aligned power-of-two sized chunks. Even not power-of-two sized unaligned chunks should be quite straight forward to implement if we limit the chunk size to BITMAP_WORD by simply advancing to the next bitmap word / element when necessary. An alternative low-level API would provide accesses to whole BITMAP_WORD entries and the quads could be implemented on top of that (bitmap_set_word/_get_word) Richard. > Bootstraps on x86_64-pc-linux-gnu with no regressions. > > Andrew > > PS in PR10299 we spend a fraction of a second in EVRP now. > > >