On Wed, Jun 2, 2021 at 11:15 PM Andrew MacLeod <amacl...@redhat.com> wrote:
>
> As mentioned earlier, I abstracted the on-entry cache at the beginning
> of stage1. This was to make it easier to port future changes back to
> GCC11 so we could provide alternate representations to deal with memory
> issues, or what have you.
>
> This patch introduces a sparse representation of the cache which is
> used  when the number of basic blocks gets too large.
>
> I commandeered the bitmap code since its efficient and has been working
> a long time, and added 2 routines to get and set 4 bits (quads) at a
> time.  This allows me to use a bitmap like its a sparse array which can
> contain a value between 0 and 15, and is conveniently pre-initialized to
> values of 0 at no cost :-)   This is then used as an index into a small
> local table to store ranges for the name.  Its limiting in that an
> ssa-name will not be able to have more than 15 unique ranges, but this
> happens in less than 8% of all cases in the data I collected, and most
> of those are switches.. any ranges after the 15 slots are full revert to
> VARYING.  The values for VARYING and UNDEFINED are pre-populated, and
> for pointers, I also pre-populate [0,0] and ~[0, 0].
>
> This also adds --param=evrp-sparse-threshold=  which allows the
> threshold between the full vector and this new sparse representation to
> be changed. It defaults to a value of 800. I've done various performance
> runs, and this seems to be a reasonably balanced number. In fact its a
> 28% improvement in EVRP compile time over 390 files from a gcc bootstrap
> with minimal impact on missed opportunities.
>
> I've also tried to see if using less than 15 values has any significant
> effect (The lookup is linear when setting), but it does not appear to.
>
> I've also bootstrapped with the sparse threshold at 0 to ensure there
> aren't any issues.
>
> My thoughts are I would put this into trunk, and assuming nothing comes
> up  over the next couple of days, port it back to GCC11 to resolve
> 100299 and other excessive memory consumption PRs there as well. given
> that its reusing bitmap code for the sparse representation, it seems
> like it would be low risk.
>
> Are we OK with the addition of the bitmap_get_quad and bitmap_set_quad
> routines in bitmap.c?  It seems like they might be useful to others.
> They are simple tweaks of bitmap_set_bit and bitmap_bit_p.. just dealing
> with 4 bits at a time.  I could make them local if this is a problem,
> but i don't have access to the bitmap internals there.

I think _quad is a bit too specific - it's aligned chunks so maybe

void bitmap_set_aligned_chunk (bitmap, unsigned int chunk, unsigned
int chunk_size, BITMAP_WORD chunk_value);

and

BITMAP_WORD bitmap_get_aligned_chunk (bitmap, unsigned int chunk,
unsigned chunk_size);

and assert that chunk_size is power-of-two and fits in BITMAP_WORD?

(also note using unsigned ints and BITMAP_WORD for the data type)

I've been using two-bit representations in a few places (but mostly
setting/testing the
respective bits independently), I suppose for example

static dep_state
query_loop_dependence (class loop *loop, im_mem_ref *ref, dep_kind kind)
{
  unsigned first_bit = 6 * loop->num + kind * 2;
  if (bitmap_bit_p (&ref->dep_loop, first_bit))
    return dep_independent;
  else if (bitmap_bit_p (&ref->dep_loop, first_bit + 1))
    return dep_dependent;

could use a chunk size of 2 and a single bitmap query.  Incidentially this
specific code uses 6 bits, so it's not fully aligned ...

/* We use six bits per loop in the ref->dep_loop bitmap to record
   the dep_kind x dep_state combinations.  */

enum dep_kind { lim_raw, sm_war, sm_waw };
enum dep_state { dep_unknown, dep_independent, dep_dependent };

... but there's also at most a single bit set.

Anyway, I'm OK with adding API to access aligned power-of-two sized chunks.
Even not power-of-two sized unaligned chunks should be quite straight
forward to implement if we limit the chunk size to BITMAP_WORD by
simply advancing to the next bitmap word / element when necessary.

An alternative low-level API would provide accesses to whole BITMAP_WORD
entries and the quads could be implemented on top of that
(bitmap_set_word/_get_word)

Richard.

> Bootstraps on x86_64-pc-linux-gnu with no regressions.
>
> Andrew
>
> PS in PR10299 we spend a fraction of a second in EVRP now.
>
>
>

Reply via email to