As mentioned earlier, I abstracted the on-entry cache at the beginning
of stage1. This was to make it easier to port future changes back to
GCC11 so we could provide alternate representations to deal with memory
issues, or what have you.
This patch introduces a sparse representation of the cache which is
used when the number of basic blocks gets too large.
I commandeered the bitmap code since its efficient and has been working
a long time, and added 2 routines to get and set 4 bits (quads) at a
time. This allows me to use a bitmap like its a sparse array which can
contain a value between 0 and 15, and is conveniently pre-initialized to
values of 0 at no cost :-) This is then used as an index into a small
local table to store ranges for the name. Its limiting in that an
ssa-name will not be able to have more than 15 unique ranges, but this
happens in less than 8% of all cases in the data I collected, and most
of those are switches.. any ranges after the 15 slots are full revert to
VARYING. The values for VARYING and UNDEFINED are pre-populated, and
for pointers, I also pre-populate [0,0] and ~[0, 0].
This also adds --param=evrp-sparse-threshold= which allows the
threshold between the full vector and this new sparse representation to
be changed. It defaults to a value of 800. I've done various performance
runs, and this seems to be a reasonably balanced number. In fact its a
28% improvement in EVRP compile time over 390 files from a gcc bootstrap
with minimal impact on missed opportunities.
I've also tried to see if using less than 15 values has any significant
effect (The lookup is linear when setting), but it does not appear to.
I've also bootstrapped with the sparse threshold at 0 to ensure there
aren't any issues.
My thoughts are I would put this into trunk, and assuming nothing comes
up over the next couple of days, port it back to GCC11 to resolve
100299 and other excessive memory consumption PRs there as well. given
that its reusing bitmap code for the sparse representation, it seems
like it would be low risk.
Are we OK with the addition of the bitmap_get_quad and bitmap_set_quad
routines in bitmap.c? It seems like they might be useful to others.
They are simple tweaks of bitmap_set_bit and bitmap_bit_p.. just dealing
with 4 bits at a time. I could make them local if this is a problem,
but i don't have access to the bitmap internals there.
Bootstraps on x86_64-pc-linux-gnu with no regressions.
Andrew
PS in PR10299 we spend a fraction of a second in EVRP now.
>From fb2d9360b0f347bf8af651e9e3382ceca9769787 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod <amacl...@redhat.com>
Date: Wed, 2 Jun 2021 15:08:10 -0400
Subject: [PATCH 2/3] Implement "quad" bit accessors in bitmap for sparse
arrays of 4 bit values.
* bitmap.c (bitmap_set_quad): New.
(bitmap_get_quad): New.
(test_quad): New.
(test_quad): Call test_quad.
* bitmap.h (bitmap_set_quad, bitmap_get_quad): New prototypes.
---
gcc/bitmap.c | 87 ++++++++++++++++++++++++++++++++++++++++++++++++++++
gcc/bitmap.h | 7 +++++
2 files changed, 94 insertions(+)
diff --git a/gcc/bitmap.c b/gcc/bitmap.c
index 5a650cdfc1d..0babcc7cb30 100644
--- a/gcc/bitmap.c
+++ b/gcc/bitmap.c
@@ -1004,6 +1004,68 @@ bitmap_bit_p (const_bitmap head, int bit)
return (ptr->bits[word_num] >> bit_num) & 1;
}
+/* Set 4 bits at a time in a bitmap.
+ store QUAD_VALUE at bits QUAD*4 through QUAD*4 + 3 in bitmap HEAD.
+ This is the set routine for viewing bitmap as a 4 bit sparse array. */
+
+void
+bitmap_set_quad (bitmap head, int quad, int quad_value)
+{
+ int bit = quad * 4;
+ gcc_checking_assert (quad_value >= 0 && quad_value <= 0x0F);
+ unsigned indx = bit / BITMAP_ELEMENT_ALL_BITS;
+ bitmap_element *ptr;
+ if (!head->tree_form)
+ ptr = bitmap_list_find_element (head, indx);
+ else
+ ptr = bitmap_tree_find_element (head, indx);
+ unsigned word_num = bit / BITMAP_WORD_BITS % BITMAP_ELEMENT_WORDS;
+ unsigned bit_num = bit % BITMAP_WORD_BITS;
+ BITMAP_WORD bit_val = ((BITMAP_WORD) quad_value) << bit_num;
+ BITMAP_WORD mask = ~(((BITMAP_WORD) 0x0F) << bit_num);
+
+ if (ptr != 0)
+ {
+ ptr->bits[word_num] &= mask;
+ ptr->bits[word_num] |= bit_val;
+ return;
+ }
+
+ ptr = bitmap_element_allocate (head);
+ ptr->indx = bit / BITMAP_ELEMENT_ALL_BITS;
+ ptr->bits[word_num] = bit_val;
+ if (!head->tree_form)
+ bitmap_list_link_element (head, ptr);
+ else
+ bitmap_tree_link_element (head, ptr);
+}
+
+/* Return a set of 4 consecutive bits starting at bit QUAD * 4.
+ This is the get routine for viewing bitmap as a 4 bit sparse array. */
+
+int
+bitmap_get_quad (const_bitmap head, int quad)
+{
+ int bit = quad * 4;
+ unsigned int indx = bit / BITMAP_ELEMENT_ALL_BITS;
+ const bitmap_element *ptr;
+ unsigned bit_num;
+ unsigned word_num;
+
+ if (!head->tree_form)
+ ptr = bitmap_list_find_element (const_cast<bitmap> (head), indx);
+ else
+ ptr = bitmap_tree_find_element (const_cast<bitmap> (head), indx);
+ if (ptr == 0)
+ return 0;
+
+ bit_num = bit % BITMAP_WORD_BITS;
+ word_num = bit / BITMAP_WORD_BITS % BITMAP_ELEMENT_WORDS;
+
+ // Return 4 bits.
+ return (ptr->bits[word_num] >> bit_num) & 0x0F;
+}
+
#if GCC_VERSION < 3400
/* Table of number of set bits in a character, indexed by value of char. */
static const unsigned char popcount_table[] =
@@ -2857,6 +2919,30 @@ test_bitmap_single_bit_set_p ()
ASSERT_EQ (1066, bitmap_first_set_bit (b));
}
+/* Verify accessing 4 bit quad elements works. */
+
+static void
+test_quad ()
+{
+ bitmap b = bitmap_gc_alloc ();
+
+ int index = 3;
+ for (int x = 0; x < 16 ; x++)
+ {
+ bitmap_set_quad (b, index, x);
+ ASSERT_TRUE (bitmap_get_quad (b, index) == x);
+ ASSERT_TRUE (bitmap_get_quad (b, index + 1) == 0);
+ ASSERT_TRUE (bitmap_get_quad (b, index - 1) == 0);
+ index += 3;
+ }
+ index = 3;
+ for (int x = 0; x < 16 ; x++)
+ {
+ ASSERT_TRUE (bitmap_get_quad (b, index) == x);
+ index += 3;
+ }
+}
+
/* Run all of the selftests within this file. */
void
@@ -2867,6 +2953,7 @@ bitmap_c_tests ()
test_clear_bit_in_middle ();
test_copying ();
test_bitmap_single_bit_set_p ();
+ test_quad ();
}
} // namespace selftest
diff --git a/gcc/bitmap.h b/gcc/bitmap.h
index 26138556b2a..f5e8bd3cce3 100644
--- a/gcc/bitmap.h
+++ b/gcc/bitmap.h
@@ -438,6 +438,13 @@ extern bool bitmap_set_bit (bitmap, int);
/* Return true if a bit is set in a bitmap. */
extern int bitmap_bit_p (const_bitmap, int);
+/* Set and get 4 bit values in a sparse bitmap. This allows a bitmap to
+ function as a sparse array of 4 bit values. This is more efficient than
+ performing this as 4 separate operations.
+ QUAD is the index, the values set and returned are 4 bit values. */
+extern void bitmap_set_quad (bitmap head, int quad, int quad_value);
+extern int bitmap_get_quad (const_bitmap head, int quad);
+
/* Debug functions to print a bitmap. */
extern void debug_bitmap (const_bitmap);
extern void debug_bitmap_file (FILE *, const_bitmap);
--
2.17.2
>From 9408efe462c7a664858d986f69082eb86f2a0007 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod <amacl...@redhat.com>
Date: Wed, 2 Jun 2021 15:20:02 -0400
Subject: [PATCH 3/3] Implement a sparse bitmap represenation for Rangers
on-entry cache.
Use s sparse representation for the on entry cache, and utilize it when
the number of basic blocks in the function exceeds param_evrp_sparse_threshold.
PR tree-optimization/PR100299
* gimple-range-cache.cc (class sbr_sparse_bitmap): New.
(sbr_sparse_bitmap::sbr_sparse_bitmap): New.
(sbr_sparse_bitmap::set_bb_range): New.
(sbr_sparse_bitmap::get_bb_range): New.
(sbr_sparse_bitmap::bb_range_p): New.
(block_range_cache::block_range_cache): initialize bitmap obstack.
(block_range_cache::~block_range_cache): Destruct obstack.
(block_range_cache::set_bb_range): Decide when to utilze the
sparse on entry cache.
* gimple-range-cache.h (block_range_cache): Add bitmap obstack.
* params.opt (-param=evrp-sparse-threshold): New.
---
gcc/gimple-range-cache.cc | 126 +++++++++++++++++++++++++++++++++++++-
gcc/gimple-range-cache.h | 1 +
gcc/params.opt | 4 ++
3 files changed, 128 insertions(+), 3 deletions(-)
diff --git a/gcc/gimple-range-cache.cc b/gcc/gimple-range-cache.cc
index 3d1ca3b94ca..658f1bc9342 100644
--- a/gcc/gimple-range-cache.cc
+++ b/gcc/gimple-range-cache.cc
@@ -236,12 +236,119 @@ sbr_vector::bb_range_p (const basic_block bb)
return m_tab[bb->index] != NULL;
}
+// This class implements the on entry cache via a sparse bitmap.
+// It uses the quad bit routines to access 4 bits at a time.
+// A value of 0 (the default) means there is no entry, and a value of
+// 1 thru SBR_NUM represents an element in the m_range vector.
+// Varying is given the first value (1) and pre-cached.
+// SBR_NUM + 1 represents the value of UNDEFINED, and is never stored.
+// SBR_NUM is the number of values that can be cached.
+// Indexes are 1..SBR_NUM and are stored locally at m_range[0..SBR_NUM-1]
+
+#define SBR_NUM 14
+#define SBR_UNDEF SBR_NUM + 1
+#define SBR_VARYING 1
+
+class sbr_sparse_bitmap : public ssa_block_ranges
+{
+public:
+ sbr_sparse_bitmap (tree t, irange_allocator *allocator, bitmap_obstack *bm);
+ virtual void set_bb_range (const basic_block bb, const irange &r) OVERRIDE;
+ virtual bool get_bb_range (irange &r, const basic_block bb) OVERRIDE;
+ virtual bool bb_range_p (const basic_block bb) OVERRIDE;
+private:
+ irange_allocator *m_irange_allocator;
+ irange *m_range[SBR_NUM];
+ bitmap bitvec;
+ tree m_type;
+};
+
+// Initialize a block cache for an ssa_name of type T.
+
+sbr_sparse_bitmap::sbr_sparse_bitmap (tree t, irange_allocator *allocator,
+ bitmap_obstack *bm)
+{
+ gcc_checking_assert (TYPE_P (t));
+ m_type = t;
+ bitvec = BITMAP_ALLOC (bm);
+ m_irange_allocator = allocator;
+ // Pre-cache varying.
+ m_range[0] = m_irange_allocator->allocate (2);
+ m_range[0]->set_varying (t);
+ // Pre-cache zero and non-zero values for pointers.
+ if (POINTER_TYPE_P (t))
+ {
+ m_range[1] = m_irange_allocator->allocate (2);
+ m_range[1]->set_nonzero (t);
+ m_range[2] = m_irange_allocator->allocate (2);
+ m_range[2]->set_zero (t);
+ }
+ else
+ m_range[1] = m_range[2] = NULL;
+ // Clear SBR_NUM entries.
+ for (int x = 3; x < SBR_NUM; x++)
+ m_range[x] = 0;
+}
+
+// Set the range on entry to basic block BB to R.
+
+void
+sbr_sparse_bitmap::set_bb_range (const basic_block bb, const irange &r)
+{
+ if (r.undefined_p ())
+ {
+ bitmap_set_quad (bitvec, bb->index, SBR_UNDEF);
+ return;
+ }
+
+ // Loop thru the values to see if R is already present.
+ for (int x = 0; x < SBR_NUM; x++)
+ if (!m_range[x] || r == *(m_range[x]))
+ {
+ if (!m_range[x])
+ m_range[x] = m_irange_allocator->allocate (r);
+ bitmap_set_quad (bitvec, bb->index, x + 1);
+ return;
+ }
+ // All values are taken, default to VARYING.
+ bitmap_set_quad (bitvec, bb->index, SBR_VARYING);
+ return;
+}
+
+// Return the range associated with block BB in R. Return false if
+// there is no range.
+
+bool
+sbr_sparse_bitmap::get_bb_range (irange &r, const basic_block bb)
+{
+ int value = bitmap_get_quad (bitvec, bb->index);
+
+ if (!value)
+ return false;
+
+ gcc_checking_assert (value <= SBR_UNDEF);
+ if (value == SBR_UNDEF)
+ r.set_undefined ();
+ else
+ r = *(m_range[value - 1]);
+ return true;
+}
+
+// Return true if a range is present.
+
+bool
+sbr_sparse_bitmap::bb_range_p (const basic_block bb)
+{
+ return (bitmap_get_quad (bitvec, bb->index) != 0);
+}
+
// -------------------------------------------------------------------------
// Initialize the block cache.
block_range_cache::block_range_cache ()
{
+ bitmap_obstack_initialize (&m_bitmaps);
m_ssa_ranges.create (0);
m_ssa_ranges.safe_grow_cleared (num_ssa_names);
m_irange_allocator = new irange_allocator;
@@ -254,6 +361,7 @@ block_range_cache::~block_range_cache ()
delete m_irange_allocator;
// Release the vector itself.
m_ssa_ranges.release ();
+ bitmap_obstack_release (&m_bitmaps);
}
// Set the range for NAME on entry to block BB to R.
@@ -269,9 +377,21 @@ block_range_cache::set_bb_range (tree name, const basic_block bb,
if (!m_ssa_ranges[v])
{
- void *r = m_irange_allocator->get_memory (sizeof (sbr_vector));
- m_ssa_ranges[v] = new (r) sbr_vector (TREE_TYPE (name),
- m_irange_allocator);
+ // Use sparse representation if there are too many basic blocks.
+ if (last_basic_block_for_fn (cfun) > param_evrp_sparse_threshold)
+ {
+ void *r = m_irange_allocator->get_memory (sizeof (sbr_sparse_bitmap));
+ m_ssa_ranges[v] = new (r) sbr_sparse_bitmap (TREE_TYPE (name),
+ m_irange_allocator,
+ &m_bitmaps);
+ }
+ else
+ {
+ // Otherwise use the default vector implemntation.
+ void *r = m_irange_allocator->get_memory (sizeof (sbr_vector));
+ m_ssa_ranges[v] = new (r) sbr_vector (TREE_TYPE (name),
+ m_irange_allocator);
+ }
}
m_ssa_ranges[v]->set_bb_range (bb, r);
}
diff --git a/gcc/gimple-range-cache.h b/gcc/gimple-range-cache.h
index 4af461d2aa3..ce4449a08db 100644
--- a/gcc/gimple-range-cache.h
+++ b/gcc/gimple-range-cache.h
@@ -61,6 +61,7 @@ private:
ssa_block_ranges &get_block_ranges (tree name);
ssa_block_ranges *query_block_ranges (tree name);
irange_allocator *m_irange_allocator;
+ bitmap_obstack m_bitmaps;
};
// This global cache is used with the range engine as markers for what
diff --git a/gcc/params.opt b/gcc/params.opt
index 0d0dcd216f6..18e6036c4f4 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -126,6 +126,10 @@ Maximum size (in bytes) of objects tracked bytewise by dead store elimination.
Common Joined UInteger Var(param_early_inlining_insns) Init(6) Optimization Param
Maximal estimated growth of function body caused by early inlining of single call.
+-param=evrp-sparse-threshold=
+Common Joined UInteger Var(param_evrp_sparse_threshold) Init(800) Optimization Param
+Maximum number of basic blocks before EVRP uses a sparse cache.
+
-param=evrp-mode=
Common Joined Var(param_evrp_mode) Enum(evrp_mode) Init(EVRP_MODE_EVRP_FIRST) Param Optimization
--param=evrp-mode=[legacy|ranger|legacy-first|ranger-first|ranger-trace|ranger-debug|trace|debug] Specifies the mode Early VRP should operate in.
--
2.17.2