On 6/22/24 09:15, Richard Biener wrote:
On Fri, Jun 21, 2024 at 3:02 PM Andrew MacLeod <amacl...@redhat.com> wrote:
This patch adds
--param=vrp-block-limit=N
When the basic block counter for a function exceeded 'N' , VRP is
invoked with the new fast_vrp algorithm instead. This algorithm uses a
lot less memory and processing power, although it does get a few less
things.
Primary motivation is cases like
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114855 in which the 3 VRP
passes consume about 600 seconds of the compile time, and a lot of
memory. With fast_vrp, it spends less than 10 seconds total in the
3 passes of VRP. This test case has about 400,000 basic blocks.
The default for N in this patch is 150,000, arbitrarily chosen.
This bootstraps, (and I bootstrapped it with --param=vrp-block-limit=0
as well) on x86_64-pc-linux-gnu, with no regressions.
What do you think, OK for trunk?
+ if (last_basic_block_for_fn (fun) > param_vrp_block_limit ||
+ &data == &pass_data_fast_vrp)
|| goes to the next line.
Btw, we have -Wdisabled-optimization for these cases which should
say sth like "function has excess of %d number of basic blocks
(--param vrp-block-limit=%d), using fast VRP algorithm"
or so in this case.
As I wrote in the PR the priority should be -O1 compile-time
performance and memory use.
Yeah, I just wanted to use it as a model for "bad" cases for ranger.
Adjusted patch attached which now issues the warning.
I also found that the transitive relations were causing a small blowup
in time for relation processing now that I turned relations on for fast
VRP. I commited a patch and fast_vrp no longer does transitives.
If you want to experiment with enabling fast VRP at -O1, it should be
fast all the time now. I think :-) This testcase runs in about 95
seconds on my test machine. if I turn on VRP, a single VRP pass takes
about 2.5 seconds. Its all set up, you can just add:
NEXT_PASS (pass_fast_vrp);
at an appropriate spot.
Richard.
Andrew
PS sorry,. it doesn't help the threader in that PR :-(
It's probably one of the known bottlenecks there - IIRC the path range
queries make O(n^2) work. I can look at this at some point as I've
dealt with the slowness of this pass in the past.
There is param_max_jump_thread_paths that should limit searching
but there is IIRC no way to limit the work path ranger actually does
when resolving the query.
Yeah, Id like to talk to Aldy about revamping the threader now that some
of the newer facilities are available that fast_vrp uses.
We can calculate all the outgoing ranges for a block at once with :
// Fill ssa-cache R with any outgoing ranges on edge E, using QUERY.
bool gori_on_edge (class ssa_cache &r, edge e, range_query *query =
NULL);
This is what the fast_vrp routines uses. We can gather all range
restrictions generated from an edge efficiently just once and then
intersect them with a known range as we walk the different paths. We
don't need the gori exports , nor any of the other on-demand bits where
we calculate each export range dynamically.. I suspect it would reduce
the workload and memory impact quite a bit, but I'm not really familiar
with exactly how the threader uses those things.
It'd require some minor tweaking to the lazy_ssa_cache to make the
bitmap of names set accessible. This would provide similar
functionality to what the gori export () routine provides. Both
relations and inferred ranges should only need to be calculated once per
block as well and could/should/would be applied the same way if they are
present. I don't *think* the threader uses any of the def chains, but
Aldy can chip in.
Andrew
From 15f697aad90c35e42a4416d7db6e7289c0f5aae3 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod <amacl...@redhat.com>
Date: Mon, 17 Jun 2024 11:38:46 -0400
Subject: [PATCH 2/2] Add param for bb limit to invoke fast_vrp.
If the basic block count is too high, simply use fast_vrp for all
VRP passes.
gcc/doc/
* invoke.texi (vrp-block-limit): Document.
gcc/
* params.opt (-param=vrp-block-limit): New.
* tree-vrp.cc (fvrp_folder::execute): Invoke fast_vrp if block
count exceeds limit.
---
gcc/doc/invoke.texi | 3 +++
gcc/params.opt | 4 ++++
gcc/tree-vrp.cc | 16 ++++++++++++++--
3 files changed, 21 insertions(+), 2 deletions(-)
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index c790e2f3518..80da5e9d306 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -16840,6 +16840,9 @@ this parameter. The default value of this parameter is 50.
@item vect-induction-float
Enable loop vectorization of floating point inductions.
+@item vrp-block-limit
+Maximum number of basic blocks before VRP switches to a lower memory algorithm.
+
@item vrp-sparse-threshold
Maximum number of basic blocks before VRP uses a sparse bitmap cache.
diff --git a/gcc/params.opt b/gcc/params.opt
index d34ef545bf0..c17ba17b91b 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -1198,6 +1198,10 @@ The maximum factor which the loop vectorizer applies to the cost of statements i
Common Joined UInteger Var(param_vect_induction_float) Init(1) IntegerRange(0, 1) Param Optimization
Enable loop vectorization of floating point inductions.
+-param=vrp-block-limit=
+Common Joined UInteger Var(param_vrp_block_limit) Init(150000) Optimization Param
+Maximum number of basic blocks before VRP switches to a fast model with less memory requirements.
+
-param=vrp-sparse-threshold=
Common Joined UInteger Var(param_vrp_sparse_threshold) Init(3000) Optimization Param
Maximum number of basic blocks before VRP uses a sparse bitmap cache.
diff --git a/gcc/tree-vrp.cc b/gcc/tree-vrp.cc
index 26979b706e5..a81fcc696b7 100644
--- a/gcc/tree-vrp.cc
+++ b/gcc/tree-vrp.cc
@@ -60,6 +60,7 @@ along with GCC; see the file COPYING3. If not see
#include "ipa-cp.h"
#include "ipa-prop.h"
#include "attribs.h"
+#include "diagnostic-core.h"
// This class is utilized by VRP and ranger to remove __builtin_unreachable
// calls, and reflect any resulting global ranges.
@@ -1331,9 +1332,20 @@ public:
unsigned int execute (function *fun) final override
{
// Check for fast vrp.
- if (&data == &pass_data_fast_vrp)
- return execute_fast_vrp (fun, final_p);
+ bool use_fvrp = (&data == &pass_data_fast_vrp);
+ if (!use_fvrp && last_basic_block_for_fn (fun) > param_vrp_block_limit)
+ {
+ use_fvrp = true;
+ warning (OPT_Wdisabled_optimization,
+ "Using fast VRP algorithm. %d basic blocks"
+ " exceeds %s%d limit",
+ n_basic_blocks_for_fn (fun),
+ "--param=vrp-block-limit=",
+ param_vrp_block_limit);
+ }
+ if (use_fvrp)
+ return execute_fast_vrp (fun, final_p);
return execute_ranger_vrp (fun, final_p);
}
--
2.45.0