> On Mon, May 29, 2023 at 6:20 PM Martin Jambor wrote:
> >
> > Hi,
> >
> > there have been concerns that linear searches through DECL_ARGUMENTS
> > that are often necessary to compute the index of a particular
> > PARM_DECL which is the key to results of IPA-CP can happen often
> > enough to be a
> On Thu, 1 Jun 2023, Andre Vieira (lists) wrote:
>
> > Hi,
> >
> > This is a follow-up of the internal function patch to add widening and
> > narrowing patterns. This patch improves the inliner cost estimation for
> > internal functions.
>
> I have no idea why calls are special in IPA analyze_
Hi,
as discussed, this patch moves profile updating to tree-ssa-loop-ch.cc since it
is now quite ch-specific. There are no functional changes.
Bootstrapped/regtested x86_64-linux, committed.
gcc/ChangeLog:
* tree-cfg.cc (gimple_duplicate_sese_region): Rename to ...
(gimple_duplicate
Hi,
loop-ch currently does analysis using ranger for all loops to identify
candidates and then follows by phase where headers are duplicated (which
breaks SSA and ranger). The second stage does more analysis (to see how
many BBs we want to duplicate) but can't use ranger and thus misses
informatio
Hi,
currently we rebuild profile_counts from profile_probability after inlining,
because there is a chance that producing large loop nests may get
unrealistically large profile_count values. This is much less of a concern
since we switched to the new profile_count representation a while back.
This propag
Hi,
While looking into the sphinx3 regression I noticed that the vectorizer produces
BBs with an overall probability count of 120%. This patch fixes it.
Richi, I don't know how to create a testcase, but having one would
be nice.
Bootstrapped/regtested x86_64-linux, committed last night (sorry for the
late email)
g
Hi,
when vectorizing 4 times, we sometimes do
for
<4x vectorized body>
for
<2x vectorized body>
for
<1x vectorized body>
Here the second two fors handling the epilogue never iterate.
Currently the vectorizer thinks that the middle for iterates twice.
This turns out to be scale_profile_fo
Hi,
try_peel_loop uses gimple_duplicate_loop_body_to_header_edge which subtracts
the profile from the original loop. However, it then tries to scale the profile
in a wrong way (it forces the header count to be the entry count).
This eliminates the profile misupdates in the internal loop of sphinx3.
gcc/C
> On Mon, Jul 17, 2023 at 12:36 PM Jan Hubicka via Gcc-patches
> wrote:
> >
> > Hi,
> > While looking into sphinx3 regression I noticed that vectorizer produces
> > BBs with overall probability count 120%. This patch fixes it.
> > Richi, I don't know
Hi,
this patch makes tree-ssa-loop-ch understand if-combined conditionals (which
are quite common) and removes the IV-derived heuristics. That heuristic is
quite dubious because every variable of integral or pointer type with a PHI in
the header is seen as an IV, so in the first basic block we match all
> Tamar Christina writes:
> > Hi All,
> >
> > The resulting predicate register of a whilelo is not
> > restricted to the lower half of the predicate register file.
> >
> > As such these tests started failing after recent changes
> > because the whilelo outside the loop is getting assigned p15.
>
Hi,
this patch cleans up the API for determining expected loop iterations from the
profile. We started with expected_loop_iterations, whose only source was the
integer-represented BB counts. It did some work on guessing the number of
iterations if the profile was absent or bogus. Later we introduced loop_info a
Hi,
we have flow_loop_dump and print_loop. While print_loop was extended to dump
stuff from the loop structure we added over the years (loop info),
flow_loop_dump was not.
-fdump-tree-all files contain flow_loop_dump output, which makes it hard to see
what metadata we have attached to a loop.
This patch unifies d
Hi,
we have the finite_p flag in the loop structure. finite_loop_p already knows to
use it, but we may also set the flag when we prove a loop to be finite by
SCEV analysis, to avoid duplicated work.
Bootstrapped/regtested x86_64-linux, OK?
gcc/ChangeLog:
* tree-ssa-loop-niter.cc (finite_loop_p): Re
Hi,
currently loop-ch skips all do-while loops. But when a loop is not do-while,
in addition to the original goal of turning it into a do-while, it can do
additional things:
1) move out loop-invariant computations
2) duplicate loop-invariant conditionals and eliminate them in the loop body.
3) prove that some
> > The patch requires a bit of testsuite changes:
> > - I disabled ch in loop-unswitch-17.c since it tests unswitching of
> >   a loop invariant conditional.
> > - pr103079.c needs ch disabled to trigger the vrp situation it tests for
> >   (otherwise we optimize stuff earlier and better)
> > - copy-h
Fix sreal::to_int and implement sreal::to_nearest_int
while exploring the new loop estimate dumps, I noticed that a loop iterating
1.8 times by profile is estimated as iterating once instead of twice by
nb_estimate.
While nb_estimate should really be a sreal and I will convert it incrementally,
I found probl
Hi,
this patch adds maybe_flat_loop_profile, which can be used in loop profile
update to detect the situation where the profile may be unrealistically flat
and should not be downscaled after vectorizing, unrolling and other transforms
that assume that the loop has a high iteration count even if the CFG profi
Hi,
this patch fixes the templates in the two testcases so they match the output
correctly. I did not re-test after the last changes in the previous patch,
sorry for that.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/copy-headers-9.c: Fix template for
tree-ssa-loop-ch.cc changes.
* gcc.dg/
Avoid scaling flat loop profiles of vectorized loops
As discussed, when vectorizing a loop with a static profile, it is not always
a good idea to divide the header frequency by the vectorization factor because
the profile may not realistically represent the expected number of iterations.
Since in such
ca
> On Mon, Jul 17, 2023 at 12:36 PM Jan Hubicka via Gcc-patches
> wrote:
> >
> > Hi,
> > While looking into sphinx3 regression I noticed that vectorizer produces
> > BBs with overall probability count 120%. This patch fixes it.
> > Richi, I don't know
Hi,
this patch makes profile_count::to_sreal_scale consider the scale
unknown when in is 0. This fixes the case where a loop has 0 executions
in the profile feedback and thus we can't determine its trip count.
Bootstrapped/regtested x86_64-linux, committed.
Honza
gcc/ChangeLog:
* profile-coun
Hi,
profile_count::apply_probability misses a check for an uninitialized
probability, which leads to completely random results when applying an
uninitialized probability to an initialized scale. This can make a
difference when e.g. inlining a -fno-guess-branch-probability function into
a -fguess-branch-probability one.
Hi,
this fixes two bugs in tree-ssa-loop-im.cc. The first is that the cap
probability is not reliable but is constructed with adjusted quality. The
second is that sometimes the conditional has a wrong joiner BB count. This is
visible on testsuite/gcc.dg/pr102385.c; however the testcase triggers another pr
Hi,
This patch fixes the profile update in tree_transform_and_unroll_loop, which is
used by predictive commoning. I started with an attempt to fix
gcc.dg/tree-ssa/update-unroll-1.c, which I xfailed last week, but it turned out
to be a harder job.
Unrolling was never fixed for changes in duplicate_loop_body_to_header_edge
This patch fixes the profile update after RTL unrolling, which is now done the
same way as in the tree-level one. We still produce a (slightly) corrupted
profile for multiple-exit loops, which I can try to fix incrementally.
I also updated the testcases to look for profile mismatches so they do not
creep back in again.
Bootstrapp
Hi,
as discussed with Richard, we want the store to be likely in
optimize_mask_stores.
Bootstrapped/regtested x86_64-linux, committed.
gcc/ChangeLog:
* tree-vect-loop.cc (optimize_mask_stores): Make store
likely.
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 256155
Hi,
while looking into a profile misupdate in hmmer I noticed that the loop
splitting pass is not able to handle the loop it documents as the example it
should apply to:
One transformation of loops like:
for (i = 0; i < 100; i++)
{
if (i < 50)
A;
else
B;
}
in
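As a hedged sketch (not from the patch) of what loop splitting aims to produce
for the example above: A and B are the original statement bodies; do_A and do_B
below are made-up placeholders.
void do_A (void);
void do_B (void);

void
split_example (void)
{
  int i;
  /* First loop runs the iterations where the condition is true ...  */
  for (i = 0; i < 50; i++)
    do_A ();
  /* ... second loop continues with the condition known to be false.  */
  for (; i < 100; i++)
    do_B ();
}
The point of the split is that both resulting loop bodies are free of the
inner conditional.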
Hi,
this patch fixes profile update in the first case of loop splitting.
The pass still gives up on very basic testcases:
__attribute__ ((noinline,noipa))
void test1 (int n)
{
if (n <= 0 || n > 10)
return;
for (int i = 0; i <= n; i++)
{
if (i < n)
do_something ();
> On Fri, Jul 28, 2023 at 9:58 AM Jan Hubicka via Gcc-patches
> wrote:
> >
> > Hi,
> > this patch fixes profile update in the first case of loop splitting.
> > The pass still gives up on very basic testcases:
> >
> > __attribute__ ((noinline,noipa))
>
Hi,
This patch extends tree-ssa-loop-split to understand tests of the form
if (i==0)
and
if (i!=0)
which trigger only during the first iteration. Naturally we should
also be able to handle a test that triggers in the last iteration, or split
into 3 cases if the test can indeed fire in the middle of the loop.
Last iteratio
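A minimal sketch of the kind of loop the extension targets and what splitting
off the first iteration could look like (first_iter and body are hypothetical
helpers, not taken from the patch):
void first_iter (void);
void body (int);

void
example (int n)
{
  for (int i = 0; i < n; i++)
    {
      if (i == 0)       /* fires only in the first iteration */
        first_iter ();
      body (i);
    }
}

/* After peeling the first iteration the test disappears entirely:  */
void
example_split (int n)
{
  if (n > 0)
    {
      first_iter ();
      body (0);
      for (int i = 1; i < n; i++)
        body (i);
    }
}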
> On Fri, Jul 28, 2023 at 2:57 PM Jan Hubicka via Gcc-patches
> wrote:
> >
> > Hi,
> > This patch extends tree-ssa-loop-split to understand test of the form
> > if (i==0)
> > and
> > if (i!=0)
> > which triggers only during the first iteration.
Hi,
The vectorizer, while doing loop versioning, produces a versioned loop
guarded by two conditionals of the form
if (cond1)
goto scalar_loop
else
goto next_bb
next_bb:
if (cond2)
goto scalar_loop
else
goto vector_loop
It wants the combined test to be prob (which is set to likely)
a
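A worked illustration (not from the patch; the names are just placeholders) of
how a single target probability can be distributed over the two guards: if the
scalar loop should be reached with probability P overall and the first test
fires with probability P1, the second test, reached only when the first is
false, must fire with P2 = (P - P1) / (1 - P1), since P = P1 + (1 - P1) * P2.
double
second_guard_probability (double p, double p1)
{
  /* Solve p = p1 + (1 - p1) * p2 for p2 (valid for p1 < 1).  */
  return (p - p1) / (1.0 - p1);
}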
Hi,
This patch fixes the update after constant peeling in the prologue. We now
reached 0 profile update bugs on tramp3d vectorization and also on quite a few
testcases, so I am enabling the testsuite checks so we do not regress again.
Bootstrapped/regtested x86_64, committed.
Honza
gcc/ChangeLog:
Hi,
Loop distribution and ifcvt introduce versions of loops which may be removed
later if vectorization fails. Ifcvt does this by temporarily breaking the
profile and producing a conditional whose two arms both have 100% probability,
because we know one of the versions will be removed.
Loop distribution
> On Thu, Jun 23, 2022 at 4:03 AM Kewen.Lin wrote:
> >
> > Hi,
> >
> > Gentle ping https://gcc.gnu.org/pipermail/gcc-patches/2022-June/596212.html
> >
> > BR,
> > Kewen
> >
> > on 2022/6/6 14:20, Kewen.Lin via Gcc-patches wrote:
> > > Hi,
> > >
> > > PR105459 exposes one issue in inline_call handl
Hello,
> From: Lili
>
>
> Hi Hubicka,
>
> This patch is to add a heuristic inline hint to eliminate redundant load and
> store.
>
> Bootstrap and regtest pending on x86_64-unknown-linux-gnu.
> OK for trunk?
>
> Thanks,
> Lili.
>
> Add a INLINE_HINT_eliminate_load_and_store hint in to inline
> Hi,
>
> with -fno-toplevel-reorder (and -fwhole-program), there apparently can
> be local functions without any callers. This is something that IPA-CP
If there is a possibility to trigger a local function without callers, I
think one can also make two local functions calling each other but with
> gcc/ChangeLog:
>
> * profile.cc (compute_branch_probabilities): Dump details only
> if TDF_DETAILS.
> * symtab.cc (symtab_node::dump_base): Do not dump pointer unless
> TDF_ADDRESS is used, it makes comparison harder.
> ---
> gcc/profile.cc | 2 +-
> gcc/symtab.cc | 3 +
> On Tue, 2 Aug 2022, Aldy Hernandez wrote:
>
> > On Tue, Aug 2, 2022 at 1:45 PM Richard Biener wrote:
> > >
> > > On Tue, 2 Aug 2022, Aldy Hernandez wrote:
> > >
> > > > Unfortunately, this was before my time, so I don't know.
> > > >
> > > > That being said, thanks for tackling these issues tha
> The following swaps the loop splitting pass and the final value
> replacement pass to avoid keeping the IV of the earlier loop
> live when not necessary. The existing gcc.target/i386/pr87007-5.c
> testcase shows that we otherwise fail to elide an empty loop
> later. I don't see any good reason
>
> Note most of the profile consistency checks FAIL when testing with -m32 on
> x86_64-unknown-linux-gnu ...
>
> For example vect-11.c has
>
> ;; basic block 4, loop depth 0, count 719407024 (estimated locally,
> freq 0.6700), maybe hot
> ;; Invalid sum of incoming counts 708669602 (estimat
> >
> > Note most of the profile consistency checks FAIL when testing with -m32 on
> > x86_64-unknown-linux-gnu ...
> >
> > For example vect-11.c has
> >
> > ;; basic block 4, loop depth 0, count 719407024 (estimated locally,
> > freq 0.6700), maybe hot
> > ;; Invalid sum of incoming counts
> On Mon, Jul 31, 2023 at 7:05 PM Martin Jambor wrote:
> >
> > Hi,
> >
> > when IPA-SRA detects whether a parameter passed by reference is
> > written to, it does not special case CLOBBERs which means it often
> > bails out unnecessarily, especially when dealing with C++ destructors.
> > Fixed by
> > Jeff, any help would be appreciated here :)
> >
> > I will try to debug this. One option would be to disable branch
> > prediction on vect_check for the time being - it is not inlined anyway
> Not a lot of insight. The backwards threader uses a totally different API
> for the CFG/SSA updates and
Hi,
Profiledbootstrap fails with an ICE in update_loop_exit_probability_scale_dom_bbs
called from loop unrolling.
The reason is that in relatively rare situations we may run into a case where a
loop has multiple exits, all considered likely, but then we scale down
the profile and one of the ex
Hi,
Hmmer's internal function has 4 loops. The following is the profile at start:
loop 1:
estimate 472
iterations by profile: 473.497707 (reliable) count in:84821 (precise, freq
0.9979)
loop 2:
estimate 99
iterations by profile: 100.00 (reliable) count in:39848881 (precise
> >
> > A couple cycles ago I separated most of code to distinguish between the
> > back and forward threaders. There is class jt_path_registry that is
> > common to both, and {fwd,back}_jt_path_registry for the forward and
> > backward threaders respectively. It's not perfect, but it's a start.
Hi,
this prevents useless loop distribution produced in hmmer. With FDO we now
correctly work out that the loop created for the last iteration is not going to
iterate; however, loop distribution still produces a versioned loop that has no
chance to survive the loop vectorizer since we only keep distributed lo
Hi,
so I found the problem. We duplicate multiple paths and end up with:
;; basic block 6, loop depth 0, count 365072224 (estimated locally, freq 0.3400)
;; prev block 12, next block 7, flags: (NEW, REACHABLE, VISITED)
;; pred: 4 [never (guessed)] count:0 (estimated locally, freq 0.)
> On Fri, Aug 4, 2023 at 9:16 AM Jan Hubicka via Gcc-patches
> wrote:
> >
> > Hi,
> > this prevents useless loop distribution produced in hmmer. With FDO we now
> > correctly work out that the loop created for the last iteration is not going to
> > iterate howeve
Hi,
Epilogue peeling expects the scalar loop to have the same number of executions
as the vector loop, which is true at the beginning of vectorization. However, if
the epilogues are vectorized, this is no longer the case. In this situation the
loop preheader is replaced by new guard code with the correct pr
Hi,
If a loop is if-converted and later versioned by the vectorizer, the vectorizer
will reuse the scalar loop produced by if-conversion. Curiously enough it does
not seem to do so for versions produced by loop distribution, while for loop
distribution this matters (since both ldist versions survive to fina
> On Thu, Aug 10, 2023 at 9:42 AM Uros Bizjak wrote:
> >
> > On Thu, Aug 10, 2023 at 9:40 AM Richard Biener
> > wrote:
> > >
> > > On Thu, Aug 10, 2023 at 3:13 AM liuhongt wrote:
> > > >
> > > > Currently we have 3 different independent tunes for gather
> > > > "use_gather,use_gather_2parts,use_
Hi,
This patch avoids overflow in profile_count::differs_from_p and also makes it
return false when one of the values is undefined while the other is defined.
Bootstrapped/regtested x86_64-linux, committed.
gcc/ChangeLog:
* profile-count.cc (profile_count::differs_from_p): Fix overflow and
Hi,
ssa_fix_duplicate_block_edges later calls update_profile to correct the profile
after threading.
In the testcase this does not work since we lose track of the duplicated edge.
This happens because redirect_edge_and_branch returns NULL if the edge already
has the correct destination, which is the ca
Hi,
this patch makes duplicate_loop_body_to_header_edge not drop profile counts to
uninitialized when count_in is 0. This happens because the profile_probability
of a 0 count is undefined.
Bootstrapped/regtested x86_64-linux, committed.
gcc/ChangeLog:
* cfgloopmanip.cc (duplicate_loop_bo
Hi,
The profile update I added to tree-ssa-loop-split can divide by zero in the
situation where the conditional is predicted with 0 probability, which
is triggered by the jump threading update in the testcase.
gcc/ChangeLog:
PR middle-end/110923
* tree-ssa-loop-split.cc (split_loop): Watch for
Hi,
My patch to fix the profile after folding an internal call is missing a check
for the case where the profile was already zero before if-conversion.
Bootstrapped/regtested x86_64-linux, committed.
gcc/ChangeLog:
PR gcov-profile/110988
* tree-cfg.cc (fold_loop_internal_call): Avoid division by zer
> The following testcase shows that we are bad at identifying inductions
> that will be optimized away after vectorizing them because SCEV doesn't
> handle vectorized defs. The following rolls a simpler identification
> of SSA cycles covering a PHI and an assignment with a binary operator
> with a
> Hi,
>
> In IPA-SRA we use can_be_local_p () predicate rather than just plain
> local call graph flag in order to figure out whether the node is a
> part of an external API that we cannot change. Although there are
> cases where this can allow more transformations, it also means we can
> analyze
Hi,
we currently produce very bad code on loops using std::vector as a stack, since
we fail to inline push_back which in turn prevents SRA and we fail to optimize
out some store-to-load pairs (PR109849).
I looked into why this function is not inlined while clang does inline it. We
currently estim
Hi,
_M_check_len is used in vector reallocations. It computes __n + __s but checks
for the case that (__n + __s) * sizeof (Tp) would overflow ptrdiff_t.
Since we know that __s is the size of an already allocated memory block, if __n
is not too large this will never happen on 64-bit systems, since memor
Hi,
this patch extends ipa-fnsummary to anticipate statements that will be removed
by SRA. This is done by looking for calls passing addresses of automatic
variables. In the function body we look for dereferences from pointers to such
variables and mark them with a new not_sra_candidate condition.
Thi
Hi,
as noticed by Jeff, this patch also triggers a warning in one of the LTO
testcases. The testcase is reduced and the warning seems legit, triggered
by extra inlining. So I have just silenced it.
Honza
gcc/testsuite/ChangeLog:
* gcc.dg/lto/20091013-1_0.c: Disable stringop-overread warning.
di
Hi,
this patch avoids unnecessary post-dominator computation and update_ssa in phiprop.
Bootstrapped/regtested x86_64-linux, OK?
gcc/ChangeLog:
* tree-ssa-phiprop.cc (propagate_with_phi): Add
post_dominators_computed;
compute post dominators lazily.
(const pass_data pass_data_phipr
Hi,
this was suggested earlier somewhere, but I can not find the thread.
C++ has an assume attribute that expands into
if (conditional)
__builtin_unreachable ()
We do not want to account for the conditional in inline heuristics since
we know that it is going to be optimized out.
Bootstrapped/regtest
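A small hedged illustration of the expansion described above (f and x are
made-up names, not the front end's exact output). The guard only conveys the
assumption and is expected to be optimized away, which is why it should not
count toward the inline cost:
int
f (int x)
{
  /* [[assume (x > 0)]] lowered into a guarded __builtin_unreachable.  */
  if (!(x > 0))
    __builtin_unreachable ();
  return x / 2;
}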
> On Mon, Jun 19, 2023 at 9:52 AM Jan Hubicka via Gcc-patches
> wrote:
> >
> > Hi,
> > this was suggested earlier somewhere, but I can not find the thread.
> > C++ has assume attribute that expands int
> > if (conditional)
> > __builtin_unreac
> > - if (max_size() - size() < __n)
> > - __throw_length_error(__N(__s));
> > + // On 64bit systems vectors of small sizes can not
> > + // reach overflow by growing by small sizes; before
> > + // this happens, we will run out of memory.
> > + if (__builtin_c
> On Mon, 19 Jun 2023 at 12:20, Jakub Jelinek wrote:
>
> > On Mon, Jun 19, 2023 at 01:05:36PM +0200, Jan Hubicka via Gcc-patches
> > wrote:
> > > - if (max_size() - size() < __n)
> > > - __throw_length_error(__N(__s));
> >
> >
> > size_type
> > _M_check_len(size_type __n, const char* __s) const
> > {
> > const size_type __size = size();
> > const size_type __max_size = max_size();
> >
> > if (__is_same(allocator_type, allocator<_Tp>)
> > && __size > __max_size /
> > >
> > > size_type
> > > _M_check_len(size_type __n, const char* __s) const
> > > {
> > > const size_type __size = size();
> > > const size_type __max_size = max_size();
> > >
> > > if (__is_same(allocator_type, allocator<_Tp>)
> > > && __s
>
> If I manually add a __builtin_unreachable () to the above case
> I see the *(int *)0 = 0; store DSEd. Maybe we should avoid
> removing stores that might trap here? POSIX wise such a trap
> could be a way to jump out of the path leading to unreachable ()
> via siglongjmp ...
I am not sure ho
>
>
> On 6/22/23 00:31, Richard Biener wrote:
> > I think there's a difference in that __builtin_trap () is observable
> > while __builtin_unreachable () is not and reaching __builtin_unreachable
> > () invokes undefined behavior while reaching __builtin_trap () does not.
> >
> > So the isolatio
> On Mon, Jun 19, 2023 at 12:15 PM Jan Hubicka wrote:
> >
> > > On Mon, Jun 19, 2023 at 9:52 AM Jan Hubicka via Gcc-patches
> > > wrote:
> > > >
> > > > Hi,
> > > > this was suggested earlier somewhere, but I can not find the thread.
> >
> > gcc/ChangeLog:
> >
> > * builtins.cc (expand_builtin_fork_or_exec): Check
> > profile_condition_flag.
> > * collect2.cc (main): Add -fno-profile-conditions to OBSTACK.
> > * common.opt: Add new options -fprofile-conditions and
> > * doc/gcov.texi: Add --conditions
>
> So you need to feed it with extra info on the optimized out stmts because
> as-is it will not remove __builtin_unreachable (). That means you're
My plan was to add an entry point to tree-ssa-dce that will take a
set of stmts declared dead by external forces and will do the usual mark
stage bypa
Hi,
here is the updated version with TODO_update_ssa_only_virtuals.
Bootstrapped/regtested x86_64-linux. OK?
gcc/ChangeLog:
* tree-ssa-phiprop.cc (propagate_with_phi): Compute post dominators on
demand.
(pass_phiprop::execute): Do not compute it here; return
update_ssa
> I intend to push this to trunk once testing finishes.
>
> I generated the diff with -b so the whitespace changes aren't shown,
> because there was some re-indenting that makes the diff look larger than
> it really is.
>
> Honza, I don't think this is likely to make much difference for the PR
>
> > Also as discussed some time ago, the volatile loads between traps has
> > effect of turning previously pure/const functions into non-const which
> > is somewhat sad, so it is still on my todo list to change it this stage1
> > to something more careful. We discussed internal functions trap_sto
Hi,
compiling the testcase from PR109849 (which uses a std::vector based stack to
drive a loop) with profile feedback leads to profile mismatches introduced by
tree-ssa-dce. The cause is the new code to produce unified forwarder blocks for
PHIs.
I am not including the testcase itself since
checking it fo
Hi,
playing with testcases for path isolation and const functions, I noticed
that we do not seem to even try to isolate out-of-range array accesses:
int a[3]={0,1,2};
int test (int i)
{
if (i > 3)
return test2(a[i]);
return a[i];
}
Here the call to test2 is dead, since a[i] will acces
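A hedged sketch of what isolating the undefined path could produce here
(illustrative only, not actual GCC output): since i > 3 implies an
out-of-range access, the whole branch can be turned into a trap, which also
makes the call to test2 dead:
int a[3] = {0, 1, 2};
int test2 (int);

int
test (int i)
{
  if (i > 3)
    __builtin_trap ();  /* isolated erroneous path; a[i] would be out of bounds */
  return a[i];
}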
Hi,
as shown in the testcase (which would eventually be useful for
optimizing std::vector's push_back), ipa-prop can use context dependent ranger
queries for better value range info.
Bootstrapped/regtested x86_64-linux, OK?
Honza
gcc/ChangeLog:
PR middle-end/110377
* ipa-prop.cc
>
> On 6/27/23 09:19, Jan Hubicka wrote:
> > Hi,
> > as shown in the testcase (which would eventually be useful for
> > optimizing std::vector's push_back), ipa-prop can use context dependent
> > ranger
> > queries for better value range info.
> >
> > Bootstrapped/regtested x86_64-linux, OK?
>
>
> On 6/27/23 12:24, Jan Hubicka wrote:
> > > On 6/27/23 09:19, Jan Hubicka wrote:
> > > > Hi,
> > > > as shown in the testcase (which would eventually be useful for
> > > > optimizing std::vector's push_back), ipa-prop can use context dependent
> > > > ranger
> > > > queries for better value ra
> I think the __throw_bad_alloc() and __throw_bad_array_new_length()
> functions should always be rare, so marking them cold seems fine (users who
> define their own allocators that want to throw bad_alloc "often" will
> probably throw it directly, they shouldn't be using our __throw_bad_alloc()
>
Hi,
the early inliner currently skips always_inline functions, and moreover we
ignore calls from always_inline functions in ipa_reverse_postorder. This leads
to disabling most of the propagation done during early optimization, which is
quite bad when early-inlined functions are not leaf functions, which is now
quite com
Compute ipa-predicates for conditionals involving __builtin_expect_p
std::vector allocator looks as follows:
__attribute__((nodiscard))
struct pair * std::__new_allocator
>::allocate (struct __new_allocator * const this, size_type __n, const void *
D.27753)
{
bool _1;
long int _2;
long in
Hi,
while looking into the std::vector _M_realloc_insert codegen I noticed that
the call to __throw_bad_alloc is predicted with 10% probability. This is because
the conditional guarding it has __builtin_expect (cond, 0) on it. This
incorrectly takes precedence over more reliable heuristics predicting
Hi,
this patch fixes some of the profile mismatches caused by profile updating.
It seems that I misupdated update_bb_profile_for_threading in 2017, which
results in invalid updates from RTL threading and threadbackwards.
update_bb_profile_for_threading knows that some paths to a BB are being
redirected el
Hi,
the most common source of profile mismatches is now the copy-header pass. The
reason is that in the common case the duplicated header condition will become
constant true, and that needs changes in the loop exit condition probability.
While this could be done by jump threading, it is not, since it gives up on
> The mod-subtract optimization with ncounts==1 produced incorrect edge
> probabilities due to incorrect conditional probability calculation. This
> patch fixes the calculation.
>
> gcc/ChangeLog:
>
> * value-prof.cc (gimple_mod_subtract_transform): Correct edge
> prob calculation.
> On Wed, 28 Jun 2023, Tamar Christina wrote:
>
> > Hi All,
> >
> > There's an existing bug in loop frequency scaling where the if statement
> > checks
> > to see if there's a single exit, and records a dump file note but then
> > continues.
> >
> > It then tries to access the null pointer, wh
Hi,
this patch applies some TLC to update_bb_profile_for_threading. The function
rescales probabilities by:
FOR_EACH_EDGE (c, ei, bb->succs)
c->probability /= prob;
which is correct, but in case prob is 0 (the newly constructed path took all
the execution counts), this leads to undefi
Hi,
the original scale_loop_profile was implemented to handle only the very simple
loops produced by the vectorizer at that time (basically loops with only one
exit and no subloops). It also has not been updated to the new profile-count
API very carefully.
Since I want to use it from loop peeling and unlooping, I
Hi,
this patch makes loop-ch and loop unrolling fix the profile in case the loop is
known to not iterate at all (or iterate only a few times) while the profile
claims it iterates more. While this is kind of a symptomatic fix, it is the best
we can do in case the profile was originally estimated incorrectly.
In the test
>
> Looks good, but I wonder what we can do to at least make the
> multiple exit case behave reasonably? The vectorizer keeps track
> of a "canonical" exit, would it be possible to pass in the main
> exit edge and use that instead of single_exit (), would other
> exits then behave somewhat reaso
> Hi Both,
>
> Thanks for all the reviews/patches so far 😊
>
> > >
> > > Looks good, but I wonder what we can do to at least make the multiple
> > > exit case behave reasonably? The vectorizer keeps track
> >
> > > of a "canonical" exit, would it be possible to pass in the main exit
> > > edge
Hi,
Information about profile mismatches has been printed only with -details-blocks
for some time.
I think it should be printed even by default to make it easier to spot when
someone introduces a new transform that breaks the profile, but I will send a
separate RFC for that. This patch enables details i
Hi,
we can use the new set_edge_probability_and_rescale_others here.
Bootstrapped/regtested x86_64-linux, committed.
Honza
gcc/ChangeLog:
* predict.cc (force_edge_cold): Use
set_edge_probability_and_rescale_others; improve dumps.
diff --git a/gcc/predict.cc b/gcc/predict.cc
inde