https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113698
--- Comment #4 from kugan at gcc dot gnu.org ---
Thanks for looking into this. The main reason we were seeing the performance issue
turned out to be the glibc malloc issue in
https://sourceware.org/bugzilla/show_bug.cgi?id=30945
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111683
--- Comment #5 from kugan at gcc dot gnu.org ---
-O3 -fno-tree-vectorize and -O3 -fno-tree-vrp both work. I looked at the evrp
dump and it is not doing anything suspicious. It looks like range_info usage in
the vectoriser is causing the problem.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116337
Bug ID: 116337
Summary: Reverse iterated loops has redundant code compared to
clang
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Severity: normal
P
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116338
Bug ID: 116338
Summary: GCC is not vectoring TSVC s255 while clang can
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: t
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114635
--- Comment #20 from kugan at gcc dot gnu.org ---
(In reply to Richard Sandiford from comment #19)
> (In reply to Richard Biener from comment #14)
> > Usually targets do have a limit on the actual length but I see
> > constant_upper_bound_with_li
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116338
--- Comment #3 from kugan at gcc dot gnu.org ---
(In reply to Richard Biener from comment #2)
> The issue is the recurrence
>
>[local count: 10737416]:
> x_10 = b[31999];
> y_11 = b[31998];
>
>[local count: 1063004408]:
> # x_18 =
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116338
--- Comment #5 from kugan at gcc dot gnu.org ---
(In reply to Richard Biener from comment #4)
> You can try to see whether adding a SSA copy would make this supported, it
> seems not allowing a PHI is simply a missed feature.
We now fail in
/*
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116528
Bug ID: 116528
Summary: Not vectoring TSVC s318 loop
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116562
Bug ID: 116562
Summary: wrong cost of gather load preventing loop from
vectored
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Severity: normal
Prior
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116626
Bug ID: 116626
Summary: ICE while VLA vectorisation
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: middle-end
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116626
--- Comment #1 from kugan at gcc dot gnu.org ---
Looks like a duplicate of https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116569
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114653
Bug ID: 114653
Summary: Not vectoring the loop with openmp reduction.
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: mi
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114653
--- Comment #2 from kugan at gcc dot gnu.org ---
Thanks. I see the following in the log:
test.cpp:33:53: missed: not vectorized: relevant stmt not supported: _54 =
.MASK_LOAD (_53, 32B, _171);
test.cpp:22:19: missed: bad operation or unsupport
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114653
--- Comment #3 from kugan at gcc dot gnu.org ---
For SVE mode in vect_analyze_loop_2, we have
(gdb) p min_vf
$15 = {coeffs = {4, 4}}
(gdb) p max_vf
$16 = 16
Thus maybe_lt (max_vf, min_vf) is true. This results in bad data dependence.
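
The comparison can be modelled outside the compiler. The sketch below is a standalone stand-in, not GCC's poly_int implementation: Poly2, known_ge and maybe_lt here are hypothetical re-creations that treat a value as c0 + c1 * x with an unknown x >= 0. Under that model, maybe_lt (16, 4 + 4x) holds because the right-hand side exceeds 16 once x >= 4.

```cpp
#include <cstdint>

// Hypothetical two-coefficient value: c0 + c1 * x, with x >= 0 unknown
// at compile time (a model of a length-agnostic quantity).
struct Poly2 {
  std::int64_t c0;
  std::int64_t c1;
};

// True only if a >= b for every x >= 0: both the constant term and the
// slope of a must be at least those of b.
bool known_ge(Poly2 a, Poly2 b) {
  return a.c0 >= b.c0 && a.c1 >= b.c1;
}

// True if a < b for at least one x >= 0, i.e. a >= b is not guaranteed.
bool maybe_lt(Poly2 a, Poly2 b) {
  return !known_ge(a, b);
}
```

With max_vf modelled as {16, 0} and min_vf as {4, 4}, maybe_lt returns true, so a check of the form "fail if max_vf may be below min_vf" rejects the loop.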
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114653
--- Comment #4 from kugan at gcc dot gnu.org ---
This particular loop has loop->safelen set to 16. Does this mean this can never
be loop vectorized for VLA?
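
One way to see why the answer depends on the runtime vector length is to enumerate candidate lengths. This is a sketch under stated assumptions: 32-bit elements (consistent with the {4, 4} coefficients above), vector lengths that are multiples of 128 bits up to 2048, and helper names (lanes_for, vector_lengths_within) that are hypothetical. It does not describe what the vectoriser does; it only shows for which lengths the lane count stays within a given safelen.

```cpp
#include <vector>

// Lanes per vector for a given vector length in bits (assumes 32-bit elements
// unless told otherwise).
constexpr unsigned lanes_for(unsigned vl_bits, unsigned elem_bits = 32) {
  return vl_bits / elem_bits;
}

// Collect the candidate vector lengths (in bits) whose lane count does not
// exceed safelen. The 128..2048 step-128 range is an assumption.
std::vector<unsigned> vector_lengths_within(unsigned safelen) {
  std::vector<unsigned> ok;
  for (unsigned vl = 128; vl <= 2048; vl += 128)
    if (lanes_for(vl) <= safelen)
      ok.push_back(vl);
  return ok;
}
```

For safelen 16 only the shorter lengths qualify, so a compile-time check that must hold for every possible length cannot succeed with a fixed scalar bound.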
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114653
--- Comment #5 from kugan at gcc dot gnu.org ---
ddd for the :
ref_a:
_57 = D.4803[_20];
ref_b:
D.4803[_20] = _ifc__174;
We get DDR_ARE_DEPENDENT (ddr) == chrec_dont_know. Hence apply_safelen ().
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114653
kugan at gcc dot gnu.org changed:
What|Removed |Added
Resolution|--- |DUPLICATE
Status|
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114635
kugan at gcc dot gnu.org changed:
What|Removed |Added
CC||kugan at gcc dot gnu.org
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
Bug 53947 depends on bug 114653, which changed state.
Bug 114653 Summary: Not vectorizing the loop with openmp reduction.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114653
What|Removed |Added
-
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114635
--- Comment #9 from kugan at gcc dot gnu.org ---
Looking at the options, it looks to me that making loop->safelen a poly_int is the
way to go.
(In reply to Jakub Jelinek from comment #4)
> The OpenMP safelen clause argument is a scalar integer, so us
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114635
--- Comment #10 from kugan at gcc dot gnu.org ---
Created attachment 57946
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57946&action=edit
patch
patch to make loop->safelen a poly_int
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114635
--- Comment #12 from kugan at gcc dot gnu.org ---
(In reply to Jakub Jelinek from comment #11)
> (In reply to kugan from comment #9)
> > Looking at the options, looks to me that making loop->safelen a poly_in is
> > the way to go. (In reply to Ja
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114635
--- Comment #18 from kugan at gcc dot gnu.org ---
Also, can we set INT_MAX when there is no explicit safelen specified in OMP?
Something like:
--- a/gcc/omp-low.cc
+++ b/gcc/omp-low.cc
@@ -6975,14 +6975,11 @@ lower_rec_input_clauses (tree clause
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115383
Bug ID: 115383
Summary: ICE with TCVC_2 build
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115383
--- Comment #5 from kugan at gcc dot gnu.org ---
(In reply to Richard Biener from comment #4)
> Created attachment 58378 [details]
> patch
>
> I'm testing this, but I do not have hardware to test correctness (and qemu
> not set up).
Thanks. I w
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115383
--- Comment #6 from kugan at gcc dot gnu.org ---
(In reply to kugan from comment #5)
> (In reply to Richard Biener from comment #4)
> > Created attachment 58378 [details]
> > patch
> >
> > I'm testing this, but I do not have hardware to test cor
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113698
Bug ID: 113698
Summary: GNU OpenMP with OMP_PROC_BIND alters thread affinity
in a way that negatively affects performance
Product: gcc
Version: 14.0
Status: UNCONFIRMED
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115450
Bug ID: 115450
Summary: cpu2017 502.gcc runtime miscompute
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimiza
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115450
--- Comment #2 from kugan at gcc dot gnu.org ---
(In reply to Andrew Pinski from comment #1)
> >[r15-1006-gd93353e6423eca] Do single-lane SLP discovery for reductions
>
>
> Interesting because PR 115256 bisect it to an earlier patch.
I believe
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116785
--- Comment #14 from kugan at gcc dot gnu.org ---
(In reply to Richard Biener from comment #13)
> Did it help?
Thanks for the quick fix. This commit recovers most of the regression.
Please note that the current trunk seems to be broken for un
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117050
kugan at gcc dot gnu.org changed:
What|Removed |Added
CC||kugan at gcc dot gnu.org
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115258
kugan at gcc dot gnu.org changed:
What|Removed |Added
CC||kugan at gcc dot gnu.org
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116785
--- Comment #10 from kugan at gcc dot gnu.org ---
Created attachment 59186
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=59186&action=edit
reduced test (second attempt)
Sorry about the test case. Here is another attempt at reducing.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116785
Bug ID: 116785
Summary: RAJAPerf REDUCE_SUM regresses with commit
f0a02467bbc35a478eb82f5a8a7e8870827b51fc
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Sever
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116785
--- Comment #2 from kugan at gcc dot gnu.org ---
Created attachment 59155
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=59155&action=edit
creduce reduced file
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116785
--- Comment #1 from kugan at gcc dot gnu.org ---
Created attachment 59154
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=59154&action=edit
preprocessed file
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117782
--- Comment #9 from kugan at gcc dot gnu.org ---
(In reply to Andrew Pinski from comment #8)
> Can you try again now that PR 117350 has actually been pushed?
Thanks. This fixes it.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117782
kugan at gcc dot gnu.org changed:
What|Removed |Added
Status|WAITING |RESOLVED
Resolution|-
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117782
--- Comment #6 from kugan at gcc dot gnu.org ---
(In reply to Andrew Pinski from comment #5)
> Specifically see
> https://inbox.sourceware.org/gcc-patches/20241031204043.3231740-1-ak@linux.
> intel.com/T/#u .
>
> You need to figure out why need_
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117782
Bug ID: 117782
Summary: template ICE in write_unscoped_name while using
autofda bootstrap on aarch64
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Severity: n
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117782
--- Comment #1 from kugan at gcc dot gnu.org ---
Created attachment 59705
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=59705&action=edit
profile gcov
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117782
--- Comment #2 from kugan at gcc dot gnu.org ---
--- a/gcc/cp/mangle.cc
+++ b/gcc/cp/mangle.cc
@@ -1194,6 +1194,7 @@ write_unscoped_name (const tree decl)
in a local function scope. A lambda can also be mangled in the
scope of
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118320
Bug ID: 118320
Summary: [aarch64] internal compiler error: Segmentation fault
in aarch64-ldp-fusion.cc
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Severity:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120614
Bug ID: 120614
Summary: 525.x264_r is ~30% slower with AutoFDO
Product: gcc
Version: 16.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: gcov-prof
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120614
--- Comment #3 from kugan at gcc dot gnu.org ---
Created attachment 61610
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=61610&action=edit
x264_pixel_sad_x4_16x16.diff
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120614
--- Comment #4 from kugan at gcc dot gnu.org ---
x264_pixel_sad_x4_16x16.diff is at -O3 without -flto. The function-level profile
is the same even with -flto.
x264_pixel_sad_x4_16x16 total:18508 head:4627
0: 4627
0.1: 0
0.2: 0
0.3: 0
0.4:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120614
--- Comment #8 from kugan at gcc dot gnu.org ---
(In reply to Jan Hubicka from comment #6)
> Also BTW, I think it is useful to do the dumps with -details-blocks since
> that also dumps BB count inconsistencies caused by AutoFDO that are
> otherwis
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120614
--- Comment #7 from kugan at gcc dot gnu.org ---
(In reply to Jan Hubicka from comment #6)
> Also BTW, I think it is useful to do the dumps with -details-blocks since
> that also dumps BB count inconsistencies caused by AutoFDO that are
> otherwis
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120614
--- Comment #10 from kugan at gcc dot gnu.org ---
(In reply to Jan Hubicka from comment #9)
> > > as mentioned by Andrew, it is important to clone and also resolve indirect
> > > calls. Those auto-FDO 0 may prevent it from happening.
> > > It is
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120614
--- Comment #11 from kugan at gcc dot gnu.org ---
This specific ICE seems to be fixed with
e416c8097fc87513e05c2d104c63488f733758c0
Thanks for the fix.
I am now seeing one in:
x264_src/common/mc.c: In function 'mc_weight_w16.part.0':
x264_src/c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120614
--- Comment #12 from kugan at gcc dot gnu.org ---
(In reply to kugan from comment #11)
> This specific ICE seems to be fixed with
> e416c8097fc87513e05c2d104c63488f733758c0
> Thanks for the fix.
>
> I am now seeing one in:
>
> x264_src/common/m
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120614
--- Comment #17 from kugan at gcc dot gnu.org ---
fotonik3d_r regresses by 20% compared to base (no PGO).
Base perf
33.19% fotonik3d_r_pea fotonik3d_r_peak.mytest-64 [.]
leapfrog_.constprop.0
23.76% fotonik3d_r_pea fotonik3d_r_peak.my
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120614
--- Comment #16 from kugan at gcc dot gnu.org ---
I ran spec2017 again with recent gcc and SPE-based autofdo (with local patches
to enable SPE-based profiling support for autofdo tools). I am seeing the following
compared to PGO:
621.wrf_s -23%
549.fot
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120614
--- Comment #19 from kugan at gcc dot gnu.org ---
I did the spec2017 runs a few days ago and the .gcov files look OK. I can see
them with dump_gcov.
I am seeing hot/cold blocks switched in __material_mod_MOD_mat_updatee/13 of
fotonik3d_r (see the
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121210
Bug ID: 121210
Summary: IPA Inline pass ICE with AutoFDO
Product: gcc
Version: 16.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: ipa
A
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120614
--- Comment #21 from kugan at gcc dot gnu.org ---
I looked into 531.deepsjeng_r. For deepsjeng_r we see similar performance with
AutoFDO as without it. It still looks like we have a missed opportunity there, as
search() now accounts for higher time i