from:"kugan at gcc dot gnu.org via Gcc\-bugs"

[Bug libgomp/113698] GNU OpenMP with OMP_PROC_BIND alters thread affinity in a way that negatively affects performance

2024-02-09 Thread kugan at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113698 --- Comment #4 from kugan at gcc dot gnu.org --- Thanks for looking into this. The main reason we were seeing the performance issue turned out to be a glibc malloc issue: https://sourceware.org/bugzilla/show_bug.cgi?id=30945

[Bug middle-end/111683] [11/12/13/14 Regression] Incorrect answer when using SSE2 intrinsics with -O3 since r7-3163-g973625a04b3d9351f2485e37f7d3382af2aed87e

2024-03-09 Thread kugan at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111683 --- Comment #5 from kugan at gcc dot gnu.org --- Both -O3 -fno-tree-vectorize and -O3 -fno-tree-vrp work. I looked at the evrp dump and it is not doing anything suspicious. It looks like the range_info usage in the vectoriser is causing the problem.

[Bug middle-end/116337] New: Reverse iterated loops has redundant code compared to clang

2024-08-11 Thread kugan at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116337 Bug ID: 116337 Summary: Reverse iterated loops has redundant code compared to clang Product: gcc Version: 15.0 Status: UNCONFIRMED Severity: normal P

[Bug tree-optimization/116338] New: GCC is not vectoring TSVC s255 while clang can

2024-08-11 Thread kugan at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116338 Bug ID: 116338 Summary: GCC is not vectoring TSVC s255 while clang can Product: gcc Version: 15.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: t

[Bug tree-optimization/114635] OpenMP reductions fail dependency analysis

2024-08-13 Thread kugan at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114635 --- Comment #20 from kugan at gcc dot gnu.org --- (In reply to Richard Sandiford from comment #19) > (In reply to Richard Biener from comment #14) > > Usually targets do have a limit on the actual length but I see > > constant_upper_bound_with_li

[Bug tree-optimization/116338] GCC is not vectoring TSVC s255 while clang can

2024-08-20 Thread kugan at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116338 --- Comment #3 from kugan at gcc dot gnu.org --- (In reply to Richard Biener from comment #2) > The issue is the recurrence > >[local count: 10737416]: > x_10 = b[31999]; > y_11 = b[31998]; > >[local count: 1063004408]: > # x_18 =

[Bug tree-optimization/116338] GCC is not vectoring TSVC s255 while clang can

2024-08-20 Thread kugan at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116338 --- Comment #5 from kugan at gcc dot gnu.org --- (In reply to Richard Biener from comment #4) > You can try to see whether adding a SSA copy would make this supported, it > seems not allowing a PHI is simply a missed feature. We now fail in /*

[Bug tree-optimization/116528] New: Not vectoring TSVC s318 loop

2024-08-28 Thread kugan at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116528 Bug ID: 116528 Summary: Not vectoring TSVC s318 loop Product: gcc Version: 15.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization

[Bug middle-end/116562] New: wrong cost of gather load preventing loop from vectored

2024-09-01 Thread kugan at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116562 Bug ID: 116562 Summary: wrong cost of gather load preventing loop from vectored Product: gcc Version: 15.0 Status: UNCONFIRMED Severity: normal Prior

[Bug middle-end/116626] New: ICE while VLA vectorisation

2024-09-05 Thread kugan at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116626 Bug ID: 116626 Summary: ICE while VLA vectorisation Product: gcc Version: 15.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: middle-end

[Bug middle-end/116626] ICE while VLA vectorisation

2024-09-05 Thread kugan at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116626 --- Comment #1 from kugan at gcc dot gnu.org --- Looks like a duplicate of https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116569

[Bug middle-end/114653] New: Not vectoring the loop with openmp reduction.

2024-04-09 Thread kugan at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114653 Bug ID: 114653 Summary: Not vectoring the loop with openmp reduction. Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: mi

[Bug middle-end/114653] Not vectorizing the loop with openmp reduction.

2024-04-09 Thread kugan at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114653 --- Comment #2 from kugan at gcc dot gnu.org --- Thanks. I see the following in the log: test.cpp:33:53: missed: not vectorized: relevant stmt not supported: _54 = .MASK_LOAD (_53, 32B, _171); test.cpp:22:19: missed: bad operation or unsupport

[Bug middle-end/114653] Not vectorizing the loop with openmp reduction.

2024-04-09 Thread kugan at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114653 --- Comment #3 from kugan at gcc dot gnu.org --- For SVE mode in vect_analyze_loop_2, we have (gdb) p min_vf $15 = {coeffs = {4, 4}} (gdb) p max_vf $16 = 16 Thus maybe_lt (max_vf, min_vf) is false. This results in bad data dependence.

[Bug middle-end/114653] Not vectorizing the loop with openmp reduction.

2024-04-09 Thread kugan at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114653 --- Comment #4 from kugan at gcc dot gnu.org --- This particular loop has loop->safelen set to 16. Does this mean this can never be loop vectorized for VLA?

[Bug middle-end/114653] Not vectorizing the loop with openmp reduction.

2024-04-09 Thread kugan at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114653 --- Comment #5 from kugan at gcc dot gnu.org --- ddd for the : ref_a: _57 = D.4803[_20]; ref_b: D.4803[_20] = _ifc__174; We get DDR_ARE_DEPENDENT (ddr) == chrec_dont_know. Hence apply_safelen ().

[Bug middle-end/114653] Not vectorizing the loop with openmp reduction.

2024-04-09 Thread kugan at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114653 kugan at gcc dot gnu.org changed: What|Removed |Added Resolution|--- |DUPLICATE Status|

[Bug tree-optimization/114635] OpenMP reductions fail dependency analysis

2024-04-09 Thread kugan at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114635 kugan at gcc dot gnu.org changed: What|Removed |Added CC||kugan at gcc dot gnu.org ---

[Bug tree-optimization/53947] [meta-bug] vectorizer missed-optimizations

2024-04-09 Thread kugan at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947 Bug 53947 depends on bug 114653, which changed state. Bug 114653 Summary: Not vectorizing the loop with openmp reduction. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114653 What|Removed |Added -

[Bug tree-optimization/114635] OpenMP reductions fail dependency analysis

2024-04-15 Thread kugan at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114635 --- Comment #9 from kugan at gcc dot gnu.org --- Looking at the options, it looks to me that making loop->safelen a poly_int is the way to go. (In reply to Jakub Jelinek from comment #4) > The OpenMP safelen clause argument is a scalar integer, so us

[Bug tree-optimization/114635] OpenMP reductions fail dependency analysis

2024-04-15 Thread kugan at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114635 --- Comment #10 from kugan at gcc dot gnu.org --- Created attachment 57946 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57946&action=edit patch patch to make loop->safelen a poly_int

[Bug tree-optimization/114635] OpenMP reductions fail dependency analysis

2024-04-15 Thread kugan at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114635 --- Comment #12 from kugan at gcc dot gnu.org --- (In reply to Jakub Jelinek from comment #11) > (In reply to kugan from comment #9) > > Looking at the options, looks to me that making loop->safelen a poly_in is > > the way to go. (In reply to Ja

[Bug tree-optimization/114635] OpenMP reductions fail dependency analysis

2024-04-15 Thread kugan at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114635 --- Comment #18 from kugan at gcc dot gnu.org --- Also, can we set INT_MAX when there is no explicit safelen specified in OMP? Something like: --- a/gcc/omp-low.cc +++ b/gcc/omp-low.cc @@ -6975,14 +6975,11 @@ lower_rec_input_clauses (tree clause

[Bug tree-optimization/115383] New: ICE with TCVC_2 build

2024-06-07 Thread kugan at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115383 Bug ID: 115383 Summary: ICE with TCVC_2 build Product: gcc Version: 15.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization

[Bug tree-optimization/115383] [15 Regression] ICE with TCVC_2 build since r15-1053-g28edeb1409a7b8

2024-06-07 Thread kugan at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115383 --- Comment #5 from kugan at gcc dot gnu.org --- (In reply to Richard Biener from comment #4) > Created attachment 58378 [details] > patch > > I'm testing this, but I do not have hardware to test correctness (and qemu > not set up). Thanks. I w

[Bug tree-optimization/115383] [15 Regression] ICE with TCVC_2 build since r15-1053-g28edeb1409a7b8

2024-06-07 Thread kugan at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115383 --- Comment #6 from kugan at gcc dot gnu.org --- (In reply to kugan from comment #5) > (In reply to Richard Biener from comment #4) > > Created attachment 58378 [details] > > patch > > > > I'm testing this, but I do not have hardware to test cor

[Bug libgomp/113698] New: GNU OpenMP with OMP_PROC_BIND alters thread affinity in a way that negatively affects performance

2024-01-31 Thread kugan at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113698 Bug ID: 113698 Summary: GNU OpenMP with OMP_PROC_BIND alters thread affinity in a way that negatively affects performance Product: gcc Version: 14.0 Status: UNCONFIRMED

[Bug tree-optimization/115450] New: cpu2017 502.gcc runtime miscompute

2024-06-11 Thread kugan at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115450 Bug ID: 115450 Summary: cpu2017 502.gcc runtime miscompute Product: gcc Version: 15.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimiza

[Bug tree-optimization/115450] [15 Regression] cpu2017 502.gcc runtime miscompute on aarch64 with SVE since r15-1006-gd93353e6423eca

2024-06-16 Thread kugan at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115450 --- Comment #2 from kugan at gcc dot gnu.org --- (In reply to Andrew Pinski from comment #1) > >[r15-1006-gd93353e6423eca] Do single-lane SLP discovery for reductions > > > Interesting because PR 115256 bisect it to an earlier patch. I believe

[Bug tree-optimization/116785] [15 Regression] RAJAPerf REDUCE_SUM regresses with r15-792-gf0a02467bbc35a

2024-09-30 Thread kugan at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116785 --- Comment #14 from kugan at gcc dot gnu.org --- (In reply to Richard Biener from comment #13) > Did it help? Thanks for the quick fix. This commit recovers most of the regression. Please note that the current trunk seems to be broken for un

[Bug tree-optimization/117050] [15 Regression] ice in vect_build_slp_tree_2

2024-10-09 Thread kugan at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117050 kugan at gcc dot gnu.org changed: What|Removed |Added CC||kugan at gcc dot gnu.org ---

[Bug target/115258] [14 Regression] register swaps for vector perm in some cases after r14-6290

2024-09-17 Thread kugan at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115258 kugan at gcc dot gnu.org changed: What|Removed |Added CC||kugan at gcc dot gnu.org ---

[Bug tree-optimization/116785] [15 Regression] RAJAPerf REDUCE_SUM regresses with r15-792-gf0a02467bbc35a

2024-09-24 Thread kugan at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116785 --- Comment #10 from kugan at gcc dot gnu.org --- Created attachment 59186 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=59186&action=edit reduced test (second attempt) Sorry about the test case. Here is another attempt at reducing.

[Bug tree-optimization/116785] New: RAJAPerf REDUCE_SUM regresses with commit f0a02467bbc35a478eb82f5a8a7e8870827b51fc

2024-09-19 Thread kugan at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116785 Bug ID: 116785 Summary: RAJAPerf REDUCE_SUM regresses with commit f0a02467bbc35a478eb82f5a8a7e8870827b51fc Product: gcc Version: 15.0 Status: UNCONFIRMED Sever

[Bug tree-optimization/116785] RAJAPerf REDUCE_SUM regresses with commit g:f0a02467bbc35a478eb82f5a8a7e8870827b51fc

2024-09-19 Thread kugan at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116785 --- Comment #2 from kugan at gcc dot gnu.org --- Created attachment 59155 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=59155&action=edit creduce reduced file

[Bug tree-optimization/116785] RAJAPerf REDUCE_SUM regresses with commit f0a02467bbc35a478eb82f5a8a7e8870827b51fc

2024-09-19 Thread kugan at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116785 --- Comment #1 from kugan at gcc dot gnu.org --- Created attachment 59154 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=59154&action=edit preprocessed file

[Bug ipa/117782] template ICE in write_unscoped_name while using autofda bootstrap on aarch64

2024-11-27 Thread kugan at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117782 --- Comment #9 from kugan at gcc dot gnu.org --- (In reply to Andrew Pinski from comment #8) > Can you try again now that PR 117350 has actually been pushed? Thanks. This fixes it.

[Bug ipa/117782] template ICE in write_unscoped_name while using autofda bootstrap on aarch64

2024-11-27 Thread kugan at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117782 kugan at gcc dot gnu.org changed: What|Removed |Added Status|WAITING |RESOLVED Resolution|-

[Bug ipa/117782] template ICE in write_unscoped_name while using autofda bootstrap on aarch64

2024-11-26 Thread kugan at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117782 --- Comment #6 from kugan at gcc dot gnu.org --- (In reply to Andrew Pinski from comment #5) > Specifically see > https://inbox.sourceware.org/gcc-patches/20241031204043.3231740-1-ak@linux. > intel.com/T/#u . > > You need to figure out why need_

[Bug c++/117782] New: template ICE in write_unscoped_name while using autofda bootstrap on aarch64

2024-11-25 Thread kugan at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117782 Bug ID: 117782 Summary: template ICE in write_unscoped_name while using autofda bootstrap on aarch64 Product: gcc Version: 15.0 Status: UNCONFIRMED Severity: n

[Bug c++/117782] template ICE in write_unscoped_name while using autofda bootstrap on aarch64

2024-11-25 Thread kugan at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117782 --- Comment #1 from kugan at gcc dot gnu.org --- Created attachment 59705 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=59705&action=edit profile gcov

[Bug c++/117782] template ICE in write_unscoped_name while using autofda bootstrap on aarch64

2024-11-25 Thread kugan at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117782 --- Comment #2 from kugan at gcc dot gnu.org --- --- a/gcc/cp/mangle.cc +++ b/gcc/cp/mangle.cc @@ -1194,6 +1194,7 @@ write_unscoped_name (const tree decl) in a local function scope. A lambda can also be mangled in the scope of

[Bug target/118320] New: [aarch64] internal compiler error: Segmentation fault in aarch64-ldp-fusion.cc

2025-01-06 Thread kugan at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118320 Bug ID: 118320 Summary: [aarch64] internal compiler error: Segmentation fault in aarch64-ldp-fusion.cc Product: gcc Version: 15.0 Status: UNCONFIRMED Severity:

[Bug gcov-profile/120614] New: 525.x264_r is ~30% slower with AutoFDO

2025-06-09 Thread kugan at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120614 Bug ID: 120614 Summary: 525.x264_r is ~30% slower with AutoFDO Product: gcc Version: 16.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: gcov-prof

[Bug middle-end/120614] 525.x264_r is ~30% slower with AutoFDO

2025-06-09 Thread kugan at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120614 --- Comment #3 from kugan at gcc dot gnu.org --- Created attachment 61610 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=61610&action=edit x264_pixel_sad_x4_16x16.diff

[Bug middle-end/120614] 525.x264_r is ~30% slower with AutoFDO

2025-06-09 Thread kugan at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120614 --- Comment #4 from kugan at gcc dot gnu.org --- x264_pixel_sad_x4_16x16.diff is at -O3 without -flto. Function level profiling is same even with -flto. x264_pixel_sad_x4_16x16 total:18508 head:4627 0: 4627 0.1: 0 0.2: 0 0.3: 0 0.4:

[Bug middle-end/120614] 525.x264_r is ~30% slower with AutoFDO

2025-06-11 Thread kugan at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120614 --- Comment #8 from kugan at gcc dot gnu.org --- (In reply to Jan Hubicka from comment #6) > Also BTW, I think it is useful to do the dumps wth -details-blocks since > that also dumps BB count inconsistencies caused by AutoFDO that are > otherwis

[Bug middle-end/120614] 525.x264_r is ~30% slower with AutoFDO

2025-06-11 Thread kugan at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120614 --- Comment #7 from kugan at gcc dot gnu.org --- (In reply to Jan Hubicka from comment #6) > Also BTW, I think it is useful to do the dumps wth -details-blocks since > that also dumps BB count inconsistencies caused by AutoFDO that are > otherwis

[Bug middle-end/120614] 525.x264_r is ~30% slower with AutoFDO

2025-06-11 Thread kugan at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120614 --- Comment #10 from kugan at gcc dot gnu.org --- (In reply to Jan Hubicka from comment #9) > > > as mentioned by Andrew, it is important to clone and also resolve indirect > > > calls. Those auto-FDO 0 may prevent it from happening. > > > It is

[Bug middle-end/120614] 525.x264_r is ~30% slower with AutoFDO

2025-06-12 Thread kugan at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120614 --- Comment #11 from kugan at gcc dot gnu.org --- This specific ICE seems to be fixed with e416c8097fc87513e05c2d104c63488f733758c0 Thanks for the fix. I am now seeing one in: x264_src/common/mc.c: In function 'mc_weight_w16.part.0': x264_src/c

[Bug middle-end/120614] 525.x264_r is ~30% slower with AutoFDO

2025-06-12 Thread kugan at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120614 --- Comment #12 from kugan at gcc dot gnu.org --- (In reply to kugan from comment #11) > This specific ICE seems to be fixed with > e416c8097fc87513e05c2d104c63488f733758c0 > Thanks for the fix. > > I am now seeing one in: > > x264_src/common/m

[Bug middle-end/120614] 525.x264_r is ~30% slower with AutoFDO

2025-07-13 Thread kugan at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120614 --- Comment #17 from kugan at gcc dot gnu.org --- fotonik3d_r regresses -20% compared to base (no PGO). Base perf 33.19% fotonik3d_r_pea fotonik3d_r_peak.mytest-64 [.] leapfrog_.constprop.0 23.76% fotonik3d_r_pea fotonik3d_r_peak.my

[Bug middle-end/120614] 525.x264_r is ~30% slower with AutoFDO

2025-07-13 Thread kugan at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120614 --- Comment #16 from kugan at gcc dot gnu.org --- I ran spec2017 again with recent gcc and SPE-based autofdo (with local patches to enable SPE-based profiling support for autofdo tools). I am seeing the following compared to PGO: 621.wrf_s -23% 549.fot

[Bug middle-end/120614] 525.x264_r is ~30% slower with AutoFDO

2025-07-15 Thread kugan at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120614 --- Comment #19 from kugan at gcc dot gnu.org --- I did the spec2017 runs a few days ago and the .gcov files look OK. I can see them with dump_gcov. I am seeing hot/cold blocks switched in __material_mod_MOD_mat_updatee/13 of fotonik3d_r (see the

[Bug ipa/121210] New: IPA Inline pass ICE with AutoFDO

2025-07-21 Thread kugan at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121210 Bug ID: 121210 Summary: IPA Inline pass ICE with AutoFDO Product: gcc Version: 16.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: ipa A

[Bug middle-end/120614] 525.x264_r is ~30% slower with AutoFDO

2025-07-22 Thread kugan at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120614 --- Comment #21 from kugan at gcc dot gnu.org --- I looked into 531.deepsjeng_r. For deepsjeng_r we see similar performance with AutoFDO as without it. It still looks like we have a missed opportunity there, as search() now accounts for higher time i

56 matches
