[Bug debug/121093] Missed location of inlined function

2025-07-28 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121093 --- Comment #5 from Jan Hubicka --- Just for a bit more context, LLVM doesn't have an equivalent of debug markers and compiles p3 as: p3: # @p3 .Lfunc_begin0: .file 0 "/home/jh" "e.c" md5 0x8a15ab558b

[Bug debug/121093] Missed location of inlined function

2025-07-28 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121093 --- Comment #4 from Jan Hubicka --- > in the end I'm not sure what's "wrong" here and why you think you are missing p2 - p2 is not executed, you shouldn't get any profile on it. Seems we kind of disagree on how "executed" is defined. If you com

[Bug debug/121093] Missed location of inlined function

2025-07-24 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121093 --- Comment #2 from Jan Hubicka --- Created attachment 61957 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=61957&action=edit patch to autofdo for multiple source locations per single instruction This is patch which makes the autofdo tool

[Bug ipa/121210] IPA Inline pass ICE with AutoFDO

2025-07-22 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121210 --- Comment #1 from Jan Hubicka --- The problem here is that the function is inlined into a function with a guessed profile while it has an AFDO profile. Inline scaling should change "globally 0 auto FDO" to "guessed" but it did not. I guess it is bu

[Bug gcov-profile/121123] [16 regression] some gcc.misc-tests/gcov-*.c fail starting with r16-2197-g385d9937f0e23c

2025-07-16 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121123 Jan Hubicka changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |hubicka at gcc dot gnu.org

[Bug bootstrap/121038] autoprofiledbootstrap is broken in few ways

2025-07-15 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121038 --- Comment #2 from Jan Hubicka --- I experimented with a smaller sampling period and indeed create_gcov then runs out of memory. On my setup create_gcov was simply segfaulting and produced just a partial profile. Since Makefile does not fail on cr

[Bug debug/121093] New: Missed location of inlined function

2025-07-15 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121093 Bug ID: 121093 Summary: Missed location of inlined function Product: gcc Version: 16.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: debug

[Bug gcov-profile/121074] [16 Regression] ICE: in gcov_open, at gcov-io.cc:128 with -ftest-coverage -fauto-profile

2025-07-15 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121074 Jan Hubicka changed: What|Removed |Added Status|UNCONFIRMED |ASSIGNED Ever confirmed|0

[Bug bootstrap/121038] New: autoprofiledbootstrap is broken in few ways

2025-07-11 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121038 Bug ID: 121038 Summary: autoprofiledbootstrap is broken in few ways Product: gcc Version: 16.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: boot

[Bug tree-optimization/119876] suboptimal code for avx512 conditional move

2025-07-10 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119876 --- Comment #6 from Jan Hubicka --- Aha, I was looking into scalar-to-vector improvements promoting scalar integer + 1 to vector on AMD CPUs.

[Bug tree-optimization/119876] suboptimal code for avx512 conditional move

2025-07-10 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119876 --- Comment #5 from Jan Hubicka --- I think I made the testcase while working on something else that I forgot, sorry :)

[Bug gcov-profile/120229] [GCOV] AutoFDO cannot distinguish privatized functions within an LTO partition

2025-07-09 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120229 --- Comment #2 from Jan Hubicka --- See thread https://gcc.gnu.org/pipermail/gcc-patches/2025-July/689018.html

[Bug tree-optimization/120916] debug line info for IV increment is lost

2025-07-08 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120916 --- Comment #9 from Jan Hubicka --- Created attachment 61818 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=61818&action=edit create_gcov path

[Bug tree-optimization/120916] debug line info for IV increment is lost

2025-07-08 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120916 --- Comment #8 from Jan Hubicka --- Patching create_gcov to account all of debug statements associated with a given address instead of just the last one gets me: test total:4350509 head:8642 1: 4484 // { 2: 4484 // for ( 3: 4484

[Bug tree-optimization/119965] [16 Regression] 531.deepsjeng_r binary is 50% bigger since r16-116-gcfb04e0de6aa43

2025-07-04 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119965 --- Comment #3 from Jan Hubicka --- There is also a 3% performance regression that got lost on the transition to the new PR https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=958.387.0

[Bug tree-optimization/119965] [16 Regression] 531.deepsjeng_r binary is 50% bigger since r16-116-gcfb04e0de6aa43

2025-07-04 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119965 --- Comment #2 from Jan Hubicka --- This is likely an ipa-cp heuristics issue: it now decides to clone, but in the end the benefits are not really visible.

[Bug tree-optimization/120916] debug line info for IV increment is lost

2025-07-04 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120916 --- Comment #7 from Jan Hubicka --- LLVM also gets execution counts wrong, just in a different (and less harmful) way: test:270773509:9780 1: 9116 2: 51984 for ( 4: 51984 iThis Inner Loop Header: Depth=1 .loc0 10 15

[Bug testsuite/120859] FAIL: gcc.dg/tree-prof/afdo-crossmodule-1b.c compilation

2025-07-04 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120859 Jan Hubicka changed: What|Removed |Added CC||hubicka at gcc dot gnu.org --- Comment #4

[Bug tree-optimization/120867] [metabug] AutoFDO issues

2025-07-04 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120867 Bug 120867 depends on bug 104457, which changed state. Bug 104457 Summary: ipa-cp with autofdo: internal compiler error in update_specialized_profile, at ipa-cp.c:4422 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104457 What|Remo

[Bug ipa/104457] ipa-cp with autofdo: internal compiler error in update_specialized_profile, at ipa-cp.c:4422

2025-07-04 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104457 Jan Hubicka changed: What|Removed |Added Status|WAITING |RESOLVED Resolution|---

[Bug debug/120938] discriminators are not useful in statements doing multiple calls

2025-07-03 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120938 --- Comment #10 from Jan Hubicka --- https://github.com/google/autofdo/issues/248

[Bug tree-optimization/120867] [metabug] AutoFDO issues

2025-07-03 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120867 Bug 120867 depends on bug 120938, which changed state. Bug 120938 Summary: discriminators are not useful in statements doing multiple calls https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120938 What|Removed |Add

[Bug debug/120938] discriminators are not useful in statements doing multiple calls

2025-07-03 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120938 Jan Hubicka changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|---

[Bug debug/120938] discriminators are not useful in statements doing multiple calls

2025-07-03 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120938 --- Comment #8 from Jan Hubicka --- Problem goes away with diff --git a/gcc/dwarf2out.cc b/gcc/dwarf2out.cc index d1a55dbcbcb..52ca189531e 100644 --- a/gcc/dwarf2out.cc +++ b/gcc/dwarf2out.cc @@ -25012,9 +25012,8 @@ add_call_src_coords_attribute

[Bug debug/120938] discriminators are not useful in statements doing multiple calls

2025-07-03 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120938 --- Comment #7 from Jan Hubicka --- Looking at the diff there seem to be a few changes: - # d.C:16:2 - .loc 1 16 2 is_stmt 1 view .LVU16 + # d.C:15:8 + .loc 1 15 8 is_stmt 1 discriminator 1 view .LVU16 This is a line table

[Bug debug/120938] discriminators are not useful in statements doing multiple calls

2025-07-03 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120938 --- Comment #6 from Jan Hubicka --- Created attachment 61795 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=61795&action=edit Diff

[Bug debug/120938] discriminators are not useful in statements doing multiple calls

2025-07-03 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120938 --- Comment #5 from Jan Hubicka --- Created attachment 61794 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=61794&action=edit bad assembly

[Bug debug/120938] discriminators are not useful in statements doing multiple calls

2025-07-03 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120938 --- Comment #4 from Jan Hubicka --- Created attachment 61793 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=61793&action=edit good assembly

[Bug debug/120938] discriminators are not useful in statements doing multiple calls

2025-07-03 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120938 --- Comment #3 from Jan Hubicka --- Even smaller set of example. Bad profile: #include volatile int variablev; static void inc() { variablev++; } static int zero = 0; int main () { for (int i = 0; i < 1; i++)

[Bug debug/120938] discriminators are not useful in statements doing multiple calls

2025-07-03 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120938 --- Comment #2 from Jan Hubicka --- This is even smaller testcase #include volatile int variablev; static void inc(int a) { variablev++; } inline int inline_me (int l) { for (int i = 0; i < 1; i++) {inc(1);inc(

[Bug debug/120938] discriminators are not useful in statements doing multiple calls

2025-07-03 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120938 --- Comment #1 from Jan Hubicka --- Removing the parameter of inc makes the problem go away. So does removing the recursion #include volatile int variablev; static int dead () { return 0; } static void inc() { variablev++; }

[Bug debug/120938] New: discriminators are not useful in statements doing multiple calls

2025-07-03 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120938 Bug ID: 120938 Summary: discriminators are not useful in statements doing multiple calls Product: gcc Version: 16.0 Status: UNCONFIRMED Severity: normal

[Bug tree-optimization/120916] debug info for IV increment is lost

2025-07-01 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120916 --- Comment #3 from Jan Hubicka --- Well, PR32445 is about us not being able to vartrack value of I. I think that may be fixed since then by adding corresponding debug binds. However here we are missing info about statement being executed...

[Bug driver/120916] debug info for IV increment is lost

2025-07-01 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120916 --- Comment #1 from Jan Hubicka --- Here is a variant for the gcov tool: jh@shroud:/tmp> cat tt.c int s = 1023; int a[1024]; __attribute__ ((weak)) void test() { for ( int i = 0; /* Line 7, relative 3 */ i < s;

[Bug driver/120916] New: debug info for IV increment is lost

2025-07-01 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120916 Bug ID: 120916 Summary: debug info for IV increment is lost Product: gcc Version: 16.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: driver

[Bug middle-end/120614] 525.x264_r is ~30% slower with AutoFDO

2025-07-01 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120614 --- Comment #15 from Jan Hubicka --- https://lnt.opensuse.org/db_default/v4/SPEC/graph?highlight_run=68430&plot.0=1370.377.0&plot.1=1288.377.0 compares AFDO to no profile feedback

[Bug lto/66229] LTO fails with -fauto-profile on mcf

2025-06-29 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66229 Jan Hubicka changed: What|Removed |Added Status|WAITING |RESOLVED Resolution|---

[Bug testsuite/77684] many tree-prof testsuite failures in parallel make check

2025-06-29 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77684 --- Comment #11 from Jan Hubicka --- *** Bug 86404 has been marked as a duplicate of this bug. ***

[Bug testsuite/86404] UNRESOLVED/UNSUPPORTED gcov test results due to Permission error mapping pages

2025-06-29 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86404 Jan Hubicka changed: What|Removed |Added CC||hubicka at gcc dot gnu.org Resolut

[Bug testsuite/77684] many tree-prof testsuite failures in parallel make check

2025-06-29 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77684 Jan Hubicka changed: What|Removed |Added Blocks||120867 CC|

[Bug gcov-profile/120229] [GCOV] AutoFDO cannot distinguish privatized functions within an LTO partition

2025-06-29 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120229 Jan Hubicka changed: What|Removed |Added Blocks||120867 Ever confirmed|0

[Bug middle-end/120614] 525.x264_r is ~30% slower with AutoFDO

2025-06-29 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120614 Jan Hubicka changed: What|Removed |Added Last reconfirmed||2025-06-29 Blocks|

[Bug tree-optimization/120867] [metabug] AutoFDO issues

2025-06-29 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120867 Jan Hubicka changed: What|Removed |Added Status|UNCONFIRMED |NEW Ever confirmed|0

[Bug tree-optimization/120867] New: [metabug] AutoFDO issues

2025-06-29 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120867 Bug ID: 120867 Summary: [metabug] AutoFDO issues Product: gcc Version: 16.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization

[Bug tree-optimization/120752] 5% slowdown of 525.x264_r since r16-1346-gb0d50cbb42ab2c

2025-06-23 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120752 --- Comment #4 from Jan Hubicka --- Hmm, there seem to be no big differences in IPA decisions between the runs, so further investigation is necessary :( The patch attempts to preserve more of the profile and here the profile is a bit counter-productive

[Bug gcov-profile/118551] Autofdo regressed 538.imagick_r by ~10% with -march=x86-64-v3 -O2

2025-06-16 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118551 --- Comment #9 from Jan Hubicka --- I am happy it helps. I wonder if you can share details of your SPEC config. I.e. how you call perf (do you specify count etc) and how you handle merging of profiles. We now have regular tester (on AMD hardwa

[Bug middle-end/120614] 525.x264_r is ~30% slower with AutoFDO

2025-06-10 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120614 --- Comment #6 from Jan Hubicka --- Also BTW, I think it is useful to do the dumps with -details-blocks since that also dumps BB count inconsistencies caused by AutoFDO that are otherwise hard to spot. In ipa-cp dump it should be visible if cons

[Bug middle-end/120614] 525.x264_r is ~30% slower with AutoFDO

2025-06-10 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120614 --- Comment #5 from Jan Hubicka --- Note that on x86-64 I get OK scores on x264. This compares no-FDO -Ofast -flto -march=native to autoFDO. I hacked the scripts to use ref run for training so it is longer: 500.perlbench_r 1158

[Bug target/119298] [15/16 Regression] 538.imagick_r is faster when compiled with GCC 14.2 and -Ofast -flto -march=native than with master on Zen5 since r15-3441-g4292297a0f938f

2025-05-30 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119298 Jan Hubicka changed: What|Removed |Added Status|NEW |RESOLVED Resolution|---

[Bug middle-end/26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)

2025-05-30 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163 Bug 26163 depends on bug 119298, which changed state. Bug 119298 Summary: [15/16 Regression] 538.imagick_r is faster when compiled with GCC 14.2 and -Ofast -flto -march=native than with master on Zen5 since r15-3441-g4292297a0f938f https://gcc.g

[Bug target/120218] [16 Regression] 8% slowdown of 507.cactuBSSN_r on Intel

2025-05-13 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120218 --- Comment #2 from Jan Hubicka --- I guess for costing changes, too. Since this is a weekly tester, bisecting would help.

[Bug tree-optimization/120219] [16 Regression] ~11% slowdown of 548.exchange2_r on x86_64 (maybe also on aarch64?) since r16-448-g8335fd561fa823

2025-05-12 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120219 Jan Hubicka changed: What|Removed |Added Depends on||119902 --- Comment #5 from Jan Hubicka -

[Bug target/120226] New: 8% regression of exchange2 with -O2 between g:d0571638a6bad932 and g:9b13bea07706a7ca

2025-05-11 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120226 Bug ID: 120226 Summary: 8% regression of exchange2 with -O2 between g:d0571638a6bad932 and g:9b13bea07706a7ca Product: gcc Version: 16.0 Status: UNCONFIRMED Se

[Bug ipa/120099] [16 regression] gfortran.dg/specifics_1.f90 FAILs since r16-372-g064cac730f88dc

2025-05-08 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120099 --- Comment #4 from Jan Hubicka --- This patch enables more inlining, so I guess it is a previously latent problem triggered by the inliner...

[Bug ipa/120120] [16 Regression] gcc-16: performance regression with -O3 compared to gcc-15 since r16-170-ga670ebde399548

2025-05-06 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120120 --- Comment #9 from Jan Hubicka --- Forgot to say, -fno-optimize-sibling-calls re-enables the cloning & inline.

[Bug ipa/120120] [16 Regression] gcc-16: performance regression with -O3 compared to gcc-15 since r16-170-ga670ebde399548

2025-05-06 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120120 --- Comment #8 from Jan Hubicka --- The difference is that tailr1 pass now turns recursion into loop. GCC15 does: Basic block 11 has extra exit edges Basic block 33 has extra exit edges Basic block 28 has extra exit edges Basic block 23 has ex

[Bug ipa/120120] [16 Regression] gcc-16: performance regression with -O3 compared to gcc-15 since r16-170-ga670ebde399548

2025-05-06 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120120 Jan Hubicka changed: What|Removed |Added Last reconfirmed||2025-05-06 Ever confirmed|0

[Bug tree-optimization/120069] [16 Regression] Yes another imagick -march=native -flto -Ofast + PGO regression between g:1c0cbc1b300e08df5ebfce00a7195890d78f2064 and g:55b01e17c793688a2878fa43a76df126

2025-05-03 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120069 Jan Hubicka changed: What|Removed |Added Last reconfirmed||2025-05-03 Ever confirmed|0

[Bug target/119900] [16 regression] imagick slowdown with -Ofast -march=native -fprofile-use since r16-39-gf6859fb621179e (interaction of rpad and late-combine)

2025-05-03 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119900 --- Comment #6 from Jan Hubicka --- Sadly this did not fix the whole regression. The problem is that after my change to enable ipa-cp to clone over cold edges we clone GetVirtualPixelsFromNexus twice (as constprop.0 and constprop.1). This func

[Bug target/120069] New: Yes another imagick -march=native -flto -Ofast + PGO regression between g:1c0cbc1b300e08df5ebfce00a7195890d78f2064 and g:55b01e17c793688a2878fa43a76df1266153b438

2025-05-02 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120069 Bug ID: 120069 Summary: Yes another imagick -march=native -flto -Ofast + PGO regression between g:1c0cbc1b300e08df5ebfce00a7195890d78f2064 and g:55b01e17c793688a28

[Bug tree-optimization/120065] [14/15/16 Regression] profile info corrupted by dom2

2025-05-02 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120065 --- Comment #3 from Jan Hubicka --- while (n > 0 && a) ; This is an odd loop which iterates 0 times or infinitely many times. We do not pattern match that at profile-estimate time (since such code is kind of useless) and we guess i

[Bug target/119900] [16 regression] imagick slowdown with -Ofast -march=native -fprofile-use since r16-39-gf6859fb621179e (interaction of rpad and late-combine)

2025-04-29 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119900 Jan Hubicka changed: What|Removed |Added CC||rsandifo at gcc dot gnu.org S

[Bug target/119900] [16 regression] imagick slowdown with -Ofast -march=native -fprofile-use since r16-39-gf6859fb621179e

2025-04-28 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119900 --- Comment #3 from Jan Hubicka --- Reverting the change of size_costs solves the regression, so it is about differences in optimization of cold code. I will try to track down what causes that.

[Bug target/119900] [16 regression] imagick slowdown with -Ofast -march=native -fprofile-use since r16-39-gf6859fb621179e

2025-04-28 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119900 --- Comment #2 from Jan Hubicka --- Aha, I mistakenly added analysis to PR105275. One problem I noticed was wrong costing of FP scalar min/max which is fixed now but does not affect imagick. Interesting is that we now vectorized same loops and BBs

[Bug ipa/103734] IPA-CP opportunity for imagick in SPECCPU 2017

2025-04-26 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103734 --- Comment #5 from Jan Hubicka --- This is MorphologyApply MagickExport Image *MorphologyApply(const Image *image, const ChannelType channel,const MorphologyMethod method, const ssize_t iterations, const KernelInfo *kernel, const Com

[Bug ipa/103734] IPA-CP opportunity for imagick in SPECCPU 2017

2025-04-26 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103734 --- Comment #4 from Jan Hubicka --- With -fprofile-use we get Evaluating opportunities for MorphologyApply/3266. - considering value 134217719 for param #1 const ChannelType (caller_count: 3) good_cloning_opportunity_p (time: 1, size: 427

[Bug target/105275] [12/13/14/15/16 regression] 525.x264_r and 538.imagick_r regressed on x86_64 at -O2 with PGO after r12-7319-g90d693bdc9d718

2025-04-26 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105275 --- Comment #9 from Jan Hubicka --- The only vectorization difference is: +imagick_r.ltrans8.ltrans.189t.slp1:magick/distort.c:1911:18: optimized: basic block part vectorized using 16 byte vectors +imagick_r.ltrans8.ltrans.189t.slp1:magick/dist

[Bug target/105275] [12/13/14/15/16 regression] 525.x264_r and 538.imagick_r regressed on x86_64 at -O2 with PGO after r12-7319-g90d693bdc9d718

2025-04-26 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105275 Jan Hubicka changed: What|Removed |Added Status|UNCONFIRMED |NEW Ever confirmed|0

[Bug tree-optimization/119924] [16 Regression] ICE when building 531.deepsjeng_r during ipa-cp since r16-101-g132d01d96ea9d6

2025-04-26 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119924 Jan Hubicka changed: What|Removed |Added Status|UNCONFIRMED |NEW Ever confirmed|0

[Bug target/119919] 7% exchange2 regression between g:6390fc86995fbd5239497cb9e1797a3af51d3936 and g:f72a2d221539cede358f2487b94bc370c6fc44b5

2025-04-25 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119919 --- Comment #6 from Jan Hubicka --- Exchange2 regression is solved and tonto seems to be noise (performance is back today w/o change of a checksum of the text segment). Still, we account one extra setcc and misaccount scatter, so let's keep this t

[Bug target/119919] 7% exchange2 regression between g:6390fc86995fbd5239497cb9e1797a3af51d3936 and g:f72a2d221539cede358f2487b94bc370c6fc44b5

2025-04-24 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119919 Jan Hubicka changed: What|Removed |Added Depends on||119902 --- Comment #3 from Jan Hubicka -

[Bug target/119919] 7% exchange2 regression between g:6390fc86995fbd5239497cb9e1797a3af51d3936 and g:f72a2d221539cede358f2487b94bc370c6fc44b5

2025-04-24 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119919 Jan Hubicka changed: What|Removed |Added Ever confirmed|0 |1 Status|UNCONFIRMED

[Bug ipa/119147] 525.x264_r is approx. 10% slower with LTO+PGO than without (at -Ofast -march-native)

2025-04-24 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119147 --- Comment #5 from Jan Hubicka --- As g:132d01d96ea9d617aaffdd5dfba3284a8958e529 I have committed the patch that enables ipa-cp to clone over edges which are !maybe_hot_p(). This improves x264 with FDO by 7.8% and exchange by 3.3%. It causes qu

[Bug target/119919] 7% exchange2 regression between g:6390fc86995fbd5239497cb9e1797a3af51d3936 and g:f72a2d221539cede358f2487b94bc370c6fc44b5

2025-04-24 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119919 --- Comment #1 from Jan Hubicka --- There also seems to be a 4% tonto regression on Intel in the same range https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=799.230.0

[Bug target/119919] New: 7% exchange2 regression between g:6390fc86995fbd5239497cb9e1797a3af51d3936 and g:f72a2d221539cede358f2487b94bc370c6fc44b5

2025-04-24 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119919 Bug ID: 119919 Summary: 7% exchange2 regression between g:6390fc86995fbd5239497cb9e1797a3af51d3936 and g:f72a2d221539cede358f2487b94bc370c6fc44b5 Product: gcc Ve

[Bug tree-optimization/119902] New: open-coded scatter/gather should not account vec_to_scalar cost

2025-04-22 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119902 Bug ID: 119902 Summary: open-coded scatter/gather should not account vec_to_scalar cost Product: gcc Version: 16.0 Status: UNCONFIRMED Severity: normal

[Bug target/119900] New: regression if imagick with -Ofast -march=native -fprofile-use between g:b986ed16c2546674 and g:e1098c7b08d9e601

2025-04-22 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119900 Bug ID: 119900 Summary: regression if imagick with -Ofast -march=native -fprofile-use between g:b986ed16c2546674 and g:e1098c7b08d9e601 Product: gcc Version: 16.

[Bug target/119879] [16 Regression] FAIL: gcc.target/i386/avx512fp16-trunc-extendvnhf.c since r16-39

2025-04-21 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119879 --- Comment #2 from Jan Hubicka --- Created attachment 61166 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=61166&action=edit Fix I am testing The fix I am testing. When VEC_PACK_TRUNC_EXPR is used, add_hook is called with vec_promote_dem

[Bug target/119879] [r16-39 Regression] FAIL: gcc.target/i386/avx512fp16-trunc-extendvnhf.c

2025-04-21 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119879 --- Comment #1 from Jan Hubicka --- The problem is in: /* VEC_PACK_TRUNC_EXPR: If inner size is greater than outer size we will end up doing two conversions and packing them. */ if (!scalar_p && inner_size > outer_size) { i

[Bug target/119876] New: suboptimal code for avx512 conditional move

2025-04-20 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119876 Bug ID: 119876 Summary: suboptimal code for avx512 conditional move Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: ta

[Bug tree-optimization/119875] New: loop with floating point conditional move not vectorized without -ffast-math

2025-04-20 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119875 Bug ID: 119875 Summary: loop with floating point conditional move not vectorized without -ffast-math Product: gcc Version: unknown Status: UNCONFIRMED Severity

[Bug tree-optimization/119614] [15 regression] protobuf-29.4 fails to build with -O2 (error: cannot tail-call: call and return value are different)

2025-04-16 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119614 Jan Hubicka changed: What|Removed |Added Resolution|--- |FIXED Status|NEW

[Bug tree-optimization/119614] [15 regression] protobuf-29.4 fails to build with -O2 (error: cannot tail-call: call and return value are different)

2025-04-16 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119614 --- Comment #47 from Jan Hubicka --- Created attachment 61134 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=61134&action=edit patch w/o forgotten debug output

[Bug tree-optimization/119614] [15 regression] protobuf-29.4 fails to build with -O2 (error: cannot tail-call: call and return value are different)

2025-04-16 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119614 --- Comment #46 from Jan Hubicka --- Created attachment 61133 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=61133&action=edit updated patch The problem in previous patch was that ipa-prop streams 0 to the end of block of summary section

[Bug tree-optimization/119614] [15 regression] protobuf-29.4 fails to build with -O2 (error: cannot tail-call: call and return value are different)

2025-04-16 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119614 --- Comment #44 from Jan Hubicka --- Summaries are duplicated when clone is created. Let me debug why it gets lost here.

[Bug tree-optimization/119614] [15 regression] protobuf-29.4 fails to build with -O2 (error: cannot tail-call: call and return value are different)

2025-04-15 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119614 --- Comment #37 from Jan Hubicka --- Created attachment 61128 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=61128&action=edit updated patch (regtests and bootstraps) Updated patch. Streaming summaries seems to work and fixes the testcase

[Bug tree-optimization/119614] [15 regression] protobuf-29.4 fails to build with -O2 (error: cannot tail-call: call and return value are different)

2025-04-15 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119614 --- Comment #36 from Jan Hubicka --- Created attachment 61127 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=61127&action=edit patch (untested)

[Bug tree-optimization/119614] [15 regression] protobuf-29.4 fails to build with -O2 (error: cannot tail-call: call and return value are different)

2025-04-15 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119614 --- Comment #34 from Jan Hubicka --- If the only problem is that ipa_return_value_sum does not survive from compile time to WPA, then we only need to add streaming code for it. This should be straightforward and there is no need to add

[Bug target/105275] [12/13/14/15 regression] 525.x264_r and 538.imagick_r regressed on x86_64 at -O2 with PGO after r12-7319-g90d693bdc9d718

2025-04-09 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105275 --- Comment #6 from Jan Hubicka --- As discussed in PR111551, the SPEC train run does not include the hottest loop of imagick (in ref loop), so we optimize it for size (in particular disable vectorization) and get poor performance

[Bug gcov-profile/113646] PGO hurts run-time of 538.imagick_r as much as 68% at -Ofast -march=native

2025-04-09 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113646 --- Comment #7 from Jan Hubicka --- Details are in PR111551

[Bug gcov-profile/118551] Autofdo regressed 538.imagick_r by ~10% with -march=x86-64-v3 -O2

2025-04-09 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118551 Jan Hubicka changed: What|Removed |Added CC||hubicka at gcc dot gnu.org --- Comment #5

[Bug gcov-profile/113646] PGO hurts run-time of 538.imagick_r as much as 68% at -Ofast -march=native

2025-04-09 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113646 --- Comment #6 from Jan Hubicka --- The problem is that the internal loop in the hottest function changes between the train and ref runs (the train run uses a different variant of the loop). This disables vectorization of the loop believed to be cold causing -

[Bug target/119298] [15 Regression] 538.imagick_r is faster when compiled with GCC 14.2 and -Ofast -flto -march=native than with master on Zen5 since r15-3441-g4292297a0f938f

2025-04-09 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119298 --- Comment #15 from Jan Hubicka --- I made a silly stand-alone test: long test[4]; __attribute__ ((noipa)) void foo (unsigned long a, unsigned long b, unsigned long c, unsigned long d) { test[0]=a; test[1]=b; test[2]=c;

[Bug target/119298] [15 Regression] 538.imagick_r is faster when compiled with GCC 14.2 and -Ofast -flto -march=native than with master on Zen5 since r15-3441-g4292297a0f938f

2025-04-09 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119298 --- Comment #14 from Jan Hubicka --- > > I am OK with using addss cost of 3 for trunk&release branches and make this > > more precise next stage1. > > That's what we use now? But I still don't understand why exactly > 538.imagick_r regresses

[Bug target/119298] [15 Regression] 538.imagick_r is faster when compiled with GCC 14.2 and -Ofast -flto -march=native than with master on Zen5 since r15-3441-g4292297a0f938f

2025-04-09 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119298 --- Comment #12 from Jan Hubicka --- > Btw, it was your r8-4018-gf6fd8f2bd4e9a9 which added the FP vs. non-FP > difference. Yep, I know. With that patch I mostly wanted to limit redundancy of the tables. The int/Fp difference was mostly based

[Bug target/119298] [15 Regression] 538.imagick_r is faster when compiled with GCC 14.2 and -Ofast -flto -march=native than with master on Zen5 since r15-3441-g4292297a0f938f

2025-04-08 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119298 --- Comment #7 from Jan Hubicka --- Hmm, the sequence does not use + at all, but I think I know what is going on. While the field is called addss, it is used as a kitchen sink for all other simple operations. /* pmuludq under sse2, pmuld

[Bug ipa/119147] 525.x264_r is approx. 10% slower with LTO+PGO than without (at -Ofast -march-native)

2025-04-03 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119147 --- Comment #4 from Jan Hubicka --- Re-benchmarked current trunk -flto -Ofast -march=native (base) and -flto -Ofast -march=native + PGO (peak) on znver3 Estimated Estimated Base

[Bug ipa/119147] 525.x264_r is approx. 10% slower with LTO+PGO than without (at -Ofast -march-native)

2025-04-03 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119147 --- Comment #3 from Jan Hubicka --- With speculation_useful_p we now are able to constant propagate stride into mc_chroma with PGO, but it does not help runtime. https://gcc.gnu.org/pipermail/gcc-patches/2025-April/680055.html solves the costi

[Bug libstdc++/119606] [15 regression] Commit 'Optimize string constructor' causes regression in Snappy workload for -mcpu=neoverse-v2 with LTO

2025-04-03 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119606 Jan Hubicka changed: What|Removed |Added CC||hubicka at gcc dot gnu.org --- Comment #5

[Bug target/119565] New: 13-17% regression of botan CAS128 and DES on zen4

2025-04-01 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119565 Bug ID: 119565 Summary: 13-17% regression of botan CAS128 and DES on zen4 Product: gcc Version: 15.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component
