[Bug target/119298] [15/16 Regression] 538.imagick_r is faster when compiled with GCC 14.2 and -Ofast -flto -march=native than with master on Zen5 since r15-3441-g4292297a0f938f

2025-05-30 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119298 Jan Hubicka changed: What|Removed |Added Status|NEW |RESOLVED Resolution|---

[Bug middle-end/26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)

2025-05-30 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163 Bug 26163 depends on bug 119298, which changed state. Bug 119298 Summary: [15/16 Regression] 538.imagick_r is faster when compiled with GCC 14.2 and -Ofast -flto -march=native than with master on Zen5 since r15-3441-g4292297a0f938f https://gcc.g

[Bug target/120218] [16 Regression] 8% slowdown of 507.cactuBSSN_r on Intel

2025-05-13 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120218 --- Comment #2 from Jan Hubicka --- I guess for costing changes, too. Since this is a weekly tester, bisecting would help.

[Bug tree-optimization/120219] [16 Regression] ~11% slowdown of 548.exchange2_r on x86_64 (maybe also on aarch64?) since r16-448-g8335fd561fa823

2025-05-12 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120219 Jan Hubicka changed: What|Removed |Added Depends on||119902 --- Comment #5 from Jan Hubicka -

[Bug target/120226] New: 8% regression of exchange2 with -O2 between g:d0571638a6bad932 and g:9b13bea07706a7ca

2025-05-11 Thread hubicka at gcc dot gnu.org via Gcc-bugs
Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: hubicka at gcc dot gnu.org Target Milestone: --- This is visible on both Zen and Intel testers https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=298.407.0

[Bug ipa/120099] [16 regression] gfortran.dg/specifics_1.f90 FAILs since r16-372-g064cac730f88dc

2025-05-08 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120099 --- Comment #4 from Jan Hubicka --- This patch enables more inlining, so I guess it is a previously latent problem triggered by the inliner...

[Bug ipa/120120] [16 Regression] gcc-16: performance regression with -O3 compared to gcc-15 since r16-170-ga670ebde399548

2025-05-06 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120120 --- Comment #9 from Jan Hubicka --- Forgot to say, -fno-optimize-sibling-calls re-enables the cloning & inlining.

[Bug ipa/120120] [16 Regression] gcc-16: performance regression with -O3 compared to gcc-15 since r16-170-ga670ebde399548

2025-05-06 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120120 --- Comment #8 from Jan Hubicka --- The difference is that the tailr1 pass now turns the recursion into a loop. GCC15 does: Basic block 11 has extra exit edges Basic block 33 has extra exit edges Basic block 28 has extra exit edges Basic block 23 has ex
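
For readers unfamiliar with tail-recursion elimination, here is a minimal C sketch of the kind of transformation the comment refers to. The functions sum_to_rec and sum_to_loop are hypothetical illustrations, not code from the PR or from GCC; the second is roughly what a compiler may produce from the first when the recursive call is in tail position.

```c
#include <assert.h>

/* Hypothetical tail-recursive function: the recursive call is the last
   action, so no work remains in the caller's frame after it returns. */
unsigned long sum_to_rec(unsigned long n, unsigned long acc)
{
    if (n == 0)
        return acc;
    return sum_to_rec(n - 1, acc + n);
}

/* Hand-written loop form of the same computation: the arguments of the
   tail call become updates of the parameters, and the call becomes a
   jump back to the start of the function. */
unsigned long sum_to_loop(unsigned long n, unsigned long acc)
{
    while (n != 0) {
        acc += n;
        n -= 1;
    }
    return acc;
}

/* Small self-check tying the two forms together. */
void sum_to_selfcheck(void)
{
    assert(sum_to_rec(10, 0) == sum_to_loop(10, 0));
}
```

Both forms return 55 for sum_to_rec(10, 0) and sum_to_loop(10, 0). Once the recursion is gone, later passes see a loop rather than a self-recursive call, which can plausibly change what cloning and inlining decide to do.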

[Bug ipa/120120] [16 Regression] gcc-16: performance regression with -O3 compared to gcc-15 since r16-170-ga670ebde399548

2025-05-06 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120120 Jan Hubicka changed: What|Removed |Added Last reconfirmed||2025-05-06 Ever confirmed|0

[Bug tree-optimization/120069] [16 Regression] Yet another imagick -march=native -flto -Ofast + PGO regression between g:1c0cbc1b300e08df5ebfce00a7195890d78f2064 and g:55b01e17c793688a2878fa43a76df126

2025-05-03 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120069 Jan Hubicka changed: What|Removed |Added Last reconfirmed||2025-05-03 Ever confirmed|0

[Bug target/119900] [16 regression] imagick slowdown with -Ofast -march=native -fprofile-use since r16-39-gf6859fb621179e (interaction of rpad and late-combine)

2025-05-03 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119900 --- Comment #6 from Jan Hubicka --- Sadly this did not fix the whole regression. The problem is that after my change to enable ipa-cp to clone over cold edges we clone GetVirtualPixelsFromNexus twice (as constprop.0 and constprop.1). This func

[Bug target/120069] New: Yet another imagick -march=native -flto -Ofast + PGO regression between g:1c0cbc1b300e08df5ebfce00a7195890d78f2064 and g:55b01e17c793688a2878fa43a76df1266153b438

2025-05-02 Thread hubicka at gcc dot gnu.org via Gcc-bugs
:55b01e17c793688a2878fa43a76df1266153b438 Product: gcc Version: 16.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: hubicka at gcc dot gnu.org Target Milestone

[Bug tree-optimization/120065] [14/15/16 Regression] profile info corrupted by dom2

2025-05-02 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120065 --- Comment #3 from Jan Hubicka --- while (n > 0 && a) ; This is an odd loop which iterates 0 times or infinitely many times. We do not pattern match that at profile-estimate time (since such code is kind of useless) and we guess i
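
The "0 times or infinitely many times" behaviour follows from the loop condition being loop-invariant: the empty body changes neither n nor a. A minimal C sketch of this, assuming a hypothetical helper spin with an artificial iteration cap (the cap is not part of the testcase and exists only so the demonstration terminates):

```c
#include <assert.h>

/* Hypothetical helper: runs the loop shape from the comment, but stops
   after `cap` iterations. Neither n nor a is modified in the loop, so
   the condition has the same value on every evaluation. */
long spin(int n, int a, long cap)
{
    long iters = 0;

    assert(cap > 0);
    while (n > 0 && a) {
        if (++iters >= cap)
            break;          /* without the cap this would never exit */
    }
    return iters;
}
```

Calling spin(0, 1, 1000) or spin(5, 0, 1000) returns 0 because the condition is false on entry, while spin(5, 1, 1000) returns 1000 only because it hits the cap. No intermediate iteration count is possible, which is why a guessed finite trip count does not describe such a loop well.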

[Bug target/119900] [16 regression] imagick slowdown with -Ofast -march=native -fprofile-use since r16-39-gf6859fb621179e (interaction of rpad and late-combine)

2025-04-29 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119900 Jan Hubicka changed: What|Removed |Added CC||rsandifo at gcc dot gnu.org S

[Bug target/119900] [16 regression] imagick slowdown with -Ofast -march=native -fprofile-use since r16-39-gf6859fb621179e

2025-04-28 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119900 --- Comment #3 from Jan Hubicka --- Reverting the change of size_costs solves the regression, so it is about differences in optimization of cold code. I will try to track down what causes that.

[Bug target/119900] [16 regression] imagick slowdown with -Ofast -march=native -fprofile-use since r16-39-gf6859fb621179e

2025-04-28 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119900 --- Comment #2 from Jan Hubicka --- Aha, I mistakenly added the analysis to PR105275. One problem I noticed was wrong costing of FP scalar min/max, which is fixed now but does not affect imagick. It is interesting that we now vectorize the same loops and BBs

[Bug ipa/103734] IPA-CP opportunity for imagick in SPECCPU 2017

2025-04-26 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103734 --- Comment #5 from Jan Hubicka --- This is MorphologyApply MagickExport Image *MorphologyApply(const Image *image, const ChannelType channel,const MorphologyMethod method, const ssize_t iterations, const KernelInfo *kernel, const Com

[Bug ipa/103734] IPA-CP opportunity for imagick in SPECCPU 2017

2025-04-26 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103734 --- Comment #4 from Jan Hubicka --- With -fprofile-use we get Evaluating opportunities for MorphologyApply/3266. - considering value 134217719 for param #1 const ChannelType (caller_count: 3) good_cloning_opportunity_p (time: 1, size: 427

[Bug target/105275] [12/13/14/15/16 regression] 525.x264_r and 538.imagick_r regressed on x86_64 at -O2 with PGO after r12-7319-g90d693bdc9d718

2025-04-26 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105275 --- Comment #9 from Jan Hubicka --- The only vectorization difference is: +imagick_r.ltrans8.ltrans.189t.slp1:magick/distort.c:1911:18: optimized: basic block part vectorized using 16 byte vectors +imagick_r.ltrans8.ltrans.189t.slp1:magick/dist

[Bug target/105275] [12/13/14/15/16 regression] 525.x264_r and 538.imagick_r regressed on x86_64 at -O2 with PGO after r12-7319-g90d693bdc9d718

2025-04-26 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105275 Jan Hubicka changed: What|Removed |Added Status|UNCONFIRMED |NEW Ever confirmed|0

[Bug tree-optimization/119924] [16 Regression] ICE when building 531.deepsjeng_r during ipa-cp since r16-101-g132d01d96ea9d6

2025-04-26 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119924 Jan Hubicka changed: What|Removed |Added Status|UNCONFIRMED |NEW Ever confirmed|0

[Bug target/119919] 7% exchange2 regression between g:6390fc86995fbd5239497cb9e1797a3af51d3936 and g:f72a2d221539cede358f2487b94bc370c6fc44b5

2025-04-25 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119919 --- Comment #6 from Jan Hubicka --- Exchange2 regression is solved and tonto seems to be noise (performance is back today w/o change of a checksum of the text segment). Still, we account one extra setcc and misaccount scatter, so let's keep this t

[Bug target/119919] 7% exchange2 regression between g:6390fc86995fbd5239497cb9e1797a3af51d3936 and g:f72a2d221539cede358f2487b94bc370c6fc44b5

2025-04-24 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119919 Jan Hubicka changed: What|Removed |Added Depends on||119902 --- Comment #3 from Jan Hubicka -

[Bug target/119919] 7% exchange2 regression between g:6390fc86995fbd5239497cb9e1797a3af51d3936 and g:f72a2d221539cede358f2487b94bc370c6fc44b5

2025-04-24 Thread hubicka at gcc dot gnu.org via Gcc-bugs
|ASSIGNED Last reconfirmed||2025-04-24 Assignee|unassigned at gcc dot gnu.org |hubicka at gcc dot gnu.org --- Comment #2 from Jan Hubicka --- This is with -O2 only. Difference is +++ bbb 2025-04-24 16:21:25.029155295 +0200 @@ -108,10 +108,7

[Bug ipa/119147] 525.x264_r is approx. 10% slower with LTO+PGO than without (at -Ofast -march=native)

2025-04-24 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119147 --- Comment #5 from Jan Hubicka --- As g:132d01d96ea9d617aaffdd5dfba3284a8958e529 I have committed the patch that enables ipa-cp to clone over edges which are !maybe_hot_p(). This improves x264 with FDO by 7.8% and exchange by 3.3%. It causes qu

[Bug target/119919] 7% exchange2 regression between g:6390fc86995fbd5239497cb9e1797a3af51d3936 and g:f72a2d221539cede358f2487b94bc370c6fc44b5

2025-04-24 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119919 --- Comment #1 from Jan Hubicka --- There also seems to be a 4% tonto regression on Intel in the same range: https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=799.230.0

[Bug target/119919] New: 7% exchange2 regression between g:6390fc86995fbd5239497cb9e1797a3af51d3936 and g:f72a2d221539cede358f2487b94bc370c6fc44b5

2025-04-24 Thread hubicka at gcc dot gnu.org via Gcc-bugs
Version: 16.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: hubicka at gcc dot gnu.org Target Milestone: --- this reproduces both on Zen and Intel: https

[Bug tree-optimization/119902] New: open-coded scatter/gather should not account vec_to_scalar cost

2025-04-22 Thread hubicka at gcc dot gnu.org via Gcc-bugs
Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: hubicka at gcc dot gnu.org Target Milestone: --- As discussed in https://gcc.gnu.org/pipermail/gcc-patches/2025-April/681555.html in loop > void foo (int n, int *

[Bug target/119900] New: regression of imagick with -Ofast -march=native -fprofile-use between g:b986ed16c2546674 and g:e1098c7b08d9e601

2025-04-22 Thread hubicka at gcc dot gnu.org via Gcc-bugs
: 16.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: hubicka at gcc dot gnu.org Target Milestone: --- this seems to reproduce on Intel (119%) https://lnt.opensuse.org

[Bug target/119879] [16 Regression] FAIL: gcc.target/i386/avx512fp16-trunc-extendvnhf.c since r16-39

2025-04-21 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119879 --- Comment #2 from Jan Hubicka --- Created attachment 61166 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=61166&action=edit Fix I am testing The fix I am testing. When VEC_PACK_TRUNC_EXPR is used, add_hook is called with vec_promote_dem

[Bug target/119879] [r16-39 Regression] FAIL: gcc.target/i386/avx512fp16-trunc-extendvnhf.c

2025-04-21 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119879 --- Comment #1 from Jan Hubicka --- The problem is in: /* VEC_PACK_TRUNC_EXPR: If inner size is greater than outer size we will end up doing two conversions and packing them. */ if (!scalar_p && inner_size > outer_size) { i

[Bug target/119876] New: suboptimal code for avx512 conditional move

2025-04-20 Thread hubicka at gcc dot gnu.org via Gcc-bugs
: target Assignee: unassigned at gcc dot gnu.org Reporter: hubicka at gcc dot gnu.org Target Milestone: --- int a[1000]; int b[1000]; int c[1000]; int d[1000]; void test() { for (int i = 0; i < 1000; i++) a[i] = b[i] > 0 ? c[i] + 1 : c[i] + 2;

[Bug tree-optimization/119875] New: loop with floating point conditional move not vectorized without -ffast-math

2025-04-20 Thread hubicka at gcc dot gnu.org via Gcc-bugs
Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: hubicka at gcc dot gnu.org Target Milestone: --- double a[1000]; double b[1000]; double c[1000]; double d[1000]; void test() { for (int i = 0; i

[Bug tree-optimization/119614] [15 regression] protobuf-29.4 fails to build with -O2 (error: cannot tail-call: call and return value are different)

2025-04-16 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119614 Jan Hubicka changed: What|Removed |Added Resolution|--- |FIXED Status|NEW

[Bug tree-optimization/119614] [15 regression] protobuf-29.4 fails to build with -O2 (error: cannot tail-call: call and return value are different)

2025-04-16 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119614 --- Comment #47 from Jan Hubicka --- Created attachment 61134 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=61134&action=edit patch w/o forgotten debug output

[Bug tree-optimization/119614] [15 regression] protobuf-29.4 fails to build with -O2 (error: cannot tail-call: call and return value are different)

2025-04-16 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119614 --- Comment #46 from Jan Hubicka --- Created attachment 61133 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=61133&action=edit updated patch The problem in previous patch was that ipa-prop streams 0 to the end of block of summary section

[Bug tree-optimization/119614] [15 regression] protobuf-29.4 fails to build with -O2 (error: cannot tail-call: call and return value are different)

2025-04-16 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119614 --- Comment #44 from Jan Hubicka --- Summaries are duplicated when clone is created. Let me debug why it gets lost here.

[Bug tree-optimization/119614] [15 regression] protobuf-29.4 fails to build with -O2 (error: cannot tail-call: call and return value are different)

2025-04-15 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119614 --- Comment #37 from Jan Hubicka --- Created attachment 61128 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=61128&action=edit updated patch (regtests and bootstraps) Updated patch. Streaming summaries seems to work and fixes the testcase

[Bug tree-optimization/119614] [15 regression] protobuf-29.4 fails to build with -O2 (error: cannot tail-call: call and return value are different)

2025-04-15 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119614 --- Comment #36 from Jan Hubicka --- Created attachment 61127 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=61127&action=edit patch (untested)

[Bug tree-optimization/119614] [15 regression] protobuf-29.4 fails to build with -O2 (error: cannot tail-call: call and return value are different)

2025-04-15 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119614 --- Comment #34 from Jan Hubicka --- If the only problem is that ipa_return_value_sum does not survive from compile time to WPA, then we only need to add streaming code for it. This should be straightforward and there is no need to add

[Bug target/105275] [12/13/14/15 regression] 525.x264_r and 538.imagick_r regressed on x86_64 at -O2 with PGO after r12-7319-g90d693bdc9d718

2025-04-09 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105275 --- Comment #6 from Jan Hubicka --- As discussed in PR111551, the SPEC train run does not exercise the hottest loop of imagick (the one hot in the ref run), so we optimize it for size (in particular, we disable vectorization) and get poor performance

[Bug gcov-profile/113646] PGO hurts run-time of 538.imagick_r as much as 68% at -Ofast -march=native

2025-04-09 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113646 --- Comment #7 from Jan Hubicka --- Details are in PR111551

[Bug gcov-profile/118551] Autofdo regressed 538.imagick_r by ~10% with -march=x86-64-v3 -O2

2025-04-09 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118551 Jan Hubicka changed: What|Removed |Added CC||hubicka at gcc dot gnu.org --- Comment

[Bug gcov-profile/113646] PGO hurts run-time of 538.imagick_r as much as 68% at -Ofast -march=native

2025-04-09 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113646 --- Comment #6 from Jan Hubicka --- The problem is that the internal loop in the hottest function changes between the train and ref runs (the train run uses a different variant of the loop). This disables vectorization of the loop believed to be cold causing -

[Bug target/119298] [15 Regression] 538.imagick_r is faster when compiled with GCC 14.2 and -Ofast -flto -march=native than with master on Zen5 since r15-3441-g4292297a0f938f

2025-04-09 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119298 --- Comment #15 from Jan Hubicka --- I made a silly stand-alone test: long test[4]; __attribute__ ((noipa)) void foo (unsigned long a, unsigned long b, unsigned long c, unsigned long d) { test[0]=a; test[1]=b; test[2]=c;

[Bug target/119298] [15 Regression] 538.imagick_r is faster when compiled with GCC 14.2 and -Ofast -flto -march=native than with master on Zen5 since r15-3441-g4292297a0f938f

2025-04-09 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119298 --- Comment #14 from Jan Hubicka --- > > I am OK with using addss cost of 3 for trunk&release branches and make this > > more precise next stage1. > > That's what we use now? But I still don't understand why exactly > 538.imagick_r regresses

[Bug target/119298] [15 Regression] 538.imagick_r is faster when compiled with GCC 14.2 and -Ofast -flto -march=native than with master on Zen5 since r15-3441-g4292297a0f938f

2025-04-09 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119298 --- Comment #12 from Jan Hubicka --- > Btw, it was your r8-4018-gf6fd8f2bd4e9a9 which added the FP vs. non-FP > difference. Yep, I know. With that patch I mostly wanted to limit redundancy of the tables. The int/Fp difference was mostly based

[Bug target/119298] [15 Regression] 538.imagick_r is faster when compiled with GCC 14.2 and -Ofast -flto -march=native than with master on Zen5 since r15-3441-g4292297a0f938f

2025-04-08 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119298 --- Comment #7 from Jan Hubicka --- Hmm, the sequence does not use + at all, but I think I know what is going on. While the field is called addss it is used as a kitchen sink for all other simple operations. /* pmuludq under sse2, pmuld

[Bug ipa/119147] 525.x264_r is approx. 10% slower with LTO+PGO than without (at -Ofast -march=native)

2025-04-03 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119147 --- Comment #4 from Jan Hubicka --- Re-benchmarked current trunk -flto -Ofast -march=native (base) and -flto -Ofast -march=native + PGO (peak) on znver3 Estimated Estimated Base

[Bug ipa/119147] 525.x264_r is approx. 10% slower with LTO+PGO than without (at -Ofast -march=native)

2025-04-03 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119147 --- Comment #3 from Jan Hubicka --- With speculation_useful_p we now are able to constant propagate stride into mc_chroma with PGO, but it does not help runtime. https://gcc.gnu.org/pipermail/gcc-patches/2025-April/680055.html solves the costi

[Bug libstdc++/119606] [15 regression] Commit 'Optimize string constructor' causes regression in Snappy workload for -mcpu=neoverse-v2 with LTO

2025-04-03 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119606 Jan Hubicka changed: What|Removed |Added CC||hubicka at gcc dot gnu.org --- Comment

[Bug target/119565] New: 13-17% regression of botan CAS128 and DES on zen4

2025-04-01 Thread hubicka at gcc dot gnu.org via Gcc-bugs
Component: target Assignee: unassigned at gcc dot gnu.org Reporter: hubicka at gcc dot gnu.org Target Milestone: --- This is visible on: https://lnt.opensuse.org/db_default/v4/CPP/graph?plot.0=553.676.1 https://lnt.opensuse.org/db_default/v4/CPP/graph?plot.0=553.675.1 https

[Bug target/119368] immintrin code running slower with gcc than clang

2025-03-24 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119368 --- Comment #5 from Jan Hubicka --- Thinking of it more, I think enabling memory alternatives in (define_insn "sse4_1_v4hiv4si2" [(set (match_operand:V4SI 0 "register_operand" "=Yr,*x,v") (any_extend:V4SI (vec_select:V4HI (m

[Bug target/119368] immintrin code running slower with gcc than clang

2025-03-23 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119368 --- Comment #2 from Jan Hubicka --- On this combiner fails to match: Failed to match this instruction: (set (subreg:V4SI (reg:V2DI 101 [ ]) 0) (sign_extend:V4SI (vec_select:V4HI (mem:V8HI (reg:DI 106) [0 *x_3(D)+0 S16 A128]) (p

[Bug target/119368] New: immintrin code running slower with gcc than clang

2025-03-19 Thread hubicka at gcc dot gnu.org via Gcc-bugs
Component: target Assignee: unassigned at gcc dot gnu.org Reporter: hubicka at gcc dot gnu.org Target Milestone: --- as mentioned in https://www.root.cz/clanky/instrukcni-sady-simd-a-automaticke-vektorizace-provadene-prekladacem-gcc/nazory/#newIndex1 the following code runs faster

[Bug ipa/119312] Constant array not allocated in read-only segment

2025-03-18 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119312 --- Comment #13 from Jan Hubicka --- And forgot to write. In case of strcmp I think we can use fnspec info we already have at the time constructing callgraph to represent it as a read rather than taking address. This would make things go bit sm

[Bug ipa/119312] Constant array not allocated in read-only segment

2025-03-18 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119312 --- Comment #12 from Jan Hubicka --- Indeed at IPA level we track if address of a symbol is taken, but we do not keep any extra info about how it may be used. It would be useful to track 1) if address is used only to read (to figure out readon

[Bug ipa/119147] 525.x264_r is approx. 10% slower with LTO+PGO than without (at -Ofast -march=native)

2025-03-13 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119147 Jan Hubicka changed: What|Removed |Added Last reconfirmed||2025-03-13 Ever confirmed|0

[Bug c++/118924] [12/13/14/15 regression] Wrong code at -O2 and above leading to uninitialized accesses on aarch64-linux-gnu since r10-917-g3b47da42de621c

2025-03-06 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118924 Jan Hubicka changed: What|Removed |Added CC||hubicka at gcc dot gnu.org --- Comment

[Bug middle-end/119147] New: 525.x264_r is approx. slower with LTO+PGO than without (at -Ofast -march=native)

2025-03-06 Thread hubicka at gcc dot gnu.org via Gcc-bugs
Severity: normal Priority: P3 Component: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: hubicka at gcc dot gnu.org Target Milestone: --- This seems to be at least partly caused by fact that ipa-cp does not clone function with no hot calls. This

[Bug middle-end/111551] Fix for PR106081 is not working with profile feedback on imagemagick

2025-03-05 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111551 --- Comment #4 from Jan Hubicka --- From gcov dump, the normal train run exercises loop: 742632: 2953: switch ( method ) { 742632: 2954:case ConvolveMorphology: -: 2955:/* Weighted Average of pixels using r

[Bug middle-end/111551] Fix for PR106081 is not working with profile feedback on imagemagick

2025-03-05 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111551 --- Comment #3 from Jan Hubicka --- With LTO the situation seems pretty much the same 21.23% imagick_r_peak. imagick_r_peak.trunk-pgolto-Ofast-native-m64 [.] MorphologyApply.cold 14.30% imagick_r_peak. imagick_r_peak.trunk-nop

[Bug middle-end/111551] Fix for PR106081 is not working with profile feedback on imagemagick

2025-03-05 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111551 Jan Hubicka changed: What|Removed |Added Ever confirmed|0 |1 Last reconfirmed|

[Bug middle-end/119033] [13/14/15 regression] Unsafe FRE of pointer assignment since r13-469-g9a53101caadae1

2025-02-27 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119033 Jan Hubicka changed: What|Removed |Added CC||hubicka at gcc dot gnu.org --- Comment

[Bug ipa/119006] [12/13/14/15 Regression] ICF merging pointer to array types which don't have the same bounds since r11-5181-g0862d007b564ec

2025-02-27 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119006 Jan Hubicka changed: What|Removed |Added See Also||https://gcc.gnu.org/bugzill

[Bug middle-end/119033] New: Unsafe FRE of pointer assignment

2025-02-26 Thread hubicka at gcc dot gnu.org via Gcc-bugs
Assignee: unassigned at gcc dot gnu.org Reporter: hubicka at gcc dot gnu.org Target Milestone: --- The following (artificial) testcase (pointed to me by Filip Hejsek) is miscompiled at -O2, since we miss the fact that test3(3) overwrites b[0]. #include struct foo

[Bug target/119010] [15 Regression] 444.namd shows a huge compile-time regression with -mtune=znver5

2025-02-25 Thread hubicka at gcc dot gnu.org via Gcc-bugs
at gcc dot gnu.org |hubicka at gcc dot gnu.org --- Comment #5 from Jan Hubicka --- I had a patch to reduce max issue back to 4 (exactly because of the compile time slowdown and because the way we model decoder we won't issue more than 4 instructions). Seems I forgot to push it, wi

[Bug ipa/118318] [15 regression] ICE when building firefox-134.0 with PGO

2025-02-19 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118318 --- Comment #13 from Jan Hubicka --- Thanks for running this through debugger Breakpoint 2.2, profile_count::operator+= (this=0x76e7e888, other=...) at /usr/src/debug/sys-devel/gcc-15.0./gcc-15.0./gcc/profile-count.h:932 932

[Bug tree-optimization/118527] When a loop is unlooped due to sccvn, its profile is not updated

2025-01-17 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118527 --- Comment #3 from Jan Hubicka --- The reason why I did not implement profile fixups to cfgcleanup is that you can not really fix the profile without knowing why it became inconsistent. Consider situation where we have function foo (int a) {

[Bug ipa/118318] ICE when building firefox-134.0 with PGO and LTO

2025-01-07 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118318 --- Comment #6 from Jan Hubicka --- Some profile inconsistencies are expected unless you use atomic counters since Firefox uses threads. Do you know why compatible_p returns false? It looks like mixing IPA and function local profiles together..

[Bug tree-optimization/90345] too pessimistic check whether pointer may alias a local variable

2024-12-28 Thread hubicka at gcc dot gnu.org via Gcc-bugs
||hubicka at gcc dot gnu.org --- Comment #5 from Jan Hubicka --- When push_back is visible to compiler as in suggested modified testcase: #include #include void push_back(uint32_t const&) __attribute__((noinline)); struct big_integer { void push_back(uint32_t c

[Bug tree-optimization/80641] missed optimization with std::vector resize in loop

2024-12-27 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80641 --- Comment #18 from Jan Hubicka --- With -O3 we now get: int main () { [local count: 114863531]: return 0; } -O2 offlines destructors which prevents us from optimizing away new() int main () { void * D.27676; int * c$_M_finish; int

[Bug target/80813] [12/13/14/15 Regression] x86: std::vector::operator[] could be somewhat faster using BT instead of SHL

2024-12-27 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80813 --- Comment #6 from Jan Hubicka --- Patch to optimize operator[] to be again branchless posted https://gcc.gnu.org/pipermail/gcc-patches/2024-December/672286.html Main problem with auto-generating bt is that it needs change of conditional from C

[Bug tree-optimization/26388] Variable sized storage allocation should be promoted to stack allocation

2024-12-27 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26388 Jan Hubicka changed: What|Removed |Added CC||hubicka at gcc dot gnu.org --- Comment

[Bug tree-optimization/117638] No loop splitting and bounds check not optimized out with -D_GLIBCXX_ASSERTIONS

2024-12-27 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117638 --- Comment #4 from Jan Hubicka --- Both with assertions or without we offline _M_default_append which would be better inlined. It is because main is known to be called once. One difference is that non-assertion clobbers the vectors prior const

[Bug c++/86276] Poor codegen when returning a std::vector

2024-12-27 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86276 --- Comment #2 from Jan Hubicka --- With -O3 we now do quite well. _Z4goodv: .LFB1248: .cfi_startproc ret .cfi_endproc .LFE1248: .size _Z4goodv, .-_Z4goodv .p2align 4 .globl _Z3badv .typ

[Bug tree-optimization/117639] Modified loop-split-1.C doesn't recognise non-escaping std::vector

2024-12-27 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117639 --- Comment #3 from Jan Hubicka --- With -O3 -std=c++20 https://godbolt.org/z/3WKnn8rax we inline but still get stuck on loop calling log and modifying errno. Without -std=c++20 we reach --param max-inline-insns-auto. We need --param max-inlin

[Bug target/80813] [12/13/14/15 Regression] x86: std::vector::operator[] could be somewhat faster using BT instead of SHL

2024-12-20 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80813 --- Comment #5 from Jan Hubicka --- Combine constructs: (set (reg:CCZ 17 flags) (compare:CCZ (zero_extract:DI (mem:DI (plus:DI (mult:DI (reg:DI 111 [ _8 ]) (const_int 8 [0x8])) (reg/f:DI 112 [ v_2(

[Bug target/80813] [12/13/14/15 Regression] x86: std::vector<bool>::operator[] could be somewhat faster using BT instead of SHL

2024-12-20 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80813 --- Comment #4 from Jan Hubicka --- Bit_reference constructor takes mask and not bit position. _GLIBCXX_NODISCARD _GLIBCXX20_CONSTEXPR reference operator[](size_type __n) { __glibcxx_requires_subscript(__n);
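The distinction in the comment above (a bit reference that stores a mask rather than a bit position) can be sketched roughly as follows. This is only an illustration assuming 64-bit storage words; `BitRef`, `test_by_mask` and `test_by_position` are hypothetical names and not libstdc++'s actual types or functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a bit reference holding a word pointer and a
// precomputed mask (assumption: 64-bit words, not libstdc++'s real layout).
struct BitRef {
    std::uint64_t* word;
    std::uint64_t mask;
    explicit operator bool() const { return (*word & mask) != 0; }
};

// Mask-based access: the left shift happens when the reference is built.
inline bool test_by_mask(std::uint64_t* words, std::size_t n) {
    BitRef r{words + n / 64, std::uint64_t{1} << (n % 64)};
    return static_cast<bool>(r);
}

// Position-based access: shift the word right and test the lowest bit.
inline bool test_by_position(const std::uint64_t* words, std::size_t n) {
    return ((words[n / 64] >> (n % 64)) & 1u) != 0;
}
```

Both helpers return the same value for every index; which instruction sequence a compiler picks for each form depends on the compiler version and target.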

[Bug target/80813] [12/13/14/15 Regression] x86: std::vector<bool>::operator[] could be somewhat faster using BT instead of SHL

2024-12-20 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80813 --- Comment #3 from Jan Hubicka --- OK, so the horrid codegen is because bvector's [] operator is implemented using an iterator: return begin()[__n]; iterator's [] operator is implemented using: _GLIBCXX20_CONSTEXPR void _M_incr(ptrdif

[Bug target/80813] [12/13/14/15 Regression] x86: std::vector<bool>::operator[] could be somewhat faster using BT instead of SHL

2024-12-20 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80813 Jan Hubicka changed: What|Removed |Added Status|UNCONFIRMED |NEW Summary|x86:

[Bug tree-optimization/109440] Missed optimization of vector::at when a function is called inside the loop

2024-12-19 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109440 --- Comment #3 from Jan Hubicka --- I believe that since v is constructed and passed by invisible reference in the caller, we would need to know the constructors of std::vector and prove that they do not make &v escape to global memory, so foo ca

[Bug libstdc++/90436] Redundant size checking in vector

2024-12-19 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90436 Jan Hubicka changed: What|Removed |Added Resolution|--- |FIXED Status|UNCONFIRMED

[Bug libstdc++/114821] _M_realloc_append should use memcpy instead of loop to copy data when possible

2024-12-19 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114821 --- Comment #14 from Jan Hubicka --- Jonathan, is there some problem with your patch?

[Bug ipa/110378] IPA-SRA for destructors

2024-12-19 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110378 --- Comment #10 from Jan Hubicka --- Martin, I think this is fixed?

[Bug middle-end/109849] suboptimal code for vector walking loop

2024-12-19 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849 Bug 109849 depends on bug 110287, which changed state. Bug 110287 Summary: _M_check_len is expensive https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110287 What|Removed |Added -

[Bug libstdc++/110287] _M_check_len is expensive

2024-12-19 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110287 Jan Hubicka changed: What|Removed |Added Status|NEW |RESOLVED Resolution|---

[Bug c++/118130] New: std::vector code quality issues

2024-12-19 Thread hubicka at gcc dot gnu.org via Gcc-bugs
++ Assignee: unassigned at gcc dot gnu.org Reporter: hubicka at gcc dot gnu.org Target Milestone: --- Meta-bug for code quality problems of std::vector. I gathered statistics on the use of std::vector in the clang binary, counting the number of occurrences of abstract instances of these functions in debug

[Bug c++/97094] Compiling big std::unordered_map became slower

2024-12-15 Thread hubicka at gcc dot gnu.org via Gcc-bugs
||2024-12-15 Status|UNCONFIRMED |NEW CC||hubicka at gcc dot gnu.org --- Comment #4 from Jan Hubicka --- We seem to spend most of the time sorting PHI edges #0 0x0231f835 in mergesort (in=, c=0x7fffd530, n

[Bug tree-optimization/86701] Optimize strlen called on std::string c_str()

2024-12-15 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86701 Jan Hubicka changed: What|Removed |Added CC||hubicka at gcc dot gnu.org --- Comment #5

[Bug libstdc++/60621] std::vector::emplace_back generates massively more code than push_back

2024-12-15 Thread hubicka at gcc dot gnu.org via Gcc-bugs
|NEW CC||hubicka at gcc dot gnu.org, ||mjambor at suse dot cz Last reconfirmed||2024-12-15 See Also||https://gcc.gnu.org/bugzill

[Bug tree-optimization/117924] unused std::vector are not optimized out fully at gimple level

2024-12-14 Thread hubicka at gcc dot gnu.org via Gcc-bugs
|1 Status|UNCONFIRMED |ASSIGNED Assignee|unassigned at gcc dot gnu.org |hubicka at gcc dot gnu.org --- Comment #4 from Jan Hubicka --- patch posted here https://gcc.gnu.org/pipermail/gcc-patches/2024-December/671137.html

[Bug libstdc++/87502] Poor code generation for std::string("c-style string")

2024-12-14 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87502 --- Comment #16 from Jan Hubicka --- https://gcc.gnu.org/pipermail/gcc-patches/2024-December/671599.html optimizes the string constructors. Having strlen pass catching more cases would be nice, too.

[Bug libstdc++/80331] unused const std::string not optimized away

2024-12-14 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80331 Jan Hubicka changed: What|Removed |Added See Also||https://gcc.gnu.org/bugzill

[Bug libstdc++/80331] unused const std::string not optimized away

2024-12-14 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80331 --- Comment #15 from Jan Hubicka --- Original testcase is solved by https://gcc.gnu.org/pipermail/gcc-patches/2024-December/671599.html We still won't optimize longer strings because _M_create is not inline.
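A minimal sketch of the two cases the comment above distinguishes, assuming libstdc++'s small-string buffer of 15 characters for `std::string`. The function names are hypothetical and the strings are illustrative, not the bug's original testcase.

```cpp
#include <cassert>
#include <string>

// Short literal: fits the small-string buffer, so construction needs no
// heap allocation.
int unused_short() {
    const std::string s("abc");
    return 0;
}

// Longer literal: exceeds the small-string buffer, so construction goes
// through the out-of-line allocation path the comment refers to.
int unused_long() {
    const std::string s("a string long enough to need heap storage");
    return 0;
}
```

Comparing the generated assembly of the two functions at -O2 is one way to see whether the unused object is removed in each case; the result depends on the GCC and libstdc++ versions in use.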

[Bug c++/103827] function which takes an argument via (hidden) reference should assume the argument does not escape or is only read from

2024-12-14 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103827 Jan Hubicka changed: What|Removed |Added Status|UNCONFIRMED |NEW Ever confirmed|0

[Bug c++/94960] extern template prevents inlining of standard library objects

2024-12-13 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94960 --- Comment #10 from Jan Hubicka --- Note that passing function body to middle-end does not only enable inlining, but other optimizations too. Often ipa-modref is able to summarize side effects of the function and enables more optimization, since

[Bug libstdc++/109442] Dead local copy of std::vector not removed from function

2024-12-13 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109442 --- Comment #35 from Jan Hubicka --- On #include <vector> bool test1(const std::vector<int>& in) { return in == std::vector<int>{42}; } we produce: bool test1 (const struct vector & in) { bool _12; int * _13; int * _14; long int _24; unsigned in
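For comparison, here is a sketch of the testcase from the comment above next to a hand-written equivalent that never builds a temporary vector. `test1_manual` is a hypothetical name added for illustration; it is the result one might hope the optimizer reaches, not something taken from the bug report.

```cpp
#include <cassert>
#include <vector>

// Testcase shape from the comment: compares against a temporary vector,
// which requires allocating and freeing storage unless optimized away.
bool test1(const std::vector<int>& in) {
    return in == std::vector<int>{42};
}

// Hand-written equivalent with no temporary (illustrative assumption).
bool test1_manual(const std::vector<int>& in) {
    return in.size() == 1 && in[0] == 42;
}
```

The two functions agree on every input, so any difference in generated code is purely a question of how much of the temporary the compiler manages to eliminate.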

[Bug ipa/93921] -Os generates much bigger code than -O{1,2,3,fast} for std::string::size

2024-12-13 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93921 Jan Hubicka changed: What|Removed |Added See Also||https://gcc.gnu.org/bugzill

[Bug libstdc++/80331] unused const std::string not optimized away

2024-12-10 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80331 --- Comment #14 from Jan Hubicka --- Declaring _S_create and _M_create inline indeed helps a little: diff --git a/libstdc++-v3/include/bits/basic_string.h b/libstdc++-v3/include/bits/basic_string.h index 17b973c8b45..d73a61abe5b 100644 --- a/lib
