https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121093
--- Comment #5 from Jan Hubicka ---
Just for a bit more context, LLVM doesn't have an equivalent of debug markers and
compiles p3 as:
p3: # @p3
.Lfunc_begin0:
.file 0 "/home/jh" "e.c" md5 0x8a15ab558b
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121093
--- Comment #4 from Jan Hubicka ---
> in the end I'm not sure what's "wrong" here and why you think you are missing
> p2 - p2 is not executed, you shouldn't get any profile on it.
Seems we kind of disagree on how "executed" is defined.
If you com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121093
--- Comment #2 from Jan Hubicka ---
Created attachment 61957
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=61957&action=edit
patch to autofdo for multiple source locations per single instruction
This is a patch which makes the autofdo tool
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121210
--- Comment #1 from Jan Hubicka ---
The problem here is that the function is inlined into a function with a guessed
profile while it has an AFDO profile. Inline scaling should change "globally 0
auto FDO" to "guessed" but it did not. I guess it is bu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121123
Jan Hubicka changed:
What|Removed |Added
Assignee|unassigned at gcc dot gnu.org |hubicka at gcc dot
gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121038
--- Comment #2 from Jan Hubicka ---
I experimented with a smaller sampling period and indeed create_gcov then runs
out of memory. On my setup create_gcov was simply segfaulting and produced just
a partial profile. Since Makefile does not fail on cr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121093
Bug ID: 121093
Summary: Missed location of inlined function
Product: gcc
Version: 16.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: debug
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121074
Jan Hubicka changed:
What|Removed |Added
Status|UNCONFIRMED |ASSIGNED
Ever confirmed|0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121038
Bug ID: 121038
Summary: autoprofiledbootstrap is broken in few ways
Product: gcc
Version: 16.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: boot
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119876
--- Comment #6 from Jan Hubicka ---
Aha, I was looking into scalar-to-vector improvements promoting scalar integer
+ 1 to vector on AMD CPUs.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119876
--- Comment #5 from Jan Hubicka ---
I think I made the testcase while working on something else that I forgot,
sorry :)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120229
--- Comment #2 from Jan Hubicka ---
See thread
https://gcc.gnu.org/pipermail/gcc-patches/2025-July/689018.html
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120916
--- Comment #9 from Jan Hubicka ---
Created attachment 61818
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=61818&action=edit
create_gcov path
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120916
--- Comment #8 from Jan Hubicka ---
Patching create_gcov to account all of debug statements associated with a given
address instead of just the last one gets me:
test total:4350509 head:8642
1: 4484 // {
2: 4484 // for (
3: 4484
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119965
--- Comment #3 from Jan Hubicka ---
There is also a 3% performance regression that got lost on transition to the new PR
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=958.387.0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119965
--- Comment #2 from Jan Hubicka ---
This is likely an ipa-cp heuristics issue: it now decides to clone, but after
all the benefits are not really visible.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120916
--- Comment #7 from Jan Hubicka ---
LLVM also gets execution counts wrong, just in a different (and less harmful)
way:
test:270773509:9780
1: 9116
2: 51984 for (
4: 51984 iThis Inner Loop Header: Depth=1
.loc 0 10 15
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120859
Jan Hubicka changed:
What|Removed |Added
CC||hubicka at gcc dot gnu.org
--- Comment #4
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120867
Bug 120867 depends on bug 104457, which changed state.
Bug 104457 Summary: ipa-cp with autofdo: internal compiler error in
update_specialized_profile, at ipa-cp.c:4422
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104457
What|Remo
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104457
Jan Hubicka changed:
What|Removed |Added
Status|WAITING |RESOLVED
Resolution|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120938
--- Comment #10 from Jan Hubicka ---
https://github.com/google/autofdo/issues/248
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120867
Bug 120867 depends on bug 120938, which changed state.
Bug 120938 Summary: discriminators are not useful in statements doing multiple
calls
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120938
What|Removed |Add
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120938
Jan Hubicka changed:
What|Removed |Added
Status|UNCONFIRMED |RESOLVED
Resolution|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120938
--- Comment #8 from Jan Hubicka ---
Problem goes away with
diff --git a/gcc/dwarf2out.cc b/gcc/dwarf2out.cc
index d1a55dbcbcb..52ca189531e 100644
--- a/gcc/dwarf2out.cc
+++ b/gcc/dwarf2out.cc
@@ -25012,9 +25012,8 @@ add_call_src_coords_attribute
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120938
--- Comment #7 from Jan Hubicka ---
Looking at the diff there seem to be few changes:
- # d.C:16:2
- .loc 1 16 2 is_stmt 1 view .LVU16
+ # d.C:15:8
+ .loc 1 15 8 is_stmt 1 discriminator 1 view .LVU16
This is a line table
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120938
--- Comment #6 from Jan Hubicka ---
Created attachment 61795
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=61795&action=edit
Diff
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120938
--- Comment #5 from Jan Hubicka ---
Created attachment 61794
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=61794&action=edit
bad assembly
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120938
--- Comment #4 from Jan Hubicka ---
Created attachment 61793
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=61793&action=edit
good assembly
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120938
--- Comment #3 from Jan Hubicka ---
An even smaller example. Bad profile:
#include
volatile int variablev;
static void inc()
{
variablev++;
}
static int zero = 0;
int main ()
{
for (int i = 0; i < 1; i++)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120938
--- Comment #2 from Jan Hubicka ---
This is an even smaller testcase
#include
volatile int variablev;
static void inc(int a)
{
variablev++;
}
inline int
inline_me (int l)
{
for (int i = 0; i < 1; i++)
{inc(1);inc(
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120938
--- Comment #1 from Jan Hubicka ---
Removing the parameter of inc makes the problem go away. So does removing
the recursion
#include
volatile int variablev;
static int dead ()
{
return 0;
}
static void inc()
{
variablev++;
}
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120938
Bug ID: 120938
Summary: discriminators are not useful in statements doing
multiple calls
Product: gcc
Version: 16.0
Status: UNCONFIRMED
Severity: normal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120916
--- Comment #3 from Jan Hubicka ---
Well, PR32445 is about us not being able to vartrack the value of I. I think that
may have been fixed since then by adding corresponding debug binds.
However, here we are missing info about the statement being executed...
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120916
--- Comment #1 from Jan Hubicka ---
Here is a variant for the gcov tool:
jh@shroud:/tmp> cat tt.c
int s = 1023;
int a[1024];
__attribute__ ((weak))
void test()
{
for (
int i = 0; /* Line 7, relative 3 */
i < s;
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120916
Bug ID: 120916
Summary: debug info for IV increment is lost
Product: gcc
Version: 16.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: driver
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120614
--- Comment #15 from Jan Hubicka ---
https://lnt.opensuse.org/db_default/v4/SPEC/graph?highlight_run=68430&plot.0=1370.377.0&plot.1=1288.377.0
compares AFDO to no profile feedback
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66229
Jan Hubicka changed:
What|Removed |Added
Status|WAITING |RESOLVED
Resolution|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77684
--- Comment #11 from Jan Hubicka ---
*** Bug 86404 has been marked as a duplicate of this bug. ***
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86404
Jan Hubicka changed:
What|Removed |Added
CC||hubicka at gcc dot gnu.org
Resolut
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77684
Jan Hubicka changed:
What|Removed |Added
Blocks||120867
CC|
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120229
Jan Hubicka changed:
What|Removed |Added
Blocks||120867
Ever confirmed|0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120614
Jan Hubicka changed:
What|Removed |Added
Last reconfirmed||2025-06-29
Blocks|
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120867
Jan Hubicka changed:
What|Removed |Added
Status|UNCONFIRMED |NEW
Ever confirmed|0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120867
Bug ID: 120867
Summary: [metabug] AutoFDO issues
Product: gcc
Version: 16.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120752
--- Comment #4 from Jan Hubicka ---
Hmm,
there seem to be no big differences in IPA decisions between the runs, so
further investigation is necessary :(
The patch attempts to preserve more of the profile and here the profile is a bit
counter-productive
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118551
--- Comment #9 from Jan Hubicka ---
I am happy it helps. I wonder if you can share details of your SPEC config.
I.e. how you call perf (do you specify count etc) and how you handle merging of
profiles.
We now have a regular tester (on AMD hardwa
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120614
--- Comment #6 from Jan Hubicka ---
Also BTW, I think it is useful to do the dumps with -details-blocks since that
also dumps BB count inconsistencies caused by AutoFDO that are otherwise hard
to spot.
In ipa-cp dump it should be visible if cons
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120614
--- Comment #5 from Jan Hubicka ---
Note that on x86-64 I get OK scores on x264. This compares no-FDO -Ofast -flto
-march=native to autoFDO. I hacked the scripts to use ref run for training so
it is longer:
500.perlbench_r 1158
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119298
Jan Hubicka changed:
What|Removed |Added
Status|NEW |RESOLVED
Resolution|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
Bug 26163 depends on bug 119298, which changed state.
Bug 119298 Summary: [15/16 Regression] 538.imagick_r is faster when compiled
with GCC 14.2 and -Ofast -flto -march=native than with master on Zen5 since
r15-3441-g4292297a0f938f
https://gcc.g
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120218
--- Comment #2 from Jan Hubicka ---
I guess for costing changes, too. Since this is a weekly tester, bisecting
would help.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120219
Jan Hubicka changed:
What|Removed |Added
Depends on||119902
--- Comment #5 from Jan Hubicka -
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120226
Bug ID: 120226
Summary: 8% regression of exchange2 with -O2 between
g:d0571638a6bad932 and g:9b13bea07706a7ca
Product: gcc
Version: 16.0
Status: UNCONFIRMED
Se
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120099
--- Comment #4 from Jan Hubicka ---
This patch enables more inlining, so I guess it is a previously latent problem
triggered by the inliner...
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120120
--- Comment #9 from Jan Hubicka ---
Forgot to say, -fno-optimize-sibling-calls re-enables the cloning & inline.
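For readers less familiar with the option: GCC documents -foptimize-sibling-calls as optimizing sibling and tail recursive calls, so disabling it keeps a self-recursive call as a real call. A minimal sketch of the general idea follows. The two functions are hypothetical illustrations and are not taken from the testcase in this bug; the second one is only a hand-written approximation of what a tail-recursive function looks like after the self-call has been turned into a loop.

```c
/* Hypothetical example, not from the bug report.  The recursive call is
   the last action of the function, so it is a tail call to itself.  */
long
sum_to_recursive (long n, long acc)
{
  if (n <= 0)
    return acc;
  return sum_to_recursive (n - 1, acc + n);
}

/* Hand-written loop form (an assumption about the general shape, not
   actual compiler output): the self-call becomes a jump back to the
   loop header, so no recursive call edge remains in the function.  */
long
sum_to_loop (long n, long acc)
{
  while (n > 0)
    {
      acc += n;
      n--;
    }
  return acc;
}
```

Both functions compute the same value; for example, sum_to_recursive (10, 0) and sum_to_loop (10, 0) both return 55. The point of the sketch is only that the loop form no longer contains a recursive call, which is the kind of change that can alter later cloning and inlining decisions.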
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120120
--- Comment #8 from Jan Hubicka ---
The difference is that the tailr1 pass now turns the recursion into a loop.
GCC15 does:
Basic block 11 has extra exit edges
Basic block 33 has extra exit edges
Basic block 28 has extra exit edges
Basic block 23 has ex
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120120
Jan Hubicka changed:
What|Removed |Added
Last reconfirmed||2025-05-06
Ever confirmed|0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120069
Jan Hubicka changed:
What|Removed |Added
Last reconfirmed||2025-05-03
Ever confirmed|0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119900
--- Comment #6 from Jan Hubicka ---
Sadly this did not fix the whole regression. The problem is that after my
change to enable ipa-cp to clone over cold edges we clone
GetVirtualPixelsFromNexus twice (as constprop.0 and constprop.1). This
func
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120069
Bug ID: 120069
Summary: Yes another imagick -march=native -flto -Ofast + PGO
regression between
g:1c0cbc1b300e08df5ebfce00a7195890d78f2064 and
g:55b01e17c793688a28
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120065
--- Comment #3 from Jan Hubicka ---
while (n > 0 && a)
;
This is an odd loop which iterates 0 times or infinitely many times.
We do not pattern match that at profile-estimate time (since such code is kind
of useless) and we guess i
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119900
Jan Hubicka changed:
What|Removed |Added
CC||rsandifo at gcc dot gnu.org
S
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119900
--- Comment #3 from Jan Hubicka ---
Reverting the change of size_costs solves the regression, so it is about
differences in optimization of cold code. I will try to track down what causes
that.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119900
--- Comment #2 from Jan Hubicka ---
Aha, I mistakenly added analysis to PR105275. One problem I noticed was wrong
costing of FP scalar min/max, which is fixed now but does not affect imagick.
Interesting is that we now vectorize the same loops and BBs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103734
--- Comment #5 from Jan Hubicka ---
This is MorphologyApply
MagickExport Image *MorphologyApply(const Image *image, const ChannelType
channel,const MorphologyMethod method, const ssize_t iterations,
const KernelInfo *kernel, const Com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103734
--- Comment #4 from Jan Hubicka ---
With -fprofile-use we get
Evaluating opportunities for MorphologyApply/3266.
- considering value 134217719 for param #1 const ChannelType (caller_count: 3)
good_cloning_opportunity_p (time: 1, size: 427
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105275
--- Comment #9 from Jan Hubicka ---
The only vectorization difference is:
+imagick_r.ltrans8.ltrans.189t.slp1:magick/distort.c:1911:18: optimized: basic
block part vectorized using 16 byte vectors
+imagick_r.ltrans8.ltrans.189t.slp1:magick/dist
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105275
Jan Hubicka changed:
What|Removed |Added
Status|UNCONFIRMED |NEW
Ever confirmed|0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119924
Jan Hubicka changed:
What|Removed |Added
Status|UNCONFIRMED |NEW
Ever confirmed|0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119919
--- Comment #6 from Jan Hubicka ---
Exchange2 regression is solved and tonto seems to be noise (performance is back
today w/o change of a checksum of the text segment).
Still, we account one extra setcc and misaccount scatter, so let's keep this t
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119919
Jan Hubicka changed:
What|Removed |Added
Depends on||119902
--- Comment #3 from Jan Hubicka -
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119919
Jan Hubicka changed:
What|Removed |Added
Ever confirmed|0 |1
Status|UNCONFIRMED
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119147
--- Comment #5 from Jan Hubicka ---
as g:132d01d96ea9d617aaffdd5dfba3284a8958e529 I have committed the patch that
enables ipa-cp to clone over edges which are !maybe_hot_p().
This improves x264 with FDO by 7.8% and exchange by 3.3%
It causes qu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119919
--- Comment #1 from Jan Hubicka ---
There is also a 4% tonto regression on Intel in the same range, it seems
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=799.230.0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119919
Bug ID: 119919
Summary: 7% exchange2 regression between
g:6390fc86995fbd5239497cb9e1797a3af51d3936 and
g:f72a2d221539cede358f2487b94bc370c6fc44b5
Product: gcc
Ve
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119902
Bug ID: 119902
Summary: open-coded scatter/gather should not account
vec_to_scalar cost
Product: gcc
Version: 16.0
Status: UNCONFIRMED
Severity: normal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119900
Bug ID: 119900
Summary: regression if imagick with -Ofast -march=native
-fprofile-use between g:b986ed16c2546674 and
g:e1098c7b08d9e601
Product: gcc
Version: 16.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119879
--- Comment #2 from Jan Hubicka ---
Created attachment 61166
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=61166&action=edit
Fix I am testing
The fix I am testing. When VEC_PACK_TRUNC_EXPR is used, add_hook is called with
vec_promote_dem
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119879
--- Comment #1 from Jan Hubicka ---
The problem is in:
/* VEC_PACK_TRUNC_EXPR: If inner size is greater than outer size we will end
up doing two conversions and packing them. */
if (!scalar_p && inner_size > outer_size)
{
i
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119876
Bug ID: 119876
Summary: suboptimal code for avx512 conditinal move
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: ta
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119875
Bug ID: 119875
Summary: loop with floating point conditional move not
vectorized without -ffast-math
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119614
Jan Hubicka changed:
What|Removed |Added
Resolution|--- |FIXED
Status|NEW
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119614
--- Comment #47 from Jan Hubicka ---
Created attachment 61134
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=61134&action=edit
patch w/o forgotten debug output
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119614
--- Comment #46 from Jan Hubicka ---
Created attachment 61133
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=61133&action=edit
updated patch
The problem in the previous patch was that ipa-prop streams 0 to the end of block
of summary section
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119614
--- Comment #44 from Jan Hubicka ---
Summaries are duplicated when a clone is created. Let me debug why it gets lost
here.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119614
--- Comment #37 from Jan Hubicka ---
Created attachment 61128
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=61128&action=edit
updated patch (regtests and bootstraps)
Updated patch. Streaming summaries seems to work and fixes the testcase
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119614
--- Comment #36 from Jan Hubicka ---
Created attachment 61127
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=61127&action=edit
patch (untested)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119614
--- Comment #34 from Jan Hubicka ---
If the only problem is that ipa_return_value_sum does not survive
from compile time to WPA, then we only need to add streaming code for it. This
should be straightforward and there is no need to add
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105275
--- Comment #6 from Jan Hubicka ---
As discussed in PR111551, the SPEC train run does not include the hottest loop of
imagick (in ref loop), so we optimize it for size (in particular disable
vectorization) and get poor performance
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113646
--- Comment #7 from Jan Hubicka ---
Details are in PR111551
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118551
Jan Hubicka changed:
What|Removed |Added
CC||hubicka at gcc dot gnu.org
--- Comment #5
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113646
--- Comment #6 from Jan Hubicka ---
The problem is that the internal loop in the hottest function changes between train
and ref run (train run uses different variant of the loop). This disables
vectorization of the loop believed to be cold causing -
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119298
--- Comment #15 from Jan Hubicka ---
I made a silly stand-alone test:
long test[4];
__attribute__ ((noipa))
void
foo (unsigned long a, unsigned long b, unsigned long c, unsigned long d)
{
test[0]=a;
test[1]=b;
test[2]=c;
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119298
--- Comment #14 from Jan Hubicka ---
> > I am OK with using addss cost of 3 for trunk&release branches and make this
> > more precise next stage1.
>
> That's what we use now? But I still don't understand why exactly
> 538.imagick_r regresses
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119298
--- Comment #12 from Jan Hubicka ---
> Btw, it was your r8-4018-gf6fd8f2bd4e9a9 which added the FP vs. non-FP
> difference.
Yep, I know. With that patch I mostly wanted to limit redundancy of the
tables. The int/Fp difference was mostly based
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119298
--- Comment #7 from Jan Hubicka ---
Hmm, the sequence does not use + at all, but I think I know what is going on.
While the field is called addss it is used as a kitchen sink for all other
simple operations.
/* pmuludq under sse2, pmuld
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119147
--- Comment #4 from Jan Hubicka ---
Re-benchmarked current trunk -flto -Ofast -march=native (base) and -flto
-Ofast -march=native + PGO (peak) on znver3
Estimated Estimated
Base
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119147
--- Comment #3 from Jan Hubicka ---
With speculation_useful_p we now are able to constant propagate stride into
mc_chroma with PGO, but it does not help runtime.
https://gcc.gnu.org/pipermail/gcc-patches/2025-April/680055.html
solves the costi
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119606
Jan Hubicka changed:
What|Removed |Added
CC||hubicka at gcc dot gnu.org
--- Comment #5
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119565
Bug ID: 119565
Summary: 13-17% regression of botan CAS128 and DES on zen4
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component
1 - 100 of 911 matches