[Bug tree-optimization/66862] OpenMP SIMD does not work (use SIMD instructions) on conditional code

2024-04-08 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66862 Hongtao Liu changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|---

[Bug target/114591] [12/13/14 Regression] register allocators introduce an extra load operation since gcc-12

2024-04-10 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114591 Hongtao Liu changed: What|Removed |Added CC||liuhongt at gcc dot gnu.org --- Comment

[Bug target/114591] [12/13/14 Regression] register allocators introduce an extra load operation since gcc-12

2024-04-10 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114591 --- Comment #5 from Hongtao Liu --- > My experience is memory cost for the operand with rm or separate r, m is > different which impacts RA decision. > > https://gcc.gnu.org/pipermail/gcc-patches/2022-May/595573.html Change operands[1] alterna

[Bug target/114591] [12/13/14 Regression] register allocators introduce an extra load operation since gcc-12

2024-04-10 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114591 --- Comment #9 from Hongtao Liu --- > > It looks that different modes of memory read confuse LRA to not CSE the read. > > IMO, if the preloaded value is later accessed in different modes, LRA should > leave it. Alternatively, LRA should CSE m

[Bug target/114591] [12/13/14 Regression] register allocators introduce an extra load operation since gcc-12

2024-04-10 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114591 --- Comment #11 from Hongtao Liu --- unsigned v; long long v2; char foo () { v2 = v; return v; } This is related to *movqi_internal, and codegen has been worse since gcc8.1 foo: movl v(%rip), %eax movq %rax, v2(%r

[Bug target/114591] [12/13/14 Regression] register allocators introduce an extra load operation since gcc-12

2024-04-10 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114591 --- Comment #12 from Hongtao Liu --- short a; short c; short d; void foo (short b, short f) { c = b + a; d = f + a; } foo(short, short): addw a(%rip), %di addw a(%rip), %si movw %di, c(%rip) movw

[Bug middle-end/110027] [11/12/13/14 regression] Stack objects with extended alignments (vectors etc) misaligned on detect_stack_use_after_return

2024-04-10 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110027 --- Comment #19 from Hongtao Liu --- (In reply to Jakub Jelinek from comment #17) > Both of the posted patches are incorrect, this needs to be fixed in > asan_emit_stack_protection, account for the different offsets[0] which > happens when a sta

[Bug target/114591] [12/13/14 Regression] register allocators introduce an extra load operation since gcc-12

2024-04-11 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114591 --- Comment #15 from Hongtao Liu --- > I don't see this as problematic. IIRC, there was a discussion in the past > that a couple (two?) memory accesses from the same location close to each > other can be faster (so, -O2, not -Os) than preloading

[Bug target/114591] [12/13/14 Regression] register allocators introduce an extra load operation since gcc-12

2024-04-11 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114591 --- Comment #16 from Hongtao Liu --- > > 4952 /* See if a MEM has already been loaded with a widening operation; > 4953 if it has, we can use a subreg of that. Many CISC machines > 4954 also have such operations, but this

[Bug target/82731] _mm256_set_epi8(array[offset[0]], array[offset[1]], ...) byte gather makes slow code, trying to zero-extend all the uint16_t offsets first and spilling them.

2024-04-17 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82731 Hongtao Liu changed: What|Removed |Added CC||liuhongt at gcc dot gnu.org --- Comment

[Bug target/82731] _mm256_set_epi8(array[offset[0]], array[offset[1]], ...) byte gather makes slow code, trying to zero-extend all the uint16_t offsets first and spilling them.

2024-04-17 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82731 --- Comment #4 from Hongtao Liu --- (In reply to Hongtao Liu from comment #3) > Looks like ix86_vect_estimate_reg_pressure doesn't work here, taking a look. Oh, ix86_vect_estimate_reg_pressure is only for loop, BB vectorizer only use ix86_builti

[Bug target/82731] _mm256_set_epi8(array[offset[0]], array[offset[1]], ...) byte gather makes slow code, trying to zero-extend all the uint16_t offsets first and spilling them.

2024-04-17 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82731 --- Comment #7 from Hongtao Liu --- (In reply to Hongtao Liu from comment #4) > (In reply to Hongtao Liu from comment #3) > > Looks like ix86_vect_estimate_reg_pressure doesn't work here, taking a look. > > Oh, ix86_vect_estimate_reg_pressure is

[Bug target/85048] [missed optimization] vector conversions

2024-04-21 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85048 Hongtao Liu changed: What|Removed |Added CC||liuhongt at gcc dot gnu.org --- Comment

[Bug target/85048] [missed optimization] vector conversions

2024-04-22 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85048 --- Comment #16 from Hongtao Liu --- (In reply to Matthias Kretz (Vir) from comment #15) > So it seems that if at least one of the vector builtins involved in the > expression is 512 bits GCC needs to locally increase prefer-vector-width to > 512

[Bug target/110621] x86_64: Test gcc.target/i386/pr105354-2.c fails with -fstack-protector

2024-04-26 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110621 Hongtao Liu changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|---

[Bug tree-optimization/114883] New: 521.wrf_r ICE with -O2 -march=sapphirerapids -fvect-cost-model=cheap

2024-04-28 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: liuhongt at gcc dot gnu.org Target Milestone: --- during GIMPLE pass: vect dump file: module_cam_mp_ndrop.fppized.f90.179t.vect module_cam_mp_ndrop.fppized.f90:33:27

[Bug tree-optimization/114883] [14/15 Regression] 521.wrf_r ICE with -O2 -march=sapphirerapids -fvect-cost-model=cheap

2024-04-28 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114883 --- Comment #2 from Hongtao Liu --- (In reply to Andrew Pinski from comment #1) > Can you reduce the fortran code down for the ICE? It should not be hard, you > can use delta even. Let me try.

[Bug tree-optimization/114883] [14/15 Regression] 521.wrf_r ICE with -O2 -march=sapphirerapids -fvect-cost-model=cheap

2024-04-28 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114883 --- Comment #3 from Hongtao Liu --- Created attachment 58066 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58066&action=edit reproduced testcase gfortran -O2 -march=x86-64-v4 -fvect-cost-model=cheap.

[Bug tree-optimization/114883] [14/15 Regression] 521.wrf_r ICE with -O2 -march=sapphirerapids -fvect-cost-model=cheap

2024-04-29 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114883 --- Comment #4 from Hongtao Liu --- diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc index a6cf0a5546c..ae6abe00f3e 100644 --- a/gcc/tree-vect-loop.cc +++ b/gcc/tree-vect-loop.cc @@ -8505,7 +8505,8 @@ vect_transform_reduction (loop_vec

[Bug tree-optimization/114883] [14/15 Regression] 521.wrf_r ICE with -O2 -march=sapphirerapids -fvect-cost-model=cheap

2024-04-29 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114883 --- Comment #5 from Hongtao Liu --- (In reply to Hongtao Liu from comment #4) > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc > index a6cf0a5546c..ae6abe00f3e 100644 > --- a/gcc/tree-vect-loop.cc > +++ b/gcc/tree-vect-loop.cc > @@

[Bug tree-optimization/114883] [14/15 Regression] 521.wrf_r ICE with -O2 -march=sapphirerapids -fvect-cost-model=cheap

2024-04-29 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114883 --- Comment #10 from Hongtao Liu --- (In reply to Jakub Jelinek from comment #9) > Created attachment 58073 [details] > gcc14-pr114883.patch > > Full untested patch. This will fix 521.wrf_r ICE, and pass runtime validation.

[Bug libgcc/114907] __trunchfbf2 should be renamed to __extendhfbf2

2024-05-05 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114907 Hongtao Liu changed: What|Removed |Added CC||liuhongt at gcc dot gnu.org --- Comment

[Bug target/114943] X86 AVX2: inefficient code generated to convert SIMD Vectors

2024-05-05 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114943 Hongtao Liu changed: What|Removed |Added CC||liuhongt at gcc dot gnu.org --- Comment

[Bug target/113079] [x86] Fails to generate dot_prod instructions for 64-bit vector.

2024-05-07 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113079 Hongtao Liu changed: What|Removed |Added Status|NEW |RESOLVED Resolution|---

[Bug target/113090] Suboptimal vector permuation for 64-bit vector.

2024-05-07 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113090 Hongtao Liu changed: What|Removed |Added Resolution|--- |FIXED Status|NEW

[Bug sanitizer/84508] Load of misaligned address using _mm_load_sd

2024-05-09 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84508 Hongtao Liu changed: What|Removed |Added CC||liuhongt at gcc dot gnu.org

[Bug rtl-optimization/115021] New: [14/15 regression] unnecessary spill for vpternlog

2024-05-09 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
Component: rtl-optimization Assignee: unassigned at gcc dot gnu.org Reporter: liuhongt at gcc dot gnu.org Target Milestone: --- typedef signed char v16qi __attribute__ ((__vector_size__ (16))); v16qi foo (v16qi x) { return x >> 5; } with -march=x86-64-v4 -O2, GCC 13.2 gen

[Bug target/114987] [14/15 Regression] floating point vector regression, x86, between gcc 14 and gcc-13 using -O3 and target clones on skylake platforms

2024-05-10 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114987 --- Comment #6 from Hongtao Liu --- > I tried to move "vmovdqa %xmm1,0xd0(%rsp)" before "vmovdqa %xmm0,0xe0(%rsp)" > and rebuilt the binary and it will save half the regression. 57.93 │200: vaddps 0xc0(%rsp),%ymm3,%ymm5

[Bug target/101017] ICE: Segmentation fault, convert_memory_address_addr_space_1 with vector_size(32) and target_clone arch=core-avx2/default

2024-05-12 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101017 Hongtao Liu changed: What|Removed |Added CC||haochen.jiang at intel dot com --- Commen

[Bug middle-end/115101] New: [wrong code] with -O1 -floop-nest-optimize for gcc.dg/graphite/interchange-8.c

2024-05-15 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
: wrong-code Severity: normal Priority: P3 Component: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: liuhongt at gcc dot gnu.org Target Milestone: --- When I'm working on turning cunrolli, I found if cunrolli is disabled

[Bug target/115115] [12/13/14/15 Regression] highway-1.0.7 wrong _mm_cvttps_epi32() constant fold

2024-05-15 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115115 Hongtao Liu changed: What|Removed |Added CC||liuhongt at gcc dot gnu.org --- Comment

[Bug target/114514] v16qi >> 7 can be optimized with vpcmpgtb

2024-05-15 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114514 Hongtao Liu changed: What|Removed |Added Resolution|--- |FIXED Status|NEW

[Bug target/115116] New: [x86] rtx_cost is overestimated for big size memory.

2024-05-15 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: liuhongt at gcc dot gnu.org Target Milestone: --- typedef char v16qi __attribute__((vector_size(16))); v16qi __attribute__((noipa)) foo (v16qi a) { v16qi c

[Bug target/115069] [14/15 regression] 8 bit integer vector performance regression, x86, between gcc-14 and gcc-13 using avx2 target clones on skylake platform

2024-05-17 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115069 --- Comment #5 from Hongtao Liu --- (In reply to Krzysztof Kanas from comment #4) > I bisected the issue and it seems that commit > 0368fc54bc11f15bfa0ed9913fd0017815dfaa5d introduces regression. I guess the real guilty commit is commit 52ff3

[Bug target/115069] [14/15 regression] 8 bit integer vector performance regression, x86, between gcc-14 and gcc-13 using avx2 target clones on skylake platform

2024-05-19 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115069 --- Comment #11 from Hongtao Liu --- (In reply to Haochen Jiang from comment #10) > A patch like Comment 8 could definitely solve the problem. But I need to > test more benchmarks to see if there is surprise. > > But, yes, as Uros said in Comme

[Bug target/115146] [15 Regression] Incorrect 8-byte vectorization: psrlw/psraw confusion

2024-05-19 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115146 Hongtao Liu changed: What|Removed |Added CC||liuhongt at gcc dot gnu.org --- Comment

[Bug target/115069] [14/15 regression] 8 bit integer vector performance regression, x86, between gcc-14 and gcc-13 using avx2 target clones on skylake platform

2024-05-19 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115069 --- Comment #14 from Hongtao Liu --- (In reply to Uroš Bizjak from comment #13) > (In reply to Haochen Jiang from comment #12) > > (In reply to Hongtao Liu from comment #11) > > > (In reply to Haochen Jiang from comment #10) > > > > A patch like

[Bug target/115069] [14/15 regression] 8 bit integer vector performance regression, x86, between gcc-14 and gcc-13 using avx2 target clones on skylake platform

2024-05-19 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115069 --- Comment #16 from Hongtao Liu --- > Should we also run a SPEC on with -O2 -mtune=generic -march=x86-64-v3 to see > if there is any surprise? Sure, I guess no.

[Bug rtl-optimization/115021] [14/15 regression] unnecessary spill for vpternlog

2024-05-20 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115021 --- Comment #4 from Hongtao Liu --- (In reply to Hu Lin from comment #3) > I found compiler allocates mem to the third source register of vpternlog in > IRA after commit f55cdce3f8dd8503e080e35be59c5f5390f6d95e. And it cause the > generate code

[Bug target/114427] [x86] vec_pack_truncv8si/v4si can be optimized with pblendw instead of pand for AVX2 target

2024-05-20 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114427 Hongtao Liu changed: What|Removed |Added Resolution|--- |FIXED Status|UNCONFIRMED

[Bug target/115161] [15 Regression] highway-1.0.7 miscompilation of some SSE2 intrinsics

2024-05-21 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115161 --- Comment #11 from Hongtao Liu --- (In reply to Jakub Jelinek from comment #10) > Any of the floating point to integer intrinsics if they have out of range > value (haven't checked whether floating point to unsigned intrinsic is a > problem to

[Bug target/115069] [14/15 regression] 8 bit integer vector performance regression, x86, between gcc-14 and gcc-13 using avx2 target clones on skylake platform

2024-05-21 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115069 Hongtao Liu changed: What|Removed |Added Status|NEW |RESOLVED Resolution|---

[Bug target/115161] [15 Regression] highway-1.0.7 miscompilation of some SSE2 intrinsics

2024-05-22 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115161 --- Comment #16 from Hongtao Liu --- > > That said, this change really won't help the backend which supposedly should > have the same behavior regardless of -fno-trapping-math, because in that > case it is the value > of the result (which is u

[Bug target/114148] gcc.target/i386/pr106010-7b.c FAILs

2024-05-23 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114148 --- Comment #4 from Hongtao Liu --- (In reply to r...@cebitec.uni-bielefeld.de from comment #3) > To investigate further, I've added comparison functions to a reduced > version of pr106010-7b.c, with > > void > cmp_epi8 (_Complex unsigned char*

[Bug target/114148] gcc.target/i386/pr106010-7b.c FAILs

2024-05-24 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114148 Hongtao Liu changed: What|Removed |Added Resolution|--- |FIXED Status|UNCONFIRMED

[Bug target/115161] highway-1.0.7 miscompilation of _mm_cvttps_epi32(): invalid result assumed

2024-05-26 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115161 --- Comment #25 from Hongtao Liu --- (In reply to Jakub Jelinek from comment #17) > I don't think the cost of using UNSPEC would be significant if the backend > tried to constant fold more target builtins. Anyway, with the proposed > changes pe

[Bug target/115146] [15 Regression] Incorrect 8-byte vectorization: psrlw/psraw confusion

2024-05-26 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115146 Hongtao Liu changed: What|Removed |Added Resolution|--- |FIXED Status|NEW

[Bug target/67325] Optimize shift (aka subreg) of load to simple load

2024-05-28 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67325 Hongtao Liu changed: What|Removed |Added CC||liuhongt at gcc dot gnu.org

[Bug tree-optimization/112325] Missed vectorization of reduction after unrolling

2024-05-29 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112325 Hongtao Liu changed: What|Removed |Added Status|NEW |RESOLVED Resolution|---

[Bug tree-optimization/53947] [meta-bug] vectorizer missed-optimizations

2024-05-29 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947 Bug 53947 depends on bug 112325, which changed state. Bug 112325 Summary: Missed vectorization of reduction after unrolling https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112325 What|Removed |Added ---

[Bug sanitizer/84508] Load of misaligned address using _mm_load_sd

2024-05-29 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84508 --- Comment #25 from Hongtao Liu --- (In reply to Peter Cordes from comment #22) > Why are we adding an alignment requirement to _mm_storel_pd, the intrinsic > for MOVLPD? > From Intel intrinsic guide[1], there's explicit "mem_addr does not need

[Bug sanitizer/84508] Load of misaligned address using _mm_load_sd

2024-05-29 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84508 --- Comment #26 from Hongtao Liu --- (In reply to Hongtao Liu from comment #25) > (In reply to Peter Cordes from comment #22) > > Why are we adding an alignment requirement to _mm_storel_pd, the intrinsic > > for MOVLPD? > > > From Intel intrins

[Bug target/114125] Support vcond_mask_qiqi and friends.

2024-05-29 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114125 Hongtao Liu changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|---

[Bug target/115299] New: [14 regression] pr86722.c failed to eliminate branch.

2024-05-30 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: liuhongt at gcc dot gnu.org Target Milestone: --- Target: x86_64-*-* i?86-*-* void f(double*d,double*e){ for(;d

[Bug target/115299] [14/15 regression] pr86722.c failed to eliminate branch.

2024-05-30 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115299 --- Comment #2 from Hongtao Liu --- > Maybe r14-53-g675b1a7f113adb . Probably, current cost model may need adjustment.

[Bug target/113609] EQ/NE comparison between avx512 kmask and -1 can be optimized with kxortest with checking CF.

2024-06-02 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113609 Hongtao Liu changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|---

[Bug target/115299] [14/15 regression] pr86722.c failed to eliminate branch.

2024-06-03 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115299 Hongtao Liu changed: What|Removed |Added Resolution|--- |FIXED Status|NEW

[Bug other/115334] new test case gcc.dg/vect/pr112325.c from r15-919-gef27b91b62c3aa fails

2024-06-03 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115334 Hongtao Liu changed: What|Removed |Added CC||liuhongt at gcc dot gnu.org --- Comment

[Bug other/115334] new test case gcc.dg/vect/pr112325.c from r15-919-gef27b91b62c3aa fails

2024-06-03 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115334 --- Comment #2 from Hongtao Liu --- diff --git a/gcc/testsuite/gcc.dg/vect/pr112325.c b/gcc/testsuite/gcc.dg/vect/pr112325.c index dea6cca3b86..143903beab2 100644 --- a/gcc/testsuite/gcc.dg/vect/pr112325.c +++ b/gcc/testsuite/gcc.dg/vect/pr11232

[Bug target/115341] [15 regression] gcc.target/i386/apx-ndd-2.c etc. FAIL

2024-06-04 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115341 Hongtao Liu changed: What|Removed |Added CC||liuhongt at gcc dot gnu.org --- Comment

[Bug rtl-optimization/115351] [14/15 regression] pointless movs when passing by value on x86-64

2024-06-05 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115351 Hongtao Liu changed: What|Removed |Added CC||liuhongt at gcc dot gnu.org --- Comment

[Bug target/114428] [x86] psrad xmm, xmm, 16 and pand xmm, const_vector (0xffff x4) can be optimized to psrld

2024-06-05 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114428 Hongtao Liu changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|---

[Bug other/115365] New test case gcc.dg/pr100927.c from r15-1022-gb05288d1f1e4b6 fails

2024-06-05 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115365 --- Comment #1 from Hongtao Liu --- pr100927.c.349r.final:(fix:SI (reg:SF 32 0 [120]))) "../../gcc/intel-innersource/pr115365/gcc/testsuite/gcc.dg/pr100927.c":12:10 428 {*fix_truncsfsi2_p8} pr100927.c.349r.final: (expr_list:REG_EQUIV

[Bug target/43618] Incorrect sse2_cvtX2Y pattern

2024-06-05 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43618 Hongtao Liu changed: What|Removed |Added CC||liuhongt at gcc dot gnu.org --- Comment

[Bug target/43618] Incorrect sse2_cvtX2Y pattern

2024-06-05 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43618 Hongtao Liu changed: What|Removed |Added Status|NEW |RESOLVED Resolution|---

[Bug rtl-optimization/115369] New: ifcvt failed to condition elimination for__builtin_mul_overflow

2024-06-06 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
Priority: P3 Component: rtl-optimization Assignee: unassigned at gcc dot gnu.org Reporter: liuhongt at gcc dot gnu.org Target Milestone: --- int foo (unsigned a, unsigned b, unsigned d, unsigned e, int* p) { unsigned int r; int c = __builtin_mul_overflow

[Bug target/115370] [15 regression] gcc.target/i386/pr77881.c FAIL

2024-06-06 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115370 Hongtao Liu changed: What|Removed |Added CC||liuhongt at gcc dot gnu.org --- Comment

[Bug testsuite/115365] New test case gcc.dg/pr100927.c from r15-1022-gb05288d1f1e4b6 fails

2024-06-06 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115365 --- Comment #5 from Hongtao Liu --- (In reply to Rainer Orth from comment #4) > Unfortunately, the fix broke 32-bit Solaris/SPARC in exchange: > > FAIL: gcc.dg/pr100927.c scan-rtl-dump-times final "(?n)(fix:SI" 3 > /* { dg-final { scan-rtl

[Bug testsuite/115334] new test case gcc.dg/vect/pr112325.c from r15-919-gef27b91b62c3aa fails

2024-06-06 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115334 Hongtao Liu changed: What|Removed |Added Resolution|--- |FIXED Status|UNCONFIRMED

[Bug target/112532] [14 Regression] ICE: in extract_insn, at recog.cc:2804 (unrecognizable insn: vec_duplicate:V4HI) with -O -msse4 since r14-5388-g2794d510b979be

2023-11-15 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112532 --- Comment #5 from liuhongt at gcc dot gnu.org --- (In reply to Sam James from comment #4) > btw: if you change your email on bugzilla to liuho...@gcc.gnu.org, you'll > get more permissions to edit bugs. Ok, thanks.

[Bug tree-optimization/112325] Missed vectorization of reduction after unrolling

2023-11-16 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112325 liuhongt at gcc dot gnu.org changed: What|Removed |Added CC||liuhongt at gcc dot

[Bug tree-optimization/112325] Missed vectorization of reduction after unrolling

2023-11-16 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112325 --- Comment #4 from liuhongt at gcc dot gnu.org --- (In reply to liuhongt from comment #3) > BB vectorizer relies on the backend support of .REDUC_PLUS for reduction, > but loop vectorizer can manually do reduction. That's w

[Bug target/112532] [14 Regression] ICE: in extract_insn, at recog.cc:2804 (unrecognizable insn: vec_duplicate:V4HI) with -O -msse4 since r14-5388-g2794d510b979be

2023-11-16 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112532 liuhongt at gcc dot gnu.org changed: What|Removed |Added Resolution|--- |FIXED Status

[Bug target/112547] 9% exec time regression of 462.libquantum SPEC on AMD zen4 CPU

2023-11-16 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112547 liuhongt at gcc dot gnu.org changed: What|Removed |Added CC||liuhongt at gcc dot

[Bug tree-optimization/112579] New: bb vectorizer failed to reduction sum += inv >> {1,2,3,4,5,6,7,8}

2023-11-16 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
missed-optimization Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: liuhongt at gcc dot gnu.org Blocks: 112325 Target Milestone: --- This is from PR112325 unsigned foo (un

[Bug tree-optimization/112579] bb vectorizer failed to reduction sum += inv >> {1,2,3,4,5,6,7,8}

2023-11-16 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112579 --- Comment #1 from liuhongt at gcc dot gnu.org --- test.c:28:8: note: vect_is_simple_use: operand qh_16(D) >> 1, type of def: internal test.c:28:8: note: vect_is_simple_use: operand qh_16(D), type of def: external test.c:28:8

[Bug tree-optimization/112579] bb vectorizer failed to reduction sum += inv >> {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15}

2023-11-16 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112579 --- Comment #2 from liuhongt at gcc dot gnu.org --- Got vectorized after change source code to unsigned foo (unsigned * restrict s, unsigned qh, unsigned * restrict qs) { unsigned int sumi = 0; sumi += (qh >> 16); sumi += (q

[Bug tree-optimization/112579] bb vectorizer failed to reduction sum += inv >> {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15}

2023-11-16 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112579 --- Comment #3 from liuhongt at gcc dot gnu.org --- (In reply to liuhongt from comment #1) > test.c:28:8: note: vect_is_simple_use: operand qh_16(D) >> 1, type of def: > internal > test.c:28:8: note: vect_is_simple_use:

[Bug tree-optimization/112579] bb vectorizer failed to reduction sum += inv >> {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15}

2023-11-16 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112579 --- Comment #4 from liuhongt at gcc dot gnu.org --- > or normally, it should splitted into groups size 4 + 4 + 3 and vectorize for > 2 group size 4. /* Try to break the group up into pieces. */ if (kind == slp_inst_kind_store Cur

[Bug tree-optimization/111972] [14 regression] missed vectorzation for bool a = j != 1; j = (long int)a;

2023-11-20 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111972 liuhongt at gcc dot gnu.org changed: What|Removed |Added CC||liuhongt at gcc dot

[Bug tree-optimization/111972] [14 regression] missed vectorzation for bool a = j != 1; j = (long int)a;

2023-11-20 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111972 --- Comment #14 from liuhongt at gcc dot gnu.org --- (In reply to Andrew Pinski from comment #13) > (In reply to liuhongt from comment #12) > > > > Is there any progress for this? > > I have a patch ready to post for thi

[Bug target/109812] GraphicsMagick resize is a lot slower in GCC 13.1 vs Clang 16 on Intel Raptor Lake

2023-11-25 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109812 liuhongt at gcc dot gnu.org changed: What|Removed |Added CC||liuhongt at gcc dot

[Bug target/102543] -march=cascadelake performs odd alignment peeling

2023-11-30 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102543 liuhongt at gcc dot gnu.org changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED

[Bug target/107261] ICE: in classify_argument, at config/i386/i386.cc:2523 on __bf16 vect argument or return value

2023-11-30 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107261 liuhongt at gcc dot gnu.org changed: What|Removed |Added CC||liuhongt at gcc dot

[Bug target/101471] AVX512 incorrect writemask generated for _mm512_fpclass_ps_mask in O0

2023-11-30 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101471 liuhongt at gcc dot gnu.org changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED

[Bug rtl-optimization/107057] [11/12 Regression] ICE in extract_constrain_insn, at recog.cc:2692

2023-11-30 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107057 liuhongt at gcc dot gnu.org changed: What|Removed |Added Status|NEW |RESOLVED

[Bug target/107322] ICE: in extract_insn, at recog.cc:2791 (unrecognizable insn) with __bf16 compare

2023-11-30 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107322 liuhongt at gcc dot gnu.org changed: What|Removed |Added CC||liuhongt at gcc dot

[Bug target/111907] ICE: in curr_insn_transform, at lra-constraints.cc:4294 unable to generate reloads for: {*andnottf3} with -mavx512f -mno-evex512

2023-11-30 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111907 liuhongt at gcc dot gnu.org changed: What|Removed |Added CC||liuhongt at gcc dot

[Bug target/111225] ICE in curr_insn_transform, unable to generate reloads for xor, since r14-2447-g13c556d6ae84be

2023-11-30 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111225 liuhongt at gcc dot gnu.org changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED

[Bug target/111062] ICE: in final_scan_insn_1, at final.cc:2808 could not split insn {*andndi_1} with -O -mavx10.1-256 -mavx512bw -mno-avx512f

2023-11-30 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111062 liuhongt at gcc dot gnu.org changed: What|Removed |Added CC||liuhongt at gcc dot

[Bug target/111061] ICE: in emit_move_insn, at expr.cc:4219 with -O -mavx10.1-512 and __builtin_convertvector()

2023-11-30 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111061 liuhongt at gcc dot gnu.org changed: What|Removed |Added CC||liuhongt at gcc dot

[Bug target/110788] Spilling to mask register for GPR vec_duplicate

2023-11-30 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110788 liuhongt at gcc dot gnu.org changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED

[Bug target/109504] [12/13/14 Regression] Compilation fails with pragma GCC target sse4.1 and immintrin.h

2023-11-30 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109504 liuhongt at gcc dot gnu.org changed: What|Removed |Added Resolution|--- |FIXED CC

[Bug target/110591] [i386] (Maybe) Missed optimisation: _cmpccxadd sets flags

2023-11-30 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110591 liuhongt at gcc dot gnu.org changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED

[Bug target/110227] [13/14 Regression] gcc generates invalid AVX-512 code

2023-11-30 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110227 liuhongt at gcc dot gnu.org changed: What|Removed |Added CC||liuhongt at gcc dot

[Bug target/110438] generating all-ones zmm needs dep-breaking pxor before ternlog

2023-11-30 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110438 liuhongt at gcc dot gnu.org changed: What|Removed |Added CC||liuhongt at gcc dot

[Bug middle-end/112824] Stack spills and vector splitting with vector builtins

2023-12-03 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112824 liuhongt at gcc dot gnu.org changed: What|Removed |Added CC||liuhongt at gcc dot

[Bug target/101017] ICE: Segmentation fault, convert_memory_address_addr_space_1 with vector_size(32) and target_clone arch=core-avx2/default

2023-12-03 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101017 liuhongt at gcc dot gnu.org changed: What|Removed |Added Status|NEW |ASSIGNED

[Bug target/112816] [11/12/13/14 Regression] ICE unrecognizable_insn with __builtin_signbit and returning struct with int[4]

2023-12-03 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112816 liuhongt at gcc dot gnu.org changed: What|Removed |Added CC||liuhongt at gcc dot

[Bug target/101017] ICE: Segmentation fault, convert_memory_address_addr_space_1 with vector_size(32) and target_clone arch=core-avx2/default

2023-12-04 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101017 --- Comment #6 from Hongtao Liu --- (In reply to Andrew Pinski from comment #4) > Note the testcase which ICEs is now: > ``` > typedef int v32qi __attribute__((vector_size(32))); > __attribute__((target_clones("arch=core-avx2", "default"))) v32q
