[Bug rtl-optimization/107057] [10/11/12/13 Regression] ICE in extract_constrain_insn, at recog.cc:2692

2022-10-26 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107057 --- Comment #8 from Hongtao.liu --- And it looks like the pattern has been wrongly defined since [1]. --cut begin Matching constraints are used in these circumstances. More precisely, the two operands that match must include

[Bug target/107432] __builtin_convertvector generates inefficient code

2022-10-27 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107432 --- Comment #3 from Hongtao.liu --- typedef int v4si __attribute__((vector_size(16))); typedef long long v4di __attribute__((vector_size(32))); v4si foo (v4di a) { return __builtin_convertvector (a, v4si); } hmm, we actually support truncv

[Bug target/107432] __builtin_convertvector generates inefficient code

2022-10-27 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107432 --- Comment #4 from Hongtao.liu --- (In reply to Hongtao.liu from comment #3) > typedef int v4si __attribute__((vector_size(16))); > typedef long long v4di __attribute__((vector_size(32))); > > v4si > foo (v4di a) > { > return __builtin_con

[Bug target/107432] __builtin_convertvector generates inefficient code

2022-10-27 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107432 --- Comment #5 from Hongtao.liu --- > It's lowered by pass_lower_vector, ideally, can we use truncmn2 in > expand_VEC_CONVERT if src is bigger integer mode than dest. Currently, expand_vector_conversion uses VEC_PACK_TRUNC_EXPR --

[Bug target/107432] __builtin_convertvector generates inefficient code

2022-10-27 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107432 --- Comment #6 from Hongtao.liu --- > Guess expand_vector_conversion can be optimized. if (INTEGRAL_TYPE_P (TREE_TYPE (ret_type)) && SCALAR_FLOAT_TYPE_P (TREE_TYPE (arg_type))) code = FIX_TRUNC_EXPR; else if (INTEGRAL_TYPE_P (TRE

[Bug target/107432] __builtin_convertvector generates inefficient code

2022-10-27 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107432 --- Comment #7 from Hongtao.liu --- (In reply to Hongtao.liu from comment #6) > > Guess expand_vector_conversion can be optimized. > > if (INTEGRAL_TYPE_P (TREE_TYPE (ret_type)) > && SCALAR_FLOAT_TYPE_P (TREE_TYPE (arg_type))) > cod

[Bug target/107261] ICE: in classify_argument, at config/i386/i386.cc:2523 on __bf16 vect argument or return value

2022-10-30 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107261 --- Comment #6 from Hongtao.liu --- Fixed in GCC13.

[Bug middle-end/102566] [i386] GCC should emit LOCK BTS for simple bit-test-and-set operations with std::atomic

2022-10-30 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102566 --- Comment #32 from Hongtao.liu --- (In reply to Marko Mäkelä from comment #31) > Much of this seems to work in GCC 12.2.0 as well as in clang++-15. For clang > there is a related ticket https://github.com/llvm/llvm-project/issues/37322 > > I

[Bug tree-optimization/107451] [11/12/13 Regression] Segmentation fault with vectorized code.

2022-10-30 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107451 --- Comment #4 from Hongtao.liu --- (In reply to bartoldeman from comment #3) > Created attachment 53786 [details] > Corrected test case > > In my eagerness to make it as short as possible I made it too short indeed! 35 [local count: 105119

[Bug middle-end/107487] New: Issue an error for illegal digit constraint.

2022-10-31 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107487 Bug ID: 107487 Summary: Issue an error for illegal digit constraint. Product: gcc Version: 13.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: mid

[Bug rtl-optimization/107057] [10/11/12/13 Regression] ICE in extract_constrain_insn, at recog.cc:2692

2022-10-31 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107057 --- Comment #11 from Hongtao.liu --- Fixed in GCC13, and opened a separate bug, PR107487, for #c9.

[Bug target/107540] [13 Regression] ICE: SIGSEGV in memory_operand(rtx_def*, machine_mode) (recog.cc:1834) with -flive-range-shrinkage

2022-11-06 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107540 Hongtao.liu changed: What|Removed |Added CC||crazylht at gmail dot com --- Comment #1

[Bug target/107540] [13 Regression] ICE: SIGSEGV in memory_operand(rtx_def*, machine_mode) (recog.cc:1834) with -flive-range-shrinkage

2022-11-06 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107540 --- Comment #2 from Hongtao.liu --- diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index fa93ae7bf21..4e8463addc3 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -12203,7 +12203,7 @@ (define_insn "avx512f_movddu

[Bug target/107546] [10/11/12/13 Regression] simd, redundant pcmpeqb and pxor

2022-11-06 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107546 --- Comment #3 from Hongtao.liu --- Failed to match this instruction: (set (reg:V16QI 95) (eq:V16QI (gt:V16QI (subreg:V16QI (reg:V2DI 89 [ MEM[(const __m128i_u * {ref-all})p_2(D)] ]) 0) (mem/u/c:V16QI (symbol_ref/u:DI ("*.LC0") [

[Bug target/107546] [10/11/12/13 Regression] simd, redundant pcmpeqb and pxor

2022-11-06 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107546 --- Comment #4 from Hongtao.liu --- > even. Notice the < vs <= there. > I suspect the <= expansion part of the x86_64 backend needs to be fixed up > to produce better code. Hmm, we do have an extra pcmpeq to negate the result. --cu

[Bug target/107546] [10/11/12/13 Regression] simd, redundant pcmpeqb and pxor

2022-11-07 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107546 --- Comment #7 from Hongtao.liu --- (In reply to Hongtao.liu from comment #3) > Failed to match this instruction: > (set (reg:V16QI 95) > (eq:V16QI (gt:V16QI (subreg:V16QI (reg:V2DI 89 [ MEM[(const __m128i_u * > {ref-all})p_2(D)] ]) 0) >

[Bug target/107563] __builtin_shufflevector fails to pshufd instructions under default x86_64 compilation toggle which is the sse2 one

2022-11-07 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107563 Hongtao.liu changed: What|Removed |Added CC||crazylht at gmail dot com --- Comment #5

[Bug target/107563] __builtin_shufflevector fails to pshufd instructions under default x86_64 compilation toggle which is the sse2 one

2022-11-07 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107563 --- Comment #6 from Hongtao.liu --- Shufd only handles void foo1(temp_vec_type& v) noexcept { v=__builtin_shufflevector(v,v,12,13,14,15,8,9,10,11,4,5,6,7,0,1,2,3); } Not the case in #c0.
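A rough illustration of the distinction drawn in comment #6, not code from the bug report: pshufd permutes whole 32-bit elements, so a 16-byte shuffle can be expressed with it only when every group of four indices is an aligned, consecutive run. The helper name `byte_shuffle_as_dword_shuffle` is hypothetical, and the assumption that the vector holds 16 one-byte elements is inferred from the 16 indices in the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 and stores a pshufd-style immediate in
   *imm8 if the 16 byte indices move whole, aligned 4-byte groups;
   returns 0 otherwise (for example, for a byte-wise reversal). */
int byte_shuffle_as_dword_shuffle(const unsigned char idx[16],
                                  unsigned char *imm8)
{
    assert(idx != NULL && imm8 != NULL);
    unsigned char imm = 0;
    for (size_t g = 0; g < 4; g++) {
        unsigned char first = idx[4 * g];
        /* The group must start on a 4-byte boundary inside the vector. */
        if (first >= 16 || first % 4 != 0)
            return 0;
        /* The remaining three bytes must follow consecutively. */
        for (size_t k = 1; k < 4; k++)
            if (idx[4 * g + k] != first + k)
                return 0;
        /* Two selector bits per destination dword, lowest dword first. */
        imm |= (unsigned char)((first / 4) << (2 * g));
    }
    *imm8 = imm;
    return 1;
}
```

With the indices from the comment (12,13,14,15,8,9,10,11,4,5,6,7,0,1,2,3) the helper accepts the shuffle, while a plain byte reversal is rejected because its groups are not consecutive runs.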

[Bug target/107563] __builtin_shufflevector fails to pshufd instructions under default x86_64 compilation toggle which is the sse2 one

2022-11-08 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107563 --- Comment #10 from Hongtao.liu --- (In reply to cqwrteur from comment #9) > (In reply to cqwrteur from comment #8) > > for sse2 to do the __builtin_convertvector job yeah > > https://godbolt.org/z/dsf3WK58E > > using temp_vec_type [[__gnu__:

[Bug target/107540] [13 Regression] ICE: SIGSEGV in memory_operand(rtx_def*, machine_mode) (recog.cc:1834) with -flive-range-shrinkage

2022-11-08 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107540 --- Comment #4 from Hongtao.liu --- Fixed in GCC13.

[Bug target/107627] New: [13] Regression int128_t shift generates extra xor/or.

2022-11-10 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107627 Bug ID: 107627 Summary: [13] Regression int128_t shift generates extra xor/or. Product: gcc Version: 13.0 Status: UNCONFIRMED Severity: normal Priority: P3 Comp

[Bug target/106220] x86-64 optimizer forgets about shrd peephole optimization pattern when faced with more than one in close proximity

2022-11-10 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106220 --- Comment #4 from Hongtao.liu --- Try to add combine splitter (define_insn_and_split "*x86_64_shrd_lshiftrtti" [(set (match_operand:DI 0 "nonimmediate_operand") (subreg:DI (lshiftrt:TI (match_operand:TI 1 "nonimmediate_operand")

[Bug target/107627] [13] Regression int128_t shift generates extra xor/or.

2022-11-10 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107627 --- Comment #1 from Hongtao.liu --- Looks like it is caused by r13-1379-ge8a46e5cdab500

[Bug target/107671] i386: Missed optimization: use of bt in bit test pattern (using -O2 -mtune=core2)

2022-11-14 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107671 --- Comment #3 from Hongtao.liu --- We already have --cut from i386.md ;; Help combine recognize bt followed by setc (define_insn_and_split "*bt_setcqi" [(set (subreg:SWI48 (match_operand:QI 0 "register_operand")

[Bug target/107748] [13 Regression] Isn't _mm_cvtsbh_ss incorrect?

2022-11-18 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107748 --- Comment #2 from Hongtao.liu --- float _mm_cvtsbh_ss (__bf16 __A) { union{ float sf; __bf16 bf[2];} __tmp; __tmp.sf = 0.0f; __tmp.bf[1] = __A; return __tmp.sf; } Looks like gcc can optimize it to _mm_cvtsbh_ss(bool _Accum):

[Bug rtl-optimization/107774] New: rtl failed to simplify subreg:vec_select to just vec_select.

2022-11-20 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107774 Bug ID: 107774 Summary: rtl failed to simplify subreg:vec_select to just vec_select. Product: gcc Version: 13.0 Status: UNCONFIRMED Severity: normal

[Bug rtl-optimization/107775] New: misoptimization in vec_set lower part of vector in the memory.

2022-11-20 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107775 Bug ID: 107775 Summary: misoptimization in vec_set lower part of vector in the memory. Product: gcc Version: 13.0 Status: UNCONFIRMED Severity: normal

[Bug target/107748] [13 Regression] Isn't _mm_cvtsbh_ss incorrect?

2022-11-21 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107748 --- Comment #9 from Hongtao.liu --- Since BFmode is most likely in an xmm register, I'm going to use vector shift instructions: pslld $16, %xmm0 for extendbfsf2, psrld $16, %xmm0 for truncsfbf2. It doesn't require any GPR, and no need to use avx512bf1
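The shift idea in comment #9 can be sketched in scalar C, assuming the usual layout in which a bfloat16 value is the upper 16 bits of an IEEE-754 binary32 float. The function names are hypothetical, and the narrowing below simply truncates; rounding and NaN handling are deliberately left out, so this is not a description of what the final GCC patterns do.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t),
              "sketch assumes a 32-bit float");

/* Widen: place the 16 bfloat16 bits in the upper half of a 32-bit word,
   the scalar analogue of shifting each lane left by 16. */
float bf16_bits_to_float(uint16_t bits)
{
    uint32_t wide = (uint32_t)bits << 16;
    float out;
    memcpy(&out, &wide, sizeof out);
    return out;
}

/* Narrow: drop the low 16 mantissa bits (plain truncation, no rounding),
   the scalar analogue of shifting each lane right by 16. */
uint16_t float_to_bf16_bits_trunc(float value)
{
    uint32_t wide;
    memcpy(&wide, &value, sizeof wide);
    return (uint16_t)(wide >> 16);
}
```

For instance, the bit pattern 0x3F80 widens to 1.0f, and narrowing 1.0f gives 0x3F80 back.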

[Bug rtl-optimization/107775] misoptimization in vec_set lower part of vector in the memory.

2022-11-22 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107775 --- Comment #2 from Hongtao.liu --- (In reply to Richard Biener from comment #1) > misoptimization as in wrong-code or missed-optimization? missed optimization.

[Bug target/107863] [10/11/12/13 Regression] ICE with unrecognizable insn when using -funsigned-char with some SSE/AVX builtins

2022-11-24 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107863 Hongtao.liu changed: What|Removed |Added CC||hjl.tools at gmail dot com --- Comment #4

[Bug target/107863] [10/11/12/13 Regression] ICE with unrecognizable insn when using -funsigned-char with some SSE/AVX builtins

2022-11-24 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107863 --- Comment #5 from Hongtao.liu --- Also I get below from build_common_tree_nodes /* Define `char', which is like either `signed char' or `unsigned char' but not the same as either. */ char_type_node = (signed_char ? make_s

[Bug target/107863] [10/11/12/13 Regression] ICE with unrecognizable insn when using -funsigned-char with some SSE/AVX builtins

2022-11-24 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107863 --- Comment #6 from Hongtao.liu --- For pattern (set (reg:QI 607) (const_int 255 [0xff])) general_operand returns false for op const_int 255 in QImode, since trunc_int_for_mode (INTVAL (op), mode) returns -1 while INTVAL (op) is 255. ---cut from gener
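The mismatch described in comment #6 comes down to sign extension from an 8-bit width: 255 becomes -1, so the constant is not in the form the check expects. A minimal sketch of that arithmetic follows; `sign_extend_to_width` and `is_canonical_for_width` are hypothetical stand-ins written for illustration and are not GCC's `trunc_int_for_mode` or `general_operand`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in: keep the low `width` bits of `value` and
   sign-extend from bit width-1. */
int64_t sign_extend_to_width(int64_t value, unsigned width)
{
    assert(width >= 1 && width <= 64);
    if (width == 64)
        return value;
    uint64_t mask = (UINT64_C(1) << width) - 1;
    uint64_t low = (uint64_t)value & mask;
    uint64_t sign = UINT64_C(1) << (width - 1);
    /* Classic xor/subtract sign extension, done in unsigned arithmetic
       to avoid signed overflow. */
    return (int64_t)((low ^ sign) - sign);
}

/* A constant counts as canonical for a width when sign extension leaves
   it unchanged; 255 at width 8 fails this test, -1 passes. */
int is_canonical_for_width(int64_t value, unsigned width)
{
    return sign_extend_to_width(value, width) == value;
}
```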

[Bug target/107863] [10/11/12/13 Regression] ICE with unrecognizable insn when using -funsigned-char with some SSE/AVX builtins

2022-11-24 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107863 --- Comment #7 from Hongtao.liu --- > - if (width < HOST_BITS_PER_WIDE_INT) > + if (width < HOST_BITS_PER_WIDE_INT > + && (mode != QImode || !flag_signed_char)) typo should be + && (mode != QImode || flag_signed_char))

[Bug target/107863] [10/11/12/13 Regression] ICE with unrecognizable insn when using -funsigned-char with some SSE/AVX builtins

2022-11-24 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107863 --- Comment #8 from Hongtao.liu --- (In reply to Hongtao.liu from comment #7) > > - if (width < HOST_BITS_PER_WIDE_INT) > > + if (width < HOST_BITS_PER_WIDE_INT > > + && (mode != QImode || !flag_signed_char)) > typo should be > + &&

[Bug target/107748] [13 Regression] Isn't _mm_cvtsbh_ss incorrect?

2022-11-27 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107748 --- Comment #11 from Hongtao.liu --- Fixed in GCC13.

[Bug tree-optimization/97832] AoSoA complex caxpy-like loops: AVX2+FMA -Ofast 7 times slower than -O3

2022-11-27 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97832 --- Comment #23 from Hongtao.liu --- > the blends do not look like no-ops so I wonder if this is really computing > the same thing ... (it swaps lane 0 from the two loads from x but not the > stores) They're computing the same thing since we al

[Bug tree-optimization/97832] AoSoA complex caxpy-like loops: AVX2+FMA -Ofast 7 times slower than -O3

2022-11-27 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97832 --- Comment #24 from Hongtao.liu --- _233 = {f_im_36, f_re_35, f_re_35, f_re_35}; _217 = {f_re_35, f_im_36, f_im_36, f_im_36}; ... vect_x_re_55.15_227 = VEC_PERM_EXPR ; vect_x_re_55.23_211 = VEC_PERM_EXPR ; ... vect_y_re_69.17_224 = .FNMA

[Bug middle-end/107891] New: Redudant "double" permutation from PR97832

2022-11-27 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107891 Bug ID: 107891 Summary: Redudant "double" permutation from PR97832 Product: gcc Version: 13.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: middl

[Bug tree-optimization/97832] AoSoA complex caxpy-like loops: AVX2+FMA -Ofast 7 times slower than -O3

2022-11-27 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97832 --- Comment #26 from Hongtao.liu --- > I guess that's possible but the SLP vectorizer has a permute optimization > phase (and SLP discovery itself), it would be nice to see why the former > doesn't elide the permutes here. I've opened PR107891

[Bug middle-end/107891] Redudant "double" permutation from PR97832

2022-11-27 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107891 --- Comment #1 from Hongtao.liu --- comment 25 from PR97832: I guess that's possible but the SLP vectorizer has a permute optimization phase (and SLP discovery itself), it would be nice to see why the former doesn't elide the permutes here.

[Bug target/107863] [10/11/12/13 Regression] ICE with unrecognizable insn when using -funsigned-char with some SSE/AVX builtins

2022-11-27 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107863 --- Comment #9 from Hongtao.liu --- expand_expr_real_1 generates (const_int 255) without considering the target mode. I guess it's on purpose, so I'll leave that alone and only change the expander in the backend. After applying convert_modes to

[Bug rtl-optimization/107892] Unnecessary move between ymm registers in loop using AVX2 intrinsic

2022-11-28 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107892 --- Comment #3 from Hongtao.liu --- > In the bad version, I noticed that the RTL initially has two separate insns > for 'a += *p': one to do the addition and write the result to a new pseudo > register, and one to convert the value from mode V8

[Bug target/107863] [10/11/12/13 Regression] ICE with unrecognizable insn when using -funsigned-char with some SSE/AVX builtins

2022-11-28 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107863 --- Comment #10 from Hongtao.liu --- I notice there's TARGET_PROMOTE_PROTOTYPES, which can prevent unsigned char 255 from being extended to int 255, which is a better solution to this problem. But we can only get fntype in this hook, ideally we shou

[Bug target/107934] ICE: SIGSEGV in immediate_operand (recog.cc:1618) with -O2 -mtune=knl -ffinite-math-only and __bf16 since r13-4314-ga1ecc5600464f6a6

2022-11-30 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107934 --- Comment #3 from Hongtao.liu --- (In reply to Uroš Bizjak from comment #2) > The type of extendbfsf2_1 insn should be sseishft1. Yes.

[Bug target/107863] [10/11/12/13 Regression] ICE with unrecognizable insn when using -funsigned-char with some SSE/AVX builtins

2022-11-30 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107863 --- Comment #15 from Hongtao.liu --- Fixed in GCC10.5, GCC11.4, GCC12.3 and GCC13.

[Bug target/107934] ICE: SIGSEGV in immediate_operand (recog.cc:1618) with -O2 -mtune=knl -ffinite-math-only and __bf16 since r13-4314-ga1ecc5600464f6a6

2022-12-01 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107934 --- Comment #5 from Hongtao.liu --- Fixed in GCC13.

[Bug target/107970] [13 Regression] ICE in extract_insn, at recog.cc:2791 since r13-2730-gd0c73b6c85677e67

2022-12-05 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107970 --- Comment #1 from Hongtao.liu --- Mine, let me fix it.

[Bug target/107970] [13 Regression] ICE in extract_insn, at recog.cc:2791 since r13-2730-gd0c73b6c85677e67

2022-12-06 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107970 Hongtao.liu changed: What|Removed |Added Resolution|--- |FIXED Status|ASSIGNED

[Bug target/36821] Flush denormals to Zero Flag

2022-12-25 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=36821 Hongtao.liu changed: What|Removed |Added CC||crazylht at gmail dot com --- Comment #8 f

[Bug tree-optimization/105216] [12/13 regression] 8% regression for m-queens compared to gcc11 O2 on CLX. since r12-3876-g4a960d548b7d7d94

2023-01-02 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105216 --- Comment #16 from Hongtao.liu --- (In reply to Andrew Pinski from comment #15) > Might be interesting to test it again to see if it has been fixed on the > trunk. The regression is still there. gcc version 13.0.0 20230102 (experimental) (GC

[Bug target/105484] [11/12/13 Regression] ICE: verify_flow_info failed: BB 2 cannot throw but has an EH edge with -Os -march=cannonlake -fnon-call-exceptions -fno-tree-dce -fno-tree-forwprop since r11

2022-05-04 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105484 --- Comment #2 from Hongtao.liu --- I'll take a look.

[Bug target/105484] [11/12/13 Regression] ICE: verify_flow_info failed: BB 2 cannot throw but has an EH edge with -Os -march=cannonlake -fnon-call-exceptions -fno-tree-dce -fno-tree-forwprop since r11

2022-05-05 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105484 --- Comment #3 from Hongtao.liu --- Similar to PR104450: don't expand stmt to vec_set when there's EH on it.

[Bug target/105504] Fails to break dependency for vcvtss2sd xmm, xmm, mem

2022-05-06 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105504 --- Comment #1 from Hongtao.liu --- Pass remove_partial_avx_dependency runs before RA, where we have (insn 128 127 129 22 (set (reg/v:DF 99 [ z ]) (float_extend:DF (reg/v:SF 117 [ x ]))) "test.c":43:10 163 {*extendsfdf2} and attr avx_par

[Bug target/105504] Fails to break dependency for vcvtss2sd xmm, xmm, mem

2022-05-06 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105504 --- Comment #2 from Hongtao.liu --- After setting remove_partial_avx_dependency to true for the register alternative, we get vxorps %xmm3, %xmm3, %xmm3 vmovsd .LC16(%rip), %xmm6 vmovsd .LC14(%rip), %xmm5 vcvtss2sd

[Bug target/105504] Fails to break dependency for vcvtss2sd xmm, xmm, mem

2022-05-06 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105504 Hongtao.liu changed: What|Removed |Added CC||hjl.tools at gmail dot com --- Comment #3

[Bug target/105504] Fails to break dependency for vcvtss2sd xmm, xmm, mem

2022-05-06 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105504 --- Comment #4 from Hongtao.liu --- Another possible solution is to add a slight dislike for the "m" alternative (like ?m) to avoid a potential spill.

[Bug target/105072] Miss optimization for pmovzxbq.

2022-05-09 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105072 Hongtao.liu changed: What|Removed |Added Resolution|--- |FIXED Status|UNCONFIRMED

[Bug target/105073] [meta bug]Patch pending for GCC13.

2022-05-09 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105073 Bug 105073 depends on bug 105072, which changed state. Bug 105072 Summary: Miss optimization for pmovzxbq. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105072 What|Removed |Added ---

[Bug target/105354] __builtin_shuffle for alignr generates suboptimal code unless SSSE3 is enabled

2022-05-09 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105354 --- Comment #5 from Hongtao.liu --- Fixed in GCC13.

[Bug target/105513] [9/10/11/12/13 Regression] Unnecessary SSE spill since r9-5748-g1d4b4f4979171ef0

2022-05-09 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105513 --- Comment #2 from Hongtao.liu --- Just note that #c4 in pr105504 also solves this issue. >Another possible solution is add a little bit dislike for "m" alternative(like >?m) to avoid potential spill.

[Bug target/104915] Miss optimization for vec_setv8hi_0

2022-05-11 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104915 Hongtao.liu changed: What|Removed |Added Resolution|--- |FIXED Status|UNCONFIRMED

[Bug target/105073] [meta bug]Patch pending for GCC13.

2022-05-11 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105073 Bug 105073 depends on bug 104915, which changed state. Bug 104915 Summary: Miss optimization for vec_setv8hi_0 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104915 What|Removed |Added ---

[Bug target/105576] x86: Support a machine constraint to get raw symbol name

2022-05-11 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105576 Hongtao.liu changed: What|Removed |Added CC||crazylht at gmail dot com --- Comment #4

[Bug tree-optimization/102583] [x86] Failure to optimize 32-byte integer vector conversion to 16-byte float vector properly when converting upper part with -mavx2

2022-05-12 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102583 --- Comment #6 from Hongtao.liu --- Fixed in GCC13.

[Bug target/105513] [9/10/11/12/13 Regression] Unnecessary SSE spill since r9-5748-g1d4b4f4979171ef0

2022-05-13 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105513 --- Comment #4 from Hongtao.liu --- For a pattern that supports the 'm' alternative, mem_cost is frequency, which is quite low compared to pp->costs (ira_register_move_cost[mode][rclass][hard_reg_class] * frequency). For the x86 backend even gpr->gpr cost is 2

[Bug target/105587] [13 Regression] ICE in extract_insn, at recog.cc:2791 (error: unrecognizable insn) since r13-210-gfcda0efccad41eba

2022-05-13 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105587 --- Comment #2 from Hongtao.liu --- Mine, I'm testing a patch.

[Bug tree-optimization/105591] [13 Regression] ICE: in tree_to_poly_uint64, at tree.cc:3250 with -O -mavx512f -mno-avx2

2022-05-13 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105591 Hongtao.liu changed: What|Removed |Added CC||crazylht at gmail dot com --- Comment #2

[Bug tree-optimization/105591] [13 Regression] ICE: in tree_to_poly_uint64, at tree.cc:3250 with -O -mavx512f -mno-avx2 since r13-379

2022-05-13 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105591 --- Comment #6 from Hongtao.liu --- Maybe we should add a canonicalization in match.pd to make sure the index is in the range 0 - 2*N, so the general code needn't check idx % 2*N.
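To make the idx % 2*N remark concrete, here is a small scalar model of a two-input permute in which every selector index is first wrapped modulo 2*N, in line with the documented behaviour of the two-operand `__builtin_shuffle`. The function `permute_wrapped` and its signature are assumptions made for this sketch; it is not code from match.pd or from the vectorizer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: pick n elements from the concatenation of a and b
   (2*n elements in total), wrapping each selector index modulo 2*n.
   out must not alias a or b. */
void permute_wrapped(const int *a, const int *b, const unsigned *sel,
                     int *out, size_t n)
{
    assert(n > 0);
    for (size_t i = 0; i < n; i++) {
        size_t idx = sel[i] % (2 * n);   /* canonical index in [0, 2n) */
        out[i] = idx < n ? a[idx] : b[idx - n];
    }
}
```

If the selector were canonicalized once up front, as the comment suggests, later consumers could index directly without repeating the modulo.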

[Bug tree-optimization/105591] [13 Regression] ICE: in tree_to_poly_uint64, at tree.cc:3250 with -O -mavx512f -mno-avx2 since r13-379

2022-05-13 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105591 --- Comment #10 from Hongtao.liu --- I understand that code like builtin_shuffle may have an out-of-range index which needs to be clamped, but why does vec_perm_expr also need to accept that?

[Bug tree-optimization/105591] [13 Regression] ICE: in tree_to_poly_uint64, at tree.cc:3250 with -O -mavx512f -mno-avx2 since r13-379

2022-05-13 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105591 --- Comment #12 from Hongtao.liu --- (In reply to Jakub Jelinek from comment #11) > Because VEC_PERM_EXPR doesn't require the mask argument to be constant (and > neither does __builtin_shuffle, unlike e.g. __builtin_shufflevector). > If the mask

[Bug target/105587] [13 Regression] ICE in extract_insn, at recog.cc:2791 (error: unrecognizable insn) since r13-210-gfcda0efccad41eba

2022-05-15 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105587 --- Comment #4 from Hongtao.liu --- Fixed in GCC13.

[Bug target/105513] [9/10/11/12/13 Regression] Unnecessary SSE spill since r9-5748-g1d4b4f4979171ef0

2022-05-15 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105513 --- Comment #5 from Hongtao.liu --- And a constraint like 'vm' is different from 'v,m' in calculating mem_cost, which will impact RA when op is REG_P. For 'v,m' mem_cost is just 1 * frequency, but for 'vm' mem_cost is much bigger (memory_move

[Bug tree-optimization/105591] [13 Regression] ICE: in tree_to_poly_uint64, at tree.cc:3250 with -O -mavx512f -mno-avx2 since r13-379

2022-05-16 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105591 --- Comment #14 from Hongtao.liu --- Fixed in GCC13.

[Bug target/105033] Suboptimal for vec_concat lower halves of two vectors.

2022-05-16 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105033 Hongtao.liu changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|---

[Bug target/105073] [meta bug]Patch pending for GCC13.

2022-05-16 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105073 Bug 105073 depends on bug 105033, which changed state. Bug 105033 Summary: Suboptimal for vec_concat lower halves of two vectors. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105033 What|Removed |Added

[Bug target/105617] [12/13 Regression] Slp is maybe too aggressive in some/many cases

2022-05-16 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105617 --- Comment #7 from Hongtao.liu --- Hmm, we have specific code to add scalar->vector(vmovq) cost to vector construct, but it seems not to work here. I guess it's because of &r0, and it was thought to be a load, not a scalar? r0.1_21 1 times scalar_store costs

[Bug target/105617] [12/13 Regression] Slp is maybe too aggressive in some/many cases

2022-05-16 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105617 --- Comment #8 from Hongtao.liu --- (In reply to Hongtao.liu from comment #7) > Hmm, we have specific code to add scalar->vector(vmovq) cost to vector > construct, but it seems not to work here, guess it's because &r0,and thought > it was load n

[Bug target/105617] [12/13 Regression] Slp is maybe too aggressive in some/many cases

2022-05-16 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105617 --- Comment #9 from Hongtao.liu --- (In reply to Hongtao.liu from comment #8) > (In reply to Hongtao.liu from comment #7) > > Hmm, we have specific code to add scalar->vector(vmovq) cost to vector > > construct, but it seems not to work here, gu

[Bug target/104610] memcmp () == 0 can be optimized better for avx512f

2022-05-17 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104610 --- Comment #18 from Hongtao.liu --- Fixed in GCC13.

[Bug target/104375] [x86] Failure to recognize bzhi pattern when shr is present

2022-05-17 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104375 --- Comment #3 from Hongtao.liu --- Fixed in GCC13.

[Bug tree-optimization/103462] GCC failed to reduce bit clear in loop.

2022-05-18 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103462 Hongtao.liu changed: What|Removed |Added Status|NEW |RESOLVED Resolution|---

[Bug target/105073] [meta bug]Patch pending for GCC13.

2022-05-18 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105073 Bug 105073 depends on bug 103462, which changed state. Bug 103462 Summary: GCC failed to reduce bit clear in loop. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103462 What|Removed |Added ---

[Bug target/105073] [meta bug]Patch pending for GCC13.

2022-05-18 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105073 Hongtao.liu changed: What|Removed |Added Resolution|--- |FIXED Status|NEW

[Bug tree-optimization/105650] [13 Regression] Possibly wrong code on fontforge -fvect-cost-model=unlimited

2022-05-18 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105650 Hongtao.liu changed: What|Removed |Added CC||crazylht at gmail dot com --- Comment #1

[Bug tree-optimization/105650] [13 Regression] Possibly wrong code on fontforge -fvect-cost-model=unlimited

2022-05-18 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105650 --- Comment #4 from Hongtao.liu --- > > I think the problem for me is value mismatch in compare of `if (v != cnt) > __builtin_trap();`. Invalid instruction is generated by `__builtin_trap()`. Oh, it's ud2. But still can't reproduce the error

[Bug target/105617] [12/13 Regression] Slp is maybe too aggressive in some/many cases

2022-05-19 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105617 --- Comment #13 from Hongtao.liu --- > so for the situation at hand I don't see any reasonable way out that > doesn't have the chance of regressing things in other places (like > treat loads from non-indexed auto variables specially or so). Th

[Bug target/105493] [12/13 Regression] x86_64 538.imagick_r 6% regressions and 2% 525.x264_r regressions on Alder Lake after r12-7319-g90d693bdc9d718

2022-05-19 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105493 --- Comment #4 from Hongtao.liu --- regarding 525, it's pr101929.

[Bug target/101929] [12 Regression] r12-7319 regress x264_r by 4% on CLX.

2022-05-19 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101929 Hongtao.liu changed: What|Removed |Added Resolution|FIXED |--- Status|RESOLVED

[Bug middle-end/26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)

2022-05-19 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163 Bug 26163 depends on bug 101929, which changed state. Bug 101929 Summary: [12 Regression] r12-7319 regress x264_r by 4% on CLX. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101929 What|Removed |Added ---

[Bug target/101929] [12 Regression] r12-7319 regress x264_r by 4% on CLX.

2022-05-19 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101929 --- Comment #12 from Hongtao.liu --- > It's difficult (if not impossible) for the vectorizer to second-guess > the followup FRE, we're a long way from doing loop + SLP vectorization > in one go and discover we can elide the vector store. I'm t

[Bug target/105513] [9/10/11/12/13 Regression] Unnecessary SSE spill since r9-5748-g1d4b4f4979171ef0

2022-05-20 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105513 --- Comment #6 from Hongtao.liu --- Also notice an interesting case impacted by a separate m alternative. typedef long v2di __attribute__((vector_size(16))); v2di foo (v2di a) { a[1] = 1113; return a; } with -O2 gcc generates foo(long __ve
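
For readers who want to reproduce the case in comment #6, the following is a minimal sketch of the same function as a standalone translation unit. It assumes an LP64 target (so `long` is 64 bits and the vector has two lanes) and a compiler that supports the GNU vector extension. The `lane` helper is hypothetical and is not part of the bug report; it only exists so the result can be checked. Compiling with `gcc -O2 -S` should let you inspect the code generated for `foo`; the sequence you see may differ by GCC version and target flags.

```c
/* GNU C vector extension: 16 bytes, i.e. two 64-bit lanes on LP64. */
typedef long v2di __attribute__((vector_size(16)));

/* Same function as in comment #6: overwrite lane 1 with a constant. */
v2di foo(v2di a)
{
    a[1] = 1113;
    return a;
}

/* Hypothetical helper (not from the bug report): read one lane back. */
long lane(v2di v, int i)
{
    return v[i];
}
```

Lane 0 is passed through unchanged and only lane 1 is replaced, which is why the choice between a register insert and a store through memory matters for this pattern.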

[Bug target/105513] [9/10/11/12/13 Regression] Unnecessary SSE spill since r9-5748-g1d4b4f4979171ef0

2022-05-20 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105513 --- Comment #8 from Hongtao.liu --- (In reply to Alexander Monakov from comment #7) > The second sequence is 3 uops vs 1/2 (issued/executed) uops in first, and on > Haswell and Skylake it ties up port 5 for two cycles. > > Unclear if you're mic

[Bug tree-optimization/101668] BB vectorizer doesn't handle lowpart of existing vector

2022-05-20 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101668 --- Comment #4 from Hongtao.liu --- Guess we need to extend backend hook to handle different input and output modes.

[Bug tree-optimization/105735] New: GCC failed to reduce &= loop_inv in loop.

2022-05-26 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105735 Bug ID: 105735 Summary: GCC failed to reduce &= loop_inv in loop. Product: gcc Version: 12.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-o

[Bug target/55583] Extended shift instruction on x86-64 is not used, producing unoptimal code

2022-05-29 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55583 --- Comment #7 from Hongtao.liu --- i386 already has (define_insn_and_split "*x86_shrd_2" [(set (match_operand:SI 0 "nonimmediate_operand") (ior:SI (lshiftrt:SI (match_dup 0) (match_oper

[Bug target/105754] gcc/config/i386/i386.c missing break in get_builtin_code_for_version

2022-05-29 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105754 Hongtao.liu changed: What|Removed |Added CC||crazylht at gmail dot com --- Comment #2

[Bug rtl-optimization/53533] [10/11/12/13 regression] vectorization causes loop unrolling test slowdown as measured by Adobe's C++Benchmark

2022-05-29 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53533 --- Comment #45 from Hongtao.liu --- A reduced testcase. int a[256]; int b[256]; void foo (void) { int i; for (i = 0; i < 256; ++i) { int tmp = a[i] + 12345; tmp *= 914237; tmp += 12332; tmp *= 914237; tmp

[Bug rtl-optimization/53533] [10/11/12/13 regression] vectorization causes loop unrolling test slowdown as measured by Adobe's C++Benchmark

2022-05-30 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53533 --- Comment #47 from Hongtao.liu --- > > The issue is that the re-association pass doesn't handle operations > with undefined overflow behavior, we do have duplicate bugreports > for this. > I saw below in match.pd /* Combine successive
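
The quoted point about undefined overflow can be illustrated with a small sketch. It uses only the first four operations visible in the reduced testcase from comment #45 (the rest of that testcase is truncated in this listing and is not reproduced here). As an assumption made for illustration, `uint32_t` is substituted for `int` so that wraparound is well defined; the function names `chained` and `folded` are hypothetical. With unsigned arithmetic, the dependent chain can be rewritten by hand into a single multiply and add.

```c
#include <stdint.h>

/* Dependent chain, mirroring the first steps of the reduced testcase. */
uint32_t chained(uint32_t x)
{
    uint32_t tmp = x + 12345u;
    tmp *= 914237u;
    tmp += 12332u;
    tmp *= 914237u;
    return tmp;
}

/* Hand-reassociated form:
   ((x + 12345) * c + 12332) * c == x * (c * c) + (12345 * c + 12332) * c
   The identity holds modulo 2^32, which is why unsigned types are used. */
uint32_t folded(uint32_t x)
{
    const uint32_t c = 914237u;
    const uint32_t mul = c * c;
    const uint32_t add = (12345u * c + 12332u) * c;
    return x * mul + add;
}
```

With signed `int`, the intermediate products in the folded form could overflow where the original expression's behavior is defined differently, so the rewrite is not automatically valid there; that is the constraint the quoted comment refers to.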

[Bug target/89929] __attribute__((target("avx512bw"))) doesn't work on non avx512bw systems

2022-05-30 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89929 Hongtao.liu changed: What|Removed |Added CC||crazylht at gmail dot com --- Comment #31

[Bug middle-end/105781] GCC does not unroll auto-vectorized loops.

2022-05-30 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105781 Hongtao.liu changed: What|Removed |Added CC||crazylht at gmail dot com --- Comment #4
