https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107057
--- Comment #8 from Hongtao.liu ---
And it looks like the pattern is wrongly defined, according to [1].
--cut begin
Matching constraints are used in these circumstances. More precisely, the two
operands that match must include
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107432
--- Comment #3 from Hongtao.liu ---
typedef int v4si __attribute__((vector_size(16)));
typedef long long v4di __attribute__((vector_size(32)));
v4si
foo (v4di a)
{
return __builtin_convertvector (a, v4si);
}
hmm, we actually support truncv
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107432
--- Comment #4 from Hongtao.liu ---
(In reply to Hongtao.liu from comment #3)
> typedef int v4si __attribute__((vector_size(16)));
> typedef long long v4di __attribute__((vector_size(32)));
>
> v4si
> foo (v4di a)
> {
> return __builtin_con
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107432
--- Comment #5 from Hongtao.liu ---
> It's lowered by pass_lower_vector, ideally, can we use truncmn2 in
> expand_VEC_CONVERT if src is bigger integer mode than dest.
Currently, expand_vector_conversion uses VEC_PACK_TRUNC_EXPR
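As a rough illustration of what a pack-and-truncate step computes, here is a scalar sketch in plain C. The helper name vec_pack_trunc_model is hypothetical, and the element order (first operand's lanes, then the second's) is an assumption that ignores endianness details; it is not GCC's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model: narrow two vectors of N 64-bit lanes into
   one vector of 2*N 32-bit lanes, keeping only the low 32 bits of each
   lane.  OUT must have room for 2*N elements.  */
static void
vec_pack_trunc_model (const int64_t *lo, const int64_t *hi,
                      int32_t *out, size_t n)
{
  assert (lo && hi && out);
  for (size_t i = 0; i < n; i++)
    {
      /* Go through uint32_t so the narrowing is plain modular truncation.  */
      out[i] = (int32_t) (uint32_t) lo[i];
      out[n + i] = (int32_t) (uint32_t) hi[i];
    }
}
```

A single V4DI to V4SI conversion needs only one input vector, which is why a direct truncation pattern is attractive compared with packing two halves.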
--
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107432
--- Comment #6 from Hongtao.liu ---
> Guess expand_vector_conversion can be optimized.
if (INTEGRAL_TYPE_P (TREE_TYPE (ret_type))
&& SCALAR_FLOAT_TYPE_P (TREE_TYPE (arg_type)))
code = FIX_TRUNC_EXPR;
else if (INTEGRAL_TYPE_P (TRE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107432
--- Comment #7 from Hongtao.liu ---
(In reply to Hongtao.liu from comment #6)
> > Guess expand_vector_conversion can be optimized.
>
> if (INTEGRAL_TYPE_P (TREE_TYPE (ret_type))
> && SCALAR_FLOAT_TYPE_P (TREE_TYPE (arg_type)))
> cod
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107261
--- Comment #6 from Hongtao.liu ---
Fixed in GCC13.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102566
--- Comment #32 from Hongtao.liu ---
(In reply to Marko Mäkelä from comment #31)
> Much of this seems to work in GCC 12.2.0 as well as in clang++-15. For clang
> there is a related ticket https://github.com/llvm/llvm-project/issues/37322
>
> I
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107451
--- Comment #4 from Hongtao.liu ---
(In reply to bartoldeman from comment #3)
> Created attachment 53786 [details]
> Corrected test case
>
> In my eagerness to make it as short as possible I made it too short indeed!
35 [local count: 105119
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107487
Bug ID: 107487
Summary: Issue an error for illegal digit constraint.
Product: gcc
Version: 13.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: mid
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107057
--- Comment #11 from Hongtao.liu ---
Fixed in GCC13, and opened a separate bug PR107487 for #c9.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107540
Hongtao.liu changed:
What|Removed |Added
CC||crazylht at gmail dot com
--- Comment #1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107540
--- Comment #2 from Hongtao.liu ---
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index fa93ae7bf21..4e8463addc3 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -12203,7 +12203,7 @@ (define_insn "avx512f_movddu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107546
--- Comment #3 from Hongtao.liu ---
Failed to match this instruction:
(set (reg:V16QI 95)
(eq:V16QI (gt:V16QI (subreg:V16QI (reg:V2DI 89 [ MEM[(const __m128i_u *
{ref-all})p_2(D)] ]) 0)
(mem/u/c:V16QI (symbol_ref/u:DI ("*.LC0") [
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107546
--- Comment #4 from Hongtao.liu ---
> even. Notice the < vs <= there.
> I suspect the <= expansion part of the x86_64 backend needs to be fixed up
> to produce better code.
Hmm, we do have an extra pcmpeq to negate the result.
--cu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107546
--- Comment #7 from Hongtao.liu ---
(In reply to Hongtao.liu from comment #3)
> Failed to match this instruction:
> (set (reg:V16QI 95)
> (eq:V16QI (gt:V16QI (subreg:V16QI (reg:V2DI 89 [ MEM[(const __m128i_u *
> {ref-all})p_2(D)] ]) 0)
>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107563
Hongtao.liu changed:
What|Removed |Added
CC||crazylht at gmail dot com
--- Comment #5
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107563
--- Comment #6 from Hongtao.liu ---
Shufd only handles
void foo1(temp_vec_type& v) noexcept
{
v=__builtin_shufflevector(v,v,12,13,14,15,8,9,10,11,4,5,6,7,0,1,2,3);
}
Not the case in #c0.
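To make the distinction concrete, the sketch below checks whether a 16-entry byte index list moves whole 4-byte groups, which is the only kind of permutation a dword shuffle such as pshufd can express. The helper name pshufd_imm_for_byte_indices is hypothetical, and the sketch assumes a single-input shuffle with indices below 16.

```c
#include <assert.h>

/* Hypothetical helper: return the 8-bit pshufd-style immediate for a
   16-entry byte index list, or -1 if the list is not dword-granular.
   Destination dword D takes its 2-bit source selector from bits
   [2*D+1 : 2*D] of the immediate.  */
static int
pshufd_imm_for_byte_indices (const int idx[16])
{
  assert (idx != 0);
  int imm = 0;
  for (int d = 0; d < 4; d++)
    {
      int first = idx[4 * d];
      /* Each group must start on a dword boundary...  */
      if (first < 0 || first > 12 || first % 4 != 0)
        return -1;
      /* ...and continue with the next three consecutive bytes.  */
      for (int b = 1; b < 4; b++)
        if (idx[4 * d + b] != first + b)
          return -1;
      imm |= (first / 4) << (2 * d);
    }
  return imm;
}
```

The index list in foo1 above maps to the immediate 0x1B (reverse the four dwords), while a full byte reversal is rejected because its groups are not consecutive runs.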
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107563
--- Comment #10 from Hongtao.liu ---
(In reply to cqwrteur from comment #9)
> (In reply to cqwrteur from comment #8)
> > for sse2 to do the __builtin_convertvector job yeah
>
> https://godbolt.org/z/dsf3WK58E
>
> using temp_vec_type [[__gnu__:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107540
--- Comment #4 from Hongtao.liu ---
Fixed in GCC13.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107627
Bug ID: 107627
Summary: [13] Regression int128_t shift generates extra xor/or.
Product: gcc
Version: 13.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Comp
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106220
--- Comment #4 from Hongtao.liu ---
Try to add combine splitter
(define_insn_and_split "*x86_64_shrd_lshiftrtti"
[(set (match_operand:DI 0 "nonimmediate_operand")
(subreg:DI (lshiftrt:TI (match_operand:TI 1 "nonimmediate_operand")
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107627
--- Comment #1 from Hongtao.liu ---
Looks like it is caused by r13-1379-ge8a46e5cdab500
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107671
--- Comment #3 from Hongtao.liu ---
We already have
--cut from i386.md
;; Help combine recognize bt followed by setc
(define_insn_and_split "*bt_setcqi"
  [(set (subreg:SWI48 (match_operand:QI 0 "register_operand")
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107748
--- Comment #2 from Hongtao.liu ---
float
_mm_cvtsbh_ss (__bf16 __A)
{
union{ float sf; __bf16 bf[2];} __tmp;
__tmp.sf = 0.0f;
__tmp.bf[1] = __A;
return __tmp.sf;
}
Looks like gcc can optimize it to
_mm_cvtsbh_ss(bool _Accum):
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107774
Bug ID: 107774
Summary: rtl failed to simplify subreg:vec_select to just
vec_select.
Product: gcc
Version: 13.0
Status: UNCONFIRMED
Severity: normal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107775
Bug ID: 107775
Summary: misoptimization in vec_set lower part of vector in the
memory.
Product: gcc
Version: 13.0
Status: UNCONFIRMED
Severity: normal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107748
--- Comment #9 from Hongtao.liu ---
Since BFmode is most likely in an xmm register, I'm going to use vector shift
instructions: pslld $16, %xmm0 for extendbfsf2, psrld $16, %xmm0 for truncsfbf2,
It doesn't require any GPR, and no need to use avx512bf1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107775
--- Comment #2 from Hongtao.liu ---
(In reply to Richard Biener from comment #1)
> misoptimization as in wrong-code or missed-optimization?
missed optimization.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107863
Hongtao.liu changed:
What|Removed |Added
CC||hjl.tools at gmail dot com
--- Comment #4
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107863
--- Comment #5 from Hongtao.liu ---
Also I get below from build_common_tree_nodes
/* Define `char', which is like either `signed char' or `unsigned char'
but not the same as either. */
char_type_node
= (signed_char
? make_s
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107863
--- Comment #6 from Hongtao.liu ---
For pattern
(set (reg:QI 607)
(const_int 255 [0xff]))
general_operand returns false for op const_int 255 in QImode since
trunc_int_for_mode (INTVAL (op), mode) returns -1, while INTVAL (op) is 255.
---cut from gener
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107863
--- Comment #7 from Hongtao.liu ---
> - if (width < HOST_BITS_PER_WIDE_INT)
> + if (width < HOST_BITS_PER_WIDE_INT
> + && (mode != QImode || !flag_signed_char))
typo should be
+ && (mode != QImode || flag_signed_char))
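For readers following this bug: the 255 versus -1 mismatch reported in comment #6 is what sign extension to the mode width produces. A minimal sketch in plain C, where trunc_to_width is a hypothetical stand-in for the idea behind trunc_int_for_mode and not GCC's code:

```c
#include <assert.h>

/* Hypothetical model: keep the low WIDTH bits of VALUE and sign-extend
   them, the canonical form expected for a constant in a WIDTH-bit mode.
   Only widths from 1 to 63 are handled here.  */
static long long
trunc_to_width (long long value, unsigned width)
{
  assert (width >= 1 && width < 64);
  unsigned long long sign = 1ULL << (width - 1);
  unsigned long long mask = (sign << 1) - 1;
  unsigned long long low = (unsigned long long) value & mask;
  /* XOR then subtract the sign bit: the usual branch-free sign extension.  */
  return (long long) ((low ^ sign) - sign);
}
```

With width 8, the value 255 becomes -1, so a constant written as 255 does not compare equal to its canonical 8-bit form, while 127 is unchanged.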
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107863
--- Comment #8 from Hongtao.liu ---
(In reply to Hongtao.liu from comment #7)
> > - if (width < HOST_BITS_PER_WIDE_INT)
> > + if (width < HOST_BITS_PER_WIDE_INT
> > + && (mode != QImode || !flag_signed_char))
> typo should be
> + &&
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107748
--- Comment #11 from Hongtao.liu ---
Fixed in GCC13.
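The shift-based conversion described in comment #9 of this bug can be modelled in scalar C, since a bfloat16 value is the upper 16 bits of an IEEE single. The function names below are hypothetical. The narrowing direction simply truncates, mirroring the plain right shift; this sketch makes no claim about the rounding behaviour GCC finally implemented.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Widen: place the 16 bfloat16 bits in the upper half of a float
   (the scalar analogue of a left shift by 16).  */
static float
bf16_bits_to_float (uint16_t bits)
{
  assert (sizeof (float) == sizeof (uint32_t));
  uint32_t wide = (uint32_t) bits << 16;
  float f;
  memcpy (&f, &wide, sizeof f);
  return f;
}

/* Narrow by truncation: keep only the upper 16 bits of the float
   (the scalar analogue of a right shift by 16).  */
static uint16_t
float_to_bf16_bits_trunc (float f)
{
  assert (sizeof (float) == sizeof (uint32_t));
  uint32_t wide;
  memcpy (&wide, &f, sizeof wide);
  return (uint16_t) (wide >> 16);
}
```

Neither direction needs a general-purpose register when done on vector registers, which is the appeal of the shift approach.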
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97832
--- Comment #23 from Hongtao.liu ---
> the blends do not look like no-ops so I wonder if this is really computing
> the same thing ... (it swaps lane 0 from the two loads from x but not the
> stores)
They're computing the same thing since we al
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97832
--- Comment #24 from Hongtao.liu ---
_233 = {f_im_36, f_re_35, f_re_35, f_re_35};
_217 = {f_re_35, f_im_36, f_im_36, f_im_36};
...
vect_x_re_55.15_227 = VEC_PERM_EXPR ;
vect_x_re_55.23_211 = VEC_PERM_EXPR ;
...
vect_y_re_69.17_224 = .FNMA
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107891
Bug ID: 107891
Summary: Redundant "double" permutation from PR97832
Product: gcc
Version: 13.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: middl
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97832
--- Comment #26 from Hongtao.liu ---
> I guess that's possible but the SLP vectorizer has a permute optimization
> phase (and SLP discovery itself), it would be nice to see why the former
> doesn't elide the permutes here.
I've opened PR107891
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107891
--- Comment #1 from Hongtao.liu ---
Comment 25 from PR97832:
I guess that's possible but the SLP vectorizer has a permute optimization
phase (and SLP discovery itself), it would be nice to see why the former
doesn't elide the permutes here.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107863
--- Comment #9 from Hongtao.liu ---
expand_expr_real_1 generates (const_int 255) without considering the target
mode.
I guess it's on purpose, so I'll leave that alone and only change the expander
in the backend. After applying convert_modes to
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107892
--- Comment #3 from Hongtao.liu ---
> In the bad version, I noticed that the RTL initially has two separate insns
> for 'a += *p': one to do the addition and write the result to a new pseudo
> register, and one to convert the value from mode V8
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107863
--- Comment #10 from Hongtao.liu ---
I notice there's TARGET_PROMOTE_PROTOTYPES, which can prevent unsigned char 255
from being extended to int 255 and is a better solution to this problem. But we
can only get fntype in this hook, ideally we shou
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107934
--- Comment #3 from Hongtao.liu ---
(In reply to Uroš Bizjak from comment #2)
> The type of extendbfsf2_1 insn should be sseishft1.
Yes.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107863
--- Comment #15 from Hongtao.liu ---
Fixed in GCC10.5, GCC11.4, GCC12.3 and GCC13.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107934
--- Comment #5 from Hongtao.liu ---
Fixed in GCC13.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107970
--- Comment #1 from Hongtao.liu ---
Mine, let me fix it.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107970
Hongtao.liu changed:
What|Removed |Added
Resolution|--- |FIXED
Status|ASSIGNED
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=36821
Hongtao.liu changed:
What|Removed |Added
CC||crazylht at gmail dot com
--- Comment #8 f
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105216
--- Comment #16 from Hongtao.liu ---
(In reply to Andrew Pinski from comment #15)
> Might be interesting to test it again to see if it has been fixed on the
> trunk.
The regression is still there.
gcc version 13.0.0 20230102 (experimental) (GC
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105484
--- Comment #2 from Hongtao.liu ---
I'll take a look.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105484
--- Comment #3 from Hongtao.liu ---
Similar to PR104450: don't expand the stmt to vec_set when there's EH on it.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105504
--- Comment #1 from Hongtao.liu ---
Pass remove_partial_avx_dependency runs before RA, at which point we have
(insn 128 127 129 22 (set (reg/v:DF 99 [ z ])
(float_extend:DF (reg/v:SF 117 [ x ]))) "test.c":43:10 163
{*extendsfdf2}
and attr avx_par
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105504
--- Comment #2 from Hongtao.liu ---
After setting remove_partial_avx_dependency to true for the register
alternative, we get
vxorps %xmm3, %xmm3, %xmm3
vmovsd .LC16(%rip), %xmm6
vmovsd .LC14(%rip), %xmm5
vcvtss2sd
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105504
Hongtao.liu changed:
What|Removed |Added
CC||hjl.tools at gmail dot com
--- Comment #3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105504
--- Comment #4 from Hongtao.liu ---
Another possible solution is to add a slight dislike to the "m" alternative
(like ?m) to avoid a potential spill.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105072
Hongtao.liu changed:
What|Removed |Added
Resolution|--- |FIXED
Status|UNCONFIRMED
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105073
Bug 105073 depends on bug 105072, which changed state.
Bug 105072 Summary: Miss optimization for pmovzxbq.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105072
What|Removed |Added
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105354
--- Comment #5 from Hongtao.liu ---
Fixed in GCC13.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105513
--- Comment #2 from Hongtao.liu ---
Just to note, #c4 in PR105504 also solves this issue.
>Another possible solution is add a little bit dislike for "m" alternative(like
>?m) to avoid potential spill.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104915
Hongtao.liu changed:
What|Removed |Added
Resolution|--- |FIXED
Status|UNCONFIRMED
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105073
Bug 105073 depends on bug 104915, which changed state.
Bug 104915 Summary: Miss optimization for vec_setv8hi_0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104915
What|Removed |Added
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105576
Hongtao.liu changed:
What|Removed |Added
CC||crazylht at gmail dot com
--- Comment #4
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102583
--- Comment #6 from Hongtao.liu ---
Fixed in GCC13.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105513
--- Comment #4 from Hongtao.liu ---
For a pattern that supports the 'm' alternative, mem_cost is frequency, which is quite low
compared to pp->costs (ira_register_move_cost[mode][rclass][hard_reg_class]) *
frequency)
For x86 backend even gpr->gpr cost is 2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105587
--- Comment #2 from Hongtao.liu ---
Mine, I'm testing a patch.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105591
Hongtao.liu changed:
What|Removed |Added
CC||crazylht at gmail dot com
--- Comment #2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105591
--- Comment #6 from Hongtao.liu ---
Maybe we should add a canonicalization in match.pd to make sure the index is in
the range 0 - 2*N, so that the general code needn't check idx % 2*N.
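A scalar sketch of the idea, in plain C: wrap each selector once up front, so the code that performs the permutation can index directly without its own modulo. Both function names are hypothetical, and this models only the semantics being discussed, not the match.pd change itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical canonicalization: wrap every selector into [0, 2*N).  */
static void
canonicalize_mask (unsigned *mask, size_t n)
{
  assert (n > 0);
  for (size_t i = 0; i < n; i++)
    mask[i] %= 2 * n;
}

/* Hypothetical two-input permute over N-element vectors.  It relies on
   the mask already being canonical, so no "idx % (2*N)" is needed here:
   selectors below N pick from A, the rest pick from B.  */
static void
vec_perm_model (const int *a, const int *b, const unsigned *mask,
                int *out, size_t n)
{
  assert (n > 0);
  for (size_t i = 0; i < n; i++)
    {
      unsigned idx = mask[i];
      assert (idx < 2 * n);
      out[i] = idx < n ? a[idx] : b[idx - n];
    }
}
```

The sketch only covers masks whose values are known; when the mask is not constant there is nothing to wrap ahead of time, so the wrap would have to happen when the permute is evaluated.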
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105591
--- Comment #10 from Hongtao.liu ---
I understand that code like builtin_shuffle may have an out-of-range index which
needs to be clamped, but why does vec_perm_expr also need to accept that?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105591
--- Comment #12 from Hongtao.liu ---
(In reply to Jakub Jelinek from comment #11)
> Because VEC_PERM_EXPR doesn't require the mask argument to be constant (and
> neither does __builtin_shuffle, unlike e.g. __builtin_shufflevector).
> If the mask
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105587
--- Comment #4 from Hongtao.liu ---
Fixed in GCC13.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105513
--- Comment #5 from Hongtao.liu ---
And a constraint like 'vm' differs from 'v,m' in how mem_cost is calculated,
which will impact RA when op is REG_P. For 'v,m' mem_cost is just 1 *
frequency, but for 'vm' mem_cost is much bigger(memory_move
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105591
--- Comment #14 from Hongtao.liu ---
Fixed in GCC13.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105033
Hongtao.liu changed:
What|Removed |Added
Status|UNCONFIRMED |RESOLVED
Resolution|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105073
Bug 105073 depends on bug 105033, which changed state.
Bug 105033 Summary: Suboptimal for vec_concat lower halves of two vectors.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105033
What|Removed |Added
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105617
--- Comment #7 from Hongtao.liu ---
Hmm, we have specific code to add the scalar->vector (vmovq) cost to vector
construct, but it seems not to work here. I guess it's because of &r0, and it
was thought to be a load rather than a scalar?
r0.1_21 1 times scalar_store costs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105617
--- Comment #8 from Hongtao.liu ---
(In reply to Hongtao.liu from comment #7)
> Hmm, we have specific code to add scalar->vector(vmovq) cost to vector
> construct, but it seems not to work here, guess it's because &r0,and thought
> it was load n
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105617
--- Comment #9 from Hongtao.liu ---
(In reply to Hongtao.liu from comment #8)
> (In reply to Hongtao.liu from comment #7)
> > Hmm, we have specific code to add scalar->vector(vmovq) cost to vector
> > construct, but it seems not to work here, gu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104610
--- Comment #18 from Hongtao.liu ---
Fixed in GCC13.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104375
--- Comment #3 from Hongtao.liu ---
Fixed in GCC13.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103462
Hongtao.liu changed:
What|Removed |Added
Status|NEW |RESOLVED
Resolution|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105073
Bug 105073 depends on bug 103462, which changed state.
Bug 103462 Summary: GCC failed to reduce bit clear in loop.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103462
What|Removed |Added
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105073
Hongtao.liu changed:
What|Removed |Added
Resolution|--- |FIXED
Status|NEW
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105650
Hongtao.liu changed:
What|Removed |Added
CC||crazylht at gmail dot com
--- Comment #1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105650
--- Comment #4 from Hongtao.liu ---
>
> I think the problem for me is value mismatch in compare of `if (v != cnt)
> __builtin_trap();`. Invalid instruction is generated by `__builtin_trap()`.
Oh, it's ud2.
But still can't reproduce the error
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105617
--- Comment #13 from Hongtao.liu ---
> so for the situation at hand I don't see any reasonable way out that
> doesn't have the chance of regressing things in other places (like
> treat loads from non-indexed auto variables specially or so). Th
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105493
--- Comment #4 from Hongtao.liu ---
Regarding 525, it's PR101929.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101929
Hongtao.liu changed:
What|Removed |Added
Resolution|FIXED |---
Status|RESOLVED
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
Bug 26163 depends on bug 101929, which changed state.
Bug 101929 Summary: [12 Regression] r12-7319 regress x264_r by 4% on CLX.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101929
What|Removed |Added
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101929
--- Comment #12 from Hongtao.liu ---
> It's difficult (if not impossible) for the vectorizer to second-guess
> the followup FRE, we're a long way from doing loop + SLP vectorization
> in one go and discover we can elide the vector store.
I'm t
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105513
--- Comment #6 from Hongtao.liu ---
Also notice an interesting case impacted by a separate m alternative.
typedef long v2di __attribute__((vector_size(16)));
v2di
foo (v2di a)
{
a[1] = 1113;
return a;
}
with -O2 gcc generates
foo(long __ve
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105513
--- Comment #8 from Hongtao.liu ---
(In reply to Alexander Monakov from comment #7)
> The second sequence is 3 uops vs 1/2 (issued/executed) uops in first, and on
> Haswell and Skylake it ties up port 5 for two cycles.
>
> Unclear if you're mic
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101668
--- Comment #4 from Hongtao.liu ---
Guess we need to extend backend hook to handle different input and output
modes.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105735
Bug ID: 105735
Summary: GCC failed to reduce &= loop_inv in loop.
Product: gcc
Version: 12.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-o
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55583
--- Comment #7 from Hongtao.liu ---
i386 already has
(define_insn_and_split "*x86_shrd_2"
  [(set (match_operand:SI 0 "nonimmediate_operand")
	(ior:SI (lshiftrt:SI (match_dup 0)
		  (match_oper
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105754
Hongtao.liu changed:
What|Removed |Added
CC||crazylht at gmail dot com
--- Comment #2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53533
--- Comment #45 from Hongtao.liu ---
A reduced testcase.
int a[256];
int b[256];
void foo (void)
{
int i;
for (i = 0; i < 256; ++i)
{
int tmp = a[i] + 12345;
tmp *= 914237;
tmp += 12332;
tmp *= 914237;
tmp
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53533
--- Comment #47 from Hongtao.liu ---
>
> The issue is that the re-association pass doesn't handle operations
> with undefined overflow behavior, we do have duplicate bugreports
> for this.
>
I saw below in match.pd
/* Combine successive
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89929
Hongtao.liu changed:
What|Removed |Added
CC||crazylht at gmail dot com
--- Comment #31
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105781
Hongtao.liu changed:
What|Removed |Added
CC||crazylht at gmail dot com
--- Comment #4