https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120439
--- Comment #5 from JuzheZhong ---
(In reply to Andrew Waterman from comment #4)
> Yes.
Thanks. The GCC codegen is correct here. Am I right?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120439
--- Comment #3 from JuzheZhong ---
(In reply to Andrew Waterman from comment #2)
> > Are you saying that when vd and vs2 overlap in vnsrl, we can't allow the
> > undisturbed policy? CCing RISC-V folks.
>
> No. The instruction is allowed, and i
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117947
JuzheZhong changed:
What|Removed |Added
CC||juzhe.zhong at rivai dot ai
--- Comment #2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120439
JuzheZhong changed:
What|Removed |Added
CC||juzhe.zhong at rivai dot ai
--- Comment #1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120362
--- Comment #10 from JuzheZhong ---
(In reply to Robin Dapp from comment #9)
> > No. vlre should not depend on vtype. It should be a hardware bug.
>
> Are you sure about that? vmv1r also doesn't depend on a specific vtype,
> each one is OK, but
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120362
--- Comment #8 from JuzheZhong ---
(In reply to Robin Dapp from comment #6)
> (In reply to Kito Cheng from comment #5)
> > Oh, the vsetvli/vill issue should only appear for whole-register moves, not
> > whole-register loads/stores
>
> On the Banana Pi I get a
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118945
--- Comment #9 from JuzheZhong ---
(In reply to Andrew Waterman from comment #8)
> > In fact, I'd be rather surprised to see anything preferring tail
> > undisturbed.
>
> Right. To be precise, microarchitectures without register renaming
> a
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118945
--- Comment #6 from JuzheZhong ---
(In reply to Jeffrey A. Law from comment #5)
> This doesn't seem like an ABI issue (WRT c#2), it's just question of what
> uarchs prefer from a performance standpoint.
>
> With that in mind I'd tend to think t
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118945
--- Comment #2 from JuzheZhong ---
I thought about this a long time ago, while I was working on supporting RVV in
upstream GCC.
https://github.com/riscv-non-isa/riscv-toolchain-conventions/issues/37
I suggested we should have -mprefer-agnosti
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118057
--- Comment #3 from JuzheZhong ---
(In reply to Robin Dapp from comment #2)
> I think depending on the performance of strided loads/stores this can be
> profitable to vectorize. Looks like we need loop versioning to account for
> the possible a
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118057
--- Comment #1 from JuzheZhong ---
https://godbolt.org/z/q1E6dn6T9
Try -fno-vect-cost-model, it can be vectorized.
I think both Clang and GCC (with no cost vect model) vectorized code can't give
better performance in a wide-issue OOO superscal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118019
--- Comment #8 from JuzheZhong ---
(In reply to Robin Dapp from comment #7)
> > The problem is GCC-15 has performance regression compare to GCC-14 on both
> > strict align and we should fix it, we can't specify use no strict align in
> > GCC-15
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118019
--- Comment #6 from JuzheZhong ---
(In reply to Robin Dapp from comment #5)
> According to Li Pan's results this is "just" vector strict align again?
> We should be vectorizing the first loop, in particular after the
> SLP-grouping changes.
>
>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118019
--- Comment #2 from JuzheZhong ---
(In reply to Vineet Gupta from comment #1)
> How exactly are you building it ?
-march=rv64gcv_zvl512b -mabi=lp64d -mrvv-vector-bits=zvl -mrvv-max-lmul=m2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118019
Bug ID: 118019
Summary: RISC-V: Performance regression in hottest function of
X264
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Severity: normal
Pr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117974
--- Comment #7 from JuzheZhong ---
(In reply to Vineet Gupta from comment #4)
> (In reply to JuzheZhong from comment #2)
> > We need to split all insns since some of them are not the ultimate RVV
> > instruction pattern that depend on VL/VTYPE.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117974
--- Comment #3 from JuzheZhong ---
I can optimize it if I find the time. (Currently, I am busy with other
stuff).
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117974
--- Comment #2 from JuzheZhong ---
We need to split all insns, since some of them are not yet the final RVV
instruction patterns that depend on VL/VTYPE.
And I don't think the vsetvli should be kept close to the VLE; instead, they are
redundant, I think
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117804
Bug ID: 117804
Summary: RISC-V: Worse codegen in mc_chroma of x264
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117769
JuzheZhong changed:
What|Removed |Added
Resolution|--- |FIXED
Status|UNCONFIRMED
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
Bug 26163 depends on bug 117769, which changed state.
Bug 117769 Summary: RISC-V: Worse codegen in x264_pixel_satd_8x4
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117769
What|Removed |Added
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117769
--- Comment #2 from JuzheZhong ---
OK. I see it is not an issue now.
When we enable -mno-vector-strict-align:
https://godbolt.org/z/MzqzPTcc6
we get the same codegen as ARM SVE now:
x264_pixel_satd_8x4(unsigned char*, int, unsigned char*, int):
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117769
Bug ID: 117769
Summary: RISC-V: Worse codegen in x264_pixel_satd_8x4
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117722
--- Comment #4 from JuzheZhong ---
(In reply to Robin Dapp from comment #3)
> First, pixel_sad_4x4 is not very hot, 8x8 and 16x16 are.
>
> Second, we are vectorizing this, but with -mno-vector-strict-align.
>
> IMHO we don't need to synthesize
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117722
--- Comment #1 from JuzheZhong ---
OK. I see we are lacking the ssad/usad patterns (SAD_EXPR):
Compute the sum of absolute differences of two signed/unsigned elements.
Operand 1 and operand 2 are of the same mode. Their absolute difference, which
i
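
For orientation, the kind of scalar kernel that a SAD pattern is meant to cover can be sketched in C as below. This is only an illustrative sketch, not the x264 source: the name `sad_4x4` and its parameter names are assumptions, and the block size is fixed at 4x4.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scalar reference: sum of absolute differences over a
   4x4 block of unsigned 8-bit pixels. stride1/stride2 are the row
   pitches of the two images, in bytes. */
int sad_4x4(const uint8_t *pix1, int stride1,
            const uint8_t *pix2, int stride2)
{
    int sum = 0;
    for (int y = 0; y < 4; y++) {
        for (int x = 0; x < 4; x++) {
            /* Both operands are promoted to int before subtracting,
               so the difference cannot wrap around. */
            sum += abs(pix1[x] - pix2[x]);
        }
        pix1 += stride1;
        pix2 += stride2;
    }
    return sum;
}
```

The narrow inputs and the wider accumulator are what make this a widening reduction rather than a plain element-wise operation.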
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117722
Bug ID: 117722
Summary: RISC-V: Failed to vectorize x264_pixel_sad_4x4
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116578
Bug 116578 depends on bug 116691, which changed state.
Bug 116691 Summary: RISC-V: Unexpected auto-vectorization codegen in simple
vectorization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116691
What|Removed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116691
JuzheZhong changed:
What|Removed |Added
Resolution|--- |FIXED
Status|UNCONFIRMED
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116573
--- Comment #10 from JuzheZhong ---
(In reply to Richard Biener from comment #9)
> So with the patch I see tons of "regressions"
> (https://github.com/ewlu/gcc-precommit-ci/issues/2248#issuecomment-
> 2355417578) like for example for
> gcc.targe
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116573
--- Comment #4 from JuzheZhong ---
(In reply to Richard Biener from comment #3)
> So when investigating "future" fallout I've seen similar differences for
> gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-1.c for example with the
> GIMPLE diffe
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116691
Bug ID: 116691
Summary: RISC-V: Unexpected auto-vectorization codegen in
simple vectorization
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Severity: normal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116685
JuzheZhong changed:
What|Removed |Added
CC||juzhe.zhong at rivai dot ai
--- Comment #4
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
Bug 53947 depends on bug 115819, which changed state.
Bug 115819 Summary: RISC-V: Failed to hoist vrsub.vx to the header of the loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115819
What|Removed |Added
--
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115819
JuzheZhong changed:
What|Removed |Added
Status|UNCONFIRMED |RESOLVED
Resolution|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115759
--- Comment #1 from JuzheZhong ---
Do you mean you want to see the codegen look like LLVM:
https://godbolt.org/z/b7W88WTGo ?
I personally think GCC has better codegen than LLVM for your case in general
since LLVM is using strided store whereas
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115795
--- Comment #7 from JuzheZhong ---
(In reply to Jordi Sala from comment #6)
> Perfect, that's what I was looking for. I'm thinking of adding a way to tell
> GCC to minimize, maximize or preserve SEW on vsetvl expand. Like
> -mrvv-vsetvl-sew={max
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115795
--- Comment #5 from JuzheZhong ---
(In reply to Jordi Sala from comment #4)
> problem is this is not related to the vectorizer as far as I'm aware, so
> setting -mrvv-max-lmul=m8 does not change the fact that vsetvl pass is going
> to change the
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115819
--- Comment #4 from JuzheZhong ---
(In reply to Andrew Pinski from comment #1)
> This might be a cost issue.
No. I don't think it's a cost issue.
It's because we suppress the hoist through incorrect POLY INT handling code.
I have a patch to fix it:
https
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115819
Bug ID: 115819
Summary: RISC-V: Failed to hoist vrsub.vx to the header of the
loop
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Severity: normal
Pr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115104
JuzheZhong changed:
What|Removed |Added
Status|NEW |RESOLVED
Resolution|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115795
JuzheZhong changed:
What|Removed |Added
CC||juzhe.zhong at rivai dot ai
--- Comment #1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115725
--- Comment #10 from JuzheZhong ---
(In reply to Robin Dapp from comment #9)
> We already merge with operand[0], just the TU is missing as far as I can
> tell.
>
> I'm seeing the following output with my patch:
>
> vsetivli zero,8
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115725
--- Comment #8 from JuzheZhong ---
I think we should include operands[0] as the "merge/maskoff" operand, which we
need to depend on, and use TU for the vec_set pattern.
Take ARM for example:
(define_expand "vec_set"
[(match_operand:VALL_F16 0 "regi
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115725
--- Comment #1 from JuzheZhong ---
It seems that we should use TU instead of TA?
Robin ?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113474
JuzheZhong changed:
What|Removed |Added
Resolution|--- |FIXED
Status|UNCONFIRMED
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115093
JuzheZhong changed:
What|Removed |Added
CC||juzhe.zhong at rivai dot ai
--- Comment #1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115104
--- Comment #1 from JuzheZhong ---
I wonder whether the Rivos CI has already found which commit caused this regression?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115104
Bug ID: 115104
Summary: RISC-V: GCC-14 can combine vsext+vadd -> vwadd but
Trunk GCC (GCC 15) Failed
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Severity: n
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115068
Bug ID: 115068
Summary: RISC-V: Illegal instruction of vfwadd
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114988
--- Comment #2 from JuzheZhong ---
Li Pan is going to work on it.
Hi, Kito and Jeff.
Can this fix be backported to GCC-14?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114988
--- Comment #1 from JuzheZhong ---
Ideally, it should be reported as (-march=rv64gc):
https://godbolt.org/z/3P76YEb9s
: In function 'test_vfwsub_wf_f32mf2':
:4:15: error: return type 'vfloat32mf2_t' requires the V ISA extension
4 | vfloat
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114988
Bug ID: 114988
Summary: RISC-V: ICE in intrinsic __riscv_vfwsub_wf_f32mf2
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114887
--- Comment #2 from JuzheZhong ---
I think the analysis here is too conservative:
note: _1: type = float, start = 1, end = 6
note: _5: type = float, start = 6, end = 8
note: _3: type = float, start = 3, end = 7
note: _4: type = floa
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114887
--- Comment #1 from JuzheZhong ---
The "vect" cost model analysis:
https://godbolt.org/z/qbqzon8x1
note: Maximum lmul = 8, At most 40 number of live V_REG at program point 6
for bb 3
It seems that we count one more variable in program point
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114639
--- Comment #18 from JuzheZhong ---
(In reply to Li Pan from comment #17)
> According to the V abi, looks like the asm code tries to save/restore the
> callee-saved registers when there is a call in function body.
>
> | Name| ABI Mnemonic |
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114639
--- Comment #16 from JuzheZhong ---
This issue is not fully fixed: the patch only fixes the ICE, but there is still
a regression in codegen:
https://godbolt.org/z/4nvxeqb6K
Terrible codegen:
test(__rvv_uint64m4_t):
addi sp,sp,-16
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114809
JuzheZhong changed:
What|Removed |Added
CC||juzhe.zhong at rivai dot ai
--- Comment #3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114714
JuzheZhong changed:
What|Removed |Added
CC||juzhe.zhong at rivai dot ai
--- Comment #6
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114749
--- Comment #4 from JuzheZhong ---
Hi, Patrick.
It seems that Richard didn't include the testcase in the patch.
Could you send a patch to add the testcase for the RISC-V port?
Thanks.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114729
JuzheZhong changed:
What|Removed |Added
CC||juzhe.zhong at rivai dot ai
--- Comment #5
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114686
JuzheZhong changed:
What|Removed |Added
CC||juzhe.zhong at rivai dot ai
--- Comment #2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114639
--- Comment #6 from JuzheZhong ---
It is definitely a regression:
https://compiler-explorer.com/z/e68x5sT9h
GCC 13.2 is OK, but GCC 14 ICEs.
I think you should bisect first.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114476
--- Comment #7 from JuzheZhong ---
Hi, Robin.
Will you fix this bug ?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114506
JuzheZhong changed:
What|Removed |Added
CC||juzhe.zhong at rivai dot ai
--- Comment #4
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114396
--- Comment #19 from JuzheZhong ---
I think it's better to add pr114396.c to the vect testsuite instead of the x86
target tests, since the bug does not only happen on x86.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113281
--- Comment #28 from JuzheZhong ---
The original cost model I wrote worked for all cases, but after some middle-end
changes the cost model fails.
I don't have time to figure out what's going on here.
Robin may be interested in it.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114109
--- Comment #3 from JuzheZhong ---
(In reply to Robin Dapp from comment #2)
> It is vectorized with a higher zvl, e.g. zvl512b, refer
> https://godbolt.org/z/vbfjYn5Kd.
OK. I see. But Clang generates many slide instruction which are expensive i
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114109
--- Comment #1 from JuzheZhong ---
It seems RISC-V Clang didn't vectorize it ?
https://godbolt.org/z/G4han6vM3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113913
--- Comment #2 from JuzheZhong ---
It's a known issue that we are trying to fix in GCC-15.
My colleague Lehua is taking care of it.
CCing Lehua.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113583
--- Comment #16 from JuzheZhong ---
The FMA is generated in widening_mul PASS:
Before widening_mul (fab1):
_5 = 3.33314829616256247390992939472198486328125e-1 - _4;
_6 = _5 * 1.22998223643160599749535322189331054687
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113583
--- Comment #15 from JuzheZhong ---
(In reply to rguent...@suse.de from comment #14)
> On Wed, 7 Feb 2024, juzhe.zhong at rivai dot ai wrote:
>
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113583
> >
> > --- Comment #13 from JuzheZhong --
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113583
--- Comment #13 from JuzheZhong ---
Ok. I found the optimized tree:
_5 = 3.33314829616256247390992939472198486328125e-1 - _4;
_8 = .FMA (_5, 1.229982236431605997495353221893310546875e-1, _4);
Let CST0 = 3.3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113583
--- Comment #12 from JuzheZhong ---
Ok. I found it even without vectorization:
GCC is worse than Clang:
https://godbolt.org/z/addr54Gc6
GCC (14 instructions inside the loop):
fld fa3,0(a0)
fld fa5,8(a0)
fld
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113583
--- Comment #11 from JuzheZhong ---
Hi, I think this RVV compiler's codegen is the optimal codegen we want for RVV:
https://repo.hca.bsc.es/epic/z/P6QXCc
.LBB0_5:# %vector.body
sub a4, t0, a3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113134
--- Comment #22 from JuzheZhong ---
I have done the following experiment.
diff --git a/gcc/tree-ssa-loop-ivcanon.cc b/gcc/tree-ssa-loop-ivcanon.cc
index bf017137260..8c36cc63d3b 100644
--- a/gcc/tree-ssa-loop-ivcanon.cc
+++ b/gcc/tree-ssa-loo
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113608
--- Comment #2 from JuzheZhong ---
vuint16m2_t vadd(vuint16m2_t a, vuint8m1_t b) {
int vl = __riscv_vsetvlmax_e8m1();
vuint16m2_t c = __riscv_vzext_vf2_u16m2(b, vl);
return __riscv_vadd_vv_u16m2(a, c, vl);
}
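
In scalar terms, the intrinsic sequence above zero-extends each 8-bit element of `b` to 16 bits and adds it to the matching element of `a`. A minimal C model of that per-element behaviour, assuming hypothetical names (`widen_add_u16`, `dst`, `n`) and plain arrays in place of vector registers:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of vzext.vf2 followed by vadd.vv:
   dst[i] = a[i] + zero_extend(b[i]), with 16-bit wraparound. */
void widen_add_u16(uint16_t *dst, const uint16_t *a,
                   const uint8_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        uint16_t widened = (uint16_t)b[i];   /* zero-extend 8 -> 16 bits */
        dst[i] = (uint16_t)(a[i] + widened); /* modular 16-bit add */
    }
}
```

This is the same per-element operation that the RVV widening add `vwaddu.wv` performs in a single instruction.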
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113134
--- Comment #21 from JuzheZhong ---
Hi, Richard. I looked into ivcanon.
I found that:
/* If the loop has more than one exit, try checking all of them
for # of iterations determinable through scev. */
if (!exit)
ni
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51492
--- Comment #11 from JuzheZhong ---
Hi, Tamar.
We are interested in supporting saturating and rounding.
We may need to support scalar first.
Do you have any suggestions?
Or are you already working on it?
Thanks.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51492
--- Comment #10 from JuzheZhong ---
Hi, Tamar.
We are interested in supporting saturating and rounding.
We may need to support scalar first.
Do you have any suggestions?
Or are you already working on it?
Thanks.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51492
--- Comment #9 from JuzheZhong ---
Ok. After investigation of LLVM:
Before loop vectorizer:
%cond12 = tail call i32 @llvm.usub.sat.i32(i32 %conv5, i32 %wsize)
%conv13 = trunc i32 %cond12 to i16
After loop vectorizer:
%10 = call <16 x i3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51492
--- Comment #8 from JuzheZhong ---
Missing vectorization of saturating arithmetic made RVV Clang perform 20% better
than RVV GCC in a recent benchmark evaluation (CoreMark-Pro zip-test).
I believe other targets should be affected in the same way.
I wonder how we sho
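
The shape of loop being discussed can be sketched in C as follows. This is a hedged illustration only, not code from the benchmark: `usub_sat_u32`, `slide_table`, and the parameter names are assumptions, chosen to mirror a 32-bit unsigned saturating subtract followed by a truncation to 16 bits.

```c
#include <stddef.h>
#include <stdint.h>

/* Unsigned saturating subtraction: clamps at 0 instead of wrapping. */
static inline uint32_t usub_sat_u32(uint32_t a, uint32_t b)
{
    return a > b ? a - b : 0;
}

/* Hypothetical loop: subtract wsize from every 16-bit table entry,
   saturating at 0, then truncate the 32-bit result back to 16 bits. */
void slide_table(uint16_t *table, size_t n, uint32_t wsize)
{
    for (size_t i = 0; i < n; i++)
        table[i] = (uint16_t)usub_sat_u32(table[i], wsize);
}
```

A vectorizer that recognizes the compare-and-select idiom as a saturating subtract can map each iteration to a single vector instruction on targets that provide one.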
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113695
--- Comment #1 from JuzheZhong ---
Since both operands are input operands, the early-clobber "&" constraint cannot
help.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113695
Bug ID: 113695
Summary: RISC-V: Sources with different EEW must use different
registers
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113134
--- Comment #19 from JuzheZhong ---
The loop is:
bb 3 -> bb 4 -> bb 5
| |__⬆
|__⬆
The condition in bb 3 is if (i_21 == 1001).
The condition in bb 4 is if (N_13(D) > i_18).
Look into lsplit:
This loop doesn't satisfy
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99395
--- Comment #18 from JuzheZhong ---
(In reply to rguent...@suse.de from comment #17)
> On Wed, 31 Jan 2024, juzhe.zhong at rivai dot ai wrote:
>
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99395
> >
> > --- Comment #16 from JuzheZhong ---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99395
--- Comment #16 from JuzheZhong ---
(In reply to rguent...@suse.de from comment #15)
> On Wed, 31 Jan 2024, juzhe.zhong at rivai dot ai wrote:
>
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99395
> >
> > --- Comment #14 from JuzheZhong ---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99395
--- Comment #14 from JuzheZhong ---
Thanks, Richard.
It seems that we can't fix this issue for now. Is that right?
If I understand correctly, do you mean we should wait until the SLP representation
work is finished and then revisit this PR?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99395
--- Comment #12 from JuzheZhong ---
OK. It seems it has a data dependency issue:
missed: not vectorized, possible dependence between data-refs a[i_15] and
a[_4]
a[i_15] = _3; STMT 1
_4 = i_15 + 2;
_5 = a[_4]; STMT 2
STMT2 should not depend
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99395
--- Comment #11 from JuzheZhong ---
It seems that we should first fix this case (which Richard gave), which I think
is not a SCEV or value-numbering issue:
double a[1024];
void foo ()
{
for (int i = 0; i < 1022; i += 2)
{
double tem = a
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99395
--- Comment #10 from JuzheZhong ---
I think the root cause is that we consider i_16 and _1 to alias due to scalar
evolution:
(get_scalar_evolution
(scalar = i_16)
(scalar_evolution = {0, +, 2}_1))
(get_scalar_evolution
(scalar = _1)
(scalar
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495
JuzheZhong changed:
What|Removed |Added
Status|UNCONFIRMED |RESOLVED
Resolution|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113607
--- Comment #20 from JuzheZhong ---
(In reply to Robin Dapp from comment #19)
> What seems odd to me is that in fre5 we simplify
>
> _429 = .COND_SHL (mask_patt_205.47_276, vect_cst__262, vect_cst__262, { 0,
> ... });
> vect_prephitmp_129.5
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99395
--- Comment #8 from JuzheZhong ---
Hi, Richard.
Now I have found the time to work on GCC vectorization optimization.
I found this case:
_2 = a[_1];
...
a[i_16] = _4;
...
_7 = a[_1]; ---> This load should be eliminated and reuse _2.
Am I right
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113166
--- Comment #3 from JuzheZhong ---
#include
#include
template
inline vuint8m1_t tail_load(void const* data);
template<>
inline vuint8m1_t tail_load(void const* data) {
uint64_t const* ptr64 = reinterpret_cast<uint64_t const*>(data);
#if 1
const vuin
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113666
Bug ID: 113666
Summary: RISC-V: Cost model test regression due to recent
middle-end loop vectorizer changes
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Seve
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113607
--- Comment #15 from JuzheZhong ---
Hi, Robin.
I tried to disable vec_extract, then the case passed.
diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 3b32369f68c..b61b886ef3d 100644
--- a/gcc/config/riscv/autovec.md
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113607
--- Comment #13 from JuzheZhong ---
Ok. I found a regression between rvv-next and trunk.
I believe it is GCC-12 vs GCC-14:
rvv-next:
...
.L11:
li t1,31
mv a2,a1
bleu a7,t1,.L12
bne a6,zero,.L13
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113607
--- Comment #11 from JuzheZhong ---
(In reply to Robin Dapp from comment #10)
> The compile farm machine I'm using doesn't have SVE.
> Compiling with -march=armv8-a -O3 pr113607.c -fno-vect-cost-model and
> running it returns 0 (i.e. ok).
>
> p
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113607
--- Comment #9 from JuzheZhong ---
Hi, Robin.
Could you try this case on the latest ARM SVE,
with -march=armv8-a+sve -O3 -fno-vect-cost-model?
I want to first make sure it is not a middle-end bug.
The RVV vectorized IR is the same as ARM SVE's.
Tha
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113607
--- Comment #8 from JuzheZhong ---
Ok. I can reproduce it too.
I am going to work on fixing it.
Thanks.