- if (len >= 3
+ if (!reassoc_insert_powi_p
+ && len >= 3
&& (!has_fma
/* width > 1 means ranking ops results in better
parallelism. Check curre
Yeah, VMAT_STRIDED_SLP is what VMAT_ELEMENTWISE was to non-SLP,
though how we emit the contiguous part of the SLP group varies, and it could
be elementwise as a fallback.
For the single-element case (and only for that one AFAICT) we can switch to
VMAT_GATHER_SCATTER. Is the idea to relax that an
-;; Vector crypto, assumed to be a generic operation for now.
-(define_insn_reservation "vec_crypto" 4
+;; Vector population count
+(define_insn_reservation "vec_pop" 4
(and (eq_attr "tune" "generic_ooo,generic")
- (eq_attr "type" "crypto,vclz,vctz,vcpop"))
+ (eq_attr "type" "vcpop"
Hi Edwin,
sorry for the slow reply.
Currently this patch only supports clipping and returning an unsigned
narrow type. I'm unsure if this is the best way to approach the problem
as there is a similar optab .SAT_TRUNC which performs a similar
operation. The main difference between .NARROW_CLIP a
That would definitely be nice to have for both gather and stride loads
I'm not sure I like the direction that's heading ;)
So the loop I'm targeting is x264's satd:
for( int i = 0; i < 4; i++, pix1 += i_pix1, pix2 += i_pix2 )
{
a0 = (pix1[0] - pix2[0])...
a1 = (pix1[1] -
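For reference, the full loop body (x264's satd_8x4 kernel, quoted from
memory, so details may differ slightly):
  for( int i = 0; i < 4; i++, pix1 += i_pix1, pix2 += i_pix2 )
  {
      a0 = (pix1[0] - pix2[0]) + ((pix1[4] - pix2[4]) << 16);
      a1 = (pix1[1] - pix2[1]) + ((pix1[5] - pix2[5]) << 16);
      a2 = (pix1[2] - pix2[2]) + ((pix1[6] - pix2[6]) << 16);
      a3 = (pix1[3] - pix2[3]) + ((pix1[7] - pix2[7]) << 16);
      HADAMARD4( tmp[i][0], tmp[i][1], tmp[i][2], tmp[i][3], a0,a1,a2,a3 );
  }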
I'm not sure whether handling this case as part of VMAT_STRIDED_SLP is
wise. IIRC we do already choose VMAT_GATHER_SCATTER for some
strided loads, so why not do strided load/store handling as part of
gather/scatter handling?
Now that we can deal with gather/scatter misalignment I think we can c
I've folded the "vectorizer status" into the beginning of the BoF,
so "only" two slots from my side.
Do you still need/want some status from the riscv vector side for the BoF?
If so, what would that entail? Rather a look back on what has been done or an
outlook of what we're looking at?
--
R
OK, so actually generating code with that vector(1) is bad (slower
than using scalar code)? Was that the same for PR121048?
The general situation is similar but IIRC we had a real vector mode there.
There the code didn't look terrible apart from using very small vectors
(2 elements). Here I gu
I was a bit concerned about the stmt_vec_info -> slp_tree hash map at
first, but I realized that it’s just a temporary hack, so LGTM :)
Thanks, going to commit in a while. Of course you know:
"There is nothing more permanent than a temporary solution." :)
--
Regards
Robin
So what was prevailing_mode then?
RVVM2SI, so != word_mode, and basically two glued 32-bit vector regs.
We get that from the first call with innermode SI.
But
/* Fall back to using mode_for_vector, mostly in the hope of being
able to use an integer mode. */
if (known_eq
Those are probably changes from the RVV cost modeling behavior; the patches
are of course not supposed to change code generation.
Looks like a proper corner case...
From what I can see the difference to before is that we now always call
get_related_vectype_for_scalar_type
with VOIDmode while we u
Those are probably changes from the RVV cost modeling behavior; the patches
are of course not supposed to change code generation.
These tests don't use dynamic LMUL (which needs a special flag and is not
generally active) so it would be odd if they were affected by the costing
changes. In particul
I do see regressions for zve32x-3.c et al. Those might be related to the
recently fixed tests regarding partial vectorization with vector(1) types
but I haven't checked further for now.
The regressions are "scan failures". One loop is not loop vectorized any more
but SLP vectorized and the f
Stefan kindly ran a regtest on s390 which looked OK as well.
I re-tested everything one more time and will commit soon. The patches were
bootstrapped individually on x86 (and built on riscv) so I hope it's safe to
not squash them.
Thanks for the guidance on that patch/series.
--
Regards
Rob
Note that your pr121073 test fails :-) So you'll need to adjust something
there. OK with pr121073.c fixed...
Pushed with -mabi=lp64d. There's nothing I have forgotten more often...
--
Regards
Robin
I haven't checked further for now.
These tests don't use the LMUL heuristic so the failures can't be due to it.
I'll see if I can have a look tomorrow.
Regards
Robin
commit ac4c46ee66380fc81b4f4dc0138956e1f2c519c7
Author: Robin Dapp
Date: Wed Jul 23 15:31:38 2025 +0200
slp type map
Hmm, we only have one STMT_VINFO_VECTYPE in need_additional_vector_vars_p.
I think we can just save the mode/vectype we need during add_stmt_cost and get
it later, similar to STMT_VINFO_TYPE.
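A minimal sketch of what I mean (member and variable names invented):
  /* Map each stmt to the vectype recorded while costing.  */
  hash_map<stmt_vec_info, tree> m_stmt_vectype;
  /* In add_stmt_cost:  */
  m_stmt_vectype.put (stmt_info, vectype);
  /* Later, instead of looking up STMT_VINFO_VECTYPE:  */
  if (tree *vt = m_stmt_vectype.get (stmt_info))
    vectype = *vt;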
Testing a patch.
--
Regards
Robin
Generally, nobody is really happy with it :) It has been limping along
for a while and not been used a lot at all.
I also see it does compute post-dominators and scrap them for each
costing done! For larger functions with many loops that's going
to be slow (it's O(function-size)). I think
Note if-conversion emits IFN_MASK_LOAD/STORE, only the vectorizer later
emits the LEN variants. So this is about whether there are (or might be)
uarchs that have vector aligned loads (aka target alignment is sizeof(vector))
and in addition to that have support for misaligned loads but those with sti
So this is the only part I think is odd - there is a dataref, it just
has only DR_REF as relevant data. I would have expected we can
adjust vect_supportable_dr_alignment to deal with the scatter/gather
case. I'm OK with doing it how you did it here, but seeing
the
/* For now assume all condit
The more I look at our heuristic the more it appears due for a rewrite.
But that's really not in my plans right now. I just sent a riscv patch
that does the necessary preparations so you can basically
s/STMT_VINFO_TYPE (stmt_info)/SLP_TREE_TYPE (node)/
once it lands.
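I.e. in the heuristic something like (sketch):
  -  if (STMT_VINFO_TYPE (stmt_info) == load_vec_info_type)
  +  if (SLP_TREE_TYPE (node) == load_vec_info_type)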
I regtested with your patc
Hi,
This patch prepares the dynamic LMUL vector costing to use the coming
SLP_TREE_TYPE instead of the (to-be-removed) STMT_VINFO_TYPE.
Even though the whole approach should be reviewed and adjusted at some
point, the patch chooses the path of least resistance and uses a hash
map for the stmt_in
Hi,
This patch fixes the vf_vfmacc-run-1-f16.c test failures on rv32
by adding zvfh requirements as well as options to the test and
the target harness.
Regtested on rv64gcv_zvl512b and rv32gcv_zvl512b. Going to commit
as obvious if the CI agrees that it's obvious ;)
Regards
Robin
gcc/testsuit
Hi,
During the last weeks it became clear that our current broadcast
handling needs an overhaul in order to improve maintainability.
PR121073 showed that my intermediate fix wasn't enough and caused
regressions.
This patch now goes a first step towards untangling broadcast
(vmv.v.x), "set first"
There is currently no way to mimic this, the original idea would have been
that you record the per-stmt info during add_stmt_cost hook time and then
process that data at finish_cost time.
With SLP you could in theory walk the SLP graph via the instances vector of
the vinfo. But I’m not sure w
This patch would like to introduce combining vec_dup + vaaddu.vv
into vaaddu.vx based on the cost value of GR2VR. The late-combine will take
place if the cost of GR2VR is zero, or reject the combine if it is non-zero,
like 1, 2, 15 in the tests. There will be two cases for the combine:
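A hypothetical C kernel of the shape this targets (my example, not from
the patch):
  #include <stdint.h>
  /* b[i] + x promotes to int, so no overflow; the broadcast of x
     (vec_dup) should fold into vaaddu.vx when GR2VR costs zero.  */
  void
  avg_vx (uint8_t *restrict a, uint8_t *restrict b, uint8_t x, int n)
  {
    for (int i = 0; i < n; i++)
      a[i] = (b[i] + x + 1) >> 1;
  }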
The series is OK, th
Can the risc-v people try to sort out this up to a point
where I can just s/STMT_VINFO_TYPE/SLP_TREE_TYPE there?
I think for us this mainly (only?) concerns the dynamic LMUL heuristic.
Currently we go through all vectorized instructions of the loop's blocks,
lookup the stmt_vec_info and then get
The avg3_floor pattern leverages the add and shift RTL
with the DOUBLE_TRUNC mode iterator. I.e., the RVVDImode
iterator will generate avg3rvvsimode_floor, so only the
element sizes QI, HI and SI are allowed.
Thus, this patch would like to support the DImode by
the standard name, with the iterator V_VLSI_
Hi,
r16-2175-g5aa21765236730 introduced an assert for floating-point modes
when expanding an RDIV_EXPR but forgot fixed-point modes. This patch
adds ALL_FIXED_POINT_MODE_P to the assert.
Bootstrap and regtest running on x86, aarch64, and power10. Regtested
on rv64gcv. Regtest on arm running,
Hi,
In PR120297 we fuse
vsetvl e8,mf2,...
vsetvl e64,m1,...
into
vsetvl e64,m4,...
Individually, that's ok but we also change the new vsetvl's demand to
"SEW only" even though the first original one demanded SEW >= 8 and
ratio = 16.
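(For reference, the ratio is SEW/LMUL: e8,mf2 gives 8 / (1/2) = 16 and the
fused e64,m4 gives 64 / 4 = 16, so the merged vsetvl must keep demanding
that ratio rather than SEW only.)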
As we forget the ratio after the merge we find that the vse
This pattern enables the combine pass (or late-combine, depending on the case)
to merge a float_extend'ed vec_duplicate into a plus-mult or minus-mult RTL
instruction.
Before this patch, we have three instructions, e.g.:
fcvt.s.h fa5,fa5
vfmv.v.f v24,fa5
vfmadd.vv v8,v24,v1
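After the patch these should combine into a single widening
vector-scalar instruction, presumably something like:
 vfwmacc.vf v8,fa5,v1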
For the record, the Linaro CI notified me that this caused regressions:
Produces 2 regressions:
|
| regressions.sum:
| Running gcc:gcc.dg/dg.exp ...
| FAIL: gcc.dg/pr103248.c (internal compiler error: in optab_for_tree_code, at
optabs-tree.cc:85)
| FAIL: gcc.dg/pr103248.c (test for excess e
This patch adds simple misalignment checks for gather/scatter
operations. Previously, we assumed that those perform element accesses
internally so alignment does not matter. The riscv vector spec however
explicitly states that vector operations are allowed to fault on
element-misaligned accesses.
This patch adds access helpers for the gather/scatter offset and scale
parameters.
gcc/ChangeLog:
* internal-fn.cc (expand_scatter_store_optab_fn): Use new
function.
(expand_gather_load_optab_fn): Ditto.
(internal_fn_offset_index): Ditto.
(internal_fn_scale
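Hypothetical usage, assuming the new helpers mirror
internal_fn_mask_index in shape:
  int idx = internal_fn_offset_index (ifn);
  tree offset = idx >= 0 ? gimple_call_arg (call, idx) : NULL_TREE;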
This fixes a thinko in the misalignment check. If we want to check for
vector misalignment support we need to load 16-byte elements, not
8-byte elements that will never be misaligned.
gcc/testsuite/ChangeLog:
* lib/target-supports.exp: Fix misalignment check.
---
gcc/testsuite/lib/targe
This patch adds an is_gather_scatter argument to the
support_vector_misalignment hook. All targets but riscv do not care
about alignment for gather/scatter so return true for is_gather_scatter.
gcc/ChangeLog:
* config/aarch64/aarch64.cc
(aarch64_builtin_support_vector_misalignment):
This encapsulates the IFN and the builtin-function way of handling
gather/scatter via three defines:
GATHER_SCATTER_IFN_P
GATHER_SCATTER_LEGACY_P
GATHER_SCATTER_EMULATED_P
and introduces a helper define for SLP operand handling as well.
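A plausible shape for the predicates (the actual definitions may differ):
  #define GATHER_SCATTER_LEGACY_P(info) ((info).decl != NULL_TREE)
  #define GATHER_SCATTER_IFN_P(info) ((info).ifn != IFN_LAST)
  #define GATHER_SCATTER_EMULATED_P(info) \
    (!GATHER_SCATTER_LEGACY_P (info) && !GATHER_SCATTER_IFN_P (info))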
gcc/ChangeLog:
* tree-vect-slp.cc (GATHER_SC
an alias pointer. I deferred that for now, though.
The whole series was regtested and bootstrapped on x86, aarch64, and
power10 and I built the patches individually on x86 as well as riscv.
It was also regtested on rv64gcv_zvl512b.
Robin Dapp (5):
ifn: Add helper functions for gather/scatter.
vec
Hi,
this patch adds asserts that ensure we only expand an RDIV_EXPR with
actual float mode. It also replaces the RDIV_EXPR in setting a
vectorized loop's length by EXACT_DIV_EXPR. The code in question is
only used with length-control targets (riscv, powerpc, s390).
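The replacement is essentially (sketch, variable names invented):
  /* The length computation divides exactly on length-control targets,
     so use an exact integer division; RDIV_EXPR is reserved for real
     (floating-point/fixed-point) division.  */
  tree len = fold_build2 (EXACT_DIV_EXPR, TREE_TYPE (niters), niters, factor);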
Bootstrapped and regtested o
Hi,
Changes from v1:
- Use Himode broadcast instead of float broadcast, saving two conversion
insns.
Let's be daring and leave the thorough testing to the CI first while my own
testing is in progress :)
This patch makes the zero-stride load broadcast idiom dependent on a
uarch-tunable "us
Oh, I guess I didn't expand enough on my thinking:
I don't care that we have bad performance/bad code gen here if Zvfh is
mandatory for RVA23 since that means not many people and cores will
fall into this code gen path.
But RVA23 will go down this code gen path, which means we will go this
path fo
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index 6753b01db59..866aaf1e8a0 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -1580,8 +1580,27 @@ (define_insn_and_split "*vec_duplicate<mode>"
"&& 1"
[(const_int 0)]
{
-riscv_vector::emit_vlma
Hi,
This patch makes the zero-stride load broadcast idiom dependent on a
uarch-tunable "use_zero_stride_load". Right now we have quite a few
paths that reach a strided load and some of them are not exactly
straightforward.
While broadcast is relatively rare on rv64 targets it is more common on
The original pattern was not exercised by any pre-existing test. I tried but
failed to come up with a testcase that would expand to
float_extend ∘ vec_duplicate
rather than
vec_duplicate ∘ float_extend.
Ok, so we indeed don't have a test and the intrinsics tests unfortunately are
no help
Hi Paul-Antoine,
+;; Intermediate pattern for vfwmacc.vf and vfwmsac.vf used by combine
+(define_insn_and_split "*extend_vf_<mode>"
+  [(set (match_operand:VWEXTF 0 "register_operand")
+     (vec_duplicate:VWEXTF
+       (float_extend:<VEL>
+         (match_operand:<VSUBEL> 1 "register_operand"))))]
+ "TARGET_VECTOR"
This patch would like to introduce combining vec_dup + vssub.vv
into vssub.vx based on the cost value of GR2VR. The late-combine will take
place if the cost of GR2VR is zero, or reject the combine if it is non-zero,
like 1, 2, 15 in the tests. There will be two cases for the combine:
Jeff has already pre-a
Hi,
in emit_vlmax_insn_lra we use a vsetivli for an immediate AVL.
XTHeadVector does not support this, so guard appropriately.
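A minimal sketch of the guard (the actual condition may differ):
  /* XTheadVector has no vsetivli, so force an immediate AVL into a
     register and emit a vsetvli instead.  */
  if (TARGET_XTHEADVECTOR && CONST_INT_P (avl))
    avl = force_reg (Pmode, avl);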
Regtested on rv64gcv_zvl512b.
Regards
Robin
PR target/120461
gcc/ChangeLog:
* config/riscv/riscv-v.cc (emit_vlmax_insn_lra): Do not emit
vset
Hi,
if a user passes a string that doesn't represent a variable we still try
to compute a hash for its type. Its tree does not represent a type but
just an exceptional node, though. This patch just ignores it, leaving the
error to the checking code later.
Regtested on rv64gcv_zvl512b.
Regards
Rob
This generally looks OK to me (including the tests).
+ HOST_WIDE_INT max = ((uint64_t)1 << bitsize) - 1;
Wouldn't a uint64_t type for max be clearer? I guess the worst that can happen
is compiling on a 32-bit host for a 64-bit target and get bitsize == 32 here.
Do we even support this? If
This patch would like to introduce combining vec_dup + vsadd.vv
into vsadd.vx based on the cost value of GR2VR. The late-combine will take
place if the cost of GR2VR is zero, or reject the combine if it is non-zero,
like 1, 2, 15 in the tests. There will be two cases for the combine:
OK.
--
Regards
Robin
CI testing failed:
https://github.com/ewlu/gcc-precommit-ci/issues/3585#issuecomment-3022157670
for sat_u_add-5-u32.c and vect-reduc-sad-1.c. These failures are compile issues
that appeared due to the afdo-crossmodule-1b.c file. For some reason, in both
cases the following snippets are being inserted i
I'm not sure? I'd prefer some refactoring to make this more obvious
(and the split between the two functions doesn't help ...).
If you're sure it's all covered then ignore this comment, I can do
the refactoring as followup. It just wasn't obvious to me.
Ah, I think I misread your original com
The else (get_group_load_store_type) can end up returning
VMAT_GATHER_SCATTER and thus require the above checking as well.
Isn't this already covered by
if (*memory_access_type == VMAT_ELEMENTWISE
|| (*memory_access_type == VMAT_GATHER_SCATTER
&& GATHER_SCATTER_LEGACY_P (*gs_inf
It corrects the shift type of interleaved stepped patterns for const vector
expansion in LRA. The shift instruction was initially LSHIFTRT, and it seems
it should still be the same type for both LRA and other cases.
This is OK, thanks.
--
Regards
Robin
This is failing pre-commit testing:
linux rv64gcv lp64d medlow multilib:
FAIL: gcc.target/riscv/rvv/base/bug-4.c (internal compiler error: in
extract_insn, at recog.cc:2882)
FAIL: gcc.target/riscv/rvv/base/bug-4.c (test for excess errors)
linux rv32gcv ilp32d medlow multilib:
FAIL: gcc.target
Is there any way we can retrigger the test somewhere? If not I can send a v3
series with the commit reordered and see.
I don't think there's a way other than re-submitting. But if you're sure you
tested properly and the CI is mistaken we can go ahead. I just wanted to make
sure as with the s
OK.
Hmm, I'm still seeing test failures in the CI. Could you check if those are
valid?
--
Regards
Robin
Hi,
Changes from v1:
- Add gather_scatter argument to support_vector_misalignment.
- Don't rely on DR_BASE_ALIGNMENT.
- Add IFN helpers and use them.
- Add gather/scatter helper macros.
- Clarify is_packed handling in docs.
This patch adds simple misalignment checks for gather/scatter
operations
This patch would like to introduce combining vec_dup + vssubu.vv
into vssubu.vx based on the cost value of GR2VR. The late-combine will take
place if the cost of GR2VR is zero, or reject the combine if it is non-zero,
like 1, 2, 15 in the tests. There will be two cases for the combine:
OK.
--
Regards
Robi
Maybe we can pass a scalar mode to the hook when we ask for
SCATTER/GATHER? That might need fixups in other targets of course,
but it would make it clear what we're asking for?
How about an additional argument bool gather_scatter to make it more explicit?
Then we could just
if (gather_scatt
Hi Pan,
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_binary.h
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_binary.h
index 2932e189186..0af8b969f47 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_binary.h
+++ b/gcc/testsuite/gcc.target/riscv/rvv/auto
Hi Kito,
This patch adds a comment to the riscv.md file to clarify the purpose of
the file and reorders the include files for better organization.
this seems to have broken the build. I believe that's due to
-(include "vector.md")
(include "vector-crypto.md")
because vector crypto depend
I guess I missed it when I first ran the testsuite before sending the patch
for review. I rebased and re-ran the testsuite after getting approved and saw
the regression. But at that point I realised Jeff had already merged it.
Anyway, I'll regtest more carefully next time!
The CI helps with th
This is a followup to 92e1893e0 "RISC-V: Add patterns for vector-scalar
multiply-(subtract-)accumulate" that caused an ICE in some cases where the mult
operands were wrongly swapped.
This patch ensures that operands are not swapped in the vector-scalar case.
This looks reasonable, so OK for the
+  bool is_misaligned = scalar_align < inner_vectype_sz;
+  bool is_packed = scalar_align > 1 && is_misaligned;
+
+  *misalignment = !is_misaligned ? 0 : inner_vectype_sz - scalar_align;
+
+  if (targetm.vectorize.support_vector_misalignment
+      (TYPE_MODE (vectype), inner_
This change reminds me that we lack documentation about arguments
of most of the "complicated" internal functions ...
I didn't mention it but I got implicitly reminded several times while writing
the patch... ;) An overhaul has been on my todo list for a while but of course
it never was top priority.
Hi,
this patch adds simple misalignment checks for gather/scatter
operations. Previously, we assumed that those perform element accesses
internally so alignment does not matter. The riscv vector spec however
explicitly states that vector operations are allowed to fault on
element-misaligned acc
Hi Ma Jin,
thanks for looking into this, it has been on my todo list with very low
priority since the vsetvl rewrite.
+ /* Handle case with no predecessors (including ENTRY block). */
+ if (EDGE_COUNT (b->preds) == 0)
{
- e = EDGE_PRED (b, ix);
- bitmap_copy (dst, src[e->src
This LGTM for the trunk.
--
Regards
Robin
This patch would like to introduce combining vec_dup + vsaddu.vv
into vsaddu.vx based on the cost value of GR2VR. The late-combine will take
place if the cost of GR2VR is zero, or reject the combine if it is non-zero,
like 1, 2, 15 in the tests. There will be two cases for the combine:
OK, thanks.
--
Rega
OK, thanks.
--
Regards
Robin
Hi Pan,
+(define_special_predicate "vectorization_factor_operand"
+ (match_code "const_int,const_poly_int"))
+
Does immediate_operand () work instead of a new predicate?
--
Regards
Robin
@@ -78,6 +79,7 @@ RISCV_CORE("sifive-e31", "rv32imac", "sifive-3-series")
RISCV_CORE("sifive-e34", "rv32imafc", "sifive-3-series")
RISCV_CORE("sifive-e76", "rv32imafc", "sifive-7-series")
+RISCV_CORE("generic", "rv64gc","generic")
^^^ Drop this and add -mtune=ge
The case 0 vx combine def functions are mostly the same across
the different test files. Thus, rearrange them in one place to
avoid code duplication.
OK.
--
Regards
Robin
OK.
--
Regards
Robin
This is OK for the trunk.
--
Regards
Robin
Note it’s far from obvious to me whether for stride and gather loads the
alignment of the elements loaded falls under the scalar or vector load
restriction. Is this explicitly spelled out for risc-v or is that your
interpretation?
We have the following in the vector spec:
If an element acces
At first I thought if we only cared about element misalignment checking the
first element/pointer should be sufficient. But riscv's gathers as well as
strided loads allow byte offsets rather than element-sized offsets so there
could be 16-bit loads with a stride of e.g. 1 byte.
Wait, no that
At least on aarch64, the gathers and scatters use (mem:BLK (scratch:P)),
i.e. a wildcard memory access. There's no good way in RTL to represent
multiple distinct locations in a single reference.
(unspec on its own doesn't imply a memory access)
At first I thought if we only cared about elemen
I think the spotted correctness issues wrt alignment/aliasing should be
addressed up-front. In the end the gather/stride-load is probably an
UNSPEC, so there's no MEM RTX with wrong info? How would we
query the target on whether it can handle the alignment here? Usually
we go through vect_suppo
Yes. Note I don't see we guarantee element alignment for gather/scatter
either, nor do the IFNs seem to have encoding space for alignment. The
effective type for TBAA seems also missing there ...
Regarding vector_vector_composition_type I had a try and attached a preliminary
V3. I'm not reall
In case the riscv strided vector load instruction has additional requirements
on the loaded (scalar) element alignment then we'd have to implement this.
For the moment the vectorizer will really emit scalar loads here, so that's
fine (though eventually inefficient). For the strided vector load th
But that would not pass the alignment check either, no? In fact, I assume
that for strided loads we have a scalar type as component (ptype), so we
always get supported unaligned accesses here?
Perhaps I'm missing something, though.
What I was missing is that we're using the same element size
But that would not pass the alignment check either, no? In fact, I assume
that for strided loads we have a scalar type as component (ptype), so we
always get supported unaligned accesses here?
I was thinking of the case where we have e.g. a group of 4 int8s and use a
strided load with int32 el
So I do wonder how this interacts with vector_vector_composition_type,
in fact the difference is that for strided_load we know the composition
happens as part of a load, so how about instead extending
this function, pass it VLS_LOAD/STORE and also consider
strided_loads as composition kind there?
This patch would like to introduce combining vec_dup + vdiv.vv into
vdiv.vx based on the cost value of GR2VR. The late-combine will take place
if the cost of GR2VR is zero, or reject the combine if it is non-zero, like
1, 15 in the tests. There will be two cases for the combine:
The series is OK, thanks.
This series is OK now, thanks.
--
Regards
Robin
1. riscv64-linux-gcc -march=rv64gc -march=foo-cpu -mtune=foo-cpu
2. riscv64-linux-gcc -march=rv64gc -march=foo-cpu
3. riscv64-linux-gcc -march=rv64gc -march=unset -mtune=unset -mcpu=foo-cpu
My preference:
- Prefer option 1.
- Option 3 less preferred (acceptable, but I don't like it).
- Strongly disli
I don't quite follow this part. IIUC the rules before this patch were
-march=ISA: Generate code that requires the given ISA, without
changing the tuning model.
-mcpu=CPU: Generate code for the given CPU, targeting all the
extensions that CPU supports and using the best known tu
This rule clearly applies to directly related options like -ffoo and
-fno-foo, but it’s less obvious for unrelated pairs like -ffoo and
-fbar, especially when there are traditionally strong specifics.
In many cases, the principle of "the most specific option wins"
governs the behavior.
Here
I stumbled across this change from
https://github.com/riscv-non-isa/riscv-toolchain-conventions/issues/88
and I want to express my strong disagreement with this change.
Perhaps I'm accustomed to Arm's behavior, but I believe using -march= to
target a specific CPU isn't ideal.
* -march=X: (exe
Inspired by the avg_ceil patches, I noticed there were even more
overly long lines in autovec.md. So fix those formatting issues.
OK.
--
Regards
Robin
Hi Paul-Antoine,
overall the patch looks reasonable to me now, provided the fr2vr followup.
BTW it's the late-combine pass that performs the optimization, not the combine
pass. You might still want to fix this in the commit message.
Please CC patchworks...@rivosinc.com for the next version
Looks like the CI cannot tell patch series apart? There are 3 patches and the CI
will run for each one.
Of course, the first one will have scan failures due to the expand change, but
the second one reconciles them.
Finally the third one will have all tests passed as below, I think it
indicates all test
Similar to avg_floor, avg_ceil has the rounding mode
towards +inf, while vaadd.vv has rnu, which totally matches
the semantics. From the RVV spec, the fixed-point vaadd.vv with rnu,
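In scalar terms the match is avg_ceil (a, b) = (a + b + 1) >> 1, i.e.
(a + b) >> 1 with the shifted-out bit rounded up, which is exactly what
rnu (round-to-nearest-up) does.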
The CI shows some scan failures in vls/avg-[456].c and widen/vec-avg-rv32gcv.c.
Also, the lint check complains
This patch fixes the typo in the test case `param-autovec-mode.c` in the
RISC-V autovec testsuite.
The option `autovec-mode` is changed to `riscv-autovec-mode` to match the
expected parameter name.
OK of course :)
--
Regards
Robin
This patch would like to introduce combining vec_dup + vmul.vv into
vmul.vx based on the cost value of GR2VR. The late-combine will take place
if the cost of GR2VR is zero, or reject the combine if it is non-zero, like
1, 15 in the tests. There will be two cases for the combine:
OK.
--
Regards
Robin
LGTM, thanks.
--
Regards
Robin
The first patch makes SLP paths unreachable and the second one removes those
entirely. The third patch does the actual strided-load work.
Bootstrapped and regtested on x86 and aarch64.
Regtested on rv64gcv_zvl512b.
Robin Dapp (3):
vect: Make non-SLP paths unreachable in strided slp
From: Robin Dapp
This patch enables strided loads for VMAT_STRIDED_SLP. Instead of
building vectors from scalars or other vectors we can use strided loads
directly when applicable.
The current implementation limits strided loads to cases where we can
load entire groups and not subsets of them
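As an illustration (my example, not from the patch), a group of two
accesses with a runtime stride where the whole group can now be loaded
with a single strided load:
  void
  f (int *restrict dst, int *restrict src, int stride, int n)
  {
    for (int i = 0; i < n; i++)
      {
        dst[2 * i]     = src[i * stride];
        dst[2 * i + 1] = src[i * stride + 1];
      }
  }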