Re: vect_recog_mixed_size_cond_pattern dead code now
On Tue, 29 Aug 2023, Andrew Pinski wrote: > Hi, > I was reading some code in tree-vect-patterns.cc and I came across > vect_recog_mixed_size_cond_pattern . The code tries to handle > comparisons from COND_EXPR but that cannot happen any more (after > r13-707-g68e0063397ba82). > Should this code be removed now? tree-vect-patterns.cc still builds COND_EXPRs with embedded comparisons, so this might be still used by means of pattern recognition of patterns. Richard.
Re: vect_recog_mixed_size_cond_pattern dead code now
On Wed, Aug 30, 2023 at 12:26 AM Richard Biener wrote: > > On Tue, 29 Aug 2023, Andrew Pinski wrote: > > > Hi, > > I was reading some code in tree-vect-patterns.cc and I came across > > vect_recog_mixed_size_cond_pattern . The code tries to handle > > comparisons from COND_EXPR but that cannot happen any more (after > > r13-707-g68e0063397ba82). > > Should this code be removed now? > > tree-vect-patterns.cc still builds COND_EXPRs with embedded comparisons, > so this might be still used by means of pattern recognition of patterns. Oh I see and I also seem to have missed the comment at towards the beginning of the file too: /* TODO: Note the vectorizer still builds COND_EXPRs with GENERIC compares in the first operand. Disentangling this is future work, the IL is properly transfered to VEC_COND_EXPRs with separate compares. */ Thanks, Andrew > > Richard.
Question about wrapv-vect-reduc-dot-s8b.c
Hi, I start to enable "vect" testsuite for RISC-V. I have a question when analyzing the 'wrapv-vect-reduc-dot-s8b.c' It failed at: FAIL: gcc.dg/vect/wrapv-vect-reduc-dot-s8b.c scan-tree-dump-times vect "vect_recog_dot_prod_pattern: detected" 1 FAIL: gcc.dg/vect/wrapv-vect-reduc-dot-s8b.c scan-tree-dump-times vect "vect_recog_widen_mult_pattern: detected" 1 They are found "2" times. Since at the first time, it failed at the vectorization of conversion: wrapv-vect-reduc-dot-s8b.c:29:14: missed: conversion not supported by target. wrapv-vect-reduc-dot-s8b.c:29:14: note: vect_is_simple_use: operand X[i_14], type of def: internal wrapv-vect-reduc-dot-s8b.c:29:14: note: vect_is_simple_use: vectype vector([16,16]) signed char wrapv-vect-reduc-dot-s8b.c:29:14: note: vect_is_simple_use: operand X[i_14], type of def: internal wrapv-vect-reduc-dot-s8b.c:29:14: note: vect_is_simple_use: vectype vector([16,16]) signed char wrapv-vect-reduc-dot-s8b.c:30:17: missed: not vectorized: relevant stmt not supported: _2 = (short int) _1; wrapv-vect-reduc-dot-s8b.c:29:14: missed: bad operation or unsupported loop bound. Here loop vectorizer is trying to do the conversion from char -> short with both same nunits. But we don't support 'vec_unpack' stuff in RISC-V backend since I don't see the case that vec_unpack can optimize the codegen of autovectorizatio for RVV. To fix it, is it necessary to support 'vec_unpack' ? Thanks. juzhe.zh...@rivai.ai
Re: Question about wrapv-vect-reduc-dot-s8b.c
On Wed, 30 Aug 2023, juzhe.zh...@rivai.ai wrote: > Hi, I start to enable "vect" testsuite for RISC-V. > > I have a question when analyzing the 'wrapv-vect-reduc-dot-s8b.c' > It failed at: > FAIL: gcc.dg/vect/wrapv-vect-reduc-dot-s8b.c scan-tree-dump-times vect > "vect_recog_dot_prod_pattern: detected" 1 > FAIL: gcc.dg/vect/wrapv-vect-reduc-dot-s8b.c scan-tree-dump-times vect > "vect_recog_widen_mult_pattern: detected" 1 > > They are found "2" times. > > Since at the first time, it failed at the vectorization of conversion: > > wrapv-vect-reduc-dot-s8b.c:29:14: missed: conversion not supported by > target. > wrapv-vect-reduc-dot-s8b.c:29:14: note: vect_is_simple_use: operand > X[i_14], type of def: internal > wrapv-vect-reduc-dot-s8b.c:29:14: note: vect_is_simple_use: vectype > vector([16,16]) signed char > wrapv-vect-reduc-dot-s8b.c:29:14: note: vect_is_simple_use: operand > X[i_14], type of def: internal > wrapv-vect-reduc-dot-s8b.c:29:14: note: vect_is_simple_use: vectype > vector([16,16]) signed char > wrapv-vect-reduc-dot-s8b.c:30:17: missed: not vectorized: relevant stmt not > supported: _2 = (short int) _1; > wrapv-vect-reduc-dot-s8b.c:29:14: missed: bad operation or unsupported loop > bound. > > Here loop vectorizer is trying to do the conversion from char -> short with > both same nunits. > But we don't support 'vec_unpack' stuff in RISC-V backend since I don't see > the case that vec_unpack can optimize the codegen of autovectorizatio for RVV. > > To fix it, is it necessary to support 'vec_unpack' ? both same units would be sext, not vec_unpacks_{lo,hi} - the vectorizer ties its hands by choosing vector types early and based on the number of incoming/outgoing vectors it chooses one or the other method. More precise dumping would probably help here but somewhere earlier you should be able to see the vector type used for _2 Richard.
Re: Re: Question about wrapv-vect-reduc-dot-s8b.c
Thanks Richi. >> both same units would be sext, not vec_unpacks_{lo,hi} - the vectorizer Sorry I made a mistake here. They are not the same nunits. wrapv-vect-reduc-dot-s8b.c:29:14: note: get vectype for scalar type: short int wrapv-vect-reduc-dot-s8b.c:29:14: note: vectype: vector([8,8]) short int wrapv-vect-reduc-dot-s8b.c:29:14: note: nunits = [8,8] wrapv-vect-reduc-dot-s8b.c:29:14: note: ==> examining statement: _1 = X[i_14]; wrapv-vect-reduc-dot-s8b.c:29:14: note: precomputed vectype: vector([16,16]) signed char wrapv-vect-reduc-dot-s8b.c:29:14: note: nunits = [16,16] wrapv-vect-reduc-dot-s8b.c:29:14: note: ==> examining statement: _2 = (short int) _1; wrapv-vect-reduc-dot-s8b.c:29:14: note: get vectype for scalar type: short int wrapv-vect-reduc-dot-s8b.c:29:14: note: vectype: vector([8,8]) short int wrapv-vect-reduc-dot-s8b.c:29:14: note: get vectype for smallest scalar type: signed char wrapv-vect-reduc-dot-s8b.c:29:14: note: nunits vectype: vector([16,16]) signed char wrapv-vect-reduc-dot-s8b.c:29:14: note: nunits = [16,16] Turns out for _2, it picks vector([8,8]) short int and _1, it picks vector([16,16]) signed char at the first time analysis. It seems that because we don't support vec_unpacks so that the first time analysis failed ? Then we end up with "2" times these 2 checks: > FAIL: gcc.dg/vect/wrapv-vect-reduc-dot-s8b.c scan-tree-dump-times vect > "vect_recog_dot_prod_pattern: detected" 1 > FAIL: gcc.dg/vect/wrapv-vect-reduc-dot-s8b.c scan-tree-dump-times vect > "vect_recog_widen_mult_pattern: detected" 1 juzhe.zh...@rivai.ai From: Richard Biener Date: 2023-08-30 15:45 To: juzhe.zh...@rivai.ai CC: gcc; Robin Dapp Subject: Re: Question about wrapv-vect-reduc-dot-s8b.c On Wed, 30 Aug 2023, juzhe.zh...@rivai.ai wrote: > Hi, I start to enable "vect" testsuite for RISC-V. > > I have a question when analyzing the 'wrapv-vect-reduc-dot-s8b.c' > It failed at: > FAIL: gcc.dg/vect/wrapv-vect-reduc-dot-s8b.c scan-tree-dump-times vect > "vect_recog_dot_prod_pattern: detected" 1 > FAIL: gcc.dg/vect/wrapv-vect-reduc-dot-s8b.c scan-tree-dump-times vect > "vect_recog_widen_mult_pattern: detected" 1 > > They are found "2" times. > > Since at the first time, it failed at the vectorization of conversion: > > wrapv-vect-reduc-dot-s8b.c:29:14: missed: conversion not supported by > target. > wrapv-vect-reduc-dot-s8b.c:29:14: note: vect_is_simple_use: operand > X[i_14], type of def: internal > wrapv-vect-reduc-dot-s8b.c:29:14: note: vect_is_simple_use: vectype > vector([16,16]) signed char > wrapv-vect-reduc-dot-s8b.c:29:14: note: vect_is_simple_use: operand > X[i_14], type of def: internal > wrapv-vect-reduc-dot-s8b.c:29:14: note: vect_is_simple_use: vectype > vector([16,16]) signed char > wrapv-vect-reduc-dot-s8b.c:30:17: missed: not vectorized: relevant stmt not > supported: _2 = (short int) _1; > wrapv-vect-reduc-dot-s8b.c:29:14: missed: bad operation or unsupported loop > bound. > > Here loop vectorizer is trying to do the conversion from char -> short with > both same nunits. > But we don't support 'vec_unpack' stuff in RISC-V backend since I don't see > the case that vec_unpack can optimize the codegen of autovectorizatio for RVV. > > To fix it, is it necessary to support 'vec_unpack' ? both same units would be sext, not vec_unpacks_{lo,hi} - the vectorizer ties its hands by choosing vector types early and based on the number of incoming/outgoing vectors it chooses one or the other method. More precise dumping would probably help here but somewhere earlier you should be able to see the vector type used for _2 Richard.
Re: Question about wrapv-vect-reduc-dot-s8b.c
>> To fix it, is it necessary to support 'vec_unpack' ? > > both same units would be sext, not vec_unpacks_{lo,hi} - the vectorizer > ties its hands by choosing vector types early and based on the number > of incoming/outgoing vectors it chooses one or the other method. > > More precise dumping would probably help here but somewhere earlier you > should be able to see the vector type used for _2 We usually try with a "normal" mode like VNx4SI (RVVM1SI or so) and then switch to VNx4QI (i.e. a mode that only determines the number of units/elements) and have vectorize_related_mode return modes with the same number of units. This will then result in the sext/zext patterns matching. The first round where we try the normal mode will not match those because the related mode has a different number of units. So it's somewhat expected that the first try fails. My dump shows that we vectorize, so IMHO no problem. I can take a look at this but it doesn't look like a case for pack/unpack. Regards Robin
Re: Re: Question about wrapv-vect-reduc-dot-s8b.c
I am wondering whether we do have some situations that vec_pack/vec_unpack/vec_widen_xxx/dot_prod pattern can be beneficial for RVV ? I have ever met some situation that vec_unpack can be beneficial when working on SELECT_VL but I don't which case juzhe.zh...@rivai.ai From: Robin Dapp Date: 2023-08-30 16:06 To: Richard Biener; juzhe.zh...@rivai.ai CC: rdapp.gcc; gcc Subject: Re: Question about wrapv-vect-reduc-dot-s8b.c >> To fix it, is it necessary to support 'vec_unpack' ? > > both same units would be sext, not vec_unpacks_{lo,hi} - the vectorizer > ties its hands by choosing vector types early and based on the number > of incoming/outgoing vectors it chooses one or the other method. > > More precise dumping would probably help here but somewhere earlier you > should be able to see the vector type used for _2 We usually try with a "normal" mode like VNx4SI (RVVM1SI or so) and then switch to VNx4QI (i.e. a mode that only determines the number of units/elements) and have vectorize_related_mode return modes with the same number of units. This will then result in the sext/zext patterns matching. The first round where we try the normal mode will not match those because the related mode has a different number of units. So it's somewhat expected that the first try fails. My dump shows that we vectorize, so IMHO no problem. I can take a look at this but it doesn't look like a case for pack/unpack. Regards Robin
Re: Question about wrapv-vect-reduc-dot-s8b.c
On Wed, 30 Aug 2023, Robin Dapp wrote: > >> To fix it, is it necessary to support 'vec_unpack' ? > > > > both same units would be sext, not vec_unpacks_{lo,hi} - the vectorizer > > ties its hands by choosing vector types early and based on the number > > of incoming/outgoing vectors it chooses one or the other method. > > > > More precise dumping would probably help here but somewhere earlier you > > should be able to see the vector type used for _2 > We usually try with a "normal" mode like VNx4SI (RVVM1SI or so) and > then switch to VNx4QI (i.e. a mode that only determines the number of > units/elements) and have vectorize_related_mode return modes with the > same number of units. This will then result in the sext/zext patterns > matching. The first round where we try the normal mode will not match > those because the related mode has a different number of units. > > So it's somewhat expected that the first try fails. > > My dump shows that we vectorize, so IMHO no problem. I can take a look > at this but it doesn't look like a case for pack/unpack. it's target dependent what we choose first so it's going to be a bit difficult to adjust testcases like this (and it looks like a testsuite issue). I think for this specific testcase changing scan-tree-dump-times to scan-tree-dump is reasonable. Note we really want to check that for the case we choose finally we use the sdot pattern, but I don't see how we can easily constrain the dump-scans. Can we do sth like "vect_recog_dot_prod_pattern: detected\n(!FAILED)*SUCCEEDED", thus after the dot-prod pattern dumping allow arbitrary stuff but _not_ a "failed" and then require a "succeeded"? The other way would be to somehow add a dump flag that produces dumps only for the succeeded part. Of course we have targets that evaluate multiple succeeded parts for costing (but luckily costing is disabled for most tests). Richard.
Re: Re: Question about wrapv-vect-reduc-dot-s8b.c
On Wed, 30 Aug 2023, juzhe.zh...@rivai.ai wrote: > I am wondering whether we do have some situations that > vec_pack/vec_unpack/vec_widen_xxx/dot_prod pattern can be beneficial for RVV ? > I have ever met some situation that vec_unpack can be beneficial when working > on SELECT_VL but I don't which case With fixed size vectors you'll face the situation that the vectorizer chooses the "wrong" vector type so yes, I think implementing vec_unpack[s]_{lo,hi} might be useful. But I wouldn't prioritize this until you have a more clear picture of how useful it would be. Richard. > > > juzhe.zh...@rivai.ai > > From: Robin Dapp > Date: 2023-08-30 16:06 > To: Richard Biener; juzhe.zh...@rivai.ai > CC: rdapp.gcc; gcc > Subject: Re: Question about wrapv-vect-reduc-dot-s8b.c > >> To fix it, is it necessary to support 'vec_unpack' ? > > > > both same units would be sext, not vec_unpacks_{lo,hi} - the vectorizer > > ties its hands by choosing vector types early and based on the number > > of incoming/outgoing vectors it chooses one or the other method. > > > > More precise dumping would probably help here but somewhere earlier you > > should be able to see the vector type used for _2 > We usually try with a "normal" mode like VNx4SI (RVVM1SI or so) and > then switch to VNx4QI (i.e. a mode that only determines the number of > units/elements) and have vectorize_related_mode return modes with the > same number of units. This will then result in the sext/zext patterns > matching. The first round where we try the normal mode will not match > those because the related mode has a different number of units. > > So it's somewhat expected that the first try fails. > > My dump shows that we vectorize, so IMHO no problem. I can take a look > at this but it doesn't look like a case for pack/unpack. > > Regards > Robin > > -- Richard Biener SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
Re: Question about wrapv-vect-reduc-dot-s8b.c
> it's target dependent what we choose first so it's going to be > a bit difficult to adjust testcases like this (and it looks like > a testsuite issue). I think for this specific testcase changing > scan-tree-dump-times to scan-tree-dump is reasonable. Note we > really want to check that for the case we choose finally > we use the sdot pattern, but I don't see how we can easily constrain > the dump-scans. Can we do sth like > "vect_recog_dot_prod_pattern: detected\n(!FAILED)*SUCCEEDED", thus > after the dot-prod pattern dumping allow arbitrary stuff but _not_ > a "failed" and then require a "succeeded"? > > The other way would be to somehow add a dump flag that produces > dumps only for the succeeded part. Of course we have targets that > evaluate multiple succeeded parts for costing (but luckily costing > is disabled for most tests). I'm going to have a try at fixing the test expectations but not before tomorrow. Right now I can see "vect_recog_widen_mult_pattern: detected" even four times in the dump, 2x for each try, so I first need to understand what's going on. Regards Robin
Re: Question about wrapv-vect-reduc-dot-s8b.c
>> I am wondering whether we do have some situations that >> vec_pack/vec_unpack/vec_widen_xxx/dot_prod pattern can be >> beneficial for RVV ? I have ever met some situation that vec_unpack >> can be beneficial when working on SELECT_VL but I don't which >> case > > With fixed size vectors you'll face the situation that the vectorizer > chooses the "wrong" vector type so yes, I think implementing > vec_unpack[s]_{lo,hi} might be useful. But I wouldn't prioritize this > until you have a more clear picture of how useful it would be. Another thing that comes to mind is that we currently don't do vectorizable calls with mismatched vector sizes. So even if we detected e.g. vec_widen_plus early it wouldn't get us much further. On the other hand, I don't think we perform many optimizations on such patterns between vect and combine (where we finally generate those). Regards Robin
Analyzer failure due to missing header
Hi David, I’m seeing the following failures on building GCC with Apple’s clang: > /Users/fx/devel/gcc/gcc-repo-write/gcc/analyzer/region-model.cc:3426:16: > error: unexpected type name 'byte_size_t': expected expression >std::max (bytes.m_size_in_bytes, > ^ > /Users/fx/devel/gcc/gcc-repo-write/gcc/analyzer/region-model.cc:3426:12: > error: no member named 'max' in namespace 'std'; did you mean 'fmax'? >std::max (bytes.m_size_in_bytes, >~^~~ > fmax > /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/c++/v1/cmath:420:9: > note: 'fmax' declared here > using ::fmax _LIBCPP_USING_IF_EXISTS; > ^ > /Users/fx/devel/gcc/gcc-repo-write/gcc/analyzer/region-model.cc:3450:18: > error: expected '(' for function-style cast or type construction > = std::min ((TREE_STRING_LENGTH (string_cst) > ^ > /Users/fx/devel/gcc/gcc-repo-write/gcc/hwint.h:59:31: note: expanded from > macro 'HOST_WIDE_INT' > # define HOST_WIDE_INT long long > ^ > /Users/fx/devel/gcc/gcc-repo-write/gcc/analyzer/region-model.cc:3450:14: > error: no member named 'min' in namespace 'std' > = std::min ((TREE_STRING_LENGTH (string_cst) > ~^ std::max and std::min, introduced by d99d73c77d1e and 2bad0eeb5573, are not available because is not included. The following patch fixes the build for me: diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc index 4f31a6dcf0f..1ca8c8839bf 100644 --- a/gcc/analyzer/region-model.cc +++ b/gcc/analyzer/region-model.cc @@ -20,6 +20,7 @@ along with GCC; see the file COPYING3. If not see #include "config.h" #define INCLUDE_MEMORY +#define INCLUDE_ALGORITHM #include "system.h" #include "coretypes.h" #include "make-unique.h” Could you check it is what you intended, and push it? Thanks, FX
Re: Analyzer failure due to missing header
On Wed, Aug 30, 2023 at 11:10:57AM +0200, FX Coudert via Gcc wrote: > std::max and std::min, introduced by d99d73c77d1e and 2bad0eeb5573, are not > available because is not included. The following patch fixes the > build for me: > > diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc > index 4f31a6dcf0f..1ca8c8839bf 100644 > --- a/gcc/analyzer/region-model.cc > +++ b/gcc/analyzer/region-model.cc > @@ -20,6 +20,7 @@ along with GCC; see the file COPYING3. If not see >#include "config.h" > #define INCLUDE_MEMORY > +#define INCLUDE_ALGORITHM > #include "system.h" > #include "coretypes.h" > #include "make-unique.h” > > > Could you check it is what you intended, and push it? Note, when using libstdc++ headers, isn't needed, because includes bits/stl_algobase.h which defines std::min/max. Another possibility is of course use MIN/MAX macros instead of std::min/max. Jakub
Re: Question about wrapv-vect-reduc-dot-s8b.c
> the dump-scans. Can we do sth like > "vect_recog_dot_prod_pattern: detected\n(!FAILED)*SUCCEEDED", thus > after the dot-prod pattern dumping allow arbitrary stuff but _not_ > a "failed" and then require a "succeeded"? It took some fighting with tcl syntax until I arrived at the regex pattern below but it looks like it is possible. "vect_recog_dot_prod_pattern: detected(?:(?!failed).)*succeeded". This seems to work for the failing cases and I'm going to send a patch tomorrow if an x86 testsuite run is unchanged. Regards Robin
Re: Analyzer failure due to missing header
> std::max and std::min, introduced by d99d73c77d1e and 2bad0eeb5573, are not > available because is not included. I originally thought this was only seen in cross-compilers, but it actually broke bootstrap on darwin. Attached patch restores it, OK to commit? FX 0001-Analyzer-include-algorithm-header.patch Description: Binary data
Re: Analyzer failure due to missing header
On Wed, 2023-08-30 at 23:24 +0200, FX Coudert wrote: > > std::max and std::min, introduced by d99d73c77d1e and 2bad0eeb5573, > > are not available because is not included. > > I originally thought this was only seen in cross-compilers, but it > actually broke bootstrap on darwin. > Attached patch restores it, OK to commit? LGTM Thanks Dave
Re: [PATCH] analyzer: implement reference count checking for CPython plugin [PR107646]
On Tue, Aug 29, 2023 at 5:14 PM David Malcolm wrote: > > On Tue, 2023-08-29 at 13:28 -0400, Eric Feng wrote: > > Additionally, by using the old model and the pointer per your > > suggestion, > > we are able to find the representative tree and emit a more accurate > > diagnostic! > > > > rc3.c:23:10: warning: expected ‘item’ to have reference count: ‘1’ > > but ob_refcnt field is: ‘2’ > >23 | return list; > > | ^~~~ > > ‘create_py_object’: events 1-4 > > | > > |4 | PyObject* item = PyLong_FromLong(3); > > | |^~ > > | || > > | |(1) when ‘PyLong_FromLong’ succeeds > > |5 | PyObject* list = PyList_New(1); > > | |~ > > | || > > | |(2) when ‘PyList_New’ succeeds > > |.. > > | 14 | PyList_Append(list, item); > > | | ~ > > | | | > > | | (3) when ‘PyList_Append’ succeeds, moving buffer > > |.. > > | 23 | return list; > > | | > > | | | > > | | (4) here > > | > > Excellent, that's a big improvement. > > > > > If a representative tree is not found, I decided we should just bail > > out > > of emitting a diagnostic for now, to avoid confusing the user on what > > the problem is. > > Fair enough. > > > > > I've attached the patch for this (on top of the previous one) below. > > If > > it also looks good, I can merge it with the last patch and push it in > > at > > the same time. > > I don't mind either way, but please can you update the tests so that we > have some automated test coverage that the correct name is being > printed in the warning. > > Thanks > Dave > Sorry — forgot to hit 'reply all' in the previous e-mail. Resending to preserve our chain on the list: --- Thanks; pushed to trunk with nits fixed: https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=597b9ec69bca8acb7a3d65641c0a730de8b27ed4. Incidentally, I updated my formatting settings in VSCode, which I've previously mentioned in passing. In case anyone is interested: "C_Cpp.clang_format_style": "{ BasedOnStyle: GNU, UseTab: Always, TabWidth: 8, IndentWidth: 2, BinPackParameters: false, AlignAfterOpenBracket: Align, AllowAllParametersOfDeclarationOnNextLine: true }", This fixes some issues with the indent width and also ensures function parameters of appropriate length are aligned properly and on a new line each (like the rest of the analyzer code).
Question about dynamic choosing vectorization factor for RVV
Hi, Richard and Richi. Currently, we are statically returning vectorization factor in 'TARGET_VECTORIZE_PREFERRED_SIMD_MODE' according to compile option. For example: void foo (int32_t *__restrict a, int32_t *__restrict b, int n) { for (int i = 0; i < n; i++) a[i] = a[i] + b[i]; } with --param=riscv-autovec-lmul = m1: vsetvli a5,a2,e32,m1,ta,ma vle32.v v2,0(a0) vle32.v v1,0(a1) vsetvli a6,zero,e32,m1,ta,ma slli a3,a5,2 vadd.vv v1,v1,v2 sub a2,a2,a5 vsetvli zero,a5,e32,m1,ta,ma vse32.v v1,0(a4) add a0,a0,a3 add a1,a1,a3 add a4,a4,a3 bne a2,zero,.L3 The 'vadd.vv' is only performing operations on a single register. with --param=riscv-autovec-lmul=m8: vsetvli a5,a2,e8,m2,ta,ma vle32.v v16,0(a0) vle32.v v8,0(a1) vsetvli a6,zero,e32,m8,ta,ma slli a3,a5,2 vadd.vv v8,v8,v16 vsetvli zero,a2,e32,m8,ta,ma sub a2,a2,a5 vse32.v v8,0(a4) add a0,a0,a3 add a1,a1,a3 add a4,a4,a3 bne a2,zero,.L3 The 'vadd.vv' here is performing operations on 8 consecutive registers: vadd.vv [v8 - v15], [v8 - v15], [v16 - v23] Users statically set the vectorization factor is not ideal. We want GCC to dynamic choose vectorization factor to do the auto-vectorization according to loop analysis. Currently, I have implement simplistic loop analysis like analyze live range of each local decl of current function. Here is the analysis, we have 32 vector registers for RVV. So we calculate the live range of current function local decl: the number of decls live at the same time * LMUL <= 32. According to this analysis, I set the vectorization factor in TARGET_VECTORIZE_PREFERRED_SIMD_MODE Then this simplistic algorithm (implemented in RISC-V backend) work well for the testcases I produces. However, I can only choose optimal vectorization for whole function but failed to specific loop. Here is the example: void foo2 (int32_t *__restrict a, int32_t *__restrict b, int32_t *__restrict c, int32_t *__restrict a2, int32_t *__restrict b2, int32_t *__restrict c2, int32_t *__restrict a3, int32_t *__restrict b3, int32_t *__restrict c3, int32_t *__restrict a4, int32_t *__restrict b4, int32_t *__restrict c4, int32_t *__restrict a5, int32_t *__restrict b5, int32_t *__restrict c5, int n) { // Loop 1 for (int i = 0; i < n; i++) a[i] = a[i] + b[i]; // Loop 2 for (int i = 0; i < n; i++){ a[i] = b[i] + c[i]; a2[i] = b2[i] + c2[i]; a3[i] = b3[i] + c3[i]; a4[i] = b4[i] + c4[i]; a5[i] = a[i] + a4[i]; a[i] = a3[i] + a2[i]+ a5[i]; } } Loop 1 we can aggressively choose LMUL = 8, but Loop 2 should choose LMUL = 4 (since LMUL = 8 will cause vector register spillings). If I split loop 1 and loop 2 into 2 separate functions, my algorithm works well. However, if we put these 2 loop in the same function, I finally pick LMUL = 4 for both loop 1 and loop 2 since as I said above, I do the analysis base on function not base on the loop. I am struggling whether we could have a good idea for such issue. Can we pass through loop_vec_info to 'preferred_simd_mode' target hook? Thanks. juzhe.zh...@rivai.ai