Re: vect_recog_mixed_size_cond_pattern dead code now

2023-08-30 Thread Richard Biener via Gcc
On Tue, 29 Aug 2023, Andrew Pinski wrote:

> Hi,
>   I was reading some code in tree-vect-patterns.cc and I came across
> vect_recog_mixed_size_cond_pattern . The code tries to handle
> comparisons from COND_EXPR but that cannot happen any more (after
> r13-707-g68e0063397ba82).
> Should this code be removed now?

tree-vect-patterns.cc still builds COND_EXPRs with embedded comparisons,
so this might be still used by means of pattern recognition of patterns.

Richard.


Re: vect_recog_mixed_size_cond_pattern dead code now

2023-08-30 Thread Andrew Pinski via Gcc
On Wed, Aug 30, 2023 at 12:26 AM Richard Biener  wrote:
>
> On Tue, 29 Aug 2023, Andrew Pinski wrote:
>
> > Hi,
> >   I was reading some code in tree-vect-patterns.cc and I came across
> > vect_recog_mixed_size_cond_pattern . The code tries to handle
> > comparisons from COND_EXPR but that cannot happen any more (after
> > r13-707-g68e0063397ba82).
> > Should this code be removed now?
>
> tree-vect-patterns.cc still builds COND_EXPRs with embedded comparisons,
> so this might be still used by means of pattern recognition of patterns.

Oh I see and I also seem to have missed the comment at towards the
beginning of the file too:
/* TODO:  Note the vectorizer still builds COND_EXPRs with GENERIC compares
   in the first operand.  Disentangling this is future work, the
   IL is properly transfered to VEC_COND_EXPRs with separate compares.  */

Thanks,
Andrew

>
> Richard.


Question about wrapv-vect-reduc-dot-s8b.c

2023-08-30 Thread juzhe.zh...@rivai.ai
Hi, I start to enable "vect" testsuite for RISC-V.

I have a question when analyzing the 'wrapv-vect-reduc-dot-s8b.c'
It failed at:
FAIL: gcc.dg/vect/wrapv-vect-reduc-dot-s8b.c scan-tree-dump-times vect 
"vect_recog_dot_prod_pattern: detected" 1
FAIL: gcc.dg/vect/wrapv-vect-reduc-dot-s8b.c scan-tree-dump-times vect 
"vect_recog_widen_mult_pattern: detected" 1

They are found "2" times.

Since at the first time, it failed at the vectorization of conversion:

wrapv-vect-reduc-dot-s8b.c:29:14: missed:   conversion not supported by target.
wrapv-vect-reduc-dot-s8b.c:29:14: note:   vect_is_simple_use: operand X[i_14], 
type of def: internal
wrapv-vect-reduc-dot-s8b.c:29:14: note:   vect_is_simple_use: vectype 
vector([16,16]) signed char
wrapv-vect-reduc-dot-s8b.c:29:14: note:   vect_is_simple_use: operand X[i_14], 
type of def: internal
wrapv-vect-reduc-dot-s8b.c:29:14: note:   vect_is_simple_use: vectype 
vector([16,16]) signed char
wrapv-vect-reduc-dot-s8b.c:30:17: missed:   not vectorized: relevant stmt not 
supported: _2 = (short int) _1;
wrapv-vect-reduc-dot-s8b.c:29:14: missed:  bad operation or unsupported loop 
bound.

Here loop vectorizer is trying to do the conversion from char -> short with 
both same nunits.
But we don't support 'vec_unpack' stuff in RISC-V backend since I don't see the 
case that vec_unpack can optimize the codegen of autovectorizatio for RVV.

To fix it, is it necessary to support 'vec_unpack' ?

Thanks.


juzhe.zh...@rivai.ai


Re: Question about wrapv-vect-reduc-dot-s8b.c

2023-08-30 Thread Richard Biener via Gcc
On Wed, 30 Aug 2023, juzhe.zh...@rivai.ai wrote:

> Hi, I start to enable "vect" testsuite for RISC-V.
> 
> I have a question when analyzing the 'wrapv-vect-reduc-dot-s8b.c'
> It failed at:
> FAIL: gcc.dg/vect/wrapv-vect-reduc-dot-s8b.c scan-tree-dump-times vect 
> "vect_recog_dot_prod_pattern: detected" 1
> FAIL: gcc.dg/vect/wrapv-vect-reduc-dot-s8b.c scan-tree-dump-times vect 
> "vect_recog_widen_mult_pattern: detected" 1
> 
> They are found "2" times.
> 
> Since at the first time, it failed at the vectorization of conversion:
> 
> wrapv-vect-reduc-dot-s8b.c:29:14: missed:   conversion not supported by 
> target.
> wrapv-vect-reduc-dot-s8b.c:29:14: note:   vect_is_simple_use: operand 
> X[i_14], type of def: internal
> wrapv-vect-reduc-dot-s8b.c:29:14: note:   vect_is_simple_use: vectype 
> vector([16,16]) signed char
> wrapv-vect-reduc-dot-s8b.c:29:14: note:   vect_is_simple_use: operand 
> X[i_14], type of def: internal
> wrapv-vect-reduc-dot-s8b.c:29:14: note:   vect_is_simple_use: vectype 
> vector([16,16]) signed char
> wrapv-vect-reduc-dot-s8b.c:30:17: missed:   not vectorized: relevant stmt not 
> supported: _2 = (short int) _1;
> wrapv-vect-reduc-dot-s8b.c:29:14: missed:  bad operation or unsupported loop 
> bound.
> 
> Here loop vectorizer is trying to do the conversion from char -> short with 
> both same nunits.
> But we don't support 'vec_unpack' stuff in RISC-V backend since I don't see 
> the case that vec_unpack can optimize the codegen of autovectorizatio for RVV.
> 
> To fix it, is it necessary to support 'vec_unpack' ?

both same units would be sext, not vec_unpacks_{lo,hi} - the vectorizer
ties its hands by choosing vector types early and based on the number
of incoming/outgoing vectors it chooses one or the other method.

More precise dumping would probably help here but somewhere earlier you
should be able to see the vector type used for _2

Richard.


Re: Re: Question about wrapv-vect-reduc-dot-s8b.c

2023-08-30 Thread juzhe.zh...@rivai.ai
Thanks Richi.

>> both same units would be sext, not vec_unpacks_{lo,hi} - the vectorizer
Sorry I made a mistake here. They are not the same nunits.

wrapv-vect-reduc-dot-s8b.c:29:14: note:   get vectype for scalar type:  short 
int
wrapv-vect-reduc-dot-s8b.c:29:14: note:   vectype: vector([8,8]) short int
wrapv-vect-reduc-dot-s8b.c:29:14: note:   nunits = [8,8]
wrapv-vect-reduc-dot-s8b.c:29:14: note:   ==> examining statement: _1 = X[i_14];
wrapv-vect-reduc-dot-s8b.c:29:14: note:   precomputed vectype: vector([16,16]) 
signed char
wrapv-vect-reduc-dot-s8b.c:29:14: note:   nunits = [16,16]
wrapv-vect-reduc-dot-s8b.c:29:14: note:   ==> examining statement: _2 = (short 
int) _1;
wrapv-vect-reduc-dot-s8b.c:29:14: note:   get vectype for scalar type: short int
wrapv-vect-reduc-dot-s8b.c:29:14: note:   vectype: vector([8,8]) short int
wrapv-vect-reduc-dot-s8b.c:29:14: note:   get vectype for smallest scalar type: 
signed char
wrapv-vect-reduc-dot-s8b.c:29:14: note:   nunits vectype: vector([16,16]) 
signed char
wrapv-vect-reduc-dot-s8b.c:29:14: note:   nunits = [16,16]

Turns out for _2, it picks vector([8,8]) short int and _1, it picks 
vector([16,16]) signed char
at the first time analysis.

It seems that because we don't support vec_unpacks so that the first time 
analysis failed ? 
Then we end up with "2" times these 2 checks:

> FAIL: gcc.dg/vect/wrapv-vect-reduc-dot-s8b.c scan-tree-dump-times vect 
> "vect_recog_dot_prod_pattern: detected" 1
> FAIL: gcc.dg/vect/wrapv-vect-reduc-dot-s8b.c scan-tree-dump-times vect 
> "vect_recog_widen_mult_pattern: detected" 1



juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-08-30 15:45
To: juzhe.zh...@rivai.ai
CC: gcc; Robin Dapp
Subject: Re: Question about wrapv-vect-reduc-dot-s8b.c
On Wed, 30 Aug 2023, juzhe.zh...@rivai.ai wrote:
 
> Hi, I start to enable "vect" testsuite for RISC-V.
> 
> I have a question when analyzing the 'wrapv-vect-reduc-dot-s8b.c'
> It failed at:
> FAIL: gcc.dg/vect/wrapv-vect-reduc-dot-s8b.c scan-tree-dump-times vect 
> "vect_recog_dot_prod_pattern: detected" 1
> FAIL: gcc.dg/vect/wrapv-vect-reduc-dot-s8b.c scan-tree-dump-times vect 
> "vect_recog_widen_mult_pattern: detected" 1
> 
> They are found "2" times.
> 
> Since at the first time, it failed at the vectorization of conversion:
> 
> wrapv-vect-reduc-dot-s8b.c:29:14: missed:   conversion not supported by 
> target.
> wrapv-vect-reduc-dot-s8b.c:29:14: note:   vect_is_simple_use: operand 
> X[i_14], type of def: internal
> wrapv-vect-reduc-dot-s8b.c:29:14: note:   vect_is_simple_use: vectype 
> vector([16,16]) signed char
> wrapv-vect-reduc-dot-s8b.c:29:14: note:   vect_is_simple_use: operand 
> X[i_14], type of def: internal
> wrapv-vect-reduc-dot-s8b.c:29:14: note:   vect_is_simple_use: vectype 
> vector([16,16]) signed char
> wrapv-vect-reduc-dot-s8b.c:30:17: missed:   not vectorized: relevant stmt not 
> supported: _2 = (short int) _1;
> wrapv-vect-reduc-dot-s8b.c:29:14: missed:  bad operation or unsupported loop 
> bound.
> 
> Here loop vectorizer is trying to do the conversion from char -> short with 
> both same nunits.
> But we don't support 'vec_unpack' stuff in RISC-V backend since I don't see 
> the case that vec_unpack can optimize the codegen of autovectorizatio for RVV.
> 
> To fix it, is it necessary to support 'vec_unpack' ?
 
both same units would be sext, not vec_unpacks_{lo,hi} - the vectorizer
ties its hands by choosing vector types early and based on the number
of incoming/outgoing vectors it chooses one or the other method.
 
More precise dumping would probably help here but somewhere earlier you
should be able to see the vector type used for _2
 
Richard.
 


Re: Question about wrapv-vect-reduc-dot-s8b.c

2023-08-30 Thread Robin Dapp via Gcc
>> To fix it, is it necessary to support 'vec_unpack' ?
> 
> both same units would be sext, not vec_unpacks_{lo,hi} - the vectorizer
> ties its hands by choosing vector types early and based on the number
> of incoming/outgoing vectors it chooses one or the other method.
> 
> More precise dumping would probably help here but somewhere earlier you
> should be able to see the vector type used for _2
We usually try with a "normal" mode like VNx4SI (RVVM1SI or so) and
then switch to VNx4QI (i.e. a mode that only determines the number of
units/elements) and have vectorize_related_mode return modes with the
same number of units.  This will then result in the sext/zext patterns
matching.  The first round where we try the normal mode will not match
those because the related mode has a different number of units.

So it's somewhat expected that the first try fails.

My dump shows that we vectorize, so IMHO no problem.  I can take a look
at this but it doesn't look like a case for pack/unpack.  

Regards
 Robin


Re: Re: Question about wrapv-vect-reduc-dot-s8b.c

2023-08-30 Thread juzhe.zh...@rivai.ai
I am wondering whether we do have some situations that 
vec_pack/vec_unpack/vec_widen_xxx/dot_prod pattern can be beneficial for RVV ?
I have ever met some situation that vec_unpack can be beneficial when working 
on SELECT_VL but I don't which case



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-08-30 16:06
To: Richard Biener; juzhe.zh...@rivai.ai
CC: rdapp.gcc; gcc
Subject: Re: Question about wrapv-vect-reduc-dot-s8b.c
>> To fix it, is it necessary to support 'vec_unpack' ?
> 
> both same units would be sext, not vec_unpacks_{lo,hi} - the vectorizer
> ties its hands by choosing vector types early and based on the number
> of incoming/outgoing vectors it chooses one or the other method.
> 
> More precise dumping would probably help here but somewhere earlier you
> should be able to see the vector type used for _2
We usually try with a "normal" mode like VNx4SI (RVVM1SI or so) and
then switch to VNx4QI (i.e. a mode that only determines the number of
units/elements) and have vectorize_related_mode return modes with the
same number of units.  This will then result in the sext/zext patterns
matching.  The first round where we try the normal mode will not match
those because the related mode has a different number of units.
 
So it's somewhat expected that the first try fails.
 
My dump shows that we vectorize, so IMHO no problem.  I can take a look
at this but it doesn't look like a case for pack/unpack.  
 
Regards
Robin
 


Re: Question about wrapv-vect-reduc-dot-s8b.c

2023-08-30 Thread Richard Biener via Gcc
On Wed, 30 Aug 2023, Robin Dapp wrote:

> >> To fix it, is it necessary to support 'vec_unpack' ?
> > 
> > both same units would be sext, not vec_unpacks_{lo,hi} - the vectorizer
> > ties its hands by choosing vector types early and based on the number
> > of incoming/outgoing vectors it chooses one or the other method.
> > 
> > More precise dumping would probably help here but somewhere earlier you
> > should be able to see the vector type used for _2
> We usually try with a "normal" mode like VNx4SI (RVVM1SI or so) and
> then switch to VNx4QI (i.e. a mode that only determines the number of
> units/elements) and have vectorize_related_mode return modes with the
> same number of units.  This will then result in the sext/zext patterns
> matching.  The first round where we try the normal mode will not match
> those because the related mode has a different number of units.
> 
> So it's somewhat expected that the first try fails.
> 
> My dump shows that we vectorize, so IMHO no problem.  I can take a look
> at this but it doesn't look like a case for pack/unpack.

it's target dependent what we choose first so it's going to be
a bit difficult to adjust testcases like this (and it looks like
a testsuite issue).  I think for this specific testcase changing
scan-tree-dump-times to scan-tree-dump is reasonable.  Note we
really want to check that for the case we choose finally
we use the sdot pattern, but I don't see how we can easily constrain
the dump-scans.  Can we do sth like
"vect_recog_dot_prod_pattern: detected\n(!FAILED)*SUCCEEDED", thus
after the dot-prod pattern dumping allow arbitrary stuff but _not_
a "failed" and then require a "succeeded"?

The other way would be to somehow add a dump flag that produces
dumps only for the succeeded part.  Of course we have targets that
evaluate multiple succeeded parts for costing (but luckily costing
is disabled for most tests).

Richard.


Re: Re: Question about wrapv-vect-reduc-dot-s8b.c

2023-08-30 Thread Richard Biener via Gcc
On Wed, 30 Aug 2023, juzhe.zh...@rivai.ai wrote:

> I am wondering whether we do have some situations that 
> vec_pack/vec_unpack/vec_widen_xxx/dot_prod pattern can be beneficial for RVV ?
> I have ever met some situation that vec_unpack can be beneficial when working 
> on SELECT_VL but I don't which case

With fixed size vectors you'll face the situation that the vectorizer
chooses the "wrong" vector type so yes, I think implementing
vec_unpack[s]_{lo,hi} might be useful.  But I wouldn't prioritize this
until you have a more clear picture of how useful it would be.

Richard.

> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Robin Dapp
> Date: 2023-08-30 16:06
> To: Richard Biener; juzhe.zh...@rivai.ai
> CC: rdapp.gcc; gcc
> Subject: Re: Question about wrapv-vect-reduc-dot-s8b.c
> >> To fix it, is it necessary to support 'vec_unpack' ?
> > 
> > both same units would be sext, not vec_unpacks_{lo,hi} - the vectorizer
> > ties its hands by choosing vector types early and based on the number
> > of incoming/outgoing vectors it chooses one or the other method.
> > 
> > More precise dumping would probably help here but somewhere earlier you
> > should be able to see the vector type used for _2
> We usually try with a "normal" mode like VNx4SI (RVVM1SI or so) and
> then switch to VNx4QI (i.e. a mode that only determines the number of
> units/elements) and have vectorize_related_mode return modes with the
> same number of units.  This will then result in the sext/zext patterns
> matching.  The first round where we try the normal mode will not match
> those because the related mode has a different number of units.
>  
> So it's somewhat expected that the first try fails.
>  
> My dump shows that we vectorize, so IMHO no problem.  I can take a look
> at this but it doesn't look like a case for pack/unpack.  
>  
> Regards
> Robin
>  
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: Question about wrapv-vect-reduc-dot-s8b.c

2023-08-30 Thread Robin Dapp via Gcc
> it's target dependent what we choose first so it's going to be
> a bit difficult to adjust testcases like this (and it looks like
> a testsuite issue).  I think for this specific testcase changing
> scan-tree-dump-times to scan-tree-dump is reasonable.  Note we
> really want to check that for the case we choose finally
> we use the sdot pattern, but I don't see how we can easily constrain
> the dump-scans.  Can we do sth like
> "vect_recog_dot_prod_pattern: detected\n(!FAILED)*SUCCEEDED", thus
> after the dot-prod pattern dumping allow arbitrary stuff but _not_
> a "failed" and then require a "succeeded"?
> 
> The other way would be to somehow add a dump flag that produces
> dumps only for the succeeded part.  Of course we have targets that
> evaluate multiple succeeded parts for costing (but luckily costing
> is disabled for most tests).

I'm going to have a try at fixing the test expectations but not before
tomorrow.  Right now I can see
 "vect_recog_widen_mult_pattern: detected"
even four times in the dump, 2x for each try, so I first need to
understand what's going on.

Regards
 Robin


Re: Question about wrapv-vect-reduc-dot-s8b.c

2023-08-30 Thread Robin Dapp via Gcc


>> I am wondering whether we do have some situations that
>> vec_pack/vec_unpack/vec_widen_xxx/dot_prod pattern can be
>> beneficial for RVV ? I have ever met some situation that vec_unpack
>> can be beneficial when working on SELECT_VL but I don't which
>> case
> 
> With fixed size vectors you'll face the situation that the vectorizer
> chooses the "wrong" vector type so yes, I think implementing
> vec_unpack[s]_{lo,hi} might be useful.  But I wouldn't prioritize this
> until you have a more clear picture of how useful it would be.

Another thing that comes to mind is that we currently don't do
vectorizable calls with mismatched vector sizes.  So even if we detected
e.g. vec_widen_plus early it wouldn't get us much further.
On the other hand, I don't think we perform many optimizations on such
patterns between vect and combine (where we finally generate those).

Regards
 Robin



Analyzer failure due to missing header

2023-08-30 Thread FX Coudert via Gcc
Hi David,

I’m seeing the following failures on building GCC with Apple’s clang:

> /Users/fx/devel/gcc/gcc-repo-write/gcc/analyzer/region-model.cc:3426:16: 
> error: unexpected type name 'byte_size_t': expected expression
>std::max (bytes.m_size_in_bytes,
> ^
> /Users/fx/devel/gcc/gcc-repo-write/gcc/analyzer/region-model.cc:3426:12: 
> error: no member named 'max' in namespace 'std'; did you mean 'fmax'?
>std::max (bytes.m_size_in_bytes,
>~^~~
> fmax
> /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/c++/v1/cmath:420:9:
>  note: 'fmax' declared here
> using ::fmax _LIBCPP_USING_IF_EXISTS;
> ^
> /Users/fx/devel/gcc/gcc-repo-write/gcc/analyzer/region-model.cc:3450:18: 
> error: expected '(' for function-style cast or type construction
>   = std::min ((TREE_STRING_LENGTH (string_cst)
>  ^
> /Users/fx/devel/gcc/gcc-repo-write/gcc/hwint.h:59:31: note: expanded from 
> macro 'HOST_WIDE_INT'
> #   define HOST_WIDE_INT long long
>   ^
> /Users/fx/devel/gcc/gcc-repo-write/gcc/analyzer/region-model.cc:3450:14: 
> error: no member named 'min' in namespace 'std'
>   = std::min ((TREE_STRING_LENGTH (string_cst)
> ~^

std::max and std::min, introduced by d99d73c77d1e and 2bad0eeb5573, are not 
available because  is not included. The following patch fixes the 
build for me:

diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index 4f31a6dcf0f..1ca8c8839bf 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -20,6 +20,7 @@ along with GCC; see the file COPYING3.  If not see
   #include "config.h"
 #define INCLUDE_MEMORY
+#define INCLUDE_ALGORITHM
 #include "system.h"
 #include "coretypes.h"
 #include "make-unique.h”


Could you check it is what you intended, and push it?

Thanks,
FX

Re: Analyzer failure due to missing header

2023-08-30 Thread Jakub Jelinek via Gcc
On Wed, Aug 30, 2023 at 11:10:57AM +0200, FX Coudert via Gcc wrote:
> std::max and std::min, introduced by d99d73c77d1e and 2bad0eeb5573, are not 
> available because  is not included. The following patch fixes the 
> build for me:
> 
> diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
> index 4f31a6dcf0f..1ca8c8839bf 100644
> --- a/gcc/analyzer/region-model.cc
> +++ b/gcc/analyzer/region-model.cc
> @@ -20,6 +20,7 @@ along with GCC; see the file COPYING3.  If not see
>#include "config.h"
>  #define INCLUDE_MEMORY
> +#define INCLUDE_ALGORITHM
>  #include "system.h"
>  #include "coretypes.h"
>  #include "make-unique.h”
> 
> 
> Could you check it is what you intended, and push it?

Note, when using libstdc++ headers,  isn't needed,
because  includes bits/stl_algobase.h which defines std::min/max.

Another possibility is of course use MIN/MAX macros instead of std::min/max.

Jakub



Re: Question about wrapv-vect-reduc-dot-s8b.c

2023-08-30 Thread Robin Dapp via Gcc
> the dump-scans.  Can we do sth like
> "vect_recog_dot_prod_pattern: detected\n(!FAILED)*SUCCEEDED", thus
> after the dot-prod pattern dumping allow arbitrary stuff but _not_
> a "failed" and then require a "succeeded"?

It took some fighting with tcl syntax until I arrived at the regex
pattern below but it looks like it is possible.

 "vect_recog_dot_prod_pattern: detected(?:(?!failed).)*succeeded".

This seems to work for the failing cases and I'm going to send a patch
tomorrow if an x86 testsuite run is unchanged.

Regards
 Robin


Re: Analyzer failure due to missing header

2023-08-30 Thread FX Coudert via Gcc
> std::max and std::min, introduced by d99d73c77d1e and 2bad0eeb5573, are not 
> available because  is not included.

I originally thought this was only seen in cross-compilers, but it actually 
broke bootstrap on darwin.
Attached patch restores it, OK to commit?

FX



0001-Analyzer-include-algorithm-header.patch
Description: Binary data


Re: Analyzer failure due to missing header

2023-08-30 Thread David Malcolm via Gcc
On Wed, 2023-08-30 at 23:24 +0200, FX Coudert wrote:
> > std::max and std::min, introduced by d99d73c77d1e and 2bad0eeb5573,
> > are not available because  is not included.
> 
> I originally thought this was only seen in cross-compilers, but it
> actually broke bootstrap on darwin.
> Attached patch restores it, OK to commit?

LGTM

Thanks
Dave



Re: [PATCH] analyzer: implement reference count checking for CPython plugin [PR107646]

2023-08-30 Thread Eric Feng via Gcc
On Tue, Aug 29, 2023 at 5:14 PM David Malcolm  wrote:
>
> On Tue, 2023-08-29 at 13:28 -0400, Eric Feng wrote:
> > Additionally, by using the old model and the pointer per your
> > suggestion,
> > we are able to find the representative tree and emit a more accurate
> > diagnostic!
> >
> > rc3.c:23:10: warning: expected ‘item’ to have reference count: ‘1’
> > but ob_refcnt field is: ‘2’
> >23 |   return list;
> >   |  ^~~~
> >   ‘create_py_object’: events 1-4
> > |
> > |4 |   PyObject* item = PyLong_FromLong(3);
> > |  |^~
> > |  ||
> > |  |(1) when ‘PyLong_FromLong’ succeeds
> > |5 |   PyObject* list = PyList_New(1);
> > |  |~
> > |  ||
> > |  |(2) when ‘PyList_New’ succeeds
> > |..
> > |   14 |   PyList_Append(list, item);
> > |  |   ~
> > |  |   |
> > |  |   (3) when ‘PyList_Append’ succeeds, moving buffer
> > |..
> > |   23 |   return list;
> > |  |  
> > |  |  |
> > |  |  (4) here
> > |
>
> Excellent, that's a big improvement.
>
> >
> > If a representative tree is not found, I decided we should just bail
> > out
> > of emitting a diagnostic for now, to avoid confusing the user on what
> > the problem is.
>
> Fair enough.
>
> >
> > I've attached the patch for this (on top of the previous one) below.
> > If
> > it also looks good, I can merge it with the last patch and push it in
> > at
> > the same time.
>
> I don't mind either way, but please can you update the tests so that we
> have some automated test coverage that the correct name is being
> printed in the warning.
>
> Thanks
> Dave
>

Sorry — forgot to hit 'reply all' in the previous e-mail. Resending to
preserve our chain on the list:

---

Thanks; pushed to trunk with nits fixed:
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=597b9ec69bca8acb7a3d65641c0a730de8b27ed4.

Incidentally, I updated my formatting settings in VSCode, which I've
previously mentioned in passing. In case anyone is interested:

"C_Cpp.clang_format_style": "{ BasedOnStyle: GNU, UseTab: Always,
TabWidth: 8, IndentWidth: 2, BinPackParameters: false,
AlignAfterOpenBracket: Align,
AllowAllParametersOfDeclarationOnNextLine: true }",

This fixes some issues with the indent width and also ensures function
parameters of appropriate length are aligned properly and on a new
line each (like the rest of the analyzer code).


Question about dynamic choosing vectorization factor for RVV

2023-08-30 Thread juzhe.zh...@rivai.ai
Hi, Richard and Richi.

Currently, we are statically returning vectorization factor in 
'TARGET_VECTORIZE_PREFERRED_SIMD_MODE'
according to compile option.

For example:
void
foo (int32_t *__restrict a, int32_t *__restrict b, int n)
{
  for (int i = 0; i < n; i++)
a[i] = a[i] + b[i];
}

with --param=riscv-autovec-lmul = m1:

vsetvli a5,a2,e32,m1,ta,ma
vle32.v v2,0(a0)
vle32.v v1,0(a1)
vsetvli a6,zero,e32,m1,ta,ma
slli a3,a5,2
vadd.vv v1,v1,v2
sub a2,a2,a5
vsetvli zero,a5,e32,m1,ta,ma
vse32.v v1,0(a4)
add a0,a0,a3
add a1,a1,a3
add a4,a4,a3
bne a2,zero,.L3

The 'vadd.vv' is only performing operations on a single register.

with --param=riscv-autovec-lmul=m8:

  vsetvli a5,a2,e8,m2,ta,ma
  vle32.v v16,0(a0)
  vle32.v v8,0(a1)
  vsetvli a6,zero,e32,m8,ta,ma
  slli a3,a5,2
  vadd.vv v8,v8,v16
  vsetvli zero,a2,e32,m8,ta,ma
  sub a2,a2,a5
  vse32.v v8,0(a4)
  add a0,a0,a3
  add a1,a1,a3
  add a4,a4,a3
  bne a2,zero,.L3

The 'vadd.vv' here is performing operations on 8 consecutive registers:

vadd.vv [v8 - v15], [v8 - v15], [v16 - v23]

Users statically set the vectorization factor is not ideal.

We want GCC to dynamic choose vectorization factor to do the auto-vectorization 
according to loop analysis.

Currently, I have implement simplistic loop analysis like analyze live range of 
each local decl of current function.

Here is the analysis, we have 32 vector registers for RVV.
So we calculate the live range of current function local decl:

the number of decls live at the same time * LMUL <= 32. 
According to this analysis, I set the vectorization factor in 
TARGET_VECTORIZE_PREFERRED_SIMD_MODE

Then this simplistic algorithm (implemented in RISC-V backend) work well for 
the testcases I produces.

However, I can only choose optimal vectorization for whole function but failed 
to specific loop.

Here is the example:

void foo2 (int32_t *__restrict a,
  int32_t *__restrict b,
  int32_t *__restrict c,
  int32_t *__restrict a2,
  int32_t *__restrict b2,
  int32_t *__restrict c2,
  int32_t *__restrict a3,
  int32_t *__restrict b3,
  int32_t *__restrict c3,
  int32_t *__restrict a4,
  int32_t *__restrict b4,
  int32_t *__restrict c4,
  int32_t *__restrict a5,
  int32_t *__restrict b5,
  int32_t *__restrict c5,
  int n)
{
// Loop 1
for (int i = 0; i < n; i++)
   a[i] = a[i] + b[i];
// Loop 2
for (int i = 0; i < n; i++){
  a[i] = b[i] + c[i];
  a2[i] = b2[i] + c2[i];
  a3[i] = b3[i] + c3[i];
  a4[i] = b4[i] + c4[i];
  a5[i] = a[i] + a4[i];
  a[i] = a3[i] + a2[i]+ a5[i];
}
}

Loop 1 we can aggressively choose LMUL = 8, but Loop 2 should choose LMUL = 4 
(since LMUL = 8 will cause vector register spillings).

If I split loop 1 and loop 2 into 2 separate functions, my algorithm works well.

However, if we put these 2 loop in the same function, I finally pick LMUL = 4 
for both loop 1 and loop 2 since as I said above, I do the analysis base on 
function not base
on the loop.

I am struggling whether we could have a good idea for such issue. Can we pass 
through loop_vec_info
to 'preferred_simd_mode' target hook?

Thanks.


juzhe.zh...@rivai.ai