Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1)

Richard Sandiford via Gcc-patches Mon, 22 Aug 2022 01:31:53 -0700

钟居哲 <juzhe.zh...@rivai.ai> writes:
> After deep analysis and several tries. I have made a analysis and conclusion 
> as follows.
> I really appreciate if you could spend some time take a look at the following 
> analysis that I made:
>
> According to the codes in test_vector_subregs_fore_back:
>
> ~~~
>  poly_uint64 nunits = GET_MODE_NUNITS (inner_mode);
>   unsigned int min_nunits = constant_lower_bound (nunits);
>   scalar_mode int_mode = GET_MODE_INNER (inner_mode);
>   unsigned int count = gcd (min_nunits, 4);
>
>   rtx_vector_builder builder (inner_mode, count, 2);
>   for (unsigned int i = 0; i < count; ++i)
>     builder.quick_push (gen_int_mode (i, int_mode));
>   for (unsigned int i = 0; i < count; ++i)
>     builder.quick_push (gen_int_mode (-(int) i, int_mode));
>   rtx x = builder.build ();
> ~~~~
>
> Analyze the codes and tried several patterns:
>
> For poly_uint16 (2,2):
> x = (const_vector:VNx1DI [
>         (const_int 0 [0])
>         (const_int 1 [0x1])
>         repeat [
>             (const_int 0 [0])
>             (const_int -1 [0xffffffffffffffff])
>         ]
>     ])
>
> For poly_uint16 (4,4):
> x = (const_vector:VNx1DI [
>         (const_int 0 [0])
>         (const_int 1 [0x1])
>         (const_int 2 [0x2])
>         (const_int 3 [0x3])
>         repeat [
>             (const_int 0 [0])
>             (const_int -1 [0xffffffffffffffff])
>             (const_int -2 [0xfffffffffffffffe])
>             (const_int -3 [0xfffffffffffffffd])
>         ]
>     ])
>
> So, I think I can conclude the pattern rule as follows:
>
> poly_uint16 (n,n):
> x = (const_vector:VNx1DI [
>         (const_int 0 [0])
>         (const_int 1 [0x1])
>         (const_int 2 [0x2])
>         (const_int 3 [0x3])
>         ..................
>         (const_int n [hex(n)])
>         repeat [
>             (const_int 0 [0])
>             (const_int -1 [0xffffffffffffffff])
>             (const_int -2 [0xfffffffffffffffe])
>             (const_int -3 [0xfffffffffffffffd])
>             .........
>              (const_int -n [hex(-n)])
>         ]
>     ])
> Am I understanding right?
>
> Let's first take a look at the codes you write:
>
> rtx_vector_builder builder (inner_mode, count, 2);
>   for (unsigned int i = 0; i < count; ++i)
>     builder.quick_push (gen_int_mode (i, int_mode));
>   for (unsigned int i = 0; i < count; ++i)
>     builder.quick_push (gen_int_mode (-(int) i, int_mode));
>   rtx x = builder.build ();
>
> So for poly_uint (1,1):
> Ideally according to the codes, you want to generate the pattern like this:
> x = (const_vector:VNx1DI [
>         (const_int 0 [0])
>         repeat [
>             (const_int 0 [0])
>         ]
>     ])


Ah, right, the two subsequences are supposed to be different.

> In your codes, when count = 1 for poly_uint16 (1,1). And i < count iteration 
> make i always is 0.
> In this case, i = -i = 0.
>
> So we will end up with the first value = 0, and repeat value is also 0, in 
> "rtx x = builder.build ();".
> Because GCC thinks all the value are 0, so the pattern is changed as follows 
> which is not a stepped vector:
>
> x = (const_vector:VNx1DI repeat [
>         (const_int 0 [0])
>     ])
>
> This is a const_vector duplicate vector.
> This is not the correct pattern you want so it fails in the following check.
>
> I think the test pattern generator goes wrong in poly_uint16 (1,1) which is 
> the only corner case will go wrong.
>
> To fix this, we should adjust test pattern generator to specifically 
> poly_uint16 (1,1). We should make first value
> and repeat value different. I tried 2 solution and they both pass for 
> poly_uint16 (1,1):
>
> ================
> Original codes:
> rtx_vector_builder builder (inner_mode, count, 2);
>   for (unsigned int i = 0; i < count; ++i)
>     builder.quick_push (gen_int_mode (i, int_mode));
>   for (unsigned int i = 0; i < count; ++i)
>     builder.quick_push (gen_int_mode (-(int) i, int_mode));
> ================
>
> ================
> 1st solution:
> rtx_vector_builder builder (inner_mode, count, 2);
>   for (unsigned int i = 0; i < count; ++i)
>     builder.quick_push (gen_int_mode (count == 1 ? 1 : i, int_mode));
>   for (unsigned int i = 0; i < count; ++i)
>     builder.quick_push (gen_int_mode (-(int) i, int_mode));
>
> x = (const_vector:VNx1DI [
>         (const_int 0 [0])
>         repeat [
>             (const_int 1 [0x1])
>         ]
>     ])
> ================
>
> ================
> 2nd solution:
> rtx_vector_builder builder (inner_mode, count, 2);
>   for (unsigned int i = 0; i < count; ++i)
>     builder.quick_push (gen_int_mode (i, int_mode));
>   for (unsigned int i = 0; i < count; ++i)
>     builder.quick_push (gen_int_mode (count == 1 ? -1 : -(int) i, int_mode));

I don't think we need to single out count == 1.  It should be OK to use
-1 - (int) i unconditionally.

Thanks,
Richard

> x= 
> (const_vector:VNx1DI [
>         (const_int 0 [0])
>         repeat [
>             (const_int -1 [0xffffffffffffffff])
>         ]
>     ])
> ================
>
> Do you agree with these solution? Or you could offer a better solution to fix 
> this? Thank you so much.
>
>
> juzhe.zh...@rivai.ai
>  
> From: Richard Sandiford
> Date: 2022-08-19 20:52
> To: juzhe.zhong\@rivai.ai
> CC: gcc-patches; rguenther; kito.cheng
> Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) 
> and allow the machine_mode definition with poly_uint16 (1, 1)
> "juzhe.zh...@rivai.ai" <juzhe.zh...@rivai.ai> writes:
>>> Ah, right, sorry for the bogus suggestion.
>>> In that case, what specifically goes wrong?  Which test in
>>> test_vector_subregs_modes fails, and what does the ICE look like?
>>
>> Because may_be (nunits, 1) return true. For nunits = (1,1), it will execute 
>> test_vector_subregs_modes.
>> The fail ICE:
>> ../../../riscv-gcc/gcc/simplify-rtx.cc:8396: test_vector_subregs_modes: 
>> FAIL: ASSERT_RTX_EQ (expected, simplify_subreg (QImode, x, inner_mode, byte))
>>   expected: (nil)
>>
>>   actual: (const_int 0 [0])
>> cc1: internal compiler error: in assert_rtx_eq_at, at selftest-rtl.cc:57
>> 0x1304ee1 selftest::assert_rtx_eq_at(selftest::location const&, char const*, 
>> rtx_def*, rtx_def*)
>>         ../../../riscv-gcc/gcc/selftest-rtl.cc:57
>> 0x1332504 test_vector_subregs_modes
>>         ../../../riscv-gcc/gcc/simplify-rtx.cc:8395
>> 0x1332988 test_vector_subregs_fore_back
>>         ../../../riscv-gcc/gcc/simplify-rtx.cc:8442
>> 0x1332ae7 test_vector_subregs
>>         ../../../riscv-gcc/gcc/simplify-rtx.cc:8467
>> 0x1332c57 test_vector_ops
>>         ../../../riscv-gcc/gcc/simplify-rtx.cc:8487
>> 0x1332c7b selftest::simplify_rtx_cc_tests()
>>         ../../../riscv-gcc/gcc/simplify-rtx.cc:8547
>> 0x21318fa selftest::run_tests()
>>         ../../../riscv-gcc/gcc/selftest-run-tests.cc:115
>> 0x1362a76 toplev::run_self_tests()
>>         ../../../riscv-gcc/gcc/toplev.cc:2205
>>
>> I analyzed the codes:
>> In test_vector_subregs_fore_back, when nunits = (1,1). The expected = 
>> NULL_RTX and simplify_subreg (QImode, x, inner_mode, byte) = const_int 0.
>> So the assertion fails.
>  
> Hmm, ok, so the subreg operation is unexpected succeeding.
>  
>> This is the test for stepped vector using 2 element per pattern.  For 
>> poly_uint16 (1,1), it's true it is possible only has 1 element. 
>  
> The stepped case is 3 elements per pattern rather than 2.  In a stepped
> pattern: a, b, b+n are represented explicitly, then the rest are
> implicitly b+n*2, b+n*3, etc.
>  
> The case being handled by this code is instead the 2-element case:
> a, b are represented explicitly, then the rest are implicitly all b.
>  
> Why is (1,1) different though?  The test is constructing:
>  
>   nunits: 1 + 1x
>   shape: nelts_per_pattern == 2, npatterns == 1
>   elements: a, b[, b, b, b, b, ...]
>  
> It then tests subregs starting at 0 + 1x (so starting at the first b).
> But for (2,2) we should have:
>  
>   nunits: 2 + 2x
>   shape: nelts_per_pattern == 2, npatterns == 2
>   elements: a1, a2, b1, b2[, b1, b2, b1, b2, ...]
>  
> and we should test subregs starting at 0 + 2x (so starting at the
> first b1).  The two cases should be very similar, it's just that the
> (2,2) case doubles the number of patterns.
>  
>> I think it makes sense to fail the test. However for poly (1,1) machine 
>> mode, can we have the chance that some target define this
>> kind of machine mode only used for intrinsics?  I already developed full RVV 
>> support in GCC (including intrinsc and auto-vectorization).
>> I only enable auto-vectorization with mode larger than (2,2) and test it 
>> fully.
>> From my experience, it seems the stepped vector only created during VLA 
>> auto-vectorization. And I think only allow poly (1,1)mode used in intrinsics 
>> will
>> not create issues. Am I understanding wrong ？Feel free to correct me. Thanks 
>> ~
>  
> Following on from what I said above, it doesn't look like this particular
> case is related to stepped vectors.
>  
> (1,1) shouldn't (need to) be a special case though.  Any potentital
> problems that would occur for (1,1) with npatterns==1 would also occur
> for (n,n) with npatterns==n.  E.g. if stepped vectors are problematic
> for (1,1) then an interleaving of 2 stepped vectors (i.e. npatterns==2)
> would be problematic for (2,2).
>  
> So yeah, preventing a mode being used for autovectorisation would allow
> the target to have a bit more control over which constants are actually
> generated.  But it shouldn't be necessary to do that for correctness.
>  
> Thanks,
> Richard
>  
>> juzhe.zh...@rivai.ai
>>  
>> From: Richard Sandiford
>> Date: 2022-08-19 17:35
>> To: juzhe.zhong\@rivai.ai
>> CC: rguenther; gcc-patches; kito.cheng
>> Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 
>> 1) and allow the machine_mode definition with poly_uint16 (1, 1)
>> "juzhe.zh...@rivai.ai" <juzhe.zh...@rivai.ai> writes:
>>> Hi, Richard. I tried the codes:
>>> if (!nunits.is_constant () && maybe_gt (nunits, 1)) 
>>> test_vector_subregs_modes (x, nunits - min_nunits, count);
>>>
>>> It still failed. For nunits = (1,1) , maybe_gt (nunits, 1) return true 
>>> value.
>>  
>> Ah, right, sorry for the bogus suggestion.
>>  
>> In that case, what specifically goes wrong?  Which test in
>> test_vector_subregs_modes fails, and what does the ICE look like?
>>  
>> Thanks,
>> Richard
>>  
>>> But I tried:
>>> if (!nunits.is_constant () && known_gt (nunits, 1)) 
>>> test_vector_subregs_modes (x, nunits - min_nunits, count);
>>> It pass. But it report a warning: "warning: comparison between signed and 
>>> unsigned integer expressions [-Wsign-compare]" during the compilation.
>>>
>>> Finally, I tried:
>>> if (!nunits.is_constant () && known_gt (GET_MODE_NUNITS (inner_mode), 1)) 
>>> test_vector_subregs_modes (x, nunits - min_nunits, count);
>>> It passed with no warning.
>>>
>>> Is 'known_gt (GET_MODE_NUNITS (inner_mode), 1)' a good solution for this?
>>> Thanks!
>>>
>>>
>>> juzhe.zh...@rivai.ai
>>>  
>>> From: Richard Sandiford
>>> Date: 2022-08-19 16:03
>>> To: juzhe.zhong
>>> CC: gcc-patches; rguenther; kito.cheng
>>> Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 
>>> 1) and allow the machine_mode definition with poly_uint16 (1, 1)
>>> juzhe.zh...@rivai.ai writes:
>>>> From: zhongjuzhe <juzhe.zh...@rivai.ai>
>>>>
>>>> Hello. This patch is preparing for following RVV support.
>>>>
>>>> Both ARM SVE and RVV (RISC-V 'V' Extension) support length-agnostic vector.
>>>> The minimum vector length of ARM SVE is 128-bit and the runtime invariant 
>>>> of ARM SVE is always 128-bit blocks.
>>>> However, the minimum vector length of RVV can be 32bit in 'Zve32*' 
>>>> sub-extension and 64bit in 'Zve*' sub-extension.
>>>>
>>>> So I define the machine_mode as follows:
>>>> VECTOR_MODE_WITH_PREFIX (VNx, INT, DI, 1, 0);
>>>> ADJUST_NUNITS (MODE, riscv_vector_chunks);
>>>> The riscv_vector_chunks = poly_uint16 (1, 1)
>>>>
>>>> The compilation is failed for the stepped vector test:
>>>> (const_vector:VNx1DI repeat [
>>>>         (const_int 8 [0x8])
>>>>         (const_int 7 [0x7])
>>>>     ])
>>>>
>>>> I understand for stepped vector should always have aleast 2 elements and 
>>>> stepped vector initialization is common
>>>> for VLA (vector-lengthe agnostic) auto-vectorization. It makes sense that 
>>>> report fail for stepped vector of poly_uint16 (1, 1).
>>>>
>>>> machine mode with nunits = poly_uint16 (1, 1) needs to implemented in 
>>>> intrinsics. And I would like to enable RVV auto-vectorization
>>>> with vector mode only nunits is larger than poly_uint16 (2, 2) in RISC-V 
>>>> backend. I think it will not create issue if we define
>>>> vector mode with nunits = poly_uint16 (1, 1). Feel free to correct me or 
>>>> offer me some other better solutions. Thanks!
>>>>
>>>>   
>>>>
>>>> gcc/ChangeLog:
>>>>
>>>>         * simplify-rtx.cc (test_vector_subregs_fore_back): skip test for 
>>>> poly_uint16 (1, 1).
>>>>
>>>> ---
>>>>  gcc/simplify-rtx.cc | 2 +-
>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
>>>> index 7d09bf7103d..61e0dfa00d0 100644
>>>> --- a/gcc/simplify-rtx.cc
>>>> +++ b/gcc/simplify-rtx.cc
>>>> @@ -8438,7 +8438,7 @@ test_vector_subregs_fore_back (machine_mode 
>>>> inner_mode)
>>>>    rtx x = builder.build ();
>>>>  
>>>>    test_vector_subregs_modes (x);
>>>> -  if (!nunits.is_constant ())
>>>> +  if (!nunits.is_constant () && known_ne (nunits, poly_uint16 (1, 1)))
>>>>      test_vector_subregs_modes (x, nunits - min_nunits, count);
>>>  
>>> I think instead we should use maybe_gt (nunits, 1), on the basis that
>>> the fore_back tests require vectors that have a minimum of 2 elements.
>>> Something like poly_uint16 (1, 2) would have the same problem as
>>> poly_uint16 (1, 1).  ({1, 2} is an unlikely value, but it's OK in
>>> principle.)
>>>  
>>> This corresponds to the minimum of 3 elements for the stepped tests:
>>>  
>>>   if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT
>>>       && maybe_gt (GET_MODE_NUNITS (mode), 2))
>>>     {
>>>       test_vector_ops_series (mode, scalar_reg);
>>>       test_vector_subregs (mode);
>>>     }
>>>  
>>> Thanks,
>>> Richard
>>>  
>>  
>

Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1)

Reply via email to