钟居哲 <juzhe.zh...@rivai.ai> writes: > After deep analysis and several tries. I have made a analysis and conclusion > as follows. > I really appreciate if you could spend some time take a look at the following > analysis that I made: > > According to the codes in test_vector_subregs_fore_back: > > ~~~ > poly_uint64 nunits = GET_MODE_NUNITS (inner_mode); > unsigned int min_nunits = constant_lower_bound (nunits); > scalar_mode int_mode = GET_MODE_INNER (inner_mode); > unsigned int count = gcd (min_nunits, 4); > > rtx_vector_builder builder (inner_mode, count, 2); > for (unsigned int i = 0; i < count; ++i) > builder.quick_push (gen_int_mode (i, int_mode)); > for (unsigned int i = 0; i < count; ++i) > builder.quick_push (gen_int_mode (-(int) i, int_mode)); > rtx x = builder.build (); > ~~~~ > > Analyze the codes and tried several patterns: > > For poly_uint16 (2,2): > x = (const_vector:VNx1DI [ > (const_int 0 [0]) > (const_int 1 [0x1]) > repeat [ > (const_int 0 [0]) > (const_int -1 [0xffffffffffffffff]) > ] > ]) > > For poly_uint16 (4,4): > x = (const_vector:VNx1DI [ > (const_int 0 [0]) > (const_int 1 [0x1]) > (const_int 2 [0x2]) > (const_int 3 [0x3]) > repeat [ > (const_int 0 [0]) > (const_int -1 [0xffffffffffffffff]) > (const_int -2 [0xfffffffffffffffe]) > (const_int -3 [0xfffffffffffffffd]) > ] > ]) > > So, I think I can conclude the pattern rule as follows: > > poly_uint16 (n,n): > x = (const_vector:VNx1DI [ > (const_int 0 [0]) > (const_int 1 [0x1]) > (const_int 2 [0x2]) > (const_int 3 [0x3]) > .................. > (const_int n [hex(n)]) > repeat [ > (const_int 0 [0]) > (const_int -1 [0xffffffffffffffff]) > (const_int -2 [0xfffffffffffffffe]) > (const_int -3 [0xfffffffffffffffd]) > ......... > (const_int -n [hex(-n)]) > ] > ]) > Am I understanding right? > > Let's first take a look at the codes you write: > > rtx_vector_builder builder (inner_mode, count, 2); > for (unsigned int i = 0; i < count; ++i) > builder.quick_push (gen_int_mode (i, int_mode)); > for (unsigned int i = 0; i < count; ++i) > builder.quick_push (gen_int_mode (-(int) i, int_mode)); > rtx x = builder.build (); > > So for poly_uint (1,1): > Ideally according to the codes, you want to generate the pattern like this: > x = (const_vector:VNx1DI [ > (const_int 0 [0]) > repeat [ > (const_int 0 [0]) > ] > ])
Ah, right, the two subsequences are supposed to be different. > In your codes, when count = 1 for poly_uint16 (1,1). And i < count iteration > make i always is 0. > In this case, i = -i = 0. > > So we will end up with the first value = 0, and repeat value is also 0, in > "rtx x = builder.build ();". > Because GCC thinks all the value are 0, so the pattern is changed as follows > which is not a stepped vector: > > x = (const_vector:VNx1DI repeat [ > (const_int 0 [0]) > ]) > > This is a const_vector duplicate vector. > This is not the correct pattern you want so it fails in the following check. > > I think the test pattern generator goes wrong in poly_uint16 (1,1) which is > the only corner case will go wrong. > > To fix this, we should adjust test pattern generator to specifically > poly_uint16 (1,1). We should make first value > and repeat value different. I tried 2 solution and they both pass for > poly_uint16 (1,1): > > ================ > Original codes: > rtx_vector_builder builder (inner_mode, count, 2); > for (unsigned int i = 0; i < count; ++i) > builder.quick_push (gen_int_mode (i, int_mode)); > for (unsigned int i = 0; i < count; ++i) > builder.quick_push (gen_int_mode (-(int) i, int_mode)); > ================ > > ================ > 1st solution: > rtx_vector_builder builder (inner_mode, count, 2); > for (unsigned int i = 0; i < count; ++i) > builder.quick_push (gen_int_mode (count == 1 ? 1 : i, int_mode)); > for (unsigned int i = 0; i < count; ++i) > builder.quick_push (gen_int_mode (-(int) i, int_mode)); > > x = (const_vector:VNx1DI [ > (const_int 0 [0]) > repeat [ > (const_int 1 [0x1]) > ] > ]) > ================ > > ================ > 2nd solution: > rtx_vector_builder builder (inner_mode, count, 2); > for (unsigned int i = 0; i < count; ++i) > builder.quick_push (gen_int_mode (i, int_mode)); > for (unsigned int i = 0; i < count; ++i) > builder.quick_push (gen_int_mode (count == 1 ? -1 : -(int) i, int_mode)); I don't think we need to single out count == 1. It should be OK to use -1 - (int) i unconditionally. Thanks, Richard > x= > (const_vector:VNx1DI [ > (const_int 0 [0]) > repeat [ > (const_int -1 [0xffffffffffffffff]) > ] > ]) > ================ > > Do you agree with these solution? Or you could offer a better solution to fix > this? Thank you so much. > > > juzhe.zh...@rivai.ai > > From: Richard Sandiford > Date: 2022-08-19 20:52 > To: juzhe.zhong\@rivai.ai > CC: gcc-patches; rguenther; kito.cheng > Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) > and allow the machine_mode definition with poly_uint16 (1, 1) > "juzhe.zh...@rivai.ai" <juzhe.zh...@rivai.ai> writes: >>> Ah, right, sorry for the bogus suggestion. >>> In that case, what specifically goes wrong? Which test in >>> test_vector_subregs_modes fails, and what does the ICE look like? >> >> Because may_be (nunits, 1) return true. For nunits = (1,1), it will execute >> test_vector_subregs_modes. >> The fail ICE: >> ../../../riscv-gcc/gcc/simplify-rtx.cc:8396: test_vector_subregs_modes: >> FAIL: ASSERT_RTX_EQ (expected, simplify_subreg (QImode, x, inner_mode, byte)) >> expected: (nil) >> >> actual: (const_int 0 [0]) >> cc1: internal compiler error: in assert_rtx_eq_at, at selftest-rtl.cc:57 >> 0x1304ee1 selftest::assert_rtx_eq_at(selftest::location const&, char const*, >> rtx_def*, rtx_def*) >> ../../../riscv-gcc/gcc/selftest-rtl.cc:57 >> 0x1332504 test_vector_subregs_modes >> ../../../riscv-gcc/gcc/simplify-rtx.cc:8395 >> 0x1332988 test_vector_subregs_fore_back >> ../../../riscv-gcc/gcc/simplify-rtx.cc:8442 >> 0x1332ae7 test_vector_subregs >> ../../../riscv-gcc/gcc/simplify-rtx.cc:8467 >> 0x1332c57 test_vector_ops >> ../../../riscv-gcc/gcc/simplify-rtx.cc:8487 >> 0x1332c7b selftest::simplify_rtx_cc_tests() >> ../../../riscv-gcc/gcc/simplify-rtx.cc:8547 >> 0x21318fa selftest::run_tests() >> ../../../riscv-gcc/gcc/selftest-run-tests.cc:115 >> 0x1362a76 toplev::run_self_tests() >> ../../../riscv-gcc/gcc/toplev.cc:2205 >> >> I analyzed the codes: >> In test_vector_subregs_fore_back, when nunits = (1,1). The expected = >> NULL_RTX and simplify_subreg (QImode, x, inner_mode, byte) = const_int 0. >> So the assertion fails. > > Hmm, ok, so the subreg operation is unexpected succeeding. > >> This is the test for stepped vector using 2 element per pattern. For >> poly_uint16 (1,1), it's true it is possible only has 1 element. > > The stepped case is 3 elements per pattern rather than 2. In a stepped > pattern: a, b, b+n are represented explicitly, then the rest are > implicitly b+n*2, b+n*3, etc. > > The case being handled by this code is instead the 2-element case: > a, b are represented explicitly, then the rest are implicitly all b. > > Why is (1,1) different though? The test is constructing: > > nunits: 1 + 1x > shape: nelts_per_pattern == 2, npatterns == 1 > elements: a, b[, b, b, b, b, ...] > > It then tests subregs starting at 0 + 1x (so starting at the first b). > But for (2,2) we should have: > > nunits: 2 + 2x > shape: nelts_per_pattern == 2, npatterns == 2 > elements: a1, a2, b1, b2[, b1, b2, b1, b2, ...] > > and we should test subregs starting at 0 + 2x (so starting at the > first b1). The two cases should be very similar, it's just that the > (2,2) case doubles the number of patterns. > >> I think it makes sense to fail the test. However for poly (1,1) machine >> mode, can we have the chance that some target define this >> kind of machine mode only used for intrinsics? I already developed full RVV >> support in GCC (including intrinsc and auto-vectorization). >> I only enable auto-vectorization with mode larger than (2,2) and test it >> fully. >> From my experience, it seems the stepped vector only created during VLA >> auto-vectorization. And I think only allow poly (1,1)mode used in intrinsics >> will >> not create issues. Am I understanding wrong ?Feel free to correct me. Thanks >> ~ > > Following on from what I said above, it doesn't look like this particular > case is related to stepped vectors. > > (1,1) shouldn't (need to) be a special case though. Any potentital > problems that would occur for (1,1) with npatterns==1 would also occur > for (n,n) with npatterns==n. E.g. if stepped vectors are problematic > for (1,1) then an interleaving of 2 stepped vectors (i.e. npatterns==2) > would be problematic for (2,2). > > So yeah, preventing a mode being used for autovectorisation would allow > the target to have a bit more control over which constants are actually > generated. But it shouldn't be necessary to do that for correctness. > > Thanks, > Richard > >> juzhe.zh...@rivai.ai >> >> From: Richard Sandiford >> Date: 2022-08-19 17:35 >> To: juzhe.zhong\@rivai.ai >> CC: rguenther; gcc-patches; kito.cheng >> Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, >> 1) and allow the machine_mode definition with poly_uint16 (1, 1) >> "juzhe.zh...@rivai.ai" <juzhe.zh...@rivai.ai> writes: >>> Hi, Richard. I tried the codes: >>> if (!nunits.is_constant () && maybe_gt (nunits, 1)) >>> test_vector_subregs_modes (x, nunits - min_nunits, count); >>> >>> It still failed. For nunits = (1,1) , maybe_gt (nunits, 1) return true >>> value. >> >> Ah, right, sorry for the bogus suggestion. >> >> In that case, what specifically goes wrong? Which test in >> test_vector_subregs_modes fails, and what does the ICE look like? >> >> Thanks, >> Richard >> >>> But I tried: >>> if (!nunits.is_constant () && known_gt (nunits, 1)) >>> test_vector_subregs_modes (x, nunits - min_nunits, count); >>> It pass. But it report a warning: "warning: comparison between signed and >>> unsigned integer expressions [-Wsign-compare]" during the compilation. >>> >>> Finally, I tried: >>> if (!nunits.is_constant () && known_gt (GET_MODE_NUNITS (inner_mode), 1)) >>> test_vector_subregs_modes (x, nunits - min_nunits, count); >>> It passed with no warning. >>> >>> Is 'known_gt (GET_MODE_NUNITS (inner_mode), 1)' a good solution for this? >>> Thanks! >>> >>> >>> juzhe.zh...@rivai.ai >>> >>> From: Richard Sandiford >>> Date: 2022-08-19 16:03 >>> To: juzhe.zhong >>> CC: gcc-patches; rguenther; kito.cheng >>> Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, >>> 1) and allow the machine_mode definition with poly_uint16 (1, 1) >>> juzhe.zh...@rivai.ai writes: >>>> From: zhongjuzhe <juzhe.zh...@rivai.ai> >>>> >>>> Hello. This patch is preparing for following RVV support. >>>> >>>> Both ARM SVE and RVV (RISC-V 'V' Extension) support length-agnostic vector. >>>> The minimum vector length of ARM SVE is 128-bit and the runtime invariant >>>> of ARM SVE is always 128-bit blocks. >>>> However, the minimum vector length of RVV can be 32bit in 'Zve32*' >>>> sub-extension and 64bit in 'Zve*' sub-extension. >>>> >>>> So I define the machine_mode as follows: >>>> VECTOR_MODE_WITH_PREFIX (VNx, INT, DI, 1, 0); >>>> ADJUST_NUNITS (MODE, riscv_vector_chunks); >>>> The riscv_vector_chunks = poly_uint16 (1, 1) >>>> >>>> The compilation is failed for the stepped vector test: >>>> (const_vector:VNx1DI repeat [ >>>> (const_int 8 [0x8]) >>>> (const_int 7 [0x7]) >>>> ]) >>>> >>>> I understand for stepped vector should always have aleast 2 elements and >>>> stepped vector initialization is common >>>> for VLA (vector-lengthe agnostic) auto-vectorization. It makes sense that >>>> report fail for stepped vector of poly_uint16 (1, 1). >>>> >>>> machine mode with nunits = poly_uint16 (1, 1) needs to implemented in >>>> intrinsics. And I would like to enable RVV auto-vectorization >>>> with vector mode only nunits is larger than poly_uint16 (2, 2) in RISC-V >>>> backend. I think it will not create issue if we define >>>> vector mode with nunits = poly_uint16 (1, 1). Feel free to correct me or >>>> offer me some other better solutions. Thanks! >>>> >>>> >>>> >>>> gcc/ChangeLog: >>>> >>>> * simplify-rtx.cc (test_vector_subregs_fore_back): skip test for >>>> poly_uint16 (1, 1). >>>> >>>> --- >>>> gcc/simplify-rtx.cc | 2 +- >>>> 1 file changed, 1 insertion(+), 1 deletion(-) >>>> >>>> diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc >>>> index 7d09bf7103d..61e0dfa00d0 100644 >>>> --- a/gcc/simplify-rtx.cc >>>> +++ b/gcc/simplify-rtx.cc >>>> @@ -8438,7 +8438,7 @@ test_vector_subregs_fore_back (machine_mode >>>> inner_mode) >>>> rtx x = builder.build (); >>>> >>>> test_vector_subregs_modes (x); >>>> - if (!nunits.is_constant ()) >>>> + if (!nunits.is_constant () && known_ne (nunits, poly_uint16 (1, 1))) >>>> test_vector_subregs_modes (x, nunits - min_nunits, count); >>> >>> I think instead we should use maybe_gt (nunits, 1), on the basis that >>> the fore_back tests require vectors that have a minimum of 2 elements. >>> Something like poly_uint16 (1, 2) would have the same problem as >>> poly_uint16 (1, 1). ({1, 2} is an unlikely value, but it's OK in >>> principle.) >>> >>> This corresponds to the minimum of 3 elements for the stepped tests: >>> >>> if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT >>> && maybe_gt (GET_MODE_NUNITS (mode), 2)) >>> { >>> test_vector_ops_series (mode, scalar_reg); >>> test_vector_subregs (mode); >>> } >>> >>> Thanks, >>> Richard >>> >> >