On Mon, 17 Jul 2023 at 17:44, Prathamesh Kulkarni
<prathamesh.kulka...@linaro.org> wrote:
>
> Hi Richard,
> This is a reworking of the patch to extend fold_vec_perm to handle VLA vectors.
> The attached patch unifies handling of VLS and VLA vector_csts, while
> using fallback code for ctors.
>
> For VLS vectors, the patch ignores the underlying encoding, and
> uses npatterns = nelts, and nelts_per_pattern = 1.
>
> For VLA patterns, if sel has a stepped sequence, then it
> only chooses elements from a particular pattern of a particular
> input vector.
>
> To make things simpler, the patch imposes the following constraints:
> (a) op0_npatterns, op1_npatterns and sel_npatterns are powers of 2.
> (b) The step size for a stepped sequence is a power of 2, and a
>     multiple of npatterns of the chosen input vector.
> (c) Runtime vector length of sel is a multiple of sel_npatterns.
>     So, we don't handle sel.length = 2 + 2x and npatterns = 4.
>
> Eg:
> op0, op1: npatterns = 2, nelts_per_pattern = 3
> op0_len = op1_len = 16 + 16x.
> sel = { 0, 0, 2, 0, 4, 0, ... }
> npatterns = 2, nelts_per_pattern = 3.
>
> For pattern {0, 2, 4, ...}
> Let,
> a1 = 2
> S = step size = 2
>
> Let Esel denote the number of elements per pattern in sel at runtime.
> Esel = (16 + 16x) / npatterns_sel
>      = (16 + 16x) / 2
>      = (8 + 8x)
>
> So, last element of pattern:
> ae = a1 + (Esel - 2) * S
>    = 2 + (8 + 8x - 2) * 2
>    = 14 + 16x
>
> a1 /trunc op0_len = 2 / (16 + 16x) = 0
> ae /trunc op0_len = (14 + 16x) / (16 + 16x) = 0
> Since both are equal with quotient = 0, we select elements from op0.
>
> Since step size (S) is a multiple of npatterns(op0), we select
> all elements from the same pattern of op0.
>
> res_npatterns = max (op0_npatterns, max (op1_npatterns, sel_npatterns))
>               = max (2, max (2, 2))
>               = 2
>
> res_nelts_per_pattern = max (op0_nelts_per_pattern,
>                              max (op1_nelts_per_pattern,
>                                   sel_nelts_per_pattern))
>                       = max (3, max (3, 3))
>                       = 3
>
> So res has encoding with npatterns = 2, nelts_per_pattern = 3.
> res: { op0[0], op0[0], op0[2], op0[0], op0[4], op0[0], ... }
>
> Unfortunately, this results in an issue for a poly_int_cst index:
> For example,
> op0, op1: npatterns = 1, nelts_per_pattern = 3
> op0_len = op1_len = 4 + 4x
>
> sel: { 4 + 4x, 5 + 4x, 6 + 4x, ... } // should choose op1
>
> In this case,
> a1 = 5 + 4x
> S = (6 + 4x) - (5 + 4x) = 1
> Esel = 4 + 4x
>
> ae = a1 + (Esel - 2) * S
>    = (5 + 4x) + (4 + 4x - 2) * 1
>    = 7 + 8x
>
> IIUC, 7 + 8x will always be the index of the last element of op1 ?
> if x = 0, len = 4, 7 + 8x = 7
> if x = 1, len = 8, 7 + 8x = 15, etc.
> So the stepped sequence will always choose elements
> from op1 regardless of vector length for the above case ?
>
> However,
> ae /trunc op0_len
>    = (7 + 8x) / (4 + 4x)
> which is not defined because 7/4 != 8/4
> and we return NULL_TREE, but I suppose the expected result would be:
> res: { op1[0], op1[1], op1[2], ... } ?
>
> The patch passes bootstrap+test on aarch64-linux-gnu with and without sve,
> and on x86_64-unknown-linux-gnu.
> I would be grateful for suggestions on how to proceed.

Hi Richard,
ping: https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624675.html
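
For reference, the two worked examples in the quoted mail can be checked
numerically. This is only a rough sketch, not code from the patch: it
evaluates each a + b*x expression for a few concrete values of x instead of
dividing poly_ints symbolically, so it is a sanity check over sampled x and
not a proof. The helper names (poly, last_index, selected_input) are
hypothetical and do not exist in GCC.

```python
def poly(c0, c1):
    """Return a function evaluating c0 + c1*x for a runtime multiple x >= 0."""
    return lambda x: c0 + c1 * x


def last_index(a1, esel, step):
    """ae = a1 + (Esel - 2) * S, the formula used in the quoted mail."""
    return lambda x: a1(x) + (esel(x) - 2) * step


def selected_input(a1, esel, step, op_len, xs=range(8)):
    """Return 0 (op0) or 1 (op1) if a1 and ae fall in the same input vector
    for every sampled x, or None if the quotient differs anywhere.

    All values here are non-negative, so // matches truncating division.
    """
    ae = last_index(a1, esel, step)
    quotients = set()
    for x in xs:
        quotients.add(a1(x) // op_len(x))
        quotients.add(ae(x) // op_len(x))
    return quotients.pop() if len(quotients) == 1 else None


# Example 1: len = 16 + 16x, a1 = 2, S = 2, Esel = 8 + 8x
ex1 = selected_input(poly(2, 0), poly(8, 8), 2, poly(16, 16))

# Example 2: len = 4 + 4x, a1 = 5 + 4x, S = 1, Esel = 4 + 4x
ex2 = selected_input(poly(5, 4), poly(4, 4), 1, poly(4, 4))

print(ex1, ex2)  # prints "0 1"
```

For the sampled x, the first sequence stays inside op0 and the second stays
inside op1, which matches the expectation stated in the quoted mail that
7 + 8x is always the last index of op1.
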
Thanks,
Prathamesh
>
> Thanks,
> Prathamesh