On November 4, 2019 4:30:57 PM GMT+01:00, Joel Hutton <joel.hut...@arm.com> wrote: >First copy bounced from mailing list. > > > On 30/10/2019 13:49, Richard Biener wrote: > > > >> >> > >> * expr.c (store_constructor): Add case for constructor >of > > > vectors. > > > > Why do you need this? The vectorizer already creates such >CTORs. Any > > > > testcase that you can show? > > > > > > > > typedef long v2di __attribute__((vector_size(16))); > > > > v2di v; > > > > void > > > > foo() > > > > { > > > > v = (v2di){v[1], v[0]}; > > > > } >> > There is a special case for single element vectors, where the >vector > > > mode and > > > the element mode are the same. >> > I've changed this slightly, as your solution of using VECTOR_MODE_P > >didn't > > > work for my case where: > > > emode = E_DImode > > > eltmode = E_DImode > > > On aarch64. As E_DImode is not a vector mode. >> > Based on some checks I found in verify_gimple, I set the type of >the > > > replacement >> > constructor to the same as the original constructor, as >verify_gimple > > > seems to >> > expect flattened types. i.e. a vector of vectors of ints should >have > > > type vector > > > of ints. > > > > Huh. On aarch64 for the above testcase I see mode V2DImode and > > emode = eltmode = DImode. That's just a regular CTOR with > > non-vector elements so not sure why you need any adjustment at all > > here? > > > > It looks like your vectorization of the CTOR introduces a > > V2DImode CTOR of vector(1) long elements which incidentially > > have DImode. That's because somehow V2DI vectorization isn't > > profitable but V1DI one is. In the end it's a noop transform > > but the testcase shows that for V2DI vectorization we fail > > to cost the CTOR in the scalar cost computation (you have to > > include the pattern root there I guess). > > >Yes, sorry, I was unclear about this, it's the new constructor that the > >change is needed for. > > > Technically we feed valid GIMPLE to the expander so we indeed > > shouldn't ICE. So I'd like to have the earlier keying on > > vec_vec_init_p match the later assert we run into, thus > > sth like > > > > Index: gcc/expr.c > > =================================================================== > > --- gcc/expr.c (revision 277765) > > +++ gcc/expr.c (working copy) > > @@ -6809,6 +6809,7 @@ store_constructor (tree exp, rtx target, > > && n_elts.is_constant (&const_n_elts)) > > { > > machine_mode emode = eltmode; > > + bool vector_typed_elts_p = false; > > > > if (CONSTRUCTOR_NELTS (exp) > > && (TREE_CODE (TREE_TYPE (CONSTRUCTOR_ELT (exp, > > 0)->value)) > > @@ -6819,13 +6820,14 @@ store_constructor (tree exp, rtx target, >> * TYPE_VECTOR_SUBPARTS (etype), > > n_elts)); > > emode = TYPE_MODE (etype); > > + vector_typed_elts_p = true; > > } >> icode = convert_optab_handler (vec_init_optab, mode, >emode); > > if (icode != CODE_FOR_nothing) > > { > > unsigned int n = const_n_elts; > > > > - if (emode != eltmode) > > + if (vector_typed_elts_p) > > { > > n = CONSTRUCTOR_NELTS (exp); > > vec_vec_init_p = true; > > > > >I've used this, thank you. > > > > > @@ -4061,15 +4201,42 @@ vect_schedule_slp (vec_info *vinfo) > > > > if (is_a <loop_vec_info> (vinfo)) > > > > >> > > the ChangeLog doesn't mention this. I guess this isn't >necessary > > > > unless you fail vectorizing late which you shouldn't. > > > > > > > It's necessary to avoid: > > > > > > removing stmts twice for constructors of the form >{_1,_1,_1,_1} >> > removing stmts that define ssa names used elsewhere (which >> > previously wasn't a consideration because the scalar_stmts were >stores > > > to memory, not assignments to ssa names) > > > > OK, but the code you are patching is just supposed to remove stores > > from the final scalar stmt seeds - it doesn't expect any loads there > > which is probably what happens. I'd instead add a > > > > /* Do not CSE the stmts feeding a CTOR, they might have uses > > outside of the vectorized stmts. */ > > if (SLP_INSTANCE_ROOT_STMT (instance)) > > continue; > > > > before the loop over SLP_TREE_SCALAR_STMTS. >Done. > > > + if (!subparts.is_constant () || !(subparts.to_constant () >> + == CONSTRUCTOR_NELTS >(rhs))) > > + continue; > > > > can be better written as if (maybe_ne (subparts, CONSTRUCTOR_NELTS >(rhs))) >Done. > > > +/* Vectorize the instance root. Return success or failure. */ > > + > > +void > > > > since it's now void the comment has to be adjusted. >Done. > > > + vect_schedule_slp_instance (node, > > instance, bst_map); > > > > this now fits in one line. >Done. > > > OK with those changes. >Tested and bootstrapped on aarch64. >OK?
Ok. Thanks, Richard. >2019-11-04 Joel Hutton <joel.hut...@arm.com> > > * expr.c (store_constructor): Modify to handle single element >vectors. > * tree-vect-slp.c (vect_analyze_slp_instance): Add case for >vector constructors. > (vect_slp_check_for_constructors): New function. > (vect_slp_analyze_bb_1): Call new function to check for vector >constructors. > (vectorize_slp_instance_root_stmt): New function. > (vect_schedule_slp): Call new function to vectorize root stmt >of vector constructors. > * tree-vectorizer.h (SLP_INSTANCE_ROOT_STMT): New field. > >gcc/testsuite/ChangeLog: > >2019-11-04 Joel Hutton <joel.hut...@arm.com> > > * gcc.dg/vect/bb-slp-40.c: New test. > * gcc.dg/vect/bb-slp-41.c: New test.