Richard Biener <rguent...@suse.de> writes: > On Tue, 14 May 2019, Richard Sandiford wrote: > >> Richard Biener <rguent...@suse.de> writes: >> > The following makes SSA rewrite (update-address-taken) recognize >> > sets of aligned sub-vectors in aligned position >> > (v2qi into v16qi, but esp. v8qi into v16qi). It uses the >> > BIT_INSERT_EXPR support for this, enabling that for vector >> > typed values. This makes us turn for example >> > >> > typedef unsigned char v16qi __attribute__((vector_size(16))); >> > v16qi load (const void *p) >> > { >> > v16qi r; >> > __builtin_memcpy (&r, p, 8); >> > return r; >> > } >> > >> > into the following >> > >> > load (const void * p) >> > { >> > v16qi r; >> > long unsigned int _3; >> > v16qi _5; >> > vector(8) unsigned char _7; >> > >> > <bb 2> : >> > _3 = MEM[(char * {ref-all})p_2(D)]; >> > _7 = VIEW_CONVERT_EXPR<vector(8) unsigned char>(_3); >> > r_9 = BIT_INSERT_EXPR <r_8(D), _7, 0 (64 bits)>; >> > _5 = r_9; >> > return _5; >> > >> > this isn't yet nicely expanded since the BIT_INSERT_EXPR >> > expansion simply goes through store_bit_field and there's >> > no vector-mode vec_set. >> > >> > Similar as to the single-element insert SSA rewrite already >> > handles the transform is conditional on the involved >> > vector types having non-BLKmode. This is somewhat bad >> > since the transform is supposed to enable SSA optimizations >> > by rewriting memory vectors into SSA form. Since splitting >> > of larger generic vectors happens very much later only >> > this pessimizes their use. But the BIT_INSERT_EXPR >> > expansion doesn't cope with BLKmode entities (source or >> > destination). >> > >> > Extending BIT_INSERT_EXPR this way seems natural given >> > the support of CONSTRUCTORs with smaller vectors. >> > BIT_FIELD_REF isn't particularly restricted so can be >> > used to extract sub-vectors as well. >> > >> > Code generation is as bad as before (RTL expansion eventually >> > spills) but SSA optimizations are enabled on less trivial >> > testcases. >> > >> > Boostrap / regtest running on x86_64-unknown-linux-gnu. >> > >> > Comments? >> > >> > Richard. >> > >> > 2019-05-14 Richard Biener <rguent...@suse.de> >> > >> > PR tree-optimization/90424 >> > * tree-ssa.c (non_rewritable_lvalue_p): Handle inserts from >> > aligned subvectors. >> > (execute_update_addresses_taken): Likewise. >> > * tree-cfg.c (verify_gimple_assign_ternary): Likewise. >> > >> > * g++.target/i386/pr90424-1.C: New testcase. >> > * g++.target/i386/pr90424-2.C: Likewise. >> > >> > Index: gcc/tree-ssa.c >> > =================================================================== >> > --- gcc/tree-ssa.c (revision 271155) >> > +++ gcc/tree-ssa.c (working copy) >> > @@ -1521,14 +1521,28 @@ non_rewritable_lvalue_p (tree lhs) >> > if (DECL_P (decl) >> > && VECTOR_TYPE_P (TREE_TYPE (decl)) >> > && TYPE_MODE (TREE_TYPE (decl)) != BLKmode >> > - && operand_equal_p (TYPE_SIZE_UNIT (TREE_TYPE (lhs)), >> > - TYPE_SIZE_UNIT (TREE_TYPE (TREE_TYPE (decl))), 0) >> > + && multiple_of_p (sizetype, >> > + TYPE_SIZE_UNIT (TREE_TYPE (decl)), >> > + TYPE_SIZE_UNIT (TREE_TYPE (lhs))) >> > && known_ge (mem_ref_offset (lhs), 0) >> > && known_gt (wi::to_poly_offset (TYPE_SIZE_UNIT (TREE_TYPE (decl))), >> > mem_ref_offset (lhs)) >> > && multiple_of_p (sizetype, TREE_OPERAND (lhs, 1), >> > TYPE_SIZE_UNIT (TREE_TYPE (lhs)))) >> > - return false; >> > + { >> > + if (operand_equal_p (TYPE_SIZE_UNIT (TREE_TYPE (lhs)), >> > + TYPE_SIZE_UNIT (TREE_TYPE (TREE_TYPE (decl))), >> > + 0)) >> > + return false; >> > + /* For sub-vector inserts the insert vector mode has to be >> > + supported. */ >> > + tree vtype = build_vector_type >> > + (TREE_TYPE (TREE_TYPE (decl)), >> > + tree_to_uhwi (TYPE_SIZE (TREE_TYPE (lhs))) >> > + / tree_to_uhwi (TYPE_SIZE (TREE_TYPE (TREE_TYPE (decl))))); >> >> AFAICT nothing guarantees tree_fits_uhwi_p for the lhs, so this isn't >> poly-int clean. Is there a guarantee that lhs is a multiple of the >> element size even for integers? Or are we just relying on a vector >> of 0 elements being rejected? > > That it is a multiple of the element size is tested indirectly by > >> > + && multiple_of_p (sizetype, >> > + TYPE_SIZE_UNIT (TREE_TYPE (decl)), >> > + TYPE_SIZE_UNIT (TREE_TYPE (lhs))) > > given 'decl' has vector type. That also guarantees non-zero size?
I was thinking of the case where the element size is still bigger than lhs, so the division would end up being 0. Maybe one of those conditions indirectly prevents that though. > But yes, I guess TYPE_SIZE_UNIT might be a poly-int. > >> Maybe something like: >> >> tree elt_type = TREE_TYPE (TREE_TYPE (decl)); >> unsigned int elt_bits = tree_to_uhwi (TYPE_SIZE (elt_type)); >> poly_uint64 lhs_bits, nelts; >> if (poly_int_tree_p (TYPE_SIZE (TREE_TYPE (lhs)), &lhs_bits) >> && multiple_p (lhs_bits, elt_bits, &nelts)) >> { >> tree vtype = build_vector_type (elt_type, nelts); > > This should work. > > Of course it even more so makes duplicating all the bits in > two places ugly :/ > > I guess I might take it as opportunity to refactor the pass Sounds good :-) > (as excuse to not work on that SLP thing...). except for that bit. > > Thanks, > Richard.