On Tue, 14 May 2019, Richard Sandiford wrote: > Richard Biener <rguent...@suse.de> writes: > > The following makes SSA rewrite (update-address-taken) recognize > > sets of aligned sub-vectors in aligned position > > (v2qi into v16qi, but esp. v8qi into v16qi). It uses the > > BIT_INSERT_EXPR support for this, enabling that for vector > > typed values. This makes us turn for example > > > > typedef unsigned char v16qi __attribute__((vector_size(16))); > > v16qi load (const void *p) > > { > > v16qi r; > > __builtin_memcpy (&r, p, 8); > > return r; > > } > > > > into the following > > > > load (const void * p) > > { > > v16qi r; > > long unsigned int _3; > > v16qi _5; > > vector(8) unsigned char _7; > > > > <bb 2> : > > _3 = MEM[(char * {ref-all})p_2(D)]; > > _7 = VIEW_CONVERT_EXPR<vector(8) unsigned char>(_3); > > r_9 = BIT_INSERT_EXPR <r_8(D), _7, 0 (64 bits)>; > > _5 = r_9; > > return _5; > > > > this isn't yet nicely expanded since the BIT_INSERT_EXPR > > expansion simply goes through store_bit_field and there's > > no vector-mode vec_set. > > > > Similar as to the single-element insert SSA rewrite already > > handles the transform is conditional on the involved > > vector types having non-BLKmode. This is somewhat bad > > since the transform is supposed to enable SSA optimizations > > by rewriting memory vectors into SSA form. Since splitting > > of larger generic vectors happens very much later only > > this pessimizes their use. But the BIT_INSERT_EXPR > > expansion doesn't cope with BLKmode entities (source or > > destination). > > > > Extending BIT_INSERT_EXPR this way seems natural given > > the support of CONSTRUCTORs with smaller vectors. > > BIT_FIELD_REF isn't particularly restricted so can be > > used to extract sub-vectors as well. > > > > Code generation is as bad as before (RTL expansion eventually > > spills) but SSA optimizations are enabled on less trivial > > testcases. > > > > Boostrap / regtest running on x86_64-unknown-linux-gnu. > > > > Comments? > > > > Richard. > > > > 2019-05-14 Richard Biener <rguent...@suse.de> > > > > PR tree-optimization/90424 > > * tree-ssa.c (non_rewritable_lvalue_p): Handle inserts from > > aligned subvectors. > > (execute_update_addresses_taken): Likewise. > > * tree-cfg.c (verify_gimple_assign_ternary): Likewise. > > > > * g++.target/i386/pr90424-1.C: New testcase. > > * g++.target/i386/pr90424-2.C: Likewise. > > > > Index: gcc/tree-ssa.c > > =================================================================== > > --- gcc/tree-ssa.c (revision 271155) > > +++ gcc/tree-ssa.c (working copy) > > @@ -1521,14 +1521,28 @@ non_rewritable_lvalue_p (tree lhs) > > if (DECL_P (decl) > > && VECTOR_TYPE_P (TREE_TYPE (decl)) > > && TYPE_MODE (TREE_TYPE (decl)) != BLKmode > > - && operand_equal_p (TYPE_SIZE_UNIT (TREE_TYPE (lhs)), > > - TYPE_SIZE_UNIT (TREE_TYPE (TREE_TYPE (decl))), 0) > > + && multiple_of_p (sizetype, > > + TYPE_SIZE_UNIT (TREE_TYPE (decl)), > > + TYPE_SIZE_UNIT (TREE_TYPE (lhs))) > > && known_ge (mem_ref_offset (lhs), 0) > > && known_gt (wi::to_poly_offset (TYPE_SIZE_UNIT (TREE_TYPE (decl))), > > mem_ref_offset (lhs)) > > && multiple_of_p (sizetype, TREE_OPERAND (lhs, 1), > > TYPE_SIZE_UNIT (TREE_TYPE (lhs)))) > > - return false; > > + { > > + if (operand_equal_p (TYPE_SIZE_UNIT (TREE_TYPE (lhs)), > > + TYPE_SIZE_UNIT (TREE_TYPE (TREE_TYPE (decl))), > > + 0)) > > + return false; > > + /* For sub-vector inserts the insert vector mode has to be > > + supported. */ > > + tree vtype = build_vector_type > > + (TREE_TYPE (TREE_TYPE (decl)), > > + tree_to_uhwi (TYPE_SIZE (TREE_TYPE (lhs))) > > + / tree_to_uhwi (TYPE_SIZE (TREE_TYPE (TREE_TYPE (decl))))); > > AFAICT nothing guarantees tree_fits_uhwi_p for the lhs, so this isn't > poly-int clean. Is there a guarantee that lhs is a multiple of the > element size even for integers? Or are we just relying on a vector > of 0 elements being rejected?
That it is a multiple of the element size is tested indirectly by > > + && multiple_of_p (sizetype, > > + TYPE_SIZE_UNIT (TREE_TYPE (decl)), > > + TYPE_SIZE_UNIT (TREE_TYPE (lhs))) given 'decl' has vector type. That also guarantees non-zero size? But yes, I guess TYPE_SIZE_UNIT might be a poly-int. > Maybe something like: > > tree elt_type = TREE_TYPE (TREE_TYPE (decl)); > unsigned int elt_bits = tree_to_uhwi (TYPE_SIZE (elt_type)); > poly_uint64 lhs_bits, nelts; > if (poly_int_tree_p (TYPE_SIZE (TREE_TYPE (lhs)), &lhs_bits) > && multiple_p (lhs_bits, elt_bits, &nelts)) > { > tree vtype = build_vector_type (elt_type, nelts); This should work. Of course it even more so makes duplicating all the bits in two places ugly :/ I guess I might take it as opportunity to refactor the pass (as excuse to not work on that SLP thing...). Thanks, Richard.