On Tue, 14 May 2019, Richard Sandiford wrote:

> Richard Biener <rguent...@suse.de> writes:
> > The following makes SSA rewrite (update-address-taken) recognize
> > sets of aligned sub-vectors in aligned position
> > (v2qi into v16qi, but esp. v8qi into v16qi).  It uses the
> > BIT_INSERT_EXPR support for this, enabling that for vector
> > typed values.  This makes us turn for example
> >
> > typedef unsigned char v16qi __attribute__((vector_size(16)));
> > v16qi load (const void *p)
> > {
> >   v16qi r;
> >   __builtin_memcpy (&r, p, 8);
> >   return r;
> > }
> >
> > into the following
> >
> > load (const void * p)
> > {
> >   v16qi r;
> >   long unsigned int _3;
> >   v16qi _5;
> >   vector(8) unsigned char _7;
> >
> >   <bb 2> :
> >   _3 = MEM[(char * {ref-all})p_2(D)];
> >   _7 = VIEW_CONVERT_EXPR<vector(8) unsigned char>(_3);
> >   r_9 = BIT_INSERT_EXPR <r_8(D), _7, 0 (64 bits)>;
> >   _5 = r_9;
> >   return _5;
> >
> > this isn't yet nicely expanded since the BIT_INSERT_EXPR
> > expansion simply goes through store_bit_field and there's
> > no vector-mode vec_set.
> >
> > Similar as to the single-element insert SSA rewrite already
> > handles the transform is conditional on the involved
> > vector types having non-BLKmode.  This is somewhat bad
> > since the transform is supposed to enable SSA optimizations
> > by rewriting memory vectors into SSA form.  Since splitting
> > of larger generic vectors happens very much later only
> > this pessimizes their use.  But the BIT_INSERT_EXPR
> > expansion doesn't cope with BLKmode entities (source or
> > destination).
> >
> > Extending BIT_INSERT_EXPR this way seems natural given
> > the support of CONSTRUCTORs with smaller vectors.
> > BIT_FIELD_REF isn't particularly restricted so can be
> > used to extract sub-vectors as well.
> >
> > Code generation is as bad as before (RTL expansion eventually
> > spills) but SSA optimizations are enabled on less trivial
> > testcases.
> >
> > Boostrap / regtest running on x86_64-unknown-linux-gnu.
> >
> > Comments?
> >
> > Richard.
> >
> > 2019-05-14  Richard Biener  <rguent...@suse.de>
> >
> >     PR tree-optimization/90424
> >     * tree-ssa.c (non_rewritable_lvalue_p): Handle inserts from
> >     aligned subvectors.
> >     (execute_update_addresses_taken): Likewise.
> >     * tree-cfg.c (verify_gimple_assign_ternary): Likewise.
> >
> >     * g++.target/i386/pr90424-1.C: New testcase.
> >     * g++.target/i386/pr90424-2.C: Likewise.
> >
> > Index: gcc/tree-ssa.c
> > ===================================================================
> > --- gcc/tree-ssa.c  (revision 271155)
> > +++ gcc/tree-ssa.c  (working copy)
> > @@ -1521,14 +1521,28 @@ non_rewritable_lvalue_p (tree lhs)
> >        if (DECL_P (decl)
> >       && VECTOR_TYPE_P (TREE_TYPE (decl))
> >       && TYPE_MODE (TREE_TYPE (decl)) != BLKmode
> > -     && operand_equal_p (TYPE_SIZE_UNIT (TREE_TYPE (lhs)),
> > -                         TYPE_SIZE_UNIT (TREE_TYPE (TREE_TYPE (decl))), 0)
> > +     && multiple_of_p (sizetype,
> > +                       TYPE_SIZE_UNIT (TREE_TYPE (decl)),
> > +                       TYPE_SIZE_UNIT (TREE_TYPE (lhs)))
> >       && known_ge (mem_ref_offset (lhs), 0)
> >       && known_gt (wi::to_poly_offset (TYPE_SIZE_UNIT (TREE_TYPE (decl))),
> >                    mem_ref_offset (lhs))
> >       && multiple_of_p (sizetype, TREE_OPERAND (lhs, 1),
> >                         TYPE_SIZE_UNIT (TREE_TYPE (lhs))))
> > -   return false;
> > +   {
> > +     if (operand_equal_p (TYPE_SIZE_UNIT (TREE_TYPE (lhs)),
> > +                          TYPE_SIZE_UNIT (TREE_TYPE (TREE_TYPE (decl))),
> > +                          0))
> > +       return false;
> > +     /* For sub-vector inserts the insert vector mode has to be
> > +        supported.  */
> > +     tree vtype = build_vector_type
> > +         (TREE_TYPE (TREE_TYPE (decl)),
> > +          tree_to_uhwi (TYPE_SIZE (TREE_TYPE (lhs)))
> > +          / tree_to_uhwi (TYPE_SIZE (TREE_TYPE (TREE_TYPE (decl)))));
> 
> AFAICT nothing guarantees tree_fits_uhwi_p for the lhs, so this isn't
> poly-int clean.  Is there a guarantee that lhs is a multiple of the
> element size even for integers?  Or are we just relying on a vector
> of 0 elements being rejected?

That it is a multiple of the element size is tested indirectly by

> > +     && multiple_of_p (sizetype, 
> > +                       TYPE_SIZE_UNIT (TREE_TYPE (decl)),
> > +                       TYPE_SIZE_UNIT (TREE_TYPE (lhs)))

given 'decl' has vector type.  That also guarantees non-zero size?

But yes, I guess TYPE_SIZE_UNIT might be a poly-int.

> Maybe something like:
> 
>         tree elt_type = TREE_TYPE (TREE_TYPE (decl));
>         unsigned int elt_bits = tree_to_uhwi (TYPE_SIZE (elt_type));
>         poly_uint64 lhs_bits, nelts;
>         if (poly_int_tree_p (TYPE_SIZE (TREE_TYPE (lhs)), &lhs_bits)
>             && multiple_p (lhs_bits, elt_bits, &nelts))
>             {
>             tree vtype = build_vector_type (elt_type, nelts);

This should work.

Of course it even more so makes duplicating all the bits in
two places ugly :/

I guess I might take it as opportunity to refactor the pass
(as excuse to not work on that SLP thing...).

Thanks,
Richard.

Reply via email to