> The following experiment resulted from looking at making
> array_ref_low_bound and array_ref_element_size non-mutating.  Again
> I wondered why we do this strange scaling by offset/element alignment.

The idea is to expose the alignment factor to the RTL expander:

        tree tem
          = get_inner_reference (exp, &bitsize, &bitpos, &offset, &mode1,
                                 &unsignedp, &reversep, &volatilep, true);

[...]

            rtx offset_rtx = expand_expr (offset, NULL_RTX, VOIDmode,
                                          EXPAND_SUM);

[...]

            op0 = offset_address (op0, offset_rtx,
                                  highest_pow2_factor (offset));

With the scaling, offset is something like _69 * 4, so highest_pow2_factor can 
see the factor, which is then passed down to offset_address:

(gdb) p debug_rtx(op0)
(mem/c:SI (plus:SI (reg/f:SI 193)
        (reg:SI 194)) [3 *s.16_63 S4 A32])

With your patch in the same situation:

(gdb) p debug_rtx(op0)
(mem/c:SI (plus:SI (reg/f:SI 139)
        (reg:SI 116 [ _33 ])) [3 *s.16_63 S4 A8])
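
To make the difference concrete, here is a minimal sketch in C of why the
shape of the offset matters.  It is a toy model, not GCC's actual
highest_pow2_factor (which works on trees); the toy_* names and the
expression representation are hypothetical and exist only for illustration.
A bare variable yields a factor of 1 byte, while the same variable scaled
by 4 yields a factor of 4 bytes, which corresponds to the A8 and A32 in the
two dumps above.

```c
#include <stdint.h>

/* Toy expression tree; NOT GCC's tree representation.  */
enum toy_code { TOY_VAR, TOY_CONST, TOY_MULT, TOY_PLUS };

struct toy_expr
{
  enum toy_code code;
  uint64_t value;                    /* used by TOY_CONST only */
  const struct toy_expr *op0, *op1;  /* used by TOY_MULT and TOY_PLUS */
};

/* Largest power of 2 known to divide E, judged from the shape of the
   expression alone.  Overflow of the product is ignored in this toy.  */
static uint64_t
toy_pow2_factor (const struct toy_expr *e)
{
  uint64_t f0, f1;

  switch (e->code)
    {
    case TOY_CONST:
      /* Lowest set bit of the constant; be conservative for zero.  */
      return e->value ? (e->value & -e->value) : 1;

    case TOY_MULT:
      /* Factors of both operands multiply.  */
      return toy_pow2_factor (e->op0) * toy_pow2_factor (e->op1);

    case TOY_PLUS:
      /* A sum is only known to be a multiple of the smaller factor.  */
      f0 = toy_pow2_factor (e->op0);
      f1 = toy_pow2_factor (e->op1);
      return f0 < f1 ? f0 : f1;

    default:
      /* Nothing is known about a bare variable.  */
      return 1;
    }
}
```

In this model an offset of the form "variable * 4" gives 4, whereas the
already-multiplied variable on its own gives 1, because the multiplication
is no longer visible in the expression.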

On strict-alignment targets, this makes a big difference, e.g. SPARC:

        ld      [%i4+%i5], %i0

vs

        ldub    [%i5+%i4], %g1
        sll     %g1, 24, %g1
        add     %i5, %i4, %i5
        ldub    [%i5+1], %i0
        sll     %i0, 16, %i0
        or      %i0, %g1, %i0
        ldub    [%i5+2], %g1
        sll     %g1, 8, %g1
        or      %g1, %i0, %g1
        ldub    [%i5+3], %i0
        or      %i0, %g1, %i0
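
For readers less familiar with SPARC assembly, the second sequence does
roughly what the following C sketch does: it assembles a big-endian 32-bit
value from four separate byte loads because the address is only known to be
byte-aligned.  The function name is hypothetical; this is an illustration of
the generated code's effect, not something taken from GCC.

```c
#include <stdint.h>

/* Read a big-endian 32-bit value one byte at a time, so that no
   alignment of P is required.  Mirrors the ldub/sll/or sequence.  */
static uint32_t
load_be32_bytewise (const unsigned char *p)
{
  return ((uint32_t) p[0] << 24)
         | ((uint32_t) p[1] << 16)
         | ((uint32_t) p[2] << 8)
         | (uint32_t) p[3];
}
```

When the address is known to be 4-byte aligned, the whole function
collapses to the single ld shown first.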


Now this is mitigated by a couple of things:

  1. the above pessimization only happens on the RHS; on the LHS, the expander 
calls highest_pow2_factor_for_target instead of highest_pow2_factor and the 
former takes into account the type's alignment thanks to the MAX:

/* Similar, except that the alignment requirements of TARGET are
   taken into account.  Assume it is at least as aligned as its
   type, unless it is a COMPONENT_REF in which case the layout of
   the structure gives the alignment.  */

static unsigned HOST_WIDE_INT
highest_pow2_factor_for_target (const_tree target, const_tree exp)
{
  unsigned HOST_WIDE_INT talign = target_align (target) / BITS_PER_UNIT;
  unsigned HOST_WIDE_INT factor = highest_pow2_factor (exp);

  return MAX (factor, talign);
}

  2. highest_pow2_factor can be rescued by the set_nonzero_bits machinery of 
the SSA CCP pass because it calls tree_ctz.  The above example was compiled 
with -O -fno-tree-ccp on SPARC; at -O, the code isn't pessimized.
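
Both mitigations can be sketched in a few lines of C.  This is a simplified
model under stated assumptions, not GCC code: the toy_* names are
hypothetical, the first function stands in for what tree_ctz can derive once
CCP has recorded which bits of the offset may be nonzero, and the second
mirrors the MAX in highest_pow2_factor_for_target with alignments expressed
in bytes.

```c
#include <stdint.h>

/* Power-of-2 factor implied by a mask of the bits that may be nonzero.
   If the low N bits are known to be zero, the value is a multiple of
   2**N.  An all-zero mask is handled conservatively in this toy.  */
static uint64_t
toy_factor_from_nonzero_bits (uint64_t nonzero_mask)
{
  if (nonzero_mask == 0)
    return 1;
  return nonzero_mask & -nonzero_mask;
}

/* LHS case: never assume less than the alignment of the target's type.  */
static uint64_t
toy_factor_for_target (uint64_t factor, uint64_t type_align_bytes)
{
  return factor > type_align_bytes ? factor : type_align_bytes;
}
```

For instance, if the low two bits of the offset are known to be zero (mask
0xfffffffc), the first function recovers a factor of 4 even though no
explicit multiplication by 4 remains in the expression.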

> So - the following patch gets rid of that scaling.  For a "simple"
> C testcase
> 
> void bar (void *);
> void foo (int n)
> {
>   struct S { struct R { int b[n]; } a[2]; int k; } s;
>   s.k = 1;
>   s.a[1].b[7] = 3;
>   bar (&s);
> }

This only exposes the LHS case; here's a more complete testcase:

void bar (void *);

int foo (int n)
{
  struct S { struct R { char b[n]; } a[2]; int k; } s;
  s.k = 1;
  s.a[1].b[7] = 3;
  bar (&s);
  return s.k;
}

-- 
Eric Botcazou
