Re: [PATCH v4] dse: Remove partial load after full store for high part access[PR71309]

Richard Sandiford Fri, 31 Jul 2020 09:30:19 -0700

Sorry for the slow reply.

luoxhu <luo...@linux.ibm.com> writes:
> Hi Richard,
>
> This is the updated version that could pass all regression test on
> Power9-LE. 
> Just need another  "maybe_lt (GET_MODE_SIZE (new_mode), access_size)"
> before generating shift for store_info->const_rhs to ensure correct
> constant is generated, take testsuite/gfortran1/equiv_2.x for example:
>
> equiv_2.x-equiv_2.f90.264r.cse2:
> 5: r119:DI=0x6867666564636261
> 6: [sfp:DI+0x20]=r119:DI
> 18: r123:SI=0x65646362
> 19: r124:SI=[sfp:DI+0x21]
> 20: r125:CC=cmp(r124:SI,r123:SI)
>
> Bad code due to get SImode subreg0 and shift right 8 bits got 0x646362 in
> insn #32, equiv_2.x-equiv_2.f90.265r.dse1:
> 5:  r119:DI=0x6867666564636261
> 32: r126:SI=0x646362
> 6:  [sfp:DI+0x20]=r119:DI
> 18: r123:SI=0x65646362
> 19: r124:SI=r126:SI
> 20: r125:CC=cmp(r124:SI,r123:SI)
>
> Good code, get 0x65646362 in insn #32, equiv_2.x-equiv_2.f90.265r.dse1:
> 5:  r119:DI=0x6867666564636261
> 32: r126:SI=0x65646362
> 6:  [sfp:DI+0x20]=r119:DI
> 18: r123:SI=0x65646362
> 19: r124:SI=r126:SI
> 20: r125:CC=cmp(r124:SI,r123:SI)
>
> So the new_mode size should also be GE than access_size for const_rhs shift.
> It seems a bit more complicated now...


Yeah.  The structure of this code makes it hard to make changes to it. :-/

Since the const_rhs code is only trying to fold the operation at compile
time, without regard for what the target supports at runtime, I think it
*does* make sense to use smallest_int_mode_for_size there.  I.e. we should
be able to hoist the code out of the loop and do:

  if (store_info->const_rhs)
    {
      auto new_mode = smallest_int_mode_for_size (access_size * BITS_PER_UNIT);
      auto byte = subreg_lowpart_offset (new_mode, store_mode);
      rtx ret = simplify_subreg (new_mode, store_info->const_rhs,
                                 store_mode, byte);
      if (ret && CONSTANT_P (ret))
        …
    }

  if (require_cst)
    return NULL_RTX;

  …the current loop…

I don't see any reason to restrict the const_rhs case to BITS_PER_WORD.
And if the simplify_subreg above fails for new_mode, I can't think it
would succeed for a wider mode (which would have to provide *more*
constant data than the failed subreg).

> @@ -1728,8 +1728,7 @@ find_shift_sequence (poly_int64 access_size,
>       the machine.  */
>  
>    opt_scalar_int_mode new_mode_iter;
> -  FOR_EACH_MODE_FROM (new_mode_iter,
> -                   smallest_int_mode_for_size (access_size * BITS_PER_UNIT))
> +  FOR_EACH_MODE_IN_CLASS (new_mode_iter, MODE_INT)

Sorry, I now realise it would be better to start at:

smallest_int_mode_for_size (GET_MODE_BITSIZE (read_mode))

Thanks,
Richard

Re: [PATCH v4] dse: Remove partial load after full store for high part access[PR71309]

Reply via email to