Sudakshina Das <sudi....@arm.com> writes:
> Apologies for the delay. I have attached another version of the patch.
> I have disabled the test cases for ILP32. This is only because function body 
> check
> fails because there is an addition unsigned extension instruction for src 
> pointer in
> every test (uxtw    x0, w0). The actual inlining is not different.

Yeah, agree that's the best way of handling the ILP32 difference.

> […]
> +/* SET_RATIO is similar to CLEAR_RATIO, but for a non-zero constant.  Without
> +   -mstrict-align, make decisions in "setmem".  Otherwise follow a sensible
> +   default:  when optimizing for size adjust the ratio to account for the

nit: should just be one space after “:”

> […]
> @@ -21289,6 +21292,134 @@ aarch64_expand_cpymem (rtx *operands)
>    return true;
>  }
>  
> +/* Like aarch64_copy_one_block_and_progress_pointers, except for memset where
> +   *src is a register we have created with the duplicated value to be set.  
> */

“*src” -> SRC
since there's no dereference now

> […]
> +  /* In case we are optimizing for size or if the core does not
> +     want to use STP Q regs, lower the max_set_size.  */
> +  max_set_size = (!speed_p
> +               || (aarch64_tune_params.extra_tuning_flags
> +                   & AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS))
> +               ? max_set_size/2 : max_set_size;

Formatting nit: should be a space either side of “/”.

> +  while (n > 0)
> +    {
> +      /* Find the largest mode in which to do the copy in without
> +      over writing.  */

s/in without/without/

> +      opt_scalar_int_mode mode_iter;
> +      FOR_EACH_MODE_IN_CLASS (mode_iter, MODE_INT)
> +     if (GET_MODE_BITSIZE (mode_iter.require ()) <= MIN (n, copy_limit))
> +       cur_mode = mode_iter.require ();
> +
> +      gcc_assert (cur_mode != BLKmode);
> +
> +      mode_bits = GET_MODE_BITSIZE (cur_mode).to_constant ();
> +      aarch64_set_one_block_and_progress_pointer (src, &dst, cur_mode);
> +
> +      n -= mode_bits;
> +
> +      /* Do certain trailing copies as overlapping if it's going to be
> +      cheaper.  i.e. less instructions to do so.  For instance doing a 15
> +      byte copy it's more efficient to do two overlapping 8 byte copies than
> +      8 + 4 + 2 + 1.  */
> +      if (n > 0 && n < copy_limit / 2)
> +     {
> +       next_mode = smallest_mode_for_size (n, MODE_INT);
> +       int n_bits = GET_MODE_BITSIZE (next_mode).to_constant ();

Sorry for the runaround, but looking at this again, I'm a bit worried
that we only indirectly test that n_bits is within the length of the
original set.  I guess it is because if n < copy_limit / 2 then
n < mode_bits, and so n_bits will never exceed mode_bits.  I think
it might be worth adding an assert to make that “clearer” (maybe
only to me, probably obvious to everyone else):

          gcc_assert (n_bits <= mode_bits);

OK with those changes, thanks.

Richard

> +       dst = aarch64_move_pointer (dst, (n - n_bits) / BITS_PER_UNIT);
> +       n = n_bits;
> +     }
> +    }
> +
> +  return true;
> +}
> +
> +
>  /* Split a DImode store of a CONST_INT SRC to MEM DST as two
>     SImode stores.  Handle the case when the constant has identical
>     bottom and top halves.  This is beneficial when the two stores can be

Reply via email to