Sudakshina Das <sudi....@arm.com> writes: > Apologies for the delay. I have attached another version of the patch. > I have disabled the test cases for ILP32. This is only because function body > check > fails because there is an addition unsigned extension instruction for src > pointer in > every test (uxtw x0, w0). The actual inlining is not different.
Yeah, agree that's the best way of handling the ILP32 difference. > […] > +/* SET_RATIO is similar to CLEAR_RATIO, but for a non-zero constant. Without > + -mstrict-align, make decisions in "setmem". Otherwise follow a sensible > + default: when optimizing for size adjust the ratio to account for the nit: should just be one space after “:” > […] > @@ -21289,6 +21292,134 @@ aarch64_expand_cpymem (rtx *operands) > return true; > } > > +/* Like aarch64_copy_one_block_and_progress_pointers, except for memset where > + *src is a register we have created with the duplicated value to be set. > */ “*src” -> SRC since there's no dereference now > […] > + /* In case we are optimizing for size or if the core does not > + want to use STP Q regs, lower the max_set_size. */ > + max_set_size = (!speed_p > + || (aarch64_tune_params.extra_tuning_flags > + & AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS)) > + ? max_set_size/2 : max_set_size; Formatting nit: should be a space either side of “/”. > + while (n > 0) > + { > + /* Find the largest mode in which to do the copy in without > + over writing. */ s/in without/without/ > + opt_scalar_int_mode mode_iter; > + FOR_EACH_MODE_IN_CLASS (mode_iter, MODE_INT) > + if (GET_MODE_BITSIZE (mode_iter.require ()) <= MIN (n, copy_limit)) > + cur_mode = mode_iter.require (); > + > + gcc_assert (cur_mode != BLKmode); > + > + mode_bits = GET_MODE_BITSIZE (cur_mode).to_constant (); > + aarch64_set_one_block_and_progress_pointer (src, &dst, cur_mode); > + > + n -= mode_bits; > + > + /* Do certain trailing copies as overlapping if it's going to be > + cheaper. i.e. less instructions to do so. For instance doing a 15 > + byte copy it's more efficient to do two overlapping 8 byte copies than > + 8 + 4 + 2 + 1. */ > + if (n > 0 && n < copy_limit / 2) > + { > + next_mode = smallest_mode_for_size (n, MODE_INT); > + int n_bits = GET_MODE_BITSIZE (next_mode).to_constant (); Sorry for the runaround, but looking at this again, I'm a bit worried that we only indirectly test that n_bits is within the length of the original set. I guess it is because if n < copy_limit / 2 then n < mode_bits, and so n_bits will never exceed mode_bits. I think it might be worth adding an assert to make that “clearer” (maybe only to me, probably obvious to everyone else): gcc_assert (n_bits <= mode_bits); OK with those changes, thanks. Richard > + dst = aarch64_move_pointer (dst, (n - n_bits) / BITS_PER_UNIT); > + n = n_bits; > + } > + } > + > + return true; > +} > + > + > /* Split a DImode store of a CONST_INT SRC to MEM DST as two > SImode stores. Handle the case when the constant has identical > bottom and top halves. This is beneficial when the two stores can be