Hi Richard,

(that's quick!)

> +  if (size > max_copy_size || size > max_mops_size)
> +    return aarch64_expand_cpymem_mops (operands, is_memmove);
>
> Could you explain this a bit more?  If I've followed the logic correctly,
> max_copy_size will always be 0 for movmem, so this "if" condition will
> always be true for movmem (given that the caller can be relied on to
> optimise away zero-length copies).  So doesn't this function reduce to:

In this patch it is zero yes, but there is no real reason for that. The goal is 
to
share as much code as possible. I have a patch that inlines memmove like
memcpy.

> when is_memmove is true?  If so, I think it would be clearer to do that
> directly, rather than go through aarch64_expand_cpymem.  max_copy_size
> is really an optimisation threshold, whereas the above seems to be
> leaning on it for correctness.

In principle we could for the time being add a assert (!is_memmove) if that
makes it clearer memmove isn't yet handled.

> ...I think we might as well keep this pattern conditional on TARGET_MOPS.

But then we have inconsistencies in the conditions of the expanders, which
is what led to all these bugs in the first place (I lost count, there are 4 or 5
different bugs I fixed). Ensuring everything is 100% identical between
memcpy and memmove makes the code much easier to follow.

> I think we can then also split:
>
>   /* All three registers are changed by the instruction, so each one
>      must be a fresh pseudo.  */
>   rtx dst_addr = copy_to_mode_reg (Pmode, XEXP (operands[0], 0));
>   rtx src_addr = copy_to_mode_reg (Pmode, XEXP (operands[1], 0));
>   rtx dst_mem = replace_equiv_address (operands[0], dst_addr);
>   rtx src_mem = replace_equiv_address (operands[1], src_addr);
>   rtx sz_reg = copy_to_mode_reg (DImode, operands[2]);
>
> out of aarch64_expand_cpymem_mops into a new function (say
> aarch64_prepare_mops_operands) and call it from the movmemdi
> expander.  There should then be no need for the extra staging
> expander (aarch64_movmemdi).

So you're saying we could remove aarch64_cpymemdi/movmemdi if
aarch64_expand_cpymem_mops did massage the operands in the
right way so that we can immediately match the underlying instruction?

Hmm, does that actually work, as in we don't lose the extra alias info that
gets lost in the current memmove expander? (another bug/inconsistency)

And the MOPS code would be separated from aarch64_expand_cpymem
so we'd do all the MOPS size tests inside aarch64_expand_cpymem_mops
and the expander tries using MOPS first and if it fails try inline expansion?

So something like:

(define_expand "movmemdi"
....
  if (aarch64_try_mops_expansion (operands, is_memmove))
    DONE;
  if (aarch64_try_inline_copy_expansion (operands, is_memmove))
    DONE;
  FAIL;
)

> IMO the STRICT_ALIGNMENT stuff should be a separate patch,
> with its own testcases.

We will need backports to fix all these bugs, so the question is whether it
is worth doing a lot of cleanups now?

Cheers,
Wilco

Reply via email to