Richard Biener writes:

> On Thu, 24 Nov 2016, Richard Biener wrote:
>
>> On Thu, 24 Nov 2016, Senthil Kumar Selvaraj wrote:
>> 
>> > Hi,
>> > 
>> >   I've been analyzing a failing regtest (gcc.dg/strlenopt-8.c) for the avr
>> >   target. I found that the (dump) failure is because there are 4
>> >   instances of memcpy, while the testcase expects only 2 for a
>> >   non-strict align target like the avr.
>> > 
>> >   Comparing that with a dump generated by x64_64-pc-linux, I found that
>> >   the extra memcpy's come from the forwprop pass, when it replaces
>> >   strcat with strlen and memcpy. For x86_64, the memcpy generated gets
>> >   folded into a load/store in gimple_fold_builtin_memory_op. That
>> >   doesn't happen for the avr because len (2) happens to be bigger than
>> >   MOVE_MAX (1).
>> > 
>> >   The avr can only move 1 byte efficiently from reg <-> memory, but it's
>> >   more efficient to load and store 2 bytes than to call memcpy, so
>> >   MOVE_MAX_PIECES is set to 2.
>> > 
>> >   Given that gimple_fold_builtin_memory_op gets to choose between
>> >   leaving the memcpy call as is, or breaking it down to a by-pieces
>> >   move, shouldn't it use MOVE_MAX_PIECES instead of
>> >   MOV_MAX?
>> > 
>> >   That is what the below patch does, and that makes the test
>> >   pass. Does this sound right?
>> 
>> No, as we handle both memcpy and memmove this way we rely on
>> the whole storage fit in a single register so we do the
>> right thing for overlapping memory.
>
> So actually your patch doesn't chnage that, the ordering is ensured
> by emitting a single GIMPLE load/store pair.  There are only
> four targets defining MOVE_MAX_PIECES, and one (s390) even has
> a smaller MOVE_MAX_PIECES than MOVE_MAX (huh).  AVR has larger
> MOVE_MAX_PIECES than MOVE_MAX, but that seems to not make much
> sense to me given their very similar description plus the
> fact that AVR can only load a single byte at a time...
>
> The x86 comment says
>
> /* MOVE_MAX_PIECES is the number of bytes at a time which we can
>    move efficiently, as opposed to  MOVE_MAX which is the maximum
>    number of bytes we can move with a single instruction.
>
> which doesn't match up with
>
> @defmac MOVE_MAX
> The maximum number of bytes that a single instruction can move quickly
> between memory and registers or between two memory locations.
> @end defmac
>
> note "quickly" here.  But OTOH
>
> @defmac MOVE_MAX_PIECES
> A C expression used by @code{move_by_pieces} to determine the largest unit
> a load or store used to copy memory is.  Defaults to @code{MOVE_MAX}.
> @end defmac
>
> here the only difference is "copy memory".  But we try to special
> case the one load - one store case, not generally "copy memory".
>
> So I think MOVE_MAX matches my intent when writing the code.

Ok, but isn't that inconsistent with tree-inline.c:estimate_move_cost, which
considers MOVE_MAX_PIECES and MOVE_RATIO to decide between a libcall and
by-pieces move?

Regards
Senthil

>
> Richard.
>
>> Richard.
>> 
>> > Regards
>> > Senthil
>> > 
>> > Index: gcc/gimple-fold.c
>> > ===================================================================
>> > --- gcc/gimple-fold.c      (revision 242741)
>> > +++ gcc/gimple-fold.c      (working copy)
>> > @@ -703,7 +703,7 @@
>> >        src_align = get_pointer_alignment (src);
>> >        dest_align = get_pointer_alignment (dest);
>> >        if (tree_fits_uhwi_p (len)
>> > -    && compare_tree_int (len, MOVE_MAX) <= 0
>> > +    && compare_tree_int (len, MOVE_MAX_PIECES) <= 0
>> >      /* ???  Don't transform copies from strings with known length this
>> >         confuses the tree-ssa-strlen.c.  This doesn't handle
>> >         the case in gcc.dg/strlenopt-8.c which is XFAILed for that
>> > 
>> > 
>> 
>> 

Reply via email to