On 11/25/2016 01:04 AM, Richard Biener wrote:
On Thu, 24 Nov 2016, Senthil Kumar Selvaraj wrote:


Richard Biener writes:

On Thu, 24 Nov 2016, Richard Biener wrote:

On Thu, 24 Nov 2016, Senthil Kumar Selvaraj wrote:

Hi,

  I've been analyzing a failing regtest (gcc.dg/strlenopt-8.c) for the avr
  target. I found that the (dump) failure is because there are 4
  instances of memcpy, while the testcase expects only 2 for a
  non-strict align target like the avr.

  Comparing that with a dump generated by x64_64-pc-linux, I found that
  the extra memcpy's come from the forwprop pass, when it replaces
  strcat with strlen and memcpy. For x86_64, the memcpy generated gets
  folded into a load/store in gimple_fold_builtin_memory_op. That
  doesn't happen for the avr because len (2) happens to be bigger than
  MOVE_MAX (1).

  The avr can only move 1 byte efficiently from reg <-> memory, but it's
  more efficient to load and store 2 bytes than to call memcpy, so
  MOVE_MAX_PIECES is set to 2.

  Given that gimple_fold_builtin_memory_op gets to choose between
  leaving the memcpy call as is, or breaking it down to a by-pieces
  move, shouldn't it use MOVE_MAX_PIECES instead of
  MOV_MAX?

  That is what the below patch does, and that makes the test
  pass. Does this sound right?

No, as we handle both memcpy and memmove this way we rely on
the whole storage fit in a single register so we do the
right thing for overlapping memory.

So actually your patch doesn't chnage that, the ordering is ensured
by emitting a single GIMPLE load/store pair.  There are only
four targets defining MOVE_MAX_PIECES, and one (s390) even has
a smaller MOVE_MAX_PIECES than MOVE_MAX (huh).  AVR has larger
MOVE_MAX_PIECES than MOVE_MAX, but that seems to not make much
sense to me given their very similar description plus the
fact that AVR can only load a single byte at a time...

The x86 comment says

/* MOVE_MAX_PIECES is the number of bytes at a time which we can
   move efficiently, as opposed to  MOVE_MAX which is the maximum
   number of bytes we can move with a single instruction.

which doesn't match up with

@defmac MOVE_MAX
The maximum number of bytes that a single instruction can move quickly
between memory and registers or between two memory locations.
@end defmac

note "quickly" here.  But OTOH

@defmac MOVE_MAX_PIECES
A C expression used by @code{move_by_pieces} to determine the largest unit
a load or store used to copy memory is.  Defaults to @code{MOVE_MAX}.
@end defmac

here the only difference is "copy memory".  But we try to special
case the one load - one store case, not generally "copy memory".

So I think MOVE_MAX matches my intent when writing the code.

Ok, but isn't that inconsistent with tree-inline.c:estimate_move_cost, which
considers MOVE_MAX_PIECES and MOVE_RATIO to decide between a libcall and
by-pieces move?

Well, I don't understand why we have both MOVE_MAX and MOVE_MAX_PIECES.
There are exactly two uses of MOVE_MAX in GCC AFAICS, the gimple-fold.c
one and caller-save.c which derives MOVE_MAX_WORDS from it.
MOVE_MAX_PIECES has the only use in block move expansion plus the
single use in tree-inline.c.

So I can't give a reason why one or the other should be more valid
but the tree-inline.c one tries to match memcpy expansion (obviously),
while the gimple-fold.c one tries to get at the maximum possible
single-insn move amount (and AVR is odd here having that lower than
MOVE_MAX_PIECES, compared to say s390 which has it the opposite way
around).
It's probably historical. MOVE_MAX_PIECES, IIRC, was primarily to guide whether or not to emit a structure copy inline vs call memcpy.

MOVE_MAX, IIRC, was supposed to control how many consecutive registers we'd try to squash into a single save/restore during caller-saves. ie, it might be profitable to squash a pair of 32 bit saves/restores into a single 64 bit save/restore. But it may not have been profitable to squash a pair of 64bit saves/restores into a 128bit save/restore.



There may have been other uses of each that have been removed over time.

jeff

Reply via email to