Richard Biener writes: > On Thu, 24 Nov 2016, Richard Biener wrote: > >> On Thu, 24 Nov 2016, Senthil Kumar Selvaraj wrote: >> >> > Hi, >> > >> > I've been analyzing a failing regtest (gcc.dg/strlenopt-8.c) for the avr >> > target. I found that the (dump) failure is because there are 4 >> > instances of memcpy, while the testcase expects only 2 for a >> > non-strict align target like the avr. >> > >> > Comparing that with a dump generated by x64_64-pc-linux, I found that >> > the extra memcpy's come from the forwprop pass, when it replaces >> > strcat with strlen and memcpy. For x86_64, the memcpy generated gets >> > folded into a load/store in gimple_fold_builtin_memory_op. That >> > doesn't happen for the avr because len (2) happens to be bigger than >> > MOVE_MAX (1). >> > >> > The avr can only move 1 byte efficiently from reg <-> memory, but it's >> > more efficient to load and store 2 bytes than to call memcpy, so >> > MOVE_MAX_PIECES is set to 2. >> > >> > Given that gimple_fold_builtin_memory_op gets to choose between >> > leaving the memcpy call as is, or breaking it down to a by-pieces >> > move, shouldn't it use MOVE_MAX_PIECES instead of >> > MOV_MAX? >> > >> > That is what the below patch does, and that makes the test >> > pass. Does this sound right? >> >> No, as we handle both memcpy and memmove this way we rely on >> the whole storage fit in a single register so we do the >> right thing for overlapping memory. > > So actually your patch doesn't chnage that, the ordering is ensured > by emitting a single GIMPLE load/store pair. There are only > four targets defining MOVE_MAX_PIECES, and one (s390) even has > a smaller MOVE_MAX_PIECES than MOVE_MAX (huh). AVR has larger > MOVE_MAX_PIECES than MOVE_MAX, but that seems to not make much > sense to me given their very similar description plus the > fact that AVR can only load a single byte at a time... > > The x86 comment says > > /* MOVE_MAX_PIECES is the number of bytes at a time which we can > move efficiently, as opposed to MOVE_MAX which is the maximum > number of bytes we can move with a single instruction. > > which doesn't match up with > > @defmac MOVE_MAX > The maximum number of bytes that a single instruction can move quickly > between memory and registers or between two memory locations. > @end defmac > > note "quickly" here. But OTOH > > @defmac MOVE_MAX_PIECES > A C expression used by @code{move_by_pieces} to determine the largest unit > a load or store used to copy memory is. Defaults to @code{MOVE_MAX}. > @end defmac > > here the only difference is "copy memory". But we try to special > case the one load - one store case, not generally "copy memory". > > So I think MOVE_MAX matches my intent when writing the code.
Ok, but isn't that inconsistent with tree-inline.c:estimate_move_cost, which considers MOVE_MAX_PIECES and MOVE_RATIO to decide between a libcall and by-pieces move? Regards Senthil > > Richard. > >> Richard. >> >> > Regards >> > Senthil >> > >> > Index: gcc/gimple-fold.c >> > =================================================================== >> > --- gcc/gimple-fold.c (revision 242741) >> > +++ gcc/gimple-fold.c (working copy) >> > @@ -703,7 +703,7 @@ >> > src_align = get_pointer_alignment (src); >> > dest_align = get_pointer_alignment (dest); >> > if (tree_fits_uhwi_p (len) >> > - && compare_tree_int (len, MOVE_MAX) <= 0 >> > + && compare_tree_int (len, MOVE_MAX_PIECES) <= 0 >> > /* ??? Don't transform copies from strings with known length this >> > confuses the tree-ssa-strlen.c. This doesn't handle >> > the case in gcc.dg/strlenopt-8.c which is XFAILed for that >> > >> > >> >>