Hi,

On Wed, 15 May 2019, Aaron Sawdey wrote:
> Yes this would be a nice thing to get to, a single move/copy underlying
> builtin, to which we communicate what the compiler's analysis tells us
> about whether the operands overlap and by how much.
>
> Next question would be how do we move from the existing movmem pattern
> (which Michael Matz tells us should be renamed cpymem anyway) to this
> new thing. Are you proposing that we still have both movmem and cpymem
> optab entries underneath to call the patterns but introduce this new
> memmove_with_hints() to be used by things called by
> expand_builtin_memmove() and expand_builtin_memcpy()?

I'd say so.  There are multiple levels at play:

a) exposure to the user: probably a new __builtin_memmove, or a new
   combined builtin with a hint param to differentiate (but we can't get
   rid of __builtin_memcpy/mempcpy/strcpy, which all can go through the
   same route in the middle end)
b) getting it through the gimple pipeline: probably just a new builtin
   code, trivial
c) expanding the new builtin, with the help of the next items
d) RTL block moves: they are defined as non-overlapping and I don't think
   we should change this (essentially they're the reflection of struct
   copies in C)
e) how any of the above (builtins and RTL block moves) are implemented:
   currently non-overlapping only, using the movmem pattern when possible;
   ultimately all sitting in the emit_block_move_hints() routine.

So, I'd add a new method to emit_block_move_hints indicating possible
overlap, disabling the use of move_by_pieces.  Then in
emit_block_move_via_movmem (also getting an indication of overlap), do the
equivalent of:

  finished = 0;
  if (overlap_possible) {
    if (optab[movmem])
      finished = emit(movmem);
  } else {
    if (optab[cpymem])
      finished = emit(cpymem);
    if (!finished && optab[movmem]) // can use movmem also for overlap
      finished = emit(movmem);
  }

The overlap_possible method would only ever be used from the builtin
expansion, and never from the RTL block move expand.
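
To make the selection order concrete, here is a minimal standalone C sketch of that decision.  It is only a model, not GCC code: struct target_patterns, enum expansion and choose_block_move() are hypothetical names invented for illustration, and each flag is assumed to mean "the pattern exists and its expansion succeeds", which folds the optab check and the emit result into one bit.

```c
#include <stdbool.h>

/* Hypothetical model of what a target provides; not GCC's optab API.
   Each flag means "the pattern exists and expanding it succeeds".  */
struct target_patterns {
  bool cpymem_ok;  /* pattern valid only for non-overlapping operands */
  bool movmem_ok;  /* pattern that is also correct for overlapping operands */
};

/* Which pattern (if any) the expander ends up using.  */
enum expansion { EXP_NONE, EXP_CPYMEM, EXP_MOVMEM };

/* Mirror of the pseudocode: with possible overlap only movmem is allowed;
   without overlap, try cpymem first and fall back to movmem.  */
enum expansion
choose_block_move (const struct target_patterns *t, bool overlap_possible)
{
  if (overlap_possible)
    return t->movmem_ok ? EXP_MOVMEM : EXP_NONE;

  if (t->cpymem_ok)
    return EXP_CPYMEM;
  if (t->movmem_ok)
    return EXP_MOVMEM;
  return EXP_NONE;
}
```

In this model EXP_NONE stands for "no pattern was emitted", i.e. the caller would have to fall back to something else, such as a library call.  A target that sets only movmem_ok gets EXP_MOVMEM in both cases.
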
Additionally a target may optionally only define the movmem pattern if
it's just as good as the cpymem pattern (e.g. because it only handles
fixed small sizes and uses a load-all then store-all sequence).


Ciao,
Michael.