On Tue, Mar 22, 2022 at 9:20 AM Richard Biener <richard.guent...@gmail.com> wrote: > > On Tue, Mar 22, 2022 at 3:51 AM H.J. Lu <hjl.to...@gmail.com> wrote: > > > > On Mon, Mar 14, 2022 at 8:44 AM Richard Sandiford > > <richard.sandif...@arm.com> wrote: > > > > > > Richard Biener <richard.guent...@gmail.com> writes: > > > > On Wed, Mar 9, 2022 at 7:04 PM Richard Sandiford > > > > <richard.sandif...@arm.com> wrote: > > > >> > > > >> Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes: > > > >> > On Wed, Mar 2, 2022 at 10:18 PM H.J. Lu <hjl.to...@gmail.com> wrote: > > > >> >> > > > >> >> On Wed, Mar 02, 2022 at 09:51:26AM +0100, Richard Biener wrote: > > > >> >> > On Tue, Mar 1, 2022 at 11:41 PM H.J. Lu via Gcc-patches > > > >> >> > <gcc-patches@gcc.gnu.org> wrote: > > > >> >> > > > > > >> >> > > Add TARGET_FOLD_MEMCPY_MAX for the maximum number of bytes to > > > >> >> > > fold memcpy. > > > >> >> > > The default is > > > >> >> > > > > > >> >> > > MOVE_MAX * MOVE_RATIO (optimize_function_for_size_p (cfun)) > > > >> >> > > > > > >> >> > > For x86, it is MOVE_MAX to restore the old behavior before > > > >> >> > > > > >> >> > I know we've discussed this to death in the PR, I just want to > > > >> >> > repeat here > > > >> >> > that the GIMPLE folding expects to generate a single load and a > > > >> >> > single > > > >> >> > store (that is what it does on the GIMPLE level) which is why > > > >> >> > MOVE_MAX > > > >> >> > was chosen originally (it's documented to what a "single > > > >> >> > instruction" does). > > > >> >> > In practice MOVE_MAX does not seem to cover vector register sizes > > > >> >> > so Richard pulled MOVE_RATIO which is really intended to cover > > > >> >> > the case of using multiple instructions for moving memory (but > > > >> >> > then I > > > >> >> > don't remember whether for the ARM case the single load/store > > > >> >> > GIMPLE > > > >> >> > will be expanded to multiple load/store instructions). > > > >> >> > > > > >> >> > TARGET_FOLD_MEMCPY_MAX sounds like a stop-gap solution, > > > >> >> > being very specific for memcpy folding (we also fold memmove btw). > > > >> >> > > > > >> >> > There is also MOVE_MAX_PIECES which _might_ be more appropriate > > > >> >> > than MOVE_MAX here and still honor the idea of single > > > >> >> > instructions. > > > >> >> > Now neither arm nor aarch64 define this and it defaults to > > > >> >> > MOVE_MAX, > > > >> >> > not MOVE_MAX * MOVE_RATIO. > > > >> >> > > > > >> >> > So if we need a new hook then that hook should at least get the > > > >> >> > 'speed' argument of MOVE_RATIO and it should get a better name. > > > >> >> > > > > >> >> > I still think that it should be possible to improve the insn > > > >> >> > check to > > > >> >> > avoid use of "disabled" modes, maybe that's also a point to add > > > >> >> > a new hook like .move_with_mode_p or so? To quote, we do > > > >> >> > > > >> >> Here is the v2 patch to add TARGET_MOVE_WITH_MODE_P. > > > >> > > > > >> > Again I'd like to shine light on MOVE_MAX_PIECES which explicitely > > > >> > mentions "a load or store used TO COPY MEMORY" (emphasis mine) > > > >> > and whose x86 implementation would already be fine (doing larger > > > >> > moves > > > >> > and also not doing too large moves). But appearantly the arm folks > > > >> > decided that that's not fit and instead (mis-?)used MOVE_MAX * > > > >> > MOVE_RATIO. > > > >> > > > >> It seems like there are old comments and old documentation that justify > > > >> both interpretations, so there are good arguments on both sides. But > > > >> with this kind of thing I think we have to infer the meaning of the > > > >> macro from the way it's currently used, rather than trusting such old > > > >> and possibly out-of-date and contradictory information. > > > >> > > > >> FWIW, I agree that (if we exclude old reload, which we should!) the > > > >> only direct uses of MOVE_MAX before the patch were not specific to > > > >> integer registers and so MOVE_MAX should include vectors if the > > > >> target wants vector modes to be used for general movement. > > > >> > > > >> Even if people disagree that that's the current meaning, I think it's > > > >> at least a sensible meaning. It provides information that AFAIK isn't > > > >> available otherwise, and it avoids overlap with MAX_FIXED_MODE_SIZE. > > > >> > > > >> So FWIW, I think it'd be reasonable to change non-x86 targets if they > > > >> want vector modes to be used for single-insn copies. > > > > > > > > Note a slight complication in the GIMPLE folding case is that we > > > > do not end up using vector modes but we're using "fake" > > > > integer modes like OImode which x86 has move patterns for. > > > > If we'd use vector modes we could use existing target hooks to > > > > eventually decide whether auto-using those is desired or not. > > > > > > Hmm, yeah. Certainly we shouldn't require the target to support > > > a scalar integer equivalent of every vector mode. > > > > > > > I'd like to resolve this before GCC 12 is released. > > I've come to the conclusion that we should revert r12-3482-g5f6a6c91d7c592, > it abuses MOVE_MAX * MOVE_RATIO to trick GIMPLE into thinking we can > move a larger amount of data with a single instruction while in reality it > wants > to use multiple load/store stmts - that's something the GIMPLE folding > doesn't do. > If arm wants to use larger moves with a single instruction it should > adjust MOVE_MAX > instead. In fact the motivating testcase doesn't use larger loads - > we are just able > to elide the memcpy without the stack copy.
Btw, the memcpy would have been folded by the following code but chickens out due to arm being STRICT_ALIGNMENT. /* Now that we chose an access type express the other side in terms of it if the target allows that with respect to alignment constraints. */ if (srcvar == NULL_TREE) { if (src_align >= TYPE_ALIGN (desttype)) srcvar = fold_build2 (MEM_REF, desttype, src, off0); else { if (STRICT_ALIGNMENT) return false; that path could be improved to handle the case where the desttype mode is not BLKmode and thus we can check movmisalign_optab. I'm going to test the attached. I've verified that with this I can revert r12-3482-g5f6a6c91d7c592 without regressing uint64_t bar64(const uint8_t *rData1) { uint64_t buffer; memcpy(&buffer, rData1, sizeof(buffer)); return buffer; } on arm. Richard. > > Alternatively as Richard hints above, the folding code should try > using integer vector > modes (and adhere to the vector target hooks as to which modes to "auto" use). > > That said, iff we do not want to "regress" arm by reversion the new > hackish target hook > should be on the arm side - sth like MOVE_MAX_FOR_GIMPLE, defaulting > to MOVE_MAX. > > If we want to use vector integer types/modes or fold to multiple (up > to MOVE_RATIO) stmts > that's stage1 material. > > Richard. > > > > > Thanks. > > > > > > -- > > H.J.