From: Thomas Gleixner
> Sent: 20 March 2018 09:41
> On Tue, 20 Mar 2018, Ingo Molnar wrote:
> > * Thomas Gleixner <t...@linutronix.de> wrote:
...
> > > And if we go down that road then we want a AVX based memcpy()
> > > implementation which is runtime conditional on the feature bit(s) and
> > > length dependent. Just slapping a readqq() at it and use it in a loop does
> > > not make any sense.
> >
> > Yeah, so generic memcpy() replacement is only feasible I think if the most
> > optimistic implementation is actually correct:
> >
> >  - if no preempt disable()/enable() is required
> >
> >  - if direct access to the AVX[2] registers does not disturb legacy FPU
> >    state in any fashion
> >
> >  - if direct access to the AVX[2] registers cannot raise weird exceptions
> >    or have weird behavior if the FPU control word is modified to
> >    non-standard values by untrusted user-space
> >
> > If we have to touch the FPU tag or control words then it's probably only
> > good for a specialized API.
>
> I did not mean to have a general memcpy replacement. Rather something like
> magic_memcpy() which falls back to memcpy when AVX is not usable or the
> length does not justify the AVX stuff at all.

There is probably no point for memcpy().

Where it would make a big difference is memcpy_fromio() for PCIe devices
(where longer TLPs make a big difference).
But any such code belongs in its implementation, not in every driver.
The implementation of memcpy_toio() is nothing like as critical.

It might be that the code would need to fall back to 64-bit accesses if the
AVX(2) registers can't currently be accessed - maybe some obscure state....

However memcpy_to/fromio() are both horrid at the moment because they
result in byte copies!

	David
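
For illustration only, the 64-bit fallback mentioned above might look
roughly like the sketch below. It is plain userspace C, not kernel code: the
name fromio_copy64_sketch() is hypothetical, the "I/O" region is just
ordinary memory read through a volatile pointer, and the AVX path and the
check for whether the registers are usable are left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not an existing kernel function): copy len bytes
 * from a simulated I/O region, using 64-bit reads wherever the source is
 * 8-byte aligned and byte reads only for the unaligned head and the tail.
 */
static void fromio_copy64_sketch(void *dst, const volatile void *src,
                                 size_t len)
{
    unsigned char *d = dst;
    const volatile unsigned char *s = src;

    assert(len == 0 || (dst != NULL && src != NULL));

    /* Byte reads until the source address is 8-byte aligned. */
    while (len && ((uintptr_t)s & 7)) {
        *d++ = *s++;
        len--;
    }

    /* Main loop: one 64-bit read from the source per iteration. */
    while (len >= 8) {
        uint64_t v = *(const volatile uint64_t *)s;

        memcpy(d, &v, sizeof(v));   /* destination may be unaligned */
        d += 8;
        s += 8;
        len -= 8;
    }

    /* Remaining tail, fewer than 8 bytes. */
    while (len--)
        *d++ = *s++;
}
```

An AVX variant would replace the main loop with wider loads when the
registers are usable and the length justifies it, and drop back to the loop
above otherwise.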