On Tue, 20 Mar 2018, Ingo Molnar wrote:
> * Thomas Gleixner <t...@linutronix.de> wrote:
>
> > > So I do think we could do more in this area to improve driver
> > > performance, if the code is correct and if there are actual benchmarks
> > > showing real benefits.
> >
> > If it's about hotpath performance I'm all for it, but the use case here is
> > a debug facility...
> >
> > And if we go down that road then we want an AVX based memcpy()
> > implementation which is runtime conditional on the feature bit(s) and
> > length dependent. Just slapping a readqq() at it and using it in a loop
> > does not make any sense.
>
> Yeah, so generic memcpy() replacement is only feasible I think if the most
> optimistic implementation is actually correct:
>
>  - if no preempt disable()/enable() is required
>
>  - if direct access to the AVX[2] registers does not disturb legacy FPU state
>    in any fashion
>
>  - if direct access to the AVX[2] registers cannot raise weird exceptions or
>    have weird behavior if the FPU control word is modified to non-standard
>    values by untrusted user-space
>
> If we have to touch the FPU tag or control words then it's probably only good
> for a specialized API.
I did not mean a general memcpy() replacement. Rather something like
magic_memcpy(), which falls back to memcpy() when AVX is not usable or the
length does not justify the AVX stuff at all.

Thanks,

	tglx
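A minimal user-space sketch of the magic_memcpy() idea, assuming GCC or Clang on x86: the AVX path is taken only when the CPU (and OS) report AVX via __builtin_cpu_supports("avx") and the length exceeds a cutoff; otherwise it falls back to plain memcpy(). The cutoff MAGIC_MEMCPY_AVX_MIN and the helper avx_copy() are hypothetical illustrations, and the threshold value is not based on any benchmark. The sketch deliberately ignores the FPU-state questions raised above; an in-kernel variant would presumably also need to check something like irq_fpu_usable() and bracket the AVX copy with kernel_fpu_begin()/kernel_fpu_end(), which is not modeled here.

```c
#include <stddef.h>
#include <string.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define MAGIC_HAVE_AVX_PATH 1
#else
#define MAGIC_HAVE_AVX_PATH 0	/* non-x86: always use memcpy() */
#endif

/*
 * Hypothetical cutoff: below this length plain memcpy() is assumed to be
 * at least as fast. A real value would have to come from benchmarks.
 */
#define MAGIC_MEMCPY_AVX_MIN 256

#if MAGIC_HAVE_AVX_PATH
/*
 * Copy 32 bytes at a time with unaligned 256-bit loads/stores. Only this
 * function is compiled for AVX, so the rest of the program stays baseline.
 */
__attribute__((target("avx")))
static void avx_copy(unsigned char *d, const unsigned char *s, size_t len)
{
	while (len >= 32) {
		__m256i v = _mm256_loadu_si256((const __m256i *)s);
		_mm256_storeu_si256((__m256i *)d, v);
		s += 32;
		d += 32;
		len -= 32;
	}
	memcpy(d, s, len);	/* tail shorter than one ymm register */
}
#endif

/* Feature- and length-dependent copy; falls back to memcpy() otherwise. */
void *magic_memcpy(void *dst, const void *src, size_t len)
{
#if MAGIC_HAVE_AVX_PATH
	if (len >= MAGIC_MEMCPY_AVX_MIN && __builtin_cpu_supports("avx")) {
		avx_copy(dst, src, len);
		return dst;
	}
#endif
	return memcpy(dst, src, len);
}
```

Keeping the dispatch in one small wrapper means callers never see the AVX path, and short copies, where setting up vector registers is unlikely to pay off, never leave the memcpy() fast path.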