Hi Naveen, On Wed, Jun 06, 2018 at 12:06:09PM +0530, Naveen N. Rao wrote: > Simon Guo wrote: > >Hi Michael, > >On Tue, Jun 05, 2018 at 12:16:22PM +1000, Michael Ellerman wrote: > >>Hi Simon, > >> > >>wei.guo.si...@gmail.com writes: > >>> From: Simon Guo <wei.guo.si...@gmail.com> > >>> > >>> There is some room to optimize memcmp() in powerpc 64 bits version for > >>> following 2 cases: > >>> (1) Even src/dst addresses are not aligned with 8 bytes at the beginning, > >>> memcmp() can align them and go with .Llong comparision mode without > >>> fallback to .Lshort comparision mode do compare buffer byte by byte. > >>> (2) VMX instructions can be used to speed up for large size comparision, > >>> currently the threshold is set for 4K bytes. Notes the VMX instructions > >>> will lead to VMX regs save/load penalty. This patch set includes a > >>> patch to add a 32 bytes pre-checking to minimize the penalty. > >>> > >>> It did the similar with glibc commit dec4a7105e (powerpc: > >>Improve memcmp > performance for POWER8). Thanks Cyril Bur's > >>information. > >>> This patch set also updates memcmp selftest case to make it compiled and > >>> incorporate large size comparison case. > >> > >>I'm seeing a few crashes with this applied, I haven't had time to look > >>into what is happening yet, sorry. > >> > > > >The bug is due to memcmp() invokes a C function enter_vmx_ops() > >who will load some PIC value based on r2. > > > >memcmp() doesn't use r2 and if the memcmp() is invoked from kernel > >itself, everything is fine. But if memcmp() is invoked from > >modules[test_user_copy], r2 will be required to be setup > >correctly. Otherwise the enter_vmx_ops() will refer to an > >incorrect/unexisting data location based on wrong r2 value. > > > >Following patch will fix this issue: > >------------ > >diff --git a/arch/powerpc/lib/memcmp_64.S b/arch/powerpc/lib/memcmp_64.S > >index 5eba49744a5a..24d093fa89bb 100644 > >--- a/arch/powerpc/lib/memcmp_64.S > >+++ b/arch/powerpc/lib/memcmp_64.S > >@@ -102,7 +102,7 @@ > > * 2) src/dst has different offset to the 8 bytes boundary. The handlers > > * are named like .Ldiffoffset_xxxx > > */ > >-_GLOBAL(memcmp) > >+_GLOBAL_TOC(memcmp) > > cmpdi cr1,r5,0 > > > > /* Use the short loop if the src/dst addresses are not > >---------- > > > >It means the memcmp() fun entry will have additional 2 instructions. Is there > >any way to save these 2 instructions when the memcmp() is actually invoked > >from kernel itself? > > That will be the case. We will end up entering the function via the > local entry point skipping the first two instructions. The Global > entry point is only used for cross-module calls. >
Yes. Thanks :) - Simon