On Fri, May 25, 2018 at 12:07:33PM +0800, wei.guo.si...@gmail.com wrote:
>  _GLOBAL(memcmp)
>  	cmpdi	cr1,r5,0
>
> -	/* Use the short loop if both strings are not 8B aligned */
> -	or	r6,r3,r4
> +	/* Use the short loop if the src/dst addresses are not
> +	 * with the same offset of 8 bytes align boundary.
> +	 */
> +	xor	r6,r3,r4
>  	andi.	r6,r6,7
>
> -	/* Use the short loop if length is less than 32B */
> -	cmpdi	cr6,r5,31
> +	/* Fall back to short loop if compare at aligned addrs
> +	 * with less than 8 bytes.
> +	 */
> +	cmpdi	cr6,r5,7
>
>  	beq	cr1,.Lzero
> -	bne	.Lshort
> -	bgt	cr6,.Llong
> +	bgt	cr6,.Lno_short

If this doesn't use cr0 anymore, you can do
	rlwinm r6,r6,0,7
instead of
	andi r6,r6,7
.

> +.Lsameoffset_8bytes_make_align_start:
> +	/* attempt to compare bytes not aligned with 8 bytes so that
> +	 * rest comparison can run based on 8 bytes alignment.
> +	 */
> +	andi.	r6,r3,7
> +
> +	/* Try to compare the first double word which is not 8 bytes aligned:
> +	 * load the first double word at (src & ~7UL) and shift left appropriate
> +	 * bits before comparision.
> +	 */
> +	clrlwi	r6,r3,29
> +	rlwinm	r6,r6,3,0,28

Those last two lines are together just
	rlwinm r6,r3,3,0x1c

> +	subfc.	r5,r6,r5

Why subfc?  You don't use the carry.

> +	rlwinm	r6,r6,3,0,28

That's
	slwi r6,r6,3

> +	bgt	cr0,8f
> +	li	r3,-1
> +8:
> +	blr

	blelr
	li r3,-1
	blr

(and more of the same things elsewhere).


Segher
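
For anyone who wants to check the rotate-and-mask simplifications without a POWER machine at hand, here is a rough Python model of rlwinm and the clrlwi/slwi extended mnemonics. It is only a sketch: the helper names (rotl32, mask_mb_me, two_step, one_step) are made up for illustration, only the non-wrapping mask case (MB <= ME) is modelled, and condition-register and carry effects are ignored. Under this model the clrlwi + rlwinm pair collapses to a single rlwinm selecting bits 26..28, which as a bitmask is 0x38, so the 0x1c constant quoted above may deserve a second look.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x, n):
    """Rotate the low 32 bits of x left by n (0..31)."""
    x &= MASK32
    return ((x << n) | (x >> (32 - n))) & MASK32


def mask_mb_me(mb, me):
    """Ones from bit mb through bit me, IBM numbering (bit 0 is the MSB).

    Only the non-wrapping case mb <= me is modelled here.
    """
    assert 0 <= mb <= me <= 31
    return (MASK32 >> mb) & (MASK32 << (31 - me)) & MASK32


def rlwinm(rs, sh, mb, me):
    """Rotate left word immediate, then AND with the mb..me mask."""
    return rotl32(rs, sh) & mask_mb_me(mb, me)


def clrlwi(rs, n):
    """Extended mnemonic: clear the leftmost n bits of the word."""
    return rlwinm(rs, 0, n, 31)


def slwi(rs, n):
    """Extended mnemonic: shift left word by n."""
    return rlwinm(rs, n, 0, 31 - n)


def two_step(r3):
    """The pair from the patch: clrlwi r6,r3,29 ; rlwinm r6,r6,3,0,28."""
    r6 = clrlwi(r3, 29)
    return rlwinm(r6, 3, 0, 28)


def one_step(r3):
    """Single-instruction candidate: rlwinm r6,r3,3,26,28."""
    return rlwinm(r3, 3, 26, 28)


if __name__ == "__main__":
    for addr in (0x0, 0x1, 0x7, 0x1003, 0xFFFFFFFF, 0x123456789ABCDEF5):
        # Both forms give the byte offset within the doubleword, times 8.
        assert two_step(addr) == one_step(addr) == ((addr & 7) << 3)
        # rlwinm with shift 0 and mask 7 matches the andi. result (minus cr0).
        assert rlwinm(addr, 0, 29, 31) == (addr & 7)
        # rlwinm rX,rX,3,0,28 is exactly slwi rX,rX,3.
        assert slwi(addr, 3) == rlwinm(addr, 3, 0, 28)
    print(hex(mask_mb_me(26, 28)))  # prints 0x38
```

The model only says the register values agree. Whether cr0 or the carry bit is still needed afterwards has to be read off the surrounding code, which is the point of the subfc and andi. remarks.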