On Sat, Mar 25, 2017 at 07:41:17PM +0300, Alexey Dobriyan wrote:
> Current addr4_match() code has special test for /0 prefixes because of
> standard required undefined behaviour. However, it is possible to omit
> it on 64-bit because shifting can be done within a 64-bit register and
> then truncated to the expected value (which is 0 mask).
>
> Implicit truncation by htonl() fits nicely into R32-within-R64 model
> on x86-64.
>
> Space savings: none (coincidence)
> Branch savings: 1
>
> Before:
>
>	movzx  eax,BYTE PTR [rdi+0x2a]	# ->prefixlen_d
>	test   al,al
>	jne    xfrm_selector_match + 0x23f
>		...
>	movzx  eax,BYTE PTR [rbx+0x2b]	# ->prefixlen_s
>	test   al,al
>	je     xfrm_selector_match + 0x1c7
>
> After (no branches):
>
>	mov    r8d,0x20
>	mov    rdx,0xffffffffffffffff
>	mov    esi,DWORD PTR [rsi+0x2c]
>	mov    ecx,r8d
>	sub    cl,BYTE PTR [rdi+0x2a]
>	xor    esi,DWORD PTR [rbx]
>	mov    rdi,rdx
>	xor    eax,eax
>	shl    rdi,cl
>	bswap  edi
>
> Signed-off-by: Alexey Dobriyan <adobri...@gmail.com>

Also applied to ipsec-next, thanks for the patches Alexey!
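
For readers following the thread, the trick described in the quoted patch can be sketched in userspace C. This is only a minimal illustration, not the kernel code: the name addr4_match_sketch(), the explicit uint64_t shift, and the assert on the prefix range are assumptions made for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <arpa/inet.h>	/* htonl() */

/*
 * Hypothetical userspace sketch of a branch-free IPv4 prefix match.
 * a1 and a2 are addresses in network byte order; prefixlen is 0..32.
 */
static bool addr4_match_sketch(uint32_t a1, uint32_t a2, uint8_t prefixlen)
{
	/* Assumption of this sketch: callers never pass more than /32. */
	assert(prefixlen <= 32);

	/*
	 * Shift in a 64-bit type so that a shift count of 32 (the /0 case)
	 * is well defined, then truncate to 32 bits. For /0 the low 32 bits
	 * are all zero, so the mask is 0 and every address matches.
	 */
	uint32_t mask = (uint32_t)(~UINT64_C(0) << (32 - prefixlen));

	/* Compare only the bits covered by the prefix. */
	return !((a1 ^ a2) & htonl(mask));
}
```

With the shift done in 64 bits, a /0 prefix yields an all-zero mask and needs no dedicated test. If the shift were done in a 32-bit type instead, a count of 32 would be undefined behaviour, which is why the quoted description limits the change to 64-bit.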