https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84431
Bug ID: 84431
Summary: Suboptimal code for masked shifts (x86/x86-64)
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: nruslan_devel at yahoo dot com
Target Milestone: ---

On x86 and x86-64, shift instructions use only the low bits of the shift count in CL; the upper bits are masked off by the hardware. It is therefore not possible to shift by more than (WORD_BITS - 1) positions in a single instruction. For double-word shifts, the compiler normally has to check whether the shift count reaches the word size before emitting the corresponding shld/shl (or shrd/shr, etc.) instructions.

If the shift count comes from a variable, the compiler generally cannot prove at compile time that it stays below WORD_BITS, so it has to emit those checks. On the other hand, when it is known that shift < WORD_BITS, it is easy to give the compiler a hint by masking the shift count like this (the example below is for i386; for x86-64 the type would be __uint128_t and the mask 63):

unsigned long long func(unsigned long long a, unsigned shift)
{
    return a << (shift & 31);
}

In the ideal case, the compiler can simply load the value into CL without even emitting the mask, because the masking is already implied by the shift instruction. Note that clang/LLVM recognizes this pattern (at least for i386) and generates the following assembly code:

func:                                   # @func
        pushl   %esi
        movl    8(%esp), %esi
        movb    16(%esp), %cl
        movl    12(%esp), %edx
        movl    %esi, %eax
        shldl   %cl, %esi, %edx
        shll    %cl, %eax
        popl    %esi
        retl

GCC generates suboptimal code in this case:

func:
        pushl   %esi
        pushl   %ebx
        movl    20(%esp), %ecx
        movl    16(%esp), %esi
        movl    12(%esp), %ebx
        andl    $31, %ecx
        movl    %esi, %edx
        shldl   %ebx, %edx
        movl    %ebx, %eax
        xorl    %ebx, %ebx
        sall    %cl, %eax
        andl    $32, %ecx
        cmovne  %eax, %edx
        cmovne  %ebx, %eax
        popl    %ebx
        popl    %esi
        ret
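For reference, a minimal sketch of the x86-64 analogue mentioned above (128-bit value, mask 63); the function name func128 is hypothetical, and __uint128_t is a GCC/clang extension, so this only compiles on 64-bit targets that provide it:

#include <assert.h>

/* x86-64 analogue of the i386 example: masking the count with 63
   tells the compiler the shift never reaches 64 bits, so it can
   emit shld/shl without a separate overflow check. */
__uint128_t func128(__uint128_t a, unsigned shift)
{
    return a << (shift & 63);
}

int main(void)
{
    __uint128_t one = 1;
    assert(func128(one, 3) == (one << 3));
    /* counts >= 64 wrap, since 65 & 63 == 1 */
    assert(func128(one, 65) == (one << 1));
    return 0;
}

The semantics are identical with or without the mask whenever the caller already guarantees shift < 64; the mask exists purely as a hint that lets the compiler drop the runtime branch or cmov sequence.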