https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93634
Bug ID: 93634 Summary: Improving modular calculations (e.g. divisibility tests). Product: gcc Version: 9.2.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: cassio.neri at gmail dot com Target Milestone: --- Consider: bool f(unsigned n) { return n % 6 == 4; } at -O3 the code generated for x86_64 is mov %edi,%eax mov $0xaaaaaaab,%edx imul %rdx,%rax shr $0x22,%rax lea (%rax,%rax,2),%eax add %eax,%eax sub %eax,%edi cmp $0x4,%edi sete %al retq whereas it could be sub $0x4,%edi imul $0xaaaaaaab,%edi,%edi ror %edi cmp $0x2aaaaaa9,%edi setbe %al retq Notice the later is quite similar to what gcc generates for n % 6 == 3: imul $0xaaaaaaab,%edi,%edi sub $0x1,%edi ror %edi cmp $0x2aaaaaaa,%edi setbe %al retq It's true that there's a small mathematical difference for the cases r <= 3 and r >= 4 but not enough to throw away the faster algorithm. I reckon this is not obvious and I refer to https://accu.org/var/uploads/journals/Overload154.pdf#page=13 which presents the overall idea and some benchmarks. In addition, it makes some comments on gcc's generated code for other cases of n % d == r. References therein provide mathematical proofs and extra benchmarks. FWIW: 1) This relates to bug 82853 and bug 12849 and to a lesser extend bug 89845. 2) Specifically, it confirms the idea (for unsigned integers) described by Orr Shalom Dvory in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82853#c33