https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113682
Bug ID: 113682 Summary: Branches in branchless binary search rather than cmov/csel/csinc Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: other Assignee: unassigned at gcc dot gnu.org Reporter: redbeard0531 at gmail dot com Target Milestone: --- I've been trying to eliminate unpredictable branches in a hot function where perf counters show a high mispredict percentage. Unfortunately, I haven't been able to find an incantation to get gcc to generate branchless code other than inline asm, which I'd rather avoid. In this case I've even laid out the critical lines so that they exactly match the behavior of the csinc and csel instructions on arm64, but they are still not used. Somewhat minimized repro: typedef unsigned long size_t; struct ITEM {char* addr; size_t len;}; int cmp(ITEM* user, ITEM* tree); size_t bsearch2(ITEM* user, ITEM** tree, size_t treeSize) { auto pos = tree; size_t low = 0; size_t high = treeSize; while (low < high) { size_t i = (low + high) / 2; int res = cmp(user, tree[i]); // These should be cmp + csinc + csel on arm // and lea + test + cmov + cmov on x86. low = res > 0 ? i + 1 : low; // csinc high = res < 0 ? i: high; // csel if (res == 0) return i; } return -1; } On arm64 that generates a conditional branch on res > 0: bl cmp(ITEM*, ITEM*) cmp w0, 0 bgt .L15 // does low = i + 1 then loops mov x20, x19 bne .L4 // loop On x86_64 it does similar: call cmp(ITEM*, ITEM*) test eax, eax jg .L16 jne .L17 Note that clang produces the desired codegen for both: arm: bl cmp(ITEM*, ITEM*) cmp w0, #0 csinc x23, x23, x22, le csel x19, x22, x19, lt cbnz w0, .LBB0_1 x86: call cmp(ITEM*, ITEM*)@PLT lea rcx, [r12 + 1] test eax, eax cmovg r13, rcx cmovs rbx, r12 jne .LBB0_1 (full output for all 4 available at https://www.godbolt.org/z/aWrKbYPTG. Code snippets from trunk, but also repos on 13.2) While ideally gcc would generate the branchless output for the supplied code, if there is some (reasonable) incantation that would cause it to produce branchless output, I'd be happy to have that too.