https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96846
--- Comment #4 from andysem at mail dot ru ---
(In reply to Jakub Jelinek from comment #3)
>         mov     edx, DWORD PTR [rdi]
>         cmp     edx, esi
>         sete    al
>         cmp     edx, r9d
>         sete    dl
>         or      eax, edx
>         movzx   eax, al
> This isn't what the peepholes are looking for, there are several other insns
> in between, and peephole2s only work on exact insn sequences, doing anything
> more complex would require doing it in some machine specific pass.

Yes, I think this optimization needs to happen at an earlier stage. Rewriting
fixed instruction sequences doesn't allow for further optimizations like
hoisting the xor out of the loop body.

> Note, while in theory it could add xor eax, eax before the cmp edx, esi
> insn, it can't add xor edx, edx because the second comparison uses that
> register.

I don't think it should generate "xor edx, edx". I think the logic has to be
roughly something like this:

1. Check if there is a spare register that we can use for the test result. If
there is, allocate it.
2. If we have a register, clear it with a xor before the test. Ideally, move
that xor out of the loop.
3. If not, decide whether we are going to reuse one of the source registers or
spill some other register.
4. In the former case, keep the test/setcc/movzx sequence. In the latter, we
can still use xor/test/setcc, after spilling the victim register.

I.e. the main point is that it shouldn't try so hard to reuse the source
register; it should only reuse it when it has to. Maybe this requires some
help from the register allocator.

I admit I have little knowledge of how GCC works internally, so I may be
talking nonsense. These are just my naive thoughts about it.
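
Purely as an illustration of the four steps above, the decision could be sketched in C roughly as follows. This is a minimal sketch and not GCC code: every name in it (setcc_strategy, setcc_context, choose_setcc_strategy, prefer_spill) is hypothetical, and the choice between reusing a source register and spilling is reduced to a single flag standing in for whatever cost model would actually make that call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical strategies; illustrative names only, not GCC internals. */
enum setcc_strategy {
    XOR_SPARE_REG,      /* steps 1-2: xor spare, spare; test; setcc spare_lo */
    REUSE_SOURCE_MOVZX, /* steps 3-4: test; setcc src_lo; movzx src, src_lo  */
    SPILL_THEN_XOR      /* steps 3-4: spill victim; xor; test; setcc         */
};

/* Hypothetical summary of what would be known at the point of decision. */
struct setcc_context {
    bool have_spare_reg; /* step 1: is a free register available?           */
    bool in_loop;        /* is the comparison inside a loop body?            */
    bool prefer_spill;   /* assumed cost-model verdict for step 3            */
};

/* Pick a strategy following the four steps from the comment.
   *hoist_xor is set when the clearing xor could ideally be moved out of
   the loop; the comment only suggests this for the spare-register case,
   so the other cases leave it false. */
enum setcc_strategy
choose_setcc_strategy(const struct setcc_context *ctx, bool *hoist_xor)
{
    assert(ctx != NULL && hoist_xor != NULL);

    *hoist_xor = false;

    if (ctx->have_spare_reg) {
        /* Steps 1-2: use the spare register and clear it up front. */
        *hoist_xor = ctx->in_loop;
        return XOR_SPARE_REG;
    }

    /* Steps 3-4: no spare register, so either spill or reuse a source. */
    return ctx->prefer_spill ? SPILL_THEN_XOR : REUSE_SOURCE_MOVZX;
}
```

With a spare register available inside a loop, this picks XOR_SPARE_REG and flags the xor as a hoisting candidate; the source register is reused only when no spare register exists and spilling is not preferred.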