[Bug tree-optimization/115529] New: Optimization with "bit mask and compare equality" ((x & C1) == C2), ((x | C3) == C4)

Explorer09 at gmail dot com via Gcc-bugs Mon, 17 Jun 2024 18:36:36 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115529


            Bug ID: 115529
           Summary: Optimization with "bit mask and compare equality" ((x
                    & C1) == C2), ((x | C3) == C4)
           Product: gcc
           Version: 14.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: Explorer09 at gmail dot com
  Target Milestone: ---

I thought this pattern is commonly seen, but I didn't find any issue report in
GCC mentioning this.

```c
#include <stdbool.h>

bool pred_bitor(unsigned int x) {
    return (x | 0x1F) == 0x381F;
}
bool pred_bitand(unsigned int x) {
    return (x & ~(uint32_t)0x1F) == 0x3800;
}
bool pred_shift(unsigned int x) {
    return (x >> 5) == (0x3800 >> 5);
}
```

The three predicates are equivalent, but GCC doesn't seem to recognize one can
be converted to another. Clang does recognize that, however.

My optimization request is this:
For the patterns ((x & C1) == C2) and ((x | C3) == C4), where C1 to C4 are all
compile time constants, try converting one code pattern to another, and figure
out which can generate faster or smaller code (or maybe both).

* If any constant happens to be already loaded into one register, then reusing
that constant can save code size by not needing to load the constant into
register for compare op. In other words, which pattern is the best depends a
lot on the surrounding code, as well is CPU instruction sets.

* My personal testing says that pred_bitand() can win in more cases than the
other two:
** In 32-bit ARM, the constant 0x3800 can fit into an immediate operand but
0x381F cannot, which means there would be an additional "mov" instruction for
0x381F.
** In RISC-V, the "andi" has a 16-bit encoding (with the "C"/Compressed
instruction extension) that works here, but there is no 16-bit encoding for
"ori" instruction.

* The pred_shift() case can work only if the mask constant is in the `((1 << N)
- 1)` form, or `((unsigned)-1 << M)` form. It doesn't work with all mask
constants but I think you knew that already.

[Bug tree-optimization/115529] New: Optimization with "bit mask and compare equality" ((x & C1) == C2), ((x | C3) == C4)

Reply via email to