Issue 137983
Summary Missed combining shr and shrx in collatz_f1()
Labels new issue
Assignees
Reporter BreadTom
See [godbolt](https://godbolt.org/z/5Wh8sG958) and the [GCC bug](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120038).
```c
#include <stdint.h>
#include <stdbool.h>

uint64_t
collatz_onlyoddstep (uint64_t oddnum){
  return (3 * oddnum + 1);
}

uint64_t
collatz_oddstep (uint64_t oddnum)
{
  return (3 * oddnum + 1) / 2;
}

uint64_t
collatz_div2tillodd (uint64_t num)
{
  num >>= __builtin_ctzg (num);
  return num;
}

uint64_t
collatz_f0 (uint64_t oddnum)
{
  oddnum = collatz_onlyoddstep (oddnum);
  return collatz_div2tillodd (oddnum);
}

uint64_t
collatz_f1 (uint64_t oddnum)
{
  oddnum = collatz_oddstep (oddnum);
  return collatz_div2tillodd (oddnum);
}
```
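collatz_f1() compiles to `shr`, then `tzcnt`, then `shrx`.
collatz_f0() needs only `tzcnt` and `shrx`.
When oddnum is odd, 3 * oddnum + 1 is even, so both functions return the same value and the leading `shr` in collatz_f1() could be folded into the `shrx`.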

In my testing, collatz_f0() was 10% faster than collatz_f1().
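
A minimal sketch for checking when the two functions can be treated as interchangeable. It is not part of the original report: `f0`, `f1`, and `agree_on_odd_inputs` are hypothetical names, and `__builtin_ctzll` is swapped in for `__builtin_ctzg` so that it builds on older compilers (this assumes `uint64_t` fits in `unsigned long long`).

```c
#include <stdint.h>
#include <stdbool.h>

/* Mirrors collatz_f0: odd step, then strip all trailing zero bits.
   __builtin_ctzll is undefined for 0, as is one-argument __builtin_ctzg. */
static uint64_t
f0 (uint64_t oddnum)
{
  uint64_t x = 3 * oddnum + 1;
  return x >> __builtin_ctzll (x);
}

/* Mirrors collatz_f1: odd step with the first halving done up front. */
static uint64_t
f1 (uint64_t oddnum)
{
  uint64_t x = (3 * oddnum + 1) / 2;
  return x >> __builtin_ctzll (x);
}

/* Hypothetical helper: true if f0 and f1 agree on every odd n in
   [1, limit].  Assumes limit is small enough that neither n += 2 nor
   3 * n + 1 wraps around.  */
static bool
agree_on_odd_inputs (uint64_t limit)
{
  for (uint64_t n = 1; n <= limit; n += 2)
    if (f0 (n) != f1 (n))
      return false;
  return true;
}
```

For example, `agree_on_odd_inputs (1000000)` holds, while an even input such as 2 gives different results (`f0 (2)` is 7, `f1 (2)` is 3). The fold therefore seems to be valid only when the compiler can prove that the low bit of `3 * oddnum + 1` is clear.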