| | |
|---|---|
| Issue | 137983 |
| Summary | Missed combining shr and shrx in collatz_f1() |
| Labels | new issue |
| Assignees | |
| Reporter | BreadTom |

See [godbolt](https://godbolt.org/z/5Wh8sG958) and [GCC bug](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120038).
```c
#include <stdint.h>
#include <stdbool.h>

uint64_t
collatz_onlyoddstep (uint64_t oddnum)
{
  return (3 * oddnum + 1);
}

uint64_t
collatz_oddstep (uint64_t oddnum)
{
  return (3 * oddnum + 1) / 2;
}

uint64_t
collatz_div2tillodd (uint64_t num)
{
  num >>= __builtin_ctzg (num);
  return num;
}

uint64_t
collatz_f0 (uint64_t oddnum)
{
  oddnum = collatz_onlyoddstep (oddnum);
  return collatz_div2tillodd (oddnum);
}

uint64_t
collatz_f1 (uint64_t oddnum)
{
  oddnum = collatz_oddstep (oddnum);
  return collatz_div2tillodd (oddnum);
}
```
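In the generated x86-64 code, `collatz_f1()` uses `shr`, then `tzcnt`, then `shrx`, while `collatz_f0()` uses only `tzcnt` followed by `shrx`.
For odd inputs, `3 * oddnum + 1` is always even, so the extra `shr` in `collatz_f1()` is redundant there and both functions return the same value.
When I tested it, `collatz_f0()` was about 10% faster than `collatz_f1()`.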
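
A minimal sketch of a check for the equivalence claimed above, not part of the original reproducer. The names `strip_trailing_zeros`, `f0`, `f1`, and `count_odd_mismatches` are hypothetical, and `__builtin_ctzll` stands in for `__builtin_ctzg` so the snippet also builds with older GCC and Clang releases. It suggests that the two functions agree only when the input really is odd. For an even input such as 2 they differ, and nothing in the source tells the compiler that `oddnum` is odd, which may be why the `shr` is not folded away.

```c
#include <stdint.h>

/* Strip all trailing zero bits. x must be nonzero: ctz of 0 is undefined. */
static uint64_t strip_trailing_zeros(uint64_t x)
{
    return x >> __builtin_ctzll((unsigned long long)x);
}

/* Same computation as collatz_f0() in the report. */
static uint64_t f0(uint64_t n)
{
    return strip_trailing_zeros(3 * n + 1);
}

/* Same computation as collatz_f1() in the report. */
static uint64_t f1(uint64_t n)
{
    return strip_trailing_zeros((3 * n + 1) / 2);
}

/* Count odd n in [1, limit] for which f0 and f1 disagree.
   Assumption: limit is far below UINT64_MAX, so neither n += 2
   nor 3 * n + 1 wraps around. */
static uint64_t count_odd_mismatches(uint64_t limit)
{
    uint64_t mismatches = 0;
    for (uint64_t n = 1; n <= limit; n += 2)
        if (f0(n) != f1(n))
            mismatches++;
    return mismatches;
}
```

Calling `count_odd_mismatches(100001)` should find no disagreement, while `f0(2)` and `f1(2)` give 7 and 3 respectively. A hint such as `__builtin_assume(oddnum & 1)` (Clang) might let the optimizer prove the `shr` redundant, but I have not verified that it does.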