https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111376
--- Comment #15 from Siarhei Volkau <lis8215 at gmail dot com> ---
Created attachment 58437
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58437&action=edit
application to test performance of shift

Here is the test application (MIPS32-specific) I wrote. It measures execution time, which reveals cycle counts and any extra pipeline stalls after SLL.

For XBurst 1 (jz4725b), the results are:

`SLL to use latency test`              execution median: 168417 ns, min: 168416 ns
`SLL to use latency test with nop`     execution median: 196250 ns, min: 196166 ns
`SLL to branch latency test`           execution median: 196250 ns, min: 196166 ns
`SLL to branch latency test with nop`  execution median: 224000 ns, min: 224000 ns
`SLL by 7 to use latency test`         execution median: 168417 ns, min: 168416 ns
`SLL by 15 to use latency test`        execution median: 168417 ns, min: 168416 ns
`SLL by 23 to use latency test`        execution median: 168417 ns, min: 168416 ns
`SLL by 31 to use latency test`        execution median: 168417 ns, min: 168416 ns
`LUI>AND>BEQZ reference test`          execution median: 196250 ns, min: 196166 ns
`SLL>BGEZ reference test`              execution median: 168417 ns, min: 168416 ns

What this means:

- `SLL to use latency test` (168417 ns) vs. `... with nop` (196250 ns): if SLL had an extra cycle of result latency, the nop would fill that cycle and both runs would take the same time. Instead the nop makes the run slower, so there are no extra stall cycles between SLL and a subsequent ALU instruction that uses its result.
- `SLL to branch latency test` vs. `... with nop` (196250 ns vs. 224000 ns): the same pattern, so there are no extra stall cycles between SLL and a subsequent branch that uses its result.
- `SLL by N`: all four shift amounts give identical times, so SLL execution time does not depend on the shift amount.
- Finally, the reference tests show that the SLL>BGEZ sequence is faster than LUI>AND>BEQZ (168417 ns vs. 196250 ns).
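The stall reasoning above can be made concrete with the reported medians. The C sketch below is a minimal illustration. It assumes that one nop costs one issue cycle and that each pair of tests is otherwise identical. The constant and function names are hypothetical and are not taken from the attachment.

```c
#include <assert.h>
#include <stdbool.h>

/* Median run times (ns) reported for XBurst 1 (jz4725b). */
static const long SLL_USE_NS        = 168417;
static const long SLL_USE_NOP_NS    = 196250;
static const long SLL_BRANCH_NS     = 196250;
static const long SLL_BRANCH_NOP_NS = 224000;
static const long LUI_AND_BEQZ_NS   = 196250;
static const long SLL_BGEZ_NS       = 168417;

/* Hypothetical helper: extra run time (ns) caused by inserting one nop
 * between the producer (SLL) and its consumer. */
static long nop_cost_ns(long without_nop, long with_nop)
{
    assert(without_nop > 0 && with_nop > 0); /* timings must be positive */
    return with_nop - without_nop;
}

/* Assumption: if SLL had an extra cycle of result latency, the nop would
 * occupy that cycle and the run time would not grow. A strictly positive
 * nop cost therefore means the nop was not hiding a stall. */
static bool no_stall_detected(long without_nop, long with_nop)
{
    return nop_cost_ns(without_nop, with_nop) > 0;
}
```

With these numbers the nop costs 27833 ns in the ALU pair and 27750 ns in the branch pair. The gap between the two reference tests (196250 - 168417 = 27833 ns) is the same size. That is consistent with, though it does not prove, LUI>AND>BEQZ costing exactly one extra single-cycle slot per loop iteration compared with SLL>BGEZ.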
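My reading of the two reference tests is an assumption, because the attachment's source is not reproduced here. They appear to time two ways of branching on a single bit:

- LUI>AND>BEQZ builds a mask with LUI, ANDs it with the value, and branches on zero.
- SLL>BGEZ shifts the tested bit into the sign position and branches on the sign.

The C sketch below checks that the two formulations agree for every bit position. The function names are hypothetical and do not correspond to anything in GCC or in the attachment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mask-based test, mirroring LUI>AND>BEQZ: build (1 << bit), AND it with x,
 * and report "branch taken" when the result is zero (bit clear). */
static bool bit_clear_mask(uint32_t x, unsigned bit)
{
    assert(bit < 32u);
    uint32_t mask = UINT32_C(1) << bit;
    return (x & mask) == 0;
}

/* Shift-based test, mirroring SLL>BGEZ: shift left by (31 - bit) so the
 * tested bit lands in bit 31 (the sign bit), then report "branch taken"
 * when that bit is zero, i.e. when the value is >= 0 as a signed word.
 * The sign bit is read directly to avoid a uint32_t -> int32_t conversion,
 * which is implementation-defined before C23 for values above INT32_MAX. */
static bool bit_clear_sign(uint32_t x, unsigned bit)
{
    assert(bit < 32u);
    uint32_t shifted = x << (31u - bit);
    return (shifted >> 31) == 0;
}

/* Returns true if both formulations agree for every bit position of x. */
static bool tests_agree(uint32_t x)
{
    for (unsigned bit = 0; bit < 32u; ++bit) {
        if (bit_clear_mask(x, bit) != bit_clear_sign(x, bit))
            return false;
    }
    return true;
}
```

The shift variant needs no mask register. In a single instruction, LUI can only materialise masks for bits 16 to 31, which presumably is why the mask-based reference sequence takes three instructions here.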