https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111376

--- Comment #15 from Siarhei Volkau <lis8215 at gmail dot com> ---
Created attachment 58437
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58437&action=edit
application to test performance of shift

Here is the test application (MIPS32 specific) I wrote.

It allows to detect execution cycles and extra pipeline stalls for SLL if they
take place.

for XBurst 1 (jz4725b) result is the following:

`SLL to use latency test` execution median: 168417 ns, min: 168416 ns
`SLL to use latency test with nop` execution median: 196250 ns, min: 196166 ns

`SLL to branch latency test` execution median: 196250 ns, min: 196166 ns
`SLL to branch latency test with nop` execution median: 224000 ns, min: 224000
ns

`SLL by 7 to use latency test` execution median: 168417 ns, min: 168416 ns
`SLL by 15 to use latency test` execution median: 168417 ns, min: 168416 ns
`SLL by 23 to use latency test` execution median: 168417 ns, min: 168416 ns
`SLL by 31 to use latency test` execution median: 168417 ns, min: 168416 ns

`LUI>AND>BEQZ reference test` execution median: 196250 ns, min: 196166 ns
`SLL>BGEZ reference test` execution median: 168417 ns, min: 168416 ns



and what does it mean:
`SLL to use latency test` 168417 ns and `.. with nop` 196250 ns
means that there's no extra stall cycles between SLL and further use by ALU
operation.

`SLL to branch latency test` and `.. with nop` result
means that there's no extra stall cycles between SLL and further use by branch
operations.

`SLL by N` results means that SLL execution time doesn't depend on shift
amount.

and finally, the reference test results showcases that SLL>BGEZ approach is
faster than LUI>AND>BEQZ.

Reply via email to