https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111376
--- Comment #12 from Siarhei Volkau <lis8215 at gmail dot com> ---
It is highly likely that this is caused by a data dependency rather than by the direct cost of shift operations on LoongArch, although I can't find information to prove that. So I guess there might still be a performance benefit in cases where the scheduler can put some instruction(s) between SLL and BGEZ.

Since you have access to the hardware, you can measure the performance of two variants:
1) SLL+BGEZ
2) SLL+NOT+BGEZ
If their performance is equal, then I'm correct, and it seems the scheduling automaton for GS464 has to be fixed.

From my side, I can confirm that SLL+BGEZ is faster than LUI+AND+BEQ on Ingenic XBurst 1 cores.
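
For reference, here is a minimal C sketch of the two single-bit tests that the sequences above would typically implement, assuming a 32-bit register and a bit index n in 0..31. The function names are hypothetical and used only for illustration. The mapping to LUI+AND+BEQ and SLL+BGEZ is an assumption about what the compiler emits, not something this snippet verifies, and the cast to int32_t relies on GCC's documented wrap-around conversion.

```c
#include <assert.h>
#include <stdint.h>

/* Mask form: AND with a one-bit constant and compare with zero.
   Assumption: for n > 15 the constant does not fit a 16-bit immediate,
   which is where a LUI+AND+BEQ sequence would come from. */
static int bit_clear_mask(uint32_t x, unsigned n)
{
    return (x & (UINT32_C(1) << n)) == 0;
}

/* Shift form: move bit n into the sign position, then test the sign.
   Assumption: this is the shape that maps to SLL+BGEZ.
   The unsigned-to-signed conversion is implementation-defined in ISO C;
   GCC defines it as reduction modulo 2^32. */
static int bit_clear_shift(uint32_t x, unsigned n)
{
    return (int32_t)(x << (31u - n)) >= 0;
}

/* Check that both forms agree for every bit position of x. */
static void check_all_bits(uint32_t x)
{
    for (unsigned n = 0; n < 32; n++)
        assert(bit_clear_mask(x, n) == bit_clear_shift(x, n));
}
```

Calling check_all_bits() on a few values only confirms that the two forms are logically equivalent. It says nothing about their relative speed, which still has to be measured on the target core.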