Paolo Bonzini <pbonz...@redhat.com> writes:

I think that would be faster on 32-bit hosts, truncs are cheap.

And slower perhaps on 64-bit hosts, at least for operations where
additional explicit truncation will be needed (such as before
comparisons and after right shifts).

> There could be a disadvantage of this compared to the old code, since
> this has a chained algebraic dependency, while the old code's many
> instructions might have been more independent.

What about these alternatives:

    setcond LT, t0, arg0, arg1
    setcond EQ, t1, arg0, arg1
    trunc   s0, t0
    trunc   s1, t1
    shli    s0, s0, 1     ; s0 = (arg0 < arg1) ? 2 : 0
    subi    s1, s1, 2     ; s1 = (arg0 != arg1) ? -2 : -1
    sub     s0, s0, s1    ; < 4   == 1   > 2
    shli    s0, s0, 1     ; < 8   == 2   > 4

=======

    setcond LT, t0, arg0, arg1
    setcond NE, t1, arg0, arg1
    trunc   s0, t0
    trunc   s1, t1
    add     s0, s0, s1    ; < 2   == 0   > 1
    movi    s1, 1
    add     s0, s0, s1    ; < 3   == 1   > 2
    shl     s1, s1, s0    ; < 8   == 2   > 4

Surely there are many alternative forms. Is your aim to add
micro-parallelism?

(Your sequences look a bit curious. Did you use a super-optimiser?)

--
Torbjörn
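
The two sequences above can be checked by hand with a small simulation. This is only a sketch in Python, not TCG code: the function names seq_shift_sub and seq_add_shl are made up for illustration, setcond is modelled as a signed comparison yielding 0 or 1, and trunc is treated as a no-op because the values involved are only 0 or 1. Each function mirrors one listing step by step and returns the register that holds the final value according to the comments (s0 in the first sequence, s1 in the second).

```python
def seq_shift_sub(arg0, arg1):
    """First listing: setcond LT / setcond EQ, then shli, subi, sub, shli."""
    t0 = int(arg0 < arg1)    # setcond LT, t0, arg0, arg1
    t1 = int(arg0 == arg1)   # setcond EQ, t1, arg0, arg1
    s0, s1 = t0, t1          # trunc (assumed no-op for 0/1 values)
    s0 = s0 << 1             # shli s0, s0, 1   -> 2 if less, else 0
    s1 = s1 - 2              # subi s1, s1, 2   -> -1 if equal, else -2
    s0 = s0 - s1             # sub  s0, s0, s1  -> 4 / 1 / 2
    s0 = s0 << 1             # shli s0, s0, 1   -> 8 / 2 / 4
    return s0


def seq_add_shl(arg0, arg1):
    """Second listing: setcond LT / setcond NE, then add, movi, add, shl."""
    t0 = int(arg0 < arg1)    # setcond LT, t0, arg0, arg1
    t1 = int(arg0 != arg1)   # setcond NE, t1, arg0, arg1
    s0, s1 = t0, t1          # trunc (assumed no-op for 0/1 values)
    s0 = s0 + s1             # add  s0, s0, s1  -> 2 / 0 / 1
    s1 = 1                   # movi s1, 1
    s0 = s0 + s1             # add  s0, s0, s1  -> 3 / 1 / 2
    s1 = s1 << s0            # shl  s1, s1, s0  -> 8 / 2 / 4
    return s1


if __name__ == "__main__":
    for a, b in [(1, 2), (2, 2), (3, 2), (-5, 7), (7, -5)]:
        print(a, b, seq_shift_sub(a, b), seq_add_shl(a, b))
```

Both functions agree for less-than, equal and greater-than inputs, matching the values given in the listing comments. The simulation says nothing about relative speed or instruction-level parallelism on a real host.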