On 22.01.24 at 08:45, Richard Biener wrote:
> On Fri, Jan 19, 2024 at 5:06 PM Georg-Johann Lay <a...@gjlay.de> wrote:
>> On 18.01.24 at 20:54, Roger Sayle wrote:
>>> This patch tweaks RTL expansion of multi-word shifts and rotates to use
>>> PLUS rather than IOR for disjunctive operations. During expansion of
>>> these operations, the middle-end creates RTL like (X<<C1) | (Y>>C2)
>>> where the constants C1 and C2 guarantee that bits don't overlap.
>>> Hence the IOR can be performed by any any_or_plus operation, such as
>>> IOR, XOR or PLUS; for word-size operations where carry chains aren't
>>> an issue these should all be equally fast (single-cycle) instructions.
>>> The benefit of this change is that targets with shift-and-add insns,
>>> like x86's lea, can benefit from the LSHIFT-ADD form.
>>>
>>> An example of a backend that benefits is ARC, which is demonstrated
>>> by these two simple functions:
>>
>> But there are also back-ends where this is bad.
>>
>> The reason is that with IOR, the back-end needs only to operate on
>> those sub-words where the sub-mask is non-zero. But for PLUS this is
>> not the case because the back-end does not know that the intermediate
>> carry will be zero. Hence, with PLUS, more instructions are needed.
>> An example is AVR, but maybe many more targets with multi-word
>> operations are affected in a bad way.
>>
>> Take for example the case with 2 words and a value of 1.
>>
>>     LO |= 1
>>     HI |= 0
>>
>> can be optimized to
>>
>>     LO |= 1
>>
>> but for addition this is not the case:
>>
>>     LO += 1
>>     HI +=c 0 ;; Does not know that always carry = 0.
>
> I wonder if the PLUS can be done on the lowpart only to make this
> detail obvious?

For AVR, word_mode is HImode, but the hardware has only 8-bit registers.
Moreover, splitting insns is not wanted or not possible (due to CCmode).

Johann
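
The IOR-versus-PLUS point quoted above can be illustrated with a toy model in C. This is only a sketch under stated assumptions, not AVR code generation: the type `u16_pair` and the functions `ior_const` and `add_const` are hypothetical names introduced here. A 16-bit value is held as two 8-bit sub-words, and each function returns how many byte-wide operations it had to perform.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: a 16-bit value kept in two 8-bit registers. */
typedef struct { uint8_t lo, hi; } u16_pair;

/* IOR a constant into v. Each byte is independent, so a byte whose
   mask is zero needs no operation at all.  Returns the number of
   byte-wide operations performed. */
unsigned ior_const(u16_pair *v, uint16_t k)
{
    unsigned ops = 0;
    uint8_t k_lo = (uint8_t)(k & 0xFF);
    uint8_t k_hi = (uint8_t)(k >> 8);

    if (k_lo) { v->lo |= k_lo; ops++; }
    if (k_hi) { v->hi |= k_hi; ops++; }
    return ops;
}

/* Add a constant to v. Without knowing that the low byte cannot
   overflow, the carry must be propagated into the high byte even
   when the high part of the constant is zero.  Always 2 operations. */
unsigned add_const(u16_pair *v, uint16_t k)
{
    uint8_t k_lo = (uint8_t)(k & 0xFF);
    uint8_t k_hi = (uint8_t)(k >> 8);

    unsigned sum = (unsigned)v->lo + k_lo;          /* LO += k_lo   */
    unsigned carry = sum >> 8;
    v->lo = (uint8_t)sum;
    v->hi = (uint8_t)(v->hi + k_hi + carry);        /* HI +=c k_hi  */
    return 2;
}
```

With a value whose bit 0 is known to be clear, both functions produce the same result for the constant 1, but the model needs one byte operation for IOR and two for PLUS.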

>>> unsigned long long foo(unsigned long long x) { return x<<2; }
>>>
>>> which with -O2 is currently compiled to:
>>>
>>> foo:    lsr     r2,r0,30
>>>         asl_s   r1,r1,2
>>>         asl_s   r0,r0,2
>>>         j_s.d   [blink]
>>>         or_s    r1,r1,r2
>>>
>>> with this patch becomes:
>>>
>>> foo:    lsr     r2,r0,30
>>>         add2    r1,r2,r1
>>>         j_s.d   [blink]
>>>         asl_s   r0,r0,2
>>>
>>> unsigned long long bar(unsigned long long x) { return (x<<2)|(x>>62); }
>>>
>>> which with -O2 is currently compiled to 6 insns + return:
>>>
>>> bar:    lsr     r12,r0,30
>>>         asl_s   r3,r1,2
>>>         asl_s   r0,r0,2
>>>         lsr_s   r1,r1,30
>>>         or_s    r0,r0,r1
>>>         j_s.d   [blink]
>>>         or      r1,r12,r3
>>>
>>> with this patch becomes 4 insns + return:
>>>
>>> bar:    lsr     r3,r1,30
>>>         lsr     r2,r0,30
>>>         add2    r1,r2,r1
>>>         j_s.d   [blink]
>>>         add2    r0,r3,r0
>>>
>>> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
>>> and make -k check, both with and without --target_board=unix{-m32}
>>> with no new failures. Ok for mainline?
>>>
>>> 2024-01-18  Roger Sayle  <ro...@nextmovesoftware.com>
>>>
>>> gcc/ChangeLog
>>>         * expmed.cc (expand_shift_1): Use add_optab instead of ior_optab
>>>         to generate PLUS instead of IOR when unioning disjoint bitfields.
>>>         * optabs.cc (expand_subword_shift): Likewise.
>>>         (expand_binop): Likewise for double-word rotate.
>>>
>>> Thanks in advance,
>>> Roger
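
The claim in the patch description that IOR and PLUS give the same result when the combined bits cannot overlap can be checked with a short C sketch. It assumes 32-bit words and a shift count strictly between 0 and 32; the function names `shl_ior`, `shl_plus` and `shl_forms_agree` are hypothetical and are not taken from the GCC sources.

```c
#include <assert.h>
#include <stdint.h>

/* Double-word left shift by c (0 < c < 32) built from two 32-bit
   words, combining the two disjoint pieces of the high word with IOR. */
uint64_t shl_ior(uint32_t hi, uint32_t lo, unsigned c)
{
    uint32_t new_hi = (hi << c) | (lo >> (32 - c));
    uint32_t new_lo = lo << c;
    return ((uint64_t)new_hi << 32) | new_lo;
}

/* The same shift, combining the pieces with PLUS.  (hi << c) has its
   low c bits clear and (lo >> (32 - c)) has only its low c bits set,
   so the addition can never generate a carry. */
uint64_t shl_plus(uint32_t hi, uint32_t lo, unsigned c)
{
    uint32_t new_hi = (hi << c) + (lo >> (32 - c));
    uint32_t new_lo = lo << c;
    return ((uint64_t)new_hi << 32) | new_lo;
}

/* Returns 1 if both forms match the native 64-bit shift of x by c. */
int shl_forms_agree(uint64_t x, unsigned c)
{
    uint32_t hi = (uint32_t)(x >> 32);
    uint32_t lo = (uint32_t)x;
    uint64_t want = x << c;
    return shl_ior(hi, lo, c) == want && shl_plus(hi, lo, c) == want;
}
```

For instance, `shl_forms_agree(x, 2)` corresponds to the `foo` example in the quoted message. The sketch only covers the word-size case and says nothing about targets whose registers are narrower than a word.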