On Tue, Aug 1, 2017 at 11:14 AM, Sudi Das <sudi....@arm.com> wrote:
>
> Sorry about the delayed response but looking at the above discussion, should
> I conclude that this is a valid tree simplification?
Yes, I think so.  Jakub requested code to undo this at RTL expansion
based on target costs; I'm not sure we really should require that from
you, given the user could have written the target sequence himself.

A few comments about the patch:

+/* Fold (1 << (C - x)) where C = precision(type) - 1
+   into ((1 << C) >> x).  */
+(simplify
+ (lshift integer_onep@0 (minus INTEGER_CST@1 @2))

I think this warrants a single_use check on the minus (note that :s isn't
enough, as with the unsigned case we'd happily ignore it by design).

+ (if (INTEGRAL_TYPE_P (type)
+      && TYPE_PRECISION (type) <= HOST_BITS_PER_WIDE_INT
+      && tree_to_uhwi (@1) == (unsigned)(TYPE_PRECISION (type) - 1))

You can relax this by using

  && wi::eq_p (@1, TYPE_PRECISION (type) - 1)

+  (if (TYPE_UNSIGNED(type))
+   (rshift (lshift @0 @1) @2)
+   (with
+    { tree utype = unsigned_type_for (type); }
+    (convert:type (rshift (lshift (convert:utype @0) @1) @2))))))
+

You can write (convert (rshift ...)), without the :type.

I'm leaving it to Jakub whether you need to write that RTL expansion tweak.

Thanks,
Richard.

> I am pasting the diff of the assembly that AArch64 generates with the test
> case that I added.  I see fewer instructions generated with the patch.
>
> --- pr80131-1.s         2017-08-01 10:02:43.243374174 +0100
> +++ pr80131-1.s-patched 2017-08-01 10:00:54.776455630 +0100
> @@ -24,10 +24,8 @@
>          str     x0, [sp, 8]
>          ldr     x0, [sp, 8]
>          mov     w1, w0
> -        mov     w0, 63
> -        sub     w0, w0, w1
> -        mov     x1, 1
> -        lsl     x0, x1, x0
> +        mov     x0, -9223372036854775808
> +        lsr     x0, x0, x1
>          add     sp, sp, 16
>          ret
>          .size   f2, .-f2
> @@ -39,10 +37,8 @@
>          str     x0, [sp, 8]
>          ldr     x0, [sp, 8]
>          mov     w1, w0
> -        mov     w0, 63
> -        sub     w0, w0, w1
> -        mov     x1, 1
> -        lsl     x0, x1, x0
> +        mov     x0, -9223372036854775808
> +        lsr     x0, x0, x1
>          add     sp, sp, 16
>          ret
>          .size   f3, .-f3
> @@ -52,11 +48,9 @@
>  f4:
>          sub     sp, sp, #16
>          str     w0, [sp, 12]
> -        mov     w1, 31
>          ldr     w0, [sp, 12]
> -        sub     w0, w1, w0
> -        mov     w1, 1
> -        lsl     w0, w1, w0
> +        mov     w1, -2147483648
> +        lsr     w0, w1, w0
>          add     sp, sp, 16
>          ret
>          .size   f4, .-f4
>
> Thanks
>
> Sudi
>
>
> From: Wilco Dijkstra
> Sent: Thursday, April 13, 2017 1:01 PM
> To: Richard Biener; Jakub Jelinek
> Cc: Sudi Das; GCC Patches; nd; Richard Earnshaw; James Greenhalgh
> Subject: Re: [PATCH][GCC] Simplification of 1U << (31 - x)
>
> Richard Biener wrote:
>> It is IMHO a valid GIMPLE optimization / canonicalization.
>>
>>   movabsq $-9223372036854775808, %rax
>>
>> so this should then have been generated as 1<<63?
>>
>> At some point variable shifts were quite expensive as well..
>
> Yes, I don't see a major difference between movabsq and
>
>   movl $1, %eax
>   salq $63, %rax
>
> on my Sandy Bridge, but if the above is faster then that is what the x64
> backend should emit - it's 1 byte smaller as well, so probably better in all
> cases.
>
> Wilco
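
For readers following the thread: below is a minimal, self-contained C
sketch of the equivalence being exploited, for the unsigned 32-bit case.
The function names `before` and `after` are made up for illustration and
are not part of the patch.  1U << (31 - x) sets bit 31 - x, and so does
0x80000000U >> x, so the two are interchangeable for 0 <= x <= 31:

  /* Sketch only: demonstrates the simplification
     1U << (31 - x)  ->  (1U << 31) >> x
     for an unsigned 32-bit type.  */
  #include <assert.h>

  unsigned int before (unsigned int x)
  {
    return 1U << (31 - x);     /* variable left shift of the constant 1 */
  }

  unsigned int after (unsigned int x)
  {
    return 0x80000000U >> x;   /* variable right shift of a constant */
  }

  int main (void)
  {
    for (unsigned int x = 0; x < 32; x++)
      assert (before (x) == after (x));
    return 0;
  }

This also shows why the patch's signed path does both shifts in the
corresponding unsigned type and converts back: in the signed type the
right shift would be arithmetic, so (1 << 31) >> x would sign-extend
instead of moving the single set bit down.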