On Tue, Aug 1, 2017 at 11:14 AM, Sudi Das <sudi....@arm.com> wrote:
>
> Sorry about the delayed response but looking at the above discussion, should
> I conclude that this is a valid tree simplification?
Yes, I think so.  Jakub requested code to undo this at RTL expansion
based on target costs; I'm not sure we really should require that from
you, given the user could have written the target sequence himself.

A few comments about the patch:

+/* Fold (1 << (C - x)) where C = precision(type) - 1
+   into ((1 << C) >> x).  */
+(simplify
+ (lshift integer_onep@0 (minus INTEGER_CST@1 @2))

I think this warrants a single_use check on the minus (note that :s isn't
enough, as with the unsigned case we'd happily ignore it by design).

+ (if (INTEGRAL_TYPE_P (type)
+      && TYPE_PRECISION (type) <= HOST_BITS_PER_WIDE_INT
+      && tree_to_uhwi (@1) == (unsigned)(TYPE_PRECISION (type) - 1))

You can relax this by using

  && wi::eq_p (@1, TYPE_PRECISION (type) - 1)

+  (if (TYPE_UNSIGNED(type))
+   (rshift (lshift @0 @1) @2)
+   (with
+    { tree utype = unsigned_type_for (type); }
+    (convert:type (rshift (lshift (convert:utype @0) @1) @2))))))
+

You can write (convert (rshift ...)), without the :type.

I'm leaving it to Jakub whether you need to write that RTL expansion tweak.

Thanks,
Richard.

> I am pasting the diff of the assembly that AArch64 generates with the test
> case that I added.  I see fewer instructions generated with the patch.
>
> --- pr80131-1.s         2017-08-01 10:02:43.243374174 +0100
> +++ pr80131-1.s-patched 2017-08-01 10:00:54.776455630 +0100
> @@ -24,10 +24,8 @@
>          str     x0, [sp, 8]
>          ldr     x0, [sp, 8]
>          mov     w1, w0
> -        mov     w0, 63
> -        sub     w0, w0, w1
> -        mov     x1, 1
> -        lsl     x0, x1, x0
> +        mov     x0, -9223372036854775808
> +        lsr     x0, x0, x1
>          add     sp, sp, 16
>          ret
>          .size   f2, .-f2
> @@ -39,10 +37,8 @@
>          str     x0, [sp, 8]
>          ldr     x0, [sp, 8]
>          mov     w1, w0
> -        mov     w0, 63
> -        sub     w0, w0, w1
> -        mov     x1, 1
> -        lsl     x0, x1, x0
> +        mov     x0, -9223372036854775808
> +        lsr     x0, x0, x1
>          add     sp, sp, 16
>          ret
>          .size   f3, .-f3
> @@ -52,11 +48,9 @@
>  f4:
>          sub     sp, sp, #16
>          str     w0, [sp, 12]
> -        mov     w1, 31
>          ldr     w0, [sp, 12]
> -        sub     w0, w1, w0
> -        mov     w1, 1
> -        lsl     w0, w1, w0
> +        mov     w1, -2147483648
> +        lsr     w0, w1, w0
>          add     sp, sp, 16
>          ret
>          .size   f4, .-f4
>
> Thanks
>
> Sudi
>
>
> From: Wilco Dijkstra
> Sent: Thursday, April 13, 2017 1:01 PM
> To: Richard Biener; Jakub Jelinek
> Cc: Sudi Das; GCC Patches; nd; Richard Earnshaw; James Greenhalgh
> Subject: Re: [PATCH][GCC] Simplification of 1U << (31 - x)
>
> Richard Biener wrote:
>> It is IMHO a valid GIMPLE optimization / canonicalization.
>>
>>   movabsq $-9223372036854775808, %rax
>>
>> so this should then have been generated as 1<<63?
>>
>> At some point variable shifts were quite expensive as well..
>
> Yes, I don't see a major difference between movabsq and
>
>   movl $1, %eax
>   salq $63, %rax
>
> on my Sandy Bridge, but if the above is faster then that is what the x64
> backend should emit - it's 1 byte smaller as well, so probably better in all
> cases.
>
> Wilco
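
For readers following the thread: below is a minimal, self-contained C
sketch of the equivalence being exploited, for the unsigned 32-bit case.
The function names `before` and `after` are made up for illustration and
are not part of the patch.  1U << (31 - x) sets bit 31 - x, and so does
0x80000000U >> x, so the two are interchangeable for 0 <= x <= 31:

  /* Sketch only: demonstrates the simplification
     1U << (31 - x)  ->  (1U << 31) >> x
     for an unsigned 32-bit type.  */
  #include <assert.h>

  unsigned int before (unsigned int x)
  {
    return 1U << (31 - x);     /* variable left shift of the constant 1 */
  }

  unsigned int after (unsigned int x)
  {
    return 0x80000000U >> x;   /* variable right shift of a constant */
  }

  int main (void)
  {
    for (unsigned int x = 0; x < 32; x++)
      assert (before (x) == after (x));
    return 0;
  }

This also shows why the patch's signed path does both shifts in the
corresponding unsigned type and converts back: in the signed type the
right shift would be arithmetic, so (1 << 31) >> x would sign-extend
instead of moving the single set bit down.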