On 07/29/2015 07:49 AM, Kyrill Tkachov wrote:
Hi all,
This patch improves RTL if-conversion on sequences that perform a
conditional select on integer constants.
Most of the smart logic to do that already exists in the
noce_try_store_flag_constants function.
However, currently that function is tried after noce_try_cmove.
noce_try_cmove is a simple catch-all function that just loads the two
immediates and performs a conditional
select between them. It returns true and then the caller
noce_process_if_block doesn't try any other transformations,
completely skipping the more aggressive transformations that
noce_try_store_flag_constants allows!
Calling noce_try_store_flag_constants before noce_try_cmove allows for
the smarter if-conversion transformations
to be used. An example that occurs a lot in the gcc code itself is for
the C code:
int
foo (int a, int b)
{
return ((a & (1 << 25)) ? 5 : 4);
}
i.e. test a bit in a and return 5 or 4. Currently on aarch64 this
generates the naive:
and w2, w0, 33554432 // mask away all bits except bit 25
mov w1, 4
cmp w2, wzr
mov w0, 5
csel w0, w0, w1, ne
whereas with this patch this can be transformed into the much better:
ubfx x0, x0, 25, 1 // extract bit 25
add w0, w0, 4
I suspect the PA would benefit from this as well, probably several
other architectures with reasonable bitfield extraction capabilities.
Another issue I encountered is that the code that claims to perform the
transformation:
/* if (test) x = 3; else x = 4;
=> x = 3 + (test == 0); */
doesn't seem to do exactly that in all cases. In fact for that case it
will try something like:
x = 4 - (test == 0)
which is suboptimal for targets like aarch64 which have a conditional
increment operation.
I vaguely recall targets that don't have const - X insns, but do have X
+ const style insns. And more generally I think we're better off
generating the PLUS version.
This patch tweaks that code to always try to generate an addition of the
condition rather than
a subtraction.
Anyway, for code:
int
fooinc (int x)
{
return x ? 1025 : 1026;
}
we currently generate:
mov w2, 1025
mov w1, 1026
cmp w0, wzr
csel w0, w2, w1, ne
whereas with this patch we will generate:
cmp w0, wzr
cset w0, eq
add w0, w0, 1025
Bootstrapped and tested on arm, aarch64, x86_64.
Ok for trunk?
Thanks,
Kyrill
P.S. noce_try_store_flag_constants is currently gated on
!targetm.have_conditional_execution () but I don't see
any reason to restrict it on targets with conditional execution. For
example, I think the first example above
would show a benefit on arm if it was enabled there. But that can be a
separate investigation.
2015-07-29 Kyrylo Tkachov <kyrylo.tkac...@arm.com>
* ifcvt.c (noce_try_store_flag_constants): Reverse when diff is
-STORE_FLAG and condition is reversable. Prefer to add to the
flag value.
(noce_process_if_block): Try noce_try_store_flag_constants before
noce_try_cmove.
2015-07-29 Kyrylo Tkachov <kyrylo.tkac...@arm.com>
* gcc.target/aarch64/csel_bfx_1.c: New test.
* gcc.target/aarch64/csel_imms_inc_1.c: Likewise.
ifcvt-csel-imms.patch
commit 0164ef164483bdf0b2f73e267e2ff1df7800dd6d
Author: Kyrylo Tkachov<kyrylo.tkac...@arm.com>
Date: Tue Jul 28 14:59:46 2015 +0100
[RTL-ifcvt] Improve conditional increment ops on immediates
diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index a57d78c..80d0285 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -1222,7 +1222,7 @@ noce_try_store_flag_constants (struct noce_if_info
*if_info)
reversep = 0;
if (diff == STORE_FLAG_VALUE || diff == -STORE_FLAG_VALUE)
- normalize = 0;
+ normalize = 0, reversep = (diff == -STORE_FLAG_VALUE) && can_reverse;
We generally avoid using a ',' operator like this. However, I can see
that you're just following existing convention in that code. So I won't
object.
else if (ifalse == 0 && exact_log2 (itrue) >= 0
&& (STORE_FLAG_VALUE == 1
|| if_info->branch_cost >= 2))
@@ -1261,10 +1261,13 @@ noce_try_store_flag_constants (struct noce_if_info
*if_info)
=> x = 3 + (test == 0); */
if (diff == STORE_FLAG_VALUE || diff == -STORE_FLAG_VALUE)
{
- target = expand_simple_binop (mode,
- (diff == STORE_FLAG_VALUE
- ? PLUS : MINUS),
- gen_int_mode (ifalse, mode), target,
+ rtx_code code = reversep ? PLUS :
+ (diff == STORE_FLAG_VALUE ? PLUS
+ : MINUS);
+ HOST_WIDE_INT to_add = reversep ? MIN (ifalse, itrue) : ifalse;
+
+ target = expand_simple_binop (mode, code,
+ gen_int_mode (to_add, mode), target,
if_info->x, 0, OPTAB_WIDEN);
}
Note that STORE_FLAG_VALUE can potentially be any value. Most targets
use "1", but others use -1 (m68k for example, there are others). I'm
concerned that the reversep ? MIN (ifalse, itrue) won't do what we want
on targets such as the m68k.
The logic here is also somewhat convoluted. When reversep is true, we
always use PLUS, but is that actually correct? Normally we'd compute
the code, then invert the code. ie, if we normally want PLUS, then
we've flip to MINUS and vice-versa.
@@ -3120,13 +3123,14 @@ noce_process_if_block (struct noce_if_info *if_info)
goto success;
if (noce_try_abs (if_info))
goto success;
+ if (!targetm.have_conditional_execution ()
+ && noce_try_store_flag_constants (if_info))
+ goto success;
if (HAVE_conditional_move
&& noce_try_cmove (if_info))
goto success;
if (! targetm.have_conditional_execution ())
{
- if (noce_try_store_flag_constants (if_info))
- goto success;
if (noce_try_addcc (if_info))
goto success;
if (noce_try_store_flag_mask (if_info))
This part seems fine and ought to be able to go forward independently.
Consider this hunk and its associated testcase approved if you want to
test it independently and get it installed while we iterate on the other
hunks,
jeff