On 09/18/2015 05:21 PM, Jiong Wang wrote:
Current conditional compare (CCMP) support in GCC aims to optimize
short-circuit evaluation of cascaded comparisons. Given a simple
conditional compare candidate:
if (a == 17 || a == 32)
[...]
The problem is that the current implementation always expands t0 first,
then t1. The expansion order needs to take the rtx costs into account,
because "cmp" and "ccmp" may have different restrictions, so the order
chosen can result in performance differences.
For example, on AArch64 "ccmp" only accepts immediates in the range
-31 ~ 31, while "cmp" accepts a wider range. So if we expand "a == 32" in
the second step, it will use "ccmp", and an extra "mov reg, 32"
instruction is generated because "32" is out of range. If we expand
"a == 32" first, it's fine, as "cmp" can encode it.
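To make the ordering argument concrete, here is a minimal C sketch of a toy cost model. It is not the patch's code and does not use GCC's rtx_cost machinery; all function names are hypothetical. The "ccmp" range is taken from the text above, and the "cmp" range (a 12-bit unsigned immediate, optionally shifted left by 12, with negative values handled by the negated form) is an assumption about the A64 encoding.

```c
#include <stdbool.h>

/* Range quoted above for "ccmp": -31 .. 31.  */
static bool
fits_ccmp_imm (long v)
{
  return v >= -31 && v <= 31;
}

/* ASSUMPTION: "cmp" takes a 12-bit unsigned immediate, optionally
   shifted left by 12; negative values use the negated form.  */
static bool
fits_cmp_imm (long v)
{
  unsigned long u = v < 0 ? 0UL - (unsigned long) v : (unsigned long) v;
  return (u & ~0xfffUL) == 0 || (u & ~0xfff000UL) == 0;
}

/* Toy cost (hypothetical): one unit per compare, plus one unit for
   the extra "mov" when the immediate cannot be encoded.  The first
   comparison is expanded as "cmp", the second as "ccmp".  */
static int
toy_cost (long first, long second)
{
  return (fits_cmp_imm (first) ? 1 : 2)
	 + (fits_ccmp_imm (second) ? 1 : 2);
}

/* Swap the expansion order only when that is strictly cheaper.  */
static bool
should_swap (long t0, long t1)
{
  return toy_cost (t1, t0) < toy_cost (t0, t1);
}
```

With this model, expanding "a == 17" then "a == 32" costs 3 units, while the swapped order costs 2, so should_swap (17, 32) is true.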
I've played with this patch a bit with an aarch64 cross compiler. First
of all, it doesn't seem to work: I get identical costs for both orders
and the swapping doesn't happen. Did you forget to include an rtx_cost
patch?
I was a little worried about whether this would be expensive for longer
sequences of conditions, but it seems to look only at leaves where
we have two comparisons, so that cost should be minimal. However, it's
possible there's room for improvement in code generation. I was curious
and looked at a slightly more complex testcase:
int
foo (int a)
{
  if (a == 17 || a == 32 || a == 47 || a == 53 || a == 66 || a == 72)
    return 1;
  else
    return 2;
}
and this doesn't generate a sequence of ccmps as might have been
expected; we only get pairs of comparisons merged with a bit_ior:
D.2699 = a == 17;
D.2700 = a == 32;
D.2701 = D.2699 | D.2700;
if (D.2701 != 0) goto <D.2697>; else goto <D.2702>;
<D.2702>:
D.2703 = a == 47;
D.2704 = a == 53;
D.2705 = D.2703 | D.2704;
if (D.2705 != 0) goto <D.2697>; else goto <D.2706>;
and the ccmp expander doesn't see the entire thing. I found that a
little surprising TBH.
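For comparison, the shape of that dump can be written back as C. This is only a sketch: foo_pairs mirrors the pairwise bit_ior form shown above (the third pair is not in the quoted dump and is assumed to follow the same pattern), and foo_flat is a hypothetical single-chain form in which all six comparisons would be visible at once. Both function names are made up for illustration.

```c
/* Pairwise form, mirroring the dump: each pair is merged with a
   bitwise OR, and a branch separates one pair from the next.  */
int
foo_pairs (int a)
{
  int t = (a == 17) | (a == 32);
  if (t != 0)
    return 1;
  t = (a == 47) | (a == 53);
  if (t != 0)
    return 1;
  /* ASSUMPTION: the third pair follows the same pattern.  */
  t = (a == 66) | (a == 72);
  return t != 0 ? 1 : 2;
}

/* Hypothetical flat form: one chain of ORs with no intervening
   branches.  */
int
foo_flat (int a)
{
  int t = (a == 17) | (a == 32) | (a == 47)
	  | (a == 53) | (a == 66) | (a == 72);
  return t != 0 ? 1 : 2;
}
```

The two functions return the same value for every input; they differ only in how much of the condition is available in a single expression.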
Bernd