Bernd Schmidt writes: > On 09/18/2015 05:21 PM, Jiong Wang wrote: >> >> Current conditional compare (CCMP) support in GCC aim to optimize >> short circuit for cascade comparision, given a simple conditional >> compare candidate: >> >> if (a == 17 || a == 32) > [...] >> The problem is current implementation always expand t0 first, then >> t1. While the expand order need to consider the rtx costs, because "cmp" >> and "ccmp" may have different restrictions that the expand order will >> result in performance differences. >> >> For example on AArch64, "ccmp" only accept immediate within -31 ~ 31 >> while "cmp" accept wider range, so if we expand "a == 32" in the second >> step, then it will use "ccmp", and thus an extra "mov reg, 32" >> instruction is generated because "32" is out of the range. While if we >> expand "a == 32" first, then it's fine as "cmp" can encoding it. > > I've played with this patch a bit with an aarch64 cross compiler. First > of all - it doesn't seem to work, I get identical costs and the swapping > doesn't happen. Did you forget to include a rtx_cost patch?
No. Please see NOTE part of the description. AArch64 doesn't cost ccmp currently. It will be fixed by a seperate patch later. The testcase is thus marked as XFAIL. > > I was a little worried about whether this would be expensive for longer > sequences of conditions, but it seems like it looks only at leafs where > we have two comparisons, so that cost should be minimal. However, it's > possible there's room for improvement in code generation. I was curious > and looked at a slightly more complex testcase > > int > foo (int a) > { > if (a == 17 || a == 32 || a == 47 || a == 53 || a == 66 || a == 72) > return 1; > else > return 2; > } Yes, this patch only cost the inner most call of the recursive call of expand_ccmp_expr_1 which is simple and common situation. > > and this doesn't generate a sequence of ccmps as might have been > expected; we only get pairs of comparisons merged with a bit_ior: > > D.2699 = a == 17; > D.2700 = a == 32; > D.2701 = D.2699 | D.2700; > if (D.2701 != 0) goto <D.2697>; else goto <D.2702>; > <D.2702>: > D.2703 = a == 47; > D.2704 = a == 53; > D.2705 = D.2703 | D.2704; > if (D.2705 != 0) goto <D.2697>; else goto <D.2706>; > > and the ccmp expander doesn't see the entire thing. I found that a > little surprising TBH. It's because fold_truth_andor has re-written the tree to avoid the use short circuit given LOGICAL_OP_NON_SHORT_CIRCUIT returns true. -- Regards, Jiong