Hi, The patch is updated according to the comments. Changes include: * Rewrite codes according to Richard Biener's comments. * Change the algorithm to recursively combine compares. So it can handle any number of compares. * Add a set of instruction patterns in ARM backend to match the conditional compares.
With this patch, the conditional compare sequence is like CC1 = CCMP (CMP (a, b), CMP (c, d)); CC2 = CCMP (NE (CC1, 0), CMP (e, f)); ... CCn/reg = CCMP (NE (CCn-1, 0), CMP (...)); To test more than two compares, you need the patch discussed in the thread. http://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg63743.html Bootstrap and no make check regression for both ARM and THUMB modes on ARM Chromebook. ChangeLog: 2013-10-22 Zhenqiang Chen <zhenqiang.c...@linaro.org> * config/arm/arm.c (arm_fixed_condition_code_regs, arm_ccmode_to_code, arm_select_dominance_ccmp_mode): New functions. (arm_select_dominance_cc_mode_1): New function extracted from arm_select_dominance_cc_mode. (arm_select_dominance_cc_mode): Call arm_select_dominance_cc_mode_1. * config/arm/arm.md (ccmp, cbranchcc4, ccmp_and, ccmp_ior, ccmp_ior_scc_scc, ccmp_ior_scc_scc_cmp, ccmp_and_scc_scc, ccmp_and_scc_scc_cmp): New. * config/arm/arm-protos.h (arm_select_dominance_ccmp_mode): New. * expr.c (ccmp_candidate_p, used_in_cond_stmt_p, expand_ccmp_expr_2, expand_ccmp_expr_3, expand_ccmp_expr_1, expand_ccmp_expr): New. (expand_expr_real_1): Handle ccmp. * optabs.c: Include gimple.h. (expand_ccmp_op): New. (get_rtx_code): Handle BIT_AND_EXPR and BIT_IOR_EXPR. * optabs.def (ccmp): New. * optabs.h (expand_ccmp_op): New. * doc/md.texi (ccmp): New index. Thanks! -Zhenqiang > -----Original Message----- > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > ow...@gcc.gnu.org] On Behalf Of Zhenqiang Chen > Sent: Monday, September 23, 2013 2:50 PM > To: Richard Earnshaw > Cc: 'Richard Biener'; GCC Patches > Subject: RE: [PATCH 1/n] Add conditional compare support > > > > > -----Original Message----- > > From: Richard Earnshaw > > Sent: Thursday, September 19, 2013 5:13 PM > > To: Zhenqiang Chen > > Cc: 'Richard Biener'; GCC Patches > > Subject: Re: [PATCH 1/n] Add conditional compare support > > > > On 18/09/13 10:45, Zhenqiang Chen wrote: > > > > > >> -----Original Message----- > > >> From: Richard Biener [mailto:richard.guent...@gmail.com] > > >> Sent: Tuesday, August 27, 2013 8:18 PM > > >> To: Richard Earnshaw > > >> Cc: Zhenqiang Chen; GCC Patches > > >> Subject: Re: [PATCH 1/n] Add conditional compare support > > >> > > >> On Tue, Aug 27, 2013 at 1:56 PM, Richard Earnshaw > > >> <rearn...@arm.com> > > >> wrote: > > >>> On 27/08/13 12:10, Richard Biener wrote: > > >>>> What's this for and what's the desired semantics? I don't like > > >>>> having extra tree codes for this. Is this for a specific > > >>>> instruction set feature? > > >>> > > >>> The background is to support the conditional compare instructions > > >>> in ARM (more effectively) and AArch64 at all. > > >>> > > >>> The current method used in ARM is to expand into a series of > > >>> store-flag instructions and then hope that combine can optimize > > >>> them away (though that fails far too often, particularly when the > > >>> first instruction in the sequence is combined into another > > >>> pattern). To make it work at all the compiler has to lie about > > >>> the costs of various store-flag type operations which overall > > >>> risks producing worse code and means we also have to support many > > >>> more complex multi-instruction patterns than is desirable. I > > >>> really don't want to go down the same > > > route > > >> for AArch64. > > >>> > > >>> The idea behind all this is to capture potential conditional > > >>> compare operations early enough in the mid end that we can keep > > >>> track of them until RTL expand time and then to emit the correct > > >>> logic on all targets depending on what is the best thing for that > > >>> target. The current method of lowering into store-flag sequences > > >>> doesn't really cut > > > it. > > >> > > >> It seems to me that then the initial instruction selection process > > >> (aka > > > RTL > > >> expansion) needs to be improved. As we are expanding with having > > >> the CFG around it should be easy enough to detect AND/ORIF cases > > >> and do better here. Yeah, I suppose this asks to turn existing > > >> jump expansion > > > optimizations > > >> up-side-down to optimize with the GIMPLE CFG in mind. > > >> > > >> The current way of LOGICAL_OP_NON_SHORT_CIRCUIT is certainly > bogus > > - > > >> fold-const.c is way too early to decide this. Similar to the > > >> ongoing work > > > of > > >> expanding / building-up switch expressions in a GIMPLE pass, moving > > >> expand complexity up the pipeline this asks for a GIMPLE phase that > > >> moves this decision down closer to RTL expansion. > > >> (We have tree-ssa-ifcombine.c that is a related GIMPLE transform > > >> pass) > > >> > > > > > > The patch is updated according to your comments. It is a basic > > > support, which does not touch ifcombine and jump related > > > optimizations > > yet. > > > > > > Current method is: > > > 1) In fold-const, if HAVE_conditional_compare, set > > > LOGICAL_OP_NON_SHORT_CIRCUIT > > > to optimize_function_for_speed_p. So we do not depend on > > BRANCH_COST. > > > 2) Identify CCMP during expanding. A CCMP candidate is a > BIT_AND_EXPR > > > or BIT_IOR_EXPR, whose operators are compares. > > > 3) Add a new op in optab to expand the CCMP to optimized RTL, > > > e.g. and_scc_scc/ior_scc_scc in ARM. > > > > > > Bootstrap on ARM Chrome book. > > > No make check regression on Pandaboard. > > > > > > Thanks! > > > -Zhenqiang > > > > > > ChangeLog: > > > 2013-09-18 Zhenqiang Chen <zhenqiang.c...@linaro.org> > > > > > > * config/arm/arm.md (conditional_compare): New. > > > * expr.c (is_conditional_compare_candidate_p, expand_ccmp_expr): > > > New. > > > (expand_expr_real_1): Identify conditional compare. > > > * fold-const.c (LOGICAL_OP_NON_SHORT_CIRCUIT): Update. > > > * optabs.c (expand_ccmp_op): New. > > > (get_rtx_code): Handle BIT_AND_EXPR and BIT_IOR_EXPR. > > > * optabs.def (ccmp_optab): New. > > > * optabs.h (expand_ccmp_op): New. > > > > > > > > > basic-conditional-compare-support2.patch > > > > > > > > > N¬n‡r¥ªíÂ)emçhÂyhi×¢w^™©Ý > > > > > > > Some general comments. > > > > 1) How do we get to a conditional branch from this code. It seems > > your new pattern generates a store-flag value rather than a branch > expansion. > > Are you expecting combine or some other later pass to clean that up? > > The patch still depend on combine pass to optimize the cmp to generate a > "cmp_ior/cmp_and". > > It is not necessary a branch. e.g. "return a && b;" is not a branch when > expanding. > > The attached patch is updated to distinguish the two cases. If the CCMP is > used in a GIMPLE_COND statement, it generates instruction to match > "cmp_and/cmp_ior". An new expand "cbranchcc4" is added to match the > branch instruction. Then, it does not depend on later passes to optimize the > CCMP. > > > 2) Assuming we do end up with branches, why would we not want to do > > this optimization when optimzing for space? > > I mis-checked http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43920, which > was only for THUMB mode. The attached patch removed it. > > > cmp r0, r1 > > cmpne r2, r3 > > beq L1 > > > > is shorter than > > > > cmp r0, r1 > > beq L1 > > cmp r2, r3 > > beq L1 > > > 3) Is there any way to generalize this for more complex expressions? > > Eg > > > > if (a == b || a == c || a == d) > > > > should become > > > > cmp a, b > > cmpne a, c > > cmpne a, d > > ... > > > > Obviously, when optimizing for speed there will probably be a limit on > > the number of compares that are desirable, but I don't see why we > > should arbitrarily cap it at 2. > > Based on current gcc, there are "no more than 2" cmps. Need update front- > end or ifcombine to recover it. I planned to enhance the ifcombine to > identify more CCMP opportunity. When we have more than 2 cmps, it is not > hard to support it based on current patch. E.g. > > (a == b || a == c || a == d) > > will create two ccmp: > > CC_DEQ = CCMP (ior (EQ (a, b), EQ (a, c))) CC_DEQ = CCMP (ior (EQ (CC_DEQ, > 0), EQ (a, d))) > > The first CCMP will generate > cmp a, b > cmpne a, c > > The second CCMP will only generate > cmpne a, d > CC_DEQ is used for determine the condition (ne). > > And from the MODE of CC_DEQ, we can distinguish the two CCMP. > > BTW: > How to determine the max number? It seams not necessary better with > more conditional compare. > > Thanks! > -Zhenqiang
ccmp.patch
Description: Binary data