Hi Hans,

on 2020/9/6 10:47 AM, Hans-Peter Nilsson wrote:
> On Tue, 1 Sep 2020, Bin.Cheng via Gcc-patches wrote:
>>> Great idea! With explicitly specified -funroll-loops, it's bootstrapped
>>> but the regression testing did show one failure (the only one):
>>>
>>> PASS->FAIL: gcc.dg/sms-4.c scan-rtl-dump-times sms "SMS succeeded" 1
>>>
>>> It exposes two issues:
>>>
>>> 1) Currently address_cost hook on rs6000 always return zero, but at least
>>> from Power7, pre_inc/pre_dec kind instructions are cracked, it means we
>>> have to take the address update into account (scalar normal operation).
>>> Since IVOPTs reduces the cost_step for ainc candidates, it makes us prefer
>>> ainc candidates. In this case, the cand/group cost is -4 (minus cost_step),
>>> with scaling up, the off becomes much. With one simple hack on for pre_inc/
>>> pre_dec in rs6000 address_cost, the case passed. It should be handled in
>>> one separated issue.
>>>
>>> 2) This case makes me think we should exclude ainc candidates in function
>>> mark_reg_offset_candidates. The justification is that: ainc candidate
>>> handles step update itself and when we calculate the cost for it against
>>> its ainc_use, the cost_step has been reduced. When unrolling happens,
>>> the ainc computations are replicated and it doesn't save step updates
>>> like normal reg_offset_p candidates.
>> Though auto-inc candidate embeds stepping operation into memory
>> access, we might want to avoid it in case of unroll if there are many
>> sequences of memory accesses, and if the unroll factor is big. The
>> rationale is embedded stepping is a u-arch operation and does have its
>> cost.
>
> Forgive me for barging in here (though the context is powerpc,
> the dialogue and the patch seems to be generic ivopts), but
> that's not a general remark I hope, about auto-inc (always)
> having a cost?
>
> For some architectures, auto-inc *is* free, as free as
> register-indirect, so the more auto-inc use, the better. All
> this should be reflected by the address-cost, IMHO, and not
> hardcoded into ivopts.
>

Yeah, ivopts currently doesn't hardcode a cost for auto-inc. Instead, it
lets each target set the cost itself through the address_cost hook. See
the function get_address_cost_ainc: it checks whether the auto-inc
operations are supported and then takes the cost from the address_cost
hook.

One example on Power is listed below:

  Group 0:
    cand  cost  compl.  inv.expr.  inv.vars
    1     4     1       NIL;       1
    3     0     0       NIL;       NIL;
    4     0     1       NIL;       1
    5     0     1       NIL;       NIL;
    13    0     1       NIL;       NIL;
    18    -4    0       NIL;       NIL;

Cand 18 is an auto-inc candidate whose group 0/cand cost is -4 (minus
step_cost). The iv_cost of cand 18 is 5 (step_cost + non-original iv
cost). When it's selected, the step_cost parts cancel out and the
remaining cost (1) is for the non-original iv. This shows that ivopts
doesn't put any hardcoded cost on this ainc candidate.

I guess the misunderstanding came from some of the discussion above.
Sorry if some of my previous comments misled you.

BR,
Kewen
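
P.S. In case it helps, the arithmetic for cand 18 can be sketched as a toy model. This is not GCC code: the function and variable names are made up for illustration, step_cost = 4 and the non-original iv cost = 1 are inferred from the -4 and 5 in the dump, and the "address cost minus step cost" formula is a simplified reading of how ivopts treats an auto-inc use.

```python
# Toy model of the cost arithmetic for cand 18; NOT actual ivopts code.
# All names are hypothetical; the numbers are inferred from the dump.

STEP_COST = 4             # inferred: group cost of cand 18 is -STEP_COST
NON_ORIGINAL_IV_COST = 1  # inferred: iv_cost 5 = STEP_COST + 1


def ainc_group_cost(address_cost, step_cost):
    """Cost of an auto-inc candidate for its ainc use: whatever the
    target's address_cost hook reports, minus the step cost, since the
    memory access itself performs the step."""
    return address_cost - step_cost


def iv_cost(step_cost, non_original_iv_cost):
    """Cost of keeping the candidate: the step update plus the extra
    cost for not being an original iv."""
    return step_cost + non_original_iv_cost


# rs6000's address_cost hook currently returns 0.
group = ainc_group_cost(address_cost=0, step_cost=STEP_COST)
iv = iv_cost(STEP_COST, NON_ORIGINAL_IV_COST)
total = group + iv
print(group, iv, total)  # -4 5 1

# A hypothetical target whose hook charges 4 for auto-inc addresses
# would no longer see the step cost cancel out.
charged_total = ainc_group_cost(address_cost=4, step_cost=STEP_COST) + iv
print(charged_total)  # 5
```

With the hook returning 0 only the non-original iv cost (1) is left, so any extra cost for auto-inc has to come from the target's hook rather than from ivopts.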