Hi Bin, >> 2) This case makes me think we should exclude ainc candidates in function >> mark_reg_offset_candidates. The justification is that: ainc candidate >> handles step update itself and when we calculate the cost for it against >> its ainc_use, the cost_step has been reduced. When unrolling happens, >> the ainc computations are replicated and it doesn't save step updates >> like normal reg_offset_p candidates. > Though auto-inc candidate embeds stepping operation into memory > access, we might want to avoid it in case of unroll if there are many > sequences of memory accesses, and if the unroll factor is big. The > rationale is embedded stepping is a u-arch operation and does have its > cost. >
Thanks for the comments! Agree! Excluding them from reg_offset_p candidates here is consistent with this expectation, it makes us consider the unroll factor effect when checking the corresponding step cost and the embedded stepping cost (in group/candidate cost, minus step cost and use the cost from the address_cost hook). >> >> I've updated the patch to punt ainc_use candidates as below: >> >>> + /* Skip AINC candidate since it contains address update itself, >>> + the replicated AINC computations when unrolling still have >>> + updates, unlike reg_offset_p candidates can save step updates >>> + when unrolling. */ >>> + if (cand->ainc_use) >>> + continue; >>> + >> >> For this new attached patch, it's bootstrapped and regress-tested without >> explicit unrolling, while the only one failure has been identified as >> rs6000 specific address_cost issue in bootstrapping and regression testing >> with explicit unrolling. >> >> By the way, with above simple hack of address_cost, I also did one >> bootstrapping and regression testing with explicit unrolling, the above >> sms-4.c did pass as I expected but had two failures instead: >> >> PASS->FAIL: gcc.dg/sms-compare-debug-1.c (test for excess errors) >> PASS->FAIL: gcc.dg/tree-ssa/ivopts-lt.c scan-tree-dump-times ivopts "PHI" 2 >> >> By further investigation, the 2nd one is expected due to the adddress_cost >> hook >> hack, while the 1st one one exposed -fcompare-debug issue in sms. The RTL >> sequence starts to off from sms, just some NOTE_INSN_DELETED positions >> change. >> I believe it's just exposed by this combination unluckily/luckily ;-) I will >> send a patch separately for it once I got time to look into it, but it should >> be unrelated to this patch series for sure. > This is the kind of situation I intended to avoid before. IMHO, this > isn't a neat change (it can't be given we are predicting the future > transformation in compilation pipeline), accumulation of such changes > could make IVOPTs break in one way or another. So as long as you make > sure it doesn't have functional impact in case of no-rtl_unroll, I am > fine. Yeah, I admit it's not neat, but the proposals in the previous discussions without predicting unroll factor can not work well for all scenarios with different unroll factors, they could over-blame some kind of candidates. For the case of no-rtl_unroll, unroll factor estimation should set loop->estimated_unroll to zero, all these changes won't take effect. The estimation function follows the same logics as that of RTL unroll factor calculation, I did test with explicit unrolling disablement before, it worked expectedly. BR, Kewen