Richard Biener <richard.guent...@gmail.com> writes: > On Tue, Jun 13, 2023 at 4:07 AM Kewen Lin <li...@linux.ibm.com> wrote: >> >> This patch series follows Richi's suggestion at the link [1], >> which suggest structuring vectorizable_load to make costing >> next to the transform, in order to make it easier to keep >> costing and the transform in sync.
FTR, I was keeping quiet given that this was following an agreed plan :) Thanks for organising the series this way. It made it easier to review. >> For now, it's a known >> issue that what we cost can be inconsistent with what we >> transform, as the case in PR82255 and some other associated >> test cases in the patches of this series show. >> >> Basically this patch series makes costing not call function >> vect_model_load_cost any more. To make the review and >> bisection easy, I organized the changes according to the >> memory access types of vector load. For each memory access >> type, firstly it follows the handlings in the function >> vect_model_load_costto avoid any missing, then refines >> further by referring to the transform code, I also checked >> them with some typical test cases to verify. Hope the >> subjects of patches are clear enough. >> >> The whole series can be bootstrapped and regtested >> incrementally on: >> - x86_64-redhat-linux >> - aarch64-linux-gnu >> - powerpc64-linux-gnu P7, P8 and P9 >> - powerpc64le-linux-gnu P8, P9 and P10 >> >> By considering the current vector test buckets are mainly >> tested without cost model, I also verified the whole patch >> series was neutral for SPEC2017 int/fp on Power9 at O2, >> O3 and Ofast separately. > > I went through the series now and I like it overall (well, I suggested > the change). > Looking at the changes I think we want some followup to reduce the > mess in the final loop nest. We already have some VMAT_* cases handled > separately, maybe we can split out some more cases. Maybe we should > bite the bullet and duplicate that loop nest for the different VMAT_* cases. > Maybe we can merge some of the if (!costing_p) checks by clever > re-ordering. So what > this series doesn't improve is overall readability of the code (indent and our > 80 char line limit). > > The change also makes it more difficult(?) to separate analysis and transform > though in the end I hope that analysis will actually "code generate" to a > (SLP) > data structure so the target will have a chance to see the actual flow of > insns. > > That said, I'd like to hear from Richard whether he thinks this is a step > in the right direction. Yeah, agree that it's probably better on balance. It's going to need a bit of discipline to make sure that we don't accidentally change the IR during the analysis phase, but I guess that already exists to a lesser extent with the “before !vec_stmt”/“after !vec_stmt” split. Thanks, Richard