Richard Biener <richard.guent...@gmail.com> writes:
> On Tue, Jun 13, 2023 at 4:07 AM Kewen Lin <li...@linux.ibm.com> wrote:
>>
>> This patch series follows Richi's suggestion at the link [1],
>> which suggest structuring vectorizable_load to make costing
>> next to the transform, in order to make it easier to keep
>> costing and the transform in sync.

FTR, I was keeping quiet given that this was following an agreed plan :)

Thanks for organising the series this way.  It made it easier to review.

>> For now, it's a known
>> issue that what we cost can be inconsistent with what we
>> transform, as the case in PR82255 and some other associated
>> test cases in the patches of this series show.
>>
>> Basically this patch series makes costing not call function
>> vect_model_load_cost any more.  To make the review and
>> bisection easy, I organized the changes according to the
>> memory access types of vector load.  For each memory access
>> type, firstly it follows the handlings in the function
>> vect_model_load_costto avoid any missing, then refines
>> further by referring to the transform code, I also checked
>> them with some typical test cases to verify.  Hope the
>> subjects of patches are clear enough.
>>
>> The whole series can be bootstrapped and regtested
>> incrementally on:
>>   - x86_64-redhat-linux
>>   - aarch64-linux-gnu
>>   - powerpc64-linux-gnu P7, P8 and P9
>>   - powerpc64le-linux-gnu P8, P9 and P10
>>
>> By considering the current vector test buckets are mainly
>> tested without cost model, I also verified the whole patch
>> series was neutral for SPEC2017 int/fp on Power9 at O2,
>> O3 and Ofast separately.
>
> I went through the series now and I like it overall (well, I suggested
> the change).
> Looking at the changes I think we want some followup to reduce the
> mess in the final loop nest.  We already have some VMAT_* cases handled
> separately, maybe we can split out some more cases.  Maybe we should
> bite the bullet and duplicate that loop nest for the different VMAT_* cases.
> Maybe we can merge some of the if (!costing_p) checks by clever
> re-ordering.  So what
> this series doesn't improve is overall readability of the code (indent and our
> 80 char line limit).
>
> The change also makes it more difficult(?) to separate analysis and transform
> though in the end I hope that analysis will actually "code generate" to a 
> (SLP)
> data structure so the target will have a chance to see the actual flow of 
> insns.
>
> That said, I'd like to hear from Richard whether he thinks this is a step
> in the right direction.

Yeah, agree that it's probably better on balance.  It's going to need a
bit of discipline to make sure that we don't accidentally change the IR
during the analysis phase, but I guess that already exists to a lesser
extent with the “before !vec_stmt”/“after !vec_stmt” split.

Thanks,
Richard

Reply via email to