Re: [PATCH, PR52252] Vectorization for load/store groups of size 3.

Evgeny Stupachenko Thu, 06 Mar 2014 06:44:48 -0800
Missed attachment.

On Thu, Mar 6, 2014 at 6:42 PM, Evgeny Stupachenko <evstu...@gmail.com> wrote:
> I've separated the patch into 2: cost model tuning and load/store
> groups parallelism.
> SLM tuning was partially introduced in the patch:
> http://gcc.gnu.org/ml/gcc-patches/2014-03/msg00226.html
> The patch introducing vectorization for load/store groups of size 3 is attached.
>
> Is it ok for stage1?
>
> ChangeLog:
>
> 2014-03-06  Evgeny Stupachenko  <evstu...@gmail.com>
>
>        * tree-vect-data-refs.c (vect_grouped_store_supported): New
>        check for stores group of length 3.
>        (vect_permute_store_chain): New permutations for stores group of
>        length 3.
>        (vect_grouped_load_supported): New check for loads group of length 3.
>        (vect_permute_load_chain): New permutations for loads group of
>        length 3.
>        * tree-vect-stmts.c (vect_model_store_cost): Change cost
>        of vec_perm_shuffle for the new permutations.
>        (vect_model_load_cost): Ditto.
>
>
>
> On Tue, Feb 11, 2014 at 7:19 PM, Richard Biener <rguent...@suse.de> wrote:
>> On Tue, 11 Feb 2014, Evgeny Stupachenko wrote:
>>
>>> Missed patch attached in plain-text.
>>>
>>> I have copyright assignment on file with the FSF covering work on GCC.
>>>
>>> Load/store groups of length 3 are the most frequent non-power-of-2
>>> case. They are used in RGB image processing (like the test case in
>>> PR52252). We can certainly extend the patch to length 5 and more.
>>> However, this could affect performance on some other architectures
>>> and requires broader testing. So length 3 is just the first step. The
>>> algorithm in the patch could be modified for the general case in
>>> several steps.
>>>
>>> I understand that the patch should wait for stage 1; however, since
>>> it's ready we can discuss it right now and make some changes (like
>>> a general group size).
>>
>> Other than that I'd like to see a vectorizer hook querying the cost of a
>> vec_perm_const expansion instead of adding vec_perm_shuffle
>> (this requires the constant shuffle mask to be passed as well
>> as the vector type).  That's more useful for other uses that
>> would require (arbitrary) shuffles.
>>
>> Didn't look at the rest of the patch yet - queued in my review
>> pipeline.
>>
>> Thanks,
>> Richard.
>>
>>> Thanks,
>>> Evgeny
>>>
>>> On Tue, Feb 11, 2014 at 5:00 PM, Richard Biener <rguent...@suse.de> wrote:
>>> >
>>> > On Tue, 11 Feb 2014, Evgeny Stupachenko wrote:
>>> >
>>> > > Hi,
>>> > >
>>> > > The patch gives an expected 3 times gain for the test case in
>>> > > PR52252 (and even 6 times for AVX2).
>>> > > It passes make check and bootstrap on x86.
>>> > > spec2000/spec2006 got no regressions/gains on x86.
>>> > >
>>> > > Is this patch ok?
>>> >
>>> > I've worked on generalizing the permutation support in the light
>>> > of the availability of the generic shuffle support in the IL
>>> > but hit some road-blocks in the way code-generation works for
>>> > group loads with permutations (I don't remember if I posted all patches).
>>> >
>>> > This patch seems to touch a slightly different place but it again
>>> > special-cases a specific permutation.  Why's that?  Why can't we
>>> > support groups of size 7 for example?  So - can this be generalized
>>> > to support arbitrary non-power-of-two load/store groups?
>>> >
>>> > Other than that the patch has to wait for stage1 to open again,
>>> > of course.  And it misses a testcase.
>>> >
>>> > Btw, do you have a copyright assignment on file with the FSF covering
>>> > work on GCC?
>>> >
>>> > Thanks,
>>> > Richard.
>>> >
>>> > > ChangeLog:
>>> > >
>>> > > 2014-02-11  Evgeny Stupachenko  <evstu...@gmail.com>
>>> > >
>>> > >         * target.h (vect_cost_for_stmt): Defining new cost
>>> > >         vec_perm_shuffle.
>>> > >         * tree-vect-data-refs.c (vect_grouped_store_supported): New
>>> > >         check for stores group of length 3.
>>> > >         (vect_permute_store_chain): New permutations for stores group of
>>> > >         length 3.
>>> > >         (vect_grouped_load_supported): New check for loads group of
>>> > >         length 3.
>>> > >         (vect_permute_load_chain): New permutations for loads group of
>>> > >         length 3.
>>> > >         * tree-vect-stmts.c (vect_model_store_cost): New cost
>>> > >         vec_perm_shuffle for the new permutations.
>>> > >         (vect_model_load_cost): Ditto.
>>> > >         * config/aarch64/aarch64.c (builtin_vectorization_cost): Adding
>>> > >         vec_perm_shuffle cost as equivalent of vec_perm cost.
>>> > >         * config/arm/arm.c: Ditto.
>>> > >         * config/rs6000/rs6000.c: Ditto.
>>> > >         * config/spu/spu.c: Ditto.
>>> > >         * config/i386/x86-tune.def (TARGET_SLOW_PHUFFB): Target for slow
>>> > >         byte shuffle on some x86 architectures.
>>> > >         * config/i386/i386.h (processor_costs): Defining pshuffb cost.
>>> > >         * config/i386/i386.c (processor_costs): Adding pshuffb cost.
>>> > >         (ix86_builtin_vectorization_cost): Adding cost for the new
>>> > >         permutations.  Fixing cost for other permutations.
>>> > >         (expand_vec_perm_even_odd_1): Avoid byte shuffles when they are
>>> > >         slow (TARGET_SLOW_PHUFFB).
>>> > >         (ix86_add_stmt_cost): Adding cost when STMT is WIDEN_MULTIPLY.
>>> > >         Adding new shuffle cost only when byte shuffle is expected.
>>> > >         Fixing cost model for Silvermont.
>>> > >
>>> > > Thanks,
>>> > > Evgeny
>>> > >
>>> >
>>> > --
>>> > Richard Biener <rguent...@suse.de>
>>> > SUSE / SUSE Labs
>>> > SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
>>> > GF: Jeff Hawn, Jennifer Guild, Felix Imend"orffer
>>>
>>
>> --
>> Richard Biener <rguent...@suse.de>
>> SUSE / SUSE Labs
>> SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
>> GF: Jeff Hawn, Jennifer Guild, Felix Imend"orffer
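
For readers unfamiliar with the term, a load/store group of size 3 is
what a loop over interleaved RGB data produces. The sketch below is a
hypothetical illustration only; it is not the PR52252 test case, and the
function name mix_channels and its arithmetic are assumptions made up
for this example. Each iteration reads three adjacent bytes and writes
three adjacent bytes, so a vectorizer has to de-interleave with stride 3
on the load side and re-interleave on the store side.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example (not from the patch or from PR52252).
   The three loads from IN form a load group of size 3 and the three
   stores to OUT form a store group of size 3.  */
void
mix_channels (const uint8_t *in, uint8_t *out, size_t n_pixels)
{
  for (size_t i = 0; i < n_pixels; i++)
    {
      /* Interleaved layout: R0 G0 B0 R1 G1 B1 ...  */
      uint8_t r = in[3 * i + 0];
      uint8_t g = in[3 * i + 1];
      uint8_t b = in[3 * i + 2];

      /* Each output channel averages two input channels, so all three
         loaded values are needed in every iteration.  */
      out[3 * i + 0] = (uint8_t) ((r + g) >> 1);
      out[3 * i + 1] = (uint8_t) ((g + b) >> 1);
      out[3 * i + 2] = (uint8_t) ((b + r) >> 1);
    }
}
```

Whether a given compiler version vectorizes such a loop depends on the
target and on the cost model, which is the subject of the discussion
above.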
vect3.patch
Description: Binary data