Re: [PATCH, PR52252] Vectorization for load/store groups of size 3.

Evgeny Stupachenko Thu, 06 Mar 2014 06:43:09 -0800

I've separated the patch into 2: cost model tuning and load/store
groups parallelism.
SLM tuning was partially introduced in the patch:
http://gcc.gnu.org/ml/gcc-patches/2014-03/msg00226.html
The patch introducing vectorization for load/store groups of size 3 is attached.
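
For readers new to the thread, here is a minimal sketch of the kind of loop
this is about. It is not the PR52252 test case itself; the function name
rgb_to_bgr and the channel swap are assumptions chosen for illustration.
Each iteration reads three adjacent bytes (a load group of size 3) and
writes three adjacent bytes (a store group of size 3).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example (not taken from the patch or the PR): swap the
   red and blue channels of n packed 3-byte pixels.  */
void rgb_to_bgr(const uint8_t *in, uint8_t *out, size_t n)
{
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        /* Load group of size 3: three adjacent elements per iteration.  */
        uint8_t r = in[3 * i + 0];
        uint8_t g = in[3 * i + 1];
        uint8_t b = in[3 * i + 2];
        /* Store group of size 3: three adjacent elements per iteration.  */
        out[3 * i + 0] = b;
        out[3 * i + 1] = g;
        out[3 * i + 2] = r;
    }
}
```

To vectorize such a loop, the three channels have to be separated after the
vector loads and interleaved again before the vector stores.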


Is it ok for stage1?

ChangeLog:

2014-03-06  Evgeny Stupachenko  <evstu...@gmail.com>

       * tree-vect-data-refs.c (vect_grouped_store_supported): New
       check for stores group of length 3.
       (vect_permute_store_chain): New permutations for stores group of
       length 3.
       (vect_grouped_load_supported): New check for loads group of length 3.
       (vect_permute_load_chain): New permutations for loads group of length 3.
       * tree-vect-stmts.c (vect_model_store_cost): Change cost
       of vec_perm_shuffle for the new permutations.
       (vect_model_load_cost): Ditto.
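
As a rough illustration of what a load-group permutation of length 3 has to
achieve, the sketch below extracts one channel from three vectors of
interleaved triples using two two-input permutes. It is a scalar simulation
under stated assumptions: the vector length N = 4, the helper names perm2
and deinterleave3, and the selector layout are all made up for this example
and are not necessarily what the patch generates. Only the selector
semantics (indices 0..N-1 pick from the first input, N..2N-1 from the
second) follow VEC_PERM_EXPR.

```c
#include <assert.h>

#define N 4  /* elements per vector; an assumption for illustration */

/* Two-input permute: selector values 0..N-1 pick from a,
   N..2N-1 pick from b.  */
static void perm2(const int *a, const int *b, const int *sel, int *dst)
{
    int tmp[N];
    for (int i = 0; i < N; i++) {
        assert(sel[i] >= 0 && sel[i] < 2 * N);
        tmp[i] = sel[i] < N ? a[sel[i]] : b[sel[i] - N];
    }
    for (int i = 0; i < N; i++)
        dst[i] = tmp[i];
}

/* Extract channel k (0, 1 or 2) from v0|v1|v2, which together hold
   N interleaved triples.  */
void deinterleave3(const int *v0, const int *v1, const int *v2,
                   int k, int *dst)
{
    int sel_lo[N], sel_hi[N], tmp[N];

    assert(k >= 0 && k < 3);
    for (int m = 0; m < N; m++) {
        int j = k + 3 * m;      /* position in the 3N-element sequence */
        if (j < 2 * N) {
            sel_lo[m] = j;      /* step 1 gathers it from v0|v1 */
            sel_hi[m] = m;      /* step 2 keeps that lane */
        } else {
            sel_lo[m] = 0;      /* don't care: overwritten in step 2 */
            sel_hi[m] = j - N;  /* step 2 takes element j - 2N of v2 */
        }
    }
    perm2(v0, v1, sel_lo, tmp); /* step 1: permute v0 and v1 */
    perm2(tmp, v2, sel_hi, dst);/* step 2: merge with v2 */
}
```

With this layout each output vector costs two permutes, so a full load group
of three vectors needs six. The store side would need the inverse
(interleaving) permutation.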



On Tue, Feb 11, 2014 at 7:19 PM, Richard Biener <rguent...@suse.de> wrote:
> On Tue, 11 Feb 2014, Evgeny Stupachenko wrote:
>
>> Missed patch attached in plain-text.
>>
>> I have copyright assignment on file with the FSF covering work on GCC.
>>
>> Load/store groups of length 3 are the most frequent non-power-of-2
>> case. They are used in RGB image processing (like the test case in PR52252).
>> We can certainly extend the patch to length 5 and more. However, this
>> could potentially affect performance on some other architectures and
>> requires broader testing. So length 3 is just a first step. The
>> algorithm in the patch could be modified for the general case in several
>> steps.
>>
>> I understand that the patch should wait for stage 1; however, since
>> it's ready we can discuss it right now and make some changes (like
>> a general group size).
>
> Other than that I'd like to see a vectorizer hook querying the cost of a
> vec_perm_const expansion instead of adding vec_perm_shuffle
> (which thus requires the constant shuffle mask to be passed as well
> as the vector type).  That's more useful for other uses that
> would require (arbitrary) shuffles.
>
> Didn't look at the rest of the patch yet - queued in my review
> pipeline.
>
> Thanks,
> Richard.
>
>> Thanks,
>> Evgeny
>>
>> On Tue, Feb 11, 2014 at 5:00 PM, Richard Biener <rguent...@suse.de> wrote:
>> >
>> > On Tue, 11 Feb 2014, Evgeny Stupachenko wrote:
>> >
>> > > Hi,
>> > >
>> > > The patch gives an expected 3 times gain for the test case in the PR52252
>> > > (and even 6 times for AVX2).
>> > > It passes make check and bootstrap on x86.
>> > > spec2000/spec2006 got no regressions/gains on x86.
>> > >
>> > > Is this patch ok?
>> >
>> > I've worked on generalizing the permutation support in the light
>> > of the availability of the generic shuffle support in the IL
>> > but hit some road-blocks in the way code-generation works for
>> > group loads with permutations (I don't remember if I posted all patches).
>> >
>> > This patch seems to touch a slightly different place but it again
>> > special-cases a specific permutation.  Why's that?  Why can't we
>> > support groups of size 7 for example?  So - can this be generalized
>> > to support arbitrary non-power-of-two load/store groups?
>> >
>> > Other than that the patch has to wait for stage1 to open again,
>> > of course.  And it misses a testcase.
>> >
>> > Btw, do you have a copyright assignment on file with the FSF covering
>> > work on GCC?
>> >
>> > Thanks,
>> > Richard.
>> >
>> > > ChangeLog:
>> > >
>> > > 2014-02-11  Evgeny Stupachenko  <evstu...@gmail.com>
>> > >
>> > >         * target.h (vect_cost_for_stmt): Defining new cost
>> > >         vec_perm_shuffle.
>> > >         * tree-vect-data-refs.c (vect_grouped_store_supported): New
>> > >         check for stores group of length 3.
>> > >         (vect_permute_store_chain): New permutations for stores group of
>> > >         length 3.
>> > >         (vect_grouped_load_supported): New check for loads group of
>> > >         length 3.
>> > >         (vect_permute_load_chain): New permutations for loads group of
>> > >         length 3.
>> > >         * tree-vect-stmts.c (vect_model_store_cost): New cost
>> > >         vec_perm_shuffle for the new permutations.
>> > >         (vect_model_load_cost): Ditto.
>> > >         * config/aarch64/aarch64.c (builtin_vectorization_cost): Adding
>> > >         vec_perm_shuffle cost as equivalent of vec_perm cost.
>> > >         * config/arm/arm.c: Ditto.
>> > >         * config/rs6000/rs6000.c: Ditto.
>> > >         * config/spu/spu.c: Ditto.
>> > >         * config/i386/x86-tune.def (TARGET_SLOW_PHUFFB): Target for slow
>> > >         byte shuffle on some x86 architectures.
>> > >         * config/i386/i386.h (processor_costs): Defining pshuffb cost.
>> > >         * config/i386/i386.c (processor_costs): Adding pshuffb cost.
>> > >         (ix86_builtin_vectorization_cost): Adding cost for the new
>> > >         permutations.
>> > >         Fixing cost for other permutations.
>> > >         (expand_vec_perm_even_odd_1): Avoid byte shuffles when they are
>> > >         slow (TARGET_SLOW_PHUFFB).
>> > >         (ix86_add_stmt_cost): Adding cost when STMT is WIDEN_MULTIPLY.
>> > >         Adding new shuffle cost only when byte shuffle is expected.
>> > >         Fixing cost model for Silvermont.
>> > >
>> > > Thanks,
>> > > Evgeny
>> > >
>> >
>> > --
>> > Richard Biener <rguent...@suse.de>
>> > SUSE / SUSE Labs
>> > SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
>> > GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer
>>
>
> --
> Richard Biener <rguent...@suse.de>
> SUSE / SUSE Labs
> SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
> GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer
