Missed attachment.

On Thu, Mar 6, 2014 at 6:42 PM, Evgeny Stupachenko <evstu...@gmail.com> wrote:
> I've separated the patch into 2: cost model tuning and load/store
> groups parallelism.
> SLM tuning was partially introduced in the patch:
> http://gcc.gnu.org/ml/gcc-patches/2014-03/msg00226.html
> The patch introducing vectorization for load/store groups of size 3 attached.
>
> Is it ok for stage1?
>
> ChangeLog:
>
> 2014-03-06 Evgeny Stupachenko <evstu...@gmail.com>
>
> * tree-vect-data-refs.c (vect_grouped_store_supported): New
> check for stores group of length 3.
> (vect_permute_store_chain): New permutations for stores group of
> length 3.
> (vect_grouped_load_supported): New check for loads group of length 3.
> (vect_permute_load_chain): New permutations for loads group of
> length 3.
> * tree-vect-stmts.c (vect_model_store_cost): Change cost
> of vec_perm_shuffle for the new permutations.
> (vect_model_load_cost): Ditto.
>
>
>
> On Tue, Feb 11, 2014 at 7:19 PM, Richard Biener <rguent...@suse.de> wrote:
>> On Tue, 11 Feb 2014, Evgeny Stupachenko wrote:
>>
>>> Missed patch attached in plain-text.
>>>
>>> I have copyright assignment on file with the FSF covering work on GCC.
>>>
>>> Load/store groups of length 3 are the most frequent non-power-of-2
>>> case. They are used in RGB image processing (like the test case in
>>> PR52252). For sure we can extend the patch to length 5 and more.
>>> However, this potentially affects performance on some other
>>> architectures and requires larger testing. So length 3 is just the
>>> first step. The algorithm in the patch could be modified for a
>>> general case in several steps.
>>>
>>> I understand that the patch should wait for stage 1, however since
>>> it's ready we can discuss it right now and make some changes (like
>>> general size of group).
>>
>> Other than that I'd like to see a vectorizer hook querying the cost of a
>> vec_perm_const expansion instead of adding vec_perm_shuffle
>> (thus requires the constant shuffle mask to be passed as well
>> as the vector type). That's more useful for other uses that
>> would require (arbitrary) shuffles.
>>
>> Didn't look at the rest of the patch yet - queued in my review
>> pipeline.
>>
>> Thanks,
>> Richard.
>>
>>> Thanks,
>>> Evgeny
>>>
>>> On Tue, Feb 11, 2014 at 5:00 PM, Richard Biener <rguent...@suse.de> wrote:
>>> >
>>> > On Tue, 11 Feb 2014, Evgeny Stupachenko wrote:
>>> >
>>> > > Hi,
>>> > >
>>> > > The patch gives an expected 3 times gain for the test case in the
>>> > > PR52252 (and even 6 times for AVX2).
>>> > > It passes make check and bootstrap on x86.
>>> > > spec2000/spec2006 got no regressions/gains on x86.
>>> > >
>>> > > Is this patch ok?
>>> >
>>> > I've worked on generalizing the permutation support in the light
>>> > of the availability of the generic shuffle support in the IL
>>> > but hit some road-blocks in the way code-generation works for
>>> > group loads with permutations (I don't remember if I posted all patches).
>>> >
>>> > This patch seems to be to a slightly different place but it again
>>> > special-cases a specific permutation. Why's that? Why can't we
>>> > support groups of size 7 for example? So - can this be generalized
>>> > to support arbitrary non-power-of-two load/store groups?
>>> >
>>> > Other than that the patch has to wait for stage1 to open again,
>>> > of course. And it misses a testcase.
>>> >
>>> > Btw, do you have a copyright assignment on file with the FSF covering
>>> > work on GCC?
>>> >
>>> > Thanks,
>>> > Richard.
>>> >
>>> > > ChangeLog:
>>> > >
>>> > > 2014-02-11 Evgeny Stupachenko <evstu...@gmail.com>
>>> > >
>>> > > * target.h (vect_cost_for_stmt): Defining new cost
>>> > > vec_perm_shuffle.
>>> > > * tree-vect-data-refs.c (vect_grouped_store_supported): New
>>> > > check for stores group of length 3.
>>> > > (vect_permute_store_chain): New permutations for stores group of
>>> > > length 3.
>>> > > (vect_grouped_load_supported): New check for loads group of
>>> > > length 3.
>>> > > (vect_permute_load_chain): New permutations for loads group of
>>> > > length 3.
>>> > > * tree-vect-stmts.c (vect_model_store_cost): New cost
>>> > > vec_perm_shuffle for the new permutations.
>>> > > (vect_model_load_cost): Ditto.
>>> > > * config/aarch64/aarch64.c (builtin_vectorization_cost): Adding
>>> > > vec_perm_shuffle cost as equivalent of vec_perm cost.
>>> > > * config/arm/arm.c: Ditto.
>>> > > * config/rs6000/rs6000.c: Ditto.
>>> > > * config/spu/spu.c: Ditto.
>>> > > * config/i386/x86-tune.def (TARGET_SLOW_PHUFFB): Target for slow
>>> > > byte shuffle on some x86 architectures.
>>> > > * config/i386/i386.h (processor_costs): Defining pshuffb cost.
>>> > > * config/i386/i386.c (processor_costs): Adding pshuffb cost.
>>> > > (ix86_builtin_vectorization_cost): Adding cost for the new
>>> > > permutations.
>>> > > Fixing cost for other permutations.
>>> > > (expand_vec_perm_even_odd_1): Avoid byte shuffles when they are
>>> > > slow (TARGET_SLOW_PHUFFB).
>>> > > (ix86_add_stmt_cost): Adding cost when STMT is WIDEN_MULTIPLY.
>>> > > Adding new shuffle cost only when byte shuffle is expected.
>>> > > Fixing cost model for Silvermont.
>>> > >
>>> > > Thanks,
>>> > > Evgeny
>>> > >
>>> >
>>> > --
>>> > Richard Biener <rguent...@suse.de>
>>> > SUSE / SUSE Labs
>>> > SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
>>> > GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer
>>>
>>
>> --
>> Richard Biener <rguent...@suse.de>
>> SUSE / SUSE Labs
>> SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
>> GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer
vect3.patch
Description: Binary data
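
For readers unfamiliar with the term, the following is a minimal sketch of a loop containing a load group and a store group of size 3, the shape of access the thread discusses for RGB image processing. It is not the PR52252 test case and is not taken from the patch; the function name swap_red_blue and the interleaved R,G,B byte layout are assumptions made purely for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example (not from the patch or from PR52252).
   Pixels are assumed to be stored interleaved as R,G,B bytes.
   Each iteration reads three adjacent elements and writes three
   adjacent elements, so the loop has one load group and one
   store group, each of size 3.  Vectorizing it requires
   permutations that split three input vectors into separate
   R, G and B vectors and interleave them again on store.  */
void swap_red_blue(const uint8_t *in, uint8_t *out, size_t n_pixels)
{
    for (size_t i = 0; i < n_pixels; i++) {
        uint8_t r = in[3 * i + 0];
        uint8_t g = in[3 * i + 1];
        uint8_t b = in[3 * i + 2];

        out[3 * i + 0] = b;
        out[3 * i + 1] = g;
        out[3 * i + 2] = r;
    }
}
```

Because the stride is 3 rather than a power of two, the even/odd extract and interleave permutations used for groups of size 2, 4 or 8 do not apply directly, which is why the patch adds dedicated permutations for this case.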