I've split the patch into two: cost model tuning and load/store group parallelism. SLM tuning was partially introduced in the following patch: http://gcc.gnu.org/ml/gcc-patches/2014-03/msg00226.html
The patch introducing vectorization for load/store groups of size 3 is attached.
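To illustrate what vectorizing a load group of size 3 involves, here is a minimal Python sketch. It is not the patch's code. It models the semantics of GCC's VEC_PERM_EXPR on plain lists and shows one way to de-interleave three loaded vectors of x0 y0 z0 x1 y1 z1 ... data (e.g. RGB pixels) with two constant two-input permutations per output vector. The 8-lane vector width and the helper names `vec_perm`, `load_masks` and `deinterleave3` are assumptions made for illustration.

```python
# Minimal sketch (not the patch's code): de-interleaving a load group of
# size 3 with two-input constant permutations.

N = 8  # lanes per vector; assumed, e.g. 8 x 16-bit lanes in 128 bits


def vec_perm(v0, v1, mask):
    """Model of VEC_PERM_EXPR: pick lanes from the concatenation v0 ++ v1."""
    both = list(v0) + list(v1)
    return [both[m] for m in mask]


def load_masks(j, n):
    """Two masks that extract component j (0, 1 or 2) from vectors a, b, c
    holding the interleaved stream x0 y0 z0 x1 y1 z1 ..."""
    mask1, mask2 = [], []
    for i in range(n):
        k = 3 * i + j                  # stream position of the wanted lane
        if k < 2 * n:                  # lane lives in a or b
            mask1.append(k)
            mask2.append(i)            # keep it from the first result
        else:                          # lane lives in c
            mask1.append(0)            # don't care, replaced in step 2
            mask2.append(n + (k - 2 * n))
    return mask1, mask2


def deinterleave3(a, b, c):
    """Return the x, y and z components as three separate vectors."""
    result = []
    for j in range(3):
        mask1, mask2 = load_masks(j, len(a))
        tmp = vec_perm(a, b, mask1)
        result.append(vec_perm(tmp, c, mask2))
    return result


stream = list(range(3 * N))  # lane k holds component k % 3 of element k // 3
red, green, blue = deinterleave3(stream[:N], stream[N:2 * N], stream[2 * N:])
print(red)   # [0, 3, 6, 9, 12, 15, 18, 21]
print(blue)  # [2, 5, 8, 11, 14, 17, 20, 23]
```

Each output needs lanes from all three input vectors, while a single VEC_PERM_EXPR only sees two, so in this scheme every component costs two shuffles (six for the group). That extra shuffle per output is presumably the kind of cost the vect_model_load_cost and vect_model_store_cost changes have to account for.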
Is it OK for stage 1?

ChangeLog:

2014-03-06  Evgeny Stupachenko  <evstu...@gmail.com>

	* tree-vect-data-refs.c (vect_grouped_store_supported): New
	check for stores group of length 3.
	(vect_permute_store_chain): New permutations for stores group of
	length 3.
	(vect_grouped_load_supported): New check for loads group of
	length 3.
	(vect_permute_load_chain): New permutations for loads group of
	length 3.
	* tree-vect-stmts.c (vect_model_store_cost): Change cost
	of vec_perm_shuffle for the new permutations.
	(vect_model_load_cost): Ditto.

On Tue, Feb 11, 2014 at 7:19 PM, Richard Biener <rguent...@suse.de> wrote:
> On Tue, 11 Feb 2014, Evgeny Stupachenko wrote:
>
>> Missed patch attached in plain text.
>>
>> I have a copyright assignment on file with the FSF covering work on GCC.
>>
>> Load/store groups of length 3 are the most frequent non-power-of-2
>> case. They are used in RGB image processing (like the test case in
>> PR52252). We can certainly extend the patch to length 5 and more.
>> However, this could affect performance on some other architectures
>> and requires more extensive testing. So length 3 is just the first
>> step. The algorithm in the patch could be modified for the general
>> case in several steps.
>>
>> I understand that the patch should wait for stage 1; however, since
>> it's ready, we can discuss it right now and make some changes (like
>> supporting a general group size).
>
> Other than that I'd like to see a vectorizer hook querying the cost of a
> vec_perm_const expansion instead of adding vec_perm_shuffle
> (this requires the constant shuffle mask to be passed as well
> as the vector type). That's more useful for other uses that
> would require (arbitrary) shuffles.
>
> I haven't looked at the rest of the patch yet; it's queued in my
> review pipeline.
>
> Thanks,
> Richard.
>
>> Thanks,
>> Evgeny
>>
>> On Tue, Feb 11, 2014 at 5:00 PM, Richard Biener <rguent...@suse.de> wrote:
>> >
>> > On Tue, 11 Feb 2014, Evgeny Stupachenko wrote:
>> >
>> > > Hi,
>> > >
>> > > The patch gives an expected 3x gain for the test case in PR52252
>> > > (and even 6x with AVX2).
>> > > It passes make check and bootstrap on x86.
>> > > SPEC2000/SPEC2006 show no regressions or gains on x86.
>> > >
>> > > Is this patch OK?
>> >
>> > I've worked on generalizing the permutation support in light
>> > of the availability of generic shuffle support in the IL,
>> > but hit some roadblocks in the way code generation works for
>> > group loads with permutations (I don't remember if I posted all patches).
>> >
>> > This patch takes a slightly different approach, but it again
>> > special-cases a specific permutation. Why's that? Why can't we
>> > support groups of size 7, for example? So, can this be generalized
>> > to support arbitrary non-power-of-two load/store groups?
>> >
>> > Other than that, the patch has to wait for stage 1 to open again,
>> > of course. And it lacks a testcase.
>> >
>> > Btw, do you have a copyright assignment on file with the FSF covering
>> > work on GCC?
>> >
>> > Thanks,
>> > Richard.
>> >
>> > > ChangeLog:
>> > >
>> > > 2014-02-11  Evgeny Stupachenko  <evstu...@gmail.com>
>> > >
>> > > 	* target.h (vect_cost_for_stmt): Defining new cost
>> > > 	vec_perm_shuffle.
>> > > 	* tree-vect-data-refs.c (vect_grouped_store_supported): New
>> > > 	check for stores group of length 3.
>> > > 	(vect_permute_store_chain): New permutations for stores group of
>> > > 	length 3.
>> > > 	(vect_grouped_load_supported): New check for loads group of
>> > > 	length 3.
>> > > 	(vect_permute_load_chain): New permutations for loads group of
>> > > 	length 3.
>> > > 	* tree-vect-stmts.c (vect_model_store_cost): New cost
>> > > 	vec_perm_shuffle for the new permutations.
>> > > 	(vect_model_load_cost): Ditto.
>> > > 	* config/aarch64/aarch64.c (builtin_vectorization_cost): Adding
>> > > 	vec_perm_shuffle cost as equivalent of vec_perm cost.
>> > > 	* config/arm/arm.c: Ditto.
>> > > 	* config/rs6000/rs6000.c: Ditto.
>> > > 	* config/spu/spu.c: Ditto.
>> > > 	* config/i386/x86-tune.def (TARGET_SLOW_PHUFFB): Tuning flag for
>> > > 	slow byte shuffles on some x86 architectures.
>> > > 	* config/i386/i386.h (processor_costs): Defining pshuffb cost.
>> > > 	* config/i386/i386.c (processor_costs): Adding pshuffb cost.
>> > > 	(ix86_builtin_vectorization_cost): Adding cost for the new
>> > > 	permutations. Fixing cost for other permutations.
>> > > 	(expand_vec_perm_even_odd_1): Avoid byte shuffles when they are
>> > > 	slow (TARGET_SLOW_PHUFFB).
>> > > 	(ix86_add_stmt_cost): Adding cost when STMT is WIDEN_MULTIPLY.
>> > > 	Adding new shuffle cost only when byte shuffle is expected.
>> > > 	Fixing cost model for Silvermont.
>> > >
>> > > Thanks,
>> > > Evgeny
>> > >
>> >
>> > --
>> > Richard Biener <rguent...@suse.de>
>> > SUSE / SUSE Labs
>> > SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
>> > GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer
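The store side (the new permutations in vect_permute_store_chain) is the mirror image. Under the same assumptions as any list-based model of VEC_PERM_EXPR (8 lanes per vector; hypothetical helper names `vec_perm`, `store_masks` and `interleave3`), this minimal sketch shows one way to interleave three component vectors back into memory order with two constant permutations per output vector. It is an illustration, not the patch's implementation.

```python
# Minimal sketch (not the patch's code): interleaving three component
# vectors into a store group of size 3 with two-input constant permutations.

N = 8  # lanes per vector; assumed


def vec_perm(v0, v1, mask):
    """Model of VEC_PERM_EXPR: pick lanes from the concatenation v0 ++ v1."""
    both = list(v0) + list(v1)
    return [both[m] for m in mask]


def store_masks(o, n):
    """Two masks that build output vector o (0, 1 or 2) of the interleaved
    stream x0 y0 z0 x1 y1 z1 ... from component vectors x, y, z."""
    mask1, mask2 = [], []
    for p in range(n):
        k = o * n + p            # stream position written by lane p
        j, i = k % 3, k // 3     # component j of element i
        if j == 0:
            mask1.append(i)      # x[i]
        elif j == 1:
            mask1.append(n + i)  # y[i]
        else:
            mask1.append(0)      # don't care, replaced in step 2
        mask2.append(n + i if j == 2 else p)  # z[i] or keep step-1 lane
    return mask1, mask2


def interleave3(x, y, z):
    """Return the three vectors to store, in memory order."""
    result = []
    for o in range(3):
        mask1, mask2 = store_masks(o, len(x))
        tmp = vec_perm(x, y, mask1)
        result.append(vec_perm(tmp, z, mask2))
    return result


xs = [10 * i for i in range(N)]
ys = [10 * i + 1 for i in range(N)]
zs = [10 * i + 2 for i in range(N)]
out0, out1, out2 = interleave3(xs, ys, zs)
print(out0)  # [0, 1, 2, 10, 11, 12, 20, 21]
```

Because every element index i stays below N (the stream has 3N lanes), the masks never reach outside the two operands; the price is again one extra shuffle per output vector compared with a power-of-2 group.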