Testing finished. No new regressions.
Is the following patch ok?

2014-06-11  Evgeny Stupachenko  <evstu...@gmail.com>

        * config/i386/i386.c (ix86_reassociation_width): Add alternative for
        vector case.
        * config/i386/i386.h (TARGET_VECTOR_PARALLEL_EXECUTION): New.
        * config/i386/x86-tune.def (X86_TUNE_VECTOR_PARALLEL_EXECUTION): New.
        * tree-vect-data-refs.c (vect_shift_permute_load_chain): New.
        Introduces alternative way of loads group permutations.
        (vect_transform_grouped_load): Try alternative way of permutations.

Thanks,
Evgeny

On Tue, Jun 10, 2014 at 4:43 PM, Evgeny Stupachenko <evstu...@gmail.com> wrote:
> ix86_reassociation_width checks INTEGRAL_MODE_P and FLOAT_MODE_P which
> include vector mode.
> I'll try to separate this into scalar and vector part, but it will
> require more testing (under the testing now).
> What about the rest of the patch?
>
> Thanks,
> Evgeny
>
> On Thu, Jun 5, 2014 at 3:54 PM, Ramana Radhakrishnan
> <ramana.radhakrish...@arm.com> wrote:
>> On 06/05/14 12:43, Evgeny Stupachenko wrote:
>>>
>>> New hook is related to vector instructions only. Vector instructions
>>> could be sequential in pipeline, but scalar - parallel. For x86
>>> architectures TARGET_SCHED_REASSOC_WIDTH does not give required
>>> differentiation.
>>> General hooks could be potentially reused in other algorithms/by other
>>> architectures.
>>
>>
>> It already takes a "mode" argument. Couldn't you use a vector mode to work
>> this out ?
>>
>> If it is not enough then please be more specific about the documentation of
>> this hook about where it is useful so that it's easy for people reading the
>> documentation to understand at a glance what purpose it serves.
>>
>>
>> Ramana
>>
>>
>>>
>>> Thanks,
>>> Evgeny
>>>
>>> On Thu, Jun 5, 2014 at 2:04 PM, Ramana Radhakrishnan
>>> <ramana....@googlemail.com> wrote:
>>>>
>>>> On Wed, May 28, 2014 at 2:09 PM, Evgeny Stupachenko <evstu...@gmail.com>
>>>> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> The patch introduces alternative way of permutations for load groups
>>>>> of size 2 and 3 which should be faster on architectures with low
>>>>> parallelism.
>>>>>
>>>>> The patch gives 2 times gain on Silvermont to the test from PR52252
>>>>> (in addition to already committed 3 times gain).
>>>>>
>>>>> Patch passes bootstrap on x86. Make check is in progress.
>>>>
>>>>
>>>> Why do we need a new hook ? Can't you derive this information from
>>>> something which is equally badly named TARGET_SCHED_REASSOC_WIDTH
>>>> though used in the reassociation logic but also serves a similar
>>>> purpose ?
>>>>
>>>> Also the documentation of this hook is incomplete at best and wrong at
>>>> worst as this is not applied everywhere in the vectorizer but just for
>>>> this special case for load store permuting. Implying this is useful
>>>> everywhere in the vectorizer does not appear to be correct.
>>>>
>>>> regards
>>>> Ramana
>>>>
>>>>
>>>>
>>>>
>>>>>
>>>>> ChangeLog:
>>>>>
>>>>> 2014-05-28  Evgeny Stupachenko  <evstu...@gmail.com>
>>>>>
>>>>>         * config/i386/i386.c (ix86_have_vector_parallel_execution):
>>>>>         New.
>>>>>         (TARGET_VECTORIZE_HAVE_VECTOR_PARALLEL_EXECUTION): New.
>>>>>         * config/i386/i386.h (TARGET_VECTOR_PARALLEL_EXECUTION): New.
>>>>>         * config/i386/x86-tune.def
>>>>>         (X86_TUNE_VECTOR_PARALLEL_EXECUTION): New.
>>>>>         * target.def (have_vector_parallel_execution): New.
>>>>>         * doc/tm.texi.in (have_vector_parallel_execution): New.
>>>>>         * doc/tm.texi: Regenerate.
>>>>>         * targhooks.c (default_have_vector_parallel_execution): New.
>>>>>         * tree-vect-data-refs.c (vect_shift_permute_load_chain): New.
>>>>>         Introduces alternative way of loads group permutations.
>>>>>         (vect_transform_grouped_load): Try alternative way of
>>>>>         permutations.
>>>>>
>>>>> Evgeny
>>>
>>>
>>
Attachment: vect_groups1.patch
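
The scalar/vector split discussed above for ix86_reassociation_width can be illustrated with a small stand-alone sketch. This is not the GCC source and not the contents of the attached patch: the enum, the struct, the helper names, and the returned widths of 1 and 2 are all hypothetical, chosen only to show why a predicate that also matches vector modes has to be guarded by an earlier vector check.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for machine modes; not GCC's real enum.  */
enum mode_class
{
  MODE_SCALAR_INT,
  MODE_SCALAR_FLOAT,
  MODE_VECTOR_INT,
  MODE_VECTOR_FLOAT
};

/* Hypothetical tuning flags, loosely modelled on per-CPU tune bits.  */
struct tune_flags
{
  bool reassoc_int_to_parallel;   /* scalar integer ops can run in parallel */
  bool reassoc_fp_to_parallel;    /* scalar FP ops can run in parallel */
  bool vector_parallel_execution; /* vector ops can run in parallel */
};

static bool
is_vector (enum mode_class m)
{
  return m == MODE_VECTOR_INT || m == MODE_VECTOR_FLOAT;
}

/* Like the predicates mentioned in the thread, these two also accept
   vector modes, which is why they cannot tell scalar from vector.  */
static bool
is_integral (enum mode_class m)
{
  return m == MODE_SCALAR_INT || m == MODE_VECTOR_INT;
}

static bool
is_float (enum mode_class m)
{
  return m == MODE_SCALAR_FLOAT || m == MODE_VECTOR_FLOAT;
}

/* Return an assumed reassociation width: 2 if operations of this mode
   can execute in parallel, 1 otherwise.  */
static int
reassociation_width (const struct tune_flags *t, enum mode_class m)
{
  /* Vector part: handled first so vector modes never reach the
     scalar checks below.  */
  if (is_vector (m))
    return t->vector_parallel_execution ? 2 : 1;

  /* Scalar part.  */
  if (is_integral (m) && t->reassoc_int_to_parallel)
    return 2;
  if (is_float (m) && t->reassoc_fp_to_parallel)
    return 2;
  return 1;
}

/* Example: a target where scalar code is parallel but vector code is
   sequential gets different widths for the two cases.  */
static void
self_check (void)
{
  struct tune_flags t = { true, true, false };
  assert (reassociation_width (&t, MODE_SCALAR_INT) == 2);
  assert (reassociation_width (&t, MODE_VECTOR_INT) == 1);
}
```

With the vector check removed, MODE_VECTOR_INT would fall through to the integral test and report width 2, which is the lack of differentiation described in the thread.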