2nd part:

2014-04-15  Evgeny Stupachenko  <evstu...@gmail.com>

	* config/i386/x86-tune.def (TARGET_SLOW_PSHUFB): Target for slow byte
	shuffle on some x86 architectures.
	* config/i386/i386.h (TARGET_SLOW_PSHUFB): Ditto.
	* config/i386/i386.c (expand_vec_perm_even_odd_1): Avoid byte shuffles
	in architectures where they are slow (TARGET_SLOW_PSHUFB).

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index bf4d576..0ae3cda 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -44026,7 +44026,7 @@ expand_vec_perm_even_odd_1 (struct expand_vec_perm_d *d, unsigned odd)
       gcc_unreachable ();
 
     case V8HImode:
-      if (TARGET_SSSE3)
+      if (TARGET_SSSE3 && !TARGET_SLOW_PSHUFB)
 	return expand_vec_perm_pshufb2 (d);
       else
 	{
@@ -44049,7 +44049,7 @@ expand_vec_perm_even_odd_1 (struct expand_vec_perm_d *d, unsigned odd)
       break;
 
     case V16QImode:
-      if (TARGET_SSSE3)
+      if (TARGET_SSSE3 && !TARGET_SLOW_PSHUFB)
 	return expand_vec_perm_pshufb2 (d);
       else
 	{
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 51659de..1a884d8 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -425,6 +425,8 @@ extern unsigned char ix86_tune_features[X86_TUNE_LAST];
 	ix86_tune_features[X86_TUNE_USE_VECTOR_FP_CONVERTS]
 #define TARGET_USE_VECTOR_CONVERTS \
 	ix86_tune_features[X86_TUNE_USE_VECTOR_CONVERTS]
+#define TARGET_SLOW_PSHUFB \
+	ix86_tune_features[X86_TUNE_SLOW_PSHUFB]
 #define TARGET_FUSE_CMP_AND_BRANCH_32 \
 	ix86_tune_features[X86_TUNE_FUSE_CMP_AND_BRANCH_32]
 #define TARGET_FUSE_CMP_AND_BRANCH_64 \
diff --git a/gcc/config/i386/x86-tune.def b/gcc/config/i386/x86-tune.def
index 8399102..9b0ff36 100644
--- a/gcc/config/i386/x86-tune.def
+++ b/gcc/config/i386/x86-tune.def
@@ -386,6 +386,10 @@ DEF_TUNE (X86_TUNE_USE_VECTOR_FP_CONVERTS, "use_vector_fp_converts",
    from integer to FP. */
 DEF_TUNE (X86_TUNE_USE_VECTOR_CONVERTS, "use_vector_converts", m_AMDFAM10)
 
+/* X86_TUNE_SLOW_PSHUFB: Indicates tunings with slow pshufb instruction.  */
+DEF_TUNE (X86_TUNE_SLOW_PSHUFB, "slow_pshufb",
+          m_BONNELL | m_SILVERMONT | m_INTEL)
+
 /*****************************************************************************/
 /* AVX instruction selection tuning (some of SSE flags affects AVX, too)     */
 /*****************************************************************************/

On Thu, Mar 6, 2014 at 12:58 AM, Evgeny Stupachenko <evstu...@gmail.com> wrote:
> slm_cost/intel_cost and TARGET_SLOW_PSHUFB are just preparation to a
> next vectorization patch.
> Changes in ix86_add_stmt_cost gives real performance to Silvermont.
> Let's move all to stage1.
>
> On Wed, Mar 5, 2014 at 9:29 PM, Uros Bizjak <ubiz...@gmail.com> wrote:
>> On Wed, Mar 5, 2014 at 5:46 PM, H.J. Lu <hjl.to...@gmail.com> wrote:
>>> On Wed, Mar 5, 2014 at 7:58 AM, Evgeny Stupachenko <evstu...@gmail.com>
>>> wrote:
>>>> Hi,
>>>>
>>>> The patch is for x86 Silvermont.
>>>> It improves x86 Silvermont vector cost model.
>>>> It gives +20% on facerec spec on Silvermont.
>>>> It passes make check and bootstrap on x86.
>>>>
>>>> Is this patch ok for stage1?
>>>>
>>>> ChangeLog:
>>>>
>>>> 2014-03-05 Evgeny Stupachenko <evstu...@gmail.com>
>>>>
>>>> * config/i386/x86-tune.def (TARGET_SLOW_PSHUFB): Target for slow byte
>>>> shuffle on some x86 architectures.
>>>> * config/i386/i386.h (TARGET_SLOW_PSHUFB): Ditto.
>>>> * config/i386/i386.c (processor_costs): Fixing vec_to_scalar_cost for
>>>> Silvermont according latency table.
>>>> (expand_vec_perm_even_odd_1): Avoid byte shuffles in architectures
>>>> where they are slow (TARGET_SLOW_PSHUFB).
>>>> (x86_add_stmt_cost): Fixing vector cost model for Silvermont.
>>>>
>>>> Thanks,
>>>> Evgeny
>>>
>>> There are 3 separate changes in this patch:
>>>
>>> 1. Update slm_cost, which doesn't have a ChangeLog entry.
>>> 2. Add TARGET_SLOW_PSHUFB.
>>> 3. Update ix86_add_stmt_cost.
>>>
>>> I suggest you break it into 3 independent patches.
>>
>> I think that slm_cost/intel_cost and TARGET_SLOW_PSHUFB changes can
>> still go into mainline at this stage since they are trivial tuning
>> changes that should not destabilize the compiler.
>>
>> The ix86_add_stmt_cost should wait for stage 1.
>>
>> Uros.
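
For readers less familiar with the tuning tables, here is a minimal standalone sketch of how a per-CPU tuning mask can gate an instruction choice, in the spirit of the TARGET_SLOW_PSHUFB check in this thread. It is not GCC code: every name in it (cpu, tune_masks, set_tune, use_pshufb, self_check) is hypothetical, and the table is assumed to be a simple bitmask per flag that is resolved once for the selected CPU.

```c
#include <assert.h>

/* Hypothetical, simplified model of a per-CPU tuning table.
   None of these names come from GCC; they only illustrate the idea.  */
enum cpu { CPU_GENERIC, CPU_BONNELL, CPU_SILVERMONT, CPU_INTEL, CPU_LAST };

/* Bit for one CPU in a tuning mask.  */
#define M(c) (1u << (c))

enum tune { TUNE_SLOW_PSHUFB, TUNE_LAST };

/* One mask per tuning flag: the set of CPUs that enable the flag.  */
static const unsigned tune_masks[TUNE_LAST] = {
  [TUNE_SLOW_PSHUFB] = M (CPU_BONNELL) | M (CPU_SILVERMONT) | M (CPU_INTEL),
};

/* Flags resolved for the CPU currently being tuned for.  */
static unsigned char tune_features[TUNE_LAST];

/* Resolve every tuning flag for CPU C.  */
void
set_tune (enum cpu c)
{
  for (int i = 0; i < TUNE_LAST; i++)
    tune_features[i] = (tune_masks[i] & M (c)) != 0;
}

/* Same shape as the patched condition: pick the byte shuffle only when
   SSSE3 is available and the tuning does not mark pshufb as slow.  */
int
use_pshufb (int have_ssse3)
{
  return have_ssse3 && !tune_features[TUNE_SLOW_PSHUFB];
}

/* Small self-check of the sketch.  */
void
self_check (void)
{
  set_tune (CPU_SILVERMONT);
  assert (!use_pshufb (1));	/* slow pshufb: take the fallback path */
  set_tune (CPU_GENERIC);
  assert (use_pshufb (1));	/* not marked slow: pshufb is used */
  assert (!use_pshufb (0));	/* no SSSE3: pshufb is never used */
}
```

In this sketch, tuning for CPU_SILVERMONT makes use_pshufb (1) return 0, so the caller would take the non-pshufb path, while CPU_GENERIC keeps the byte shuffle whenever SSSE3 is available.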