On Mon, Oct 3, 2011 at 8:02 PM, Richard Henderson <r...@redhat.com> wrote:
> On 10/03/2011 11:40 AM, Artem Shinkarov wrote:
>> Currently if vec_perm_ok returns false, we do not try to use a new
>> vshuffle routine. Would it make sense to implement that? The only
>> potential problem I can see is a possible performance degradation.
>> This leads us to the second issue.
>
> Implement that where?  In the vectorizer?  No, I don't think so.
> The _ok routine, while also indicating what the backend expander
> supports, could also be thought of as a cost cutoff predicate.
> Unless the vectorization folk request some more exact cost metric
> I don't see any reason to change this.

I was thinking more about the backend expander itself. When we call
sorry () in ix86_expand_vec_perm_builtin, we could fall back to the
vshuffle routine instead, unless that would lead to a performance
degradation.
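
To make the idea concrete, here is a rough standalone model of the
control flow I have in mind. It is not GCC code: fast_perm_ok,
expand_generic and expand_perm are hypothetical names standing in for
the backend predicate, the vshuffle routine and the expander, and the
"only the identity permutation is fast" test is an arbitrary
placeholder for whatever the backend really supports.

```c
#include <stdbool.h>
#include <stddef.h>

enum perm_path { PERM_FAST, PERM_GENERIC };

/* Hypothetical stand-in for the backend predicate.  As a placeholder we
   pretend that only the identity permutation has a fast expansion. */
static bool fast_perm_ok(const unsigned char *mask, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (mask[i] % n != i)
            return false;
    return true;
}

/* Hypothetical stand-in for the generic vshuffle routine: move one
   element at a time, out[i] = in[mask[i] mod n]. */
static void expand_generic(const int *in, const unsigned char *mask,
                           int *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = in[mask[i] % n];
}

/* Try the fast path first.  If the predicate rejects the mask, fall
   back to the generic routine instead of giving up. */
static enum perm_path expand_perm(const int *in, const unsigned char *mask,
                                  int *out, size_t n)
{
    if (fast_perm_ok(mask, n)) {
        for (size_t i = 0; i < n; i++)  /* identity: plain copy */
            out[i] = in[i];
        return PERM_FAST;
    }
    expand_generic(in, mask, out, n);
    return PERM_GENERIC;
}
```

The open question is whether the PERM_GENERIC path is ever slow enough
that refusing the permutation would be the better choice.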

>> When we perform vshuffle, we need to know whether it makes sense to use
>> pshufb (in case of x86) or to perform data movement via standard
>> non-simd registers. Do we have this information in the current
>> cost-model?
>
> Not really.  Again, if you're talking about the vectorizer, it
> gets even more complicated than this because...
>
>> Also, in certain cases, when the mask is constant, I would
>> assume the memory movement is also faster. For example if the mask is
>> {4,5,6,7,0,1,2,3...}, then two integer moves should do a better job.
>
> ... even before SSSE3 PSHUFB, we have all sorts of insns that can
> perform a constant shuffle without having to resort to either
> general-purpose registers or memory.  E.g. PSHUFD.  For specific
> data types, we can handle arbitrary constant shuffle with 1 or 2
> insns, even when arbitrary variable shuffles aren't.

But these cases are more or less covered. I am thinking about the
cases where vec_perm_ok returns false, but the actual permutation
could be done faster with memory/register transfers than with
PSHUFB & Co.
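
As an illustration of the kind of case I mean, take the constant mask
{4,5,6,7,0,1,2,3} from above. The sketch below assumes a 128-bit
vector of eight 16-bit elements (my assumption, the element type was
not fixed earlier), and the function names are made up for the
example. Under that assumption the permutation just swaps the two
64-bit halves, so two integer loads and two integer stores give the
same result as the element-by-element shuffle.

```c
#include <stdint.h>
#include <string.h>

#define NELT 8  /* assumed: eight 16-bit elements in a 128-bit vector */

/* Generic element-by-element permutation: out[i] = in[mask[i] mod NELT]. */
static void shuffle_generic(const uint16_t *in, const uint8_t *mask,
                            uint16_t *out)
{
    for (int i = 0; i < NELT; i++)
        out[i] = in[mask[i] % NELT];
}

/* Special case for the mask {4,5,6,7,0,1,2,3}: swap the two 64-bit
   halves using two integer loads and two integer stores.  memcpy is
   used so that the accesses are valid C regardless of alignment. */
static void shuffle_swap_halves(const uint16_t *in, uint16_t *out)
{
    uint64_t lo, hi;
    memcpy(&lo, in, sizeof lo);        /* elements 0..3 */
    memcpy(&hi, in + 4, sizeof hi);    /* elements 4..7 */
    memcpy(out, &hi, sizeof hi);       /* old high half goes first */
    memcpy(out + 4, &lo, sizeof lo);   /* old low half goes last */
}
```

Whether the two-move version actually beats a single shuffle insn on a
given target is exactly the cost question I do not know how to answer
yet.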

> It's certainly something that we could add to tree-vect-generic.c.
> I have no plans to do anything of the sort, however.

I didn't quite understand what you think could be added to
tree-vect-generic.c. I thought we were talking about more or less
backend issues.

In any case, I am investigating these problems, and I would appreciate
any help or advice.


Thanks,
Artem.
