Re: (R5900) Implementing Vector Support

Richard Henderson Mon, 09 May 2016 10:54:22 -0700

On 05/06/2016 09:28 PM, Woon yung Liu wrote:

Regarding multiplication of vectors, is there a way to work with a 
multiplication operation that results in something like this (the result is 
spread across these 3 registers), without re-ordering any elements:


RD: A6xB6, A4xB4, A2xB2, A0xB0

LO: A7xB7, A6xB6, A3xB3, A2xB2
HI: A5xB5, A4xB4, A1xB1, A0xB0

A0-A7 and B0-B7 are the 8 elements of two V8HI vectors, which are multiplied 
together to produce a widened multiplication result.

It looks like the vector hi/lo multiplication pattern would work with the 
values in HI and LO, but the order of the elements doesn't seem to be what 
GCC expects.

Assuming that it is possible to put this pattern to use, does GCC allow the 
vec_widen_smult_hi and vec_widen_smult_lo patterns to be combined, as with 
the divmod (division + modulus) patterns?
The instruction described above (PMULTH) computes both the hi and lo parts 
of the result in a single instruction, so combining the two patterns would 
be more efficient.


You can use this if you reshuffle the results.

Since it appears that PMULTH naturally produces even results in RD, it would seem to make the most sense to attempt to construct the odd results from LO+HI. However, I don't see anything in the TX79 ISA that's particularly helpful there.


That said,

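        pmulth  r0, x, y        # r0 = even products; LO/HI = all eight
        pmflo   t1              # t1 = LO
        pmfhi   t2              # t2 = HI
        pcpyld  r1, t1, t2      # r1 = low dwords of t1:t2 = products 3..0
        pcpyud  r2, t2, t1      # r2 = high dwords of t1:t2 = products 7..4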

would appear to produce the results gcc expects for the hi/lo multiples.

Don't worry overmuch about initially generating two copies of the pmulth instruction. We have a similar problem with the ia64 patterns. Rely on the rtl CSE pass to remove the duplicate instructions.

r~
