On 05/06/2016 09:28 PM, Woon yung Liu wrote:
> Regarding multiplication of vectors, is there a way to work with a
> multiplication operation that produces a result like this (spread
> across these 3 registers), without reordering any elements:
> RD: A6xB6, A4xB4, A2xB2, A0xB0
> LO: A7xB7, A6xB6, A3xB3, A2xB2
> HI: A5xB5, A4xB4, A1xB1, A0xB0
> A0-A7 and B0-B7 are the 8 elements of two V8HI vectors, which are
> multiplied together to produce a widened multiplication result.
> It looks like the vector hi/lo multiplication pattern would work with
> the values in HI and LO, but the elements don't seem to be in the
> order that GCC expects.
> Assuming that it is possible to put this pattern to use, does GCC allow
> the vec_widen_smult_hi and vec_widen_smult_lo patterns to be combined,
> as is done for the divmod (division + modulus) patterns?
> The instruction described above (PMULTH) calculates both the hi and lo
> parts of the result in one instruction, so combining the two patterns
> would be more efficient.
You can use this if you reshuffle the results.
Since PMULTH appears to produce the even-indexed products naturally in RD,
it would seem to make the most sense to try to construct the odd-indexed
products from LO+HI. However, I don't see anything in the TX79 ISA that's
particularly helpful there.
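To make the layout concrete, here is a small C model of PMULTH's three outputs exactly as listed in the question, with word 0 being the rightmost entry in each row. It is only a sketch for reasoning about the shuffles, built from the question's listing rather than from the TX79 manual; the reg128 type and the pmulth_model name are made up for illustration.

```c
#include <stdint.h>

/* A 128-bit register modelled as four 32-bit words; w[0] holds bits 31:0,
   i.e. the rightmost entry in each row of the listing in the question. */
typedef struct { int32_t w[4]; } reg128;

/* Hypothetical model of PMULTH's outputs, following the question's layout:
   eight signed 16x16->32 products p[0..7], with the even ones in RD and
   every product present in exactly one of LO or HI. */
void pmulth_model(const int16_t a[8], const int16_t b[8],
                  reg128 *rd, reg128 *lo, reg128 *hi)
{
    int32_t p[8];
    for (int i = 0; i < 8; i++)
        p[i] = (int32_t)a[i] * b[i];

    /* RD: A6xB6, A4xB4, A2xB2, A0xB0  (even products only) */
    *rd = (reg128){{ p[0], p[2], p[4], p[6] }};
    /* LO: A7xB7, A6xB6, A3xB3, A2xB2 */
    *lo = (reg128){{ p[2], p[3], p[6], p[7] }};
    /* HI: A5xB5, A4xB4, A1xB1, A0xB0 */
    *hi = (reg128){{ p[0], p[1], p[4], p[5] }};
}
```

The odd products sit in words 1 and 3 of LO (A3xB3, A7xB7) and of HI (A1xB1, A5xB5), so an odd-product vector would have to pick alternate words out of two different registers.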
That said,

    pmulth  r0, x, y
    pmflo   t1
    pmfhi   t2
    pcpyld  r1, t1, t2
    pcpyud  r2, t2, t1

would appear to produce the results GCC expects for the hi/lo multiplies.
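The reshuffle can be checked with a similar model. The PCPYLD/PCPYUD semantics below are a reading of the R5900/TX79 documentation and should be treated as an assumption; reg128, pcpyld, pcpyud and widen_halves are hypothetical helper names, and t1/t2 stand for LO/HI after pmflo/pmfhi, as in the sequence above.

```c
#include <stdint.h>

/* 128-bit register as four 32-bit words; w[0] holds bits 31:0. */
typedef struct { int32_t w[4]; } reg128;

/* Assumed PCPYLD rd, rs, rt: rd[63:0] = rt[63:0], rd[127:64] = rs[63:0]. */
reg128 pcpyld(reg128 rs, reg128 rt)
{
    return (reg128){{ rt.w[0], rt.w[1], rs.w[0], rs.w[1] }};
}

/* Assumed PCPYUD rd, rs, rt: rd[63:0] = rs[127:64], rd[127:64] = rt[127:64]. */
reg128 pcpyud(reg128 rs, reg128 rt)
{
    return (reg128){{ rs.w[2], rs.w[3], rt.w[2], rt.w[3] }};
}

/* The sequence from the reply, with t1 = LO and t2 = HI:
     pcpyld r1, t1, t2
     pcpyud r2, t2, t1 */
void widen_halves(reg128 t1, reg128 t2, reg128 *r1, reg128 *r2)
{
    *r1 = pcpyld(t1, t2);  /* A3xB3, A2xB2, A1xB1, A0xB0 */
    *r2 = pcpyud(t2, t1);  /* A7xB7, A6xB6, A5xB5, A4xB4 */
}
```

Tagging each word with the index of the product it holds, LO = {2, 3, 6, 7} and HI = {0, 1, 4, 5} (word 0 first) give r1 = {0, 1, 2, 3} and r2 = {4, 5, 6, 7}: the low and high halves of the widened result, in element order.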
Don't worry overmuch about initially generating two copies of the pmulth
instruction. We have a similar problem with the ia64 patterns. Rely on the
RTL CSE pass to remove the duplicate instructions.
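For context, a loop like the following (a hypothetical example, not from the thread) is the kind of code for which GCC's vectorizer may use the vec_widen_smult_lo/hi pair, depending on the target and options. Each group of eight 16-bit inputs needs both halves, so if each expander emits its own pmulth on the same operands, the second copy is the duplicate that CSE would be expected to remove.

```c
#include <stddef.h>
#include <stdint.h>

/* Widening 16x16->32 signed multiply over arrays. A V8HI vectorization
   of this loop needs both the "lo" and "hi" widening multiplies for each
   block of eight elements. */
void widen_mul(int32_t *restrict out, const int16_t *restrict a,
               const int16_t *restrict b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = (int32_t)a[i] * b[i];
}
```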
r~