On 08/16/2016 03:10 AM, shmuel gutl wrote:
> My hardware directly supports instructions of the form
> subreg:SI(reg:VEC v1,3) = SI:a1
Subregs of hard registers should be avoided. They are primarily useful
for pseudo regs. Subregs that aren't lowpart subregs should also be
avoided, except when you have a subreg of a pseudo that maps to multiple
hard regs, which can eventually become a lowpart subreg once the pseudo
is allocated to a hard reg and the subreg is simplified.
It isn't clear where the subregs are coming from, but what you are doing
sounds like a bit-field extract/insert, and these are not operations
that the register allocator will add to the code. Depending on what
exactly you are trying to do, I have two general suggestions.
1) Define the vector registers as 32-bit registers, and define vector
operations as using aligned groups of these 32-bit registers. This
exposes the 32-bit registers to the register allocator so that it can
use them directly.
2) Use zero_extract and/or vec_select instead of subreg. This requires
patterns that emit the zero_extract/vec_select operations, patterns
that recognize them, and possibly builtin functions that the user can
call to get these zero_extract/vec_select operations emitted into the
RTL. There is a named pattern vec_extract that the vectorizer
can use to generate these rtl operations. For examples of this, in the
aarch64 port, see for instance the aarch64_movdi_* patterns in the
aarch64.md file, and the aarch64_get_lane* patterns in the
aarch64-simd.md file.
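As a rough sketch of what the recognizer side of suggestion 2 might look
like, here is a lane-extract pattern modeled loosely on the
aarch64_get_lane* patterns. The pattern name, the choice of V4SI, the "v"
register constraint, and the vget mnemonic are all made up for
illustration, so treat this as a starting point rather than something
that will drop into your port unchanged.

```
;; Hypothetical pattern: read one lane of a four-element SImode vector
;; into a scalar register.  The "v" constraint, the "vget" mnemonic and
;; the operand syntax are placeholders, not taken from any real port.
(define_insn "*myport_get_lane_v4si"
  [(set (match_operand:SI 0 "register_operand" "=r")
	(vec_select:SI
	  (match_operand:V4SI 1 "register_operand" "v")
	  ;; Operand 2 is the constant lane number being selected.
	  (parallel [(match_operand:SI 2 "const_int_operand" "n")])))]
  ""
  "vget\t%0, %1[%2]")
```

A vec_extract expander in the port could emit this same RTL, so the
vectorizer and any builtins you add would end up at the same insn. For
the write direction in your example, the aarch64 port uses a vec_merge of
a vec_duplicate (see aarch64_simd_vec_set in aarch64-simd.md) rather than
a store to a subreg.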
Jim