On 08/16/2016 03:10 AM, shmuel gutl wrote:
> My hardware directly supports instructions of the form subreg:SI(reg:VEC v1,3) = SI:a1
Subregs of hard registers should be avoided; they are primarily useful for pseudo regs. Subregs that aren't lowpart subregs should also be avoided. The exception is a subreg of a pseudo that maps to multiple hard regs, which can eventually become a lowpart subreg once the pseudo is allocated to a hard reg and the subreg is simplified.
It isn't clear where the subregs are coming from, but what you are doing sounds like a bit-field extract/insert, and these are not operations that the register allocator will add to the code. Depending on what exactly you are trying to do, I have two general suggestions.
1) Define the vector registers as 32-bit registers, and define vector operations as using aligned groups of these 32-bit registers. This exposes the 32-bit registers to the register allocator so that it can use them directly.
2) Use zero_extract and/or vec_select instead of subreg. This requires patterns that emit the zero_extract/vec_select operations, patterns that recognize them, and possibly builtin functions that the user can call to get these operations emitted into the rtl. There is also a named pattern, vec_extract, that the vectorizer can use to generate these rtl operations. For examples in the aarch64 port, see the aarch64_movdi_* patterns in the aarch64.md file and the aarch64_get_lane* patterns in the aarch64-simd.md file.
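To make the two suggestions a little more concrete, here is a rough, self-contained C model. It is not GCC code, and every name in it (regs_needed, regno_ok_for_size, lane_regno, vec_select_lane, zero_extract, the v4si struct, the 32-register file) is hypothetical and chosen only for illustration. The first half sketches the kind of rule suggestion 1 implies: a vector occupies an aligned group of consecutive 32-bit registers. In a real port that rule would normally be expressed through HARD_REGNO_NREGS and HARD_REGNO_MODE_OK rather than standalone functions. The second half models what vec_select and zero_extract compute, assuming lanes are numbered from the low end and BITS_BIG_ENDIAN is false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ---- Suggestion 1: vectors as aligned groups of 32-bit registers ---- */

/* Hypothetical register file: 32 hard registers of 4 bytes each. */
#define UNITS_PER_REG 4
#define NUM_REGS 32

/* Number of consecutive 32-bit registers a SIZE_BYTES value occupies. */
static unsigned regs_needed(unsigned size_bytes)
{
    assert(size_bytes > 0);
    return (size_bytes + UNITS_PER_REG - 1) / UNITS_PER_REG;
}

/* A value may start at REGNO only if its group is aligned to the group
   size and fits inside the register file. */
static bool regno_ok_for_size(unsigned regno, unsigned size_bytes)
{
    unsigned n = regs_needed(size_bytes);
    return regno % n == 0 && regno + n <= NUM_REGS;
}

/* The 32-bit register that holds lane LANE of a vector starting at REGNO. */
static unsigned lane_regno(unsigned regno, unsigned lane, unsigned size_bytes)
{
    assert(regno_ok_for_size(regno, size_bytes));
    assert(lane < regs_needed(size_bytes));
    return regno + lane;
}

/* ---- Suggestion 2: what vec_select and zero_extract compute ---- */

#define LANES 4
typedef struct { uint32_t lane[LANES]; } v4si;

/* Models (vec_select:SI (reg:V4SI v) (parallel [(const_int i)])). */
static uint32_t vec_select_lane(v4si v, unsigned i)
{
    assert(i < LANES);
    return v.lane[i];
}

/* Models (zero_extract:DI x (const_int size) (const_int pos)) with
   BITS_BIG_ENDIAN false: SIZE bits starting at bit POS, counted from
   the least significant bit, zero-extended. */
static uint64_t zero_extract(uint64_t x, unsigned size, unsigned pos)
{
    assert(size >= 1 && pos < 64 && size <= 64 - pos);
    uint64_t mask = (size == 64) ? UINT64_MAX : ((UINT64_C(1) << size) - 1);
    return (x >> pos) & mask;
}
```

In this model a 16-byte vector starting at register 4 has lane 3 in register 7, which the allocator could then treat as an ordinary 32-bit register, while starting the same vector at register 5 is rejected as misaligned.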
Jim