On 2/26/19 3:38 AM, David Hildenbrand wrote:
> We sometimes want to work on a temporary vector register instead of the
> actual destination, because source and destination might overlap. An
> alternative would be loading the vector into two i64 variables, but then
> separate handling for accessing the vector elements would be needed.
> This is easier. Add one for now as that seems to be enough.

Hmm, I'll reserve judgment until I see how this is used.

For ARM SVE, I would allocate this temporary on the stack within the
helper, and move one of the operands out of the way. E.g.

    void helper(foo)(void *vd, void *vx, void *vy)
    {
        VectorReg tmp;
        TYPE *d = vd, *x = vx, *y = vy;

        if (vx == vd || vy == vd) {
            /* Save the old destination before it is overwritten. */
            tmp = *(VectorReg *)vd;
            if (vx == vd) {
                x = (TYPE *)&tmp;
            }
            if (vy == vd) {
                y = (TYPE *)&tmp;
            }
        }
        /* process d, x, y as normal */
    }

This minimized the amount of code inline. However, SVE vectors are
quite a bit larger, at 256 bytes, so the copy itself was out of line
most of the time anyway.

Provisionally,
Reviewed-by: Richard Henderson <richard.hender...@linaro.org>

r~
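
For illustration only, here is a minimal standalone sketch of the
copy-on-overlap pattern described above. It is not taken from the patch
or from QEMU: the 128-bit VectorReg layout, the element type, and the
helper_add_reversed() operation are all hypothetical, chosen so that an
in-place update would read elements that have already been overwritten.

```c
#include <stdint.h>
#include <string.h>

#define NELEM 4

/* Hypothetical 128-bit vector register, viewed as four 32-bit elements. */
typedef struct {
    uint32_t e[NELEM];
} VectorReg;

/*
 * Hypothetical operation: d[i] = x[i] + y[NELEM - 1 - i].
 * If vd aliases vx or vy, writing d[i] would clobber inputs that are
 * still to be read, so the old destination is copied to the stack and
 * the aliasing operand(s) redirected to that copy.
 */
void helper_add_reversed(void *vd, const void *vx, const void *vy)
{
    VectorReg tmp;
    VectorReg *d = vd;
    const VectorReg *x = vx, *y = vy;
    int i;

    if (vx == vd || vy == vd) {
        memcpy(&tmp, vd, sizeof(tmp));
        if (vx == vd) {
            x = &tmp;
        }
        if (vy == vd) {
            y = &tmp;
        }
    }
    for (i = 0; i < NELEM; i++) {
        d->e[i] = x->e[i] + y->e[NELEM - 1 - i];
    }
}
```

With x = {1, 2, 3, 4} and y = {10, 20, 30, 40}, calling the helper with
the destination aliasing y leaves {41, 32, 23, 14} in y, the same result
as with a separate destination. The cost is one register-sized copy on
the overlapping path only.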