On 4/22/20 7:15 AM, Stephen Long wrote:
> Signed-off-by: Stephen Long <stepl...@quicinc.com>
>
> I'm guessing endianness doesn't matter because we are writing to the
> corresponding 32-bit/64-bit in the destination register.
> ---
>  target/arm/cpu.h           | 10 +++++++++
>  target/arm/helper-sve.h    |  3 +++
>  target/arm/sve.decode      |  4 ++++
>  target/arm/sve_helper.c    | 44 ++++++++++++++++++++++++++++++++++++++
>  target/arm/translate-sve.c | 29 +++++++++++++++++++++++++
>  5 files changed, 90 insertions(+)

Endianness does matter for 32-bit, as we are writing into a host-endian
64-bit quantity. I was being over-brief in my earlier reply.

> +    TYPE p0, p1, results[4];                                \
> +                                                            \
> +    /* i = 0, j = 0 */                                      \
> +    p0 = MUL(n00, m00, status);                             \
> +    p1 = MUL(n01, m01, status);                             \
> +    results[0] = ADD(a[0], ADD(p0, p1, status), status);    \
> +                                                            \
> +    /* i = 0, j = 1 */                                      \
> +    p0 = MUL(n00, m10, status);                             \
> +    p1 = MUL(n01, m11, status);                             \
> +    results[1] = ADD(a[1], ADD(p0, p1, status), status);    \
> +                                                            \
> +    /* i = 1, j = 0 */                                      \
> +    p0 = MUL(n10, m00, status);                             \
> +    p1 = MUL(n11, m01, status);                             \
> +    results[2] = ADD(a[2], ADD(p0, p1, status), status);    \
> +                                                            \
> +    /* i = 1, j = 1 */                                      \
> +    p0 = MUL(n10, m10, status);                             \
> +    p1 = MUL(n11, m11, status);                             \
> +    results[3] = ADD(a[3], ADD(p0, p1, status), status);    \
> +                                                            \
> +    memcpy(d, results, sizeof(TYPE) * 4);                   \

There's no need for the result array -- we have already read the inputs,
so we can write back the result straight away.


r~
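To illustrate why endianness matters for 32-bit elements: if vector register storage is an array of host-endian uint64_t, then viewing it as uint32_t picks the wrong half of each 64-bit word on a big-endian host unless the element index is adjusted. The following is a minimal standalone C sketch under that assumption; `host_is_big_endian`, `h4_sketch` and `read_elem32` are hypothetical names, written in the spirit of QEMU's H4() index macro but not copied from QEMU.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Nonzero when the host stores the most significant byte first. */
static int host_is_big_endian(void)
{
    const uint16_t probe = 1;
    uint8_t first_byte;

    memcpy(&first_byte, &probe, 1);
    return first_byte == 0;
}

/*
 * Hypothetical index swizzle in the spirit of QEMU's H4(): map the
 * architectural index of a 32-bit element to its slot in memory laid out
 * as host-endian uint64_t words.  Architectural element 0 is the low half
 * of a word, which a big-endian host stores second, so flip bit 0 there.
 */
static size_t h4_sketch(size_t idx)
{
    return host_is_big_endian() ? idx ^ 1 : idx;
}

/* Read architectural 32-bit element IDX from NWORDS host-endian 64-bit words. */
static uint32_t read_elem32(const uint64_t *words, size_t nwords, size_t idx)
{
    uint32_t elem;

    assert(idx < 2 * nwords);
    /* memcpy avoids strict-aliasing problems when reinterpreting the words. */
    memcpy(&elem, (const unsigned char *)words + h4_sketch(idx) * sizeof elem,
           sizeof elem);
    return elem;
}
```

On a little-endian host the swizzle is the identity, which is why code tested only there can look correct; on a big-endian host, indexing the uint32_t view directly would return element 1 when element 0 was requested.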
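A sketch of the suggested restructuring, with two stated assumptions: plain `float` operators stand in for the MUL/ADD macros (so float_status rounding modes and exception flags are not modelled), and the inputs use a hypothetical row-major layout n = {n00, n01, n10, n11}, m = {m00, m01, m10, m11}. `mmla_2x2_sketch` is an illustrative name, not the patch's helper. All N and M inputs are loaded into locals before the first store, and each D element depends only on the matching A element, so every result can be written as soon as it is computed, even when D overlaps A, N or M.

```c
#include <assert.h>

/*
 * Hypothetical 2x2 matrix multiply-accumulate segment:
 *   d[2*i + j] = a[2*i + j] + (n[i][0] * m[j][0] + n[i][1] * m[j][1])
 * Plain float operators stand in for MUL/ADD with float_status.
 */
static void mmla_2x2_sketch(float *d, const float *a,
                            const float *n, const float *m)
{
    float n00, n01, n10, n11, m00, m01, m10, m11, p0, p1;

    assert(d && a && n && m);

    /* Read every N and M input before the first store, so D may alias them. */
    n00 = n[0]; n01 = n[1]; n10 = n[2]; n11 = n[3];
    m00 = m[0]; m01 = m[1]; m10 = m[2]; m11 = m[3];

    /* i = 0, j = 0: d[0] depends only on a[0], so store it immediately. */
    p0 = n00 * m00;
    p1 = n01 * m01;
    d[0] = a[0] + (p0 + p1);

    /* i = 0, j = 1 */
    p0 = n00 * m10;
    p1 = n01 * m11;
    d[1] = a[1] + (p0 + p1);

    /* i = 1, j = 0 */
    p0 = n10 * m00;
    p1 = n11 * m01;
    d[2] = a[2] + (p0 + p1);

    /* i = 1, j = 1 */
    p0 = n10 * m10;
    p1 = n11 * m11;
    d[3] = a[3] + (p0 + p1);
}
```

With the stores done in place, neither the temporary results[] array nor the trailing memcpy is needed.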