On 01/15/2018 10:18 AM, Peter Maydell wrote:
>> +void HELPER(gvec_fcmlah)(void *vd, void *vn, void *vm,
>> +                         void *vfpst, uint32_t desc)
>> +{
>> +    uintptr_t opr_sz = simd_oprsz(desc);
>> +    float16 *d = vd;
>> +    float16 *n = vn;
>> +    float16 *m = vm;
>> +    float_status *fpst = vfpst;
>> +    intptr_t flip = extract32(desc, SIMD_DATA_SHIFT, 1);
>> +    uint32_t neg_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
>> +    uint32_t neg_real = flip ^ neg_imag;
>> +    uintptr_t i;
>> +
>> +    neg_real <<= 15;
>> +    neg_imag <<= 15;
>> +
>> +    for (i = 0; i < opr_sz / 2; i += 2) {
>> +        float16 e0 = n[H2(i + flip)];
>> +        float16 e1 = m[H2(i + flip)] ^ neg_real;
>> +        float16 e2 = e0;
>> +        float16 e3 = m[H2(i + 1 - flip)] ^ neg_imag;
>
> This is again rather confusing to compare against the pseudocode.
> What order are your e0/e1/e2/e3 compared to the pseudocode's
> element1/element2/element3/element4 ?

The SVE pseudocode for the same operation is clearer than that in the main
ARM ARM, and is nearer to what I used:

    for e = 0 to elements-1
        if ElemP[mask, e, esize] == '1' then
            pair = e - (e MOD 2);  // index of first element in pair
            addend = Elem[result, e, esize];
            if IsEven(e) then  // real part
                // realD = realA [+-] flip ? (imagN * imagM) : (realN * realM)
                element1 = Elem[operand1, pair + flip, esize];
                element2 = Elem[operand2, pair + flip, esize];
                if neg_real then element2 = FPNeg(element2);
            else  // imaginary part
                // imagD = imagA [+-] flip ? (imagN * realM) : (realN * imagM)
                element1 = Elem[operand1, pair + flip, esize];
                element2 = Elem[operand2, pair + (1 - flip), esize];
                if neg_imag then element2 = FPNeg(element2);
            Elem[result, e, esize] = FPMulAdd(addend, element1, element2, FPCR);

In my version, e0/e1 are element1/element2 (real) and e2/e3 are
element1/element2 (imag).


r~