> On 31 dec. 2014, at 18:25, Peter Maydell <peter.mayd...@linaro.org> wrote:
>
>> On 31 December 2014 at 18:08, Ard Biesheuvel <ard.biesheu...@linaro.org>
>> wrote:
>>> On 31 December 2014 at 17:37, Peter Maydell <peter.mayd...@linaro.org>
>>> wrote:
>>> It doesn't happen on LE TCG hosts. Joy :-)
>
>> I will need to find out first if this is the kernel driver or the
>> emulation failing. I never tested the former on BE, as the models
>> don't support the crypto extensions (at least, not without a plugin
>> that I don't have access to) and the actual hardware boots via EFI
>> which is LE only, and with no 'ground truth' as a reference, it is
>> hard to be 100% certain both implementations are in spec, even if they
>> work in combination.
>
> It's an LE guest in both cases (AArch64); the only difference is the
> host system (LE x86-64 vs BE PPC64). If our emulated CPU behaves
> differently based on the endianness of the host, it's always an
> emulation bug by definition. (Of course it's possible that we're
> currently correct on BE host, wrong on LE host, and with the
> corresponding bug in LE guest kernels, but that seems unlikely -- I
> did do cross-checks against the fast models for the crypto insns
> using an LE-host QEMU.)
>
Ok, I had missed that bit of context, but obviously, if the only difference is the endianness of the host, it must be an emulation bug.

>> The SHA emulation code only operates on 32-bit quantities, and the
>> only memory access it does is populating the CRYPTO_STATE union using
>> float64_val(). Does anyone feel there is anything obviously wrong with
>> that?
>
> How do you extract the 32 bit quantities from the float64_val ?
>
> Definition of vfp.regs[] from cpu.h:
>  * In AArch32:
>  *  Qn = regs[2n+1]:regs[2n]
>  *  Dn = regs[n]
>  *  Sn = regs[n/2] bits 31..0 for even n, and bits 63..32 for odd n
>  *  (and regs[32] to regs[63] are inaccessible)
>  * In AArch64:
>  *  Qn = regs[2n+1]:regs[2n]
>  *  Dn = regs[2n]
>  *  Sn = regs[2n] bits 31..0
>
> where regs[x] is always a 64-bit value in host endianness
> (ie if you store 0x112233445566778899aabbccddeeff00 as
> a 64 bit write to VFP register D0 then regs[0] will be
> 0x112233445566778899aabbccddeeff00 regardless of host
> endianness. That is, the least significant 8 bits of D0
> will be (regs[0] & 0xff). (This isn't the same number as
> if you do the union-type-punning thing with union {
> uint64_t l; uint8_t b[8]; } and look at b[0].)
>

Ok, that must be the culprit then. I indeed use a union to convert between the various operand sizes.

> Notice that for 128-bit AArch64 vectors, the high half
> is always regs[2n+1] and the low half regs[2n], regardless
> of host endianness. This implies that on big-endian hosts
> a 128-bit vector is not represented as a 128-bit host
> endian vector at any offset in regs[]. The function
> vec_reg_offset() in translate-a64.c may be helpful.

I cannot easily test this myself, but I will try to cook up a patch anyway.

Cheers,
Ard.