On 2 January 2015 at 15:17, Laszlo Ersek <ler...@redhat.com> wrote: > On 01/02/15 15:18, Ard Biesheuvel wrote: >> The crypto emulation code in target-arm/crypto_helper.c never worked >> correctly on big endian hosts, due to the fact that it uses a union >> of array types to convert between the native VFP register size (64 >> bits) and the types used in the algorithms (bytes and 32 bit words) >> >> We cannot just swab between LE and BE when reading and writing the >> registers, as the SHA code performs word additions, so instead, add >> array accessors for the CRYPTO_STATE type whose LE and BE specific >> implementations ensure that the correct array elements are referenced. >> >> Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org> >> --- >> >> Only tested on a LE (amd64) host, as I don't have access to a BE host. >> >> target-arm/crypto_helper.c | 114 >> +++++++++++++++++++++++++-------------------- >> 1 file changed, 63 insertions(+), 51 deletions(-) >> >> diff --git a/target-arm/crypto_helper.c b/target-arm/crypto_helper.c >> index dd60d0b81a4c..1fe975d0f1e3 100644 >> --- a/target-arm/crypto_helper.c >> +++ b/target-arm/crypto_helper.c >> @@ -22,6 +22,14 @@ union CRYPTO_STATE { >> uint64_t l[2]; >> }; >> >> +#ifdef HOST_WORDS_BIGENDIAN >> +#define CR_ST_BYTE(state, i) (state.bytes[(15 - (i)) ^ 8]) > > Produces > > 7, 6, 5, 4, 3, 2, 1, 0, > 15, 14, 13, 12, 11, 10, 9 > > which I think is consistent with what Peter wrote. > >> +#define CR_ST_WORD(state, i) (state.words[(3 - (i)) ^ 2]) > > Hm. The indices look good: > > 1, 0 > 3, 2 > > But, assuming you're on a big-endian host, perhaps you should byte swap > the uint32_t too?... >
No, I don't think so. I even removed the cpu_to_le32() call from the aesmc helper because the MixColumns lookup tables will be emitted in host native endianness, and the vfp.regs[] array is host native endianness as well. I think the union type may have been a mistake to begin with, because it introduces endianness dependencies that don't actually exist in the code. It probably would have been cleaner if I had defined the relation between VFP D-regs, words and bytes in terms of shifts and masks instead. >> (ie if you store 0x112233445566778899aabbccddeeff00 as a 64 bit write >> to VFP register D0 then regs[0] will be >> 0x112233445566778899aabbccddeeff00 regardless of host endianness. That >> is, the least significant 8 bits of D0 will be (regs[0] & 0xff). (This >> isn't the same number as if you do the union-type-punning thing with >> union { uint64_t l; uint8_t b[8]; } and look at b[0].) > > Assume the guest writes value 0x112233445566778899aabbccddeeff00 (LE > representation). > > * Assuming a little endian host, we get: > > [00 ff ee dd cc bb aa 99 88 77 66 55 44 33 22 11] > > (full 128-bit LE vector). > > This is grouped into units of four bytes for the words array: > > [00 ff ee dd] [cc bb aa 99] [88 77 66 55] [44 33 22 11] > > And the meaning of the individual uint32_t's is: > > 0xddeeff00 0x99aabbcc 0x55667788 0x11223344 > > Which are the building blocks for the crypto primitives. > > * Assuming a big endian host, you get the following byte array for the > same store (if I understand Peter right -- two uint64_t values are > encoded as host endian, but the ordering between those remains little > endian): > > [99 aa bb cc dd ee ff 00 11 22 33 44 55 66 77 88] > > (and your CR_ST_BYTE() macro looks right for this). > > Grouping this into units of four bytes for the words array, we get > > [99 aa bb cc] [dd ee ff 00] [11 22 33 44] [55 66 77 88] > > resulting in values > > 0x99aabbcc 0xddeeff00 0x11223344 0x55667788 > > And then your CR_ST_WORD macro permutes them (1, 0, 3, 2) to the > following values: > > 0xddeeff00 0x99aabbcc 0x55667788 0x11223344 > > ... Which is identical to the LE host result. > > So this part looks fine to me. > Thanks for the elaborate review. > Regarding the rest -- in my opinion, getting the implementation of > crypto primitivies *wrong* despite them succesfully passing an extensive > test suite is exceedingly unlikely. You usually cannot get crypto > primitives "halfway right" -- as long as you don't *combine* them to > bigger building blocks, crypto primitives are 100% specified and there > are no "corner cases" that are usual for other program logic. The > implementation either works or is catastrophically broken (which is > quickly visible). > > Hence, if Peter gets good test results for this patchset, then I think > that *suffices* to apply the patch. > Agreed. > ... Which is why I won't try to eyeball the rest of the patch where you > put the macros to use. :) The macros themselves are sound. If you broke > their application, they won't pass the test suite (in the kernel bootup > code, or elsewhere). > > Acked-by: Laszlo Ersek <ler...@redhat.com> > > Thanks > Laszlo > Cheers, Ard. >> +#else >> +#define CR_ST_BYTE(state, i) (state.bytes[i]) >> +#define CR_ST_WORD(state, i) (state.words[i]) >> +#endif >> + >> void HELPER(crypto_aese)(CPUARMState *env, uint32_t rd, uint32_t rm, >> uint32_t decrypt) >> { >> @@ -46,7 +54,7 @@ void HELPER(crypto_aese)(CPUARMState *env, uint32_t rd, >> uint32_t rm, >> >> /* combine ShiftRows operation and sbox substitution */ >> for (i = 0; i < 16; i++) { >> - st.bytes[i] = sbox[decrypt][rk.bytes[shift[decrypt][i]]]; >> + CR_ST_BYTE(st, i) = sbox[decrypt][CR_ST_BYTE(rk, >> shift[decrypt][i])]; >> } >> >> env->vfp.regs[rd] = make_float64(st.l[0]); >> @@ -198,11 +206,11 @@ void HELPER(crypto_aesmc)(CPUARMState *env, uint32_t >> rd, uint32_t rm, >> assert(decrypt < 2); >> >> for (i = 0; i < 16; i += 4) { >> - st.words[i >> 2] = cpu_to_le32( >> - mc[decrypt][st.bytes[i]] ^ >> - rol32(mc[decrypt][st.bytes[i + 1]], 8) ^ >> - rol32(mc[decrypt][st.bytes[i + 2]], 16) ^ >> - rol32(mc[decrypt][st.bytes[i + 3]], 24)); >> + CR_ST_WORD(st, i >> 2) = >> + mc[decrypt][CR_ST_BYTE(st, i)] ^ >> + rol32(mc[decrypt][CR_ST_BYTE(st, i + 1)], 8) ^ >> + rol32(mc[decrypt][CR_ST_BYTE(st, i + 2)], 16) ^ >> + rol32(mc[decrypt][CR_ST_BYTE(st, i + 3)], 24); >> } >> >> env->vfp.regs[rd] = make_float64(st.l[0]); >> @@ -255,24 +263,25 @@ void HELPER(crypto_sha1_3reg)(CPUARMState *env, >> uint32_t rd, uint32_t rn, >> >> switch (op) { >> case 0: /* sha1c */ >> - t = cho(d.words[1], d.words[2], d.words[3]); >> + t = cho(CR_ST_WORD(d, 1), CR_ST_WORD(d, 2), CR_ST_WORD(d, >> 3)); >> break; >> case 1: /* sha1p */ >> - t = par(d.words[1], d.words[2], d.words[3]); >> + t = par(CR_ST_WORD(d, 1), CR_ST_WORD(d, 2), CR_ST_WORD(d, >> 3)); >> break; >> case 2: /* sha1m */ >> - t = maj(d.words[1], d.words[2], d.words[3]); >> + t = maj(CR_ST_WORD(d, 1), CR_ST_WORD(d, 2), CR_ST_WORD(d, >> 3)); >> break; >> default: >> g_assert_not_reached(); >> } >> - t += rol32(d.words[0], 5) + n.words[0] + m.words[i]; >> - >> - n.words[0] = d.words[3]; >> - d.words[3] = d.words[2]; >> - d.words[2] = ror32(d.words[1], 2); >> - d.words[1] = d.words[0]; >> - d.words[0] = t; >> + t += rol32(CR_ST_WORD(d, 0), 5) + CR_ST_WORD(n, 0) >> + + CR_ST_WORD(m, i); >> + >> + CR_ST_WORD(n, 0) = CR_ST_WORD(d, 3); >> + CR_ST_WORD(d, 3) = CR_ST_WORD(d, 2); >> + CR_ST_WORD(d, 2) = ror32(CR_ST_WORD(d, 1), 2); >> + CR_ST_WORD(d, 1) = CR_ST_WORD(d, 0); >> + CR_ST_WORD(d, 0) = t; >> } >> } >> env->vfp.regs[rd] = make_float64(d.l[0]); >> @@ -286,8 +295,8 @@ void HELPER(crypto_sha1h)(CPUARMState *env, uint32_t rd, >> uint32_t rm) >> float64_val(env->vfp.regs[rm + 1]) >> } }; >> >> - m.words[0] = ror32(m.words[0], 2); >> - m.words[1] = m.words[2] = m.words[3] = 0; >> + CR_ST_WORD(m, 0) = ror32(CR_ST_WORD(m, 0), 2); >> + CR_ST_WORD(m, 1) = CR_ST_WORD(m, 2) = CR_ST_WORD(m, 3) = 0; >> >> env->vfp.regs[rd] = make_float64(m.l[0]); >> env->vfp.regs[rd + 1] = make_float64(m.l[1]); >> @@ -304,10 +313,10 @@ void HELPER(crypto_sha1su1)(CPUARMState *env, uint32_t >> rd, uint32_t rm) >> float64_val(env->vfp.regs[rm + 1]) >> } }; >> >> - d.words[0] = rol32(d.words[0] ^ m.words[1], 1); >> - d.words[1] = rol32(d.words[1] ^ m.words[2], 1); >> - d.words[2] = rol32(d.words[2] ^ m.words[3], 1); >> - d.words[3] = rol32(d.words[3] ^ d.words[0], 1); >> + CR_ST_WORD(d, 0) = rol32(CR_ST_WORD(d, 0) ^ CR_ST_WORD(m, 1), 1); >> + CR_ST_WORD(d, 1) = rol32(CR_ST_WORD(d, 1) ^ CR_ST_WORD(m, 2), 1); >> + CR_ST_WORD(d, 2) = rol32(CR_ST_WORD(d, 2) ^ CR_ST_WORD(m, 3), 1); >> + CR_ST_WORD(d, 3) = rol32(CR_ST_WORD(d, 3) ^ CR_ST_WORD(d, 0), 1); >> >> env->vfp.regs[rd] = make_float64(d.l[0]); >> env->vfp.regs[rd + 1] = make_float64(d.l[1]); >> @@ -356,20 +365,22 @@ void HELPER(crypto_sha256h)(CPUARMState *env, uint32_t >> rd, uint32_t rn, >> int i; >> >> for (i = 0; i < 4; i++) { >> - uint32_t t = cho(n.words[0], n.words[1], n.words[2]) + n.words[3] >> - + S1(n.words[0]) + m.words[i]; >> - >> - n.words[3] = n.words[2]; >> - n.words[2] = n.words[1]; >> - n.words[1] = n.words[0]; >> - n.words[0] = d.words[3] + t; >> - >> - t += maj(d.words[0], d.words[1], d.words[2]) + S0(d.words[0]); >> - >> - d.words[3] = d.words[2]; >> - d.words[2] = d.words[1]; >> - d.words[1] = d.words[0]; >> - d.words[0] = t; >> + uint32_t t = cho(CR_ST_WORD(n, 0), CR_ST_WORD(n, 1), CR_ST_WORD(n, >> 2)) >> + + CR_ST_WORD(n, 3) + S1(CR_ST_WORD(n, 0)) >> + + CR_ST_WORD(m, i); >> + >> + CR_ST_WORD(n, 3) = CR_ST_WORD(n, 2); >> + CR_ST_WORD(n, 2) = CR_ST_WORD(n, 1); >> + CR_ST_WORD(n, 1) = CR_ST_WORD(n, 0); >> + CR_ST_WORD(n, 0) = CR_ST_WORD(d, 3) + t; >> + >> + t += maj(CR_ST_WORD(d, 0), CR_ST_WORD(d, 1), CR_ST_WORD(d, 2)) >> + + S0(CR_ST_WORD(d, 0)); >> + >> + CR_ST_WORD(d, 3) = CR_ST_WORD(d, 2); >> + CR_ST_WORD(d, 2) = CR_ST_WORD(d, 1); >> + CR_ST_WORD(d, 1) = CR_ST_WORD(d, 0); >> + CR_ST_WORD(d, 0) = t; >> } >> >> env->vfp.regs[rd] = make_float64(d.l[0]); >> @@ -394,13 +405,14 @@ void HELPER(crypto_sha256h2)(CPUARMState *env, >> uint32_t rd, uint32_t rn, >> int i; >> >> for (i = 0; i < 4; i++) { >> - uint32_t t = cho(d.words[0], d.words[1], d.words[2]) + d.words[3] >> - + S1(d.words[0]) + m.words[i]; >> - >> - d.words[3] = d.words[2]; >> - d.words[2] = d.words[1]; >> - d.words[1] = d.words[0]; >> - d.words[0] = n.words[3 - i] + t; >> + uint32_t t = cho(CR_ST_WORD(d, 0), CR_ST_WORD(d, 1), CR_ST_WORD(d, >> 2)) >> + + CR_ST_WORD(d, 3) + S1(CR_ST_WORD(d, 0)) >> + + CR_ST_WORD(m, i); >> + >> + CR_ST_WORD(d, 3) = CR_ST_WORD(d, 2); >> + CR_ST_WORD(d, 2) = CR_ST_WORD(d, 1); >> + CR_ST_WORD(d, 1) = CR_ST_WORD(d, 0); >> + CR_ST_WORD(d, 0) = CR_ST_WORD(n, 3 - i) + t; >> } >> >> env->vfp.regs[rd] = make_float64(d.l[0]); >> @@ -418,10 +430,10 @@ void HELPER(crypto_sha256su0)(CPUARMState *env, >> uint32_t rd, uint32_t rm) >> float64_val(env->vfp.regs[rm + 1]) >> } }; >> >> - d.words[0] += s0(d.words[1]); >> - d.words[1] += s0(d.words[2]); >> - d.words[2] += s0(d.words[3]); >> - d.words[3] += s0(m.words[0]); >> + CR_ST_WORD(d, 0) += s0(CR_ST_WORD(d, 1)); >> + CR_ST_WORD(d, 1) += s0(CR_ST_WORD(d, 2)); >> + CR_ST_WORD(d, 2) += s0(CR_ST_WORD(d, 3)); >> + CR_ST_WORD(d, 3) += s0(CR_ST_WORD(m, 0)); >> >> env->vfp.regs[rd] = make_float64(d.l[0]); >> env->vfp.regs[rd + 1] = make_float64(d.l[1]); >> @@ -443,10 +455,10 @@ void HELPER(crypto_sha256su1)(CPUARMState *env, >> uint32_t rd, uint32_t rn, >> float64_val(env->vfp.regs[rm + 1]) >> } }; >> >> - d.words[0] += s1(m.words[2]) + n.words[1]; >> - d.words[1] += s1(m.words[3]) + n.words[2]; >> - d.words[2] += s1(d.words[0]) + n.words[3]; >> - d.words[3] += s1(d.words[1]) + m.words[0]; >> + CR_ST_WORD(d, 0) += s1(CR_ST_WORD(m, 2)) + CR_ST_WORD(n, 1); >> + CR_ST_WORD(d, 1) += s1(CR_ST_WORD(m, 3)) + CR_ST_WORD(n, 2); >> + CR_ST_WORD(d, 2) += s1(CR_ST_WORD(d, 0)) + CR_ST_WORD(n, 3); >> + CR_ST_WORD(d, 3) += s1(CR_ST_WORD(d, 1)) + CR_ST_WORD(m, 0); >> >> env->vfp.regs[rd] = make_float64(d.l[0]); >> env->vfp.regs[rd + 1] = make_float64(d.l[1]); >> >