Le 02/09/2024 à 14:34, Jason A. Donenfeld a écrit :
On Mon, Sep 02, 2024 at 02:04:41PM +0200, Christophe Leroy wrote:
This first patch adds support for PPC32. As selftests cannot easily
be generated only for PPC32, and because the following patch brings
support for PPC64 anyway, this patch opts out all code in
__arch_chacha20_blocks_nostack() so that vdso_test_chacha will not
fail to compile and will not crash on PPC64/PPC64LE, allthough the
selftest itself will fail. This patch also adds a dummy
__kernel_getrandom() function that returns ENOSYS on PPC64 so that
vdso_test_getrandom returns KSFT_SKIP instead of KSFT_FAIL.

Why not just wire up the selftests in the next patch like you did for
v3? This seems like extra stuff for no huge reason?

In v3 selftests were already wired up in v3, and there was the following build failure:

$ make  ARCH=powerpc CROSS_COMPILE=powerpc64le-linux-
  CC       vdso_test_gettimeofday
  CC       vdso_test_getcpu
  CC       vdso_test_abi
  CC       vdso_test_clock_getres
  CC       vdso_test_correctness
  CC       vdso_test_getrandom
  CC       vdso_test_chacha
/home/chleroy/linux-powerpc/tools/testing/selftests/../../../tools/arch/powerpc/vdso/vgetrandom-chacha.S: Assembler messages: /home/chleroy/linux-powerpc/tools/testing/selftests/../../../tools/arch/powerpc/vdso/vgetrandom-chacha.S:84: Error: `stmw' invalid when little-endian /home/chleroy/linux-powerpc/tools/testing/selftests/../../../tools/arch/powerpc/vdso/vgetrandom-chacha.S:198: Error: `lmw' invalid when little-endian make: *** [../lib.mk:222: /home/chleroy/linux-powerpc/tools/testing/selftests/vDSO/vdso_test_chacha] Error 1

So I did this change to get a clean PPC32 implementation before going into PPC64. I thought it was easier to go in two steps for reviews, bisectability, etc .... for just a very little extra stuff.


  arch/powerpc/Kconfig                         |   1 +
  arch/powerpc/include/asm/vdso/getrandom.h    |  54 +++++
  arch/powerpc/include/asm/vdso/vsyscall.h     |   6 +
  arch/powerpc/include/asm/vdso_datapage.h     |   2 +
  arch/powerpc/kernel/asm-offsets.c            |   1 +
  arch/powerpc/kernel/vdso/Makefile            |  13 +-
  arch/powerpc/kernel/vdso/getrandom.S         |  58 ++++++
  arch/powerpc/kernel/vdso/vdso32.lds.S        |   1 +
  arch/powerpc/kernel/vdso/vdso64.lds.S        |   1 +
  arch/powerpc/kernel/vdso/vgetrandom-chacha.S | 207 +++++++++++++++++++
  arch/powerpc/kernel/vdso/vgetrandom.c        |  16 ++
  tools/testing/selftests/vDSO/Makefile        |   2 +-
  12 files changed, 359 insertions(+), 3 deletions(-)
  create mode 100644 arch/powerpc/include/asm/vdso/getrandom.h
  create mode 100644 arch/powerpc/kernel/vdso/getrandom.S
  create mode 100644 arch/powerpc/kernel/vdso/vgetrandom-chacha.S
  create mode 100644 arch/powerpc/kernel/vdso/vgetrandom.c

I think you might have forgotten to add the symlink in this commit (or
the next one, per my comment above, if you agree with it).

???? That's odd. All CI tests on github went ok !!! Looks like the CI tests for selftests are broken. Argh ! And of course on my computer the link was there so I didn't notice.


+/*
+ * Very basic 32 bits implementation of ChaCha20. Produces a given positive 
number
+ * of blocks of output with a nonce of 0, taking an input key and 8-byte
+ * counter. Importantly does not spill to the stack. Its arguments are:
+ *
+ *     r3: output bytes
+ *     r4: 32-byte key input
+ *     r5: 8-byte counter input/output (saved on stack)
+ *     r6: number of 64-byte blocks to write to output
+ *
+ *     r0: counter of blocks (initialised with r6)
+ *     r4: Value '4' after key has been read.
+ *     r5-r12: key
+ *     r14-r15: counter
+ *     r16-r31: state
+ */
+SYM_FUNC_START(__arch_chacha20_blocks_nostack)
+#ifdef __powerpc64__
+       blr
+#else
+       stwu    r1, -96(r1)
+       stw     r5, 20(r1)
+       stmw    r14, 24(r1)
+
+       lwz     r14, 0(r5)
+       lwz     r15, 4(r5)
+       mr      r0, r6
+       subi    r3, r3, 4
+
+       lwz     r5, 0(r4)
+       lwz     r6, 4(r4)
+       lwz     r7, 8(r4)
+       lwz     r8, 12(r4)
+       lwz     r9, 16(r4)
+       lwz     r10, 20(r4)
+       lwz     r11, 24(r4)
+       lwz     r12, 28(r4)

If you don't want to do this, don't worry about it, but while I'm
commenting on things, I think it's worth noting that x86, loongarch, and
arm64 implementations all use the preprocessor or macros to give names
to these registers -- state1,2,3,...copy1,2,3 and so forth. Might be
worth doing the same if you think there's an easy and obvious way of
doing it. If not -- or if that kind of work abhors you -- don't worry
about it, as I'm confident enough that this code works fine. But it
might be "nice to have". Up to you.

I'll have a look.


+
+       li      r4, 4
+.Lblock:
+       li      r31, 10
+

Maybe a comment here, "expand 32-byte k" or similar.

ok


+       lis     r16, 0x6170
+       lis     r17, 0x3320
+       lis     r18, 0x7962
+       lis     r19, 0x6b20
+       addi    r16, r16, 0x7865
+       addi    r17, r17, 0x646e
+       addi    r18, r18, 0x2d32
+       addi    r19, r19, 0x6574
+
+       mtctr   r31
+
+
+       subic.  r0, r0, 1       /* subi. can't use r0 as source */

Never seen the period suffix. Just looked this up. Neat.

Not sure what your comment is. Are you talking about the dot suffix after subic ?

That dot means I want CR register to be updated by the instruction. It is equivalent to doing a comparision of the result with 0. It is used by the bne (branch if not equal) a few lines later.

Reply via email to