----- Original Message -----
> From: "Michael Ellerman" <m...@ellerman.id.au>
> To: "Timothy Pearson" <tpear...@raptorengineering.com>, "linuxppc-dev"
> <linuxppc-dev@lists.ozlabs.org>
> Cc: "Jens Axboe" <ax...@kernel.dk>
> Sent: Tuesday, November 14, 2023 6:14:37 AM
> Subject: Re: [PATCH] powerpc: Fix data corruption on IPI
> Hi Timothy,
>
> Thanks for debugging this, but I'm unclear why this is helping because
> we should already have a full barrier (hwsync) on both the sending and
> receiving side.
>
> More below.
So it looks like we might be dealing with a couple of potentially separate
issues here, not sure. This is probably the most infuriating bug I've run
across in my career, so please bear with me -- with the amount of active test
kernels I have installed at any one time I'm occassionally infecting one with
the tests from another. ;)
First, I'm not 100% convinced the code in try_to_wake_up() is race-free on
ppc64, since adding a memory barrier between it and the call to kick_process()
significantly reduces the frequency of the on-disk corruption. Can you take a
look and see if anything stands out as to why that would be important?
The second part of this though is that the barrier only reduces the frequency
of the corruption, it does not eliminate the corruption. After some
consolidation, what completely eliminates the corruption is a combination of:
* Switching to TWA_SIGNAL_NO_IPI in task_work_add() *
* Adding udelay(1000) in io_uring/rw.c:io_write(), right before the call to
io_rw_init_file()
Adding a memory barrier instead of the udelay() doesn't work, nor does adding
the delay without switching to NO_IPI.
[1] Keeping TWA_SIGNAL and adding the barrier instruction also works, but this
is conceptually simpler to understand as to why it would have an effect at all