On 3/28/23 6:51 AM, Michael Ellerman wrote:
> Jens Axboe <ax...@kernel.dk> writes:
>>>> Can the queueing cause the creation of an IO thread (if one does not
>>>> exist, or all are blocked?)
>>>
>>> Yep
>>>
>>> Since writing this email, I've gone through a lot of different tests.
>>> Here's a rough listing of what I found:
>>>
>>> - Like using the hack patch, if I just limit the number of IO thread
>>>   workers to 1, it seems to pass. At least longer than before; it does
>>>   1000 iterations.
>>>
>>> - If I pin each IO worker to a single CPU, it also passes.
>>>
>>> - If I liberally sprinkle smp_mb() on the io-wq side, the test still
>>>   fails. I've added one before queueing the work item and one after,
>>>   plus one before the io-wq worker grabs a work item and one after.
>>>   E.g., a full hammer approach. This still fails.
>>>
>>> Puzzling... For the "pin each IO worker to a single CPU" case, I added
>>> some basic code to try to ensure that a work item queued on CPU X
>>> would be processed by a worker on CPU X, and to a large degree, this
>>> does happen. But since the work list is a normal list, it's quite
>>> possible that some other worker finishes its work on CPU Y just in
>>> time to grab the one from CPU X. I checked, and this does happen in
>>> the test case, yet it still passes. This may be because I got a bit
>>> lucky, but that seems suspect given thousands of passes of the test
>>> case.
>>>
>>> Another theory is that it's perhaps related to an io-wq worker being
>>> rescheduled on a different CPU. Though again, I'm puzzled as to why
>>> the smp_mb() sprinkling didn't fix that. I'm going to try and run the
>>> test case with JUST the io-wq worker pinning, not caring about where
>>> the work is processed, to see if that does anything.
>>
>> Just pinning each worker to whatever CPU it got created on seemingly
>> fixes the issue too. This does not mean that each worker will process
>> work on the CPU on which it was queued, just that each worker will
>> remain on whatever CPU it was originally created on.
>>
>> Puzzling...
>>
>> Note that it is indeed quite possible that this isn't a ppc issue at
>> all, it just shows up on ppc. It could be page cache related, or it
>> could even be a bug in mariadb itself.
>
> I tried binary patching every lwsync to hwsync (read/write to full
> barrier) in mariadbd and all the libraries it links. It didn't fix the
> problem.
>
> I also tried switching all the kernel barriers/spinlocks to using a
> hwsync, but that also didn't fix it.
>
> It's still possible there's somewhere that currently has no barrier at
> all but needs one; the above would only fix the problem if we have a
> read/write barrier that actually needs to be a full barrier.
>
> I also looked at making all TLB invalidates broadcast, regardless of
> whether we think the thread has only been on a single CPU. That didn't
> help, but I'm not sure I got all the places where we do TLB
> invalidates, so I'll look at that some more tomorrow.
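For reference, the worker-pinning experiment described in the quoted mail
could look roughly like the sketch below. This is a hypothetical
illustration, not the actual test patch; the function name and hook point
are assumptions, and it would presumably be called from the io-wq worker
thread before it starts processing work:

#include <linux/cpumask.h>
#include <linux/sched.h>
#include <linux/smp.h>

/*
 * Hypothetical sketch of the "pin each io-wq worker to the CPU it was
 * created on" experiment -- not the actual test patch. Assumed to run
 * early in the worker thread, before it picks up any work items.
 */
static void pin_worker_to_creation_cpu(void)
{
	/* CPU this worker thread happens to start on */
	int cpu = raw_smp_processor_id();

	/*
	 * Restrict the worker to that single CPU so the scheduler can
	 * never migrate it. Work items may still be queued from, and
	 * processed on behalf of, other CPUs.
	 */
	set_cpus_allowed_ptr(current, cpumask_of(cpu));
}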
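Similarly, the "full hammer" barrier test amounts to bracketing both sides
of the work handoff with full barriers. The sketch below uses simplified
placeholder types and function names, not the real io-wq internals:

#include <linux/list.h>
#include <linux/smp.h>
#include <linux/spinlock.h>

/* Simplified placeholders for the io-wq work list; not the real layout. */
struct demo_work {
	struct list_head list;
};

struct demo_wq {
	spinlock_t lock;
	struct list_head work_list;
};

/* Producer side: full barrier before and after queueing a work item. */
static void demo_enqueue(struct demo_wq *wq, struct demo_work *work)
{
	smp_mb();
	spin_lock(&wq->lock);
	list_add_tail(&work->list, &wq->work_list);
	spin_unlock(&wq->lock);
	smp_mb();
}

/* Consumer side: full barrier before and after a worker grabs an item. */
static struct demo_work *demo_dequeue(struct demo_wq *wq)
{
	struct demo_work *work = NULL;

	smp_mb();
	spin_lock(&wq->lock);
	if (!list_empty(&wq->work_list)) {
		work = list_first_entry(&wq->work_list, struct demo_work,
					list);
		list_del(&work->list);
	}
	spin_unlock(&wq->lock);
	smp_mb();
	return work;
}

That this arrangement still failed in the quoted tests is what makes the
result puzzling: it suggests the failure isn't a simple ordering problem
on the queue/dequeue path itself.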
Thanks, appreciate your testing! I have no new data points since
yesterday, but the key point from then still seems to be that if an io
worker never reschedules onto a different CPU, then the problem doesn't
occur. This could very well be a page cache issue, if it isn't an issue
on the powerpc side...

--
Jens Axboe