On Tuesday, 25 May 2021 13:41:22 CEST Christian Schoenebeck wrote:
> I started to work on a patch set on this.
>
> I try to get rid of that rename_lock entirely by letting the worker
> threads only access temporary copies, e.g. of the fid path, instead of
> allowing the worker threads to access main thread owned structures
> directly like it is the case ATM.
>
> I also noticed that the rename_lock scheme is quite inconsistent right
> now anyway. E.g. some of the v9fs_co_*() functions accessing main
> thread owned structures don't take the lock at all; some for good
> reasons, some not.
>
> So this week I will give the approach described above a test spin, and
> then we will see how this impacts performance in practice before
> actually posting any patch set here.
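To illustrate what was meant by "temporary copies" above: the per-function change would roughly look like this, using v9fs_co_statfs() as an arbitrary example. This is just a minimal sketch based on the existing v9fs_path_copy() / v9fs_path_free() helpers, not the actual patch:

/* Sketch only: snapshot the fid path while still on the main thread, so
 * the worker thread never dereferences main thread owned structures and
 * hence no rename_lock would be needed anymore.
 */
int coroutine_fn v9fs_co_statfs(V9fsPDU *pdu, V9fsPath *path,
                                struct statfs *stbuf)
{
    int err;
    V9fsState *s = pdu->s;
    V9fsPath local = { .data = NULL, .size = 0 };

    if (v9fs_request_cancelled(pdu)) {
        return -EINTR;
    }
    v9fs_path_copy(&local, path);    /* main thread: take a private copy */
    v9fs_co_run_in_worker(
        {
            /* worker thread: operates on the private copy only */
            err = s->ops->statfs(&s->ctx, &local, stbuf);
            if (err < 0) {
                err = -errno;
            }
        });
    v9fs_path_free(&local);          /* back on the main thread */
    return err;
}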
Ok, I made some test runs. Actually quite disappointing: it made literally no difference in performance whether or not that global lock is there. That surprises me, because I was expecting the lock to be a bottleneck for parallel background worker tasks and that removing it would provide at least a few percent performance gain.

I performed two tests:

1. Measuring the boot time of a guest OS on top of 9pfs, where the guest OS uses a parallel init system. Zero difference here.

2. A simple shell script using 'find' to scan the entire filesystem in one guest process, while a shell script loop in another guest process constantly renames an arbitrary directory in parallel, trying to hit that global 9p lock. In this test the parallel renaming slowed down the find process by about 20%, which is definitely too much, but removing the lock did not improve this either.

I had some doubts about my results and wondered whether the actual number of qemu worker threads used by 9p was simply too small for some reason, so I added a little debug hack to verify how many workers were actually running for 9p:

diff --git a/hw/9pfs/coth.c b/hw/9pfs/coth.c
index 2802d41cce..a5340db28b 100644
--- a/hw/9pfs/coth.c
+++ b/hw/9pfs/coth.c
@@ -22,6 +22,28 @@
 #include "qemu/coroutine.h"
 #include "qemu/main-loop.h"
 #include "coth.h"
+#include <stdatomic.h>
+
+atomic_int p9_workers;
+atomic_int p9_workers_max;
+
+void p9_worker_starts(void) {
+    int now = atomic_fetch_add_explicit(&p9_workers, 1, memory_order_seq_cst) + 1;
+    int prev = p9_workers_max;
+    bool isNewMax = now > prev;
+    while (now > prev &&
+           !atomic_compare_exchange_weak(&p9_workers_max, &prev, now))
+    {
+    }
+    if (isNewMax) {
+        printf("9P: NEW MAX WORKER THREADS : %d <-------- !!!\n", now);
+        fflush(stdout);
+    }
+}
+
+void p9_worker_ends(void) {
+    atomic_fetch_add_explicit(&p9_workers, -1, memory_order_seq_cst);
+}
 
 /* Called from QEMU I/O thread. */
 static void coroutine_enter_cb(void *opaque, int ret)
diff --git a/hw/9pfs/coth.h b/hw/9pfs/coth.h
index c51289903d..e82f92af5e 100644
--- a/hw/9pfs/coth.h
+++ b/hw/9pfs/coth.h
@@ -51,12 +51,16 @@
          */                                                             \
         qemu_coroutine_yield();                                         \
         qemu_bh_delete(co_bh);                                          \
+        p9_worker_starts();                                             \
         code_block;                                                     \
+        p9_worker_ends();                                               \
         /* re-enter back to qemu thread */                              \
         qemu_coroutine_yield();                                         \
     } while (0)
 
 void co_run_in_worker_bh(void *);
+void p9_worker_starts(void);
+void p9_worker_ends(void);
 int coroutine_fn v9fs_co_readlink(V9fsPDU *, V9fsPath *, V9fsString *);
 int coroutine_fn v9fs_co_readdir(V9fsPDU *, V9fsFidState *, struct dirent **);
 int coroutine_fn v9fs_co_readdir_many(V9fsPDU *, V9fsFidState *,

But I only got about 8 - 12 worker threads at maximum doing their job for 9pfs simultaneously. Granted, if you look at this debug code, I am actually counting the number of worker *coroutines*, not really the number of worker *threads*; but as all v9fs_co_*() functions dispatch back to the main thread exactly once (i.e. once the worker task has completed or errored out), that count should in fact always equal the number of parallel 9p worker threads.

I actually wonder where it is determined how many worker threads are allocated for the qemu thread pool, i.e. whether that's a hard-coded amount somewhere or perhaps configurable from the CLI.

So for now I guess I'll postpone this global lock issue, unless somebody has an idea of what I might still be missing here.
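Partially answering my own question while writing this: if I remember the code correctly, the pool size is neither 9p specific nor configurable from the CLI. Threads are spawned on demand and capped per AioContext by a hard-coded constant in util/thread-pool.c (excerpt quoted from memory, so please verify against an actual tree):

/* util/thread-pool.c (from memory, not verified): the pool grows on
 * demand up to a hard-coded ceiling; there is no CLI knob for it.
 */
static void thread_pool_init_one(ThreadPool *pool, AioContext *ctx)
{
    ...
    pool->max_threads = 64;
    ...
}

That would at least be consistent with the 8 - 12 simultaneous workers I saw: a new thread is only spawned when no idle one is available, so the observed count reflects the actual parallelism of the workload rather than the ceiling.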
I still think the global lock should be removed in the long term, but considering that I measured zero performance gain and that my current patch set's delta stats aren't exactly small:

 fsdev/9p-marshal.c |  7 +++++++
 fsdev/9p-marshal.h |  1 +
 hw/9pfs/9p.c       | 32 +++++++-------------------------
 hw/9pfs/9p.h       | 27 +--------------------------
 hw/9pfs/codir.c    | 26 ++++++++++++++++----------
 hw/9pfs/cofile.c   | 53 +++++++++++++++++++++++++++++------------------------
 hw/9pfs/cofs.c     | 98 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++---------------------------------------
 hw/9pfs/coth.c     | 22 ++++++++++++++++++++++
 hw/9pfs/coth.h     |  4 ++++
 hw/9pfs/coxattr.c  | 28 ++++++++++++++++------------
 10 files changed, 162 insertions(+), 136 deletions(-)

Hence I'll probably concentrate on other things for now. If somebody wants to have a glimpse at my patch set, I can post it at any time, of course.

Best regards,
Christian Schoenebeck