On Donnerstag, 22. September 2022 13:43:56 CEST Linus Heckemann wrote:
> Christian Schoenebeck <qemu_...@crudebyte.com> writes:
> > On Freitag, 9. September 2022 15:10:48 CEST Christian Schoenebeck wrote:
> >> On Donnerstag, 8. September 2022 13:23:53 CEST Linus Heckemann wrote:
> >> > The previous implementation would iterate over the fid table for
> >> > lookup operations, resulting in an operation with O(n) complexity on
> >> > the number of open files and poor cache locality -- for every open,
> >> > stat, read, write, etc operation.
> >> >
> >> > This change uses a hashtable for this instead, significantly
> >> > improving the performance of the 9p filesystem. The runtime of
> >> > NixOS's simple installer test, which copies ~122k files totalling
> >> > ~1.8GiB from 9p, decreased by a factor of about 10.
> >> >
> >> > Signed-off-by: Linus Heckemann <g...@sphalerite.org>
> >> > Reviewed-by: Philippe Mathieu-Daudé <f4...@amsat.org>
> >> > Reviewed-by: Greg Kurz <gr...@kaod.org>
> >> > ---
> >>
> >> Queued on 9p.next:
> >> https://github.com/cschoenebeck/qemu/commits/9p.next
> >>
> >> I retained the BUG_ON() in get_fid(), Greg had a point there that
> >> continuing to work on a clunked fid would still be a bug.
> >>
> >> I also added the suggested TODO comment for
> >> g_hash_table_steal_extended(), the actual change would be outside the
> >> scope of this patch.
> >>
> >> And finally I gave this patch a whirl, and what can I say: that's just
> >> sick! Compiling sources with 9p is boosted by around factor 6..7 here!
> >> And running 9p as root fs also no longer feels sluggish as before. I
> >> mean I knew that this fid list traversal performance issue existed and
> >> had it on my TODO list, but the actual impact exceeded my expectation
> >> by far.
> >
> > Linus, there is still something cheesy. After more testing, at a
> > certain point running the VM, the terminal is spilled with this
> > message:
> >
> > GLib: g_hash_table_iter_next: assertion 'ri->version ==
> > ri->hash_table->version' failed
> >
> > Looking at the glib sources, I think this warning means the iterator
> > got invalidated. Setting a breakpoint at glib function
> > g_return_if_fail_warning I got:
> >
> > Thread 1 "qemu-system-x86" hit Breakpoint 1, 0x00007ffff7aa9d80 in
> > g_return_if_fail_warning () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
> > (gdb) bt
> > #0  0x00007ffff7aa9d80 in g_return_if_fail_warning ()
> >     at /lib/x86_64-linux-gnu/libglib-2.0.so.0
> > #1  0x00007ffff7a8ea18 in g_hash_table_iter_next ()
> >     at /lib/x86_64-linux-gnu/libglib-2.0.so.0
> > #2  0x0000555555998a7a in v9fs_mark_fids_unreclaim
> >     (pdu=0x555557a34c90, path=0x7ffba8ceff30) at ../hw/9pfs/9p.c:528
> > #3  0x000055555599f7a0 in v9fs_unlinkat (opaque=0x555557a34c90)
> >     at ../hw/9pfs/9p.c:3170
> > #4  0x000055555606dc4b in coroutine_trampoline (i0=1463900480,
> >     i1=21845) at ../util/coroutine-ucontext.c:177
> > #5  0x00007ffff7749d40 in __start_context ()
> >     at /lib/x86_64-linux-gnu/libc.so.6
> > #6  0x00007fffffffd5f0 in  ()
> > #7  0x0000000000000000 in  ()
> > (gdb)
> >
> > The while loop in v9fs_mark_fids_unreclaim() holds the hash table
> > iterator while the hash table is modified during the loop.
> >
> > Would you please fix this? If you do, please use my already queued
> > patch version as basis.
> >
> > Best regards,
> > Christian Schoenebeck
>
> Hi Christian,
>
> Thanks for finding this!
>
> I think I understand the problem, but I can't reproduce it at all (I've
> been trying by hammering the filesystem with thousands of opens/closes
> across several processes). Do you have a reliable way?
I also didn't hit this issue in my initial tests. I do hit it after just
about a minute though when running 9p as root fs [1] and then starting
to compile KDE [2] inside the VM.

[1] https://wiki.qemu.org/Documentation/9p_root_fs
[2] https://community.kde.org/Get_Involved/development

Independent of reproduction, let me elaborate on what's going on here,
as this issue is probably not that obvious:

1.) As with the iterators of most containers, a glib hash table
iterator becomes invalid as soon as the hash table is modified.
Fortunately glib detects this case: it maintains a "version" field on
the hash table, which is bumped whenever the hash table is modified,
and likewise a "version" field on the iterator structure, and it
detects an invalid iterator by comparing [3] the two "version" fields,
e.g. when g_hash_table_iter_next() is called:

  g_return_val_if_fail (ri->version == ri->hash_table->version, FALSE);

The g_return_val_if_fail() macro calls g_return_if_fail_warning() to
print the warning message in this case and then immediately returns
from g_hash_table_iter_next(). So this is not nitpicking; the 9p server
actually starts to misbehave severely at this point.

[3] https://github.com/GNOME/glib/blob/main/glib/ghash.c#L1182

2.) The hash table loop inside v9fs_mark_fids_unreclaim() does not
directly modify the "fids" hash table itself. But inside that loop we
get contention: even though basically everything inside 9p.c is
executed on the QEMU main thread only, all 9p filesystem driver calls
are dispatched [4] to a worker thread and, after fs driver task
completion, dispatched back to the main thread. In
v9fs_mark_fids_unreclaim() specifically, the two functions put_fid()
and v9fs_reopen_fid() are exposed to this (i.e. those two may "yield"
[4] the coroutine). So what happens is: the coroutine is dispatched to
the fs worker thread (inside those two functions), in the meantime the
main thread picks up and continues another coroutine (i.e. processes
another active 9p request), and that in turn might of course modify the
'fids' hash table, depending on what kind of 9p request it is. Then the
main thread gets back to the original coroutine inside
v9fs_mark_fids_unreclaim(), eventually continues the hash table loop
there, and bam.

[4] https://wiki.qemu.org/Documentation/9p#Coroutines

---

As for a solution: in contrast to the previous list based
implementation, I think we can't (or shouldn't) recover from an
invalidated hash table iterator, even though we could detect this case
by checking the return value of g_hash_table_iter_next(). I think it
would make sense instead to first collect the respective fids inside
that loop in a local container (e.g. a local list on the stack), and
then put the fids in a separate loop below.

Does this make sense?
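Just to illustrate the idea, here is a rough and completely untested
sketch of such a two-pass approach, based on your queued patch version.
The container choice (a GArray here) and the exact reference counting
are just my assumptions and would need to be double-checked:

static int coroutine_fn v9fs_mark_fids_unreclaim(V9fsPDU *pdu, V9fsPath *path)
{
    int err = 0;
    V9fsState *s = pdu->s;
    V9fsFidState *fidp;
    gpointer fid;
    GHashTableIter iter;
    /* fids matching 'path', collected in pass 1, processed in pass 2 */
    g_autoptr(GArray) to_reopen =
        g_array_new(FALSE, FALSE, sizeof(V9fsFidState *));
    guint i;

    g_hash_table_iter_init(&iter, s->fids);

    /*
     * Pass 1: only collect the matching fids. Nothing in this loop
     * yields, so the iterator stays valid until we are done with it.
     */
    while (g_hash_table_iter_next(&iter, &fid, (gpointer *) &fidp)) {
        if (fidp->path.size == path->size &&
            !memcmp(fidp->path.data, path->data, path->size)) {
            /*
             * Take a reference so a concurrent clunk cannot free the
             * fid before pass 2 gets to it.
             */
            fidp->ref++;
            fidp->flags |= FID_NON_RECLAIMABLE;
            g_array_append_val(to_reopen, fidp);
        }
    }

    /*
     * Pass 2: now yielding is harmless, since the iterator is no
     * longer in use when other coroutines modify the hash table.
     */
    for (i = 0; i < to_reopen->len; i++) {
        fidp = g_array_index(to_reopen, V9fsFidState *, i);
        /* reopen the file/dir if it was already closed (may yield) */
        err = v9fs_reopen_fid(pdu, fidp);
        if (err < 0) {
            break;
        }
    }

    /* drop the references taken in pass 1 (put_fid() may yield, too) */
    for (i = 0; i < to_reopen->len; i++) {
        put_fid(pdu, g_array_index(to_reopen, V9fsFidState *, i));
    }
    return err;
}

Best regards,
Christian Schoenebeck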