Christian Schoenebeck <qemu_...@crudebyte.com> writes:
> On Friday, 9 September 2022 15:10:48 CEST Christian Schoenebeck wrote:
>> On Thursday, 8 September 2022 13:23:53 CEST Linus Heckemann wrote:
>> > The previous implementation would iterate over the fid table for
>> > lookup operations, resulting in an operation with O(n) complexity on
>> > the number of open files and poor cache locality -- for every open,
>> > stat, read, write, etc operation.
>> >
>> > This change uses a hashtable for this instead, significantly improving
>> > the performance of the 9p filesystem. The runtime of NixOS's simple
>> > installer test, which copies ~122k files totalling ~1.8GiB from 9p,
>> > decreased by a factor of about 10.
>> >
>> > Signed-off-by: Linus Heckemann <g...@sphalerite.org>
>> > Reviewed-by: Philippe Mathieu-Daudé <f4...@amsat.org>
>> > Reviewed-by: Greg Kurz <gr...@kaod.org>
>> > ---
>>
>> Queued on 9p.next:
>> https://github.com/cschoenebeck/qemu/commits/9p.next
>>
>> I retained the BUG_ON() in get_fid(), Greg had a point there that continuing
>> to work on a clunked fid would still be a bug.
>>
>> I also added the suggested TODO comment for g_hash_table_steal_extended(),
>> the actual change would be outside the scope of this patch.
>>
>> And finally I gave this patch a whirl, and what can I say: that's just sick!
>> Compiling sources with 9p is boosted by around factor 6..7 here! And
>> running 9p as root fs also no longer feels sluggish as before. I mean I
>> knew that this fid list traversal performance issue existed and had it on
>> my TODO list, but the actual impact exceeded my expectation by far.
>
> Linus, there is still something cheesy. After more testing, at a certain point
> running the VM, the terminal is spilled with this message:
>
> GLib: g_hash_table_iter_next: assertion 'ri->version == ri->hash_table->version' failed
>
> Looking at the glib sources, I think this warning means the iterator got
> invalidated.
> Setting a breakpoint at glib function g_return_if_fail_warning I
> got:
>
> Thread 1 "qemu-system-x86" hit Breakpoint 1, 0x00007ffff7aa9d80 in
> g_return_if_fail_warning () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
> (gdb) bt
> #0 0x00007ffff7aa9d80 in g_return_if_fail_warning () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
> #1 0x00007ffff7a8ea18 in g_hash_table_iter_next () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
> #2 0x0000555555998a7a in v9fs_mark_fids_unreclaim (pdu=0x555557a34c90, path=0x7ffba8ceff30) at ../hw/9pfs/9p.c:528
> #3 0x000055555599f7a0 in v9fs_unlinkat (opaque=0x555557a34c90) at ../hw/9pfs/9p.c:3170
> #4 0x000055555606dc4b in coroutine_trampoline (i0=1463900480, i1=21845) at ../util/coroutine-ucontext.c:177
> #5 0x00007ffff7749d40 in __start_context () at /lib/x86_64-linux-gnu/libc.so.6
> #6 0x00007fffffffd5f0 in ()
> #7 0x0000000000000000 in ()
> (gdb)
>
> The while loop in v9fs_mark_fids_unreclaim() holds the hash table iterator
> while the hash table is modified during the loop.
>
> Would you please fix this? If you do, please use my already queued patch
> version as basis.
>
> Best regards,
> Christian Schoenebeck

Hi Christian,

Thanks for finding this!

I think I understand the problem, but I can't reproduce it at all (I've
been trying by hammering the filesystem with thousands of opens/closes
across several processes). Do you have a reliable way?

Cheers
Linus
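
For reference, the failure mode quoted above (an iterator held while the
table changes underneath it) can be shown with a toy model. This is only a
sketch: ToyTable, ToyIter and every toy_* function below are made up for
illustration and are neither GLib's implementation nor the 9p code. The one
thing taken from the thread is the idea, suggested by the assertion text,
that an iterator remembers a table version and refuses to continue once that
version has changed. The second function shows one common way around it,
collecting the keys first and modifying the table afterwards, which is not
necessarily the fix that fits v9fs_mark_fids_unreclaim().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_CAP 16

/* Hypothetical stand-in for a hash table: flat slots plus a version
 * counter that is bumped on every insert and remove. */
typedef struct {
    int keys[TOY_CAP];
    bool used[TOY_CAP];
    unsigned version;
} ToyTable;

/* Hypothetical iterator: remembers the table version it was created at. */
typedef struct {
    ToyTable *table;
    size_t pos;
    unsigned version;
} ToyIter;

typedef enum { TOY_ITER_ITEM, TOY_ITER_END, TOY_ITER_INVALID } ToyIterResult;

void toy_init(ToyTable *t)
{
    *t = (ToyTable){0};
}

bool toy_insert(ToyTable *t, int key)
{
    for (size_t i = 0; i < TOY_CAP; i++) {
        if (!t->used[i]) {
            t->keys[i] = key;
            t->used[i] = true;
            t->version++;
            return true;
        }
    }
    return false; /* table full */
}

bool toy_remove(ToyTable *t, int key)
{
    for (size_t i = 0; i < TOY_CAP; i++) {
        if (t->used[i] && t->keys[i] == key) {
            t->used[i] = false;
            t->version++;
            return true;
        }
    }
    return false; /* key not present */
}

size_t toy_count(const ToyTable *t)
{
    size_t n = 0;
    for (size_t i = 0; i < TOY_CAP; i++) {
        n += t->used[i] ? 1 : 0;
    }
    return n;
}

void toy_iter_init(ToyIter *it, ToyTable *t)
{
    it->table = t;
    it->pos = 0;
    it->version = t->version;
}

ToyIterResult toy_iter_next(ToyIter *it, int *key)
{
    /* Models the check named in the warning: stale iterator, give up. */
    if (it->version != it->table->version) {
        return TOY_ITER_INVALID;
    }
    while (it->pos < TOY_CAP) {
        size_t i = it->pos++;
        if (it->table->used[i]) {
            *key = it->table->keys[i];
            return TOY_ITER_ITEM;
        }
    }
    return TOY_ITER_END;
}

/* Removes even keys while iterating. Returns true only if the iteration
 * ran to the end without the iterator being invalidated. */
bool remove_even_unsafe(ToyTable *t)
{
    ToyIter it;
    ToyIterResult r;
    int key;

    toy_iter_init(&it, t);
    while ((r = toy_iter_next(&it, &key)) == TOY_ITER_ITEM) {
        if (key % 2 == 0) {
            toy_remove(t, key); /* bumps the version mid-iteration */
        }
    }
    return r == TOY_ITER_END;
}

/* Collects even keys first, then removes them once the iterator is no
 * longer in use. Returns the number of keys removed. */
size_t remove_even_safe(ToyTable *t)
{
    int pending[TOY_CAP];
    size_t n = 0;
    ToyIter it;
    int key;

    toy_iter_init(&it, t);
    while (toy_iter_next(&it, &key) == TOY_ITER_ITEM) {
        if (key % 2 == 0) {
            pending[n++] = key;
        }
    }
    for (size_t i = 0; i < n; i++) {
        bool removed = toy_remove(t, pending[i]);
        assert(removed);
        (void)removed;
    }
    return n;
}
```

With keys 1..6 inserted, remove_even_unsafe() stops early because the first
removal makes the iterator stale, while remove_even_safe() removes all three
even keys. The two-pass version trades a temporary array for never touching
the table while an iterator is live.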