On Sat, Feb 04, 2023 at 12:34:52PM +0000, Richard W.M. Jones wrote:
> Anyway, this all seems to work, but it actually reduces performance :-(
>
> In particular this simple test slows down quite substantially:
>
>   time ./nbdkit -r -U - curl file:/var/tmp/fedora-36.img --run 'nbdcopy --no-extents -p "$uri" null:'
>
> (where /var/tmp/fedora-36.img is a 10G file).

A bit more on this ...

The slowdown is most easily observable if you apply this patch
series, test it (see command above), and then change just:

  plugin/curl/curl.c:

  -#define THREAD_MODEL NBDKIT_THREAD_MODEL_PARALLEL
  +#define THREAD_MODEL NBDKIT_THREAD_MODEL_SERIALIZE_REQUESTS

Serialising requests dramatically, repeatably improves the
performance!

Here are flame graphs for the two cases:

  http://oirase.annexia.org/tmp/nbdkit-parallel.svg
  http://oirase.annexia.org/tmp/nbdkit-serialize-requests.svg

These are across all cores on a 12 core / 24 thread machine.  nbdkit
is somehow able to consume more total machine time in the serialize
requests case (67.75%) than in the parallel case (37.75%).  nbdcopy
is taking about the same amount of time in both cases.  In the
parallel case, the time spent in do_idle in the kernel dramatically
increases.

My working theory is this is something to do with starvation of the
NBD multi-conn connections:

We now have multi-conn enabled, so nbdcopy will make 4 connections to
nbdkit.  nbdcopy also aggressively keeps multiple requests in flight
on each connection (64 at a time).

In the serialize_requests case, each NBD connection will only handle
a single request at a time.  These are shared across the 4 available
libcurl handles.

In the parallel requests case, it is highly likely that the first 4
requests on the 1st NBD connection will grab the 4 available libcurl
handles.  The replies will then be sent back over the single NBD
connection.  Then the next 4 requests from one of the NBD connections
will repeat the same thing.

Basically even though multi-conn is possible, I expect that only one
NBD connection is being fully utilised most of the time (or anyway
full use is not made of all 4 NBD connections at the same time).

To maximize throughput we want to send replies over all NBD
connections simultaneously, and serialize_requests (indirectly and
accidentally) achieves that.
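
As a toy illustration of the theory above, here is a small C sketch.
It is not nbdkit code: the function name busy_connections, its
parameters, and the greedy "scan connections in order" handle
allocation are all assumptions made for the example.  It counts how
many NBD connections hold at least one libcurl handle once the pool
is exhausted.

```c
#include <assert.h>

/* Toy model, NOT taken from nbdkit.  Hypothetical parameters:
 *
 *   nr_conns        number of NBD connections (4 with multi-conn here)
 *   queued          requests waiting on each connection (64 for nbdcopy)
 *   nr_handles      size of the shared libcurl handle pool (4)
 *   per_conn_limit  most requests one connection may run at once:
 *                   1 models SERIALIZE_REQUESTS, a large value
 *                   models PARALLEL
 *
 * Handles are given out greedily, scanning connections in order.
 * That ordering is an assumption; a real server is not this
 * deterministic.
 *
 * Returns the number of distinct connections that hold at least one
 * handle after the first round of allocation.
 */
unsigned
busy_connections (unsigned nr_conns, unsigned queued,
                  unsigned nr_handles, unsigned per_conn_limit)
{
  unsigned busy = 0;
  unsigned free_handles = nr_handles;

  assert (per_conn_limit > 0);

  for (unsigned c = 0; c < nr_conns && free_handles > 0; c++) {
    /* How many handles this connection would like to take. */
    unsigned want = queued < per_conn_limit ? queued : per_conn_limit;
    /* How many it actually gets from what is left in the pool. */
    unsigned got = want < free_handles ? want : free_handles;

    if (got > 0) {
      free_handles -= got;
      busy++;
    }
  }

  return busy;
}
```

With 4 connections, 64 queued requests each and 4 handles,
busy_connections (4, 64, 4, 1) (the serialize_requests case) gives 4,
while busy_connections (4, 64, 4, 64) (the parallel case) gives 1.
This only shows the shape of the suspected starvation; it says
nothing about whether that is what nbdkit really does.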

I'm still adding instrumentation to see if the theory above is right,
plus I have no idea how to fix this.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
libguestfs lets you edit virtual machines.  Supports shell scripting,
bindings from many languages.  http://libguestfs.org