Testing with qemu-nbd on macOS shows that computing a hash of an image is
4.7 to 7 times faster with this change.

Tested with 2 qemu-nbd processes:

    $ ./qemu-nbd-after -r -t -e 0 -f raw -k /tmp/after.sock /var/tmp/bench/data-10g.img &
    $ ./qemu-nbd-before -r -t -e 0 -f raw -k /tmp/before.sock /var/tmp/bench/data-10g.img &

With nbdcopy, using 4 NBD connections:

    $ hyperfine -w 3 "./nbdcopy --blkhash 'nbd+unix:///?socket=/tmp/before.sock' null:" \
                     "./nbdcopy --blkhash 'nbd+unix:///?socket=/tmp/after.sock' null:"
    Benchmark 1: ./nbdcopy --blkhash 'nbd+unix:///?socket=/tmp/before.sock' null:
      Time (mean ± σ):      8.670 s ±  0.025 s    [User: 5.670 s, System: 7.113 s]
      Range (min … max):    8.620 s …  8.703 s    10 runs

    Benchmark 2: ./nbdcopy --blkhash 'nbd+unix:///?socket=/tmp/after.sock' null:
      Time (mean ± σ):      1.839 s ±  0.008 s    [User: 4.651 s, System: 1.882 s]
      Range (min … max):    1.830 s …  1.853 s    10 runs

    Summary
      ./nbdcopy --blkhash 'nbd+unix:///?socket=/tmp/after.sock' null: ran
        4.72 ± 0.02 times faster than ./nbdcopy --blkhash 'nbd+unix:///?socket=/tmp/before.sock' null:

With blksum, using one NBD connection:

    $ hyperfine -w 3 "blksum 'nbd+unix:///?socket=/tmp/before.sock'" \
                     "blksum 'nbd+unix:///?socket=/tmp/after.sock'"
    Benchmark 1: blksum 'nbd+unix:///?socket=/tmp/before.sock'
      Time (mean ± σ):     13.606 s ±  0.081 s    [User: 5.799 s, System: 6.231 s]
      Range (min … max):   13.516 s … 13.785 s    10 runs

    Benchmark 2: blksum 'nbd+unix:///?socket=/tmp/after.sock'
      Time (mean ± σ):      1.946 s ±  0.017 s    [User: 4.541 s, System: 1.481 s]
      Range (min … max):    1.912 s …  1.979 s    10 runs

    Summary
      blksum 'nbd+unix:///?socket=/tmp/after.sock' ran
        6.99 ± 0.07 times faster than blksum 'nbd+unix:///?socket=/tmp/before.sock'

This should also improve other uses of unix domain sockets on macOS, but I
tested only qemu-nbd.

Signed-off-by: Nir Soffer <nir...@gmail.com>
---
 io/channel-socket.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/io/channel-socket.c b/io/channel-socket.c
index 608bcf066e..b858659764 100644
--- a/io/channel-socket.c
+++ b/io/channel-socket.c
@@ -410,6 +410,19 @@ qio_channel_socket_accept(QIOChannelSocket *ioc,
     }
 #endif /* WIN32 */
 
+#if __APPLE__
+    /* On macOS we need to tune unix domain socket buffer for best performance.
+     * Apple recommends sizing the receive buffer at 4 times the size of the
+     * send buffer.
+     */
+    if (cioc->localAddr.ss_family == AF_UNIX) {
+        const int sndbuf_size = 1024 * 1024;
+        const int rcvbuf_size = 4 * sndbuf_size;
+        setsockopt(cioc->fd, SOL_SOCKET, SO_SNDBUF, &sndbuf_size, sizeof(sndbuf_size));
+        setsockopt(cioc->fd, SOL_SOCKET, SO_RCVBUF, &rcvbuf_size, sizeof(rcvbuf_size));
+    }
+#endif /* __APPLE__ */
+
     qio_channel_set_feature(QIO_CHANNEL(cioc),
                             QIO_CHANNEL_FEATURE_READ_MSG_PEEK);
 
-- 
2.39.5 (Apple Git-154)
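
For anyone who wants to try the buffer tuning outside of QEMU, here is a minimal standalone sketch of the same idea. It is not part of the patch: tune_socket_buffers() and get_socket_buffer() are hypothetical helper names used only for illustration. Unlike the hunk above, it checks the setsockopt() return values and reads the sizes back with getsockopt(), since the kernel may adjust or clamp the requested values.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Hypothetical helper, not part of QEMU: request a send buffer of
 * sndbuf_size bytes and a receive buffer 4 times larger on fd.
 * Returns 0 on success, -1 if either setsockopt() call fails.
 */
static int tune_socket_buffers(int fd, int sndbuf_size)
{
    const int rcvbuf_size = 4 * sndbuf_size;

    if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF,
                   &sndbuf_size, sizeof(sndbuf_size)) < 0) {
        perror("setsockopt(SO_SNDBUF)");
        return -1;
    }
    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF,
                   &rcvbuf_size, sizeof(rcvbuf_size)) < 0) {
        perror("setsockopt(SO_RCVBUF)");
        return -1;
    }
    return 0;
}

/*
 * Hypothetical helper: return the buffer size the kernel reports for
 * optname (SO_SNDBUF or SO_RCVBUF), or -1 on error. The reported value
 * can differ from the requested one.
 */
static int get_socket_buffer(int fd, int optname)
{
    int size = 0;
    socklen_t len = sizeof(size);

    if (getsockopt(fd, SOL_SOCKET, optname, &size, &len) < 0) {
        perror("getsockopt");
        return -1;
    }
    return size;
}
```

A quick way to exercise it is to create a pair with socketpair(AF_UNIX, SOCK_STREAM, 0, fds), call tune_socket_buffers(fds[0], 1024 * 1024), and print what get_socket_buffer() reports for SO_SNDBUF and SO_RCVBUF before and after.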