Hi Nir,
On 18/4/25 16:24, Nir Soffer wrote:
Testing with qemu-nbd shows that computing a hash of an image via
qemu-nbd is 5-7 times faster with this change.
Tested with 2 qemu-nbd processes:
$ ./qemu-nbd-after -r -t -e 0 -f raw -k /tmp/after.sock \
    /var/tmp/bench/data-10g.img &
$ ./qemu-nbd-before -r -t -e 0 -f raw -k /tmp/before.sock \
    /var/tmp/bench/data-10g.img &
With nbdcopy, using 4 NBD connections:
$ hyperfine -w 3 \
    "./nbdcopy --blkhash 'nbd+unix:///?socket=/tmp/before.sock' null:" \
    "./nbdcopy --blkhash 'nbd+unix:///?socket=/tmp/after.sock' null:"
Benchmark 1: ./nbdcopy --blkhash 'nbd+unix:///?socket=/tmp/before.sock' null:
  Time (mean ± σ): 8.670 s ± 0.025 s [User: 5.670 s, System: 7.113 s]
  Range (min … max): 8.620 s … 8.703 s 10 runs
Benchmark 2: ./nbdcopy --blkhash 'nbd+unix:///?socket=/tmp/after.sock' null:
  Time (mean ± σ): 1.839 s ± 0.008 s [User: 4.651 s, System: 1.882 s]
  Range (min … max): 1.830 s … 1.853 s 10 runs
Summary
  ./nbdcopy --blkhash 'nbd+unix:///?socket=/tmp/after.sock' null: ran
  4.72 ± 0.02 times faster than ./nbdcopy --blkhash 'nbd+unix:///?socket=/tmp/before.sock' null:
With blksum, using one NBD connection:
$ hyperfine -w 3 "blksum 'nbd+unix:///?socket=/tmp/before.sock'" \
"blksum 'nbd+unix:///?socket=/tmp/after.sock'"
Benchmark 1: blksum 'nbd+unix:///?socket=/tmp/before.sock'
  Time (mean ± σ): 13.606 s ± 0.081 s [User: 5.799 s, System: 6.231 s]
  Range (min … max): 13.516 s … 13.785 s 10 runs
Benchmark 2: blksum 'nbd+unix:///?socket=/tmp/after.sock'
  Time (mean ± σ): 1.946 s ± 0.017 s [User: 4.541 s, System: 1.481 s]
  Range (min … max): 1.912 s … 1.979 s 10 runs
Summary
  blksum 'nbd+unix:///?socket=/tmp/after.sock' ran
  6.99 ± 0.07 times faster than blksum 'nbd+unix:///?socket=/tmp/before.sock'
This will improve other usage of unix domain sockets on macOS, but I tested
only qemu-nbd.
Signed-off-by: Nir Soffer <nir...@gmail.com>
---
io/channel-socket.c | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/io/channel-socket.c b/io/channel-socket.c
index 608bcf066e..b858659764 100644
--- a/io/channel-socket.c
+++ b/io/channel-socket.c
@@ -410,6 +410,19 @@ qio_channel_socket_accept(QIOChannelSocket *ioc,
}
#endif /* WIN32 */
+#if __APPLE__
+ /* On macOS we need to tune unix domain socket buffer for best performance.
+ * Apple recommends sizing the receive buffer at 4 times the size of the
+ * send buffer.
+ */
+ if (cioc->localAddr.ss_family == AF_UNIX) {
+ const int sndbuf_size = 1024 * 1024;
Please add a definition instead of a magic value, e.g.:
#define SOCKET_SEND_BUFSIZE (1 * MiB)
BTW in test_io_channel_set_socket_bufs() we use 64 KiB, why 1 MiB?
+ const int rcvbuf_size = 4 * sndbuf_size;
+ setsockopt(cioc->fd, SOL_SOCKET, SO_SNDBUF, &sndbuf_size, sizeof(sndbuf_size));
+ setsockopt(cioc->fd, SOL_SOCKET, SO_RCVBUF, &rcvbuf_size, sizeof(rcvbuf_size));
+ }
+#endif /* __APPLE__ */
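
For illustration, the named-constant suggestion could look roughly like the standalone sketch below. The names SOCKET_SEND_BUFSIZE, SOCKET_RECV_BUFSIZE and set_unix_socket_bufs() are hypothetical, MiB is redefined locally so the sketch compiles on its own (in QEMU it comes from "qemu/units.h"), and checking the setsockopt() return values is an addition here, not something the patch does:

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Local stand-in for QEMU's MiB macro, so this compiles standalone. */
#define MiB (1024 * 1024)

/* Hypothetical names replacing the magic values in the patch. */
#define SOCKET_SEND_BUFSIZE (1 * MiB)
/* The patch comment sizes the receive buffer at 4x the send buffer. */
#define SOCKET_RECV_BUFSIZE (4 * SOCKET_SEND_BUFSIZE)

/*
 * Hypothetical helper: request larger send/receive buffers on a
 * unix domain socket. Returns 0 on success, -1 if either
 * setsockopt() call fails (errno is left set by setsockopt()).
 */
static int set_unix_socket_bufs(int fd)
{
    const int sndbuf_size = SOCKET_SEND_BUFSIZE;
    const int rcvbuf_size = SOCKET_RECV_BUFSIZE;

    assert(fd >= 0);

    if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF,
                   &sndbuf_size, sizeof(sndbuf_size)) < 0) {
        return -1;
    }
    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF,
                   &rcvbuf_size, sizeof(rcvbuf_size)) < 0) {
        return -1;
    }
    return 0;
}
```

In the patch this would still sit under the __APPLE__ guard and the AF_UNIX check. Whether a failure should be reported or silently ignored is a separate question for the actual change.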
Thanks,
Phil.