Hi Nir,

On 18/4/25 16:24, Nir Soffer wrote:
Testing with qemu-nbd shows that computing a hash of an image served by
qemu-nbd is about 5-7 times faster with this change (4.72x with nbdcopy,
6.99x with blksum).

Tested with 2 qemu-nbd processes:

     $ ./qemu-nbd-after -r -t -e 0 -f raw -k /tmp/after.sock /var/tmp/bench/data-10g.img &
     $ ./qemu-nbd-before -r -t -e 0 -f raw -k /tmp/before.sock /var/tmp/bench/data-10g.img &

With nbdcopy, using 4 NBD connections:

     $ hyperfine -w 3 "./nbdcopy --blkhash 'nbd+unix:///?socket=/tmp/before.sock' null:" \
                      "./nbdcopy --blkhash 'nbd+unix:///?socket=/tmp/after.sock' null:"
     Benchmark 1: ./nbdcopy --blkhash 'nbd+unix:///?socket=/tmp/before.sock' null:
       Time (mean ± σ):      8.670 s ±  0.025 s    [User: 5.670 s, System: 7.113 s]
       Range (min … max):    8.620 s …  8.703 s    10 runs

     Benchmark 2: ./nbdcopy --blkhash 'nbd+unix:///?socket=/tmp/after.sock' null:
       Time (mean ± σ):      1.839 s ±  0.008 s    [User: 4.651 s, System: 1.882 s]
       Range (min … max):    1.830 s …  1.853 s    10 runs

     Summary
       ./nbdcopy --blkhash 'nbd+unix:///?socket=/tmp/after.sock' null: ran
         4.72 ± 0.02 times faster than ./nbdcopy --blkhash 'nbd+unix:///?socket=/tmp/before.sock' null:

With blksum, using one NBD connection:

     $ hyperfine -w 3 "blksum 'nbd+unix:///?socket=/tmp/before.sock'" \
                      "blksum 'nbd+unix:///?socket=/tmp/after.sock'"
     Benchmark 1: blksum 'nbd+unix:///?socket=/tmp/before.sock'
       Time (mean ± σ):     13.606 s ±  0.081 s    [User: 5.799 s, System: 6.231 s]
       Range (min … max):   13.516 s … 13.785 s    10 runs

     Benchmark 2: blksum 'nbd+unix:///?socket=/tmp/after.sock'
       Time (mean ± σ):      1.946 s ±  0.017 s    [User: 4.541 s, System: 1.481 s]
       Range (min … max):    1.912 s …  1.979 s    10 runs

     Summary
       blksum 'nbd+unix:///?socket=/tmp/after.sock' ran
         6.99 ± 0.07 times faster than blksum 'nbd+unix:///?socket=/tmp/before.sock'

This will also improve other uses of unix domain sockets on macOS, but I
tested only qemu-nbd.

Signed-off-by: Nir Soffer <nir...@gmail.com>
---
  io/channel-socket.c | 13 +++++++++++++
  1 file changed, 13 insertions(+)

diff --git a/io/channel-socket.c b/io/channel-socket.c
index 608bcf066e..b858659764 100644
--- a/io/channel-socket.c
+++ b/io/channel-socket.c
@@ -410,6 +410,19 @@ qio_channel_socket_accept(QIOChannelSocket *ioc,
      }
  #endif /* WIN32 */
+#if __APPLE__
+    /* On macOS we need to tune unix domain socket buffer for best performance.
+     * Apple recommends sizing the receive buffer at 4 times the size of the
+     * send buffer.
+     */
+    if (cioc->localAddr.ss_family == AF_UNIX) {
+        const int sndbuf_size = 1024 * 1024;

Please add a definition instead of a magic value, e.g.:

  #define SOCKET_SEND_BUFSIZE  (1 * MiB)

BTW, in test_io_channel_set_socket_bufs() we use 64 KiB; why 1 MiB here?
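To compare the two sizes on a given host, a minimal standalone sketch (not QEMU code; the helper names are hypothetical) that prints the default unix socket buffer sizes and what the kernel actually grants for a request. The read-back value need not equal the request: on Linux, for example, the value is limited by net.core.wmem_max/rmem_max and doubled for bookkeeping overhead.

```c
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: return the buffer size the kernel reports for
 * optname (SO_SNDBUF or SO_RCVBUF) on fd, or -1 if getsockopt() fails. */
static int get_sock_buf(int fd, int optname)
{
    int size = 0;
    socklen_t len = sizeof(size);

    if (getsockopt(fd, SOL_SOCKET, optname, &size, &len) < 0) {
        return -1;
    }
    return size;
}

/* Hypothetical helper: on a fresh AF_UNIX stream socketpair, print the
 * default buffer sizes, then request sndbuf bytes of send buffer and a
 * 4x receive buffer (the ratio used in the patch) and print what the
 * kernel granted. Returns the granted SO_SNDBUF, or -1 on error. */
static int compare_unix_sock_bufs(int sndbuf)
{
    int fds[2];
    int rcvbuf = 4 * sndbuf;
    int granted = -1;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) < 0) {
        return -1;
    }
    printf("default: SO_SNDBUF=%d SO_RCVBUF=%d\n",
           get_sock_buf(fds[0], SO_SNDBUF), get_sock_buf(fds[0], SO_RCVBUF));

    if (setsockopt(fds[0], SOL_SOCKET, SO_SNDBUF, &sndbuf, sizeof(sndbuf)) == 0 &&
        setsockopt(fds[0], SOL_SOCKET, SO_RCVBUF, &rcvbuf, sizeof(rcvbuf)) == 0) {
        granted = get_sock_buf(fds[0], SO_SNDBUF);
        printf("requested %d/%d: SO_SNDBUF=%d SO_RCVBUF=%d\n", sndbuf, rcvbuf,
               granted, get_sock_buf(fds[0], SO_RCVBUF));
    }

    close(fds[0]);
    close(fds[1]);
    return granted;
}
```

Calling compare_unix_sock_bufs(64 * 1024) and then compare_unix_sock_bufs(1024 * 1024) on the target machine shows whether the larger request is actually honored there.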

+        const int rcvbuf_size = 4 * sndbuf_size;
+        setsockopt(cioc->fd, SOL_SOCKET, SO_SNDBUF, &sndbuf_size, sizeof(sndbuf_size));
+        setsockopt(cioc->fd, SOL_SOCKET, SO_RCVBUF, &rcvbuf_size, sizeof(rcvbuf_size));
+    }
+#endif /* __APPLE__ */
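For illustration only, a standalone sketch of the same tuning with named constants instead of magic values. MiB is defined locally as a stand-in for QEMU's qemu/units.h; SOCKET_RECV_BUFSIZE and tune_unix_socket_bufs() are hypothetical names; and unlike the patch, the helper returns the setsockopt() error instead of ignoring it. In QEMU the call would stay under the __APPLE__ guard and the AF_UNIX check.

```c
#include <errno.h>
#include <sys/socket.h>

/* Local stand-in for MiB from QEMU's qemu/units.h. */
#define MiB (1024 * 1024)

/* Send buffer size as suggested above; the receive buffer keeps the 4x
 * ratio from the patch comment (SOCKET_RECV_BUFSIZE is a hypothetical name). */
#define SOCKET_SEND_BUFSIZE  (1 * MiB)
#define SOCKET_RECV_BUFSIZE  (4 * SOCKET_SEND_BUFSIZE)

/* Hypothetical helper: apply both buffer sizes to fd.
 * Returns 0 on success or -errno from the first failing setsockopt(). */
static int tune_unix_socket_bufs(int fd)
{
    const int sndbuf_size = SOCKET_SEND_BUFSIZE;
    const int rcvbuf_size = SOCKET_RECV_BUFSIZE;

    if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF,
                   &sndbuf_size, sizeof(sndbuf_size)) < 0) {
        return -errno;
    }
    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF,
                   &rcvbuf_size, sizeof(rcvbuf_size)) < 0) {
        return -errno;
    }
    return 0;
}
```

Keeping the receive size derived from the send size means only one value needs revisiting if the 1 MiB choice changes.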

Thanks,

Phil.
