> On 22 Apr 2025, at 13:42, Daniel P. Berrangé <berra...@redhat.com> wrote:
>
> On Sun, Apr 20, 2025 at 02:12:18AM +0300, Nir Soffer wrote:
>> On macOS we need to increase unix socket buffers size on the client and
>> server to get good performance. We set the socket buffers on macOS after
>> connecting or accepting a client connection.
>>
>> Testing with qemu-nbd shows that reading an image with qemu-img convert
>> from qemu-nbd is *11.4 times faster* and qemu-img cpu usage is *8.3 times
>> lower*.
>>
>> | qemu-img | qemu-nbd | time   | user  | system |
>> |----------|----------|--------|-------|--------|
>> | before   | before   | 12.957 | 2.643 | 5.777  |
>> | after    | before   | 12.803 | 2.632 | 5.742  |
>> | before   | after    |  1.139 | 0.074 | 0.905  |
>> | after    | after    |  1.179 | 0.077 | 0.931  |
>>
>> For testing buffers size I built qemu-nbd and qemu-img with send buffer
>> size from 64k to 2m. In this test 256k send buffer and 1m receive buffer
>> are optimal.
>>
>> | send buffer | recv buffer | time  | user  | system |
>> |-------------|-------------|-------|-------|--------|
>> | 64k         | 256k        | 2.233 | 0.290 | 1.408  |
>> | 128k        | 512k        | 1.189 | 0.103 | 0.841  |
>> | 256k        | 1024k       | 1.121 | 0.085 | 0.813  |
>> | 512k        | 2048k       | 1.172 | 0.081 | 0.953  |
>> | 1024k       | 4096k       | 1.160 | 0.072 | 0.907  |
>> | 2048k       | 8192k       | 1.309 | 0.056 | 0.960  |
>>
>> Using null-co driver is useful to focus on the read part, but in the
>> real world we do something with the read data. I tested real world usage
>> with nbdcopy and blksum.
>>
>> I tested computing a hash of the image using nbdcopy, using 4 NBD
>> connections and 256k request size. In this test 1m send buffer size and
>> 4m receive buffer size are optimal.
>>
>> | send buffer | recv buffer | time  | user  | system |
>> |-------------|-------------|-------|-------|--------|
>> | 64k         | 256k        | 2.832 | 4.866 | 2.550  |
>> | 128k        | 512k        | 2.429 | 4.762 | 2.037  |
>> | 256k        | 1024k       | 2.158 | 4.724 | 1.813  |
>> | 512k        | 2048k       | 1.777 | 4.632 | 1.790  |
>> | 1024k       | 4096k       | 1.657 | 4.466 | 1.812  |
>> | 2048k       | 8192k       | 1.782 | 4.570 | 1.912  |
>>
>> I tested creating a hash of the image with blksum, using one NBD
>> connection and 256k read size. In this test 2m send buffer and 8m
>> receive buffer are optimal.
>>
>> | send buffer | recv buffer | time  | user  | system |
>> |-------------|-------------|-------|-------|--------|
>> | 64k         | 256k        | 4.233 | 5.242 | 2.632  |
>> | 128k        | 512k        | 3.329 | 4.915 | 2.015  |
>> | 256k        | 1024k       | 2.071 | 4.647 | 1.474  |
>> | 512k        | 2048k       | 1.980 | 4.554 | 1.432  |
>> | 1024k       | 4096k       | 2.058 | 4.553 | 1.497  |
>> | 2048k       | 8192k       | 1.972 | 4.539 | 1.497  |
>>
>> In the real world tests larger buffers are optimal, so I picked send
>> buffer of 1m and receive buffer of 4m.
>
> IIUC all your test scenarios have recv buffer x4 size of send buffer.
>
> Do you have any link / reference for the idea that we should be using
> this x4 size multiplier ? This feels rather peculiar as a rule.
The x4 factor came from this:
https://developer.apple.com/documentation/virtualization/vzfilehandlenetworkdeviceattachment/maximumtransmissionunit?language=objc

> The client side of the associated datagram socket must be properly configured
> with the appropriate values for SO_SNDBUF, and SO_RCVBUF. Set these using the
> setsockopt(_:_:_:_:_:) system call. The system expects the value of SO_RCVBUF
> to be at least double the value of SO_SNDBUF, and for optimal performance, the
> recommended value of SO_RCVBUF is four times the value of SO_SNDBUF.

This advice is wrong, since with a unix datagram socket the send buffer is not
used for buffering; it only determines the maximum datagram size that can be
sent. This is not documented in macOS, but it is documented in the FreeBSD
manual. I tested this for Vmnet-helper, using a 65k send buffer (the largest
packet size when using offloading) and a 4m receive buffer.

This configuration (1m send buffer, 4m receive buffer) is used in many projects
using the virtualization framework (lima, vfkit, softnet), which is why I
started with it. But these projects use it for a unix datagram socket, and the
advice may not be relevant to a unix stream socket.

This is what we have in the macOS manuals about these values:

getsockopt(2)

    SO_SNDBUF and SO_RCVBUF are options to adjust the normal buffer sizes
    allocated for output and input buffers, respectively. The buffer size may
    be increased for high-volume connections, or may be decreased to limit the
    possible backlog of incoming data. The system places an absolute limit on
    these values.

>
> Can you show test result grid matrix for the incrementing these
> send/recv buffers independently ?
>

Sure. I think testing with the same value, and with the default value for the
receive buffer, will show whether this makes a difference for reads.

Note that I tested only reads - in this case the client sends a small NBD read
command (~32 bytes) and receives an NBD structured reply with a 2m payload
(2m + ~32 bytes). Changing the client send and receive buffers shows very
little change, so it is likely that only the send buffer on the server side
matters in this case. We also need to test NBD writes.
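
For reference, a minimal sketch (not the actual patch) of the kind of
setsockopt() calls discussed above, hard-coding the 1m/4m values picked for
the stream socket case; the helper name is made up for illustration:

#include <stdio.h>
#include <sys/socket.h>

/* Illustrative helper - apply the chosen buffer sizes to a unix stream
 * socket fd right after connect() on the client or accept() on the
 * server. */
static void set_socket_buffers(int fd)
{
    int sndbuf = 1 * 1024 * 1024;   /* 1m send buffer */
    int rcvbuf = 4 * 1024 * 1024;   /* 4m receive buffer */

    if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &sndbuf, sizeof(sndbuf)) < 0) {
        perror("setsockopt(SO_SNDBUF)");
    }
    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &rcvbuf, sizeof(rcvbuf)) < 0) {
        perror("setsockopt(SO_RCVBUF)");
    }
}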