On 6/7/19 5:14 PM, Eric Blake wrote:
> Our current implementation of qio_channel_set_cork() is pointless for
> TLS sessions: we block the underlying channel, but still hand things
> piecemeal to gnutls which then produces multiple encryption packets.
> Better is to directly use gnutls corking, which collects multiple
> inputs into a single encryption packet.
>
> Signed-off-by: Eric Blake <ebl...@redhat.com>
>
> ---
>
> RFC because unfortunately, I'm not sure I like the results. My test
> case (using latest nbdkit.git) involves sending 10G of random data
> over NBD using parallel writes (and doing nothing on the server end;
> this is all about timing the data transfer):
>
> $ dd if=/dev/urandom of=rand bs=1M count=10k
> $ time nbdkit -p 10810 --tls=require --tls-verify-peer \
>   --tls-psk=/tmp/keys.psk --filter=stats null 10g statsfile=/dev/stderr \
>   --run '~/qemu/qemu-img convert -f raw -W -n --target-image-opts \
>   --object tls-creds-psk,id=tls0,endpoint=client,dir=/tmp,username=eblake \
>   rand \
>   driver=nbd,server.type=inet,server.host=localhost,server.port=10810,tls-creds=tls0'
>
> Pre-patch, I measured:
> real    0m34.536s
> user    0m29.264s
> sys     0m4.014s
>
> while post-patch, it changed to:
> real    0m36.055s
> user    0m27.685s
> sys     0m10.138s
For grins, I also tried compiling with channel-tls.c doing absolutely
nothing for cork requests (neither TCP_CORKing the underlying socket,
nor using gnutls_record_cork); the results were:

real    0m35.904s
user    0m27.480s
sys     0m10.373s

which is actually faster than using gnutls_record_cork, but slower
than using TCP_CORK.

> Less time spent in user space, but for this particular qemu-img
> behavior (writing 2M chunks at a time), gnutls is now uncorking huge
> packets and the kernel is doing enough extra work that the overall
> program actually takes longer. :(
>
> For smaller I/O patterns, the effects of corking are more likely to
> be beneficial, but I don't have a ready test case to produce that
> pattern (short of creating a guest running fio on a device backed by
> nbd).
>
> Ideas for improvements are welcome; see my recent thread on the
> libguestfs list about how TCP_CORK is already a painful interface (it
> requires additional syscalls), and that we may be better off teaching
> qio_channel_writev about taking a flag similar to send(,MSG_MORE),
> which can achieve the same effect as setsockopt(TCP_CORK) but in
> fewer syscalls:
> https://www.redhat.com/archives/libguestfs/2019-June/msg00078.html
> https://www.redhat.com/archives/libguestfs/2019-June/msg00081.html

My thought process at the moment is that there is probably some ideal
size (whether 1500 bytes for a traditional Ethernet frame, 9000 bytes
for a gigabit jumbo frame, or even 64k as a nice round number). If we
get a channel_write[v]() call with a new QIO_MORE flag set, and it
adds less than our magic number of bytes to any already-queued data,
then add it to our queue. If we get a call without the QIO_MORE flag
set, or if the amount of data is larger than the magic size, then it's
time to pass on our buffered data and the rest of the request to the
real channel. And within channel_writev, treat all but the last
vectored element as if they had the QIO_MORE flag set.
Or, put another way: after staring a bit at the current gnutls code,
the implementation of gnutls_record_cork is not very smart - it really
blocks ALL traffic, and memcpy()s data into a holding buffer; then,
when you finally gnutls_record_uncork, it tries to flush the entire
holding buffer at once. The kernel's MSG_MORE algorithm is smarter -
it will only buffer data for up to 200ms or until it has a full
packet, at which point the buffering ends without further user input.

But even when using TLS, we may be able to emulate some of the
kernel's behavior: ignore QIO_MORE for large packets, call
gnutls_record_cork for small packets (no need to duplicate a buffer
ourselves if gnutls already has one), and then uncork as soon as we've
hit our magic limit, rather than blindly waiting to uncork after
megabytes of data when the user finally uncorks their channel.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org