On 6/10/19 4:08 AM, Daniel P. Berrangé wrote:
> On Fri, Jun 07, 2019 at 05:14:14PM -0500, Eric Blake wrote:
>> Our current implementation of qio_channel_set_cork() is pointless for
>> TLS sessions: we block the underlying channel, but still hand things
>> piecemeal to gnutls which then produces multiple encryption packets.
>> Better is to directly use gnutls corking, which collects multiple
>> inputs into a single encryption packet.
>>
>> Signed-off-by: Eric Blake <ebl...@redhat.com>
>>
>> ---
>>
>> RFC because unfortunately, I'm not sure I like the results. My test
>> case (using latest nbdkit.git) involves sending 10G of random data
>> over NBD using parallel writes (and doing nothing on the server end;
>> this is all about timing the data transfer):
>>
>> $ dd if=/dev/urandom of=rand bs=1M count=10k
>> $ time nbdkit -p 10810 --tls=require --tls-verify-peer \
>>     --tls-psk=/tmp/keys.psk --filter=stats null 10g statsfile=/dev/stderr \
>>     --run '~/qemu/qemu-img convert -f raw -W -n --target-image-opts \
>>     --object tls-creds-psk,id=tls0,endpoint=client,dir=/tmp,username=eblake \
>>     rand driver=nbd,server.type=inet,server.host=localhost,server.port=10810,tls-creds=tls0'
>>
>> Pre-patch, I measured:
>> real    0m34.536s
>> user    0m29.264s
>> sys     0m4.014s
>>
>> while post-patch, it changed to:
>> real    0m36.055s
>> user    0m27.685s
>> sys     0m10.138s
>>
>> Less time spent in user space, but for this particular qemu-img
>> behavior (writing 2M chunks at a time), gnutls is now uncorking huge
>> packets and the kernel is doing enough extra work that the overall
>> program actually takes longer. :(
>
> I wonder if the problem is that we're hitting a tradeoff between
> time spent in encryption vs time spent in network I/O.
>
> When we don't cork, the kernel has already sent the first packet
> during the time gnutls is burning CPU encrypting the second packet.
>
> When we do cork gnutls has to encrypt both packets before the kernel
> can send anything.
Exactly. That's why my gut feel is that having only a binary cork is too
coarse, and we instead want something more like MSG_MORE. If the flag is
set AND the current size is small, then cork; then on the next message,
we see that we are still corked, and decide to either uncork immediately
(the current message is large, so corking buys us nothing useful as we're
going to split it anyway) or uncork after appending the current message
(the current message is small enough that keeping things corked may still
be a win).

Visually,

  cork
  send()
  send()
  uncork

is too late - you can't tell whether the second send would have
benefitted from staying corked after the first. But:

  send(MSG_MORE)
  send()

is ideal; under the hood, we can translate it to:

  send(MSG_MORE)
    gnutls_record_cork()
    gnutls_record_send()
  send()
    if (size > threshold) {
      gnutls_record_uncork()
      gnutls_record_send()
    } else {
      gnutls_record_send()
      gnutls_record_uncork()
    }

So we really need a way to plumb a MSG_MORE flag for senders to use, when
they KNOW they will be sending back-to-back pieces and where the first
piece is short, but it is not yet obvious whether the second piece is
short or long. MSG_MORE leaves that choice to the next message to go
through the stack: if the previous message was flagged, the next one
decides whether to keep the cork in place or flush. (A rough sketch of
this translation is at the end of this mail.)

>> For smaller I/O patterns, the effects of corking are more likely to be
>> beneficial, but I don't have a ready test case to produce that pattern
>> (short of creating a guest running fio on a device backed by nbd).
>
> I think it would probably be useful to see guest kernels with fio
> and definitely see results for something closer to sector sized
> I/O.
>
> 2 MB chunks is pretty unrealistic for a guest workload where there
> will be lots of sector sized I/O. Sure QEMU / guest OS can merge
> sectors, but it still feels like most I/O is going to be much smaller
> than 2 MB.

I'll have to play with other benchmarking setups, and see if I can come
up with a reliable fio test.

>
>
>> diff --git a/io/channel.c b/io/channel.c
>> index 2a26c2a2c0b9..3510912465ac 100644
>> --- a/io/channel.c
>> +++ b/io/channel.c
>> @@ -379,7 +379,16 @@ void qio_channel_set_cork(QIOChannel *ioc,
>>      QIOChannelClass *klass = QIO_CHANNEL_GET_CLASS(ioc);
>>
>>      if (klass->io_set_cork) {
>> -        klass->io_set_cork(ioc, enabled);
>> +        int r = klass->io_set_cork(ioc, enabled);
>> +
>> +        while (r == QIO_CHANNEL_ERR_BLOCK) {
>> +            if (qemu_in_coroutine()) {
>> +                qio_channel_yield(ioc, G_IO_OUT);
>> +            } else {
>> +                qio_channel_wait(ioc, G_IO_OUT);
>> +            }
>> +            r = klass->io_set_cork(ioc, enabled);
>> +        }
>>      }
>
> Interesting - did you actually hit this EAGAIN behaviour ? I wouldn't
> have expected anything to be pending in gnutls that needs retry.

Yes. When gnutls_record_cork() is called, NOTHING goes over the wire, but
instead everything gets appended to a realloc'd buffer; when you finally
gnutls_record_uncork() and are using a non-blocking socket, you WILL hit
EAGAIN if the amount of data being uncorked is larger than the TCP max
segment size. But again, I think that having qio_channel_set_cork() do
the blocking wait is too late; we really want a nicer interface where you
can request a per-message MSG_MORE, at which point crypto/tlssession.c
handles the cork/uncork logic as part of send()ing rather than making
qio_uncork have to block. It's just that it's a lot more work (either all
existing callers need to adjust to a flags parameter, or we need new
entry points to expose a flag parameter to the few callers that care -
right now, nbd and sheepdog, although we may find others).
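
To make that concrete, here's a rough, untested sketch of the translation
I have in mind. The 'more' hint, the helper name, and the threshold are
all invented for illustration - only the three gnutls_record_* calls are
real API:

  #include <stdbool.h>
  #include <sys/types.h>
  #include <gnutls/gnutls.h>

  /* Invented cutoff: "small enough that merging into one record wins". */
  #define MORE_THRESHOLD 1500

  /*
   * Sketch of a send path taking a MSG_MORE-style hint.  *corked tracks
   * whether a previous small send with 'more' set left gnutls corked.
   * GNUTLS_E_AGAIN handling on uncork is elided; see the next sketch.
   */
  static ssize_t send_with_more(gnutls_session_t session, const void *buf,
                                size_t len, bool more, bool *corked)
  {
      if (more && !*corked && len < MORE_THRESHOLD) {
          /* Small piece with more to come: queue it inside gnutls. */
          gnutls_record_cork(session);
          *corked = true;
      } else if (!more && *corked && len >= MORE_THRESHOLD) {
          /* Large final piece: flush what is queued first; corking this
           * piece buys nothing since it gets split across records anyway. */
          gnutls_record_uncork(session, 0);
          *corked = false;
      }

      ssize_t ret = gnutls_record_send(session, buf, len);

      if (!more && *corked) {
          /* Small final piece: it was appended while corked, now flush
           * both pieces as a single encryption packet. */
          gnutls_record_uncork(session, 0);
          *corked = false;
      }
      return ret;
  }

The point is that the uncork decision is made by the second send, which
is the first moment we actually know whether keeping the pieces together
is worthwhile.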
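
And to show where the EAGAIN comes from: a TLS flavor of set_cork more or
less has to surface the gnutls flush backlog as QIO_CHANNEL_ERR_BLOCK,
which is what the retry loop in the hunk above is coping with. Again just
a sketch of that mapping (not the code from my patch), with the constant
hard-coded so the snippet stands alone:

  #include <stdbool.h>
  #include <gnutls/gnutls.h>

  #define QIO_CHANNEL_ERR_BLOCK -2   /* assumed value, for a standalone sketch */

  /*
   * Map cork/uncork onto gnutls.  Corking never blocks (data only gets
   * queued); uncorking on a non-blocking socket can leave part of the
   * queue unsent, which gnutls reports as GNUTLS_E_AGAIN.
   */
  static int tls_set_cork(gnutls_session_t session, bool enabled)
  {
      if (enabled) {
          gnutls_record_cork(session);
          return 0;
      }

      int ret = gnutls_record_uncork(session, 0);
      if (ret == GNUTLS_E_AGAIN || ret == GNUTLS_E_INTERRUPTED) {
          return QIO_CHANNEL_ERR_BLOCK;   /* caller waits for G_IO_OUT, retries */
      }
      return ret < 0 ? -1 : 0;
  }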
-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org