On 6/10/19 4:08 AM, Daniel P. Berrangé wrote:
> On Fri, Jun 07, 2019 at 05:14:14PM -0500, Eric Blake wrote:
>> Our current implementation of qio_channel_set_cork() is pointless for
>> TLS sessions: we block the underlying channel, but still hand things
>> piecemeal to gnutls which then produces multiple encryption packets.
>> Better is to directly use gnutls corking, which collects multiple
>> inputs into a single encryption packet.
>>
>> Signed-off-by: Eric Blake <ebl...@redhat.com>
>>
>> ---
>>
>> RFC because unfortunately, I'm not sure I like the results. My test
>> case (using latest nbdkit.git) involves sending 10G of random data
>> over NBD using parallel writes (and doing nothing on the server end;
>> this is all about timing the data transfer):
>>
>> $ dd if=/dev/urandom of=rand bs=1M count=10k
>> $ time nbdkit -p 10810 --tls=require --tls-verify-peer \
>>     --tls-psk=/tmp/keys.psk --filter=stats null 10g statsfile=/dev/stderr \
>>     --run '~/qemu/qemu-img convert -f raw -W -n --target-image-opts \
>>     --object tls-creds-psk,id=tls0,endpoint=client,dir=/tmp,username=eblake \
>>     rand driver=nbd,server.type=inet,server.host=localhost,server.port=10810,tls-creds=tls0'
>>
>> Pre-patch, I measured:
>> real    0m34.536s
>> user    0m29.264s
>> sys     0m4.014s
>>
>> while post-patch, it changed to:
>> real    0m36.055s
>> user    0m27.685s
>> sys     0m10.138s
>>
>> Less time spent in user space, but for this particular qemu-img
>> behavior (writing 2M chunks at a time), gnutls is now uncorking huge
>> packets and the kernel is doing enough extra work that the overall
>> program actually takes longer. :(
>
> I wonder if the problem is that we're hitting a tradeoff between
> time spent in encryption vs time spent in network I/O.
>
> When we don't cork, the kernel has already sent the first packet
> during the time gnutls is burning CPU encrypting the second packet.
>
> When we do cork gnutls has to encrypt both packets before the kernel
> can send anything.
Exactly. That's why my gut feel is that having only a binary cork is too
coarse, and we instead want something more like MSG_MORE. If the flag is
set AND the current size is small, then cork; then on the next message,
we see that we are still corked, and decide to either uncork immediately
(the current message is large, so corking buys us nothing useful as we're
going to split it anyway) or uncork after appending the current message
(the current message is small enough that keeping things corked may still
be a win).

Visually,

  cork
  send()
  send()
  uncork

is too late - you can't tell whether the second send would have
benefitted from staying corked after the first. But:

  send(MSG_MORE)
  send()

is ideal; under the hood, we can translate it to:

  send(MSG_MORE)
    gnutls_record_cork()
    gnutls_record_send()
  send()
    if (size > threshold) {
      gnutls_record_uncork()
      gnutls_record_send()
    } else {
      gnutls_record_send()
      gnutls_record_uncork()
    }

So we really need a way to plumb a MSG_MORE flag for senders to use, when
they KNOW they will be sending back-to-back pieces and where the first
piece is short, but it is not yet obvious whether the second piece is
short or long. MSG_MORE leaves that choice to the next message to go
through the stack: if the previous message was flagged, the next one
decides whether to keep the cork in place or flush. (A rough sketch of
this translation is at the end of this mail.)

>> For smaller I/O patterns, the effects of corking are more likely to be
>> beneficial, but I don't have a ready test case to produce that pattern
>> (short of creating a guest running fio on a device backed by nbd).
>
> I think it would probably be useful to see guest kernels with fio
> and definitely see results for something closer to sector sized
> I/O.
>
> 2 MB chunks is pretty unrealistic for a guest workload where there
> will be lots of sector sized I/O. Sure QEMU / guest OS can merge
> sectors, but it still feels like most I/O is going to be much smaller
> than 2 MB.

I'll have to play with other benchmarking setups, and see if I can come
up with a reliable fio test.

>
>
>> diff --git a/io/channel.c b/io/channel.c
>> index 2a26c2a2c0b9..3510912465ac 100644
>> --- a/io/channel.c
>> +++ b/io/channel.c
>> @@ -379,7 +379,16 @@ void qio_channel_set_cork(QIOChannel *ioc,
>>      QIOChannelClass *klass = QIO_CHANNEL_GET_CLASS(ioc);
>>
>>      if (klass->io_set_cork) {
>> -        klass->io_set_cork(ioc, enabled);
>> +        int r = klass->io_set_cork(ioc, enabled);
>> +
>> +        while (r == QIO_CHANNEL_ERR_BLOCK) {
>> +            if (qemu_in_coroutine()) {
>> +                qio_channel_yield(ioc, G_IO_OUT);
>> +            } else {
>> +                qio_channel_wait(ioc, G_IO_OUT);
>> +            }
>> +            r = klass->io_set_cork(ioc, enabled);
>> +        }
>>      }
>
> Interesting - did you actually hit this EAGAIN behaviour ? I wouldn't
> have expected anything to be pending in gnutls that needs retry.

Yes. When gnutls_record_cork() is called, NOTHING goes over the wire, but
instead everything gets appended to a realloc'd buffer; when you finally
gnutls_record_uncork() and are using a non-blocking socket, you WILL hit
EAGAIN if the amount of data being uncorked is larger than the TCP max
segment size. But again, I think that having qio_channel_set_cork() do
the blocking wait is too late; we really want a nicer interface where you
can request a per-message MSG_MORE, at which point crypto/tlssession.c
handles the cork/uncork logic as part of send()ing rather than making
qio_uncork have to block. It's just that it's a lot more work (either all
existing callers need to adjust to a flags parameter, or we need new
entry points to expose a flag parameter to the few callers that care -
right now, nbd and sheepdog, although we may find others).
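
To make that concrete, here's a rough, untested sketch of the translation
I have in mind. The 'more' hint, the helper name, and the threshold are
all invented for illustration - only the three gnutls_record_* calls are
real API:

  #include <stdbool.h>
  #include <sys/types.h>
  #include <gnutls/gnutls.h>

  /* Invented cutoff: "small enough that merging into one record wins". */
  #define MORE_THRESHOLD 1500

  /*
   * Sketch of a send path taking a MSG_MORE-style hint.  *corked tracks
   * whether a previous small send with 'more' set left gnutls corked.
   * GNUTLS_E_AGAIN handling on uncork is elided; see the next sketch.
   */
  static ssize_t send_with_more(gnutls_session_t session, const void *buf,
                                size_t len, bool more, bool *corked)
  {
      if (more && !*corked && len < MORE_THRESHOLD) {
          /* Small piece with more to come: queue it inside gnutls. */
          gnutls_record_cork(session);
          *corked = true;
      } else if (!more && *corked && len >= MORE_THRESHOLD) {
          /* Large final piece: flush what is queued first; corking this
           * piece buys nothing since it gets split across records anyway. */
          gnutls_record_uncork(session, 0);
          *corked = false;
      }

      ssize_t ret = gnutls_record_send(session, buf, len);

      if (!more && *corked) {
          /* Small final piece: it was appended while corked, now flush
           * both pieces as a single encryption packet. */
          gnutls_record_uncork(session, 0);
          *corked = false;
      }
      return ret;
  }

The point is that the uncork decision is made by the second send, which
is the first moment we actually know whether keeping the pieces together
is worthwhile.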
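
And to show where the EAGAIN comes from: a TLS flavor of set_cork more or
less has to surface the gnutls flush backlog as QIO_CHANNEL_ERR_BLOCK,
which is what the retry loop in the hunk above is coping with. Again just
a sketch of that mapping (not the code from my patch), with the constant
hard-coded so the snippet stands alone:

  #include <stdbool.h>
  #include <gnutls/gnutls.h>

  #define QIO_CHANNEL_ERR_BLOCK -2   /* assumed value, for a standalone sketch */

  /*
   * Map cork/uncork onto gnutls.  Corking never blocks (data only gets
   * queued); uncorking on a non-blocking socket can leave part of the
   * queue unsent, which gnutls reports as GNUTLS_E_AGAIN.
   */
  static int tls_set_cork(gnutls_session_t session, bool enabled)
  {
      if (enabled) {
          gnutls_record_cork(session);
          return 0;
      }

      int ret = gnutls_record_uncork(session, 0);
      if (ret == GNUTLS_E_AGAIN || ret == GNUTLS_E_INTERRUPTED) {
          return QIO_CHANNEL_ERR_BLOCK;   /* caller waits for G_IO_OUT, retries */
      }
      return ret < 0 ? -1 : 0;
  }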
-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org