Re: [Qemu-devel] [PATCH v2 1/1] qemu-char: socket backend: disconnect on write error

Markus Armbruster Fri, 24 Feb 2017 05:44:43 -0800

Marc-André Lureau <marcandre.lur...@gmail.com> writes:

> Hi
>
> On Thu, Feb 2, 2017 at 5:20 PM Denis V. Lunev <d...@openvz.org> wrote:
>
>> From: Anton Nefedov <anton.nefe...@virtuozzo.com>
>>
>> Socket backend read handler should normally perform a disconnect, however
>> the read handler may not get a chance to run if the frontend is not ready
>> (qemu_chr_be_can_write() == 0).
>>
>> This means that in virtio-serial frontend case if
>>  - the host has disconnected (giving EPIPE on socket write)
>>  - and the guest has disconnected (-> frontend not ready -> backend
>>    will not read)
>>  - and there is still data (frontend->backend) to flush (has to be a really
>>    tricky timing but nevertheless, we have observed the case in production)
>>
>> This results in virtio-serial trying to flush this data continiously
>> forming
>> a busy loop.
>>
>> Solution: react on write error in the socket write handler.
>> errno is not reliable after qio_channel_writev_full(), so we may not get
>> the exact EPIPE, so disconnect on any error but QIO_CHANNEL_ERR_BLOCK which
>> io_channel_send_full() converts to errno EAGAIN.
>> We must not disconnect right away though, there still may be data to read
>> (see 4bf1cb0).
>>
>> Signed-off-by: Anton Nefedov <anton.nefe...@virtuozzo.com>
>> Signed-off-by: Denis V. Lunev <d...@openvz.org>
>> CC: Paolo Bonzini <pbonz...@redhat.com>
>> CC: Daniel P. Berrange <berra...@redhat.com>
>> CC: Marc-André Lureau <marcandre.lur...@redhat.com>
>> ---
>> Changes from v1:
>> - we do not rely on EPIPE anynore. Socket should be closed on all errors
>>   except EAGAIN
>>
>>  chardev/char-socket.c | 10 ++++++++++
>>  1 file changed, 10 insertions(+)
>>
>> diff --git a/chardev/char-socket.c b/chardev/char-socket.c
>> index 4068dc5..e2137aa 100644
>> --- a/chardev/char-socket.c
>> +++ b/chardev/char-socket.c
>> @@ -97,6 +97,9 @@ static gboolean tcp_chr_accept(QIOChannel *chan,
>>                                 GIOCondition cond,
>>                                 void *opaque);
>>
>> +static int tcp_chr_read_poll(void *opaque);
>> +static void tcp_chr_disconnect(CharDriverState *chr);
>> +
>>  /* Called with chr_write_lock held.  */
>>  static int tcp_chr_write(Chardev *chr, const uint8_t *buf, int len)
>>  {
>> @@ -114,6 +117,13 @@ static int tcp_chr_write(Chardev *chr, const uint8_t
>> *buf, int len)
>>              s->write_msgfds_num = 0;
>>          }
>>
>> +        if (ret < 0 && errno != EAGAIN) {
>> +            if (tcp_chr_read_poll(chr) <= 0) {
>> +                tcp_chr_disconnect(chr);
>> +                return len;
>>
>
> This change breaks a number of assumption in vhost-user code. Until now, a
> vhost-user function assumed that dev->vhost_ops would remain as long as the
> function is running, so it may call vhost_ops callbacks several time, which
> may eventually fail to do io, but no crash would occur. The disconnection
> would be handled later with the HUP handler. Now, vhost_ops may be cleared
> during a write (chr_disconnect -> CHR_EVENT_CLOSED in
> net_vhost_user_event). This can be randomly reproduced with vhost-user-test
> -p /x86_64/vhost-user/flags-mismatch/subprocess  (broken pipe, qemu crashes)


It hangs for me with annoying frequency.  I'm glad you found the culprit!

> I am trying to fix this, but it isn't simple (races, many code paths etc),
> so I just wanted to inform you about it.

Can we revert this patch until we have a fix?

Re: [Qemu-devel] [PATCH v2 1/1] qemu-char: socket backend: disconnect on write error

Reply via email to