On 02/24/2011 06:48 AM, Amit Shah wrote:
On (Wed) 23 Feb 2011 [08:31:52], Michael Roth wrote:
On 02/22/2011 10:59 PM, Amit Shah wrote:
On (Tue) 22 Feb 2011 [16:40:55], Michael Roth wrote:
If something in the guest is attempting to read/write from the
virtio-serial device, and nothing is connected to virtio-serial's
host character device (say, a socket)
1. writes will block until something connect()s, at which point the
write will succeed
2. reads will always return 0 until something connect()s, at which
point the reads will block until there's data
This makes it difficult (impossible?) to implement the notion of
connect/disconnect or open/close over virtio-serial without layering
another protocol on top using hackish things like length-encoded
payloads or sentinel values to determine the end of one
RPC/request/response/session and the start of the next.
For instance, if the host side disconnects, then reconnects before
we read(), we may never get the read()=0, and our FD remains valid.
Whereas with a tcp/unix socket our FD is no longer valid, and the
read()=0 is an event we can check for at any point after the other
end does a close/disconnect.
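To make the contrast concrete, here's a minimal sketch of the Unix-socket semantics described above: once the peer close()s, read() returns 0 on every subsequent call, so the EOF is a sticky event we can observe at any later point. A socketpair() stands in for the connected socket chardev here; the function name is just for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <unistd.h>

/* After the peer close()s a Unix stream socket, read() returns 0 (EOF)
 * immediately and on every re-read -- unlike the virtio-serial chardev,
 * where a host reconnect can make us miss the read()=0 entirely. */
static int read_after_peer_close(void)
{
    int sv[2];
    char buf[8];

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    close(sv[1]);                               /* peer disconnects */
    ssize_t n = read(sv[0], buf, sizeof(buf));  /* immediate EOF, no block */
    ssize_t m = read(sv[0], buf, sizeof(buf));  /* still EOF on re-read */
    close(sv[0]);
    return (n == 0 && m == 0) ? 0 : -1;
}
```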
There's SIGIO support, so host connect-disconnect notifications can be
caught via the signal.
I recall looking into this at some point...but don't we get a SIGIO
for read/write-ability in general?
I don't get you -- the virtio_console driver emits the SIGIO signal
only when the host side connects or disconnects. See
http://www.linux-kvm.org/page/Virtio-serial_API
So whenever you receive a SIGIO, poll() in the signal handler for all
fds of interest: whichever has POLLIN set has a connected host side,
and whichever has POLLHUP set does not. If you maintain the previous
state of each fd (from before the signal), you can figure out whether
something happened on the host side.
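The per-fd check described above might look like the following sketch: poll the fd with a zero timeout and classify it by POLLHUP, comparing against the state recorded before the signal. A pipe stands in for the virtio port fd (POLLHUP-on-peer-close is the behavior of interest); the function name is an assumption for illustration.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Zero-timeout poll of a single fd: returns 1 if the peer appears hung
 * up (POLLHUP set), 0 if not, -1 on poll() error.  In the SIGIO-handler
 * scheme above, this result would be compared against the fd's state
 * recorded before the signal to detect a host connect/disconnect. */
static int peer_hung_up(int fd)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };

    if (poll(&pfd, 1, 0) < 0)
        return -1;
    return (pfd.revents & POLLHUP) ? 1 : 0;
}
```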
I tried this on RHEL6+rhn updates but the O_ASYNC flag doesn't seem to
be supported. Has this been backported?
Either way, it seems we can still lose the disconnect event/poll state
change if the host reconnects before the signal is delivered. So SIGIO
in an application would need to be reserved for exactly two things: a
host connect or a host disconnect (distinguishing between the two may
not be so important, since we could treat either as the previous
session having been closed). Which limits the application to having
only one O_ASYNC FD open at a time.
But even if we do that, it seems like there might still be a small
window where the application could read/write data intended for the
previous connection before the signal handler is invoked. Not too sure
on that point though. Assuming this isn't the case...it could work. But
what about windows guests?
So you still need some way to differentiate, say, readability from a
disconnect/EOF, and the read()=0 that could determine this is still
racing with host-side reconnects.
Also, nonblocking reads/writes will return -EPIPE if the host-side
connection is not up.
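The analogous failure mode is easy to demonstrate on a Unix stream socket, which stands in here for the guest-side port fd: once the peer has close()d, a write fails immediately with EPIPE. This is only a sketch of the pattern; on the real virtio port a host reconnect clears the condition again, which is exactly what makes it racy to rely on.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>

/* Write to a Unix stream socket whose peer has closed: fails with
 * EPIPE.  MSG_NOSIGNAL suppresses the SIGPIPE that a plain write()
 * would raise.  Returns 1 if EPIPE was observed, 0/-1 otherwise. */
static int write_after_peer_close(void)
{
    int sv[2];

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    close(sv[1]);                                  /* peer goes away */
    ssize_t n = send(sv[0], "x", 1, MSG_NOSIGNAL); /* avoid SIGPIPE */
    int saw_epipe = (n < 0 && errno == EPIPE);
    close(sv[0]);
    return saw_epipe;
}
```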
But we still essentially need to poll() for a host-side disconnected
state, which is still racy since they may reconnect before we've
done a read/write that would've generated the -EPIPE. It seems like
what we really need is for the FD to be invalid from that point
forward.
This would go against (or abuse) a chardev interface. It would
effectively treat a host-side port close as a hot-unplug event.
Well, not a complete hot-unplug. The port would still be there; we'd
just need to re-open it after a read()=0.
Personally I'm not necessarily advocating we change the default
behavior, but couldn't we support this as a separate mode?
-device virtserialport,inv_fd_on_host_close=1
or something along that line?
Also, I focused more on the guest-side connect/disconnect detection,
but as Anthony mentioned I think the host side shares similar
limitations as well. AFAIK once we connect to the chardev that FD
remains valid until the connected process closes it, and so races
with the guest side on detecting connect/disconnect events in a
similar manner. For the host side it looks like virtio-console has
guest_close/guest_open callbacks already that we could potentially
use...seems like it's just a matter of tying them to the chardev...
basically having virtio-serial's guest_close() result in a close()
on the corresponding chardev connection's FD.
Yes, this could be used.
However, the problem with that will be that the chardev can't be
opened again (AFAIR) and a new chardev will have to be used.
Hmm...yeah I was thinking more specifically about the socket chardev,
where we can leave the listen_fd alone but close anything we've
accept()'d prior to a guest-side disconnect. But isn't that enough? Just
add this option for chardevs where this actually makes sense? For instance:
-chardev socket,inv_fd_on_guest_close=1
Although, this wouldn't make sense if we're using the chardev for
anything other than virtio-serial...so that flag makes more sense as a
virtio-serial flag....but that virtio-serial flag only makes sense for
particular chardevs...
I'm not sure what a good way to do this would be honestly...either would
work it seems...but neither seems very intuitive.
So if this is done on both sides, the race will be eliminated, but the
expectation that a chardev port is just a serial port will be broken,
and we'll try to bake some connection layer on top of it. That wasn't
the original idea. We could extend this, but a better way to achieve
it could be a library on either side to abstract these details away.
As far as implementing a library goes...I tried layering something on
top in previous applications to provide connect/disconnect detection
over virtio-serial: basically by proxying data to/from forwarding
sockets on either side of the virtio-serial channel, and packetizing
the virtio-serial stream into fixed-size or length-encoded data packets
and control packets that induced the kind of
invalidate-fd-on-remote-close semantics we're discussing.
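A minimal sketch of that kind of length-encoded framing, just to make the scheme concrete: each packet carries a 1-byte type (data vs. control) and a 4-byte big-endian payload length, then the payload. The field sizes, type byte, and function names are illustrative assumptions, not the actual daemon's format. Note how a truncated frame (as after a daemon restart mid-packet) is indistinguishable from a short read, which is the resync problem described below.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FRAME_HDR_LEN 5  /* 1-byte type + 4-byte big-endian length */

/* Serialize one frame into out (caller ensures room); returns total size. */
static size_t frame_encode(uint8_t *out, uint8_t type,
                           const uint8_t *payload, uint32_t len)
{
    out[0] = type;
    out[1] = len >> 24; out[2] = len >> 16;
    out[3] = len >> 8;  out[4] = len;
    memcpy(out + FRAME_HDR_LEN, payload, len);
    return FRAME_HDR_LEN + len;
}

/* Returns payload length and sets *type/*payload, or -1 if the buffer
 * does not yet hold a complete frame (e.g. a truncated packet). */
static long frame_decode(const uint8_t *in, size_t avail,
                         uint8_t *type, const uint8_t **payload)
{
    if (avail < FRAME_HDR_LEN)
        return -1;
    uint32_t len = ((uint32_t)in[1] << 24) | ((uint32_t)in[2] << 16) |
                   ((uint32_t)in[3] << 8)  |  (uint32_t)in[4];
    if (avail - FRAME_HDR_LEN < len)
        return -1;
    *type = in[0];
    *payload = in + FRAME_HDR_LEN;
    return (long)len;
}
```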
It seems to work OK in practice, but we run into trouble when the
daemons managing the stream go down...mainly due to problems similar
to those we're trying to address with the aforementioned changes. (For
instance: the forwarding daemon in the guest sends a partial packet,
goes down, starts back up, and sends a control packet...but the packet
stream is now out of sync. The only way I've found around this so far
is reserving bytes in the transport as sentinel values we can use to
re-sync the stream...but then we have to do things like base64-encode
binary data, which can be expensive in terms of both channel bandwidth
and CPU.)
If that's the kind of approach we'd need to take, I think optional flags
like the ones mentioned above, or some other change to the transport,
are still required. Unless there's a better approach to handling this
outside of the actual transport that I'm missing?
Amit