On (Mon) Dec 06 2010 [13:23:50], Paul Brook wrote:
> > Sure. My proposal is for qemu_chr_write() to succeed all the time. If
> > the backend can block, and the caller can handle it, it can get a
> > -EAGAIN (or WSAEWOULDBLOCK) return. When the backend becomes writable,
> > the chardev can call the ->writes_unblocked() callback for that caller.
> > Individual callers need not bother about re-submitting partial writes,
> > since the buffering can be done in common code in one place
> > (qemu-char.c).
>
> That's only OK if you assume it's OK to buffer all but one byte of the
> transmit request.
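The contract proposed above can be sketched roughly as follows. This is a
minimal illustration, not QEMU code: the struct and function names
(CharDriverSketch, chr_write_sketch, chr_backend_ready_sketch, CHR_BUF_SIZE)
are made up, and the "backend fills up after every write" behaviour is a
stand-in for a real non-writable fd.

```c
/* Sketch of the proposed qemu_chr_write() contract: the common layer
 * (qemu-char.c in the proposal) buffers a partial write and invokes a
 * writes_unblocked() callback once the backend can accept more data.
 * All names here are hypothetical. */
#include <assert.h>
#include <errno.h>
#include <string.h>

#define CHR_BUF_SIZE 256

typedef struct CharDriverSketch {
    char buf[CHR_BUF_SIZE];          /* common-layer buffer              */
    size_t buffered;                 /* bytes queued but not yet flushed */
    int backend_blocked;             /* backend would return EAGAIN      */
    void (*writes_unblocked)(struct CharDriverSketch *chr); /* caller cb */
} CharDriverSketch;

/* Caller-facing write: succeeds unless the backend is already blocked. */
static int chr_write_sketch(CharDriverSketch *chr, const char *data,
                            size_t len)
{
    if (chr->backend_blocked) {
        return -EAGAIN;              /* caller waits for the callback */
    }
    if (len > CHR_BUF_SIZE - chr->buffered) {
        len = CHR_BUF_SIZE - chr->buffered;  /* buffer a partial write */
    }
    memcpy(chr->buf + chr->buffered, data, len);
    chr->buffered += len;
    chr->backend_blocked = 1;        /* pretend the backend filled up */
    return (int)len;
}

/* Backend became writable again: flush and notify the caller. */
static void chr_backend_ready_sketch(CharDriverSketch *chr)
{
    chr->buffered = 0;               /* pretend the flush completed */
    chr->backend_blocked = 0;
    if (chr->writes_unblocked) {
        chr->writes_unblocked(chr);
    }
}
```

The point of the shape is that callers never re-submit partial writes
themselves: they either get their bytes accepted or a single -EAGAIN,
and the unblock notification comes from one place.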
Is that a fair assumption to make?

> > > I'm extremely reluctant to add a new layer of buffering that is not
> > > visible to either the kernel or the device. In practice we still need
> > > to be able to split oversize requests, so handling small split
> > > requests should be pretty much free.
> >
> > So do you propose to propagate this -EAGAIN error all the way to the
> > guest? That won't work for older virtio guests (for virtio, host and
> > guest changes will be needed).
>
> Huh? That doesn't make any sense. The guest is already using an
> asynchronous submission mechanism.
> With a virtio device the status of a buffer becomes indeterminate once
> it has been placed into the queue. Only when it is removed from the
> queue do we know that it has completed. The device may transfer all or
> part of that buffer at any time in between.

Yes, just making sure this isn't what you're talking about.

> > > > > b) Store the data on the side somewhere. Tell the device all
> > > > > data has been sent, and arrange for this data to be flushed
> > > > > before accepting any more data. This is bad because it allows
> > > > > the guest to allocate arbitrarily large[1] buffers on the host.
> > > > > i.e. a fairly easily exploitable DoS attack.
> > > >
> > > > With virtio-serial, this is what's in use. The buffer is limited
> > > > to the length of the vq (which is a compile-time constant) and
> > > > there also is the virtio_serial_throttle_port() call that tells
> > > > the guest to not send any more data to the host till the char
> > > > layer indicates it's OK to send more data.
> > >
> > > No.
> > >
> > > Firstly you're assuming all users are virtio based. That may be all
> > > you care about, but is not acceptable if you want to get this code
> > > merged.
> >
> > OK, but it's assumed that once a -EAGAIN is returned, the caller will
> > take appropriate actions to restrict the data sent.
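The per-port throttling idea described above can be sketched like this.
It is an illustration only: PortSketch, port_accept_sketch,
port_drain_sketch and PORT_BUF_CAP are invented names, and the comments
merely gesture at where a virtio_serial_throttle_port()-style call would
sit in the real code.

```c
/* Sketch of bounded host-side buffering with guest throttling: the
 * host accepts at most PORT_BUF_CAP bytes per port and tells the guest
 * to stop sending once that cap is hit, instead of buffering without
 * bound. All names are hypothetical. */
#include <assert.h>
#include <stddef.h>

#define PORT_BUF_CAP 4096            /* per-port cap, e.g. one vq's worth */

typedef struct PortSketch {
    size_t pending;                  /* bytes buffered on the host side */
    int throttled;                   /* guest has been told to stop     */
} PortSketch;

/* Accept guest data up to the cap; throttle the port when it fills. */
static size_t port_accept_sketch(PortSketch *p, size_t len)
{
    size_t room = PORT_BUF_CAP - p->pending;

    if (p->throttled) {
        return 0;                    /* guest should not be sending now */
    }
    if (len > room) {
        len = room;
    }
    p->pending += len;
    if (p->pending == PORT_BUF_CAP) {
        p->throttled = 1;            /* ~ virtio_serial_throttle_port(port, true) */
    }
    return len;
}

/* The char layer drained some bytes: unthrottle once there is room. */
static void port_drain_sketch(PortSketch *p, size_t len)
{
    p->pending = len < p->pending ? p->pending - len : 0;
    if (p->throttled && p->pending < PORT_BUF_CAP) {
        p->throttled = 0;            /* ~ virtio_serial_throttle_port(port, false) */
    }
}
```

This is the scheme Paul objects to above: the cap bounds the number of
ring entries' worth of data, but not the size of the guest buffers each
entry points to, which is where the DoS concern comes from.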
> > Especially, send_all has:
> >
> >     if (chr->write_blocked) {
> >         /*
> >          * We don't handle this situation: the caller should not send
> >          * us data while we're blocked.
> >          *
> >          * We could buffer this data here but that'll only encourage
> >          * bad behaviour on the part of the callers.
> >          */
> >         return -1;
> >     }
>
> If you're being draconian about this then do it properly and make this
> an abort. Otherwise return -EAGAIN. Returning a random error seems like
> the worst of both worlds. Your code results in spurious guest errors
> (or lost data) with no real indication that this is actually a qemu
> device emulation bug.

Yeah; abort() is a good idea. (BTW the previous send_all() routine just
returned -1 for anything other than -EINTR and -EAGAIN, so I went for
consistency.)

> > > Secondly, the virtqueue only restricts the number of direct ring
> > > buffer entries. It does not restrict the quantity of data each ring
> > > entry points to.
> >
> > But that's entirely in guest memory, so it's limited to the amount of
> > RAM that has been allocated to the guest.
>
> Exactly. The guest can cause ram_size * nr_ports of additional host
> memory to be allocated. Not acceptable.

OK -- so this is how it adds up:

- guest vq
- virtio-serial-bus converts iov to buf
- qemu-char stores the buf in case it wasn't able to send

but then, since it's all async, we have:

- virtio-serial-bus frees the buf
- guest deletes the buf and removes it from the vq

So what's left is only the data in qemu-char's buf. Now this can be
(buf_size - 1) * nr_ports in the worst case.

The alternative would be to keep using the guest buffer till all's done,
but then that depends on qemu getting async support - separating out the
qemu_chr_write() into a separate thread and allowing vcpu and chr io
operations to be run simultaneously.

		Amit
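The worst-case accounting in the last exchange above can be written down
as a one-liner. The function name and the buf_size/nr_ports figures are
made-up examples; the point is only that once the guest-side buffers are
freed, the host retains at most buf_size - 1 unsent bytes per port.

```c
/* Worst-case host-side memory held by blocked writes under the scheme
 * described above: each port's qemu-char buffer can hold at most
 * buf_size - 1 pending bytes, independent of guest RAM size.
 * Hypothetical helper, example numbers. */
#include <assert.h>
#include <stddef.h>

static size_t worst_case_host_buffering(size_t buf_size, size_t nr_ports)
{
    return (buf_size - 1) * nr_ports;
}
```

Contrast this with the ram_size * nr_ports bound Paul points out for the
store-it-all-on-the-side approach: the per-port buffer keeps the host
exposure fixed rather than proportional to guest RAM.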