On Sat, Jan 12, 2019 at 12:50:12PM +0800, Yongji Xie wrote: > On Fri, 11 Jan 2019 at 23:53, Stefan Hajnoczi <stefa...@gmail.com> wrote: > > > > On Thu, Jan 10, 2019 at 06:59:27PM +0800, Yongji Xie wrote: > > > On Thu, 10 Jan 2019 at 18:25, Stefan Hajnoczi <stefa...@gmail.com> wrote: > > > > > > > > On Wed, Jan 09, 2019 at 07:27:21PM +0800, elohi...@gmail.com wrote: > > > > > From: Xie Yongji <xieyon...@baidu.com> > > > > > > > > > > This patchset is aimed at supporting qemu to reconnect > > > > > vhost-user-blk backend after vhost-user-blk backend crash or > > > > > restart. > > > > > > > > > > The patch 1 uses exisiting wait/nowait options to make QEMU not > > > > > do a connect on client sockets during initialization of the chardev. > > > > > > > > > > The patch 2 introduces two new messages VHOST_USER_GET_INFLIGHT_FD > > > > > and VHOST_USER_SET_INFLIGHT_FD to support providing shared > > > > > memory to backend. > > > > > > > > Can you describe the problem that the inflight I/O shared memory region > > > > solves? > > > > > > > > > > The backend need to get inflight I/O and do I/O replay after restart. > > > Now we can only get used_idx in used_ring. It's not enough. Because we > > > might not process descriptors in the same order which they have been > > > made available sometimes. A simple example: > > > https://patchwork.kernel.org/cover/10715305/#22375607. So we need a > > > shared memory to track inflight I/O. > > > > The inflight shared memory region is something that needs to be > > discussed in detail to make sure it is correct not just for vhost-blk > > but also for other vhost-user devices in the future. > > > > Please expand the protocol documentation so that someone can implement > > this feature without looking at the code. Explain the reason for the > > inflight shared memory region and how exactly it used. > > > > OK, will do it in v5. > > > After a quick look at the shared memory region documentation I have a > > few questions: > > > > 1. The avail ring only contains "head" indices, each of which may chain > > non-head descriptors. Does the inflight memory region track only the > > heads? > > > > Yes, we only track the head in inflight region.
Okay, thanks. That is useful information to include in the specification. > > 2. Does the inflight shared memory region preserve avail ring order? > > For example, if the guest submitted 5 2 4, will the new vhost-user > > backend keep that order or does it see 2 4 5? > > > > Now we don't support to resubmit I/O in order. But I think we can add > a timestamp > for each inflight descriptor in shared memory to achieve that. Great, the reason I think that feature is interesting is that other device types may need to preserve order. It depends on the exact meaning of the device's requests... > > I'm concerned that this approach to device recovery is invasive and hard > > to test. Instead I would use VIRTIO's Device Status Field > > DEVICE_NEEDS_RESET bit to tell the guest driver that a reset is > > necessary. This is more disruptive - drivers either have to resubmit or > > fail I/O with EIO - but it's also simple and more likely to work > > correctly (it only needs to be implemented correctly in the guest > > driver, not in the many available vhost-user backend implementations). > > > > So you mean adding one way to notify guest to resubmit inflight I/O. I > think it's a good idea. But would it be more flexible to implement > this in backend. We can support old guest. And it will be easy to fix > bug or add feature. There are trade-offs with either approach. In the long run I think it's beneficial minimize non-trivial logic in vhost-user backends. There will be more vhost-user backend implementations and therefore more bugs if we put the logic into the backend. This is why I think a simple mechanism for marking the device as needing a reset will be more reliable and less trouble. Stefan
signature.asc
Description: PGP signature