On Mon, May 02, 2016 at 01:45:31PM +0300, Michael S. Tsirkin wrote: > On Sun, May 01, 2016 at 02:04:42PM -0700, Yuanhan Liu wrote: > > On Sun, May 01, 2016 at 02:37:19PM +0300, Michael S. Tsirkin wrote: > > > On Fri, Apr 29, 2016 at 10:48:35AM -0700, Yuanhan Liu wrote: > > > > > But, I > > > > > would worry first about a backend that crashes that it may corrupt the > > > > > VM memory too... > > > > > > > > Not quite sure I follow this. But, backend just touches the virtio > > > > related memory, so it will do no harm to the VM? > > > > > > It crashed so apparently it's not behaving as designed - > > > how can we be sure it does not harm the VM? > > > > Hi Michael, > > > > What I know so far, a crash might could corrupt the virtio memory in two > > ways: > > > > - vring_used_elem might be in stale state when we are in the middle of > > updating used vring. Say, we might have updated the "id" field, but > > leaving "len" untouched. > > > > - vring desc buff might also be in stale state, when we are copying > > data into it. > > > - a random write into VM memory due to backend bug corrupts it. > > > However, the two issues will not be real issue, as used->idx is not > > updated in both case. Thefore, those corrupted memory will not be > > processed by guest. So, no harm I'd say. Or, am I missing anything? > > Why is backend crashing? It shouldn't so maybe this means it's > memory is corrupt. That is the claim.
As far as virtio memory is not corrupted (or even corrupt in above ways), there would be no issue. But, yes, you made a good point: we make no guarantees that it's not the virtio memory corruption caused the crash. > > BTW, Michael, what's your thoughts on the whole reconnect stuff? > > > > --yliu > > My thoughts are that we need to split these patchsets up. > > There are several issues here: > > > 1. Graceful disconnect > - One should be able to do vmstop, disconnect, connect then vm start > This looks like a nice intermediate step. > - Why would we always assume it's always remote initiating the disconnect? > Monitor commands for disconnecting seem called for. A monitor command is a good suggestion. But I'm thinking why vmstop is necessary. Basically, a disconnect is as to a cable unplug to physical NIC; we don't need pause it before unplugging. > > 2. Unexpected disconnect > - If buffers are processed in order or flushed before socket close, > then on disconnect status can be recovered > from ring alone. If that is of interest, we might want to add > a protocol flag for that. F_DISCONNECT_FLUSH ? Sorry, what does this flag supposed to work here? > Without this, > only graceful disconnect or reset with guest's help can be supported. > - As Marc-André said, without graceful shutdown it is not enough to > handle ring state though. We must handle socket getting disconnected > in the middle of send/receive. While it's more work, > it does seem like a nice interface to support. Same as above, what the backend (or QEMU) can do for this case without the guest's (reset) help? > - I understand the concern that existing guests do not handle NEED_RESET > status. One way to fix this would be a new feature bit. If NEED_RESET not > handled, I could check the code by myself, but I'm thinking it might be trivial for you to answer my question: how do we know that NEED_RESET is not handled? > request hot-unplug instead. Same as above, may I know how to request a hot-unplug? > > 3. Running while disconnected > - At the moment, we wait on vm start for remote to connect, > if we support disconnecting backend without stopping > we probably should also support starting it without waiting > for connection Agreed. I guess that would anaswer my first question: we don't actually need to do vmstop before disconnect. --yliu > - Guests expect tx buffers to be used in a timely manner, thus: > - If vm is running we must use them in qemu, maybe discarding packets > in the process. > There already are commands for link down, I'm not sure there's value > in doing this automatically in qemu. > - Alternatively, we should also have an option to stop vm automatically (like > we do on > disk errors) to reduce number of dropped packets. > > 4. Reconnecting > - When acting as a server, we might want an option to go back to > listening state, but it should not be the only option, > there should be a monitor command for moving it back to > listening state. > - When acting as a client, auto-reconnect might be handy sometimes, but > should not be the only > option, there should be a way to manually request connect, possibly to > a different target, so a monitor command for re-connecting seems called for. > - We'll also need monitor events for disconnects so management knows it > must re-connect/restart listening. > - If we stopped VM, there could be an option to restart on reconnect. > > > -- > MST