Hi

----- Original Message -----
> > "role" was designed to only migrate the master. Ability to migrate a
> > pool of peers would be a significant new feature. I am not aware of
> > such a request.
>
> I see.  But how is this supposed to work?
>
> Before migration: one master and N peers connected to the server on host
> A, N>=0.
>
> After migration: one master and N' of the N peers connected to the
> server on host B, N>=N'>=0, and the remaining N-N' peers still on host A
> with their ivshmem device unplugged.
>
> How would I do this even for N'==0?  I can't see how I'm supposed to
> connect the migrated shared memory to a server on host B.

I am not sure I understand you. You can't migrate the peers. As I said,
"ability to migrate a pool of peers would be a significant new feature".

> >> Did you try chardev=...,size=S, where S is larger than what the server
> >> provides?
> >
> > It will fall in check_shm_size().
>
> Yes.  Called from ivshmem_read().  ivshmem_read() will then complain to
> stderr, close the file descriptor we got from the server and leave the
> BAR unmapped.  My question is how guests deal with that state.  Could be
> anything from "detect the device is broken and fence it" to "kernel
> panic".
> Whatever it is, it could easily also happen if the guest wins the race
> with the server and tries to use the device before it successfully got
> its shared memory from the server.

As far as I remember, nothing bad happens on the QEMU side. On the guest
side, I suppose it depends on how your driver or user code is implemented.
To me, this is not a normal case, and the error should be enough to
diagnose it.

> 1. Unless the guest can reliably detect the doorbell feature, the
>    doorbell feature is *broken*.
>
>    As far as I can tell, a device with a doorbell behaves exactly like
>    one without a doorbell until it got its shared memory from the
>    server.  If that's correct, then doorbell detection is inherently
>    racy.

There are many ways you can do synchronization. In test_ivshmem_server(),
I trivially wait for the membar with the required signature to be mapped,
verify that the peers have different IDs, and then start using the
doorbell. That seems good enough.

>    The only way to fix this in documentation is "broken, do not use".

It works fine in the tests. Feel free to point out races or other issues.

>    The maximally compatible way to fix this in code is to ensure the
>    guest can't read register IVPosition before we got the shared memory
>    from the server.  We can make realize wait, or the read.  The latter
>    is probably an even worse idea.
>
>    An easier way to fix it in code is splitting up the device, so guests
>    can simply check the PCI device ID to figure out whether they have
>    one with a doorbell.
>
>    An even easier way is dropping the doorbell feature outright.
>
> 2. The UI is crap.
>
>    We can fix this by rejecting nonsensical option combinations.

Yes, I think it's the simplest way for now. I dislike having to break
stuff when you can overcome it with a few more checks.

>    However, the result will be more complex than splitting the device in
>    two so that nonsensical options combinations are simply impossible.

I disagree: adding more checks will add a few dozen lines with minimal
impact. Splitting things will break stuff and require significant effort
to correctly share what can be shared, etc.

>    If we need to split it anyway to fix the doorbell, we can clean up
>    the UI at next to no cost.

I don't think the doorbell is broken.
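
The guest-side synchronization described earlier in this reply (wait for
the signature in the shared memory, check that the IDs differ, only then
ring the doorbell) could be sketched roughly as below. This is only an
illustration under stated assumptions, not the code from
test_ivshmem_server(): struct fake_ivshmem, SHM_SIGNATURE, shm_ready(),
doorbell_usable() and wait_until_usable() are hypothetical names, and a
negative ID is just this sketch's way of modelling "not assigned yet".

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical marker the peers agree to place at offset 0 of the region. */
#define SHM_SIGNATURE "ivshmem-ready"

/* Hypothetical model of what one guest can observe; not a QEMU structure. */
struct fake_ivshmem {
    const uint8_t *shm;   /* NULL while the shared-memory BAR is unmapped */
    size_t shm_size;      /* size of the mapped region in bytes */
    int32_t id;           /* peer ID; negative models "not assigned yet" */
};

/* True once the region is mapped and starts with the agreed signature. */
static bool shm_ready(const struct fake_ivshmem *dev)
{
    const size_t len = sizeof(SHM_SIGNATURE) - 1;   /* without the NUL */

    return dev->shm != NULL && dev->shm_size >= len &&
           memcmp(dev->shm, SHM_SIGNATURE, len) == 0;
}

/* The doorbell is only used when both sides are ready and the IDs differ. */
static bool doorbell_usable(const struct fake_ivshmem *self,
                            const struct fake_ivshmem *peer)
{
    return shm_ready(self) && shm_ready(peer) &&
           self->id >= 0 && peer->id >= 0 && self->id != peer->id;
}

/*
 * Check, then let the optional step callback run, up to max_polls times,
 * and check once more at the end. The callback stands in for "let the
 * server make progress" (sleep, run a main loop iteration, and so on).
 */
static bool wait_until_usable(struct fake_ivshmem *self,
                              struct fake_ivshmem *peer, int max_polls,
                              void (*step)(struct fake_ivshmem *,
                                           struct fake_ivshmem *))
{
    for (int i = 0; i < max_polls; i++) {
        if (doorbell_usable(self, peer)) {
            return true;
        }
        if (step) {
            step(self, peer);
        }
    }
    return doorbell_usable(self, peer);
}
```

A driver following this pattern would call wait_until_usable() with a
bounded poll count and treat a false return as "device not usable", which
also covers the case where the BAR is left unmapped after a size mismatch.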