Marc-André Lureau <mlur...@redhat.com> writes:

> Hi
>
> ----- Original Message -----
>> > "role" was designed to only migrate the master. Ability to migrate a
>> > pool of peer would be a significant new feature. I am not aware of
>> > such request.
>>
>> I see. But how is this supposed to work?
>>
>> Before migration: one master and N peers connected to the server on host
>> A, N>=0.
>>
>> After migration: one master and N' of the N peers connected to the
>> server on host B, N>=N'>=0, and the remaining N-N' peers still on host A
>> with their ivshmem device unplugged.
>>
>> How would I do this even for N'==0? I can't see how I'm supposed to
>> connect the migrated shared memory to a server on host B.
>
> I am not sure I understand you.
>
> You can't migrate the peers.

Then explain the case N'=0 to me: how can you migrate the master so that
it's connected to a server afterwards?

> As I said, "ability to migrate a pool of peer would be a significant
> new feature".
>
>> >> Did you try chardev=...,size=S, where S is larger than what the server
>> >> provides?
>> >
>> > It will fall in check_shm_size().
>>
>> Yes. Called from ivshmem_read(). ivshmem_read() will then complain to
>> stderr, close the file descriptor we got from the server and leave the
>> BAR unmapped. My question is how guests deal with that state. Could be
>> anything from "detect the device is broken and fence it" to "kernel
>> panic".
>> Whatever it is, it could easily also happen if the guest wins the race
>> with the server and tries to use the device before it successfully got
>> its shared memory from the server.
>
> It's nothing bad from what I remember on qemu side. On guest side, it
> depends how your driver/user is implemented I suppose. To me, it's not
> a normal case, and the error should be enough to diagnose it.
>
>> 1. Unless the guest can reliably detect the doorbell feature, the
>>    doorbell feature is *broken*.
>>
>>    As far as I can tell, a device with a doorbell behaves exactly like
>>    one without a doorbell until it got its shared memory from the
>>    server. If that's correct, then doorbell detection is inherently
>>    racy.
>
> There are many ways you can do synchronization.
> In test_ivshmem_server(), I trivially wait for the membar with the
> required signature to be mapped. Verify that peers have different ids,
> and then start using the doorbell. That seems good enough.
>
>>    The only way to fix this in documentation is "broken, do not use".
>
> It works fine in the tests. Feel free to point out races or other issues.

I think I did: doorbell detection is inherently racy. If you think it
isn't, please refute my reasoning.
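
To spell out the reasoning, here is a toy model in Python. It is not QEMU
code: the class, the function names and the NO_ID sentinel are made up for
illustration, and it simply assumes what I said above, namely that a
doorbell device looks like a plain one until the server has delivered the
shared memory.

```python
# Assumed sentinel for "no ID yet"; the real register value is not the point.
NO_ID = -1


class ToyIvshmem:
    """Toy model of the guest-visible state. Hypothetical, not QEMU code."""

    def __init__(self, has_doorbell):
        self.has_doorbell = has_doorbell
        self.peer_id = NO_ID  # nothing heard from a server yet

    def server_handshake(self, peer_id):
        # Only a doorbell device is connected to a server at all.
        if self.has_doorbell:
            self.peer_id = peer_id

    def read_ivposition(self):
        return self.peer_id


def guest_detects_doorbell(dev):
    # The only probe the guest has in this model: look at IVPosition.
    return dev.read_ivposition() != NO_ID


plain = ToyIvshmem(has_doorbell=False)
early = ToyIvshmem(has_doorbell=True)   # guest looks before the handshake
late = ToyIvshmem(has_doorbell=True)
late.server_handshake(peer_id=3)        # guest looks after the handshake
```

In this model the guest gets the same answer for `plain` and `early`, and
the right answer only for `late`. Which of the two it sees depends purely
on whether it or the server wins.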

>>    The maximally compatible way to fix this in code is to ensure the
>>    guest can't read register IVPosition before we got the shared memory
>>    from the server. We can make realize wait, or the read. The latter
>>    is probably an even worse idea.
>>
>>    An easier way to fix it in code is splitting up the device, so guests
>>    can simply check the PCI device ID to figure out whether they have
>>    one with a doorbell.
>>
>>    An even easier way is dropping the doorbell feature outright.
>>
>> 2. The UI is crap.
>>
>>    We can fix this by rejecting nonsensical option combinations.
>
> Yes, I think it's the simplest way for now. I dislike having to break
> stuff when you can overcome it with a few more checks.
>
>>    However, the result will be more complex than splitting the device in
>>    two so that nonsensical options combinations are simply impossible.
>
> I disagree, adding more checks will add a few dozen lines with minimal
> impact. Splitting things will break stuff and require significant
> effort to share correctly what can be shared etc.
>
>>    If we need to split it anyway to fix the doorbell, we can clean up
>>    the UI at next to no cost.
>
> I don't think the doorbell is broken.

If it's not broken, please explain to me how the guest should find out
whether its ivshmem device sports a doorbell.
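
For comparison, a toy model of the split-device idea, again in Python and
not QEMU code. The two device IDs are made-up placeholders, not real PCI
IDs. The only point is that the ID is fixed when the device is created, so
the answer can't depend on when the guest looks.

```python
# Hypothetical placeholder IDs, chosen only for this sketch.
PLAIN_DEVICE_ID = 0x0001
DOORBELL_DEVICE_ID = 0x0002


class ToySplitDevice:
    """Toy model of a split device. Hypothetical, not QEMU code."""

    def __init__(self, device_id):
        self.device_id = device_id      # fixed at creation, never changes
        self.got_shared_memory = False

    def server_handshake(self):
        self.got_shared_memory = True


def guest_has_doorbell(dev):
    # The guest only inspects the device ID, which does not depend on timing.
    return dev.device_id == DOORBELL_DEVICE_ID


dev = ToySplitDevice(DOORBELL_DEVICE_ID)
before = guest_has_doorbell(dev)   # guest probes before the handshake
dev.server_handshake()
after = guest_has_doorbell(dev)    # guest probes after the handshake
```

Here `before` and `after` agree, whoever wins the race.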