On Mon, Oct 07, 2024 at 04:52:43PM -0400, Steven Sistare wrote: > On 10/7/2024 11:27 AM, Peter Xu wrote: > > On Mon, Sep 30, 2024 at 12:40:35PM -0700, Steve Sistare wrote: > > > Stop the vm earlier for cpr, to guarantee consistent device state when > > > CPR state is saved. > > > > Could you add some more info on why this order matters? > > > > E.g., qmp_migrate should switch migration state machine to SETUP, while > > this path holds BQL, I think it means there's no way devices got hot added > > concurrently of the whole process. > > > > Would other things change in the cpr states (name, fd, etc.)? It'll be > > great to mention these details in the commit message. > > Because of the new cpr-state save operation needed by this mode, > I created this patch to be future proof. Performing a save operation while > the machine is running is asking for trouble. But right now, I am not aware > of any specific issues. > > Later in the "tap and vhost" series there is another reason to stop the vm > here and > save cpr state, because the devices must be stopped in old qemu before they > are initialized in new qemu. If you are curious, see the 2 patches I attached > to the email at > > https://lore.kernel.org/qemu-devel/fa95c40d-b5e5-41eb-bba7-7842bca2f...@oracle.com/ > But, that has nothing to do with the contents of cpr state.
Then I suggest we leave this patch to the vhost/tap series, then please document clearly in the commit mesasge on why this is needed. Linking to that discussion thread could work too. Side note: I saw you have MIG_EVENT_PRECOPY_CPR_SETUP in you own tree, I wonder whether we could reuse MIG_EVENT_PRECOPY_SETUP by moving it earlier in qmp_migrate(). After all CPR-* notifiers are already registered separately with the list of migration_state_notifiers[], so I suppose it'll service the same purpose. But we can discuss that later. -- Peter Xu