Hi, (Sorry for the slowness here.)
On Thu, Jan 20, 2022 at 12:45:02PM +0100, Kevin Wolf wrote: > Am 20.01.2022 um 09:02 hat Pierre Libeau geschrieben: [...] > > Hello, > > > > I'm working on a patch in nova to improve the time of file system > > freeze during live-snapshot on an instance with a local disk and I > > need your opinion about the solution I would propose. > > > > My issue during the live migration is the duration of file system > > freeze on an instance with a big local disk. [1] > > > > In my case instance have locally a disk (400Go) and the > > qemu-guest-agent is installed. > > > > Nova process like that: [2] > > dev = guest.get_block_device(disk_path) > > > > 1. guest.freeze_filesystems() > > 2. dev.rebase(disk_delta, copy=True, reuse_ext=True, shallow=True) > > 3. while not dev.is_job_complete() #wait for the end of mirroring (the > > issue is here, the waiting time depend on the size of the disk and > > the IOPS) > > 4. dev.abort_job() > > 5. guest.thaw_filesystems() > > So first of all, I have to do some translation of terminology which > seems to be different from what I am used to. First, here's the API mapping from Nova to QEMU: - rebase() is referring to a Nova's helper[b] method - ... which maps to libvirt's blockRebase() API - ... which in turns maps to QMP 'block-stream' And here's the broader QEMU and libvirt block API mapping (Eric/Peter correct me if I missed something): - QEMU 'block-commit' == blockCommit() API in libvirt - QEMU 'block-stream' == blockRebase() API in libvirt - QEMU 'drive-mirror' / 'blockdev-mirror' == blockCopy() API in libvirt - QEMU 'blockdev-backup' == backupBegin() API in libvirt > dev.rebase with copy=True seems to result in a mirror block job in QEMU? A detail: if you _only_ have copy=True, then you're right it makes a fully copy. But there's also the "shallow=True" and "reuse_ext=True". It's worth quoting (at least for me :-)) the official libvirt API docs[c] of virDomainRebase(): - "When flags includes VIR_DOMAIN_BLOCK_REBASE_COPY, this starts a copy, where base must be the name of a new file to copy the chain to. By default, the copy will pull the entire source chain into the destination file" - ... "but if flags also contains VIR_DOMAIN_BLOCK_REBASE_SHALLOW, then only the top of the source chain will be copied (the source and destination have a common backing file)" - ... VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT means, "reuse an existing file which was pre-created with the correct format and metadata and sufficient size to hold the copy" - ... "In case the VIR_DOMAIN_BLOCK_REBASE_SHALLOW flag is used the pre-created file has to exhibit the same guest visible contents as the backing file of the original image. This allows a management app to pre-create files with relative backing file names, rather than the default of absolute backing file names; as a security precaution, you should generally only use reuse_ext with the shallow flag and a non-raw destination file" > So what you're calling a snapshot here doesn't seem to be a differential > snapshot (e.g. by adding a COW overlay), but a full copy that results in > two fully independent, standalone images. Is this right? Correct. The "live snapshot" in Nova has always been full copies, afraid. > Adding a bit more context, the whole process seems to be: > > 1. Create a qcow2 for the copy of the top layer that shares the backing > file with the active image. > > 2. Freeze guest filesystems > > 3. Create a full copy of the active layer (into the new qcow2 file) > a. Start a mirror job As noted above, it's a stream job. (Assuming libvirt's blockRebase() is still caling stream under the hood) > b. Wait for the mirror job to move to the READY state > c. Cancel the mirror job with force=false, i.e. complete the mirror > job without changing the active image of the VM Yeah, the "full copy of the active layer" is what libvirt calls "shallow copy" -- shallow=True in the rebase() call above > 4. Thaw the guest filesystems > > 5. qemu-img convert the copied top layer with its full backing chain > to a standalone raw image > > 6. Delete the temporary qcow2 copy > > > My proposition is to move the freeze after the end of mirroring and > > before the stop of mirroring. [3] I have tried on an instance and the > > last written file on the fs corresponds to the end of the mirror. > > Yes, you only need the freeze around the mirror job completion, that is, > step 3c above. Thanks for confirming; I always forget these freeze semantics. > However, the whole process seems very complicated for a rather simple > operation. A comment mentions that the dance with the temporary qcow2 > file is because of a (not further specified) bug in QEMU 1.3. I believe, > libvirt hasn't supported a QEMU version that old for a while, so is this > really still a valid reason? You're right -- you spotted code-rot in Nova here; the QEMU 1.3 code-comment gives it away (although it doesn't tell what the bug was). That part[a] of the Nova code in _live_snapshot() method can be refactored to use newer libvirt/QEMU APIs. That said, some of the "undefine a guest XML and the redefine it later" dance is because blockRebase() doesn't have a way to restart a copy job on guest restart while mirroring is still intact. So the trick when using libvirt's blockRebase() for a copy-job is to temporarily make the domain "transient" (the guest.delete_configuration() ... host.write_instance_config() calls in Nova). However, blockCopy() API has a _TRANSIENT_JOB that works around the limitation of blockRebase() Overall, wherever Nova can, it should completely use replace blockRebase() usage with one of the following APIs: - virDomainBlockCopy() -- blockCopy() -- this is already used by Nova today; but not consistently - virDomainBackupBegin() -- backupBegin() - virDomainBackupGetXMLDesc() -- backupGetXMLDesc() - virDomainCheckpointCreateXML() -- checkpointCreateXML() - virDomainCheckpointDelete() > But what I would actually have used is a backup block job, which makes > sure that the copy will contain the disk content at the point of time > when the block job was started rather than when it happened to complete. I agree, I'd prefer that too for the long term -- using the backup APIs above. I _think_ Pierre can get his problem solved with libvirt's blockCopy() API. Pierre, Nova has a wrapper for it, look at the usage of the copy() wrapper method[d] in Nova. [...] [a] https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L3166,L3190 [b] https://github.com/openstack/nova/blob/master/nova/virt/libvirt/guest.py#L745,L767 [c] https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainBlockRebase [d] https://github.com/openstack/nova/blob/master/nova/virt/libvirt/guest.py#L729,#L743 -- /kashyap