About virsh(1) and Postcopy migration

2024-08-29 Thread Prasad Pandit
Hello,

* virsh(1) offers multiple options to initiate Postcopy migration:

1) virsh migrate --postcopy --postcopy-after-precopy
2) virsh migrate --postcopy + virsh migrate-postcopy
3) virsh migrate --postcopy --timeout  --timeout-postcopy

When Postcopy migration is invoked via options (2) or (3) above, the migrated 
guest on the destination host sometimes hangs. Such a hang is not 
reproducible with option (1) above.

* When using option (1) above, libvirtd(8) waits for the first pass of pre-copy 
to finish before enabling postcopy migration.


* Does the same waiting happen when using options (2) and (3) above?
===
2024-07-24 14:16:27.448+: msg={"execute":"migrate"
2024-07-24 14:16:29.318+: msg={"execute":"migrate-start-postcopy"
2024-07-24 14:28:39.737+: msg={"execute":"migrate"
2024-07-24 14:28:41.119+: msg={"execute":"migrate-start-postcopy"
2024-07-24 14:44:11.684+: msg={"execute":"migrate"
2024-07-24 14:44:12.835+: msg={"execute":"migrate-start-postcopy"
2024-07-24 14:48:00.675+: msg={"execute":"migrate"
2024-07-24 14:48:02.319+: msg={"execute":"migrate-start-postcopy"
2024-07-24 15:03:36.110+: msg={"execute":"migrate"
2024-07-24 15:03:37.341+: msg={"execute":"migrate-start-postcopy"
2024-07-24 16:05:25.602+: msg={"execute":"migrate"
2024-07-24 16:05:26.756+: msg={"execute":"migrate-start-postcopy"
===

* While running migration tests with options (2) and (3) above, the switch to 
postcopy appears to happen within 2 seconds of starting migration.
  - Is that a reasonable time to switch from pre-copy to postcopy?
  - Is there an ideal time to wait before switching to postcopy?
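
As a rough cross-check of the "within 2 seconds" observation, the delay between each "migrate" and the following "migrate-start-postcopy" command in the log excerpt above can be computed with a short script. This is only a sketch: the timestamps are copied from the excerpt with the trailing "+" marker dropped, and the names LOG, switch_delays and DELAYS are made up for illustration.

```python
from datetime import datetime

# Timestamps copied from the log excerpt; the trailing "+" marker and the
# JSON framing are dropped so that only "date time command" remains.
LOG = """\
2024-07-24 14:16:27.448 migrate
2024-07-24 14:16:29.318 migrate-start-postcopy
2024-07-24 14:28:39.737 migrate
2024-07-24 14:28:41.119 migrate-start-postcopy
2024-07-24 14:44:11.684 migrate
2024-07-24 14:44:12.835 migrate-start-postcopy
2024-07-24 14:48:00.675 migrate
2024-07-24 14:48:02.319 migrate-start-postcopy
2024-07-24 15:03:36.110 migrate
2024-07-24 15:03:37.341 migrate-start-postcopy
2024-07-24 16:05:25.602 migrate
2024-07-24 16:05:26.756 migrate-start-postcopy
"""


def switch_delays(log_text):
    """Return the seconds between each 'migrate' and the next
    'migrate-start-postcopy' command (hypothetical helper)."""
    delays = []
    started = None
    for line in log_text.splitlines():
        if not line.strip():
            continue
        date, time, command = line.split()
        stamp = datetime.strptime(f"{date} {time}", "%Y-%m-%d %H:%M:%S.%f")
        if command == "migrate":
            started = stamp
        elif command == "migrate-start-postcopy" and started is not None:
            delays.append(round((stamp - started).total_seconds(), 3))
            started = None
    return delays


DELAYS = switch_delays(LOG)
print(DELAYS)
```

For the six runs shown, every delay comes out between roughly 1.15 and 1.87 seconds.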

* The feature page below suggests waiting until one cycle of RAM migration has 
completed
  -> https://wiki.qemu.org/Features/PostCopyLiveMigration 


* I'd much appreciate any clarification/confirmation about this.

Thank you.
---
  -Prasad


Re: About virsh(1) and Postcopy migration

2024-08-29 Thread Jiri Denemark
On Thu, Aug 29, 2024 at 10:11:05 +, Prasad Pandit wrote:
> Hello,
> 
> * virsh(1) offers multiple options to initiate Postcopy migration:
> 
> 1) virsh migrate --postcopy --postcopy-after-precopy
> 2) virsh migrate --postcopy + virsh migrate-postcopy
> 3) virsh migrate --postcopy --timeout  --timeout-postcopy
> 
> When Postcopy migration is invoked via options (2) or (3) above, the migrated 
> guest on the destination host sometimes hangs. Such a hang is not 
> reproducible with option (1) above.
> 
> * When using option (1) above, libvirtd(8) waits for the first pass of 
> pre-copy to finish before enabling postcopy migration.

Right.

> * Does the same waiting happen when using options (2) and (3) above?

No. The explicit "virsh migrate-postcopy" request expects the user to
decide when to switch to post-copy by monitoring the migration.

> ===
> 2024-07-24 14:16:27.448+: msg={"execute":"migrate"
> 2024-07-24 14:16:29.318+: msg={"execute":"migrate-start-postcopy"
> 2024-07-24 14:28:39.737+: msg={"execute":"migrate"
> 2024-07-24 14:28:41.119+: msg={"execute":"migrate-start-postcopy"
> 2024-07-24 14:44:11.684+: msg={"execute":"migrate"
> 2024-07-24 14:44:12.835+: msg={"execute":"migrate-start-postcopy"
> 2024-07-24 14:48:00.675+: msg={"execute":"migrate"
> 2024-07-24 14:48:02.319+: msg={"execute":"migrate-start-postcopy"
> 2024-07-24 15:03:36.110+: msg={"execute":"migrate"
> 2024-07-24 15:03:37.341+: msg={"execute":"migrate-start-postcopy"
> 2024-07-24 16:05:25.602+: msg={"execute":"migrate"
> 2024-07-24 16:05:26.756+: msg={"execute":"migrate-start-postcopy"
> ===
> 
> * While running migration tests with options (2) and (3) above, the switch to 
> postcopy appears to happen within 2 seconds of starting migration.
>   - Is that a reasonable time to switch from pre-copy to postcopy?

No, that's not very reasonable. Basically every memory page access would
have to be delayed until the page is transferred from the source host.

>   - Is there an ideal time to wait before switching to postcopy?

Not really. As the name suggests this is meant as a timeout, i.e.,
switch to post-copy if pre-copy migration is taking too long and thus is
unlikely to ever converge. So logically the timeout should be long
enough to give pre-copy migration a chance to do its job. In this case, switching
to post-copy is an alternative approach to CPU throttling for helping
migration to converge.

> * The feature page below suggests waiting until one cycle of RAM migration 
> has completed
>   -> https://wiki.qemu.org/Features/PostCopyLiveMigration 

Right, that's definitely a good approach as only memory pages that
changed during migration will have to be transferred from the source.

Jirka
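
The two recommendations in this thread (do not switch before the first pre-copy pass has finished, and treat the timeout as a fallback for migrations that fail to converge) could be combined by a monitoring script along the following lines. This is a minimal sketch of one possible policy, not libvirt's implementation: the function name and its parameters are hypothetical, and how the caller obtains the number of completed passes and the elapsed time is left open.

```python
def should_start_postcopy(completed_passes, elapsed_s, timeout_s=0.0):
    """Hypothetical policy for deciding when to request post-copy.

    completed_passes: full pre-copy passes over guest RAM finished so far
                      (assumed to be supplied by the caller's monitoring).
    elapsed_s:        seconds since the migration was started.
    timeout_s:        how long pre-copy is given to converge on its own;
                      0 means "switch right after the first pass".
    """
    # Never switch before one full pass: otherwise nearly every page the
    # guest touches must first be fetched from the source host.
    if completed_passes < 1:
        return False
    # After the first pass, switch only once pre-copy has used up its budget.
    return elapsed_s >= timeout_s
```

With the default timeout_s=0 the function returns True as soon as one pass is complete, which mirrors the first-pass approach discussed above. With a larger timeout_s it acts as the convergence fallback, while still refusing to switch during the first pass.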