Hi,

We have seen sporadic nbdstop errors when migrating VMs with rbd storage 
+ writeback cache between 2 different clusters
(this is without my other targetcpu patch):


2023-09-28 16:20:39 ERROR: error - tunnel command '{"cmd":"nbdstop"}' failed - 
failed to handle 'nbdstop' command - VM 140 qmp command 'nbd-server-stop' 
failed - got timeout
2023-09-28 16:20:39 ERROR: migration finished with problems (duration 00:01:42)


I'm not sure, but it may be related to writeback: it has never happened with a 
freshly started VM, but VMs that have been running for some time can trigger it.
(Maybe nbd needs to flush pending data in the cache?)


Currently, the tunnel command has a 30s timeout, but the qmp command only 
has 5s.
Also, the tunnel v2 command doesn't have any eval, so the migration aborts, 
keeping both source and target VM locked.
Unlocking the target VM and resuming it manually works, so it really seems to 
be a too-low timeout.
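
To illustrate the two changes, here is a rough sketch in Python (not the actual 
Perl code in PVE/QemuMigrate.pm or PVE/QemuServer.pm). All function names are 
hypothetical, the stop call is simulated without any real QMP traffic, and what 
the migration does after a caught error is an assumption. It only shows why an 
inner timeout of 5s can fire long before the 30s tunnel timeout, and how 
catching the error (eval in Perl, try/except here) lets the caller continue 
instead of aborting.

```python
TUNNEL_TIMEOUT = 30   # seconds, tunnel command timeout quoted above
QMP_TIMEOUT_OLD = 5   # seconds, current qmp timeout
QMP_TIMEOUT_NEW = 25  # seconds, proposed value, still below the tunnel timeout


class QmpTimeout(Exception):
    """Raised by the simulated qmp call when the stop takes too long."""


def nbd_server_stop(stop_seconds, qmp_timeout):
    # Hypothetical stand-in for the 'nbd-server-stop' qmp command.
    # stop_seconds simulates how long the server needs to stop.
    if stop_seconds > qmp_timeout:
        raise QmpTimeout("got timeout")
    return "stopped"


def finish_migration(stop_seconds, qmp_timeout):
    errors = []
    try:
        # Equivalent of wrapping the nbdstop tunnel command in eval { ... }.
        nbd_server_stop(stop_seconds, qmp_timeout)
    except QmpTimeout as err:
        # Record the error instead of dying in the middle of the migration.
        errors.append(f"nbdstop failed: {err}")
    # Assumption: the remaining steps are reached whether or not nbdstop failed.
    remaining_steps_ran = True
    return errors, remaining_steps_ran
```

With a simulated stop time of 12s, finish_migration(12, QMP_TIMEOUT_OLD) 
records a timeout error but still reaches the remaining steps, while 
finish_migration(12, QMP_TIMEOUT_NEW) completes without error.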


Alexandre Derumier (2):
  nbd_stop: increase timeout to 25s
  migration: add missing eval on nbdstop with tunnel v2.

 PVE/QemuMigrate.pm | 8 +++++++-
 PVE/QemuServer.pm  | 2 +-
 2 files changed, 8 insertions(+), 2 deletions(-)

-- 
2.39.2


_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel