Re: [pve-devel] Bug 1458 - PVE 5 live migration downtime degraded to several seconds (compared to PVE 4)

Alexandre DERUMIER Fri, 28 Jul 2017 02:22:29 -0700

>>I wonder whether reusing (/extending) the existing SSH tunnel for the 
>>commands we run on the target node might reduce the overhead as well? 
>>for cleanup in error cases opening a new connection is probably still 
>>advisable.


Yes, maybe. I don't know whether the time goes into forking the qm process, 
establishing the SSH tunnel, or waiting for the response. I'll try to add a 
timer for this.
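
One way to tell where the time goes is to time each remote invocation 
separately. A minimal sketch in Python (not the Perl used by the migration 
code); `timed_run` is a hypothetical helper name, not something from the PVE 
code base:

```python
import subprocess
import time


def timed_run(cmd):
    """Run cmd and return (exit_code, elapsed_seconds) in wall-clock time."""
    start = time.perf_counter()
    # Output is discarded; only the exit code and the duration matter here.
    proc = subprocess.run(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return proc.returncode, time.perf_counter() - start
```

Comparing `timed_run(['ssh', 'root@target', 'true'])` with 
`timed_run(['ssh', 'root@target', 'qm', 'status', '100'])` would roughly 
separate the SSH connection set-up from the start-up cost of the qm process. 
The host name and VMID here are placeholders.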

Another idea: why not make an HTTPS API call through pveproxy directly?




I have verified this with the QMP status:

Without the pvesr call: around 20ms

2017-07-28 10:24:45,184 -- VM status: paused (inmigrate)
2017-07-28 10:24:45,208 -- VM status: running


With the pvesr call: around 4s

2017-07-28 10:38:28,711 -- VM status: paused (inmigrate)
2017-07-28 10:38:28,745 -- VM status: paused
2017-07-28 10:38:28,799 -- VM status: paused
2017-07-28 10:38:28,818 -- VM status: paused
2017-07-28 10:38:28,837 -- VM status: paused
....
2017-07-28 10:38:33,912 -- VM status: running
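
The downtime can be read off such a status log mechanically. A minimal sketch 
in Python, assuming the exact line format shown above; `downtime_seconds` is 
a hypothetical helper, and the result is only as precise as the polling 
interval:

```python
from datetime import datetime

MARKER = " -- VM status: "


def downtime_seconds(lines):
    """Seconds from the first 'paused' sample to the next 'running' sample.

    Returns None if the log never shows a paused-then-running transition.
    """
    first_paused = None
    for line in lines:
        stamp, sep, status = line.partition(MARKER)
        if not sep:
            continue  # skip elided or unrelated lines such as "...."
        when = datetime.strptime(stamp.strip(), "%Y-%m-%d %H:%M:%S,%f")
        if status.startswith("paused"):
            if first_paused is None:
                first_paused = when
        elif status.startswith("running") and first_paused is not None:
            return (when - first_paused).total_seconds()
    return None
```

Feeding it the two excerpts above gives the gap between the first paused 
sample and the first running sample in each log.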







----- Original Message -----
From: "Fabian Grünbichler" <f.gruenbich...@proxmox.com>
To: "pve-devel" <pve-devel@pve.proxmox.com>
Sent: Friday, 28 July 2017 10:46:55
Subject: Re: [pve-devel] Bug 1458 - PVE 5 live migration downtime degraded to 
several seconds (compared to PVE 4)

On Fri, Jul 28, 2017 at 10:09:55AM +0200, Alexandre DERUMIER wrote: 
> 
> I have added some timers and done a migration without storage replication. 
> 
> -> main migration loop: 150ms increase (it's lower if I put in a usleep of 1ms). 
> 
> 2017-07-28 10:00:10 transfer_replication_state: 1.436832 
> 2017-07-28 10:00:10 move config: 0.001174 
> 2017-07-28 10:00:10 switch_replication_job_target: 0.003125 
> 2017-07-28 10:00:12 qm resume: 1.634583 -> (this is the time measured on the 
> source until the response arrives; not sure exactly how much time it takes 
> on the remote) 

I guess only marginally less on the target until the VM is actually 
resumed. 

> 
> It seems to be transfer_replication_state, which calls 
> my $cmd = [ @{$self->{rem_ssh}}, 'pvesr', 'set-state', $self->{vmid}, 
> $state]; 
> 
> 
> I think calling a remote qm command takes some time to get a response. 
> Note that I don't use pvesr, so I think we should bypass these commands if 
> they are not needed. 
> 

yes, checking whether a state / job exists earlier on, and only 
transferring state and switching the direction conditionally if needed 
would be an improvement for sure. 
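
The conditional flow could be sketched roughly as follows. This is an 
illustration in Python, not the actual Perl migration code: the step names 
are taken from the timers quoted above, while `post_migration_steps` and its 
`has_replication_job` flag are hypothetical.

```python
def post_migration_steps(has_replication_job):
    """Return the ordered step names to run once the VM state has moved.

    The replication-related steps are included only when a replication
    job (and therefore state to transfer) exists for the guest.
    """
    steps = []
    if has_replication_job:
        steps.append("transfer_replication_state")
    steps.append("move config")
    if has_replication_job:
        steps.append("switch_replication_job_target")
    steps.append("qm resume")
    return steps
```

Without a replication job, the two pvesr-related steps drop out and only the 
config move and the resume remain on the critical path.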

I wonder whether reusing (/extending) the existing SSH tunnel for the 
commands we run on the target node might reduce the overhead as well? 
for cleanup in error cases opening a new connection is probably still 
advisable. 

those two improvements might get us into the <1s range again, without 
sacrificing consistency on the way. 

_______________________________________________ 
pve-devel mailing list 
pve-devel@pve.proxmox.com 
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel 
