Hi,

I have noticed that we already send a guest-ping in
PVE::QemuServer::qga_check_running($vmid):

    sub qga_check_running {
        my ($vmid) = @_;

        eval { vm_mon_cmd($vmid, "guest-ping", timeout => 3); };
        if ($@) {
            warn "Qemu Guest Agent is not running - $@";
            return 0;
        }
        return 1;
    }

It is already used in vzdump and other places, e.g.:

    if ($self->{vmlist}->{$vmid}->{agent} && $vm_is_running) {
        $agent_running = PVE::QemuServer::qga_check_running($vmid);
    }

    if ($agent_running) {
        eval { PVE::QemuServer::vm_mon_cmd($vmid, "guest-fsfreeze-freeze"); };
        if (my $err = $@) {
            $self->logerr($err);
        }
    }

My problem is that I'm using "qm agent", and we don't have this ping in
/PVE/API2/Qemu/Agent.pm:

    die "No Qemu Guest Agent\n" if !defined($conf->{agent});
    die "VM $vmid is not running\n" if !PVE::QemuServer::check_running($vmid);

    my $cmd = $param->{command} // $command;
    my $res = PVE::QemuServer::vm_mon_cmd($vmid, "guest-$cmd");

I'll send a patch.

----- Original Message -----
From: "aderumier" <aderum...@odiso.com>
To: "Thomas Lamprecht" <t.lampre...@proxmox.com>
Cc: "pve-devel" <pve-devel@pve.proxmox.com>
Sent: Tuesday 22 May 2018 09:59:37
Subject: Re: [pve-devel] pvedaemon hanging because of qga retry

>> But, AFAICT, this isn't your real concern

yes, indeed. It's normal to have a high timeout for fsfreeze (libvirt also
does it).

>> you propose to make a "simple" qmp call, be it through the VSERPORT_CHANGE,
>> or a backward compatible ping, where we know that the time needed to answer
>> cannot be that high, as no IO is involved.

exactly!

>> That could be done with a relative small timeout and if that fails we know
>> that it doesn't makes sense to make the fsfreeze call with it - reasonable -
>> high timeout. If I understood correctly?

yes!
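For context, the patch proposed above might look roughly like the following
sketch against the command path in /PVE/API2/Qemu/Agent.pm. This is only an
illustration of the idea (reuse the existing qga_check_running() helper, with
its short 3-second guest-ping timeout, as a cheap liveness probe before issuing
the real agent command); the actual submitted patch may differ:

```perl
# Sketch only - not the actual submitted patch. The existing checks:
die "No Qemu Guest Agent\n" if !defined($conf->{agent});
die "VM $vmid is not running\n" if !PVE::QemuServer::check_running($vmid);

# New: fail fast with a 3s guest-ping if the agent inside the guest does
# not answer, instead of blocking the worker on the real command (which
# may use a very high timeout, e.g. for guest-fsfreeze-freeze).
die "Qemu Guest Agent is not running\n"
    if !PVE::QemuServer::qga_check_running($vmid);

my $cmd = $param->{command} // $command;
my $res = PVE::QemuServer::vm_mon_cmd($vmid, "guest-$cmd");
```

The design point is the one discussed in the thread: the ping involves no guest
IO, so a small timeout is safe, and only when it succeeds is the expensive
command sent with its high timeout.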
----- Original Message -----
From: "Thomas Lamprecht" <t.lampre...@proxmox.com>
To: "pve-devel" <pve-devel@pve.proxmox.com>, "aderumier" <aderum...@odiso.com>, "dietmar" <diet...@proxmox.com>
Sent: Tuesday 22 May 2018 09:56:13
Subject: Re: [pve-devel] pvedaemon hanging because of qga retry

On 5/21/18 3:02 PM, Alexandre DERUMIER wrote:
>>> Seems this patch does not solve the 'high load' problem at all?
>
> I can't reproduce this high load, so I can't say.

For the high fsfreeze timeout my commit message should provide some context:

> commit cfb7a70165199eca25f92272490c863551efcd89
> Author: Thomas Lamprecht <t.lampre...@proxmox.com>
> Date:   Wed Nov 23 11:40:41 2016 +0100
>
>     increase timeout from guest-fsfreeze-freeze
>
>     The qmp command 'guest-fsfreeze-freeze' issues in linux a FIFREEZE
>     ioctl call on all mounted guest FS.
>     This ioctl call locks the filesystem and gets it into an consistent
>     state. For this all caches must be synced after blocking new writes
>     to the FS, which may need a relative long time, especially under high
>     IO load on the backing storage.
>
>     In windows a VSS (Volume Shadow Copy Service) request_freeze will
>     issued. As of the closed Windows nature the exact mechanisms cannot
>     be checked but some microsoft blog posts and other forum post suggest
>     that it should return fast but certain workloads can still trigger a
>     long delay resulting an similar problems.
>
>     Thus try to minimize the error probability and increase the timeout
>     significantly.
>     We use 60 minutes as timeout as this seems a limit which should not
>     get trespassed in a somewhat healthy system.
>
>     See:
>     https://forum.proxmox.com/threads/22192/
>
>     see the 'freeze_super' and 'thaw_super' function in fs/super.c from
>     the linux kernel tree for more details on the freeze behavior in
>     Linux guests.

> My main concern is to not wait for a down daemon. (which will never
> response).
> If we can be sure that daemon is running, with high load, simply wait for a
> response with a longer timeout.

But, AFAICT, this isn't your real concern, you propose to make a "simple" qmp
call, be it through the VSERPORT_CHANGE, or a backward compatible ping, where
we know that the time needed to answer cannot be that high, as no IO is
involved. That could be done with a relative small timeout and if that fails
we know that it doesn't makes sense to make the fsfreeze call with it -
reasonable - high timeout. If I understood correctly?

> ----- Original Message -----
> From: "dietmar" <diet...@proxmox.com>
> To: "aderumier" <aderum...@odiso.com>
> Cc: "pve-devel" <pve-devel@pve.proxmox.com>
> Sent: Monday 21 May 2018 09:56:03
> Subject: Re: [pve-devel] pvedaemon hanging because of qga retry
>
>> I have looked at libvirt/ovirt.
>>
>> It seem that's it's possible to detect if agent is connected, through a qmp
>> event VSERPORT_CHANGE.
>>
>> https://git.qemu.org/?p=qemu.git;a=commit;h=e2ae6159
>> https://git.qemu.org/?p=qemu.git;a=blobdiff;f=docs/qmp/qmp-events.txt;h=d759d197486a3edf3b629fb11e9922ad92fb041a;hp=9d7439e3073ac63b639ce282c7466933ccb411b4;hb=032baddea36330384b3654fcbfafa74cc815471c;hpb=db52658b38fea4e54c23c9cfbced9478d368aa84
>
> Seems this patch does not solve the 'high load' problem at all?
>
> _______________________________________________
> pve-devel mailing list
> pve-devel@pve.proxmox.com
> https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel

_______________________________________________
pve-devel mailing list
pve-devel@pve.proxmox.com
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel