Re: [pve-devel] [PATCH manager 2/3] pvescheduler: reworking child pid tracking

2021-11-22 Thread Fabian Ebner

On 18.11.21 14:28, Dominik Csapak wrote:

previously, systemd timers were responsible for running replication jobs.
those timers would not restart if the previous one is still running.

though trying again while it is running does no harm really, it spams
the log with errors about not being able to acquire the correct lock

to fix this, we rework the handling of child processes such that we only
start one per loop if there is currently none running. for that,
introduce the types of forks we do and allow one child process per type
(for now, we have 'jobs' and 'replication' as types)

Signed-off-by: Dominik Csapak 
---
  PVE/Service/pvescheduler.pm | 42 -
  1 file changed, 27 insertions(+), 15 deletions(-)

diff --git a/PVE/Service/pvescheduler.pm b/PVE/Service/pvescheduler.pm
index d4f73702..466cc599 100755
--- a/PVE/Service/pvescheduler.pm
+++ b/PVE/Service/pvescheduler.pm
@@ -17,12 +17,16 @@ my $cmdline = [$0, @ARGV];
  my %daemon_options = (stop_wait_time => 180, max_workers => 0);
  my $daemon = __PACKAGE__->new('pvescheduler', $cmdline, %daemon_options);
  
+my @types = qw(replication jobs);

+
  my $finish_jobs = sub {
  my ($self) = @_;
-foreach my $cpid (keys %{$self->{jobs}}) {
-   my $waitpid = waitpid($cpid, WNOHANG);
-   if (defined($waitpid) && ($waitpid == $cpid)) {
-   delete ($self->{jobs}->{$cpid});
+for my $type (@types) {
+   if (my $cpid = $self->{jobs}->{$type}) {
+   my $waitpid = waitpid($cpid, WNOHANG);
+   if (defined($waitpid) && ($waitpid == $cpid)) {
+   $self->{jobs}->{$type} = undef;
+   }
}
  }
  };
@@ -41,7 +45,11 @@ sub run {
  };
  
  my $fork = sub {

-   my ($sub) = @_;
+   my ($type, $sub) = @_;
+
+   # don't fork again if the previous iteration still runs
+   return if defined($self->{jobs}->{$type});
+
my $child = fork();
if (!defined($child)) {
die "fork failed: $!\n";
@@ -56,16 +64,16 @@ sub run {
POSIX::_exit(0);
}
  
-	$jobs->{$child} = 1;

+   $jobs->{$type} = $child;
  };
  
  my $run_jobs = sub {
  
-	$fork->(sub {

+   $fork->('replication', sub {
PVE::API2::Replication::run_jobs(undef, sub {}, 0, 1);
});
  
-	$fork->(sub {

+   $fork->('jobs', sub {
PVE::Jobs::run_jobs();
});
  };
@@ -92,14 +100,16 @@ sub run {
}
  }
  
-# jobs have a lock timeout of 60s, wait a bit more for graceful termination

+# replication jobs have a lock timeout of 60s, wait a bit more for 
graceful termination
  my $timeout = 0;
-while (keys %$jobs > 0 && $timeout < 75) {
-   kill 'TERM', keys %$jobs;
-   $timeout += sleep(5);
+for my $type (@types) {
+   while (defined($jobs->{$type}) && $timeout < 75) {
+   kill 'TERM', $jobs->{$type};
+   $timeout += sleep(5);
+   }
+   # ensure the rest gets stopped
+   kill 'KILL', $jobs->{$type} if defined($jobs->{$type});
  }
-# ensure the rest gets stopped
-kill 'KILL', keys %$jobs if (keys %$jobs > 0);
  }


Nit: this changes the behavior a bit, as it can happen that the timeout 
is "used up" for one job type and all following ones only get the KILL 
singal. Because of the code below, each child still gets at least one 
TERM, so not a big deal.


  
  sub shutdown {

@@ -107,7 +117,9 @@ sub shutdown {
  
  syslog('info', 'got shutdown request, signal running jobs to stop');
  
-kill 'TERM', keys %{$self->{jobs}};

+for my $type (@types) {
+   kill 'TERM', $self->{jobs}->{$type} if $self->{jobs}->{$type};
+}
  $self->{shutdown_request} = 1;
  }
  




___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



[pve-devel] [PATCH qemu-server 1/2] migrate: improve start STDIN-parameter parsing

2021-11-22 Thread Fabian Grünbichler
only do the compat fallback if no explicit spice ticket was given, and
warn on unknown parameters on STDIN.

Signed-off-by: Fabian Grünbichler 
---
 PVE/API2/Qemu.pm | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/PVE/API2/Qemu.pm b/PVE/API2/Qemu.pm
index 6830009..8b834bb 100644
--- a/PVE/API2/Qemu.pm
+++ b/PVE/API2/Qemu.pm
@@ -2328,9 +2328,11 @@ __PACKAGE__->register_method({
$nbd_protocol_version = $1;
} elsif ($line =~ m/^replicated_volume: (.*)$/) {
$replicated_volumes->{$1} = 1;
-   } else {
+   } elsif (!$spice_ticket) {
# fallback for old source node
$spice_ticket = $line;
+   } else {
+   warn "unknown 'start' parameter on STDIN: '$line'\n";
}
}
}
-- 
2.30.2



___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


[pve-devel] [PATCH qemu-server 2/2] migrate: send updated tpmstate volid to target node

2021-11-22 Thread Fabian Grünbichler
and update the in-memory config for starting the target VM accordingly.
this possibly breaks migration new -> old iff
- spice is not used (else the explicit ticket wins because it comes
  later)
- a local TPM state volume is used
- that local TPM state volume has a different volume id on the target
  node (switched storage, volname already taken, ..)

because the target node will then mis-interpret the tpmstate0 line as
spice ticket and set it accordingly. if the old tpm state volume ID does
not exist on the target node, migration will fail. if it exists by
chance, it might work albeit with a wrong spice ticket (new because of
this patch) and tpm state volume (pre-existing breakage).

Signed-off-by: Fabian Grünbichler 
---

Notes:

https://forum.proxmox.com/threads/error-live-migrate-a-windows-vm-with-tpm.99906/#post-431345

IMHO this is okay since new->old is always best effort and might fail for
$reasons. we could extend this to check the broadcasted version on the 
source
node once pve-manager requires a new enough qemu-server, and die early 
there to
avoid running into the compat issue..

the original compat logic was unfortunate - an "old" source node would only
ever send an unprefixed spice ticket as first line on STDIN, if the compat
logic had included a check for that this would be a bit easier..

 PVE/API2/Qemu.pm   |  4 
 PVE/QemuMigrate.pm |  8 
 PVE/QemuServer.pm  | 10 +-
 3 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/PVE/API2/Qemu.pm b/PVE/API2/Qemu.pm
index 8b834bb..6992f6f 100644
--- a/PVE/API2/Qemu.pm
+++ b/PVE/API2/Qemu.pm
@@ -2319,6 +2319,7 @@ __PACKAGE__->register_method({
my $spice_ticket;
my $nbd_protocol_version = 0;
my $replicated_volumes = {};
+   my $tpmstate_vol;
if ($stateuri && ($stateuri eq 'tcp' || $stateuri eq 'unix') && 
$migratedfrom && ($rpcenv->{type} eq 'cli')) {
while (defined(my $line = )) {
chomp $line;
@@ -2328,6 +2329,8 @@ __PACKAGE__->register_method({
$nbd_protocol_version = $1;
} elsif ($line =~ m/^replicated_volume: (.*)$/) {
$replicated_volumes->{$1} = 1;
+   } elsif ($line =~ m/^tpmstate0: (.*)$/) {
+   $tpmstate_vol = $1;
} elsif (!$spice_ticket) {
# fallback for old source node
$spice_ticket = $line;
@@ -2369,6 +2372,7 @@ __PACKAGE__->register_method({
storagemap => $storagemap,
nbd_proto_version => $nbd_protocol_version,
replicated_volumes => $replicated_volumes,
+   tpmstate_vol => $tpmstate_vol,
};
 
my $params = {
diff --git a/PVE/QemuMigrate.pm b/PVE/QemuMigrate.pm
index ae3eaf1..c9bc39d 100644
--- a/PVE/QemuMigrate.pm
+++ b/PVE/QemuMigrate.pm
@@ -870,6 +870,14 @@ sub phase2 {
 # version > 0 for unix socket support
 my $nbd_protocol_version = 1;
 my $input = "nbd_protocol_version: $nbd_protocol_version\n";
+
+if ($conf->{tpmstate0}) {
+   my $tpmdrive = PVE::QemuServer::parse_drive('tpmstate0', 
$conf->{tpmstate0});
+   my $tpmvol = $tpmdrive->{file};
+   $input .= "tpmstate0: $self->{volume_map}->{$tpmvol}"
+   if $self->{volume_map}->{$tpmvol} && $tpmvol ne 
$self->{volume_map}->{$tpmvol};
+}
+
 $input .= "spice_ticket: $spice_ticket\n" if $spice_ticket;
 
 my @online_replicated_volumes = $self->filter_local_volumes('online', 1);
diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index 76d45a2..45b704d 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -5370,7 +5370,8 @@ sub vm_start {
 #   network => CIDR of migration network
 #   type => secure/insecure - tunnel over encrypted connection or plain-text
 #   nbd_proto_version => int, 0 for TCP, 1 for UNIX
-#   replicated_volumes = which volids should be re-used with bitmaps for nbd 
migration
+#   replicated_volumes => which volids should be re-used with bitmaps for nbd 
migration
+#   tpmstate_vol => new volid of tpmstate0, not yet contained in config
 sub vm_start_nolock {
 my ($storecfg, $vmid, $conf, $params, $migrate_opts) = @_;
 
@@ -5395,6 +5396,13 @@ sub vm_start_nolock {
 # this way we can reuse the old ISO with the correct config
 PVE::QemuServer::Cloudinit::generate_cloudinitconfig($conf, $vmid) if 
!$migratedfrom;
 
+# override TPM state vol if migrated, conf is out of date still
+if (my $tpmvol = $migrate_opts->{tpmstate_vol}) {
+my $parsed = parse_drive("tpmstate0", $conf->{tpmstate0});
+$parsed->{file} = $tpmvol;
+$conf->{tpmstate0} = print_drive($parsed);
+}
+
 my $defaults = load_defaults();
 
 # set environment variable useful inside network script
-- 
2.30.2



___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/

[pve-devel] applied-series: [PATCH qemu-server 1/2] migrate: improve start STDIN-parameter parsing

2021-11-22 Thread Thomas Lamprecht
On 22.11.21 11:30, Fabian Grünbichler wrote:
> only do the compat fallback if no explicit spice ticket was given, and
> warn on unknown parameters on STDIN.
> 
> Signed-off-by: Fabian Grünbichler 
> ---
>  PVE/API2/Qemu.pm | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
>

applied, thanks! Adapted the commit message of 2/2 (included the link and tried 
to also
spell out what was wrong).


___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


[pve-devel] applied-series: [PATCH manager 1/3] pvescheduler: catch errors in forked childs

2021-11-22 Thread Thomas Lamprecht
On 18.11.21 14:28, Dominik Csapak wrote:
> if '$sub' dies, the error handler of PVE::Daemon triggers, which
> initiates a shutdown of the child, resulting in confusing error logs
> (e.g. 'got shutdown request, signal running jobs to stop')
> 
> instead, run it under 'eval' and print the error to the sylog instead
> 
> Signed-off-by: Dominik Csapak 
> ---
>  PVE/Service/pvescheduler.pm | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
>

applied series, thanks!

I reworked the worker termination on stop though and also fixed a issue with a 
possible
artificial delay of the reload command (it was also aligned to the minute 
boundary).


___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



[pve-devel] [PATCH widget-toolkit 3/3] ui: logpanel: work around browser 'setTimeout' quirks

2021-11-22 Thread Dominik Csapak
for some reason, some browsers omit the scroll event if the 'scrollTo'
was done before layouting in some cases, so manually trigger our 'onScroll'
handler to set the new start parameter alwyas after scrolling

Signed-off-by: Dominik Csapak 
---
 src/panel/LogView.js | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/panel/LogView.js b/src/panel/LogView.js
index 4a273a1..be3850e 100644
--- a/src/panel/LogView.js
+++ b/src/panel/LogView.js
@@ -67,7 +67,7 @@ Ext.define('Proxmox.panel.LogView', {
 
if (view.scrollToEnd && scrollPos <= 5) {
// we use setTimeout to work around scroll handling on 
touchscreens
-   setTimeout(function() { view.scrollTo(0, Infinity); }, 10);
+   setTimeout(function() { view.scrollTo(0, Infinity); 
me.onScroll(); }, 10);
}
},
 
-- 
2.30.2



___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



[pve-devel] [PATCH widget-toolkit 0/3] improve logpanel behaviour for fast tasks

2021-11-22 Thread Dominik Csapak
improves the handling of fast moving tasks
* no glitching
* catching up to the end of the task log
* workaround browser quirks

Dominik Csapak (3):
  ui: logpanel: fix glitching on fast task logs
  ui: logpanel: catch up to very fast task logs with api calls
  ui: logpanel: work around browser 'setTimeout' quirks

 src/panel/LogView.js | 16 
 1 file changed, 12 insertions(+), 4 deletions(-)

-- 
2.30.2



___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



[pve-devel] [PATCH widget-toolkit 2/3] ui: logpanel: catch up to very fast task logs with api calls

2021-11-22 Thread Dominik Csapak
if we have a task log that produces more lines than we ask for
(start+limit), set the start to the (current) total-limit.

otherwise, keep the logic to set the start so that our current line is
about in the middle of the requested data (to have a buffer in both
directions)

Signed-off-by: Dominik Csapak 
---
 src/panel/LogView.js | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/src/panel/LogView.js b/src/panel/LogView.js
index bbf38ee..4a273a1 100644
--- a/src/panel/LogView.js
+++ b/src/panel/LogView.js
@@ -132,14 +132,20 @@ Ext.define('Proxmox.panel.LogView', {
let line = view.getScrollY()/lineHeight;
let start = viewModel.get('params.start');
let limit = viewModel.get('params.limit');
+   let total = viewModel.get('data.total');
let viewLines = view.getHeight()/lineHeight;
 
let viewStart = Math.max(parseInt(line - 1 - view.viewBuffer, 10), 
0);
let viewEnd = parseInt(line + viewLines + 1 + view.viewBuffer, 10);
 
if (viewStart < start || viewEnd > start+limit) {
-   viewModel.set('params.start',
-   Math.max(parseInt(line - (limit / 2) + 10, 10), 0));
+   if (view.scrollToEnd && total) {
+   viewModel.set('params.start',
+   Math.max(parseInt(total - limit, 10), 0));
+   } else {
+   viewModel.set('params.start',
+   Math.max(parseInt(line - (limit / 2) + 10, 10), 0));
+   }
view.loadTask.delay(200);
}
},
-- 
2.30.2



___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



[pve-devel] [PATCH widget-toolkit 1/3] ui: logpanel: fix glitching on fast task logs

2021-11-22 Thread Dominik Csapak
when we follow (scrollToEnd === true) a task log that produces many lines
per second, we often get a 'total' back that exceeds our start+limit.

in that case we do not want to pad the bottom of the task log with empty
lines, we only want that for finished task logs to preserve the scroll
bar position+size.

Signed-off-by: Dominik Csapak 
---
 src/panel/LogView.js | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/panel/LogView.js b/src/panel/LogView.js
index 1772737..bbf38ee 100644
--- a/src/panel/LogView.js
+++ b/src/panel/LogView.js
@@ -97,7 +97,9 @@ Ext.define('Proxmox.panel.LogView', {
lines[line.n - 1] = Ext.htmlEncode(line.t);
});
 
-   lines.length = total;
+   if (!view.scrollToEnd) {
+   lines.length = total;
+   }
me.updateView(lines.join(''), first - 1, total);
me.running = false;
if (me.requested) {
-- 
2.30.2



___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel