[pve-devel] [PATCH manager] vzdump: Fix typo in UPID error message
Signed-off-by: Dominic Jäger
---
 PVE/VZDump.pm | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/PVE/VZDump.pm b/PVE/VZDump.pm
index 6e0d3dbf..aea7389b 100644
--- a/PVE/VZDump.pm
+++ b/PVE/VZDump.pm
@@ -522,7 +522,7 @@ sub getlock {

     my $maxwait = $self->{opts}->{lockwait} || $self->{lockwait};

-    die "missimg UPID" if !$upid; # should not happen
+    die "missing UPID" if !$upid; # should not happen

     if (!open (SERVER_FLCK, ">>$lockfile")) {
        debugmsg ('err', "can't open lock on file '$lockfile' - $!", undef, 1);
--
2.20.1
[pve-devel] [PATCH v5 manager 1/2] Allow prune-backups as an alternative to maxfiles
and make the two options mutually exclusive as long as they are specified on
the same level (e.g. both from the storage configuration). Otherwise prefer
option > storage config > default (only maxfiles has a default currently).

Defines the backup limit for prune-backups as the sum of all keep-values.

There is no perfect way to determine whether a new backup would trigger a
removal with prune later:
1. we would need a way to include the not yet existing backup in a
   'prune --dry-run' check.
2. even if we had that check, if it's executed right before a full hour, and
   the actual backup happens after the full hour, the information from the
   check is not correct.

So in some cases, we allow backup jobs with remove=0, that will lead to a
removal when the next prune is executed. Still, the job with remove=0 does
not execute a prune, so:
1. There is a well-defined limit.
2. A job with remove=0 never removes an old backup.

Signed-off-by: Fabian Ebner
---
Changes from v4:
    * add newline to 'cannot have ... at the same time' error message
    * fix typo and correctly assign to $opts->{'prune-backups'} instead of
      $opts->{'prune_backups'}. Because of this, the mapping of maxfiles to
      keep-last had no effect in v4

 PVE/API2/VZDump.pm |  4 +--
 PVE/VZDump.pm      | 88 --
 2 files changed, 64 insertions(+), 28 deletions(-)

diff --git a/PVE/API2/VZDump.pm b/PVE/API2/VZDump.pm
index 2eda973e..19fa1e3b 100644
--- a/PVE/API2/VZDump.pm
+++ b/PVE/API2/VZDump.pm
@@ -25,7 +25,7 @@ __PACKAGE__->register_method ({
     method => 'POST',
     description => "Create backup.",
     permissions => {
-        description => "The user needs 'VM.Backup' permissions on any VM, and 'Datastore.AllocateSpace' on the backup storage. The 'maxfiles', 'tmpdir', 'dumpdir', 'script', 'bwlimit' and 'ionice' parameters are restricted to the 'root\@pam' user.",
+        description => "The user needs 'VM.Backup' permissions on any VM, and 'Datastore.AllocateSpace' on the backup storage. The 'maxfiles', 'prune-backups', 'tmpdir', 'dumpdir', 'script', 'bwlimit' and 'ionice' parameters are restricted to the 'root\@pam' user.",
         user => 'all',
     },
     protected => 1,
@@ -58,7 +58,7 @@ __PACKAGE__->register_method ({
                 if $param->{stdout};
         }

-        foreach my $key (qw(maxfiles tmpdir dumpdir script bwlimit ionice)) {
+        foreach my $key (qw(maxfiles prune-backups tmpdir dumpdir script bwlimit ionice)) {
             raise_param_exc({ $key => "Only root may set this option."})
                 if defined($param->{$key}) && ($user ne 'root@pam');
         }
diff --git a/PVE/VZDump.pm b/PVE/VZDump.pm
index 6e0d3dbf..1fe4c4ee 100644
--- a/PVE/VZDump.pm
+++ b/PVE/VZDump.pm
@@ -89,6 +89,12 @@ sub storage_info {
         maxfiles => $scfg->{maxfiles},
     };

+    $info->{'prune-backups'} = PVE::JSONSchema::parse_property_string('prune-backups', $scfg->{'prune-backups'})
+        if defined($scfg->{'prune-backups'});
+
+    die "cannot have 'maxfiles' and 'prune-backups' configured at the same time\n"
+        if defined($info->{'prune-backups'}) && defined($info->{maxfiles});
+
     if ($type eq 'pbs') {
         $info->{pbs} = 1;
     } else {
@@ -459,12 +465,18 @@ sub new {

     if ($opts->{storage}) {
         my $info = eval { storage_info ($opts->{storage}) };
-        $errors .= "could not get storage information for '$opts->{storage}': $@"
-            if ($@);
-        $opts->{dumpdir} = $info->{dumpdir};
-        $opts->{scfg} = $info->{scfg};
-        $opts->{pbs} = $info->{pbs};
-        $opts->{maxfiles} //= $info->{maxfiles};
+        if (my $err = $@) {
+            $errors .= "could not get storage information for '$opts->{storage}': $err";
+        } else {
+            $opts->{dumpdir} = $info->{dumpdir};
+            $opts->{scfg} = $info->{scfg};
+            $opts->{pbs} = $info->{pbs};
+
+            if (!defined($opts->{'prune-backups'}) && !defined($opts->{maxfiles})) {
+                $opts->{'prune-backups'} = $info->{'prune-backups'};
+                $opts->{maxfiles} = $info->{maxfiles};
+            }
+        }
     } elsif ($opts->{dumpdir}) {
         $errors .= "dumpdir '$opts->{dumpdir}' does not exist"
             if ! -d $opts->{dumpdir};
@@ -472,7 +484,9 @@ sub new {
         die "internal error";
     }

-    $opts->{maxfiles} //= $defaults->{maxfiles};
+    if (!defined($opts->{'prune-backups'}) && !defined($opts->{maxfiles})) {
+        $opts->{maxfiles} = $defaults->{maxfiles};
+    }

     if ($opts->{tmpdir} && ! -d $opts->{tmpdir}) {
         $errors .= "\n" if $errors;
@@ -653,6 +667,7 @@ sub exec_backup_task {

     my $opts = $self->{opts};

+    my $cfg = PVE::Storage::config();
     my $vmid = $task->{vmid};
     my $plugin = $task->{plugin};
     my $vmtype = $plugin->type();
@@ -706,8 +721,18 @@ sub exec_backup_task {
         my $basename = $bkname . strftime("-%Y_%m_%d-%H_%M_%S", localtime($task->{backup_time
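
[Editorial note: the diff above is cut off in this archive before the hunk that
implements the limit check. As a standalone illustration of the rule stated in
the commit message (backup limit = sum of all keep-values), here is a minimal
Perl sketch; the hash contents are made up for the example and are not taken
from any real configuration.]

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Illustrative values only - in vzdump these come from the parsed
    # 'prune-backups' property string (e.g. "keep-last=3,keep-daily=7").
    my $prune_options = { 'keep-last' => 3, 'keep-daily' => 7 };

    # The backup limit is simply the sum of all keep-values.
    my $backup_limit = 0;
    $backup_limit += $_ for values %{$prune_options};

    print "backup limit: $backup_limit\n";   # prints "backup limit: 10"

    # A job may still end up over the limit when remove=0, but such a job
    # never runs prune, so it also never removes an old backup.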
[pve-devel] [PATCH-SERIES v5] fix #2649: introduce prune-backups property for storages supporting backups
Make use of the new 'prune-backups' storage property with vzdump.

Changes from v4:
    * drop already applied patches
    * rebase on current master
    * fix typo
    * add newline to error message

Fabian Ebner (2):
  Allow prune-backups as an alternative to maxfiles
  Always use prune-backups instead of maxfiles internally

 PVE/API2/VZDump.pm |  4 +--
 PVE/VZDump.pm      | 72 +++---
 2 files changed, 51 insertions(+), 25 deletions(-)

--
2.20.1
[pve-devel] [PATCH v5 manager 2/2] Always use prune-backups instead of maxfiles internally
For the use case with '--dumpdir', it's not possible to call prune_backups
directly, so a little bit of special handling is required there.

Signed-off-by: Fabian Ebner
---
 PVE/VZDump.pm | 42 --
 1 file changed, 16 insertions(+), 26 deletions(-)

diff --git a/PVE/VZDump.pm b/PVE/VZDump.pm
index 1fe4c4ee..c8f37d04 100644
--- a/PVE/VZDump.pm
+++ b/PVE/VZDump.pm
@@ -484,8 +484,10 @@ sub new {
         die "internal error";
     }

-    if (!defined($opts->{'prune-backups'}) && !defined($opts->{maxfiles})) {
-        $opts->{maxfiles} = $defaults->{maxfiles};
+    if (!defined($opts->{'prune-backups'})) {
+        $opts->{maxfiles} //= $defaults->{maxfiles};
+        $opts->{'prune-backups'} = { 'keep-last' => $opts->{maxfiles} };
+        delete $opts->{maxfiles};
     }

     if ($opts->{tmpdir} && ! -d $opts->{tmpdir}) {
@@ -720,16 +722,11 @@ sub exec_backup_task {
         my $bkname = "vzdump-$vmtype-$vmid";
         my $basename = $bkname . strftime("-%Y_%m_%d-%H_%M_%S", localtime($task->{backup_time}));

-        my $maxfiles = $opts->{maxfiles};
         my $prune_options = $opts->{'prune-backups'};

         my $backup_limit = 0;
-        if (defined($maxfiles)) {
-            $backup_limit = $maxfiles;
-        } elsif (defined($prune_options)) {
-            foreach my $keep (values %{$prune_options}) {
-                $backup_limit += $keep;
-            }
+        foreach my $keep (values %{$prune_options}) {
+            $backup_limit += $keep;
         }

         if ($backup_limit && !$opts->{remove}) {
@@ -952,25 +949,18 @@ sub exec_backup_task {

         # purge older backup
         if ($opts->{remove}) {
-            if ($maxfiles) {
+            if (!defined($opts->{storage})) {
+                my $bklist = get_backup_file_list($opts->{dumpdir}, $bkname, $task->{target});
+                PVE::Storage::prune_mark_backup_group($bklist, $prune_options);

-                if ($self->{opts}->{pbs}) {
-                    my $args = [$pbs_group_name, '--quiet', '1', '--keep-last', $maxfiles];
-                    my $logfunc = sub { my $line = shift; debugmsg ('info', $line, $logfd); };
-                    PVE::Storage::PBSPlugin::run_raw_client_cmd(
-                        $opts->{scfg}, $opts->{storage}, 'prune', $args, logfunc => $logfunc);
-                } else {
-                    my $bklist = get_backup_file_list($opts->{dumpdir}, $bkname, $task->{target});
-                    $bklist = [ sort { $b->{ctime} <=> $a->{ctime} } @$bklist ];
-
-                    while (scalar (@$bklist) >= $maxfiles) {
-                        my $d = pop @$bklist;
-                        my $archive_path = $d->{path};
-                        debugmsg ('info', "delete old backup '$archive_path'", $logfd);
-                        PVE::Storage::archive_remove($archive_path);
-                    }
+                foreach my $prune_entry (@{$bklist}) {
+                    next if $prune_entry->{mark} ne 'remove';
+
+                    my $archive_path = $prune_entry->{path};
+                    debugmsg ('info', "delete old backup '$archive_path'", $logfd);
+                    PVE::Storage::archive_remove($archive_path);
                 }
-            } elsif (defined($prune_options)) {
+            } else {
                 my $logfunc = sub { debugmsg($_[0], $_[1], $logfd) };
                 PVE::Storage::prune_backups($cfg, $opts->{storage}, $prune_options, $vmid, $vmtype, 0, $logfunc);
             }
--
2.20.1
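
[Editorial note: to make the internal simplification in this patch easier to
follow, here is a small, self-contained Perl sketch of the maxfiles-to-
prune-backups mapping done in new(). The $opts and $defaults hashes are
stand-ins with made-up values, not the real vzdump options objects.]

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Stand-in for the vzdump options hash: a user who only set maxfiles=5.
    my $opts     = { maxfiles => 5 };
    my $defaults = { maxfiles => 1 };

    # Mirror of the mapping above: a plain maxfiles value becomes a
    # prune-backups hash with a single keep-last entry, so the rest of the
    # code only has to deal with prune-backups.
    if (!defined($opts->{'prune-backups'})) {
        $opts->{maxfiles} //= $defaults->{maxfiles};
        $opts->{'prune-backups'} = { 'keep-last' => $opts->{maxfiles} };
        delete $opts->{maxfiles};
    }

    print "keep-last: $opts->{'prune-backups'}->{'keep-last'}\n";   # keep-last: 5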
Re: [pve-devel] corosync bug: cluster break after 1 node clean shutdown
On September 28, 2020 5:59 pm, Alexandre DERUMIER wrote:
> Here a new test http://odisoweb1.odiso.net/test5
>
> This has occured at corosync start
>
> node1:
> -
> start corosync : 17:30:19
>
> node2: /etc/pve locked
> --
> Current time : 17:30:24
>
> I have done backtrace of all nodes at same time with parallel ssh at 17:35:22
>
> and a coredump of all nodes at same time with parallel ssh at 17:42:26
>
> (Note that this time, /etc/pve was still locked after backtrace/coredump)

okay, so this time two more log lines got printed on the (again) problem
causing node #13, but it still stops logging at a point where this makes no
sense.

I rebuilt the packages:

f318f12e5983cb09d186c2ee37743203f599d103b6abb2d00c78d312b4f12df942d8ed1ff5de6e6c194785d0a81eb881e80f7bbfd4865ca1a5a509acd40f64aa pve-cluster_6.1-8_amd64.deb
b220ee95303e22704793412e83ac5191ba0e53c2f41d85358a247c248d2a6856e5b791b1d12c36007a297056388224acf4e5a1250ef1dd019aee97e8ac4bcac7 pve-cluster-dbgsym_6.1-8_amd64.deb

with a change of how the logging is set up (I now suspect that some messages
might get dropped if the logging throughput is high enough), let's hope this
gets us the information we need. please repeat the test5 again with these
packages.

is there anything special about node 13? network topology, slower
hardware, ... ?
[pve-devel] applied: [PATCH manager] vzdump: Fix typo in UPID error message
On 29.09.20 10:07, Dominic Jäger wrote:
> Signed-off-by: Dominic Jäger
> ---
>  PVE/VZDump.pm | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>

applied, thanks!
Re: [pve-devel] corosync bug: cluster break after 1 node clean shutdown
>>with a change of how the logging is set up (I now suspect that some
>>messages might get dropped if the logging throughput is high enough),
>>let's hope this gets us the information we need. please repeat the test5
>>again with these packages.

I'll test this afternoon

>>is there anything special about node 13? network topology, slower
>>hardware, ... ?

no nothing special, all nodes have exactly same hardware/cpu
(24cores/48threads 3ghz)/memory/disk.

this node is around 10% cpu usage, load is around 5.
Re: [pve-devel] corosync bug: cluster break after 1 node clean shutdown
here a new test: http://odisoweb1.odiso.net/test6/

node1
-
start corosync : 12:08:33

node2 (/etc/pve lock)
-
Current time : 12:08:39

node1 (stop corosync : unlock /etc/pve)
-
12:28:11 : systemctl stop corosync

backtraces: 12:26:30
coredump : 12:27:21
Re: [pve-devel] corosync bug: cluster break after 1 node clean shutdown
>>node1 (stop corosync : unlock /etc/pve)
>>-
>>12:28:11 : systemctl stop corosync

sorry, this was wrong, I need to start corosync after the stop to get it
working again

I'll reupload these logs
Re: [pve-devel] corosync bug: cluster break after 1 node clean shutdown
I have reuploaded the logs

node1
-
start corosync : 12:08:33  (corosync.log)

node2 (/etc/pve lock)
-
Current time : 12:08:39

node1 (stop corosync : ---> not unlocked)  (corosync-stop.log)
-
12:28:11 : systemctl stop corosync

node2 (start corosync : ---> /etc/pve unlocked)  (corosync-start.log)
13:41:16 : systemctl start corosync
[pve-devel] applied: [RFC zfsonlinux 1/1] Add systemd-unit for importing specific pools
On 16.09.20 14:14, Stoiko Ivanov wrote:
> This patch addresses the problems some users experience when some zpools are
> created/imported with cachefile (which then causes other pools not to get
> imported during boot) - when our tooling creates a pool we explicitly
> instantiate the service with the pool's name, ensuring that it will get
> imported by scanning.
>
> Suggested-by: Fabian Grünbichler
> Signed-off-by: Stoiko Ivanov
> ---
>  ...md-unit-for-importing-specific-pools.patch | 75 +++
>  debian/patches/series                         |  1 +
>  debian/zfsutils-linux.install                 |  1 +
>  3 files changed, 77 insertions(+)
>  create mode 100644 debian/patches/0008-Add-systemd-unit-for-importing-specific-pools.patch
>

applied, thanks!

Dropped the "Require=systemd-udev-settle.service", though. But, it's still
ordered after.
Re: [pve-devel] corosync bug: cluster break after 1 node clean shutdown
huge thanks for all the work on this btw!

I think I've found a likely culprit (a missing lock around a
non-thread-safe corosync library call) based on the last logs (which
were now finally complete!).

rebuilt packages with a proof-of-concept-fix:

23b03a48d3aa9c14e86fe8cf9bbb7b00bd8fe9483084b9e0fd75fd67f29f10bec00e317e2a66758713050f36c165d72f107ee3449f9efeb842d3a57c25f8bca7 pve-cluster_6.1-8_amd64.deb
9e1addd676513b176f5afb67cc6d85630e7da9bbbf63562421b4fd2a3916b3b2af922df555059b99f8b0b9e64171101a1c9973846e25f9144ded9d487450baef pve-cluster-dbgsym_6.1-8_amd64.deb

I removed some logging statements which are no longer needed, so output
is a bit less verbose again. if you are not able to trigger the issue
with this package, feel free to remove the -debug and let it run for a
little longer without the massive logs.

if feedback from your end is positive, I'll whip up a proper patch
tomorrow or on Thursday.
Re: [pve-devel] corosync bug: cluster break after 1 node clean shutdown
>>huge thanks for all the work on this btw!

huge thanks to you ! ;)

>>I think I've found a likely culprit (a missing lock around a
>>non-thread-safe corosync library call) based on the last logs (which
>>were now finally complete!).

YES :)

>>if feedback from your end is positive, I'll whip up a proper patch
>>tomorrow or on Thursday.

I'm going to launch a new test right now !
[pve-devel] applied: [PATCH pve-qemu 1/5] Add transaction patches and fix for blocking finish
On 28.09.20 17:48, Stefan Reiter wrote:
> With the transaction patches, patch 0026-PVE-Backup-modify-job-api.patch
> is no longer necessary, so drop it and rebase all following patches on
> top.
>
> Signed-off-by: Stefan Reiter
> ---
>

applied, thanks!
[pve-devel] applied: [PATCH qemu-server 5/5] vzdump: log 'finishing' state
On 28.09.20 17:48, Stefan Reiter wrote:
> ...and avoid printing 100% status twice
>
> Signed-off-by: Stefan Reiter
> ---
>  PVE/VZDump/QemuServer.pm | 10 +-
>  1 file changed, 9 insertions(+), 1 deletion(-)
>

applied, thanks!

But, I did s/verification/backup validation/ to avoid some possible confusion
with the more costly/in-depth server verification.
[pve-devel] [PATCH storage] fix regression in zfs volume activation
commit 815df2dd08ac4c7295135262e60d64fbb57b8f5c introduced a small issue
when activating linked clone volumes - the volname passed contains
basevol/subvol, which needs to be translated to subvol.

using the path method should be a robust way to get the actual path for
activation.

Found and tested by building the package as root (otherwise the zfs
regressiontests are skipped).

Reported-by: Thomas Lamprecht
Signed-off-by: Stoiko Ivanov
---
 PVE/Storage/ZFSPoolPlugin.pm | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/PVE/Storage/ZFSPoolPlugin.pm b/PVE/Storage/ZFSPoolPlugin.pm
index 4f8df5e..6ac05b4 100644
--- a/PVE/Storage/ZFSPoolPlugin.pm
+++ b/PVE/Storage/ZFSPoolPlugin.pm
@@ -554,9 +554,10 @@ sub activate_volume {
     if ($format eq 'raw') {
         $class->zfs_wait_for_zvol_link($scfg, $volname);
     } elsif ($format eq 'subvol') {
-        my $mounted = $class->zfs_get_properties($scfg, 'mounted', "$scfg->{pool}/$volname");
+        my ($path, undef, undef) = $class->path($scfg, $volname, $storeid);
+        my $mounted = $class->zfs_get_properties($scfg, 'mounted', "$path");
         if ($mounted !~ m/^yes$/) {
-            $class->zfs_request($scfg, undef, 'mount', "$scfg->{pool}/$volname");
+            $class->zfs_request($scfg, undef, 'mount', "$path");
         }
     }
--
2.20.1
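
[Editorial note: for readers unfamiliar with how linked-clone volnames look on
ZFS, the standalone Perl snippet below illustrates the mismatch the commit
message describes. The volume names are invented, and the string handling is
only for illustration - the actual patch resolves the target via the plugin's
path() method, as shown in the diff above.]

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Invented example names: a linked clone's volname carries the base
    # volume as a prefix ("basevol/subvol"), while the clone's dataset
    # lives directly under the pool.
    my $pool    = 'rpool/data';
    my $volname = 'basevol-100-disk-0/subvol-101-disk-0';

    # Naive concatenation (the pre-fix behaviour) does not name a real dataset:
    my $naive = "$pool/$volname";    # rpool/data/basevol-100-disk-0/subvol-101-disk-0

    # Only the part after the slash names the clone itself:
    my ($subvol) = $volname =~ m!([^/]+)$!;
    my $dataset = "$pool/$subvol";   # rpool/data/subvol-101-disk-0

    print "naive:   $naive\ndataset: $dataset\n";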
[pve-devel] applied: [RFC storage 1/1] Disks: instantiate import unit for created zpool
On 16.09.20 14:14, Stoiko Ivanov wrote:
> When creating a new ZFS storage, also instantiate an import-unit for the pool.
> This should help mitigate the case where some pools don't get imported during
> boot, because they are not listed in an existing zpool.cache file.
>
> This patch needs the corresponding addition of 'zfs-import@.service' in
> the zfsonlinux repository.
>
> Suggested-by: Fabian Grünbichler
> Signed-off-by: Stoiko Ivanov
> ---
>  PVE/API2/Disks/ZFS.pm | 6 ++
>  1 file changed, 6 insertions(+)
>

applied, thanks!

As we have no dependency on zfsutils-linux here to do a bump to the versioned
dependency, I added a simple check if the zfs-import@ template service exists
(roughly, if -e '/lib/systemd/system/zfs-import@.service')
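
[Editorial note: a rough Perl sketch of the guard described above. The path
check is the one quoted in the message; the systemctl invocation and the
$pool variable are assumptions for illustration only and are not the actual
PVE::API2::Disks::ZFS code.]

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Example pool name; in the real code this comes from the API parameters.
    my $pool = 'tank';

    # Only instantiate the import unit if the template shipped with
    # zfsutils-linux is present - avoids needing a versioned dependency.
    if (-e '/lib/systemd/system/zfs-import@.service') {
        # zfs-import@<pool>.service ensures the pool gets imported by scanning at boot.
        system('systemctl', 'enable', "zfs-import\@$pool.service") == 0
            or warn "failed to enable zfs-import\@$pool.service\n";
    }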
[pve-devel] applied: [PATCH storage] fix regression in zfs volume activation
On 29.09.20 18:49, Stoiko Ivanov wrote:
> commit 815df2dd08ac4c7295135262e60d64fbb57b8f5c introduced a small issue
> when activating linked clone volumes - the volname passed contains
> basevol/subvol, which needs to be translated to subvol.
>
> using the path method should be a robust way to get the actual path for
> activation.
>
> Found and tested by building the package as root (otherwise the zfs
> regressiontests are skipped).
>
> Reported-by: Thomas Lamprecht
> Signed-off-by: Stoiko Ivanov
> ---
>  PVE/Storage/ZFSPoolPlugin.pm | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>

applied, thanks!
Re: [pve-devel] corosync bug: cluster break after 1 node clean shutdown
Hi,

some news: my last test has been running for 14h now, and I haven't had any
problem :)

So, it seems that it is indeed fixed ! Congratulations !

I wonder if it could be related to this forum user:
https://forum.proxmox.com/threads/proxmox-6-2-corosync-3-rare-and-spontaneous-disruptive-udp-5405-storm-flood.75871/

His problem is that after a corosync lag (he has 1 cluster stretched over 2 DCs
with 10km distance, so I think he sometimes has some small lag), 1 node is
flooding the other nodes with a lot of udp packets (and making things worse,
as corosync cpu goes to 100% / overloaded, and then can't see the other nodes).

I had this problem 6 months ago after shutting down a node, that's why I'm
thinking it could "maybe" be related.

So, I wonder if it could be the same pmxcfs bug, where something loops or
sends packets again and again.

The forum user seems to have the problem multiple times in some weeks, so
maybe he'll be able to test the new fixed pmxcfs, and tell us if it fixes
this bug too.
Re: [pve-devel] corosync bug: cluster break after 1 node clean shutdown
Hi,

On 30.09.20 08:09, Alexandre DERUMIER wrote:
> some news: my last test has been running for 14h now, and I haven't had any
> problem :)
>

great! Thanks for all your testing time, this would have been much harder, if
even possible at all, without you providing so much testing effort on a
production(!) cluster - appreciated!

Naturally many thanks to Fabian too, for reading so many logs without going
insane :-)

> So, it seems that it is indeed fixed ! Congratulations !
>

honza confirmed Fabian's suspicion about lacking guarantees of thread safety
for cpg_mcast_joined, which was sadly not documented, so this is surely a bug,
let's hope the last of such hard to reproduce ones.

> I wonder if it could be related to this forum user:
> https://forum.proxmox.com/threads/proxmox-6-2-corosync-3-rare-and-spontaneous-disruptive-udp-5405-storm-flood.75871/
>
> His problem is that after a corosync lag (he has 1 cluster stretched over 2 DCs
> with 10km distance, so I think he sometimes has some small lag), 1 node is
> flooding the other nodes with a lot of udp packets (and making things worse,
> as corosync cpu goes to 100% / overloaded, and then can't see the other nodes).
>

I can imagine this problem showing up as a side effect of a flood where
partition changes happen. Not so sure that this can be the cause of that
directly.

> I had this problem 6 months ago after shutting down a node, that's why I'm
> thinking it could "maybe" be related.
>
> So, I wonder if it could be the same pmxcfs bug, where something loops or
> sends packets again and again.
>
> The forum user seems to have the problem multiple times in some weeks, so
> maybe he'll be able to test the new fixed pmxcfs, and tell us if it fixes
> this bug too.

Testing once available would surely be a good idea for them.