Re: [pve-devel] [PATCH qemu-server] vzdump: use minimal VM config for offline backup

2020-09-07 Thread Dominik Csapak

Does that not break the feature that we can start a VM for which a
backup was started while it was stopped?

At the moment we can start a backup on a stopped VM and then simply start
the VM without aborting the backup. If I read the patch
correctly, the VM would then have just a minimal config and not
what the user configured.

On 8/24/20 11:21 AM, Stefan Reiter wrote:

In case we back up a stopped VM, we start an instance of QEMU to run the
backup job. This instance will be killed afterwards without ever running
the actual VM, so there's no need to potentially allocate or use host
system resources for features never used.

The minimal_trim_opts array contains elements that will be cleaned from
the config before starting.

We only write back the config in case of resume, which is never set together
with the new "minimal" option.

Reported in the forum:
https://forum.proxmox.com/threads/pbs-tries-to-start-vms-during-backup-not-enough-ram.74773/

Signed-off-by: Stefan Reiter 
---

It is weird to me that this even happens; AFAIU QEMU only allocates the guest
RAM virtually as long as the guest doesn't write to it - but the forum post
proves that it does get allocated, and it's probably also a good idea to not
assign hostpci and USB devices for offline backups.
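For illustration, here is a minimal standalone sketch (hypothetical config
values, not taken from the patch) of what the regex-based trimming below does
to a config hash:

use strict;
use warnings;

# apply the same kind of regex list as vm_trim_conf_minimal() to a sample config
my @trim_opts = (qr/^hostpci\d+/, qr/^usb\d+/, qr/^net\d+/, qr/^serial\d+/);

my $conf = {
    net0     => 'virtio=AA:BB:CC:DD:EE:FF,bridge=vmbr0',
    hostpci0 => '0000:01:00.0',
    scsi0    => 'local-lvm:vm-100-disk-0,size=32G',
    memory   => 8192,
};

for my $r (@trim_opts) {
    delete $conf->{$_} for grep { $_ =~ $r } keys %$conf;
}

print "kept: ", join(', ', sort keys %$conf), "\n";  # kept: memory, scsi0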

  PVE/QemuServer.pm| 34 ++
  PVE/VZDump/QemuServer.pm |  1 +
  2 files changed, 35 insertions(+)

diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index a5ee8e2..73eb561 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -4778,6 +4778,37 @@ sub vm_migrate_alloc_nbd_disks {
  return $nbd;
  }
  
+my @minimal_trim_opts = (

+qr/^hostpci\d+/,
+qr/^serial\d+/,
+qr/^usb\d+/,
+qr/^net\d+/,
+qr/^rng\d+/,
+qr/^ivshmem/,
+qr/^hugepages/,
+qr/^smp/,
+qr/^cores/,
+qr/^sockets/,
+qr/^vcpus/,
+qr/^cpu/,
+qr/^agent/,
+qr/^numa(?:\d+)?/,
+);
+
+my sub vm_trim_conf_minimal {
+my ($conf) = @_;
+
+$conf->{memory} = 1;
+$conf->{balloon} = 0;
+
+# remove anything that does not affect backup but can claim host resources
+foreach my $r (@minimal_trim_opts) {
+   foreach my $key (keys %{$conf}) {
+   delete $conf->{$key} if $key =~ $r;
+   }
+}
+}
+
  # see vm_start_nolock for parameters, additionally:
  # migrate_opts:
  #   storagemap = parsed storage map for allocating NBD disks
@@ -4823,6 +4854,7 @@ sub vm_start {
  #   timeout => in seconds
  #   paused => start VM in paused state (backup)
  #   resume => resume from hibernation
+#   minimal => only use necessary resources (backup)
  # migrate_opts:
  #   nbd => volumes for NBD exports (vm_migrate_alloc_nbd_disks)
  #   migratedfrom => source node
@@ -4851,6 +4883,8 @@ sub vm_start_nolock {
$conf = PVE::QemuConfig->load_config($vmid); # update/reload
  }
  
+vm_trim_conf_minimal($conf) if $params->{minimal};

+
  PVE::QemuServer::Cloudinit::generate_cloudinitconfig($conf, $vmid);
  
  my $defaults = load_defaults();

diff --git a/PVE/VZDump/QemuServer.pm b/PVE/VZDump/QemuServer.pm
index 7297795..ea8faa3 100644
--- a/PVE/VZDump/QemuServer.pm
+++ b/PVE/VZDump/QemuServer.pm
@@ -827,6 +827,7 @@ sub enforce_vm_running_for_backup {
skiplock => 1,
skiptemplate => 1,
paused => 1,
+   minimal => !$self->{vm_was_running},
};
PVE::QemuServer::vm_start($self->{storecfg}, $vmid, $params);
  };





___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



Re: [pve-devel] corosync bug: cluster break after 1 node clean shutdown

2020-09-07 Thread Thomas Lamprecht
On 06.09.20 14:19, dietmar wrote:
>> On 09/06/2020 2:14 PM dietmar  wrote:
>>
>>  
>>> Sep  3 10:40:51 m6kvm7 pve-ha-lrm[16140]: loop take too long (87 seconds)
>>> Sep  3 10:40:51 m6kvm7 pve-ha-crm[16196]: loop take too long (92 seconds)
>>
>> Indeed, this should not happen. Do you use a spearate network for corosync? 
>> Or
>> was there high traffic on the network? What kind of maintenance was the 
>> reason
>> for the shutdown?
> 
> Do you use the default corosync timeout values, or do you have a special 
> setup?
> 


Can you please post the full corosync config?


___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



Re: [pve-devel] [PATCH qemu-server] vzdump: use minimal VM config for offline backup

2020-09-07 Thread Thomas Lamprecht
On 07.09.20 08:59, Dominik Csapak wrote:
> does that not break the feature that
> we can start a vm that started a backup while stopped?
> 
> atm we can start a backup on a stopped vm, and then simply start
> it, without aborting the backup. if i read the patch
> correctly, the vm now has just a minimal config and not
> what the user configured
> 

Yes, as the commit subject mentions, the whole idea of the patch was to use
a minimal config ;-)

This was before I talked about the whole "no shutdown" thing for VMs, where
Stefan himself acknowledged that this patch breaks various things, existing
and planned, about that behavior.
We should have replied that to this patch too, though.



___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



Re: [pve-devel] corosync bug: cluster break after 1 node clean shutdown

2020-09-07 Thread Alexandre DERUMIER
>>Indeed, this should not happen. Do you use a spearate network for corosync? 

No, I use a 2x40G LACP link.

>>was there high traffic on the network? 

But I'm far from saturating them (in pps or throughput); I'm around 3-4 Gbps.


The cluster is 14 nodes, with around 1000 VMs (with HA enabled on all VMs).


From my understanding, watchdog-mux was still running, as the watchdog only
reset after 1 min and not 10 s, so it's like the LRM was blocked and not
sending the watchdog timer reset to watchdog-mux.


I'll do tests with softdog + soft_noboot=1, so if that happens again, I'll be
able to debug.



>>What kind of maintenance was the reason for the shutdown?

RAM upgrade. (The server was running fine before the shutdown, no hardware problem;
I had just shut down the server and had not yet started it again when the problem occurred.)



>>Do you use the default corosync timeout values, or do you have a special 
>>setup?


No special tuning, default values. (I haven't had any retransmits in the logs
for months.)

>>Can you please post the full corosync config?

(I have verified: the running version was corosync 3.0.3 with libknet 1.15.)


Here is the config:

"
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
name: m6kvm1
nodeid: 1
quorum_votes: 1
ring0_addr: m6kvm1
  }
  node {
name: m6kvm10
nodeid: 10
quorum_votes: 1
ring0_addr: m6kvm10
  }
  node {
name: m6kvm11
nodeid: 11
quorum_votes: 1
ring0_addr: m6kvm11
  }
  node {
name: m6kvm12
nodeid: 12
quorum_votes: 1
ring0_addr: m6kvm12
  }
  node {
name: m6kvm13
nodeid: 13
quorum_votes: 1
ring0_addr: m6kvm13
  }
  node {
name: m6kvm14
nodeid: 14
quorum_votes: 1
ring0_addr: m6kvm14
  }
  node {
name: m6kvm2
nodeid: 2
quorum_votes: 1
ring0_addr: m6kvm2
  }
  node {
name: m6kvm3
nodeid: 3
quorum_votes: 1
ring0_addr: m6kvm3
  }
  node {
name: m6kvm4
nodeid: 4
quorum_votes: 1
ring0_addr: m6kvm4
  }
  node {
name: m6kvm5
nodeid: 5
quorum_votes: 1
ring0_addr: m6kvm5
  }
  node {
name: m6kvm6
nodeid: 6
quorum_votes: 1
ring0_addr: m6kvm6
  }
  node {
name: m6kvm7
nodeid: 7
quorum_votes: 1
ring0_addr: m6kvm7
  }

  node {
name: m6kvm8
nodeid: 8
quorum_votes: 1
ring0_addr: m6kvm8
  }
  node {
name: m6kvm9
nodeid: 9
quorum_votes: 1
ring0_addr: m6kvm9
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: m6kvm
  config_version: 19
  interface {
bindnetaddr: 10.3.94.89
ringnumber: 0
  }
  ip_version: ipv4
  secauth: on
  transport: knet
  version: 2
}



- Mail original -
De: "dietmar" 
À: "aderumier" , "Proxmox VE development discussion" 

Cc: "pve-devel" 
Envoyé: Dimanche 6 Septembre 2020 14:14:06
Objet: Re: [pve-devel] corosync bug: cluster break after 1 node clean shutdown

> Sep 3 10:40:51 m6kvm7 pve-ha-lrm[16140]: loop take too long (87 seconds) 
> Sep 3 10:40:51 m6kvm7 pve-ha-crm[16196]: loop take too long (92 seconds) 

Indeed, this should not happen. Do you use a spearate network for corosync? Or 
was there high traffic on the network? What kind of maintenance was the reason 
for the shutdown? 


___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


Re: [pve-devel] [PATCH qemu-server] api: cloud-init support for mtu and userdata

2020-09-07 Thread Alexandre DERUMIER
Hi,

Not related to cloud-init, but for a virtio-net NIC it's already possible to
add the "mtu=xxx" option to netX:.

It's not yet available in the GUI, but you should be able to do it with
"qm set <vmid> --net0 ...,mtu=<mtu>"



- Mail original -
De: "proxmox" 
À: "Proxmox VE development discussion" 
Envoyé: Vendredi 4 Septembre 2020 17:21:24
Objet: Re: [pve-devel] [PATCH qemu-server] api: cloud-init support for mtu and 
userdata

Hello 



I didn't know this patch mail got approved, so sorry for the (very) late 
response. 



My intention for not going with snippets was the fact that they could not be 
created via the API and one would have to manually create a file on the target 
machine for cloud-init userdata. 



One possible use case was to spin up a kubernetes cluster on proxmox only via 
API. 



I wanted to have something similar to the hetzner cloud API where the full 
userdata can be submitted for VM provisioning: 
https://docs.hetzner.cloud/#servers-create-a-server 



So going further here you want me to submit the MTU patches separately? 



Should I integrate userdata into the cicustom field? I thought this would make 
things more complex in favor of parsing out the base64 stuff. So I would still 
go with an extra field. 

Thoughts? 
___ 
pve-devel mailing list 
pve-devel@lists.proxmox.com 
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel 


___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


[pve-devel] applied-series: [PATCH common/manager v2] adapt PVE::Tools::sendmail to match rust-implementation and use it for apt update notifications

2020-09-07 Thread Thomas Lamprecht
On 03.09.20 14:09, Stoiko Ivanov wrote:
> v1->v2:
> * dropped the moving of the email regex for reuse in the sendmail helper:
>   we use local usernames (mostly 'root') quite extensively in our code-base 
> and
>   our users probably do so as well (for backup notifications)
> * replaced the direct invocation of /usr/sbin/sendmail by a call to the helper
>   in pve-manager/PVE/API2/APT.pm
> * adapted the sendmail-helper to allow for an empty display name in the from
>   header, since the apt update notification mail got sent without one.
> 
> The second patch in pve-common can be squashed into the first one, if
> preferred.

The separation in two patches as you did was good and warranted.

> 
> original cover-letter for v1:
> The 2 patches adapt PVE::Tools::sendmail to closely match the recently merged
> implementation in our rust repository - see [0].
> 
> I moved the email regex from JSONSchema to Tools to reuse it for the sendmail
> function (and eliminate one of the few email-address regexes in our codebase).
> 
> I did not add a dependency on libtimedate-perl (where Date::Format is), since
> we already use  Date::Parse in PVE::Certificate, without explicit dependency,
> and it gets pulled in via libwww-perl -> libhttp-date-perl -> 
> libtimedate-perl.
> 
> Glad to send an update for the dependency of course.

I should still note that depending on indirect dependency chains isn't too ideal.

> 
> [0] https://lists.proxmox.com/pipermail/pbs-devel/2020-August/000423.html
> 
> Stoiko Ivanov (2):
>   move email regex from JSONSchema to Tools
>   sendmail-helper: only send multipart if necessary
> 
>  src/PVE/JSONSchema.pm |  4 ++--
>  src/PVE/Tools.pm  | 49 +--
>  2 files changed, 35 insertions(+), 18 deletions(-)
> 
> pve-common:
> Stoiko Ivanov (2):
>   sendmail-helper: only send multipart if necessary
>   sendmail helper: allow empty display name in from
> 
>  src/PVE/Tools.pm | 43 +--
>  1 file changed, 29 insertions(+), 14 deletions(-)
> 
> pve-manager:
> Stoiko Ivanov (1):
>   use PVE::Tools::sendmail for update notifications
> 
>  PVE/API2/APT.pm | 19 ---
>  1 file changed, 4 insertions(+), 15 deletions(-)
> 



applied series, thanks!


___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



Re: [pve-devel] corosync bug: cluster break after 1 node clean shutdown

2020-09-07 Thread dietmar
There is a similar report in the forum:

https://forum.proxmox.com/threads/cluster-die-after-adding-the-39th-node-proxmox-is-not-stable.75506/#post-336111

No HA involved...


> On 09/07/2020 9:19 AM Alexandre DERUMIER  wrote:
> 
>  
> >>Indeed, this should not happen. Do you use a spearate network for corosync? 
> 
> No, I use 2x40GB lacp link. 
> 
> >>was there high traffic on the network? 
> 
> but I'm far from saturated them. (in pps or througput),  (I'm around 3-4gbps)
> 
> 
> The cluster is 14 nodes, with around 1000vms (with ha enabled on all vms)
> 
> 
> From my understanding, watchdog-mux was still runing as the watchdog have 
> reset only after 1min and not 10s,
>  so it's like the lrm was blocked and not sending watchdog timer reset to 
> watchdog-mux.
> 
> 
> I'll do tests with softdog + soft_noboot=1, so if that happen again,I'll able 
> to debug.
> 
> 
> 
> >>What kind of maintenance was the reason for the shutdown?
> 
> ram upgrade. (the server was running ok before shutdown, no hardware problem) 
>  
> (I just shutdown the server, and don't have started it yet when problem occur)
> 
> 
> 
> >>Do you use the default corosync timeout values, or do you have a special 
> >>setup?
> 
> 
> no special tuning, default values. (I don't have any retransmit since months 
> in the logs)
> 
> >>Can you please post the full corosync config?
> 
> (I have verified, the running version was corosync was 3.0.3 with libknet 
> 1.15)
> 
> 
> here the config:
> 
> "
> logging {
>   debug: off
>   to_syslog: yes
> }
> 
> nodelist {
>   node {
> name: m6kvm1
> nodeid: 1
> quorum_votes: 1
> ring0_addr: m6kvm1
>   }
>   node {
> name: m6kvm10
> nodeid: 10
> quorum_votes: 1
> ring0_addr: m6kvm10
>   }
>   node {
> name: m6kvm11
> nodeid: 11
> quorum_votes: 1
> ring0_addr: m6kvm11
>   }
>   node {
> name: m6kvm12
> nodeid: 12
> quorum_votes: 1
> ring0_addr: m6kvm12
>   }
>   node {
> name: m6kvm13
> nodeid: 13
> quorum_votes: 1
> ring0_addr: m6kvm13
>   }
>   node {
> name: m6kvm14
> nodeid: 14
> quorum_votes: 1
> ring0_addr: m6kvm14
>   }
>   node {
> name: m6kvm2
> nodeid: 2
> quorum_votes: 1
> ring0_addr: m6kvm2
>   }
>   node {
> name: m6kvm3
> nodeid: 3
> quorum_votes: 1
> ring0_addr: m6kvm3
>   }
>   node {
> name: m6kvm4
> nodeid: 4
> quorum_votes: 1
> ring0_addr: m6kvm4
>   }
>   node {
> name: m6kvm5
> nodeid: 5
> quorum_votes: 1
> ring0_addr: m6kvm5
>   }
>   node {
> name: m6kvm6
> nodeid: 6
> quorum_votes: 1
> ring0_addr: m6kvm6
>   }
>   node {
> name: m6kvm7
> nodeid: 7
> quorum_votes: 1
> ring0_addr: m6kvm7
>   }
> 
>   node {
> name: m6kvm8
> nodeid: 8
> quorum_votes: 1
> ring0_addr: m6kvm8
>   }
>   node {
> name: m6kvm9
> nodeid: 9
> quorum_votes: 1
> ring0_addr: m6kvm9
>   }
> }
> 
> quorum {
>   provider: corosync_votequorum
> }
> 
> totem {
>   cluster_name: m6kvm
>   config_version: 19
>   interface {
> bindnetaddr: 10.3.94.89
> ringnumber: 0
>   }
>   ip_version: ipv4
>   secauth: on
>   transport: knet
>   version: 2
> }
> 
> 
> 
> - Mail original -
> De: "dietmar" 
> À: "aderumier" , "Proxmox VE development discussion" 
> 
> Cc: "pve-devel" 
> Envoyé: Dimanche 6 Septembre 2020 14:14:06
> Objet: Re: [pve-devel] corosync bug: cluster break after 1 node clean shutdown
> 
> > Sep 3 10:40:51 m6kvm7 pve-ha-lrm[16140]: loop take too long (87 seconds) 
> > Sep 3 10:40:51 m6kvm7 pve-ha-crm[16196]: loop take too long (92 seconds) 
> 
> Indeed, this should not happen. Do you use a spearate network for corosync? 
> Or 
> was there high traffic on the network? What kind of maintenance was the 
> reason 
> for the shutdown?


___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


Re: [pve-devel] [Patch v2 access-control] fix #2947 login name for the LDAP/AD realm can be case-insensitive

2020-09-07 Thread Dominik Csapak

one comment inline

On 9/3/20 10:36 AM, Wolfgang Link wrote:

This is an optional setting for the LDAP and AD realms.
The default behavior is case-sensitive.

Signed-off-by: Wolfgang Link 
---
v1 -> v2: * naming of parameter
* use grep instead of a loop, to avoid login errors
  with ambiguous usernames

  PVE/API2/AccessControl.pm | 23 +++
  PVE/Auth/AD.pm|  1 +
  PVE/Auth/LDAP.pm  |  7 +++
  3 files changed, 31 insertions(+)

diff --git a/PVE/API2/AccessControl.pm b/PVE/API2/AccessControl.pm
index fd27786..3155d67 100644
--- a/PVE/API2/AccessControl.pm
+++ b/PVE/API2/AccessControl.pm
@@ -226,6 +226,28 @@ __PACKAGE__->register_method ({
  returns => { type => "null" },
  code => sub { return undef; }});
  
+sub lookup_username {

+my ($username) = @_;
+
+$username =~ /@(.+)/;


I do not know if you saw my last mail, but we have to use a
better regex here, since the username itself can contain an '@'.

So foo@bar@pve is a valid username (:-S)
and the realm here would parse to 'bar@pve'.

A better regex would be
/@([^@]+)$/
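For illustration, a tiny standalone test (not part of the patch) showing the
difference between the two regexes on such a username:

use strict;
use warnings;

my $username = 'foo@bar@pve';

# greedy variant from the patch: grabs everything after the *first* '@'
my ($greedy) = $username =~ /@(.+)/;
# anchored variant: grabs only what follows the *last* '@'
my ($anchored) = $username =~ /@([^@]+)$/;

print "greedy:   $greedy\n";    # greedy:   bar@pve
print "anchored: $anchored\n";  # anchored: pve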


+
+my $realm = $1;
+my $domain_cfg = cfs_read_file("domains.cfg");
+my $casesensitive = $domain_cfg->{ids}->{$realm}->{'case-sensitive'} // 1;
+my $usercfg = cfs_read_file('user.cfg');
+
+if (!$casesensitive) {
+   my @matches = grep { lc $username eq lc $_ } (keys 
%{$usercfg->{users}});
+
+   die "ambiguous case insensitive match of username '$username', cannot safely 
grant access!\n"
+   if scalar @matches > 1;
+
+   return $matches[0]
+}
+
+return $username;
+};
+
  __PACKAGE__->register_method ({
  name => 'create_ticket',
  path => 'ticket',
@@ -292,6 +314,7 @@ __PACKAGE__->register_method ({
my $username = $param->{username};
$username .= "\@$param->{realm}" if $param->{realm};
  
+	$username = lookup_username($username);

my $rpcenv = PVE::RPCEnvironment::get();
  
  	my $res;

diff --git a/PVE/Auth/AD.pm b/PVE/Auth/AD.pm
index 4d64c20..88b2098 100755
--- a/PVE/Auth/AD.pm
+++ b/PVE/Auth/AD.pm
@@ -94,6 +94,7 @@ sub options {
group_classes => { optional => 1 },
'sync-defaults-options' => { optional => 1 },
mode => { optional => 1 },
+   'case-sensitive' => { optional => 1 },
  };
  }
  
diff --git a/PVE/Auth/LDAP.pm b/PVE/Auth/LDAP.pm

index 09b2202..97d0778 100755
--- a/PVE/Auth/LDAP.pm
+++ b/PVE/Auth/LDAP.pm
@@ -129,6 +129,12 @@ sub properties {
optional => 1,
default => 'ldap',
},
+'case-sensitive' => {
+   description => "username is case-sensitive",
+   type => 'boolean',
+   optional => 1,
+   default => 1,
+   }
  };
  }
  
@@ -159,6 +165,7 @@ sub options {

group_classes => { optional => 1 },
'sync-defaults-options' => { optional => 1 },
mode => { optional => 1 },
+   'case-sensitive' => { optional => 1 },
  };
  }
  





___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



Re: [pve-devel] [Patch v2 access-control] fix #2947 login name for the LDAP/AD realm can be case-insensitive

2020-09-07 Thread Wolfgang Link
No, I missed your mail.
Will fix it and resend it.
> On 09/07/2020 10:20 AM Dominik Csapak  wrote:
> 
>  
> one comment inline
> 
> On 9/3/20 10:36 AM, Wolfgang Link wrote:
> > This is an optional for LDAP and AD realm.
> > The default behavior is case-sensitive.
> > 
> > Signed-off-by: Wolfgang Link 
> > ---
> > v1 -> v2:   * naming of paramenter
> > * use grep instead of a loop, to avoid login errors
> >   with ambiguous usernames
> > 
> >   PVE/API2/AccessControl.pm | 23 +++
> >   PVE/Auth/AD.pm|  1 +
> >   PVE/Auth/LDAP.pm  |  7 +++
> >   3 files changed, 31 insertions(+)
> > 
> > diff --git a/PVE/API2/AccessControl.pm b/PVE/API2/AccessControl.pm
> > index fd27786..3155d67 100644
> > --- a/PVE/API2/AccessControl.pm
> > +++ b/PVE/API2/AccessControl.pm
> > @@ -226,6 +226,28 @@ __PACKAGE__->register_method ({
> >   returns => { type => "null" },
> >   code => sub { return undef; }});
> >   
> > +sub lookup_username {
> > +my ($username) = @_;
> > +
> > +$username =~ /@(.+)/;
> 
> i do not know if you saw my last mail, but we have to do a
> better regex here, since the username can contain an '@'
> 
> so foo@bar@pve is a valid username (:-S)
> and the realm here would parse to 'bar@pve'
> 
> so a better regex would be
> /@([^@]+)$/
> 
> > +
> > +my $realm = $1;
> > +my $domain_cfg = cfs_read_file("domains.cfg");
> > +my $casesensitive = $domain_cfg->{ids}->{$realm}->{'case-sensitive'} 
> > // 1;
> > +my $usercfg = cfs_read_file('user.cfg');
> > +
> > +if (!$casesensitive) {
> > +   my @matches = grep { lc $username eq lc $_ } (keys 
> > %{$usercfg->{users}});
> > +
> > +   die "ambiguous case insensitive match of username '$username', cannot 
> > safely grant access!\n"
> > +   if scalar @matches > 1;
> > +
> > +   return $matches[0]
> > +}
> > +
> > +return $username;
> > +};
> > +
> >   __PACKAGE__->register_method ({
> >   name => 'create_ticket',
> >   path => 'ticket',
> > @@ -292,6 +314,7 @@ __PACKAGE__->register_method ({
> > my $username = $param->{username};
> > $username .= "\@$param->{realm}" if $param->{realm};
> >   
> > +   $username = lookup_username($username);
> > my $rpcenv = PVE::RPCEnvironment::get();
> >   
> > my $res;
> > diff --git a/PVE/Auth/AD.pm b/PVE/Auth/AD.pm
> > index 4d64c20..88b2098 100755
> > --- a/PVE/Auth/AD.pm
> > +++ b/PVE/Auth/AD.pm
> > @@ -94,6 +94,7 @@ sub options {
> > group_classes => { optional => 1 },
> > 'sync-defaults-options' => { optional => 1 },
> > mode => { optional => 1 },
> > +   'case-sensitive' => { optional => 1 },
> >   };
> >   }
> >   
> > diff --git a/PVE/Auth/LDAP.pm b/PVE/Auth/LDAP.pm
> > index 09b2202..97d0778 100755
> > --- a/PVE/Auth/LDAP.pm
> > +++ b/PVE/Auth/LDAP.pm
> > @@ -129,6 +129,12 @@ sub properties {
> > optional => 1,
> > default => 'ldap',
> > },
> > +'case-sensitive' => {
> > +   description => "username is case-sensitive",
> > +   type => 'boolean',
> > +   optional => 1,
> > +   default => 1,
> > +   }
> >   };
> >   }
> >   
> > @@ -159,6 +165,7 @@ sub options {
> > group_classes => { optional => 1 },
> > 'sync-defaults-options' => { optional => 1 },
> > mode => { optional => 1 },
> > +   'case-sensitive' => { optional => 1 },
> >   };
> >   }
> >   
> > 
> 
> 
> 
> ___
> pve-devel mailing list
> pve-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



Re: [pve-devel] [Patch v2 access-control] fix #2947 login name for the LDAP/AD realm can be case-insensitive

2020-09-07 Thread Thomas Lamprecht
On 07.09.20 10:42, Wolfgang Link wrote:
> No I missed your mail.
> Will fix it and resend it.


Please also include my proposed change from then:

On 28.08.20 14:39, Thomas Lamprecht wrote:
> And we then actually want to use this method also in the API call for adding
> new users, to ensure an admin does not accidentally add the same user with
> different casing multiple times.
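A rough sketch of what such a guard in the user-create path could look like
(hypothetical names and stand-in data; it reuses the same grep approach as the
patch, nothing here is existing API code):

use strict;
use warnings;

# refuse to create a user when a user with the same name, ignoring case, already exists
my $new_user = 'Foo@pve';
my $usercfg  = { users => { 'foo@pve' => {}, 'root@pam' => {} } }; # stand-in for cfs_read_file('user.cfg')

my @clashes = grep { lc($_) eq lc($new_user) } keys %{$usercfg->{users}};
die "user '$new_user' already exists (case-insensitive match): @clashes\n" if @clashes;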


Also, "lookup_username" does not seem to fit the API module much, I'd rather
put it in the more general PVE::AccessControl module.



___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



Re: [pve-devel] corosync bug: cluster break after 1 node clean shutdown

2020-09-07 Thread Alexandre DERUMIER
>>https://forum.proxmox.com/threads/cluster-die-after-adding-the-39th-node-proxmox-is-not-stable.75506/#post-336111
>> 
>>
>>No HA involved... 

I had already helped this user some weeks ago:

https://forum.proxmox.com/threads/proxmox-6-2-4-cluster-die-node-auto-reboot-need-help.74643/#post-333093

HA was active at that time. (Maybe the watchdog was still running; I'm not
sure whether the LRM disables the watchdog if you disable HA for all VMs?)


- Mail original -
De: "dietmar" 
À: "aderumier" 
Cc: "Proxmox VE development discussion" 
Envoyé: Lundi 7 Septembre 2020 10:18:42
Objet: Re: [pve-devel] corosync bug: cluster break after 1 node clean shutdown

There is a similar report in the forum: 

https://forum.proxmox.com/threads/cluster-die-after-adding-the-39th-node-proxmox-is-not-stable.75506/#post-336111
 

No HA involved... 


> On 09/07/2020 9:19 AM Alexandre DERUMIER  wrote: 
> 
> 
> >>Indeed, this should not happen. Do you use a spearate network for corosync? 
> 
> No, I use 2x40GB lacp link. 
> 
> >>was there high traffic on the network? 
> 
> but I'm far from saturated them. (in pps or througput), (I'm around 3-4gbps) 
> 
> 
> The cluster is 14 nodes, with around 1000vms (with ha enabled on all vms) 
> 
> 
> From my understanding, watchdog-mux was still runing as the watchdog have 
> reset only after 1min and not 10s, 
> so it's like the lrm was blocked and not sending watchdog timer reset to 
> watchdog-mux. 
> 
> 
> I'll do tests with softdog + soft_noboot=1, so if that happen again,I'll able 
> to debug. 
> 
> 
> 
> >>What kind of maintenance was the reason for the shutdown? 
> 
> ram upgrade. (the server was running ok before shutdown, no hardware problem) 
> (I just shutdown the server, and don't have started it yet when problem 
> occur) 
> 
> 
> 
> >>Do you use the default corosync timeout values, or do you have a special 
> >>setup? 
> 
> 
> no special tuning, default values. (I don't have any retransmit since months 
> in the logs) 
> 
> >>Can you please post the full corosync config? 
> 
> (I have verified, the running version was corosync was 3.0.3 with libknet 
> 1.15) 
> 
> 
> here the config: 
> 
> " 
> logging { 
> debug: off 
> to_syslog: yes 
> } 
> 
> nodelist { 
> node { 
> name: m6kvm1 
> nodeid: 1 
> quorum_votes: 1 
> ring0_addr: m6kvm1 
> } 
> node { 
> name: m6kvm10 
> nodeid: 10 
> quorum_votes: 1 
> ring0_addr: m6kvm10 
> } 
> node { 
> name: m6kvm11 
> nodeid: 11 
> quorum_votes: 1 
> ring0_addr: m6kvm11 
> } 
> node { 
> name: m6kvm12 
> nodeid: 12 
> quorum_votes: 1 
> ring0_addr: m6kvm12 
> } 
> node { 
> name: m6kvm13 
> nodeid: 13 
> quorum_votes: 1 
> ring0_addr: m6kvm13 
> } 
> node { 
> name: m6kvm14 
> nodeid: 14 
> quorum_votes: 1 
> ring0_addr: m6kvm14 
> } 
> node { 
> name: m6kvm2 
> nodeid: 2 
> quorum_votes: 1 
> ring0_addr: m6kvm2 
> } 
> node { 
> name: m6kvm3 
> nodeid: 3 
> quorum_votes: 1 
> ring0_addr: m6kvm3 
> } 
> node { 
> name: m6kvm4 
> nodeid: 4 
> quorum_votes: 1 
> ring0_addr: m6kvm4 
> } 
> node { 
> name: m6kvm5 
> nodeid: 5 
> quorum_votes: 1 
> ring0_addr: m6kvm5 
> } 
> node { 
> name: m6kvm6 
> nodeid: 6 
> quorum_votes: 1 
> ring0_addr: m6kvm6 
> } 
> node { 
> name: m6kvm7 
> nodeid: 7 
> quorum_votes: 1 
> ring0_addr: m6kvm7 
> } 
> 
> node { 
> name: m6kvm8 
> nodeid: 8 
> quorum_votes: 1 
> ring0_addr: m6kvm8 
> } 
> node { 
> name: m6kvm9 
> nodeid: 9 
> quorum_votes: 1 
> ring0_addr: m6kvm9 
> } 
> } 
> 
> quorum { 
> provider: corosync_votequorum 
> } 
> 
> totem { 
> cluster_name: m6kvm 
> config_version: 19 
> interface { 
> bindnetaddr: 10.3.94.89 
> ringnumber: 0 
> } 
> ip_version: ipv4 
> secauth: on 
> transport: knet 
> version: 2 
> } 
> 
> 
> 
> - Mail original - 
> De: "dietmar"  
> À: "aderumier" , "Proxmox VE development discussion" 
>  
> Cc: "pve-devel"  
> Envoyé: Dimanche 6 Septembre 2020 14:14:06 
> Objet: Re: [pve-devel] corosync bug: cluster break after 1 node clean 
> shutdown 
> 
> > Sep 3 10:40:51 m6kvm7 pve-ha-lrm[16140]: loop take too long (87 seconds) 
> > Sep 3 10:40:51 m6kvm7 pve-ha-crm[16196]: loop take too long (92 seconds) 
> 
> Indeed, this should not happen. Do you use a spearate network for corosync? 
> Or 
> was there high traffic on the network? What kind of maintenance was the 
> reason 
> for the shutdown? 


___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


Re: [pve-devel] [PATCH 0/7] Handle guest shutdown during backups

2020-09-07 Thread Thomas Lamprecht
On 03.09.20 10:58, Stefan Reiter wrote:
> Use QEMU's -no-shutdown argument so the QEMU instance stays alive even if the
> guest shuts down. This allows running backups to continue.
> 
> To handle cleanup of QEMU processes, this series extends the qmeventd to 
> handle
> SHUTDOWN events not just for detecting guest triggered shutdowns, but also to
> clean the QEMU process via SIGTERM (which quits it even with -no-shutdown
> enabled).
> 
> A VZDump instance can then signal qmeventd (via the /var/run/qmeventd.sock) to
> keep alive certain VM processes if they're backing up, and once the backup is
> done, they close their connection to the socket, and qmeventd knows that it 
> can
> now safely kill the VM (as long as the guest hasn't booted again, which is
> possible with some changes to the vm_start code also done in this series).
> 
> This series requires a lot of testing, since there can be quite a few edge 
> cases
> lounging around. So far it's been doing well for me, aside from the VNC GUI
> looking a bit confused when you do the 'shutdown during backup' motion (i.e. 
> the
> last image from the framebuffer stays in the VNC window, looks more like the
> guest has crashed than shut down) - but I haven't found a solution for that.
> 
> 
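For illustration, a rough sketch of the keep-alive idea described in the cover
letter, seen from the backup side (the socket path is the one stated above;
any hand-shake details beyond "connect, hold, close" are assumptions, not part
of this series):

use strict;
use warnings;
use IO::Socket::UNIX;
use Socket qw(SOCK_STREAM);

# Connect to qmeventd and simply hold the connection open while the backup
# runs; closing it tells qmeventd the VM process may be cleaned up.
my $sock = IO::Socket::UNIX->new(
    Type => SOCK_STREAM,
    Peer => '/var/run/qmeventd.sock',
) or die "cannot connect to qmeventd: $!\n";

# ... perform the actual backup here ...

close($sock); # qmeventd may now SIGTERM the still-running QEMU instance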

@Dominik, I'd like a review from you on this series, no pressure though. :)


___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



Re: [pve-devel] corosync bug: cluster break after 1 node clean shutdown

2020-09-07 Thread Alexandre DERUMIER
Looking at these logs:

Sep  3 10:40:51 m6kvm7 pve-ha-crm[16196]: lost lock 'ha_manager_lock - cfs lock 
update failed - Permission denied
Sep  3 10:40:51 m6kvm7 pve-ha-lrm[16140]: lost lock 'ha_agent_m6kvm7_lock - cfs 
lock update failed - Permission denied

in PVE/HA/Env/PVE2.pm
"
my $ctime = time();
my $last_lock_time = $last->{lock_time} // 0;
my $last_got_lock = $last->{got_lock};

my $retry_timeout = 120; # hardcoded lock lifetime limit from pmxcfs

eval {

mkdir $lockdir;

# pve cluster filesystem not online
die "can't create '$lockdir' (pmxcfs not mounted?)\n" if ! -d $lockdir;

if (($ctime - $last_lock_time) < $retry_timeout) {
# try cfs lock update request (utime)
if (utime(0, $ctime, $filename))  {
$got_lock = 1;
return;
}
die "cfs lock update failed - $!\n";
}
"


If the retry_timeout is 120, could that explain why I don't have logs on the
other nodes, if the watchdog triggered after 60s?

I don't know too much about how locks work in pmxcfs, but when a corosync
member leaves or joins and a new cluster membership is formed,
could we have some locks lost or hanging?
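For illustration, the renewal boils down to a utime() call on the lock path
inside pmxcfs; a minimal sketch (the lock path is an assumption derived from
the log message above, the primitive is the one from PVE/HA/Env/PVE2.pm):

use strict;
use warnings;

my $lockdir  = "/etc/pve/priv/lock";              # assumed location
my $filename = "$lockdir/ha_agent_m6kvm7_lock";   # lock name from the log above

if (utime(0, time(), $filename)) {
    print "cfs lock renewed\n";
} else {
    # pmxcfs refused the update, e.g. "Permission denied" as seen in the log
    warn "cfs lock update failed - $!\n";
}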



- Mail original -
De: "aderumier" 
À: "dietmar" 
Cc: "Proxmox VE development discussion" 
Envoyé: Lundi 7 Septembre 2020 11:32:13
Objet: Re: [pve-devel] corosync bug: cluster break after 1 node clean shutdown

>>https://forum.proxmox.com/threads/cluster-die-after-adding-the-39th-node-proxmox-is-not-stable.75506/#post-336111
>> 
>> 
>>No HA involved... 

I had already help this user some week ago 

https://forum.proxmox.com/threads/proxmox-6-2-4-cluster-die-node-auto-reboot-need-help.74643/#post-333093
 

HA was actived at this time. (Maybe the watchdog was still running, I'm not 
sure if you disable HA from all vms if LRM disable the watchdog ?) 


- Mail original - 
De: "dietmar"  
À: "aderumier"  
Cc: "Proxmox VE development discussion"  
Envoyé: Lundi 7 Septembre 2020 10:18:42 
Objet: Re: [pve-devel] corosync bug: cluster break after 1 node clean shutdown 

There is a similar report in the forum: 

https://forum.proxmox.com/threads/cluster-die-after-adding-the-39th-node-proxmox-is-not-stable.75506/#post-336111
 

No HA involved... 


> On 09/07/2020 9:19 AM Alexandre DERUMIER  wrote: 
> 
> 
> >>Indeed, this should not happen. Do you use a spearate network for corosync? 
> 
> No, I use 2x40GB lacp link. 
> 
> >>was there high traffic on the network? 
> 
> but I'm far from saturated them. (in pps or througput), (I'm around 3-4gbps) 
> 
> 
> The cluster is 14 nodes, with around 1000vms (with ha enabled on all vms) 
> 
> 
> From my understanding, watchdog-mux was still runing as the watchdog have 
> reset only after 1min and not 10s, 
> so it's like the lrm was blocked and not sending watchdog timer reset to 
> watchdog-mux. 
> 
> 
> I'll do tests with softdog + soft_noboot=1, so if that happen again,I'll able 
> to debug. 
> 
> 
> 
> >>What kind of maintenance was the reason for the shutdown? 
> 
> ram upgrade. (the server was running ok before shutdown, no hardware problem) 
> (I just shutdown the server, and don't have started it yet when problem 
> occur) 
> 
> 
> 
> >>Do you use the default corosync timeout values, or do you have a special 
> >>setup? 
> 
> 
> no special tuning, default values. (I don't have any retransmit since months 
> in the logs) 
> 
> >>Can you please post the full corosync config? 
> 
> (I have verified, the running version was corosync was 3.0.3 with libknet 
> 1.15) 
> 
> 
> here the config: 
> 
> " 
> logging { 
> debug: off 
> to_syslog: yes 
> } 
> 
> nodelist { 
> node { 
> name: m6kvm1 
> nodeid: 1 
> quorum_votes: 1 
> ring0_addr: m6kvm1 
> } 
> node { 
> name: m6kvm10 
> nodeid: 10 
> quorum_votes: 1 
> ring0_addr: m6kvm10 
> } 
> node { 
> name: m6kvm11 
> nodeid: 11 
> quorum_votes: 1 
> ring0_addr: m6kvm11 
> } 
> node { 
> name: m6kvm12 
> nodeid: 12 
> quorum_votes: 1 
> ring0_addr: m6kvm12 
> } 
> node { 
> name: m6kvm13 
> nodeid: 13 
> quorum_votes: 1 
> ring0_addr: m6kvm13 
> } 
> node { 
> name: m6kvm14 
> nodeid: 14 
> quorum_votes: 1 
> ring0_addr: m6kvm14 
> } 
> node { 
> name: m6kvm2 
> nodeid: 2 
> quorum_votes: 1 
> ring0_addr: m6kvm2 
> } 
> node { 
> name: m6kvm3 
> nodeid: 3 
> quorum_votes: 1 
> ring0_addr: m6kvm3 
> } 
> node { 
> name: m6kvm4 
> nodeid: 4 
> quorum_votes: 1 
> ring0_addr: m6kvm4 
> } 
> node { 
> name: m6kvm5 
> nodeid: 5 
> quorum_votes: 1 
> ring0_addr: m6kvm5 
> } 
> node { 
> name: m6kvm6 
> nodeid: 6 
> quorum_votes: 1 
> ring0_addr: m6kvm6 
> } 
> node { 
> name: m6kvm7 
> nodeid: 7 
> quorum_votes: 1 
> ring0_addr: m6kvm7 
> } 
> 
> node { 
> name: m6kvm8 
> nodeid: 8 
> quorum_votes: 1 
> ring0_addr: m6kvm8 
> } 
> node { 
> name: m6kvm9 
> nodeid: 9 
> quorum_votes: 1 
> ring0_addr: m6kvm9 
> } 
> } 
> 
> quorum { 
> provider: corosync_votequorum 
> } 
> 
> totem { 
> cluster_name: m6kvm 
> config_v

Re: [pve-devel] Telegraf added in-built Proxmox support - thoughts versus our external metric support?

2020-09-07 Thread Alexandre DERUMIER
Hi,

>>Anyway, I do not think that we should drop our direct plugins (yet), some
>>people like me, are happy feeding directly to InfluxDB without anything
>>in-between.

me too ;)


I'm not sure, but I think we send more metrics to InfluxDB than we stream
through the cluster.

And the plugin does seem to use the /cluster/resources API, but it seems to do
one API request per VM.
(So with a lot of VMs, maybe it'll flood the API.)


- Mail original -
De: "Thomas Lamprecht" 
À: "Proxmox VE development discussion" , "Victor 
Hooi" 
Envoyé: Lundi 7 Septembre 2020 08:57:31
Objet: Re: [pve-devel] Telegraf added in-built Proxmox support - thoughts 
versus our external metric support?

Hi, 

On 07.09.20 03:42, Victor Hooi wrote: 
> I know that Proxmox has it's own inbuilt InfluxDB client: 
> 
> https://pve.proxmox.com/wiki/External_Metric_Server 
> 
> However, Telegraf recently added first-party support for Proxmox: 
> 
> https://github.com/influxdata/telegraf/tree/master/plugins/inputs/proxmox 

great! 

> Telegraf lets you output to InfluxDB, Graphite, Prometheus, as well as a 
> bunch of others (Telegraf in-built output clients 
> ) 
> 
> What do you think of using the above and contributing to that, instead of 
> maintaining our own Proxmox InfluxDB support? 
> 
> Or are there advantages to maintaining our own code here? 

The pvestatd, which queries statistics periodically, does also the sending 
of said statistics without extra overhead. API request may get, at least 
partially, up to date information with an extra overhead, e.g., if storage 
stats are to be queried too. That could be addressed by providing a pvestatd 
fed cache in /run (fast memory tmpfs) or so and provide access to that over 
the API. 

Anyway, I do not think that we should drop our direct plugins (yet), some 
people like me, are happy feeding directly to InfluxDB without anything 
in-between. 

But, we definitively want to mention this in the documentation and see how 
we can improve integration. 

cheers, 
Thomas 



___ 
pve-devel mailing list 
pve-devel@lists.proxmox.com 
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel 


___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


[pve-devel] applied: [PATCH pve-docs] faq & pct: Improve explanation of containers

2020-09-07 Thread Thomas Lamprecht
On 27.08.20 10:16, Dylan Whyte wrote:
> This adds more clarity to the explanation of containers and to
> the different terms we use to refer to containers, in both the FAQ and
> the introduction section of pct.
> 
> It also contains minor grammar fixes and rewording where appropriate.
> 
> Signed-off-by: Dylan Whyte 
> ---
>  pct.adoc | 25 +++--
>  pve-faq.adoc | 47 +--
>  2 files changed, 40 insertions(+), 32 deletions(-)
> 
>

applied, thanks!


___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



Re: [pve-devel] [RFC container] Improve feedback for startup

2020-09-07 Thread Thomas Lamprecht
On 27.08.20 10:44, Wolfgang Bumiller wrote:
> On Thu, Aug 20, 2020 at 11:36:39AM +0200, Thomas Lamprecht wrote:
>> On 19.08.20 12:30, Fabian Ebner wrote:
>>> Since it was necessary to switch to 'Type=Simple' in the systemd
>>> service (see 545d6f0a13ac2bf3a8d3f224c19c0e0def12116d ),
>>> 'systemctl start pve-container@ID' would not wait for the 'lxc-start'
>>> command anymore. Thus every container start was reported as a success
>>> and the 'post-start' hook would trigger immediately after the
>>> 'systemctl start' command.
>>>
>>> Use 'lxc-monitor' to get the necessary information and detect
>>> startup failure and only run the 'post-start' hookscript after
>>> the container is effectively running. If something goes wrong
>>> with the monitor, fall back to the old behavior.
>>>
>>> Signed-off-by: Fabian Ebner 
>>> ---
>>>  src/PVE/LXC.pm | 36 +++-
>>>  1 file changed, 35 insertions(+), 1 deletion(-)
>>>
>>
>> appreciate the effort!
>> We could also directly connect to /run/lxc/var/lib/lxc/monitor-fifo (or the 
>> abstract
>> unix socket, but not much gained/difference here) of the lxc-monitord which 
>> publishes
>> all state changes and unpack the new state [0] directly.
>>
>> [0] 
>> https://github.com/lxc/lxc/blob/8bdacc22a48f9c09902a1d2febd71439cb38c082/src/lxc/state.h#L10
>>
>> @Wolfgang, what do you think?
> 
> Just tested adding a state client to our Command.pm directly, seems to
> work, so we would depend neither on lxc-monitor nor lxc-monitord.
> 
> Example & code follow below. The only issue with it is that we'd need to
> retry connecting to the command socket a few times since we don't know
> when it becomes available, but that shouldn't be too bad IMO.
> 
> [..snip..]

With the below I never get the initial stopped -> running edge, though.
I can monitor the CT getting stopped, but not the other way around.
Adding extra code to check if the CT is running, to abort the recv, would
feel like it's missing the point a bit...

> 
> Usage example:
> 
> use PVE::LXC::Command;
> 
> my $sock = PVE::LXC::Command::get_state_client(404);
> die "not running\n" if !defined($sock);
> 
> while (1) {
> my ($type, $name, $value) = 
> PVE::LXC::Command::read_lxc_message($sock);
> last if !defined($type);
> print("$name: $type => $value\n");
> }
> 
> Patch for Command.pm:
> 
> ---8<---
> From 6ac578ef889a3a9c8aefc4f05215b4ec66049546 Mon Sep 17 00:00:00 2001
> From: Wolfgang Bumiller 
> Date: Thu, 27 Aug 2020 10:31:06 +0200
> Subject: [PATCH container] command: add state client functions
> 
> Signed-off-by: Wolfgang Bumiller 
> ---
>  src/PVE/LXC/Command.pm | 91 ++
>  1 file changed, 91 insertions(+)
> 
> diff --git a/src/PVE/LXC/Command.pm b/src/PVE/LXC/Command.pm
> index beed890..6df767d 100644
> --- a/src/PVE/LXC/Command.pm
> +++ b/src/PVE/LXC/Command.pm
> @@ -11,20 +11,36 @@ use warnings;
>  
>  use IO::Socket::UNIX;
>  use Socket qw(SOCK_STREAM SOL_SOCKET SO_PASSCRED);
> +use POSIX qw(NAME_MAX);
>  
>  use base 'Exporter';
>  
>  use constant {
> +LXC_CMD_GET_STATE => 3,
>  LXC_CMD_GET_CGROUP => 6,
> +LXC_CMD_ADD_STATE_CLIENT => 10,
>  LXC_CMD_FREEZE => 15,
>  LXC_CMD_UNFREEZE => 16,
>  LXC_CMD_GET_LIMITING_CGROUP => 19,
>  };
>  
> +use constant {
> +STATE_STOPPED => 0,
> +STATE_STARTING => 1,
> +STATE_RUNNING => 2,
> +STATE_STOPPING => 3,
> +STATE_ABORTING => 4,
> +STATE_FREEZING => 5,
> +STATE_FROZEN => 6,
> +STATE_THAWED => 7,
> +MAX_STATE => 8,
> +};
> +
>  our @EXPORT_OK = qw(
>  raw_command_transaction
>  simple_command
>  get_cgroup_path
> +get_state_client
>  );
>  
>  # Get the command socket for a container.
> @@ -81,6 +97,33 @@ my sub _unpack_lxc_cmd_rsp($) {
>  return ($ret, $len);
>  }
>  
> +my $LXC_MSG_SIZE = length(pack('I! Z'.(NAME_MAX+1).' x![I] I', 0, "", 0));
> +# Unpack an lxc_msg struct.
> +my sub _unpack_lxc_msg($) {
> +my ($packet) = @_;
> +
> +# struct lxc_msg {
> +# lxc_msg_type_t type;
> +# char name[NAME_MAX+1];
> +# int value;
> +# };
> +
> +my ($type, $name, $value) = unpack('I!Z'.(NAME_MAX+1).'I!', $packet);
> +
> +if ($type == 0) {
> + $type = 'STATE';
> +} elsif ($type == 1) {
> + $type = 'PRIORITY';
> +} elsif ($type == 2) {
> + $type = 'EXITCODE';
> +} else {
> + warn "unsupported lxc message type $type received\n";
> + $type = undef;
> +}
> +
> +return ($type, $name, $value);
> +}
> +
>  # Send a complete packet:
>  my sub _do_send($$) {
>  my ($sock, $data) = @_;
> @@ -206,4 +249,52 @@ sub unfreeze($$) {
>  return $res;
>  }
>  
> +# Add this command socket as a state client.
> +#
> +# Currently all states are observed.
> +#
> +# Returns undef if the container is not running, dies on errors.
> +sub get_state_client($) {
> +my ($vmid) = @_;
> +
> +my $socket = _get_command_socket($vmid)
> +  

Re: [pve-devel] [PATCH v2 pve-container] POC : add/del/update ip from vnet-subnet-ipam

2020-09-07 Thread Thomas Lamprecht
On 24.08.20 18:49, Alexandre Derumier wrote:
> This is a POC to call ip to retreive ip address from ipam.
> 
> (it's really just a poc && buggt , it need to be improve for vnet changes, 
> pending config apply/revert,...)

When trying this I got the gateway IP returned for both, as the CT IP and as
the gateway IP.
I did not check this patch closer, but I figured that this behavior is caused by
the SDN code.

Using a simple zone with PVE IPam and snat subnet "10.12.13.0/24" with GW 
"10.12.13.1"
as test.

On another note, do you think it makes sense to have vnets, subnets, IPAM and
DNS completely split and separated from each other? I mean, it is flexible,
but a user needs to do a lot of almost boilerplate-like work to get this
started.
Advanced users may profit from this; maybe we just need a "simple wizard" for
the easiest beginner case.



___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



Re: [pve-devel] [PATCH v2 pve-container] POC : add/del/update ip from vnet-subnet-ipam

2020-09-07 Thread Alexandre DERUMIER
>>When trying this I got the gateway IP returned for both, as CT IP and gateway 
>>IP. 
>>Did not checked this patch closer, but I figured that this behavior is caused 
>>by 
>>the SDN code. 

Mmm, that's strange.

When you create or update the subnet, the gateway IP you define on the subnet
should be registered in the IPAM.
(You have enabled an IPAM, right?)


Then, when you create a CT without any IP, it'll try to find the first
available IP in the IPAM.
(So if the gateway was not registered in the IPAM (a bug, maybe), that could
explain why you got it for both.)

For the internal IPAM, I'm writing the IPAM database to /etc/pve/priv/ipam.db.
(BTW, I'm not sure that's the best location.)
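To illustrate the suspected bug, a purely hypothetical sketch (not the actual
SDN/IPAM code) of a naive "first free IP" lookup on the test subnet from your
mail: if the gateway was never registered, it is the first address handed out.

use strict;
use warnings;

# hypothetical allocation state for the 10.12.13.0/24 test subnet
my %allocated;
# this is what the subnet create/update should have done:
# $allocated{'10.12.13.1'} = 'gateway';

sub first_free_ip {
    for my $host (1 .. 254) {
        my $ip = "10.12.13.$host";
        return $ip if !exists $allocated{$ip};
    }
    die "subnet full\n";
}

# without the gateway registered, the CT gets 10.12.13.1 - the gateway itself
print first_free_ip(), "\n";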





>>On another node, do you think it makes sense to have vnets, subnets, IPam, 
>>DNS completely 
>>split and separated from each other? I mean, it is flexible, but a user needs 
>>to do a lot 
>>of, almost boilerplate-like, work to get this started. 
>>Advanced users may profit from this, maybe we just need a "simple wizard" for 
>>the easiest 
>>beginner case.. 

Well, for subnets: you can assign multiple subnets per vnet, so yes, it really
needs to be separated.
(Somebody at Hetzner, for example, buying subnets or /32 failover IPs, and
wanting to add them to a vnet.)
IPAM/DNS are more reusable configurations (like API URL, key, ...), so I think
you'll define 1 or 2 of them max.

I think subnet+IPAM+DNS are IP features;
zones, vnets and controllers are physical network features.


But yes, a GUI wizard could be great for fast setup.


- Mail original -
De: "Thomas Lamprecht" 
À: "Proxmox VE development discussion" , 
"aderumier" 
Envoyé: Lundi 7 Septembre 2020 18:40:39
Objet: Re: [pve-devel] [PATCH v2 pve-container] POC : add/del/update ip from 
vnet-subnet-ipam

On 24.08.20 18:49, Alexandre Derumier wrote: 
> This is a POC to call ip to retreive ip address from ipam. 
> 
> (it's really just a poc && buggt , it need to be improve for vnet changes, 
> pending config apply/revert,...) 

When trying this I got the gateway IP returned for both, as CT IP and gateway 
IP. 
Did not checked this patch closer, but I figured that this behavior is caused 
by 
the SDN code. 

Using a simple zone with PVE IPam and snat subnet "10.12.13.0/24" with GW 
"10.12.13.1" 
as test. 

On another node, do you think it makes sense to have vnets, subnets, IPam, DNS 
completely 
split and separated from each other? I mean, it is flexible, but a user needs 
to do a lot 
of, almost boilerplate-like, work to get this started. 
Advanced users may profit from this, maybe we just need a "simple wizard" for 
the easiest 
beginner case.. 


___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


Re: [pve-devel] corosync bug: cluster break after 1 node clean shutdown

2020-09-07 Thread dietmar
> I don't known too much how locks are working in pmxcfs, but when a corosync 
> member leave or join, and a new cluster memership is formed, could we have 
> some lock lost or hang ?

It would really help if we could reproduce the bug somehow. Do you have any
idea how to trigger the bug?


___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel