Re: [pve-devel] [PATCH qemu-server] api: cloud-init support for mtu and userdata

2020-09-09 Thread proxmox
But this does not change the MTU inside the VM, right?



-----Original Message-----
From: Alexandre DERUMIER
Sent: Monday, 7 September 2020 09:34
To: Proxmox VE development discussion
Subject: Re: [pve-devel] [PATCH qemu-server] api: cloud-init support for mtu and userdata


Hi,

This is not related to cloud-init, but for a virtio-net NIC it's already possible to
add an "mtu=xxx" option to netX:.

It's not yet available in the GUI, but you should be able to do it with
"qm set <vmid> --net0 ...,mtu=<mtu>".
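
For illustration, a minimal sketch of such a call (the VMID, MAC, bridge and MTU
value below are made up; note that --net0 replaces the whole NIC definition, so the
existing options have to be repeated):

  qm set 100 --net0 virtio=DE:AD:BE:EF:00:01,bridge=vmbr0,mtu=1400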



----- Original message -----
From: "proxmox"
To: "Proxmox VE development discussion"
Sent: Friday, 4 September 2020 17:21:24
Subject: Re: [pve-devel] [PATCH qemu-server] api: cloud-init support for mtu and userdata

Hello 



I didn't know this patch mail got approved, so sorry for the (very) late 
response. 



My reason for not going with snippets was that they cannot be created via the API,
so one would have to manually create a file on the target machine for the
cloud-init userdata.



One possible use case is to spin up a Kubernetes cluster on Proxmox purely via the
API.



I wanted something similar to the Hetzner Cloud API, where the full userdata can be
submitted for VM provisioning:
https://docs.hetzner.cloud/#servers-create-a-server
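
As a rough sketch of the kind of call meant here (field names follow my reading of
the linked Hetzner docs; the token, server name, type, image and user-data content
are invented):

  curl -X POST https://api.hetzner.cloud/v1/servers \
    -H "Authorization: Bearer $HCLOUD_TOKEN" \
    -H "Content-Type: application/json" \
    -d '{"name":"k8s-node-1","server_type":"cx21","image":"debian-10",
         "user_data":"#cloud-config\npackages:\n  - qemu-guest-agent"}'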



So, going forward, do you want me to submit the MTU patches separately?



Should I integrate userdata into the cicustom field? I thought this would make
things more complex because of having to parse out the base64 data, so I would
still go with an extra field.

Thoughts? 


Re: [pve-devel] [PATCH qemu-server] api: cloud-init support for mtu and userdata

2020-09-09 Thread Alexandre DERUMIER
>> But this does not change the MTU inside the VM, right?

Yes, it changes the MTU inside the VM!

(at least on recent kernels; I don't remember when this was added)
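
A quick way to verify from inside the guest (the interface name is guest-specific,
eth0 is just an example):

  # inside the VM, after it was started with e.g. ...,mtu=1400 on net0
  ip link show eth0 | grep -o 'mtu [0-9]*'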



Re: [pve-devel] [PATCH qemu-server] api: cloud-init support for mtu and userdata

2020-09-09 Thread Alexandre DERUMIER
It was added to the kernel in 2016:

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=v5.8.7&id=14de9d114a82a564b94388c95af79a701dc93134



Re: [pve-devel] [PATCH qemu-server] api: cloud-init support for mtu and userdata

2020-09-09 Thread Mira Limbeck

Hi,

On 9/4/20 5:21 PM, proxmox wrote:

> Hello
>
> I didn't know this patch mail got approved, so sorry for the (very) late
> response.
>
> My reason for not going with snippets was that they cannot be created via the
> API, so one would have to manually create a file on the target machine for the
> cloud-init userdata.


There is currently a bug report open for this: 
https://bugzilla.proxmox.com/show_bug.cgi?id=2208


This together with backup and migration of snippets could be useful.





> One possible use case is to spin up a Kubernetes cluster on Proxmox purely via
> the API.
>
> I wanted something similar to the Hetzner Cloud API, where the full userdata can
> be submitted for VM provisioning:
> https://docs.hetzner.cloud/#servers-create-a-server
>
> So, going forward, do you want me to submit the MTU patches separately?

Submitting the MTU patches separately would be great.




> Should I integrate userdata into the cicustom field? I thought this would make
> things more complex because of having to parse out the base64 data, so I would
> still go with an extra field.


After some discussions we think that putting the userdata in the config 
file is not the right approach. As the cluster filesystem is limited to 
32M and a single VM config file is limited to 512K you can easily run 
into the limit with a small number of VMs. We would most likely have to 
limit the userdata per config to a very small amount (1K?). But with 
those limits it is difficult to get everything in.
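
A rough back-of-the-envelope check of how tight that is (file name and sizes are
made up; base64 adds roughly a third on top of the raw size):

  wc -c user-data.yaml               # e.g. ~3 KiB of raw cloud-init YAML
  base64 -w0 user-data.yaml | wc -c  # ~4 KiB once encoded for the config file
  # a 1K cap would therefore leave room for only ~768 bytes of raw user-data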




> Thoughts?








[pve-devel] applied: [PATCH v2 firewall] introduce new icmp-type parameter

2020-09-09 Thread Thomas Lamprecht
On 29.05.20 14:22, Mira Limbeck wrote:
> Currently icmp types are handled via 'dport'. This is not documented
> anywhere except for a single line of comment in the code. To untangle
> the icmp-type handling from the dport handling a new 'icmp-type'
> parameter is introduced.
> 
> The valid 'icmp-type' values are limited to the names
> (icmp[v6]_type_names hash in the code, same as ip[6]tables provides).
> Type[/Code] values are not supported.
> 
> Support for ipv6-icmp is added to icmp-type parameter handling. This makes it
> possible to specify icmpv6 types via the GUI.
> 
> Signed-off-by: Mira Limbeck 
> ---
> v2:
>  - rebased on master
>  - removed type[/code] value support, now only names are accepted
> 
>  src/PVE/API2/Firewall/Rules.pm |  4 +++
>  src/PVE/Firewall.pm| 50 +-
>  2 files changed, 53 insertions(+), 1 deletion(-)
> 
>

applied, thanks!
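
A hypothetical usage sketch of the new parameter via the API (only the 'icmp-type'
name and the iptables type names such as 'echo-request' come from the patch; the
pvesh invocation and rule path are my assumption):

  # allow ping at the cluster level using the named type instead of the old dport hack
  pvesh create /cluster/firewall/rules --type in --action ACCEPT \
    --proto icmp --icmp-type echo-request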





Re: [pve-devel] [PATCH v3 access-control] add ui capabilities endpoint

2020-09-09 Thread Thomas Lamprecht
On 06.07.20 14:45, Tim Marx wrote:
> Signed-off-by: Tim Marx 
> ---
> * no changes

Maybe we could merge this into the "/access/permissions" endpoint, perhaps with a
"heuristic" parameter?

> 
>  PVE/API2/AccessControl.pm | 29 +
>  1 file changed, 29 insertions(+)
> 
> diff --git a/PVE/API2/AccessControl.pm b/PVE/API2/AccessControl.pm
> index fd27786..66319cc 100644
> --- a/PVE/API2/AccessControl.pm
> +++ b/PVE/API2/AccessControl.pm
> @@ -718,4 +718,33 @@ __PACKAGE__->register_method({
>   return $res;
>  }});
> 
> +__PACKAGE__->register_method({
> +name => 'uicapabilities',
> +path => 'uicapabilities',
> +method => 'GET',
> +    description => 'Retrieve user interface capabilities for calling user/token.',
> +permissions => {
> +	description => "Each user/token is allowed to retrieve their own capabilities.",
> + user => 'all',
> +},
> +parameters => {},
> +returns => {
> + type => 'object',
> + properties => {
> + cap => {
> + type => 'object',
> +		description => 'The user interface capabilities of the calling user/token'
> + }
> + },
> +},
> +code => sub {
> + my ($param) = @_;
> +
> + my $rpcenv = PVE::RPCEnvironment::get();
> + my $userid = $rpcenv->get_user();
> + my $res->{cap} = &$compute_api_permission($rpcenv, $userid);
> +
> + return $res;
> +}});
> +
>  1;
> --
> 2.20.1
> 
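
If it stays a separate endpoint as proposed here, rather than being folded into
/access/permissions, querying it would presumably look like this (path assumed from
the module's usual mount point; purely illustrative):

  pvesh get /access/uicapabilities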







Re: [pve-devel] [PATCH lxc 0/2] fix apparmor rules and improve cgroupv2 experience

2020-09-09 Thread Thomas Lamprecht
On 22.07.20 13:05, Stoiko Ivanov wrote:
> This patchset addresses 2 minor inconveniences I ran into, while running my
> host with 'systemd.unified_cgroup_hierarchy=1':
> 
> * apparmor mount denies for '/proc/sys/kernel/random/boot_id' (this happens
>   irrespective of the cgroup layout)
> * having to add
>   `lxc.init.cmd: /lib/systemd/systemd systemd.unified_cgroup_hierarchy=1`
>   to all my container configs (for Debian and Arch containers at least; Alpine
>   runs without issues) - see [0] for a discussion of the topic
> 
> While investigating this I noticed that the fixes for both issues were already
> on upstream/master (with one small other fix in between) - so instead of
> cherry-picking both patches I fast-forwarded to the last needed commit.
> Glad to resend with the patches cherry-picked and added to our patchqueue.
> 
> I would probably submit the apparmor fix upstream (after a quick check by
> another set of eyes :)
> 
> [0] https://github.com/lxc/lxc/issues/3183
> 
> Stoiko Ivanov (2):
>   update lxc to include fixes for cgroupv2 setups
>   apparmor: add rule for allowing remount of boot_id
> 
>  ...apparmor-Allow-ro-remount-of-boot_id.patch | 26 +++
>  debian/patches/series |  1 +
>  lxc   |  2 +-
>  3 files changed, 28 insertions(+), 1 deletion(-)
>  create mode 100644 
> debian/patches/pve/0004-apparmor-Allow-ro-remount-of-boot_id.patch
> 

2/2 got merged into upstream and is available with 4.0.4, could you check whether
we can seamlessly update from the currently packaged 4.0.3 to 4.0.4?






[pve-devel] applied: [PATCH container] setup: add kali-rolling in supported releases

2020-09-09 Thread Thomas Lamprecht
On 01.09.20 12:44, Oguz Bektas wrote:
> for our setup purposes, it's the same as bullseye since it's following a
> rolling release model.
> 
> Signed-off-by: Oguz Bektas 
> ---
>  src/PVE/LXC/Setup/Debian.pm | 1 +
>  1 file changed, 1 insertion(+)
> 
>

applied, thanks! Although it could make sense to have a special "sid" version
and map this to "sid" as well, avoiding the need to update it again?





[pve-devel] applied-series: [PATCH v2 container 1/2] Add module for reading state changes from monitor socket

2020-09-09 Thread Thomas Lamprecht
On 08.09.20 13:58, Fabian Ebner wrote:
> Will be used to monitor state changes on container startup.
> 
> Co-developed-by: Wolfgang Bumiller 
> Signed-off-by: Fabian Ebner 
> ---
> 
> New in v2.
> 
> I hard-coded the name of the abstract UNIX socket instead of
> trying to re-implement lxc/monitor.c's lxc_monitor_sock_name
> function. For us, is it true that the lxcpath is always '/var/lib/lxc'?
> Otherwise this won't always work.
> 
>  src/PVE/LXC/Makefile   |  1 +
>  src/PVE/LXC/Monitor.pm | 92 ++
>  2 files changed, 93 insertions(+)
>  create mode 100644 src/PVE/LXC/Monitor.pm
> 
>

applied series, thanks! Wolfgang may improve his variant, e.g., by patching lxc,
to avoid a runtime dependency on the lxc monitor, but until that works out this is
many times better than the current status quo.





[pve-devel] applied: [PATCH container 5/5] setup: heuristically warn if the FS hosting /etc is not mounted

2020-09-09 Thread Thomas Lamprecht
Check for the existence of /etc; use -e as it could also be a symlink
(and it's just a heuristic). But only do so if the expected ostype
from the config does not match the detected one; this normally
indicates that we had a "real" distro running but detected the
fallback "unmanaged". Only warn though, as a hint for the user.

Signed-off-by: Thomas Lamprecht 
---
 src/PVE/LXC/Setup.pm | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/src/PVE/LXC/Setup.pm b/src/PVE/LXC/Setup.pm
index fb0be37..8b8fee9 100644
--- a/src/PVE/LXC/Setup.pm
+++ b/src/PVE/LXC/Setup.pm
@@ -96,8 +96,11 @@ sub new {
$type = &$autodetect_type($self, $rootdir, $os_release);
my $expected_type = $conf->{ostype} || $type;
 
-   warn "got unexpected ostype ($type != $expected_type)\n"
-   if $type ne $expected_type;
+   if ($type ne $expected_type) {
+   warn "WARNING: /etc not present in CT, is the rootfs mounted?\n"
+   if ! -e "$rootdir/etc";
+   warn "got unexpected ostype ($type != $expected_type)\n"
+   }
 }
 
 if ($type eq 'unmanaged') {
-- 
2.20.1






[pve-devel] applied: [PATCH container 2/5] implement debug start

2020-09-09 Thread Thomas Lamprecht
Signed-off-by: Thomas Lamprecht 
---
 src/Makefile |  6 --
 src/PVE/API2/LXC/Status.pm   |  8 +++-
 src/PVE/LXC.pm   | 10 +++---
 src/PVE/LXC/Config.pm|  6 ++
 src/pve-container-debug@.service | 22 ++
 5 files changed, 46 insertions(+), 6 deletions(-)
 create mode 100644 src/pve-container-debug@.service

diff --git a/src/Makefile b/src/Makefile
index 7166708..450a8eb 100644
--- a/src/Makefile
+++ b/src/Makefile
@@ -38,8 +38,9 @@ check: test
make -C test
 
 .PHONY: install
-install: pct lxc-pve.conf lxc-pve-prestart-hook lxc-pve-autodev-hook 
lxc-pve-poststop-hook \
-   lxcnetaddbr pct.1 pct.conf.5 pct.bash-completion pct.zsh-completion 
pve-userns.seccomp
+install: pct lxc-pve.conf pct.1 pct.conf.5 pct.bash-completion 
pct.zsh-completion \
+pve-userns.seccomp pve-container@.service pve-container-debug@.service \
+lxc-pve-prestart-hook lxc-pve-autodev-hook lxc-pve-poststop-hook 
lxcnetaddbr
PVE_GENERATING_DOCS=1 perl -I. -T -e "use PVE::CLI::pct; 
PVE::CLI::pct->verify_api();"
install -d ${SBINDIR}
install -m 0755 pct ${SBINDIR}
@@ -48,6 +49,7 @@ install: pct lxc-pve.conf lxc-pve-prestart-hook 
lxc-pve-autodev-hook lxc-pve-pos
install -m 0755 pve-container-stop-wrapper ${LXC_SCRIPT_DIR}
install -d -m0755 ${SERVICEDIR}
install -m0644 pve-container@.service ${SERVICEDIR}/
+   install -m0644 pve-container-debug@.service ${SERVICEDIR}/
install -m0644 'system-pve\x2dcontainer.slice' ${SERVICEDIR}/
install -d ${LXC_HOOK_DIR}
install -m 0755 lxc-pve-prestart-hook ${LXC_HOOK_DIR}
diff --git a/src/PVE/API2/LXC/Status.pm b/src/PVE/API2/LXC/Status.pm
index 89186ae..766c2ce 100644
--- a/src/PVE/API2/LXC/Status.pm
+++ b/src/PVE/API2/LXC/Status.pm
@@ -130,6 +130,12 @@ __PACKAGE__->register_method({
node => get_standard_option('pve-node'),
vmid => get_standard_option('pve-vmid', { completion => 
\&PVE::LXC::complete_ctid_stopped }),
skiplock => get_standard_option('skiplock'),
+   debug => {
+   optional => 1,
+   type => 'boolean',
+   description => "If set, enables very verbose debug log-level on 
start.",
+   default => 0,
+   },
},
 },
 returns => {
@@ -188,7 +194,7 @@ __PACKAGE__->register_method({
});
}
 
-   PVE::LXC::vm_start($vmid, $conf, $skiplock);
+   PVE::LXC::vm_start($vmid, $conf, $skiplock, $param->{debug});
};
 
my $lockcmd = sub {
diff --git a/src/PVE/LXC.pm b/src/PVE/LXC.pm
index f99..b3e3581 100644
--- a/src/PVE/LXC.pm
+++ b/src/PVE/LXC.pm
@@ -2218,7 +2218,7 @@ my sub monitor_start($$) {
 }
 
 sub vm_start {
-my ($vmid, $conf, $skiplock) = @_;
+my ($vmid, $conf, $skiplock, $debug) = @_;
 
 # apply pending changes while starting
 if (scalar(keys %{$conf->{pending}})) {
@@ -2247,15 +2247,19 @@ sub vm_start {
 
 unlink "/run/pve/ct-$vmid.stderr"; # systemd does not truncate log files
 
-my $base_unit = $conf->{debug} ? 'pve-container-debug' : 'pve-container';
+my $is_debug = $debug || (!defined($debug) && $conf->{debug});
+my $base_unit = $is_debug ? 'pve-container-debug' : 'pve-container';
 
-my $cmd = ['systemctl', 'start', "pve-container\@$vmid"];
+my $cmd = ['systemctl', 'start', "$base_unit\@$vmid"];
 
 PVE::GuestHelpers::exec_hookscript($conf, $vmid, 'pre-start', 1);
 eval {
PVE::Tools::run_command($cmd);
 
monitor_start($monitor_socket, $vmid) if defined($monitor_socket);
+
+   # if debug is requested, print the log it also when the start succeeded
+   print_ct_stderr_log($vmid) if $is_debug;
 };
 if (my $err = $@) {
unlink $skiplock_flag_fn;
diff --git a/src/PVE/LXC/Config.pm b/src/PVE/LXC/Config.pm
index 044e2e1..4cd669c 100644
--- a/src/PVE/LXC/Config.pm
+++ b/src/PVE/LXC/Config.pm
@@ -508,6 +508,12 @@ my $confdesc = {
description => 'Tags of the Container. This is only meta information.',
optional => 1,
 },
+debug => {
+   optional => 1,
+   type => 'boolean',
+   description => "Try to be more verbose. For now this only enables debug 
log-level on start.",
+   default => 0,
+},
 };
 
 my $valid_lxc_conf_keys = {
diff --git a/src/pve-container-debug@.service b/src/pve-container-debug@.service
new file mode 100644
index 000..7cfebaa
--- /dev/null
+++ b/src/pve-container-debug@.service
@@ -0,0 +1,22 @@
+# based on lxc@.service, but without an install section because
+# starting and stopping should be initiated by PVE code, not
+# systemd.
+[Unit]
+Description=PVE LXC Container: %i
+DefaultDependencies=No
+After=lxc.service
+Wants=lxc.service
+Documentation=man:lxc-start man:lxc man:pct
+
+[Service]
+Type=simple
+Delegate=yes
+KillMode=mixed
+TimeoutStopSec=120s
+ExecSta
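
The patch is cut off by the archive above; as a hedged usage sketch of what it
wires up (assuming pct exposes the new API parameter and config option, VMID
invented):

  # one-off verbose start via the new start parameter
  pct start 108 --debug 1
  # or persistently via the new 'debug' config option
  pct set 108 --debug 1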

[pve-devel] applied: [PATCH container 3/5] protected_call: remove left-over rootdir/dev mkdir

2020-09-09 Thread Thomas Lamprecht
commit 797e12e8a5df246d8afc53b045e632977cdf0088 got rid of our "just
bind-mount the root /dev to the CT temporarily for some stuff" for
good a while ago (2015), but creating the /dev directory in the CT
root was kept, from what I can tell, by mistake.

This can be a problem if, for whatever reason, the CT rootfs is not mounted, as
we then break a future mount by creating this /dev directory inside what would
be the CT's rootfs mount point. It is then not empty anymore and a normal mount
cannot happen, failing with "directory is not empty".

Signed-off-by: Thomas Lamprecht 
---
 src/PVE/LXC/Setup.pm | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/src/PVE/LXC/Setup.pm b/src/PVE/LXC/Setup.pm
index d424aaa..fb0be37 100644
--- a/src/PVE/LXC/Setup.pm
+++ b/src/PVE/LXC/Setup.pm
@@ -134,11 +134,6 @@ sub protected_call {
 # avoid recursion:
 return $sub->() if $self->{in_chroot};
 
-my $rootdir = $self->{rootdir};
-if (!-d "$rootdir/dev" && !mkdir("$rootdir/dev")) {
-   die "failed to create temporary /dev directory: $!\n";
-}
-
 pipe(my $res_in, my $res_out) or die "pipe failed: $!\n";
 
 my $child = fork();
@@ -149,6 +144,7 @@ sub protected_call {
# avoid recursive forks
$self->{in_chroot} = 1;
eval {
+   my $rootdir = $self->{rootdir};
chroot($rootdir) or die "failed to change root to: $rootdir: $!\n";
chdir('/') or die "failed to change to root directory\n";
my $res = $sub->();
-- 
2.20.1






[pve-devel] applied: [PATCH container 4/5] alpine: setup net: pass whole config to parent method

2020-09-09 Thread Thomas Lamprecht
We expected the whole $conf to be passed in a call to setup_network; a
while ago it worked if only the netX keys were present, and for some
plugins that is still the case.
But, in the Debian version, reused by Alpine, we now check if the CT
distro version is recent enough to support (or need) the address in
CIDR format.
So, at least "ostype" needs to be passed too, else we get ugly
warnings in the syslog (or the recently added --debug log CLI switch).

Just pass the whole config; the setup_network methods need to cope
with that anyway.

Signed-off-by: Thomas Lamprecht 
---
 src/PVE/LXC/Setup/Alpine.pm | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/src/PVE/LXC/Setup/Alpine.pm b/src/PVE/LXC/Setup/Alpine.pm
index 75d6ebe..e486971 100644
--- a/src/PVE/LXC/Setup/Alpine.pm
+++ b/src/PVE/LXC/Setup/Alpine.pm
@@ -86,21 +86,24 @@ sub setup_network {
 # at least with the workaround the networking starts and if an ipv4 is
 # configured slaac for ipv6 works (unless accept_ra = 0 in the node)
 
-my $netconf = {};
+my $newconf = {};
 my $networks = {};
 foreach my $k (keys %$conf) {
+   my $value = $conf->{$k};
+   $newconf->{$k} = $value;
next if $k !~ m/^net(\d+)$/;
-   my $netstring = $conf->{$k};
+
+   my $netstring = $value;
# check for dhcp6:
my $d = PVE::LXC::Config->parse_lxc_network($netstring);
if (defined($d->{ip6}) && ($d->{ip6} eq 'dhcp' || $d->{ip6} eq 'auto')) 
{
$d->{ip6} = 'manual';
$netstring = PVE::LXC::Config->print_lxc_network($d);
}
-   $netconf->{$k} = $netstring;
+   $newconf->{$k} = $netstring;
 }
 
-PVE::LXC::Setup::Debian::setup_network($self, $netconf);
+PVE::LXC::Setup::Debian::setup_network($self, $newconf);
 }
 
 1;
-- 
2.20.1






[pve-devel] applied: [PATCH container 1/5] ct start: track lxc-start stderr and print in error case

2020-09-09 Thread Thomas Lamprecht
Signed-off-by: Thomas Lamprecht 
---
 src/PVE/LXC.pm | 15 +++
 src/pve-container@.service |  2 +-
 2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/src/PVE/LXC.pm b/src/PVE/LXC.pm
index e13f7e6..f99 100644
--- a/src/PVE/LXC.pm
+++ b/src/PVE/LXC.pm
@@ -2167,6 +2167,17 @@ sub userns_command {
 return [];
 }
 
+my sub print_ct_stderr_log {
+my ($vmid) = @_;
+my $log = eval { file_get_contents("/run/pve/ct-$vmid.stderr") };
+return if !$log;
+
+while ($log =~ /^\h*(lxc-start:?\s+$vmid:?\s*\S+\s*)?(.*?)\h*$/gm) {
+   my $line = $2;
+   print STDERR "$line\n";
+}
+}
+
 my sub monitor_state_change($$) {
 my ($monitor_socket, $vmid) = @_;
 die "no monitor socket\n" if !defined($monitor_socket);
@@ -2201,6 +2212,7 @@ my sub monitor_start($$) {
 if (my $err = $@) {
warn "problem with monitor socket, but continuing anyway: $err\n";
 } elsif (!$success) {
+   print_ct_stderr_log($vmid);
die "startup for container '$vmid' failed\n";
 }
 }
@@ -2233,6 +2245,9 @@ sub vm_start {
 my $monitor_socket = eval { PVE::LXC::Monitor::get_monitor_socket() };
 warn $@ if $@;
 
+unlink "/run/pve/ct-$vmid.stderr"; # systemd does not truncate log files
+
+my $base_unit = $conf->{debug} ? 'pve-container-debug' : 'pve-container';
 
 my $cmd = ['systemctl', 'start', "pve-container\@$vmid"];
 
diff --git a/src/pve-container@.service b/src/pve-container@.service
index d9185bc..fdc373e 100644
--- a/src/pve-container@.service
+++ b/src/pve-container@.service
@@ -19,4 +19,4 @@ ExecStop=/usr/share/lxc/pve-container-stop-wrapper %i
 # Environment=CONSOLETYPE=serial
 # Prevent container init from putting all its output into the journal
 StandardOutput=null
-StandardError=null
+StandardError=file:/run/pve/ct-%i.stderr
-- 
2.20.1






Re: [pve-devel] corosync bug: cluster break after 1 node clean shutdown

2020-09-09 Thread Thomas Lamprecht
On 08.09.20 09:11, Alexandre DERUMIER wrote:
>>> It would really help if we can reproduce the bug somehow. Do you have an
>>> idea how to trigger the bug?
> 
> I really don't know. I'm currently trying to reproduce on the same cluster,
> with softdog && noboot=1, and rebooting a node.
> 
> 
> Maybe it's related to the number of VMs, or the number of nodes; I don't have
> any clue ...

I checked the watchdog code a bit, our user-space mux one and the kernel drivers,
and I'm just noting a few things here (thinking out loud):

The /dev/watchdog itself is always active, else we could lose it to some
other program and not be able to activate HA dynamically.
But, as long as no HA service got active, it's a simple dummy "wake up every
second and do an ioctl keep-alive update".
This is really simple and efficiently written, so if that fails for over 10s
the system is really loaded, probably barely responding to anything.

Currently the watchdog-mux runs as a normal process, no re-nice, no real-time
scheduling. This is IMO wrong, as it is a critical process which needs to be
run with high priority. I've a patch here which sets it to the highest RR
realtime-scheduling priority available, effectively the same as what corosync does.


diff --git a/src/watchdog-mux.c b/src/watchdog-mux.c
index 818ae00..71981d7 100644
--- a/src/watchdog-mux.c
+++ b/src/watchdog-mux.c
@@ -8,2 +8,3 @@
 #include 
+#include 
 #include 
@@ -151,2 +177,15 @@ main(void)
 
+int sched_priority = sched_get_priority_max (SCHED_RR);
+if (sched_priority != -1) {
+struct sched_param global_sched_param;
+global_sched_param.sched_priority = sched_priority;
+int res = sched_setscheduler (0, SCHED_RR, &global_sched_param);
+if (res == -1) {
+fprintf(stderr, "Could not set SCHED_RR at priority %d\n", 
sched_priority);
+} else {
+fprintf(stderr, "set SCHED_RR at priority %d\n", sched_priority);
+}
+}
+
+
 if ((watchdog_fd = open(WATCHDOG_DEV, O_WRONLY)) == -1) {

The issue of a watchdog reset with no HA active, due to a massively overloaded
system, should already be largely avoided by the scheduling change alone.
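
For experimenting with that idea without rebuilding the package, roughly the same
effect can be had at runtime (a sketch; SCHED_RR and priority 99 mirror what the
patch above sets at startup):

  chrt --rr -p 99 "$(pidof watchdog-mux)"   # bump the running mux to realtime RR
  chrt -p "$(pidof watchdog-mux)"           # verify the new policy/priority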

What is interesting, IMO, is that lots of nodes rebooted at the same time, with no
HA active.
This *could* come from a side effect like ceph rebalancing kicking off and
producing a load spike for >10s, hindering the scheduling of the watchdog-mux.
This is a theory, but with HA off it needs to be something like that, as in the
HA-off case there's *no* direct or indirect connection between corosync/pmxcfs and
the watchdog-mux. It simply does not care about, or notice, quorum partition
changes at all.


There may be an approach to reserve the watchdog for the mux while avoiding having
it as a "ticking time bomb":
Theoretically one could open it, then disable it with an ioctl (it can be queried
whether a driver supports that) and only enable it for real once the first client
connects to the MUX. This may not work for all watchdog modules, and if so, we may
want to make it configurable, as some people actually want a reset if a (future)
real-time process cannot be scheduled for >= 10 seconds.

With HA active, there could well be something off, either in corosync/knet or in
how we interface with it in pmxcfs, but that won't explain the non-HA issues.

Speaking of pmxcfs, that one also runs with standard priority; we may want to
change it to an RT scheduler too, so that it is ensured it can process all
corosync events.

I also have a few other small watchdog-mux patches around: it should nowadays
actually be able to tell us why a reset happened (it can also be over/under
voltage, temperature, ...), and I'll retry the keep-alive ioctl a few times if it
fails; we can only win with that after all.






Re: [pve-devel] corosync bug: cluster break after 1 node clean shutdown

2020-09-09 Thread Alexandre DERUMIER
Thanks, Thomas, for the investigation.

I'm still trying to reproduce...
I think I have some special case here, because the forum user with 30 nodes had a
corosync cluster split. (Note that I had this bug 6 months ago, also when shutting
down a node, and the only way to recover was to fully stop corosync on all nodes
and start corosync again on all of them.)


But this time, the corosync logs look fine (every node correctly sees node2 down
and sees the remaining nodes).

The surviving node7 was the only node with HA, and the LRM didn't have the
watchdog enabled (I haven't found any log like "pve-ha-lrm: watchdog active" in
the last 6 months on this node).


So, the timing was:

10:39:05: the "halt" command is sent to node2
10:39:16: node2 leaves corosync / halts -> every node sees it and correctly forms
a new membership with the 13 remaining nodes

... I don't see any special logs (corosync, pmxcfs, pve-ha-crm, pve-ha-lrm) after
node2 leaves.
But there is still activity on the server: pve-firewall is still logging, VMs are
running fine.


Between 10:40:25 and 10:40:34: the watchdog reset the nodes, but not node7.

-> so between 70-80s after node2 went down; I think the watchdog-mux was still
running fine until then.
   (That sounds like the LRM was stuck and client_watchdog_timeout expired in the
watchdog-mux.)



10:40:41: node7 loses quorum (as all other nodes have reset).



10:40:50: node7's crm/lrm finally log:

Sep  3 10:40:50 m6kvm7 pve-ha-crm[16196]: got unexpected error - error during 
cfs-locked 'domain-ha' operation: no quorum!
Sep  3 10:40:51 m6kvm7 pve-ha-lrm[16140]: loop take too long (87 seconds)
Sep  3 10:40:51 m6kvm7 pve-ha-crm[16196]: loop take too long (92 seconds)
Sep  3 10:40:51 m6kvm7 pve-ha-crm[16196]: lost lock 'ha_manager_lock - cfs lock 
update failed - Permission denied
Sep  3 10:40:51 m6kvm7 pve-ha-lrm[16140]: lost lock 'ha_agent_m6kvm7_lock - cfs 
lock update failed - Permission denied



So, I really think that something stalled the lrm/crm loop, and the watchdog was
not reset (kept alive) because of that.




