Re: [pve-devel] [PATCH qemu-server] api: cloud-init support for mtu and userdata
But this does not change the MTU inside the VM, right?

-----Original Message-----
From: Alexandre DERUMIER
Sent: Monday, 7 September 2020 09:34
To: Proxmox VE development discussion
Subject: Re: [pve-devel] [PATCH qemu-server] api: cloud-init support for mtu and userdata

Hi,

not related to cloud-init, but for a virtio-net NIC it is already possible to add the
"mtu=xxx" option to netX. It is not yet available in the GUI, but you should be able to
do it with "qm set --net0 ...,mtu=".

----- Original Message -----
From: "proxmox"
To: "Proxmox VE development discussion"
Sent: Friday, 4 September 2020 17:21:24
Subject: Re: [pve-devel] [PATCH qemu-server] api: cloud-init support for mtu and userdata

Hello

I didn't know this patch mail got approved, so sorry for the (very) late response.

My intention for not going with snippets was the fact that they could not be created via
the API, and one would have to manually create a file on the target machine for the
cloud-init userdata. One possible use case was to spin up a Kubernetes cluster on Proxmox
only via the API. I wanted to have something similar to the Hetzner cloud API, where the
full userdata can be submitted for VM provisioning:
https://docs.hetzner.cloud/#servers-create-a-server

So, going further here: do you want me to submit the MTU patches separately? Should I
integrate userdata into the cicustom field? I thought this would make things more complex
in favor of parsing out the base64 stuff, so I would still go with an extra field.
Thoughts?

___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
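For reference, a minimal sketch of the hint above, with a made-up VMID and NIC settings.
Note that setting --net0 this way re-specifies the whole netX property, and the mtu
option applies to virtio NICs:

    # set a 1400 byte MTU on the first NIC of VM 100 (values are illustrative)
    qm set 100 --net0 virtio,bridge=vmbr0,mtu=1400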
Re: [pve-devel] [PATCH qemu-server] api: cloud-init support for mtu and userdata
>> But this does not change the MTU inside the VM, right?

Yes, it changes the MTU inside the VM! (At least on recent kernels; I don't remember when
this was added.)

----- Original Message -----
From: "proxmox"
To: "Proxmox VE development discussion"
Sent: Wednesday, 9 September 2020 11:06:13
Subject: Re: [pve-devel] [PATCH qemu-server] api: cloud-init support for mtu and userdata

But this does not change the MTU inside the VM, right?

[...]

___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
Re: [pve-devel] [PATCH qemu-server] api: cloud-init support for mtu and userdata
It has been added to the kernel in 2016:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=v5.8.7&id=14de9d114a82a564b94388c95af79a701dc93134

----- Original Message -----
From: "aderumier"
To: "Proxmox VE development discussion"
Sent: Wednesday, 9 September 2020 15:02:33
Subject: Re: [pve-devel] [PATCH qemu-server] api: cloud-init support for mtu and userdata

>> But this does not change the MTU inside the VM, right?

Yes, it changes the MTU inside the VM! (At least on recent kernels; I don't remember when
this was added.)

[...]

___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
Re: [pve-devel] [PATCH qemu-server] api: cloud-init support for mtu and userdata
Hi,

On 9/4/20 5:21 PM, proxmox wrote:
> Hello
>
> I didn't know this patch mail got approved, so sorry for the (very) late response.
>
> My intention for not going with snippets was the fact that they could not be created via
> the API and one would have to manually create a file on the target machine for cloud-init
> userdata.

There is currently a bug report open for this:
https://bugzilla.proxmox.com/show_bug.cgi?id=2208
This, together with backup and migration of snippets, could be useful.

> One possible use case was to spin up a kubernetes cluster on proxmox only via API. I
> wanted to have something similar to the hetzner cloud API where the full userdata can be
> submitted for VM provisioning: https://docs.hetzner.cloud/#servers-create-a-server
>
> So going further here you want me to submit the MTU patches separately?

The MTU patches separately would be great.

> Should I integrate userdata into the cicustom field? I thought this would make things
> more complex in favor of parsing out the base64 stuff. So I would still go with an extra
> field.

After some discussions we think that putting the userdata in the config file is not the
right approach. As the cluster filesystem is limited to 32M and a single VM config file is
limited to 512K, you can easily run into the limit with a small number of VMs. We would
most likely have to limit the userdata per config to a very small amount (1K?), but with
those limits it is difficult to get everything in.

> Thoughts?

___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
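For context, a rough sketch of the snippet-based workaround discussed above; the storage
name, snippet path and VMID are made up, and the snippet file still has to be placed on
the node by hand, which is exactly the API gap tracked in bug 2208:

    # write the user-data snippet on the node (storage 'local' needs the
    # 'snippets' content type enabled)
    cat > /var/lib/vz/snippets/user-data.yaml <<'EOF'
    #cloud-config
    package_update: true
    packages:
      - qemu-guest-agent
    EOF

    # point the VM's cloud-init user data at the snippet
    qm set 9001 --cicustom "user=local:snippets/user-data.yaml"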
[pve-devel] applied: [PATCH v2 firewall] introduce new icmp-type parameter
On 29.05.20 14:22, Mira Limbeck wrote:
> Currently icmp types are handled via 'dport'. This is not documented
> anywhere except for a single line of comment in the code. To untangle
> the icmp-type handling from the dport handling a new 'icmp-type'
> parameter is introduced.
>
> The valid 'icmp-type' values are limited to the names
> (icmp[v6]_type_names hash in the code, same as ip[6]tables provides).
> Type[/Code] values are not supported.
>
> Support for ipv6-icmp is added to icmp-type parameter handling. This makes it
> possible to specify icmpv6 types via the GUI.
>
> Signed-off-by: Mira Limbeck
> ---
> v2:
>   - rebased on master
>   - removed type[/code] value support, now only names are accepted
>
>  src/PVE/API2/Firewall/Rules.pm |  4 +++
>  src/PVE/Firewall.pm            | 50 +-
>  2 files changed, 53 insertions(+), 1 deletion(-)
>

applied, thanks!

___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
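As an illustration, and assuming the new parameter is exposed as "icmp-type" on the
firewall rules API (the change to Rules.pm in the diffstat suggests so), a rule accepting
ICMP echo requests could be created roughly like this; node name and VMID are made up:

    pvesh create /nodes/pve1/qemu/100/firewall/rules \
        --type in --action ACCEPT --proto icmp --icmp-type echo-request

    # the IPv6 counterpart would use the ipv6-icmp protocol
    pvesh create /nodes/pve1/qemu/100/firewall/rules \
        --type in --action ACCEPT --proto ipv6-icmp --icmp-type echo-request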
Re: [pve-devel] [PATCH v3 access-control] add ui capabilities endpoint
On 06.07.20 14:45, Tim Marx wrote:
> Signed-off-by: Tim Marx
> ---
> * no changes

Maybe we could merge this into the "/access/permissions" endpoint, maybe with a
"heuristic" parameter?

>
>  PVE/API2/AccessControl.pm | 29 +
>  1 file changed, 29 insertions(+)
>
> diff --git a/PVE/API2/AccessControl.pm b/PVE/API2/AccessControl.pm
> index fd27786..66319cc 100644
> --- a/PVE/API2/AccessControl.pm
> +++ b/PVE/API2/AccessControl.pm
> @@ -718,4 +718,33 @@ __PACKAGE__->register_method({
>      return $res;
>  }});
>
> +__PACKAGE__->register_method({
> +    name => 'uicapabilities',
> +    path => 'uicapabilities',
> +    method => 'GET',
> +    description => 'Retrieve user interface capabilities for calling user/token.',
> +    permissions => {
> +        description => "Each user/token is allowed to retrieve their own capabilities.",
> +        user => 'all',
> +    },
> +    parameters => {},
> +    returns => {
> +        type => 'object',
> +        properties => {
> +            cap => {
> +                type => 'object',
> +                description => 'The user interface capabilities of the calling user/token'
> +            }
> +        },
> +    },
> +    code => sub {
> +        my ($param) = @_;
> +
> +        my $rpcenv = PVE::RPCEnvironment::get();
> +        my $userid = $rpcenv->get_user();
> +        my $res->{cap} = &$compute_api_permission($rpcenv, $userid);
> +
> +        return $res;
> +    }});
> +
>  1;
> --
> 2.20.1
>

___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
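If the patch were applied as posted (which the comment above questions), the endpoint
would presumably be reachable under /access, matching the path => 'uicapabilities'
registration shown in the diff; purely illustrative:

    pvesh get /access/uicapabilities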
Re: [pve-devel] [PATCH lxc 0/2] fix apparmor rules and improve cgroupv2 experience
On 22.07.20 13:05, Stoiko Ivanov wrote:
> This patchset addresses 2 minor inconveniences I ran into, while running my
> host with 'systemd.unified_cgroup_hierarchy=1':
>
> * apparmor mount denies for '/proc/sys/kernel/random/boot_id' (this happens
>   irrespective of the cgroup-layout)
> * having to add
>   `lxc.init.cmd: /lib/systemd/systemd systemd.unified_cgroup_hierarchy=1`
>   to all my container configs (for debian and arch containers at least,
>   alpine runs without issues) - see [0] for a discussion of the topic
>
> While investigating this I noticed that the fixes for both issues were already
> on upstream/master (with one small other fix in between) - so instead of
> cherry-picking both patches I fast-forwarded to the last needed commit.
> Glad to resend with the patches cherry-picked and added to our patchqueue.
>
> I would probably submit the apparmor fix upstream (after a quick check by
> another set of eyes :)
>
> [0] https://github.com/lxc/lxc/issues/3183
>
> Stoiko Ivanov (2):
>   update lxc to include fixes for cgroupv2 setups
>   apparmor: add rule for allowing remount of boot_id
>
>  ...apparmor-Allow-ro-remount-of-boot_id.patch | 26 +++
>  debian/patches/series                         |  1 +
>  lxc                                           |  2 +-
>  3 files changed, 28 insertions(+), 1 deletion(-)
>  create mode 100644 debian/patches/pve/0004-apparmor-Allow-ro-remount-of-boot_id.patch
>

2/2 got merged into upstream and is available with 4.0.4, could you see if we can
seamlessly update from the currently packaged 4.0.3 to 4.0.4?

___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
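For illustration, the per-container workaround quoted above ends up as a raw lxc key in
the container's config; the VMID is made up, and whether PVE's config validation accepts
this particular raw key should be double-checked:

    # append the raw lxc key to a container config (illustrative only)
    echo 'lxc.init.cmd: /lib/systemd/systemd systemd.unified_cgroup_hierarchy=1' \
        >> /etc/pve/lxc/101.conf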
[pve-devel] applied: [PATCH container] setup: add kali-rolling in supported releases
On 01.09.20 12:44, Oguz Bektas wrote:
> for our setup purposes, it's the same as bullseye since it's following a
> rolling release model.
>
> Signed-off-by: Oguz Bektas
> ---
>  src/PVE/LXC/Setup/Debian.pm | 1 +
>  1 file changed, 1 insertion(+)
>

applied, thanks!

Albeit, it could make sense to have a special "sid" version and map this then also to
"sid", avoiding the need to update it again?

___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
[pve-devel] applied-series: [PATCH v2 container 1/2] Add module for reading state changes from monitor socket
On 08.09.20 13:58, Fabian Ebner wrote:
> Will be used to monitor state changes on container startup.
>
> Co-developed-by: Wolfgang Bumiller
> Signed-off-by: Fabian Ebner
> ---
>
> New in v2.
>
> I hard-coded the name of the abstract UNIX socket instead of
> trying to re-implement lxc/monitor.c's lxc_monitor_sock_name
> function. For us, is it true that the lxcpath is always '/var/lib/lxc'?
> Otherwise this won't always work.
>
>  src/PVE/LXC/Makefile   |  1 +
>  src/PVE/LXC/Monitor.pm | 92 ++
>  2 files changed, 93 insertions(+)
>  create mode 100644 src/PVE/LXC/Monitor.pm
>

applied series, thanks!

Wolfgang may improve his variant, e.g., by patching lxc, to avoid a runtime dependency
on the lxc monitor, but until that works out this is 1x times better than the current
status quo.

___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
[pve-devel] applied: [PATCH container 5/5] setup: heuristically warn if the FS hosting /etc is not mounted
Check for the existence of /etc, use -e as it could also be a symlink (and it's just a
heuristic). But only do so if the expected ostype from the config does not match the
detected one; this normally indicates that we had a "real" distro running but detected
the fallback "unmanaged".

Only warn though, as a hint for the user.

Signed-off-by: Thomas Lamprecht
---
 src/PVE/LXC/Setup.pm | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/src/PVE/LXC/Setup.pm b/src/PVE/LXC/Setup.pm
index fb0be37..8b8fee9 100644
--- a/src/PVE/LXC/Setup.pm
+++ b/src/PVE/LXC/Setup.pm
@@ -96,8 +96,11 @@ sub new {
 	$type = &$autodetect_type($self, $rootdir, $os_release);
 
 	my $expected_type = $conf->{ostype} || $type;
-	warn "got unexpected ostype ($type != $expected_type)\n"
-	    if $type ne $expected_type;
+	if ($type ne $expected_type) {
+	    warn "WARNING: /etc not present in CT, is the rootfs mounted?\n"
+		if ! -e "$rootdir/etc";
+	    warn "got unexpected ostype ($type != $expected_type)\n"
+	}
     }
 
     if ($type eq 'unmanaged') {
--
2.20.1

___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
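A quick way to check by hand what this heuristic is guessing at, i.e. whether the
container's root filesystem mounts cleanly and actually contains /etc (VMID made up;
pct mount typically places the filesystem under /var/lib/lxc/<vmid>/rootfs):

    pct mount 101
    ls -la /var/lib/lxc/101/rootfs/etc
    pct unmount 101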
[pve-devel] applied: [PATCH container 2/5] implement debug start
Signed-off-by: Thomas Lamprecht
---
 src/Makefile                     |  6 --
 src/PVE/API2/LXC/Status.pm       |  8 +++-
 src/PVE/LXC.pm                   | 10 +++---
 src/PVE/LXC/Config.pm            |  6 ++
 src/pve-container-debug@.service | 22 ++
 5 files changed, 46 insertions(+), 6 deletions(-)
 create mode 100644 src/pve-container-debug@.service

diff --git a/src/Makefile b/src/Makefile
index 7166708..450a8eb 100644
--- a/src/Makefile
+++ b/src/Makefile
@@ -38,8 +38,9 @@ check: test
 	make -C test
 
 .PHONY: install
-install: pct lxc-pve.conf lxc-pve-prestart-hook lxc-pve-autodev-hook lxc-pve-poststop-hook \
-	lxcnetaddbr pct.1 pct.conf.5 pct.bash-completion pct.zsh-completion pve-userns.seccomp
+install: pct lxc-pve.conf pct.1 pct.conf.5 pct.bash-completion pct.zsh-completion \
+	pve-userns.seccomp pve-container@.service pve-container-debug@.service \
+	lxc-pve-prestart-hook lxc-pve-autodev-hook lxc-pve-poststop-hook lxcnetaddbr
 	PVE_GENERATING_DOCS=1 perl -I. -T -e "use PVE::CLI::pct; PVE::CLI::pct->verify_api();"
 	install -d ${SBINDIR}
 	install -m 0755 pct ${SBINDIR}
@@ -48,6 +49,7 @@ install: pct lxc-pve.conf lxc-pve-prestart-hook lxc-pve-autodev-hook lxc-pve-pos
 	install -m 0755 pve-container-stop-wrapper ${LXC_SCRIPT_DIR}
 	install -d -m0755 ${SERVICEDIR}
 	install -m0644 pve-container@.service ${SERVICEDIR}/
+	install -m0644 pve-container-debug@.service ${SERVICEDIR}/
 	install -m0644 'system-pve\x2dcontainer.slice' ${SERVICEDIR}/
 	install -d ${LXC_HOOK_DIR}
 	install -m 0755 lxc-pve-prestart-hook ${LXC_HOOK_DIR}
diff --git a/src/PVE/API2/LXC/Status.pm b/src/PVE/API2/LXC/Status.pm
index 89186ae..766c2ce 100644
--- a/src/PVE/API2/LXC/Status.pm
+++ b/src/PVE/API2/LXC/Status.pm
@@ -130,6 +130,12 @@ __PACKAGE__->register_method({
 	    node => get_standard_option('pve-node'),
 	    vmid => get_standard_option('pve-vmid', { completion => \&PVE::LXC::complete_ctid_stopped }),
 	    skiplock => get_standard_option('skiplock'),
+	    debug => {
+		optional => 1,
+		type => 'boolean',
+		description => "If set, enables very verbose debug log-level on start.",
+		default => 0,
+	    },
 	},
     },
     returns => {
@@ -188,7 +194,7 @@ __PACKAGE__->register_method({
 		});
 	    }
 
-	    PVE::LXC::vm_start($vmid, $conf, $skiplock);
+	    PVE::LXC::vm_start($vmid, $conf, $skiplock, $param->{debug});
 	};
 
 	my $lockcmd = sub {
diff --git a/src/PVE/LXC.pm b/src/PVE/LXC.pm
index f99..b3e3581 100644
--- a/src/PVE/LXC.pm
+++ b/src/PVE/LXC.pm
@@ -2218,7 +2218,7 @@ my sub monitor_start($$) {
 }
 
 sub vm_start {
-    my ($vmid, $conf, $skiplock) = @_;
+    my ($vmid, $conf, $skiplock, $debug) = @_;
 
     # apply pending changes while starting
     if (scalar(keys %{$conf->{pending}})) {
@@ -2247,15 +2247,19 @@ sub vm_start {
     unlink "/run/pve/ct-$vmid.stderr"; # systemd does not truncate log files
 
-    my $base_unit = $conf->{debug} ? 'pve-container-debug' : 'pve-container';
+    my $is_debug = $debug || (!defined($debug) && $conf->{debug});
+    my $base_unit = $is_debug ? 'pve-container-debug' : 'pve-container';
 
-    my $cmd = ['systemctl', 'start', "pve-container\@$vmid"];
+    my $cmd = ['systemctl', 'start', "$base_unit\@$vmid"];
 
     PVE::GuestHelpers::exec_hookscript($conf, $vmid, 'pre-start', 1);
 
     eval {
 	PVE::Tools::run_command($cmd);
 
 	monitor_start($monitor_socket, $vmid) if defined($monitor_socket);
+
+	# if debug is requested, print the log it also when the start succeeded
+	print_ct_stderr_log($vmid) if $is_debug;
     };
     if (my $err = $@) {
 	unlink $skiplock_flag_fn;
diff --git a/src/PVE/LXC/Config.pm b/src/PVE/LXC/Config.pm
index 044e2e1..4cd669c 100644
--- a/src/PVE/LXC/Config.pm
+++ b/src/PVE/LXC/Config.pm
@@ -508,6 +508,12 @@ my $confdesc = {
 	description => 'Tags of the Container. This is only meta information.',
 	optional => 1,
     },
+    debug => {
+	optional => 1,
+	type => 'boolean',
+	description => "Try to be more verbose. For now this only enables debug log-level on start.",
+	default => 0,
+    },
 };
 
 my $valid_lxc_conf_keys = {
diff --git a/src/pve-container-debug@.service b/src/pve-container-debug@.service
new file mode 100644
index 000..7cfebaa
--- /dev/null
+++ b/src/pve-container-debug@.service
@@ -0,0 +1,22 @@
+# based on lxc@.service, but without an install section because
+# starting and stopping should be initiated by PVE code, not
+# systemd.
+[Unit]
+Description=PVE LXC Container: %i
+DefaultDependencies=No
+After=lxc.service
+Wants=lxc.service
+Documentation=man:lxc-start man:lxc man:pct
+
+[Service]
+Type=simple
+Delegate=yes
+KillMode=mixed
+TimeoutStopSec=120s
+ExecSta
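With this patch, the verbose start can be requested per invocation from the CLI, roughly
like this (VMID made up):

    # start the container with debug log-level and print the collected
    # lxc-start output even when the start succeeds
    pct start 101 --debug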
[pve-devel] applied: [PATCH container 3/5] protected_call: remove left-over rootdir/dev mkdir
commit 797e12e8a5df246d8afc53b045e632977cdf0088 got rid of our "just bind-mount the root
/dev to the CT temporarily for some stuff" approach for good a while ago (2015), but
creating the /dev directory in the CT root was kept, from what I can tell, by mistake.

This can be a problem if, for whatever reason, the CT rootfs is not mounted, as we then
break a future mount by creating this /dev directory inside what would be the CT's rootfs
mount point. It is then not empty anymore and a normal mount cannot happen, failing with
"directory is not empty".

Signed-off-by: Thomas Lamprecht
---
 src/PVE/LXC/Setup.pm | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/src/PVE/LXC/Setup.pm b/src/PVE/LXC/Setup.pm
index d424aaa..fb0be37 100644
--- a/src/PVE/LXC/Setup.pm
+++ b/src/PVE/LXC/Setup.pm
@@ -134,11 +134,6 @@ sub protected_call {
     # avoid recursion:
     return $sub->() if $self->{in_chroot};
 
-    my $rootdir = $self->{rootdir};
-    if (!-d "$rootdir/dev" && !mkdir("$rootdir/dev")) {
-	die "failed to create temporary /dev directory: $!\n";
-    }
-
     pipe(my $res_in, my $res_out) or die "pipe failed: $!\n";
 
     my $child = fork();
@@ -149,6 +144,7 @@ sub protected_call {
 	# avoid recursive forks
 	$self->{in_chroot} = 1;
 	eval {
+	    my $rootdir = $self->{rootdir};
 	    chroot($rootdir) or die "failed to change root to: $rootdir: $!\n";
 	    chdir('/') or die "failed to change to root directory\n";
 	    my $res = $sub->();
--
2.20.1

___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
[pve-devel] applied: [PATCH container 4/5] alpine: setup net: pass whole config to parent method
We expected the whole $conf to be passed in a call to setup_network. A while ago it worked
if only the netX keys were present, and for some plugins that is still the case. But in
the Debian version, reused by Alpine, we now check whether the CT distro version is recent
enough to support (or need) the address in CIDR format. So at least "ostype" needs to be
passed too, else we get ugly warnings in the syslog (or the recently added --debug log CLI
switch).

Just pass the whole config; the setup_network method needs to cope with that anyway.

Signed-off-by: Thomas Lamprecht
---
 src/PVE/LXC/Setup/Alpine.pm | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/src/PVE/LXC/Setup/Alpine.pm b/src/PVE/LXC/Setup/Alpine.pm
index 75d6ebe..e486971 100644
--- a/src/PVE/LXC/Setup/Alpine.pm
+++ b/src/PVE/LXC/Setup/Alpine.pm
@@ -86,21 +86,24 @@ sub setup_network {
     # at least with the workaround the networking starts and if an ipv4 is
     # configured slaac for ipv6 works (unless accept_ra = 0 in the node)
 
-    my $netconf = {};
+    my $newconf = {};
     my $networks = {};
     foreach my $k (keys %$conf) {
+	my $value = $conf->{$k};
+	$newconf->{$k} = $value;
 	next if $k !~ m/^net(\d+)$/;
-	my $netstring = $conf->{$k};
+
+	my $netstring = $value;
 	# check for dhcp6:
 	my $d = PVE::LXC::Config->parse_lxc_network($netstring);
 	if (defined($d->{ip6}) && ($d->{ip6} eq 'dhcp' || $d->{ip6} eq 'auto')) {
 	    $d->{ip6} = 'manual';
 	    $netstring = PVE::LXC::Config->print_lxc_network($d);
 	}
-	$netconf->{$k} = $netstring;
+	$newconf->{$k} = $netstring;
     }
 
-    PVE::LXC::Setup::Debian::setup_network($self, $netconf);
+    PVE::LXC::Setup::Debian::setup_network($self, $newconf);
 }
 
 1;
--
2.20.1

___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
[pve-devel] applied: [PATCH container 1/5] ct start: track lxc-start stderr and print in error case
Signed-off-by: Thomas Lamprecht
---
 src/PVE/LXC.pm             | 15 +++
 src/pve-container@.service |  2 +-
 2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/src/PVE/LXC.pm b/src/PVE/LXC.pm
index e13f7e6..f99 100644
--- a/src/PVE/LXC.pm
+++ b/src/PVE/LXC.pm
@@ -2167,6 +2167,17 @@ sub userns_command {
     return [];
 }
 
+my sub print_ct_stderr_log {
+    my ($vmid) = @_;
+    my $log = eval { file_get_contents("/run/pve/ct-$vmid.stderr") };
+    return if !$log;
+
+    while ($log =~ /^\h*(lxc-start:?\s+$vmid:?\s*\S+\s*)?(.*?)\h*$/gm) {
+	my $line = $2;
+	print STDERR "$line\n";
+    }
+}
+
 my sub monitor_state_change($$) {
     my ($monitor_socket, $vmid) = @_;
     die "no monitor socket\n" if !defined($monitor_socket);
@@ -2201,6 +2212,7 @@ my sub monitor_start($$) {
     if (my $err = $@) {
 	warn "problem with monitor socket, but continuing anyway: $err\n";
     } elsif (!$success) {
+	print_ct_stderr_log($vmid);
 	die "startup for container '$vmid' failed\n";
     }
 }
@@ -2233,6 +2245,9 @@ sub vm_start {
     my $monitor_socket = eval { PVE::LXC::Monitor::get_monitor_socket() };
     warn $@ if $@;
 
+    unlink "/run/pve/ct-$vmid.stderr"; # systemd does not truncate log files
+
+    my $base_unit = $conf->{debug} ? 'pve-container-debug' : 'pve-container';
 
     my $cmd = ['systemctl', 'start', "pve-container\@$vmid"];
 
diff --git a/src/pve-container@.service b/src/pve-container@.service
index d9185bc..fdc373e 100644
--- a/src/pve-container@.service
+++ b/src/pve-container@.service
@@ -19,4 +19,4 @@ ExecStop=/usr/share/lxc/pve-container-stop-wrapper %i
 # Environment=CONSOLETYPE=serial
 # Prevent container init from putting all its output into the journal
 StandardOutput=null
-StandardError=null
+StandardError=file:/run/pve/ct-%i.stderr
--
2.20.1

___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
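After this change the stderr of lxc-start is kept on the node, so a failed start can be
inspected directly (VMID made up):

    # look at the captured lxc-start stderr of container 101
    cat /run/pve/ct-101.stderr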
Re: [pve-devel] corosync bug: cluster break after 1 node clean shutdown
On 08.09.20 09:11, Alexandre DERUMIER wrote:
>>> It would really help if we can reproduce the bug somehow. Do you have an idea how
>>> to trigger the bug?
>
> I really don't know. I'm currently trying to reproduce on the same cluster,
> with softdog && noboot=1, and rebooting a node.
>
> Maybe it's related to the number of VMs, or the number of nodes, I don't have
> any clue ...

I checked a bit the watchdog code, our user-space mux one and the kernel drivers, and am
just noting a few things here (thinking out loud):

The /dev/watchdog itself is always active, else we could lose it to some other program
and not be able to activate HA dynamically. But, as long as no HA service got active, it's
a simple dummy "wake up every second and do an ioctl keep-alive update". This is really
simple and efficiently written, so if that fails for over 10s the system is really loaded,
probably barely responding to anything.

Currently the watchdog-mux runs as a normal process, no re-nice, no real-time scheduling.
This is IMO wrong, as it is a critical process which needs to run with high priority. I've
a patch here which sets it to the highest RR realtime-scheduling priority available,
effectively the same as what corosync does.

diff --git a/src/watchdog-mux.c b/src/watchdog-mux.c
index 818ae00..71981d7 100644
--- a/src/watchdog-mux.c
+++ b/src/watchdog-mux.c
@@ -8,2 +8,3 @@
 #include
+#include
 #include
@@ -151,2 +177,15 @@ main(void)
 
+    int sched_priority = sched_get_priority_max (SCHED_RR);
+    if (sched_priority != -1) {
+        struct sched_param global_sched_param;
+        global_sched_param.sched_priority = sched_priority;
+        int res = sched_setscheduler (0, SCHED_RR, &global_sched_param);
+        if (res == -1) {
+            fprintf(stderr, "Could not set SCHED_RR at priority %d\n", sched_priority);
+        } else {
+            fprintf(stderr, "set SCHED_RR at priority %d\n", sched_priority);
+        }
+    }
+
+
     if ((watchdog_fd = open(WATCHDOG_DEV, O_WRONLY)) == -1) {

The issue of a watchdog reset with no HA, due to a massively overloaded system, should
already be avoided a lot with the scheduling change alone.

Interesting, IMO, is that lots of nodes rebooted at the same time, with no HA active. This
*could* come from a side effect like ceph rebalancing kicking off and producing a load
spike for >10s, hindering the scheduling of the watchdog-mux. This is a theory, but with
HA off it needs to be something like that, as in the HA-off case there's *no* direct or
indirect connection between corosync/pmxcfs and the watchdog-mux. It simply does not care
about, or notice, quorum partition changes at all.

There may be an approach to reserve the watchdog for the mux, but avoid having it as a
"ticking time bomb": theoretically one could open it, then disable it with an ioctl (it
can be queried whether a driver supports that) and only enable it for real once the first
client connects to the MUX. This may not work for all watchdog modules, and if we do it,
we may want to make it configurable, as some people actually want a reset if a (future)
real-time process cannot be scheduled for >= 10 seconds.

With HA active, well, then there could be something off, either in corosync/knet or also
in how we interface with it in pmxcfs. That could well be, but it won't explain the non-HA
issues.

Speaking of pmxcfs, that one also runs with standard priority; we may want to change that
too to an RT scheduler, so that it is ensured it can process all corosync events.

I have also a few other small watchdog-mux patches around; it should nowadays actually be
able to tell us why a reset happened (it can also be over/under voltage, temperature, ...),
and I'll repeat doing the ioctl for keep-alive a few times if it fails; we can only win
with that after all.

___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
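Not part of the patch itself, but the effect it aims for can be inspected (or, for a quick
experiment, forced) at runtime with chrt; the priority value is illustrative:

    # show the current scheduling policy/priority of the running watchdog-mux
    chrt -p "$(pidof watchdog-mux)"

    # temporarily move it to SCHED_RR, similar to what the patch does in C
    chrt -r -p 99 "$(pidof watchdog-mux)"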
Re: [pve-devel] corosync bug: cluster break after 1 node clean shutdown
Thanks Thomas for the investigations.

I'm still trying to reproduce... I think I have some special case here, because the user
on the forum with 30 nodes had a corosync cluster split. (Note that I had this bug 6
months ago too, when shutting down a node, and the only way out was to fully stop corosync
on all nodes and start corosync again on all nodes.)

But this time, the corosync logs look fine (every node correctly sees node2 go down and
sees the remaining nodes).

The surviving node7 was the only node with HA, and the LRM didn't have the watchdog
enabled (I haven't found any log like "pve-ha-lrm: watchdog active" for the last 6 months
on this node).

So, the timing was:

10:39:05: the "halt" command is sent to node2

10:39:16: node2 leaves corosync / halts
-> every node sees it and correctly forms a new membership with the 13 remaining nodes.
I don't see any special logs (corosync, pmxcfs, pve-ha-crm, pve-ha-lrm) after node2 left,
but there is still activity on the servers: pve-firewall is still logging, VMs are running
fine.

between 10:40:25 - 10:40:34: the watchdog resets the nodes, but not node7
-> so between 70s-80s after node2 was down; I think watchdog-mux was still running fine
until then. (That sounds like the LRM was stuck and client_watchdog_timeout expired in
watchdog-mux.)

10:40:41: node7 loses quorum (as all other nodes have reset)

10:40:50: node7 crm/lrm finally log:

Sep 3 10:40:50 m6kvm7 pve-ha-crm[16196]: got unexpected error - error during cfs-locked 'domain-ha' operation: no quorum!
Sep 3 10:40:51 m6kvm7 pve-ha-lrm[16140]: loop take too long (87 seconds)
Sep 3 10:40:51 m6kvm7 pve-ha-crm[16196]: loop take too long (92 seconds)
Sep 3 10:40:51 m6kvm7 pve-ha-crm[16196]: lost lock 'ha_manager_lock - cfs lock update failed - Permission denied
Sep 3 10:40:51 m6kvm7 pve-ha-lrm[16140]: lost lock 'ha_agent_m6kvm7_lock - cfs lock update failed - Permission denied

So, I really think that something got the lrm/crm loop stuck, and the watchdog was not
reset because of that.

----- Original Message -----
From: "Thomas Lamprecht"
To: "Proxmox VE development discussion", "aderumier", "dietmar"
Sent: Wednesday, 9 September 2020 22:05:49
Subject: Re: [pve-devel] corosync bug: cluster break after 1 node clean shutdown

On 08.09.20 09:11, Alexandre DERUMIER wrote:
[...]