[pve-devel] [Patch V2 guest-common] fix #1694: Replication risks permanently losing sync in high loads due to timeout bug

2018-04-12 Thread Wolfgang Link
If the pool is under heavy load, ZFS gives deletion jobs a low priority. This ends in a timeout, and the program logic will delete the current sync snapshot. On the next run the former sync snapshots will also be removed because they are not in the state file. In this state it is no longer possible to s

Re: [pve-devel] [Patch V2 guest-common] fix #1694: Replication risks permanently losing sync in high loads due to timeout bug

2018-04-12 Thread Dietmar Maurer
> diff --git a/PVE/Replication.pm b/PVE/Replication.pm
> index 9bc4e61..d8ccfaf 100644
> --- a/PVE/Replication.pm
> +++ b/PVE/Replication.pm
> @@ -136,8 +136,18 @@ sub prepare {
> $last_snapshots->{$volid}->{$snap} = 1;
> } elsif ($snap =~ m/^\Q$prefix\E/) {
>

Re: [pve-devel] [Patch V2 guest-common] fix #1694: Replication risks permanently losing sync in high loads due to timeout bug

2018-04-12 Thread Dietmar Maurer
> @@ -293,12 +303,16 @@ sub replicate {
> die $err;
> }
>
> -# remove old snapshots because they are no longer needed
> -$cleanup_local_snapshots->($last_snapshots, $last_sync_snapname);
> +eval {
> + # remove old snapshots because they are no longer needed
> + $cle

[pve-devel] [PATCH cluster] pmxcfs: only exit parent when successfully started

2018-04-12 Thread Dominik Csapak
Since systemd depends on the pid file being written only when the service is actually started, we need to wait for the child to get to the point where it starts the fuse loop and then signal the parent to exit and write the pid file. Without this, we had an issue where the ExecStartPost hook (which

Re: [pve-devel] [Patch V2 guest-common] fix #1694: Replication risks permanently losing sync in high loads due to timeout bug

2018-04-12 Thread Wolfgang Link
> On April 12, 2018 at 11:06, Dietmar Maurer wrote:
>
>
> > diff --git a/PVE/Replication.pm b/PVE/Replication.pm
> > index 9bc4e61..d8ccfaf 100644
> > --- a/PVE/Replication.pm
> > +++ b/PVE/Replication.pm
> > @@ -136,8 +136,18 @@ sub prepare {
> > $last_snapshots->{$volid

Re: [pve-devel] [Patch V2 guest-common] fix #1694: Replication risks permanently losing sync in high loads due to timeout bug

2018-04-12 Thread Wolfgang Link
> On April 12, 2018 at 11:11, Dietmar Maurer wrote:
>
>
> > @@ -293,12 +303,16 @@ sub replicate {
> > die $err;
> > }
> >
> > -# remove old snapshots because they are no longer needed
> > -$cleanup_local_snapshots->($last_snapshots, $last_sync_snapname);
> > +e

Re: [pve-devel] [Patch V2 guest-common] fix #1694: Replication risks permanently losing sync in high loads due to timeout bug

2018-04-12 Thread Fabian Grünbichler
minor nit: please use a short subject that indicates what the commit does, not what the bug was that it fixes. e.g., something like

fix #1694: make failure of snapshot removal non-fatal

On Thu, Apr 12, 2018 at 09:46:03AM +0200, Wolfgang Link wrote:
> If the pool is under heavy load ZFS will low

Re: [pve-devel] [PATCH cluster] pmxcfs: only exit parent when successfully started

2018-04-12 Thread Wolfgang Bumiller
On Thu, Apr 12, 2018 at 11:27:09AM +0200, Dominik Csapak wrote: > since systemd depends that the pid file is written only > when the service is actually started, we need to wait for the > child to get to the point where it starts the fuse loop > and signal the parent to now exit and write the pid f

Re: [pve-devel] [PATCH cluster] pmxcfs: only exit parent when successfully started

2018-04-12 Thread Fabian Grünbichler
minor nits, ACK otherwise On Thu, Apr 12, 2018 at 11:27:09AM +0200, Dominik Csapak wrote: > since systemd depends that the pid file is written only > when the service is actually started, we need to wait for the > child to get to the point where it starts the fuse loop > and signal the parent to n

Re: [pve-devel] avoiding VMID reuse

2018-04-12 Thread Fabian Grünbichler
On Wed, Apr 11, 2018 at 03:12:17PM +0300, Lauri Tirkkonen wrote: > On Tue, Mar 13 2018 11:15:23 +0200, Lauri Tirkkonen wrote: > > > Sorry if I misunderstood you but VMIDs are already _guaranteed_ to be > > > unique cluster wide, so also unique per node? > > > > I'll try to clarify: if I create a V

[pve-devel] [PATCH cluster v2] pmxcfs: only exit parent when successfully started

2018-04-12 Thread Dominik Csapak
Since systemd depends on the parent exiting only when the service is actually started, we need to wait for the child to get to the point where it starts the fuse loop and then signal the parent to exit and write the pid file. Without this, we had an issue where the ExecStartPost hook (which runs pvecm

Re: [pve-devel] avoiding VMID reuse

2018-04-12 Thread Fabian Grünbichler
On Thu, Apr 12, 2018 at 12:32:08PM +0200, Fabian Grünbichler wrote: > On Wed, Apr 11, 2018 at 03:12:17PM +0300, Lauri Tirkkonen wrote: > > On Tue, Mar 13 2018 11:15:23 +0200, Lauri Tirkkonen wrote: > > > > Sorry if I misunderstood you but VMIDs are already _guaranteed_ to be > > > > unique cluster

Re: [pve-devel] [PATCH v3 storage 1/2] Fix #1542: show storage utilization per pool

2018-04-12 Thread Dietmar Maurer
comments inline.

> On April 11, 2018 at 4:36 PM Alwin Antreich wrote:
>
>
> - get the percent_used value for a ceph pool and
>   calculate it where ceph doesn't supply it (pre kraken)
> - use librados2-perl for pool status
> - add librados2-perl as build-depends and depends in debian/contro

Re: [pve-devel] [PATCH cluster v2] pmxcfs: only exit parent when successfully started

2018-04-12 Thread Wolfgang Bumiller
On Thu, Apr 12, 2018 at 12:37:16PM +0200, Dominik Csapak wrote: > since systemd depends that parent exits only > when the service is actually started, we need to wait for the > child to get to the point where it starts the fuse loop > and signal the parent to now exit and write the pid file > > wi

Re: [pve-devel] [PATCH] increase zfs default timeout to 30sec

2018-04-12 Thread Fabian Grünbichler
On Tue, Apr 10, 2018 at 05:40:50PM +0300, Lauri Tirkkonen wrote: > Hi, > > On Tue, Mar 13 2018 10:25:47 +0100, Thomas Lamprecht wrote: > > What Fabian meant with: > > > [...] our API has a timeout per request, [...] > > > > is that our API already has 30 seconds as timeout for response, > > so us

Re: [pve-devel] [PATCH] increase zfs default timeout to 30sec

2018-04-12 Thread Lauri Tirkkonen
On Thu, Apr 12 2018 12:59:13 +0200, Fabian Grünbichler wrote: > On Tue, Apr 10, 2018 at 05:40:50PM +0300, Lauri Tirkkonen wrote: > > Hi, > > > > On Tue, Mar 13 2018 10:25:47 +0100, Thomas Lamprecht wrote: > > > What Fabian meant with: > > > > [...] our API has a timeout per request, [...] > > > >

[pve-devel] [PATCH kernel v2 0/2] pve-kernel helper scripts for patch-queue management

2018-04-12 Thread Fabian Grünbichler
this patch series introduces helper scripts for
- importing the exported patchqueue into a patchqueue branch inside the submodule
- exporting the (updated) patchqueue from the patchqueue branch inside the submodule
- importing a new upstream tag into the submodule, optionally rebasing the pat

[pve-devel] [PATCH kernel v2 1/2] debian/scripts: add patchqueue scripts

2018-04-12 Thread Fabian Grünbichler
Signed-off-by: Fabian Grünbichler
---
 debian/scripts/export-patchqueue | 33 +
 debian/scripts/import-patchqueue | 29 +
 2 files changed, 62 insertions(+)
 create mode 100755 debian/scripts/export-patchqueue
 create mode 100755 debian/scr

[pve-devel] [PATCH kernel v2 2/2] debian/scripts: add import-upstream-tag

2018-04-12 Thread Fabian Grünbichler
Signed-off-by: Fabian Grünbichler
---
 debian/scripts/import-upstream-tag | 119 +
 1 file changed, 119 insertions(+)
 create mode 100755 debian/scripts/import-upstream-tag

diff --git a/debian/scripts/import-upstream-tag b/debian/scripts/import-upstream-tag
new

Re: [pve-devel] avoiding VMID reuse

2018-04-12 Thread Lauri Tirkkonen
On Thu, Apr 12 2018 12:42:51 +0200, Fabian Grünbichler wrote: > > - please send patch series as threads (cover letter and each patch as > > separate mail) and configure the subjectprefix accordingly for each > > repository. this allows inline comments on each patch. Ok. This particular patchse

Re: [pve-devel] [PATCH] increase zfs default timeout to 30sec

2018-04-12 Thread Fabian Grünbichler
On Thu, Apr 12, 2018 at 02:59:22PM +0300, Lauri Tirkkonen wrote: > On Thu, Apr 12 2018 12:59:13 +0200, Fabian Grünbichler wrote: > > On Tue, Apr 10, 2018 at 05:40:50PM +0300, Lauri Tirkkonen wrote: > > > Hi, > > > > > > On Tue, Mar 13 2018 10:25:47 +0100, Thomas Lamprecht wrote: > > > > What Fabia

Re: [pve-devel] [PATCH] increase zfs default timeout to 30sec

2018-04-12 Thread Dominik Csapak
On 04/12/2018 02:13 PM, Fabian Grünbichler wrote: On Thu, Apr 12, 2018 at 02:59:22PM +0300, Lauri Tirkkonen wrote: On Thu, Apr 12 2018 12:59:13 +0200, Fabian Grünbichler wrote: On Tue, Apr 10, 2018 at 05:40:50PM +0300, Lauri Tirkkonen wrote: Hi, On Tue, Mar 13 2018 10:25:47 +0100, Thomas Lamp

Re: [pve-devel] avoiding VMID reuse

2018-04-12 Thread Fabian Grünbichler
On Thu, Apr 12, 2018 at 03:12:50PM +0300, Lauri Tirkkonen wrote: > On Thu, Apr 12 2018 12:42:51 +0200, Fabian Grünbichler wrote: > > > - please send patch series as threads (cover letter and each patch as > > > separate mail) and configure the subjectprefix accordingly for each > > > repository

[pve-devel] [PATCH cluster v3] pmxcfs: only exit parent when successfully started

2018-04-12 Thread Dominik Csapak
Since systemd depends on the parent exiting only when the service is actually started, we need to wait for the child to get to the point where it starts the fuse loop and then signal the parent to exit and write the pid file. Without this, we had an issue where the ExecStartPost hook (which runs pvecm

Re: [pve-devel] avoiding VMID reuse

2018-04-12 Thread Lauri Tirkkonen
On Thu, Apr 12 2018 14:26:53 +0200, Fabian Grünbichler wrote: > > Sure, it's not a guarantee (because it isn't an error to use an unused > > ID less than nextid -- it would be easy to convert the warning to an > > error though). But we don't especially need it to be a guarantee, we > > just want ca

[pve-devel] [RFC PATCH] collect device list for nested pci-bridges

2018-04-12 Thread Dominik Csapak
When using q35 as machine type, there are nested pci-bridges, but we only checked the first layer. This resulted in not being able to hotplug scsi devices: because scsihw0 was deeper in the pci-bridge construct, we did not see it and tried to add it (which fails, of course). This patch checks all br

Re: [pve-devel] [RFC PATCH] collect device list for nested pci-bridges

2018-04-12 Thread Dominik Csapak
This patch is for qemu-server, if it was not clear. I guess git drops the configured subjectprefix when using --rfc.
___
pve-devel mailing list
pve-devel@pve.proxmox.com
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel

[pve-devel] Thank you!

2018-04-12 Thread Gilberto Nunes
Hi, I would like to say thanks to all Proxmox staff for the great effort to make this wonderful piece of software a great tool. Nice work.

Re: [pve-devel] avoiding VMID reuse

2018-04-12 Thread Alexandre DERUMIER
Hi, I'm jumping into the conversation. I personally think that it would be great if Proxmox used UUIDs for VMIDs, to be sure that we have unique IDs across different clusters, or when we delete/recreate a VM with the same ID on the same cluster. For example, currently, if we delete and recreate a VM with the same

[pve-devel] applied: [PATCH xtermjs 0/4] upgrade to 3.2.0 and implement reconnect

2018-04-12 Thread Dominik Csapak
applied with makefile fix like wolfgang suggested

[pve-devel] applied: [PATCH cluster v3] pmxcfs: only exit parent when successfully started

2018-04-12 Thread Wolfgang Bumiller
applied On Thu, Apr 12, 2018 at 02:38:15PM +0200, Dominik Csapak wrote: > since systemd depends that parent exits only > when the service is actually started, we need to wait for the > child to get to the point where it starts the fuse loop > and signal the parent to now exit and write the pid fil