On Tue, Mar 16, 2021 at 6:22 PM osumi.takami...@fujitsu.com
wrote:
>
>
> To me, this works correctly because
> the point at which I put the while loop and stop the walsender
> makes DROP SUBSCRIPTION affect two slots. Any comments?
>
No, your testing looks fine. I have also done a similar test.
Hi
On Tuesday, March 16, 2021 4:15 PM vignesh C wrote:
> On Tue, Mar 16, 2021 at 12:29 PM Amit Kapila
> wrote:
> >
> > On Tue, Mar 16, 2021 at 9:00 AM Amit Kapila
> wrote:
> > >
> > > On Mon, Mar 15, 2021 at 6:00 PM Thomas Munro
> wrote:
> > > >
> > > > Hi,
> > > >
> > > > This seems to be a
On Tue, Mar 16, 2021 at 12:29 PM Amit Kapila wrote:
>
> On Tue, Mar 16, 2021 at 9:00 AM Amit Kapila wrote:
> >
> > On Mon, Mar 15, 2021 at 6:00 PM Thomas Munro wrote:
> > >
> > > Hi,
> > >
> > > This seems to be a new low frequency failure, I didn't see it mentioned
> > > already:
> > >
> >
> >
On Tue, Mar 16, 2021 at 9:00 AM Amit Kapila wrote:
>
> On Mon, Mar 15, 2021 at 6:00 PM Thomas Munro wrote:
> >
> > Hi,
> >
> > This seems to be a new low frequency failure, I didn't see it mentioned
> > already:
> >
>
> Thanks for reporting, I'll look into it.
>
By looking at the logs [1] in th
Hello
On Tuesday, March 16, 2021 12:31 PM Amit Kapila wrote:
> On Mon, Mar 15, 2021 at 6:00 PM Thomas Munro
> wrote:
> >
> > Hi,
> >
> > This seems to be a new low frequency failure, I didn't see it mentioned
> already:
Oh, this is the test I wrote and included as part of the commit ce0fdbfe
#
On Mon, Mar 15, 2021 at 6:00 PM Thomas Munro wrote:
>
> Hi,
>
> This seems to be a new low frequency failure, I didn't see it mentioned
> already:
>
Thanks for reporting, I'll look into it.
--
With Regards,
Amit Kapila.
Andrew Dunstan writes:
> On 9/20/19 6:17 PM, Tom Lane wrote:
>> Dromedary is running the last release of macOS that supports 32-bit
>> hardware, so if we decide to kick that to the curb, I'd either shut
>> down the box or put some newer Linux or BSD variant on it.
> Well, nightjar is on FBSD 9.0
On 9/20/19 6:17 PM, Tom Lane wrote:
> Alvaro Herrera writes:
>> Uh .. I didn't think it was possible that we would build the same
>> snapshot file more than once. Isn't that a waste of time anyway? Maybe
>> we can fix the symptom by just not doing that in the first place?
>> I don't have a str
On Fri, Sep 20, 2019 at 05:30:48PM +0200, Tomas Vondra wrote:
> But even with that change you haven't managed to reproduce the issue,
> right? Or am I misunderstanding?
No, I was not able to see it on my laptop running Debian.
--
Michael
Alvaro Herrera writes:
> Uh .. I didn't think it was possible that we would build the same
> snapshot file more than once. Isn't that a waste of time anyway? Maybe
> we can fix the symptom by just not doing that in the first place?
> I don't have a strategy to do that, but seems worth considerin
Hi,
On September 20, 2019 3:06:20 PM PDT, Alvaro Herrera
wrote:
>On 2019-Sep-20, Tom Lane wrote:
>
>> Actually, what I did was as attached [1], and I am getting traces
>like
>> [2]. The problem seems to occur only when there are two or three
>> processes concurrently creating the same snapshot
On 2019-Sep-20, Tom Lane wrote:
> Actually, what I did was as attached [1], and I am getting traces like
> [2]. The problem seems to occur only when there are two or three
> processes concurrently creating the same snapshot file. It's not
> obvious from the debug trace, but the snapshot file *do
Hi,
On 2019-09-20 17:49:27 -0400, Tom Lane wrote:
> Andres Freund writes:
> > On 2019-09-20 16:25:21 -0400, Tom Lane wrote:
> >> I recreated my freebsd-9-under-qemu setup and I can still reproduce
> >> the problem, though not with high reliability (order of 1 time in 10).
> >> Anything particular
Sigh, forgot about attaching the attachments ...
regards, tom lane
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index 0bd1d0f..53fd33c 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/repli
Andres Freund writes:
> On 2019-09-20 16:25:21 -0400, Tom Lane wrote:
>> I recreated my freebsd-9-under-qemu setup and I can still reproduce
>> the problem, though not with high reliability (order of 1 time in 10).
>> Anything particular you want logged?
> A DEBUG2 log would help a fair bit, beca
Hi,
On 2019-09-20 16:25:21 -0400, Tom Lane wrote:
> Andres Freund writes:
> > Since now a number of people (I tried as well), failed to reproduce this
> > locally, I propose that we increase the log-level during this test on
> > master. And perhaps expand the set of debugging information. With th
Andres Freund writes:
> Since now a number of people (I tried as well), failed to reproduce this
> locally, I propose that we increase the log-level during this test on
> master. And perhaps expand the set of debugging information. With the
> hope that the additional information on the cases encou
Hi,
On 2019-09-19 17:20:15 +0530, Kuntal Ghosh wrote:
> It seems there is a pattern in how the error occurs on different
> systems. Following are the relevant log snippets:
>
> nightjar:
> sub3 LOG: received replication command: CREATE_REPLICATION_SLOT
> "sub3_16414_sync_16394" TEMPORARY LOGI
On Thu, Sep 19, 2019 at 01:23:05PM +0900, Michael Paquier wrote:
> On Wed, Sep 18, 2019 at 11:58:08PM +0200, Tomas Vondra wrote:
>> I kinda suspect it might be just a coincidence that it fails during that
>> particular test. What likely plays a role here is a checkpoint timing
>> (AFAICS that's the thing r
On Thu, Sep 19, 2019 at 05:20:15PM +0530, Kuntal Ghosh wrote:
> While subscription 3 is being created, it eventually reaches a consistent
> snapshot point and prints the WAL location corresponding to it. It
> seems sub1/sub2 immediately fails to serialize the snapshot to the
> .snap file having the sa
Hello hackers,
It seems there is a pattern in how the error occurs on different
systems. Following are the relevant log snippets:
nightjar:
sub3 LOG: received replication command: CREATE_REPLICATION_SLOT
"sub3_16414_sync_16394" TEMPORARY LOGICAL pgoutput USE_SNAPSHOT
sub3 LOG: logical decodi
On Wed, Sep 18, 2019 at 11:58:08PM +0200, Tomas Vondra wrote:
> I kinda suspect it might be just a coincidence that it fails during that
> particular test. What likely plays a role here is a checkpoint timing
> (AFAICS that's the thing removing the file). On most systems the tests
> complete befor
On Wed, Sep 18, 2019 at 04:25:14PM +0530, Kuntal Ghosh wrote:
> Hello Michael,
> On Wed, Sep 18, 2019 at 6:28 AM Michael Paquier wrote:
>> On my side, I have let this thing run for a couple of hours with a
>> patched version to include a sleep between the rename and the sync but
>> I could not reproduce i
Hello Michael,
On Wed, Sep 18, 2019 at 6:28 AM Michael Paquier wrote:
>
> On my side, I have let this thing run for a couple of hours with a
> patched version to include a sleep between the rename and the sync but
> I could not reproduce it either:
> #!/bin/bash
> attempt=0
> while true; do
>
On Tue, Sep 17, 2019 at 09:45:10PM +0200, Tomas Vondra wrote:
> FWIW I agree with Andres that there probably is an actual bug. The file
> should not just disappear like this, it's clearly unexpected so the
> PANIC does not seem entirely inappropriate.
Agreed.
> I've tried reproducing the issue o
On Tue, Sep 17, 2019 at 12:39:33PM -0400, Tom Lane wrote:
> Robert Haas writes:
>> On Mon, Aug 26, 2019 at 9:29 AM Tomas Vondra
>> wrote:
>>> This is one of the remaining open items, and we don't seem to be moving
>>> forward with it :-(
>> Why exactly is this an open item, anyway?
> The reason it's still he
Robert Haas writes:
> On Mon, Aug 26, 2019 at 9:29 AM Tomas Vondra
> wrote:
>> This is one of the remaining open items, and we don't seem to be moving
>> forward with it :-(
> Why exactly is this an open item, anyway?
The reason it's still here is that Andres expressed a concern that
there migh
On Mon, Aug 26, 2019 at 9:29 AM Tomas Vondra
wrote:
> This is one of the remaining open items, and we don't seem to be moving
> forward with it :-(
Why exactly is this an open item, anyway?
I don't find any discussion on the thread which makes a clear argument
that this problem originated with v
On Mon, Aug 26, 2019 at 11:01:20AM -0400, Tom Lane wrote:
> Tomas Vondra writes:
>> I'm willing to take a stab at it, but to do that I need a way to
>> reproduce it. Tom, you mentioned you've managed to reproduce it in a
>> qemu instance, but that it took some fiddling with qemu parameters or
>> something. Ca
Tomas Vondra writes:
> I'm willing to take a stab at it, but to do that I need a way to
> reproduce it. Tom, you mentioned you've managed to reproduce it in a
> qemu instance, but that it took some fiddling with qemu parameters or
> something. Can you share what exactly was necessary?
I don't reca
On Tue, Aug 13, 2019 at 05:04:35PM +0900, Michael Paquier wrote:
> On Wed, Feb 13, 2019 at 01:51:47PM -0800, Andres Freund wrote:
>> I'm not yet sure that that's actually something that's supposed to
>> happen, I got to spend some time analysing how this actually
>> happens. Normally the contents of the sl
On Wed, Feb 13, 2019 at 01:51:47PM -0800, Andres Freund wrote:
> I'm not yet sure that that's actually something that's supposed to
> happen, I got to spend some time analysing how this actually
> happens. Normally the contents of the slot should actually prevent it
> from being removed (as they're
I wrote:
> My animal dromedary just reproduced this failure, which we've previously
> only seen on nightjar.
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dromedary&dt=2019-06-26%2023%3A57%3A45
Twice:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dromedary&dt=2019-06-28%2006
Andres Freund writes:
> On 2019-02-14 09:52:33 +1300, Thomas Munro wrote:
>> Just to make sure I understand: it's OK for the file not to be there
>> when we try to fsync it by name, because a concurrent checkpoint can
>> remove it, having determined that we don't need it anymore? In other
>> word
On 2/13/19 1:12 PM, Andres Freund wrote:
> Hi,
>
> On 2019-02-13 12:59:19 -0500, Tom Lane wrote:
>> Andres Freund writes:
>>> On 2019-02-13 12:37:35 -0500, Tom Lane wrote:
 Bleah. But in any case, the rename should not create a situation
 in which we need to fsync the file data again.
>
Hi,
On 2019-02-14 09:52:33 +1300, Thomas Munro wrote:
> On Thu, Feb 14, 2019 at 8:11 AM Tom Lane wrote:
> > Andres Freund writes:
> > > I was kinda pondering just open coding it. I am not yet convinced that
> > > my idea of just using an open FD isn't the least bad approach for the
> > > issue
Thomas Munro writes:
> I found 3 examples of this failing with an ERROR (though not turning
> the BF red, so nobody noticed) before the PANIC patch went in:
Yeah, I suspected that had happened before with less-obvious consequences.
Now that we know where the problem is, you could probably make it
On Thu, Feb 14, 2019 at 8:11 AM Tom Lane wrote:
> Andres Freund writes:
> > I was kinda pondering just open coding it. I am not yet convinced that
> > my idea of just using an open FD isn't the least bad approach for the
> > issue at hand. What precisely is the NFS issue you're concerned about?
Andres Freund writes:
> I was kinda pondering just open coding it. I am not yet convinced that
> my idea of just using an open FD isn't the least bad approach for the
> issue at hand. What precisely is the NFS issue you're concerned about?
I'm not sure that fsync-on-FD after the rename will wor
Hi,
On 2019-02-13 13:24:03 -0500, Tom Lane wrote:
> Andres Freund writes:
> > On 2019-02-13 12:59:19 -0500, Tom Lane wrote:
> >> Perhaps more to the point, the way this was coded, the PANIC applies
> >> to open() failures in fsync_fname_ext() not just fsync() failures;
> >> that's painting with t
Andres Freund writes:
> On 2019-02-13 12:59:19 -0500, Tom Lane wrote:
>> Perhaps more to the point, the way this was coded, the PANIC applies
>> to open() failures in fsync_fname_ext() not just fsync() failures;
>> that's painting with too broad a brush isn't it?
> That indeed seems wrong. Thomas
Hi,
On 2019-02-13 12:59:19 -0500, Tom Lane wrote:
> Andres Freund writes:
> > On 2019-02-13 12:37:35 -0500, Tom Lane wrote:
> >> Bleah. But in any case, the rename should not create a situation
> >> in which we need to fsync the file data again.
>
> > Well, it's not super well defined which of
Andres Freund writes:
> On 2019-02-13 12:37:35 -0500, Tom Lane wrote:
>> Bleah. But in any case, the rename should not create a situation
>> in which we need to fsync the file data again.
> Well, it's not super well defined which of either you need to make the
> rename durable, and it appears to
Hi,
On 2019-02-13 12:37:35 -0500, Tom Lane wrote:
> Andres Freund writes:
> > On 2019-02-13 11:57:32 -0500, Tom Lane wrote:
> >> I've managed to reproduce this locally, and obtained this PANIC:
>
> > Cool. How exactly?
>
> Andrew told me that nightjar is actually running in a qemu VM,
> so I se
Andres Freund writes:
> On 2019-02-13 11:57:32 -0500, Tom Lane wrote:
>> I've managed to reproduce this locally, and obtained this PANIC:
> Cool. How exactly?
Andrew told me that nightjar is actually running in a qemu VM,
so I set up freebsd 9.0 in a qemu VM, and boom. It took a bit
of fiddling
Hi,
On 2019-02-13 11:57:32 -0500, Tom Lane wrote:
> I've managed to reproduce this locally, and obtained this PANIC:
Cool. How exactly?
Nice catch.
> Anyway, I think we might be able to fix this along the lines of
>
> CloseTransientFile(fd);
>
> + /* ensure snapshot file is down to stab
Thomas Munro writes:
> On Mon, Feb 11, 2019 at 7:31 PM Tom Lane wrote:
>> 2019-02-10 23:55:58.798 EST [40728] sub1 PANIC: could not open file
>> "pg_logical/snapshots/0-160B578.snap": No such file or directory
>
> They get atomically renamed into place, which seems kosher even if
> snapshots
On Mon, Feb 11, 2019 at 7:31 PM Tom Lane wrote:
> 2019-02-10 23:55:58.798 EST [40728] sub1 PANIC: could not open file
> "pg_logical/snapshots/0-160B578.snap": No such file or directory
They get atomically renamed into place, which seems kosher even if
snapshots for the same LSN are created co
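
For readers following the thread, here is a rough sketch of the general
idea discussed above: sync the file data through the still-open
descriptor before the rename, so that nothing has to reopen the final
name afterwards, and treat a vanished file as a distinct case when
syncing by name. This is not the PostgreSQL code and not the patch from
the thread. The helper names write_durably() and
fsync_by_name_if_exists() are hypothetical, and the sketch assumes a
POSIX system where fsync() on a read-only directory descriptor is
permitted.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper, not PostgreSQL code: write buf to tmppath, fsync
 * it through the open descriptor, rename it to finalpath, then fsync the
 * containing directory so the rename itself is durable.
 * Returns 0 on success, -1 on failure.
 */
static int
write_durably(const char *dirpath, const char *tmppath,
              const char *finalpath, const void *buf, size_t len)
{
    int fd, dfd, rc;

    assert(dirpath != NULL && tmppath != NULL && finalpath != NULL);

    fd = open(tmppath, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    /* A short write is treated as an error in this sketch. */
    if (write(fd, buf, len) != (ssize_t) len || fsync(fd) != 0)
    {
        close(fd);
        return -1;
    }
    if (close(fd) != 0)
        return -1;

    /* Data is already on stable storage; no reopen by name is needed. */
    if (rename(tmppath, finalpath) != 0)
        return -1;

    dfd = open(dirpath, O_RDONLY);
    if (dfd < 0)
        return -1;
    rc = fsync(dfd);
    close(dfd);
    return rc;
}

/*
 * Hypothetical helper: fsync a file by name, but report a missing file
 * separately from a real failure.
 * Returns 1 if synced, 0 if the file no longer exists, -1 on other errors.
 */
static int
fsync_by_name_if_exists(const char *path)
{
    int fd, rc;

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return (errno == ENOENT) ? 0 : -1;

    rc = fsync(fd);
    close(fd);
    return (rc == 0) ? 1 : -1;
}
```

Whether a missing file should be tolerated at all is exactly what the
thread debates; the second helper only shows how the two cases could be
told apart, not which policy is right.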