On Sun, 29 Oct 2023 at 11:14, Hayato Kuroda (Fujitsu)
wrote:
>
> Dear Andres,
>
> While tracking BF failures related to pg_upgrade, I found that the same failure
> is still happening [1] - [4].
> According to the log, the output directory remained even after the
> successful upgrade [5].
> I an
Dear Andres,
While tracking BF failures related to pg_upgrade, I found that the same failure
is still happening [1] - [4].
According to the log, the output directory remained even after the
successful upgrade [5].
I analyzed this and attached a fix patch; my analysis is below... how do you
th
On 2023-02-06 14:14:22 -0800, Andres Freund wrote:
> On 2023-02-07 11:03:18 +1300, Thomas Munro wrote:
> > What I see is that there were 1254 FreeBSD tasks run in that window, of
> > which 163 failed, and (more interestingly) 111 of those failures succeeded
> > on every other platform. And clickin
Hi,
On 2023-02-07 11:03:18 +1300, Thomas Munro wrote:
> On Tue, Feb 7, 2023 at 10:57 AM Andres Freund wrote:
> > On February 6, 2023 1:51:20 PM PST, Thomas Munro
> > wrote:
> > >Next up: the new "running" tests, spuriously failing around 8.8% of CI
> > >builds on FreeBSD. I'll go and ping that
On Tue, Feb 7, 2023 at 11:03 AM Thomas Munro wrote:
> On Tue, Feb 7, 2023 at 10:57 AM Andres Freund wrote:
> > On February 6, 2023 1:51:20 PM PST, Thomas Munro
> > wrote:
> > >Next up: the new "running" tests, spuriously failing around 8.8% of CI
> > >builds on FreeBSD. I'll go and ping that t
On Tue, Feb 7, 2023 at 10:57 AM Andres Freund wrote:
> On February 6, 2023 1:51:20 PM PST, Thomas Munro
> wrote:
> >Next up: the new "running" tests, spuriously failing around 8.8% of CI
> >builds on FreeBSD. I'll go and ping that thread...
>
> Is that rate unchanged? I thought I fixed the main
Hi,
On February 6, 2023 1:51:20 PM PST, Thomas Munro wrote:
>Next up: the new "running" tests, spuriously failing around 8.8% of CI
>builds on FreeBSD. I'll go and ping that thread...
Is that rate unchanged? I thought I fixed the main issue last week?
Greetings,
Andres
On Wed, Feb 1, 2023 at 2:44 PM Thomas Munro wrote:
> OK, I pushed that. Third time lucky?
I pulled down logs for a week of Windows CI, just over ~1k builds.
The failure rate was a few per day before, but there are no failures
like that after that went in. There are logs that contain the
"Direct
On Wed, Feb 1, 2023 at 10:08 AM Thomas Munro wrote:
> On Wed, Feb 1, 2023 at 10:04 AM Andres Freund wrote:
> > Maybe we should just handle it by sleeping and retrying, if on windows? Sad
> > to even propose...
>
> Yeah, that's what that code I posted would do automatically, though
> it's a bit h
On Wed, Feb 1, 2023 at 9:54 AM Thomas Munro wrote:
> ... I have one more idea ...
I also had a second idea, barely good enough to mention and probably
just paranoia. In a nearby thread I learned that process exit does
not release Windows advisory file locks synchronously, which surprised
this Un
On Wed, Feb 1, 2023 at 10:04 AM Andres Freund wrote:
> On January 31, 2023 12:54:42 PM PST, Thomas Munro
> wrote:
> >I'm not sure about anything, but if that's what's happening here, then
> >maybe the attached would help. In short, it would make the previous
> >theory true (the idea of a second
Hi,
On January 31, 2023 12:54:42 PM PST, Thomas Munro
wrote:
>On Wed, Feb 1, 2023 at 6:28 AM Justin Pryzby wrote:
>> > I pushed the rmtree() change. Let's see if that helps, or tells us
>> > something new.
>>
>> I found a few failures since then:
>>
>> https://api.cirrus-ci.com/v1/artifact/ta
On Wed, Feb 1, 2023 at 6:28 AM Justin Pryzby wrote:
> > I pushed the rmtree() change. Let's see if that helps, or tells us
> > something new.
>
> I found a few failures since then:
>
> https://api.cirrus-ci.com/v1/artifact/task/6696942420361216/testrun/build/testrun/pg_upgrade/002_pg_upgrade/log/
On Tue, Jan 31, 2023 at 02:00:05PM +1300, Thomas Munro wrote:
> On Thu, Jan 5, 2023 at 4:11 PM Thomas Munro wrote:
> > On Wed, Dec 7, 2022 at 7:15 AM Andres Freund wrote:
> > > On 2022-11-08 01:16:09 +1300, Thomas Munro wrote:
> > > > So [1] on its own didn't fix this. My next guess is that the
On Thu, Jan 5, 2023 at 4:11 PM Thomas Munro wrote:
> On Wed, Dec 7, 2022 at 7:15 AM Andres Freund wrote:
> > On 2022-11-08 01:16:09 +1300, Thomas Munro wrote:
> > > So [1] on its own didn't fix this. My next guess is that the attached
> > > might help.
>
> > What is our plan here? This afaict is
On Wed, Dec 7, 2022 at 7:15 AM Andres Freund wrote:
> On 2022-11-08 01:16:09 +1300, Thomas Munro wrote:
> > So [1] on its own didn't fix this. My next guess is that the attached
> > might help.
> What is our plan here? This afaict is the most common "false positive" for
> cfbot in the last weeks
Hi,
On 2022-11-08 01:16:09 +1300, Thomas Munro wrote:
> So [1] on its own didn't fix this. My next guess is that the attached
> might help.
>
> Hmm. Following Michael's clue that this might involve log files and
> pg_ctl, I noticed one thing: pg_ctl implements
> wait_for_postmaster_stop() by wa
On Tue, Nov 08, 2022 at 01:16:09AM +1300, Thomas Munro wrote:
> So [1] on its own didn't fix this. My next guess is that the attached
> might help.
I took the liberty of adding a CF entry for this
https://commitfest.postgresql.org/41/4011/
And afterwards figured I could be a little bit wasteful
So [1] on its own didn't fix this. My next guess is that the attached
might help.
Hmm. Following Michael's clue that this might involve log files and
pg_ctl, I noticed one thing: pg_ctl implements
wait_for_postmaster_stop() by waiting for kill(pid, 0) to fail, and
our kill emulation does CallNam
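For readers unfamiliar with the idiom, here is a minimal Python sketch of waiting for a process to exit by probing it with signal 0, as described above for wait_for_postmaster_stop(). This is not pg_ctl's code: `wait_for_exit` and its parameters are hypothetical names, and the sketch assumes POSIX semantics only (it does not model the Windows kill emulation, and `os.kill` behaves differently on Windows).

```python
import os
import time

def wait_for_exit(pid, timeout=5.0, interval=0.05):
    """Hypothetical helper: poll until signal 0 can no longer reach pid (POSIX only)."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            os.kill(pid, 0)  # signal 0 delivers nothing; it only checks the target
        except ProcessLookupError:
            return True      # ESRCH: no such process, so it has exited
        except PermissionError:
            pass             # EPERM: the process exists but is owned by someone else
        if time.monotonic() >= deadline:
            return False     # still alive when the timeout expired
        time.sleep(interval)
```

Note that a probe like this only says the process is gone; it says nothing about whether files the process had open have been fully released, which is the kind of gap the thread is chasing.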
Hi,
On 2022-10-17 23:31:44 -0500, Justin Pryzby wrote:
> On Tue, Oct 18, 2022 at 01:06:15PM +0900, Michael Paquier wrote:
> > On Tue, Oct 18, 2022 at 09:47:37AM +1300, Thomas Munro wrote:
> > > * Server 2019, as used on CI, still uses the traditional NT semantics
> > > (unlink is asynchronous, whe
On Tue, Oct 18, 2022 at 01:06:15PM +0900, Michael Paquier wrote:
> On Tue, Oct 18, 2022 at 09:47:37AM +1300, Thomas Munro wrote:
> > * Server 2019, as used on CI, still uses the traditional NT semantics
> > (unlink is asynchronous, when all handles close)
> > * the fix I proposed has the right eff
On Tue, Oct 18, 2022 at 09:47:37AM +1300, Thomas Munro wrote:
> * Server 2019, as used on CI, still uses the traditional NT semantics
> (unlink is asynchronous, when all handles close)
> * the fix I proposed has the right effect (I will follow up with tests
> to demonstrate)
Wow, nice investigati
On Mon, Oct 3, 2022 at 7:29 PM Michael Paquier wrote:
> On Mon, Oct 03, 2022 at 04:03:12PM +1300, Thomas Munro wrote:
> > So I think that setting is_lnk = false is good enough here. Do
> > you see a hole in it?
>
> I cannot think of one off the top of my head. Thanks for the
> explanation.
Some thi
On Mon, Oct 03, 2022 at 04:03:12PM +1300, Thomas Munro wrote:
> So I think that setting is_lnk = false is good enough here. Do
> you see a hole in it?
I cannot think of one off the top of my head. Thanks for the
explanation.
--
Michael
On Mon, Oct 3, 2022 at 1:40 PM Michael Paquier wrote:
> On Mon, Oct 03, 2022 at 12:10:06PM +1300, Thomas Munro wrote:
> > I think something like the attached should do the right thing for
> > STATUS_DELETE_PENDING (sort of "ENOENT-in-progress"). unlink() goes
> > back to being blocking (sleep+ret
On Mon, Oct 03, 2022 at 12:10:06PM +1300, Thomas Munro wrote:
> I think something like the attached should do the right thing for
> STATUS_DELETE_PENDING (sort of "ENOENT-in-progress"). unlink() goes
> back to being blocking (sleep+retry until eventually we reach ENOENT
> or we time out and give u
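The sleep+retry behaviour described above can be sketched roughly as follows. This is a Python illustration under stated assumptions, not the attached patch: `unlink_blocking`, the timeout and the polling interval are hypothetical, and treating `PermissionError` as "deletion still in progress" is only an approximation of how a pending delete might surface.

```python
import os
import time

def unlink_blocking(path, timeout=10.0, interval=0.1):
    """Hypothetical helper: unlink path, then wait until the name is really gone."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            os.unlink(path)
        except FileNotFoundError:
            return True          # ENOENT: nothing left to remove
        except PermissionError:
            pass                 # assumed transient (e.g. delete pending); retry
        try:
            os.lstat(path)       # is the directory entry still visible?
        except FileNotFoundError:
            return True          # ENOENT reached: safe to rmdir the parent
        except PermissionError:
            pass                 # still in limbo; keep waiting
        if time.monotonic() >= deadline:
            return False         # timed out and gave up
        time.sleep(interval)
```

On a POSIX system the first unlink() succeeds and the lstat() immediately reports ENOENT, so the loop runs once; the retry path only matters where removal completes asynchronously.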
On Mon, Oct 3, 2022 at 9:07 AM Thomas Munro wrote:
> On Tue, Sep 20, 2022 at 1:31 PM Justin Pryzby wrote:
> > I suspect that rmtree() was looping in pgunlink(), and got ENOENT, so
> > didn't warn about the file itself, but then failed one moment later in
> > rmdir.
>
> Yeah, I think this is my fa
On Tue, Sep 20, 2022 at 1:31 PM Justin Pryzby wrote:
> I suspect that rmtree() was looping in pgunlink(), and got ENOENT, so
> didn't warn about the file itself, but then failed one moment later in
> rmdir.
Yeah, I think this is my fault. In commit f357233c the new lstat()
call might return ENOE
Hi,
On 2022-09-27 11:47:37 +0530, Bharath Rupireddy wrote:
> Just for the record - the same issue was also seen here [1], [2].
>
> [1] https://cirrus-ci.com/task/5709014662119424?logs=check_world#L82
> [2]
> https://api.cirrus-ci.com/v1/artifact/task/5709014662119424/testrun/build/testrun/pg_up
On Tue, Sep 20, 2022 at 7:01 AM Justin Pryzby wrote:
>
> On Mon, Sep 19, 2022 at 02:32:17PM -0700, Andres Freund wrote:
> > Hi,
> >
> > After my last rebase of the meson tree I encountered the following test
> > failure:
> >
> > https://cirrus-ci.com/task/5532444261613568
> >
> > [20:23:04.171] --
On Mon, Sep 19, 2022 at 02:32:17PM -0700, Andres Freund wrote:
> Hi,
>
> After my last rebase of the meson tree I encountered the following test
> failure:
>
> https://cirrus-ci.com/task/5532444261613568
>
> [20:23:04.171] - 8<
> -
On Mon, Sep 19, 2022 at 06:13:17PM -0700, Andres Freund wrote:
> I don't really see what'd race with what here? pg_upgrade has precise control
> over what's happening here, no?
A code path could have forgotten a fclose() for example, but this code
is rather old and close-proof as far as I know. M
Hi,
On 2022-09-20 10:08:41 +0900, Michael Paquier wrote:
> On Mon, Sep 19, 2022 at 02:32:17PM -0700, Andres Freund wrote:
> > I don't know if actually related to the commit below, but there've been a
> > lot of runs of the pg_upgrade tests in the meson branch, and this is the
> > first
> > failur
On Mon, Sep 19, 2022 at 02:32:17PM -0700, Andres Freund wrote:
> I don't know if actually related to the commit below, but there've been a
> lot of runs of the pg_upgrade tests in the meson branch, and this is the first
> failure of this kind. Unfortunately the error seems to be transient -
> rerun