Re: pg_upgrade test failure

2024-01-04 Thread vignesh C
On Sun, 29 Oct 2023 at 11:14, Hayato Kuroda (Fujitsu) wrote: > > Dear Andres, > > While tracking BF failures related to pg_upgrade, I found the same failure > is still happening [1] - [4]. > According to the log, the output directory remained even after the > successful upgrade [5]. > I an

RE: pg_upgrade test failure

2023-10-28 Thread Hayato Kuroda (Fujitsu)
Dear Andres, While tracking BF failures related to pg_upgrade, I found the same failure is still happening [1] - [4]. According to the log, the output directory remained even after the successful upgrade [5]. I analyzed and attached the fix patch, and below is my analysis... how do you th

Re: pg_upgrade test failure

2023-02-06 Thread Andres Freund
On 2023-02-06 14:14:22 -0800, Andres Freund wrote: > On 2023-02-07 11:03:18 +1300, Thomas Munro wrote: > > What I see is that there were 1254 FreeBSD tasks run in that window, of > > which 163 failed, and (more interestingly) 111 of those failures succeeded > > on every other platform. And clickin

Re: pg_upgrade test failure

2023-02-06 Thread Andres Freund
Hi, On 2023-02-07 11:03:18 +1300, Thomas Munro wrote: > On Tue, Feb 7, 2023 at 10:57 AM Andres Freund wrote: > > On February 6, 2023 1:51:20 PM PST, Thomas Munro > > wrote: > > >Next up: the new "running" tests, spuriously failing around 8.8% of CI > > >builds on FreeBSD. I'll go and ping that

Re: pg_upgrade test failure

2023-02-06 Thread Thomas Munro
On Tue, Feb 7, 2023 at 11:03 AM Thomas Munro wrote: > On Tue, Feb 7, 2023 at 10:57 AM Andres Freund wrote: > > On February 6, 2023 1:51:20 PM PST, Thomas Munro > > wrote: > > >Next up: the new "running" tests, spuriously failing around 8.8% of CI > > >builds on FreeBSD. I'll go and ping that t

Re: pg_upgrade test failure

2023-02-06 Thread Thomas Munro
On Tue, Feb 7, 2023 at 10:57 AM Andres Freund wrote: > On February 6, 2023 1:51:20 PM PST, Thomas Munro > wrote: > >Next up: the new "running" tests, spuriously failing around 8.8% of CI > >builds on FreeBSD. I'll go and ping that thread... > > Is that rate unchanged? I thought I fixed the main

Re: pg_upgrade test failure

2023-02-06 Thread Andres Freund
Hi, On February 6, 2023 1:51:20 PM PST, Thomas Munro wrote: >Next up: the new "running" tests, spuriously failing around 8.8% of CI >builds on FreeBSD. I'll go and ping that thread... Is that rate unchanged? I thought I fixed the main issue last week? Greetings, Andres

Re: pg_upgrade test failure

2023-02-06 Thread Thomas Munro
On Wed, Feb 1, 2023 at 2:44 PM Thomas Munro wrote: > OK, I pushed that. Third time lucky? I pulled down logs for a week of Windows CI, just over ~1k builds. The failure rate was a few per day before, but there are no failures like that after that went in. There are logs that contain the "Direct

Re: pg_upgrade test failure

2023-01-31 Thread Thomas Munro
On Wed, Feb 1, 2023 at 10:08 AM Thomas Munro wrote: > On Wed, Feb 1, 2023 at 10:04 AM Andres Freund wrote: > > Maybe we should just handle it by sleeping and retrying, if on windows? Sad > > to even propose... > > Yeah, that's what that code I posted would do automatically, though > it's a bit h

Re: pg_upgrade test failure

2023-01-31 Thread Thomas Munro
On Wed, Feb 1, 2023 at 9:54 AM Thomas Munro wrote: > ... I have one more idea ... I also had a second idea, barely good enough to mention and probably just paranoia. In a nearby thread I learned that process exit does not release Windows advisory file locks synchronously, which surprised this Un

Re: pg_upgrade test failure

2023-01-31 Thread Thomas Munro
On Wed, Feb 1, 2023 at 10:04 AM Andres Freund wrote: > On January 31, 2023 12:54:42 PM PST, Thomas Munro > wrote: > >I'm not sure about anything, but if that's what's happening here, then > >maybe the attached would help. In short, it would make the previous > >theory true (the idea of a second

Re: pg_upgrade test failure

2023-01-31 Thread Andres Freund
Hi, On January 31, 2023 12:54:42 PM PST, Thomas Munro wrote: >On Wed, Feb 1, 2023 at 6:28 AM Justin Pryzby wrote: >> > I pushed the rmtree() change. Let's see if that helps, or tells us >> > something new. >> >> I found a few failures since then: >> >> https://api.cirrus-ci.com/v1/artifact/ta

Re: pg_upgrade test failure

2023-01-31 Thread Thomas Munro
On Wed, Feb 1, 2023 at 6:28 AM Justin Pryzby wrote: > > I pushed the rmtree() change. Let's see if that helps, or tells us > > something new. > > I found a few failures since then: > > https://api.cirrus-ci.com/v1/artifact/task/6696942420361216/testrun/build/testrun/pg_upgrade/002_pg_upgrade/log/

Re: pg_upgrade test failure

2023-01-31 Thread Justin Pryzby
On Tue, Jan 31, 2023 at 02:00:05PM +1300, Thomas Munro wrote: > On Thu, Jan 5, 2023 at 4:11 PM Thomas Munro wrote: > > On Wed, Dec 7, 2022 at 7:15 AM Andres Freund wrote: > > > On 2022-11-08 01:16:09 +1300, Thomas Munro wrote: > > > > So [1] on its own didn't fix this. My next guess is that the

Re: pg_upgrade test failure

2023-01-30 Thread Thomas Munro
On Thu, Jan 5, 2023 at 4:11 PM Thomas Munro wrote: > On Wed, Dec 7, 2022 at 7:15 AM Andres Freund wrote: > > On 2022-11-08 01:16:09 +1300, Thomas Munro wrote: > > > So [1] on its own didn't fix this. My next guess is that the attached > > > might help. > > > What is our plan here? This afaict is

Re: pg_upgrade test failure

2023-01-04 Thread Thomas Munro
On Wed, Dec 7, 2022 at 7:15 AM Andres Freund wrote: > On 2022-11-08 01:16:09 +1300, Thomas Munro wrote: > > So [1] on its own didn't fix this. My next guess is that the attached > > might help. > What is our plan here? This afaict is the most common "false positive" for > cfbot in the last weeks

Re: pg_upgrade test failure

2022-12-06 Thread Andres Freund
Hi, On 2022-11-08 01:16:09 +1300, Thomas Munro wrote: > So [1] on its own didn't fix this. My next guess is that the attached > might help. > > Hmm. Following Michael's clue that this might involve log files and > pg_ctl, I noticed one thing: pg_ctl implements > wait_for_postmaster_stop() by wa

Re: pg_upgrade test failure

2022-11-16 Thread Justin Pryzby
On Tue, Nov 08, 2022 at 01:16:09AM +1300, Thomas Munro wrote: > So [1] on its own didn't fix this. My next guess is that the attached > might help. I took the liberty of adding a CF entry for this https://commitfest.postgresql.org/41/4011/ And afterwards figured I could be a little bit wasteful

Re: pg_upgrade test failure

2022-11-07 Thread Thomas Munro
So [1] on its own didn't fix this. My next guess is that the attached might help. Hmm. Following Michael's clue that this might involve log files and pg_ctl, I noticed one thing: pg_ctl implements wait_for_postmaster_stop() by waiting for kill(pid, 0) to fail, and our kill emulation does CallNam
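
For readers unfamiliar with the idiom mentioned above, waiting for a process to stop by polling kill(pid, 0) until it fails can be sketched roughly as follows. This is only an illustration for POSIX systems, not the actual pg_ctl code and not the Windows kill emulation discussed in the thread; the function name wait_for_pid_exit and its max_tries and sleep_ms parameters are made up for this example.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L    /* for kill() and nanosleep() */
#endif

#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <time.h>

/*
 * Hypothetical helper (not from PostgreSQL): poll with signal 0 until the
 * process no longer exists.  Signal 0 delivers nothing; kill() only reports
 * whether the target could be signalled.
 */
static bool
wait_for_pid_exit(pid_t pid, int max_tries, long sleep_ms)
{
    struct timespec delay = {sleep_ms / 1000, (sleep_ms % 1000) * 1000000L};

    for (int i = 0; i < max_tries; i++)
    {
        if (kill(pid, 0) != 0 && errno == ESRCH)
            return true;            /* no such process: it has exited */

        nanosleep(&delay, NULL);    /* still there: wait and probe again */
    }
    return false;                   /* gave up; the process may still be running */
}
```

The loop only establishes that the process ID is gone. It says nothing about whether files the process had open have been released yet, which is the kind of gap this message is probing.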

Re: pg_upgrade test failure

2022-10-18 Thread Andres Freund
Hi, On 2022-10-17 23:31:44 -0500, Justin Pryzby wrote: > On Tue, Oct 18, 2022 at 01:06:15PM +0900, Michael Paquier wrote: > > On Tue, Oct 18, 2022 at 09:47:37AM +1300, Thomas Munro wrote: > > > * Server 2019, as used on CI, still uses the traditional NT semantics > > > (unlink is asynchronous, whe

Re: pg_upgrade test failure

2022-10-17 Thread Justin Pryzby
On Tue, Oct 18, 2022 at 01:06:15PM +0900, Michael Paquier wrote: > On Tue, Oct 18, 2022 at 09:47:37AM +1300, Thomas Munro wrote: > > * Server 2019, as used on CI, still uses the traditional NT semantics > > (unlink is asynchronous, when all handles closes) > > * the fix I proposed has the right eff

Re: pg_upgrade test failure

2022-10-17 Thread Michael Paquier
On Tue, Oct 18, 2022 at 09:47:37AM +1300, Thomas Munro wrote: > * Server 2019, as used on CI, still uses the traditional NT semantics > (unlink is asynchronous, when all handles closes) > * the fix I proposed has the right effect (I will follow up with tests > to demonstrate) Wow, nice investigati

Re: pg_upgrade test failure

2022-10-17 Thread Thomas Munro
On Mon, Oct 3, 2022 at 7:29 PM Michael Paquier wrote: > On Mon, Oct 03, 2022 at 04:03:12PM +1300, Thomas Munro wrote: > > So I think that setting is_lnk = false is good enough here. Do > > you see a hole in it? > > I cannot think on one, on top of my head. Thanks for the > explanation. Some thi

Re: pg_upgrade test failure

2022-10-02 Thread Michael Paquier
On Mon, Oct 03, 2022 at 04:03:12PM +1300, Thomas Munro wrote: > So I think that setting is_lnk = false is good enough here. Do > you see a hole in it? I cannot think on one, on top of my head. Thanks for the explanation. -- Michael

Re: pg_upgrade test failure

2022-10-02 Thread Thomas Munro
On Mon, Oct 3, 2022 at 1:40 PM Michael Paquier wrote: > On Mon, Oct 03, 2022 at 12:10:06PM +1300, Thomas Munro wrote: > > I think something like the attached should do the right thing for > > STATUS_DELETE_PENDING (sort of "ENOENT-in-progress"). unlink() goes > > back to being blocking (sleep+ret

Re: pg_upgrade test failure

2022-10-02 Thread Michael Paquier
On Mon, Oct 03, 2022 at 12:10:06PM +1300, Thomas Munro wrote: > I think something like the attached should do the right thing for > STATUS_DELETE_PENDING (sort of "ENOENT-in-progress"). unlink() goes > back to being blocking (sleep+retry until eventually we reach ENOENT > or we time out and give u
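
The "sleep+retry until eventually we reach ENOENT or we time out" behaviour described above might look roughly like the sketch below. It is a portable POSIX approximation and not the patch attached to the message: on POSIX the name disappears as soon as unlink() returns, so the loop exits on its first pass, whereas the thread concerns Windows, where a file in the STATUS_DELETE_PENDING state can stay visible for a while. The name unlink_and_wait and its parameters are hypothetical.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L    /* for lstat() and nanosleep() */
#endif

#include <errno.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper (not PostgreSQL's pgunlink()): remove a file, then
 * poll until the name is really gone or the retry budget is used up.
 */
static bool
unlink_and_wait(const char *path, int max_tries, long sleep_ms)
{
    struct timespec delay = {sleep_ms / 1000, (sleep_ms % 1000) * 1000000L};

    if (unlink(path) != 0 && errno != ENOENT)
        return false;               /* real failure, e.g. EACCES */

    for (int i = 0; i < max_tries; i++)
    {
        struct stat st;

        if (lstat(path, &st) != 0 && errno == ENOENT)
            return true;            /* name no longer exists */

        nanosleep(&delay, NULL);    /* still visible: wait and look again */
    }
    return false;                   /* timed out */
}
```

A caller that removes a directory afterwards would call rmdir() only once every entry has returned true, which avoids the pattern reported earlier in the thread where the unlink appears to succeed but the following rmdir fails.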

Re: pg_upgrade test failure

2022-10-02 Thread Thomas Munro
On Mon, Oct 3, 2022 at 9:07 AM Thomas Munro wrote: > On Tue, Sep 20, 2022 at 1:31 PM Justin Pryzby wrote: > > I suspect that rmtree() was looping in pgunlink(), and got ENOENT, so > > didn't warn about the file itself, but then failed one moment later in > > rmdir. > > Yeah, I think this is my fa

Re: pg_upgrade test failure

2022-10-02 Thread Thomas Munro
On Tue, Sep 20, 2022 at 1:31 PM Justin Pryzby wrote: > I suspect that rmtree() was looping in pgunlink(), and got ENOENT, so > didn't warn about the file itself, but then failed one moment later in > rmdir. Yeah, I think this is my fault. In commit f357233c the new lstat() call might return ENOE

Re: pg_upgrade test failure

2022-10-02 Thread Andres Freund
Hi, On 2022-09-27 11:47:37 +0530, Bharath Rupireddy wrote: > Just for the records - the same issue was also seen here [1], [2]. > > [1] https://cirrus-ci.com/task/5709014662119424?logs=check_world#L82 > [2] > https://api.cirrus-ci.com/v1/artifact/task/5709014662119424/testrun/build/testrun/pg_up

Re: pg_upgrade test failure

2022-09-26 Thread Bharath Rupireddy
On Tue, Sep 20, 2022 at 7:01 AM Justin Pryzby wrote: > > On Mon, Sep 19, 2022 at 02:32:17PM -0700, Andres Freund wrote: > > Hi, > > > > After my last rebase of the meson tree I encountered the following test > > failure: > > > > https://cirrus-ci.com/task/5532444261613568 > > > > [20:23:04.171] --

Re: pg_upgrade test failure

2022-09-19 Thread Justin Pryzby
On Mon, Sep 19, 2022 at 02:32:17PM -0700, Andres Freund wrote: > Hi, > > After my last rebase of the meson tree I encountered the following test > failure: > > https://cirrus-ci.com/task/5532444261613568 > > [20:23:04.171] - 8< > -

Re: pg_upgrade test failure

2022-09-19 Thread Michael Paquier
On Mon, Sep 19, 2022 at 06:13:17PM -0700, Andres Freund wrote: > I don't really see what'd race with what here? pg_upgrade has precise control > over what's happening here, no? A code path could have forgotten a fclose() for example, but this code is rather old and close-proof as far as I know. M

Re: pg_upgrade test failure

2022-09-19 Thread Andres Freund
Hi, On 2022-09-20 10:08:41 +0900, Michael Paquier wrote: > On Mon, Sep 19, 2022 at 02:32:17PM -0700, Andres Freund wrote: > > I don't know if actually related to the commit below, but there've been a > > lot of runs of the pg_upgrade tests in the meson branch, and this is the > > first > > failur

Re: pg_upgrade test failure

2022-09-19 Thread Michael Paquier
On Mon, Sep 19, 2022 at 02:32:17PM -0700, Andres Freund wrote: > I don't know if actually related to the commit below, but there've been a > lot of runs of the pg_upgrade tests in the meson branch, and this is the first > failure of this kind. Unfortunately the error seems to be transient - > rerun