Re: Recent 027_streaming_regress.pl hangs

2024-08-12 Thread Tom Lane
Andrew Dunstan writes:
> On 2024-08-11 Su 8:32 PM, Tom Lane wrote:
>> I think we need more data. We know that the
>> wait_for_catchup query is never getting to true:
>>
>> SELECT '$target_lsn' <= ${mode}_lsn AND state = 'streaming'
>>
>> but we don't know if the LSN condition or the state condi
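The quoted query is a conjunction of two conditions, so a bare "false" does not say which half failed. A minimal sketch of evaluating the two halves separately follows. The function names and the message wording are hypothetical and are not taken from the PostgreSQL test code; the only assumption about PostgreSQL itself is the textual pg_lsn format of two hexadecimal halves separated by a slash.

```python
from typing import List


def parse_lsn(text: str) -> int:
    """Convert a pg_lsn string such as '0/3000060' into a 64-bit integer.

    The part before the slash is the high 32 bits and the part after it is
    the low 32 bits, both written in hexadecimal.
    """
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def explain_catchup(target_lsn: str, standby_lsn: str, state: str) -> List[str]:
    """Return the reasons the catch-up predicate is false.

    An empty list means both conditions hold. This is a hypothetical helper
    for illustration only.
    """
    reasons = []
    # LSNs are compared numerically; comparing the raw strings would give
    # wrong answers for values such as '0/A0' versus '0/100'.
    if parse_lsn(target_lsn) > parse_lsn(standby_lsn):
        reasons.append(f"lsn: standby at {standby_lsn}, target is {target_lsn}")
    if state != "streaming":
        reasons.append(f"state: '{state}' instead of 'streaming'")
    return reasons


if __name__ == "__main__":
    # Example values are made up for the demonstration.
    print(explain_catchup("0/3000060", "0/3000000", "catchup"))
```

Logging both reasons at timeout would distinguish a standby that is merely behind from one whose walsender never reached the streaming state.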

Re: Recent 027_streaming_regress.pl hangs

2024-08-12 Thread Andrew Dunstan
On 2024-08-11 Su 8:32 PM, Tom Lane wrote: Andrew Dunstan writes: We'll see. I have switched crake from --run-parallel mode to --run-all mode i.e. the runs are serialized. Maybe that will be enough to stop the errors. I'm still annoyed that this test is susceptible to load, if that is indeed wh

Re: Recent 027_streaming_regress.pl hangs

2024-08-11 Thread Tom Lane
Andrew Dunstan writes:
> We'll see. I have switched crake from --run-parallel mode to --run-all
> mode i.e. the runs are serialized. Maybe that will be enough to stop the
> errors. I'm still annoyed that this test is susceptible to load, if that
> is indeed what is the issue.

crake is still ti

Re: Recent 027_streaming_regress.pl hangs

2024-07-31 Thread Andrew Dunstan
On 2024-07-31 We 12:05 PM, Tom Lane wrote: Andrew Dunstan writes: There seem to be a bunch of recent failures, and not just on crake, and not just on HEAD: There were a batch of

Re: Recent 027_streaming_regress.pl hangs

2024-07-31 Thread Tom Lane
Andrew Dunstan writes: > There seem to be a bunch of recent failures, and not just on crake, and > not just on HEAD: > There were a batch of recovery-stage failures ending about 9 d

Re: Recent 027_streaming_regress.pl hangs

2024-07-31 Thread Andrew Dunstan
On 2024-07-25 Th 6:33 PM, Andrew Dunstan wrote: On 2024-07-25 Th 5:14 PM, Tom Lane wrote: I wrote: I'm confused by crake's buildfarm logs. AFAICS it is not running recovery-check at all in most of the runs; at least there is no mention of that step, for example here: https://buildfarm.postg

Re: Recent 027_streaming_regress.pl hangs

2024-07-25 Thread Andrew Dunstan
On 2024-07-25 Th 5:14 PM, Tom Lane wrote: I wrote: I'm confused by crake's buildfarm logs. AFAICS it is not running recovery-check at all in most of the runs; at least there is no mention of that step, for example here: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=crake&dt=2024-07-2

Re: Recent 027_streaming_regress.pl hangs

2024-07-25 Thread Thomas Munro
On Fri, Jul 26, 2024 at 9:14 AM Tom Lane wrote:
> Based on this, it seems fairly likely that crake is simply timing out
> as a consequence of intermittent heavy background activity.

Would it be better to keep going as long as progress is being made? I.e. time out only when the relevant LSN stops
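The idea of timing out only when the LSN stops moving can be sketched as a polling loop whose deadline resets on every observed advance. This is an illustration under stated assumptions, not the test framework's implementation: `wait_for_catchup` here is a hypothetical Python function, `get_replay_lsn` stands in for whatever query reports the standby's position, and the default `stall_timeout` and `poll_interval` values are arbitrary.

```python
import time
from typing import Callable, Optional


def parse_lsn(text: str) -> int:
    """Convert a pg_lsn string such as '0/3000060' into a 64-bit integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def wait_for_catchup(
    get_replay_lsn: Callable[[], str],
    target_lsn: str,
    stall_timeout: float = 180.0,
    poll_interval: float = 0.1,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Wait until the reported LSN reaches target_lsn.

    Returns True on success. Returns False only if the LSN has not advanced
    for stall_timeout seconds, so a slow but progressing standby is never
    failed on total elapsed time alone.
    """
    target = parse_lsn(target_lsn)
    last_seen: Optional[int] = None
    last_progress = clock()
    while True:
        current = parse_lsn(get_replay_lsn())
        if current >= target:
            return True
        now = clock()
        if last_seen is None or current > last_seen:
            # Progress was made: remember it and restart the stall timer.
            last_seen = current
            last_progress = now
        elif now - last_progress >= stall_timeout:
            return False
        sleep(poll_interval)
```

The `clock` and `sleep` parameters exist only so the loop can be exercised with a fake clock. One trade-off of this design is that a standby advancing very slowly is never timed out, so a real harness would probably still want an overall upper bound.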

Re: Recent 027_streaming_regress.pl hangs

2024-07-25 Thread Tom Lane
I wrote:
> I'm confused by crake's buildfarm logs. AFAICS it is not running
> recovery-check at all in most of the runs; at least there is no
> mention of that step, for example here:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=crake&dt=2024-07-25%2013%3A27%3A02

Oh, I see it: the lo

Re: Recent 027_streaming_regress.pl hangs

2024-07-25 Thread Tom Lane
Andrew Dunstan writes:
> But yes we do seem to have seen a lot of recovery_check failures on
> crake in the last 8 days, which is roughly when I changed PG_TEST_EXTRA
> to get more coverage.

I'm confused by crake's buildfarm logs. AFAICS it is not running recovery-check at all in most of the r

Re: Recent 027_streaming_regress.pl hangs

2024-07-25 Thread Andrew Dunstan
On 2024-07-25 Th 12:00 AM, Alexander Lakhin wrote: Hello Andrew, 04.06.2024 13:00, Alexander Lakhin wrote: Also, 027_stream_regress still fails due to the same reason: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=serinus&dt=2024-05-22%2021%3A55%3A03 https://buildfarm.postgresql.o

Re: Recent 027_streaming_regress.pl hangs

2024-07-24 Thread Alexander Lakhin
Hello Andrew, 04.06.2024 13:00, Alexander Lakhin wrote: Also, 027_stream_regress still fails due to the same reason: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=serinus&dt=2024-05-22%2021%3A55%3A03 https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=flaviventris&dt=2024-05-22%202

Re: Recent 027_streaming_regress.pl hangs

2024-06-04 Thread Alexander Lakhin
Hello Andres, So it looks like the issue resolved, but there is another apparently performance-related issue: deadlock-parallel test failures. I reduced test concurrency a bit. I hadn't quite realized how the buildfarm config and meson test concurrency interact. But there's still something off

Re: Recent 027_streaming_regress.pl hangs

2024-04-04 Thread Andres Freund
Hi,

On 2024-04-04 19:00:00 +0300, Alexander Lakhin wrote:
> 26.03.2024 10:59, Andres Freund wrote:
> > Late, will try to look more in the next few days.
> >
>
> AFAICS, last 027_streaming_regress.pl failures on calliphoridae,
> culicidae, tamandua occurred before 2024-03-27:
> https://buildfarm.

Re: Recent 027_streaming_regress.pl hangs

2024-04-04 Thread Alexander Lakhin
Hello Andres, 26.03.2024 10:59, Andres Freund wrote: Late, will try to look more in the next few days. AFAICS, last 027_streaming_regress.pl failures on calliphoridae, culicidae, tamandua occurred before 2024-03-27: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=calliphoridae&dt=2024

Re: Recent 027_streaming_regress.pl hangs

2024-03-26 Thread Andres Freund
Hi,

On 2024-03-26 00:54:54 -0400, Tom Lane wrote:
> > I guess I'll try to write a buildfarm database query to extract how long
> > that phase of the test took from all runs on my menagerie, not just the failing
> > one, and see if there's a visible trend.
>
> +1

Only the query for successful

Re: Recent 027_streaming_regress.pl hangs

2024-03-25 Thread Tom Lane
Andres Freund writes:
> On 2024-03-26 00:00:38 -0400, Tom Lane wrote:
>> Are you sure it's not just that the total time to run the core
>> regression tests has grown to a bit more than what the test timeout
>> allows for?

> You're right, that could be it - in a way at least, the issue is replay n

Re: Recent 027_streaming_regress.pl hangs

2024-03-25 Thread Andres Freund
Hi,

On 2024-03-26 00:00:38 -0400, Tom Lane wrote:
> Andres Freund writes:
> > I think there must be some actual regression involved. The frequency of
> > failures on HEAD vs failures on 16 - both of which run the tests concurrently
> > via meson - is just vastly different.
>
> Are you sure i

Re: Recent 027_streaming_regress.pl hangs

2024-03-25 Thread Tom Lane
Andres Freund writes:
> I think there must be some actual regression involved. The frequency of
> failures on HEAD vs failures on 16 - both of which run the tests concurrently
> via meson - is just vastly different.

Are you sure it's not just that the total time to run the core regression tests h

Re: Recent 027_streaming_regress.pl hangs

2024-03-25 Thread Andres Freund
Hi,

On 2024-03-20 17:41:45 -0700, Andres Freund wrote:
> On 2024-03-14 16:56:39 -0400, Tom Lane wrote:
> > Also, this is probably not
> > helping anything:
> >
> >'extra_config' => {
> > ...
> >

Re: Recent 027_streaming_regress.pl hangs

2024-03-20 Thread Andres Freund
Hi,

On 2024-03-20 17:41:45 -0700, Andres Freund wrote:
> 2024-03-20 22:14:01.904 UTC [56343][client backend][6/1925:0] LOG:
> connection authorized: user=bf database=postgres
> application_name=027_stream_regress.pl
> 2024-03-20 22:14:01.930 UTC [56343][client backend][6/1926:0] LOG:
> statem

Re: Recent 027_streaming_regress.pl hangs

2024-03-20 Thread Andres Freund
Hi,

On 2024-03-20 17:41:47 -0700, Andres Freund wrote:
> There's a lot of other animals on the same machine, however it's rarely fully
> loaded (with either CPU or IO).
>
> I don't think the test just being slow is the issue here, e.g. in the last
> failing iteration
>
> [...]
>
> I suspect we have

Re: Recent 027_streaming_regress.pl hangs

2024-03-20 Thread Andres Freund
Hi,

On 2024-03-14 16:56:39 -0400, Tom Lane wrote:
> Thomas Munro writes:
> > On Fri, Mar 15, 2024 at 7:00 AM Alexander Lakhin wrote:
> >> Could it be that the timeout (360 sec?) is just not enough for the test
> >> under the current (changed due to switch to meson) conditions?
>
> > But yo

Re: Recent 027_streaming_regress.pl hangs

2024-03-19 Thread Alexander Lakhin
14.03.2024 23:56, Tom Lane wrote: Thomas Munro writes: On Fri, Mar 15, 2024 at 7:00 AM Alexander Lakhin wrote: Could it be that the timeout (360 sec?) is just not enough for the test under the current (changed due to switch to meson) conditions? But you're right that under meson the test tak

Re: Recent 027_streaming_regress.pl hangs

2024-03-14 Thread Tom Lane
Thomas Munro writes:
> On Fri, Mar 15, 2024 at 7:00 AM Alexander Lakhin wrote:
>> Could it be that the timeout (360 sec?) is just not enough for the test
>> under the current (changed due to switch to meson) conditions?

> But you're right that under meson the test takes a lot longer, I guess
> d

Re: Recent 027_streaming_regress.pl hangs

2024-03-14 Thread Thomas Munro
On Fri, Mar 15, 2024 at 7:00 AM Alexander Lakhin wrote:
> Could it be that the timeout (360 sec?) is just not enough for the test
> under the current (changed due to switch to meson) conditions?

Hmm, well it looks like he switched over to meson around 42 days ago 2024-02-01, looking at "calliphor

Re: Recent 027_streaming_regress.pl hangs

2024-03-14 Thread Alexander Lakhin
Hello Thomas and Michael, 14.03.2024 06:16, Thomas Munro wrote: Yeah, I was wondering if its checkpoint delaying logic might have got the checkpointer jammed or something like that, but I don't currently see how. Yeah, the replay of bulk newpages could be relevant, but it's not exactly new tec

Re: Recent 027_streaming_regress.pl hangs

2024-03-13 Thread Thomas Munro
On Thu, Mar 14, 2024 at 3:27 PM Michael Paquier wrote:
> Hmm. Perhaps 8af25652489? That looks like the closest thing in the
> list that could have played with the way WAL is generated, hence
> potentially impacting the records that are replayed.

Yeah, I was wondering if its checkpoint delaying

Re: Recent 027_streaming_regress.pl hangs

2024-03-13 Thread Michael Paquier
On Thu, Mar 14, 2024 at 03:00:28PM +1300, Thomas Munro wrote:
> Assuming it is due to a commit in master, and given the failure
> frequency, I think it is very likely to be a change from this 3 day
> window of commits, and more likely in the top half dozen or so:
>
> d360e3cc60e Fix compiler warni

Re: Recent 027_streaming_regress.pl hangs

2024-03-13 Thread Thomas Munro
On Wed, Mar 13, 2024 at 10:53 AM Thomas Munro wrote:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2024-02-23%2015%3A44%3A35

Assuming it is due to a commit in master, and given the failure frequency, I think it is very likely to be a change from this 3 day window of commits,