Re: Crash in new pgstats code

2022-05-12 Thread Michael Paquier
On Thu, May 12, 2022 at 09:33:05AM -0700, Andres Freund wrote:
> On 2022-05-12 12:12:59 -0400, Tom Lane wrote:
>> But we have not seen any pgstat crashes lately, so I'm content to mark the
>> open item as resolved.
>
> Cool.
Okay, thanks for the feedback. I have marked the item as resolved for t

Re: Crash in new pgstats code

2022-05-12 Thread Andres Freund
Hi,
On 2022-05-12 12:12:59 -0400, Tom Lane wrote:
> Andres Freund writes:
> > On 2022-05-11 15:46:13 +0900, Michael Paquier wrote:
> >> Do we have anything remaining on this thread in light of the upcoming
> >> beta1? One fix has been pushed upthread, but it does not seem we are
> >> completely

Re: Crash in new pgstats code

2022-05-12 Thread Tom Lane
Andres Freund writes:
> On 2022-05-11 15:46:13 +0900, Michael Paquier wrote:
>> Do we have anything remaining on this thread in light of the upcoming
>> beta1? One fix has been pushed upthread, but it does not seem we are
>> completely in the clear either.
> I don't know what else there is to do

Re: Crash in new pgstats code

2022-05-12 Thread Andres Freund
Hi,
On 2022-05-11 15:46:13 +0900, Michael Paquier wrote:
> On Tue, Apr 19, 2022 at 08:31:05PM +1200, Thomas Munro wrote:
> > On Tue, Apr 19, 2022 at 2:50 AM Andres Freund wrote:
> > > Kestrel won't go that far back even - I set it up 23 days ago...
> >
> > Here's a ~6 month old example from mylo

Re: Crash in new pgstats code

2022-05-10 Thread Michael Paquier
On Tue, Apr 19, 2022 at 08:31:05PM +1200, Thomas Munro wrote:
> On Tue, Apr 19, 2022 at 2:50 AM Andres Freund wrote:
> > Kestrel won't go that far back even - I set it up 23 days ago...
>
> Here's a ~6 month old example from mylodon (I can't see much further
> back than that with HTTP requests...

Re: Crash in new pgstats code

2022-04-19 Thread Thomas Munro
On Tue, Apr 19, 2022 at 2:50 AM Andres Freund wrote:
> Kestrel won't go that far back even - I set it up 23 days ago...
Here's a ~6 month old example from mylodon (I can't see much further back than that with HTTP requests... I guess BF records are purged?):
https://buildfarm.postgresql.org/cgi-

Re: Crash in new pgstats code

2022-04-18 Thread Andres Freund
Hi,
On 2022-04-18 22:45:07 +1200, Thomas Munro wrote:
> On Mon, Apr 18, 2022 at 7:19 PM Michael Paquier wrote:
> > On Sat, Apr 16, 2022 at 02:36:33PM -0700, Andres Freund wrote:
> > > which I haven't seen locally. Looks like we have some race between
> > > startup process and walreceiver? That se

Re: Crash in new pgstats code

2022-04-18 Thread Thomas Munro
On Mon, Apr 18, 2022 at 7:19 PM Michael Paquier wrote:
> On Sat, Apr 16, 2022 at 02:36:33PM -0700, Andres Freund wrote:
> > which I haven't seen locally. Looks like we have some race between
> > startup process and walreceiver? That seems not great. I'm a bit
> > confused that walreceiver and arc

Re: Crash in new pgstats code

2022-04-18 Thread Michael Paquier
On Sat, Apr 16, 2022 at 02:36:33PM -0700, Andres Freund wrote:
> which I haven't seen locally. Looks like we have some race between
> startup process and walreceiver? That seems not great. I'm a bit
> confused that walreceiver and archiving are both active at the same time
> in the first place - t

Re: Crash in new pgstats code

2022-04-16 Thread Andres Freund
Hi,
On 2022-04-16 12:13:09 -0700, Andres Freund wrote:
> On 2022-04-15 13:28:35 -0400, Tom Lane wrote:
> > mylodon just showed a new-to-me failure mode [1]:
>
> Thanks. Found the bug (pgstat_drop_all_entries() passed the wrong lock
> level), with the obvious fix.
>
> This failed to fail in other

Re: Crash in new pgstats code

2022-04-16 Thread Andres Freund
Hi
On 2022-04-16 12:13:09 -0700, Andres Freund wrote:
> What confuses me so far is what already had generated stats before
> reaching pgstat_reset_after_failure() (so that the bug could even be hit
> in t/025_stuck_on_old_timeline.pl).
I see part of a problem - in archiver stats. Even in 14 (and

Re: Crash in new pgstats code

2022-04-16 Thread Andres Freund
Hi
On 2022-04-15 13:28:35 -0400, Tom Lane wrote:
> mylodon just showed a new-to-me failure mode [1]:
Thanks. Found the bug (pgstat_drop_all_entries() passed the wrong lock level), with the obvious fix.
This failed to fail in other tests because they all end up resetting only when there's no stat

Re: Crash in new pgstats code

2022-04-15 Thread Tom Lane
I wrote:
> mylodon just showed a new-to-me failure mode [1]:
Another occurrence here:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=kestrel&dt=2022-04-15%2022%3A42%3A07
I've added an open item.
regards, tom lane

Crash in new pgstats code

2022-04-15 Thread Tom Lane
mylodon just showed a new-to-me failure mode [1]:
Core was generated by `postgres: cascade: startup recovering 00010002'.
Program terminated with signal SIGABRT, Aborted.
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:49
49 ../sysdeps/u