Re: Snapshot related assert failure on skink

2025-04-05 Thread Tomas Vondra
On 3/24/25 16:25, Heikki Linnakangas wrote: On 24/03/2025 16:56, Tomas Vondra wrote: On 3/23/25 17:43, Heikki Linnakangas wrote: On 21/03/2025 17:16, Andres Freund wrote: Am I right in understanding that the only scenario (when in STANDBY_SNAPSHOT_READY), where ExpireOld

Re: Snapshot related assert failure on skink

2025-03-27 Thread Heikki Linnakangas
On 21/03/2025 12:28, Tomas Vondra wrote: But it seems it changed in 952365cded6, which is: commit 952365cded635e54c4177399c0280cb7a5e34c11 Author: Heikki Linnakangas Date: Mon Dec 23 12:42:39 2024 +0200 Remove unnecessary GetTransactionSnapshot() calls In get_databa

Re: Snapshot related assert failure on skink

2025-03-24 Thread Heikki Linnakangas
On 21/03/2025 17:16, Andres Freund wrote: Am I right in understanding that the only scenario (when in STANDBY_SNAPSHOT_READY), where ExpireOldKnownAssignedTransactionIds() would "legally" remove a transaction, rather than the commit / abort records doing so, is if the primary crash-restarted whil

Re: Snapshot related assert failure on skink

2025-03-24 Thread Heikki Linnakangas
On 24/03/2025 16:56, Tomas Vondra wrote: On 3/23/25 17:43, Heikki Linnakangas wrote: On 21/03/2025 17:16, Andres Freund wrote: Am I right in understanding that the only scenario (when in STANDBY_SNAPSHOT_READY), where ExpireOldKnownAssignedTransactionIds() would "legally" remove a transaction

Re: Snapshot related assert failure on skink

2025-03-24 Thread Tomas Vondra
On 3/23/25 17:43, Heikki Linnakangas wrote: On 21/03/2025 17:16, Andres Freund wrote: Am I right in understanding that the only scenario (when in STANDBY_SNAPSHOT_READY), where ExpireOldKnownAssignedTransactionIds() would "legally" remove a transaction, rather than the commit / abo

Re: Snapshot related assert failure on skink

2025-03-21 Thread Andres Freund
Hi, On 2025-03-19 09:17:23 +0200, Heikki Linnakangas wrote: On 19/03/2025 04:22, Tomas Vondra wrote: I kept stress-testing this, and while the frequency massively increased on PG18, I managed to reproduce this all the way back to PG14. I see ~100x more corefiles on PG18. Tha

Re: Snapshot related assert failure on skink

2025-03-21 Thread Tomas Vondra
On 3/19/25 13:27, Tomas Vondra wrote: On 3/19/25 08:17, Heikki Linnakangas wrote: On 19/03/2025 04:22, Tomas Vondra wrote: I kept stress-testing this, and while the frequency massively increased on PG18, I managed to reproduce this all the way back to PG14. I see ~100x more corefil

Re: Snapshot related assert failure on skink

2025-03-19 Thread Tomas Vondra
On 3/19/25 08:17, Heikki Linnakangas wrote: On 19/03/2025 04:22, Tomas Vondra wrote: I kept stress-testing this, and while the frequency massively increased on PG18, I managed to reproduce this all the way back to PG14. I see ~100x more corefiles on PG18. That is not a proof the is

Re: Snapshot related assert failure on skink

2025-03-19 Thread Heikki Linnakangas
On 19/03/2025 04:22, Tomas Vondra wrote: I kept stress-testing this, and while the frequency massively increased on PG18, I managed to reproduce this all the way back to PG14. I see ~100x more corefiles on PG18. That is not a proof the issue was introduced in PG14, maybe it's just the assert tha

Re: Snapshot related assert failure on skink

2025-03-18 Thread Tomas Vondra
I kept stress-testing this, and while the frequency massively increased on PG18, I managed to reproduce this all the way back to PG14. I see ~100x more corefiles on PG18. That is not a proof the issue was introduced in PG14, maybe it's just the assert that was added there or something. Or maybe th

Re: Snapshot related assert failure on skink

2025-03-17 Thread Tomas Vondra
On 3/17/25 13:18, Thomas Munro wrote: On Tue, Mar 18, 2025 at 12:59 AM Tomas Vondra wrote: On 3/17/25 12:36, Tomas Vondra wrote: I'm still fiddling with the script, trying to increase the probability of the (apparent) race condition. On one machine (old Xeon) I can hit it very

Re: Snapshot related assert failure on skink

2025-03-17 Thread Thomas Munro
On Tue, Mar 18, 2025 at 12:59 AM Tomas Vondra wrote: On 3/17/25 12:36, Tomas Vondra wrote: I'm still fiddling with the script, trying to increase the probability of the (apparent) race condition. On one machine (old Xeon) I can hit it very easily/reliably, while on a different machin

Re: Snapshot related assert failure on skink

2025-03-17 Thread Tomas Vondra
On 3/17/25 12:36, Tomas Vondra wrote: ... I'm still fiddling with the script, trying to increase the probability of the (apparent) race condition. On one machine (old Xeon) I can hit it very easily/reliably, while on a different machine (new Ryzen) it's very rare. I don't know if that'

Re: Snapshot related assert failure on skink

2025-03-17 Thread Tomas Vondra
On 3/4/25 23:25, Andres Freund wrote: Hi, I just saw a BF failure on skink (valgrind) that asserts out. Check the 002_compare_backups failure in: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2025-03-04%2017%3A35%3A01 TRAP: failed Assert("TransactionIdPrecedesO

Snapshot related assert failure on skink

2025-03-04 Thread Andres Freund
Hi, I just saw a BF failure on skink (valgrind) that asserts out. Check the 002_compare_backups failure in: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2025-03-04%2017%3A35%3A01 TRAP: failed Assert("TransactionIdPrecedesOrEquals(TransactionXmin, RecentXmin)"), File: "../pgs