Re: pg15b4: FailedAssertion("TransactionIdIsValid(xmax)

2022-09-12 Thread Michael Paquier
On Mon, Sep 12, 2022 at 04:29:22PM -0500, Justin Pryzby wrote: > After another round of restore-from-backup, and sqlsmith-with-kill-9, it > looks to be okay. The issue was evidently another possible symptom of > the recovery prefetch bug, which is already fixed in REL_15_STABLE (but > not in pg15b

Re: pg15b4: FailedAssertion("TransactionIdIsValid(xmax)

2022-09-12 Thread Justin Pryzby
On Mon, Sep 12, 2022 at 11:53:14AM +0900, Michael Paquier wrote: > On Mon, Sep 12, 2022 at 02:34:48PM +1200, Thomas Munro wrote: > > On Mon, Sep 12, 2022 at 2:27 PM Justin Pryzby wrote: > >> Yeah ... I just realized that I've already forgotten the relevant > >> chronology. > >> > >> The io_concurr

Re: pg15b4: FailedAssertion("TransactionIdIsValid(xmax)

2022-09-12 Thread Alvaro Herrera
On 2022-Sep-09, Justin Pryzby wrote: > 4) I was simultaneously compiling pg14b4 to run with -DRELCACHE_FORCE_RELEASE and installing it into /usr/local. I don't *think* running libraries would've been overwritten, and that shouldn't have affected the running instance anyway.

Re: pg15b4: FailedAssertion("TransactionIdIsValid(xmax)

2022-09-11 Thread Michael Paquier
On Mon, Sep 12, 2022 at 02:34:48PM +1200, Thomas Munro wrote: > On Mon, Sep 12, 2022 at 2:27 PM Justin Pryzby wrote: >> Yeah ... I just realized that I've already forgotten the relevant >> chronology. >> >> The io_concurrency bugfix wasn't included in 15b4, so (if I understood >> you correctly), t

Re: pg15b4: FailedAssertion("TransactionIdIsValid(xmax)

2022-09-11 Thread Thomas Munro
On Mon, Sep 12, 2022 at 2:27 PM Justin Pryzby wrote: > On Mon, Sep 12, 2022 at 02:25:48PM +1200, Thomas Munro wrote: > > On Mon, Sep 12, 2022 at 1:42 PM Justin Pryzby wrote: > > > But yesterday I started from initdb and restored this cluster from > > > backup, and > > > started up sqlsmith, and

Re: pg15b4: FailedAssertion("TransactionIdIsValid(xmax)

2022-09-11 Thread Justin Pryzby
On Mon, Sep 12, 2022 at 02:25:48PM +1200, Thomas Munro wrote: > On Mon, Sep 12, 2022 at 1:42 PM Justin Pryzby wrote: > > On Mon, Sep 12, 2022 at 10:44:38AM +1200, Thomas Munro wrote: > > > On Sat, Sep 10, 2022 at 5:44 PM Justin Pryzby > > > wrote: > > > > < 2022-09-09 19:37:25.835 CDT telsasoft

Re: pg15b4: FailedAssertion("TransactionIdIsValid(xmax)

2022-09-11 Thread Thomas Munro
On Mon, Sep 12, 2022 at 1:42 PM Justin Pryzby wrote: > On Mon, Sep 12, 2022 at 10:44:38AM +1200, Thomas Munro wrote: > > On Sat, Sep 10, 2022 at 5:44 PM Justin Pryzby wrote: > > > < 2022-09-09 19:37:25.835 CDT telsasoft >ERROR: MultiXactId 133553154 > > > has not been created yet -- apparent wr

Re: pg15b4: FailedAssertion("TransactionIdIsValid(xmax)

2022-09-11 Thread Peter Geoghegan
On Sun, Sep 11, 2022 at 6:42 PM Justin Pryzby wrote: > I think you're saying is that this can be explained by the > io_concurrency bug in recovery_prefetch, if run under 15b3. > > But yesterday I started from initdb and restored this cluster from backup, and > started up sqlsmith, and sent some ki

Re: pg15b4: FailedAssertion("TransactionIdIsValid(xmax)

2022-09-11 Thread Justin Pryzby
On Mon, Sep 12, 2022 at 10:44:38AM +1200, Thomas Munro wrote: > On Sat, Sep 10, 2022 at 5:44 PM Justin Pryzby wrote: > > < 2022-09-09 19:37:25.835 CDT telsasoft >ERROR: MultiXactId 133553154 has > > not been created yet -- apparent wraparound > > I guess what happened here is that after one of

Re: pg15b4: FailedAssertion("TransactionIdIsValid(xmax)

2022-09-11 Thread Thomas Munro
On Sat, Sep 10, 2022 at 5:44 PM Justin Pryzby wrote: > < 2022-09-09 19:37:25.835 CDT telsasoft >ERROR: MultiXactId 133553154 has > not been created yet -- apparent wraparound I guess what happened here is that after one of your (apparently several?) OOM crashes, crash recovery didn't run all th

Re: pg15b4: FailedAssertion("TransactionIdIsValid(xmax)

2022-09-11 Thread Thomas Munro
On Sat, Sep 10, 2022 at 5:01 PM Justin Pryzby wrote: > BTW, after a number of sigabrt's, I started seeing these during > recovery: > > < 2022-09-09 19:44:04.180 CDT >LOG: unexpected pageaddr 1214/AF0FE000 in > log segment 0001121400B4, offset 1040384 > < 2022-09-09 23:20:50.830 CDT

Re: pg15b4: FailedAssertion("TransactionIdIsValid(xmax)

2022-09-09 Thread Justin Pryzby
The OOM was at: < 2022-09-09 19:34:24.043 CDT >LOG: server process (PID 14841) was terminated by signal 9: Killed The first SIGABRT was at: < 2022-09-09 19:37:31.650 CDT >LOG: server process (PID 7363) was terminated by signal 6: Aborted And I've just found a bunch of "interesting" logs bet

Re: pg15b4: FailedAssertion("TransactionIdIsValid(xmax)

2022-09-09 Thread Justin Pryzby
On Sat, Sep 10, 2022 at 12:07:30PM +0800, Zhang Mingli wrote: > That’s interesting. I dug into it for a while but didn't make much progress. > > Maybe we could add some logs to print MultiXactMembers’ xid and status if xid > is 0. > > Inside MultiXactIdGetUpdateXid() > > ``` > nmembers = GetMul

Re: pg15b4: FailedAssertion("TransactionIdIsValid(xmax)

2022-09-09 Thread Zhang Mingli
Hi, That’s interesting. I dug into it for a while but didn't make much progress. Maybe we could add some logs to print MultiXactMembers’ xid and status if xid is 0. Inside MultiXactIdGetUpdateXid() ``` nmembers = GetMultiXactIdMembers(xmax, &members, false, false); if (nmembers > 0

Re: pg15b4: FailedAssertion("TransactionIdIsValid(xmax)

2022-09-09 Thread Justin Pryzby
On Fri, Sep 09, 2022 at 09:06:37PM -0500, Justin Pryzby wrote: > #0 0x7fb8a22f31f7 in raise () from /lib64/libc.so.6 > #1 0x7fb8a22f48e8 in abort () from /lib64/libc.so.6 > #2 0x0098f9be in ExceptionalCondition > (conditionName=conditionName@entry=0x9fada4 "TransactionIdIsValid(

pg15b4: FailedAssertion("TransactionIdIsValid(xmax)

2022-09-09 Thread Justin Pryzby
The sequence of events leading up to this: 0) Yesterday I upgraded an internal VM to pg15b4 using PGDG RPMs; it's the same VM that hit the prefetch_recovery bug which was fixed by adb466150. I don't think that should've left it in a weird state (since recovery was successful when prefetch