On Wed, Aug 25, 2021 at 5:36 AM Greg Nancarrow wrote:
> I've attached an updated patch, hopefully more along the lines that
> you were thinking of.
LGTM. Committed and back-patched to v10 and up. In theory the same bug
exists in 9.6, but you'd have to have third-party code using the
parallel cont
On Wed, Aug 25, 2021 at 1:37 AM Robert Haas wrote:
>
> I guess I was thinking more of rejiggering things so that we save the
> results of each RestoreSnapshot() call in a local variable, e.g.
> asnapshot and tsnapshot. And then I think we could just
> RestoreTransactionSnapshot() on whichever one
On Tue, Aug 24, 2021 at 12:20 AM Greg Nancarrow wrote:
> I initially thought this too, but RestoreTransactionSnapshot() sets up
> the resultant transaction snapshot in "CurrentSnapshot", which is
> static to snapmgr.c (like the other pointers to valid snapshots) and I
> didn't really want to mess
On Tue, Aug 24, 2021 at 5:00 AM Robert Haas wrote:
>
>
> I think this looks pretty good. I am not sure I see any reason to
> introduce a new function RestoreTxnSnapshotAndSetAsActive. Couldn't we
> just use RestoreTransactionSnapshot() and then call
> PushActiveSnapshot() from parallel.c? That see
On Wed, Aug 18, 2021 at 9:28 AM Greg Nancarrow wrote:
> Yes, I think I agree on that.
> I've updated the patch to restore the actual transaction snapshot in
> the IsolationUsesXactSnapshot() case, otherwise the active snapshot is
> installed as the transaction snapshot.
> I've tested the patch for
On Wed, Aug 18, 2021 at 5:00 AM Robert Haas wrote:
>
> Ah ha! Thank you. So I think what I was missing here is that even
> though the transaction snapshot is not a well-defined concept when
> !IsolationUsesXactSnapshot(), we still need TransactionXmin to be set
> to a value that's earlier than any
On Fri, Aug 13, 2021 at 2:52 AM Greg Nancarrow wrote:
> With your proposed approach, what I'm seeing is that the worker calls
> GetTransactionSnapshot() at some point, which then builds a new
> snapshot, and results in increasing TransactionXmin (probably because
> another concurrent transaction h
On Thu, Aug 12, 2021 at 5:37 AM Robert Haas wrote:
>
> 1. Then why doesn't the approach I proposed fix it?
>
I think that with your approach, it is not doing the expected
initialization done by SetTransactionSnapshot() (which is called by
RestoreTransactionSnapshot(), which your approach skips in
On Wed, Aug 11, 2021 at 8:32 AM Greg Nancarrow wrote:
> This is explained by the TransactionSnapshot being a later snapshot in
> this case.
> So this is why it seems to be wrong to call GetTransactionSnapshot()
> in InitializeParallelDSM() and use a separate, potentially later,
> snapshot than tha
On Tue, Aug 10, 2021 at 12:35 AM Robert Haas wrote:
>
> Now there is evidently something wrong with this line of reasoning
> because the code is buggy and my proposed fix doesn't work, but I
> don't know what is wrong with it. You seem to think that it's crazy
> that we try to replicate the active
On Wed, Aug 4, 2021 at 10:03 PM Greg Nancarrow wrote:
> In setting up the snapshot for the execution state used in command
> execution, GetTransactionSnapshot() is called and (possibly a copy of)
> the returned snapshot is pushed as the ActiveSnapshot.
I mean, there are LOTS of PushActiveSnapshot
On Thu, Aug 5, 2021 at 12:03 PM Greg Nancarrow wrote:
>
As the current v7 patch doesn't fix the coredump issue and also the
cfbot is now failing (as one of the regression tests fails) I'm
reinstating my v2/v5 patch (as v8) as the current best solution to
this issue.
So far I haven't found a test
On Wed, Aug 4, 2021 at 11:43 PM Robert Haas wrote:
>
> Why do you think it's right to install the serialized *active*
> snapshot as the *transaction* snapshot? I've been operating on the
> presumption that we wanted the worker to install the leader's
> transaction snapshot as its transaction snaps
On Tue, Aug 3, 2021 at 11:41 PM Greg Nancarrow wrote:
> I've tried to follow your description and have attached a patch to
> hopefully match it, but it doesn't pass "make check-world".
> Perhaps I messed something up (apologies if so), or additional changes
> are needed to match what you had in mi
>
> At SERIALIZABLE level with v2/v5 I get an error which I don't have before
> the patch (but no crash):
> pgbench: error: client 6 script 0 aborted in command 594 query 0: ERROR:
> could not serialize access due to read/write dependencies among
> transactions
> DETAIL: Reason code: Canceled on
ср, 4 авг. 2021 г. в 14:18, Greg Nancarrow :
> On Wed, Aug 4, 2021 at 7:55 PM Pavel Borisov
> wrote:
> >
> >>
> > Greg, thanks for the fast response! I suppose that a check for
> IsolationUsesXactSnapshot() is also useful in a GetTransactionSnapshot for
> the correct processing of a case with NUL
On Wed, Aug 4, 2021 at 8:17 PM Greg Nancarrow wrote:
>
> Ah, thanks for that (I didn't debug that failure).
> But is the coredump issue reproducible now? (using v7 and your test script)
>
Er, with the v7 patch, the problem still occurs (that Assert still
fires during a run of the SubTransGetTopmo
On Wed, Aug 4, 2021 at 7:55 PM Pavel Borisov wrote:
>
>>
> Greg, thanks for the fast response! I suppose that a check for
> IsolationUsesXactSnapshot() is also useful in a GetTransactionSnapshot for
> the correct processing of a case with NULL transaction snapshot.
> This corrects mentioned chec
ср, 4 авг. 2021 г. в 07:41, Greg Nancarrow :
> On Wed, Aug 4, 2021 at 3:21 AM Robert Haas wrote:
> >
> >The idea I sort of had floating around in my mind is a little
> >different than what Greg has implemented. I was thinking that we could
> >just skip SerializeSnapshot and the corresponding shm_
On Wed, Aug 4, 2021 at 3:21 AM Robert Haas wrote:
>
>The idea I sort of had floating around in my mind is a little
>different than what Greg has implemented. I was thinking that we could
>just skip SerializeSnapshot and the corresponding shm_toc_allocate()
>if !IsolationUsesXactSnapshot(). Then on
On Tue, Aug 3, 2021 at 9:59 AM Pavel Borisov wrote:
> I've looked at v5 patch. It is completely similar to v2 patch, which I've
> already tested using the workflow, I've described in the comments above.
> Before the patch I get the errors quite soon, the patch corrects them.
> Furthermore I've
пт, 23 июл. 2021 г. в 10:00, Greg Nancarrow :
> On Thu, Jul 22, 2021 at 2:15 AM Robert Haas wrote:
> >
> > Thanks to Thomas Munro for drawing my attention to this thread. I
> > wasn't intentionally ignoring it, but there's a lot of email in the
> > world and only so much time.
> >
> > When I wrot
On Thu, Jul 22, 2021 at 2:15 AM Robert Haas wrote:
>
> Thanks to Thomas Munro for drawing my attention to this thread. I
> wasn't intentionally ignoring it, but there's a lot of email in the
> world and only so much time.
>
> When I wrote this code originally, the idea that I had in mind was
> sim
On Wed, Jul 14, 2021 at 10:34 PM Greg Nancarrow wrote:
> Unfortunately there is currently no test, code-comment, README or
> developer-discussion that definitively determines which approach (v2
> vs v3/v4) is a valid fix for this issue.
> We don't know if having both the transaction and active sn
On Sat, Jul 10, 2021 at 3:36 AM Tomas Vondra
wrote:
>
> But I think removing one of the snapshots (as the v2 does it) is rather
> strange too. I very much doubt having both the transaction and active
> snapshots in the parallel worker is not intentional, and Pavel may very
> well be right this bre
>
> > So yeah, I think this is due to confusion with two snapshots and
> > failing
> > to consider both of them when calculating TransactionXmin.
> >
> > But I think removing one of the snapshots (as the v2 does it) is rather
> > strange too. I very much doubt having both the transaction and active
On 2021-07-09 20:36, Tomas Vondra wrote:
Hi,
I took a quick look on this - I'm no expert in the details of
snapshots,
so take my comments with a grain of salt.
AFAICS both Greg Nancarrow and Pavel Borisov are kinda right. I think
Greg is right the v3 patch does not seem like the right (or cor
Hi,
I took a quick look on this - I'm no expert in the details of snapshots,
so take my comments with a grain of salt.
AFAICS both Greg Nancarrow and Pavel Borisov are kinda right. I think
Greg is right the v3 patch does not seem like the right (or correct)
solution, for a couple reasons:
1) It
By the way in the initial discussion on parallel infrastructure
https://www.postgresql.org/message-id/20150208002027.GH9201%40alap3.anarazel.de
I've seen a proposal to set the worker's PGXACT->xmin
to be the minimum of the top transaction id and the snapshots.
--
Best regards,
Pavel Borisov
Post
>
> Have you found a case that the v2 patch breaks?
>
> The v2 patch does follow the analysis in the beginning of the
> discussion, which identified that in setting up parallel workers, a
> "later transaction snapshot" was taken than the one actually used in
> the statement execution, and that's wh
On Wed, Jun 23, 2021 at 1:06 AM Maxim Orlov wrote:
>
> The analysis in the beginning of the discussion seems to be right, but
> the fix v2 looks too invasive for me.
>
> Personally, I'd like not to remove snapshot even if transaction is
> read-only. I propose to consider "xid < TransactionXmin" as
The analysis in the beginning of the discussion seems to be right, but
the fix v2 looks too invasive for me.
Personally, I'd like not to remove snapshot even if transaction is
read-only. I propose to consider "xid < TransactionXmin" as a legit case
and just promote xid to TransactionXmin.
It
Em sex., 4 de jun. de 2021 às 12:07, Pavel Borisov
escreveu:
> Please, avoid using decimal based values.
>>
>> 128 is multiple of 64.
>>
> It's true that 128 is better to use than 120 but the main problem is not
> in the value but in the fact we never get
> CurrentRunningXacts->subxid_overflow =
>
> Please, avoid using decimal based values.
>
> 128 is multiple of 64.
>
It's true that 128 is better to use than 120 but the main problem is not in
the value but in the fact we never get
CurrentRunningXacts->subxid_overflow = suboverflowed; with value more than
120. This solves the problem but i
>Just a note here. After examining the core dump I did notice something.
>While in XidInMVCCSnapshot call the snapshot->suboverflowed is set true
>although subxip == NULL and subxcnt == 0. As far as I understand,
>snapshot->suboverflowed is set true in the GetRunningTransactionData
>call.
>And th
Just a note here. After examining the core dump I did notice something.
While in XidInMVCCSnapshot call the snapshot->suboverflowed is set true
although subxip == NULL and subxcnt == 0. As far as I understand,
snapshot->suboverflowed is set true in the GetRunningTransactionData
call.
And the
On Mon, May 24, 2021 at 11:56 PM Pavel Borisov wrote:
>
> Using a recipe similar to what has been described above in the thread, I
> reliably reproduced the bug in many Postgres versions. (v.11, v.13 etc.).
> 1. Make & make install
> 2. Make check
> 3. run SubTransGetTopmostTransaction-rep.sh in
пн, 24 мая 2021 г. в 09:22, Greg Nancarrow :
> On Mon, May 24, 2021 at 2:50 PM Michael Paquier
> wrote:
> >
> > On Mon, May 24, 2021 at 12:04:37PM +1000, Greg Nancarrow wrote:
> > > Keep cfbot happy, use the PG14 patch as latest.
> >
> > This stuff is usually very tricky.
>
> Agreed. That's why I
On Mon, May 24, 2021 at 2:50 PM Michael Paquier wrote:
>
> On Mon, May 24, 2021 at 12:04:37PM +1000, Greg Nancarrow wrote:
> > Keep cfbot happy, use the PG14 patch as latest.
>
> This stuff is usually very tricky.
Agreed. That's why I was looking for experts in this snapshot-handling
code, to loo
On Mon, May 24, 2021 at 12:04:37PM +1000, Greg Nancarrow wrote:
> Keep cfbot happy, use the PG14 patch as latest.
This stuff is usually very tricky. Do we have a way to reliably
reproduce the report discussed here?
--
Michael
signature.asc
Description: PGP signature
On Thu, May 20, 2021 at 4:08 PM Greg Nancarrow wrote:
>
Keep cfbot happy, use the PG14 patch as latest.
Regards,
Greg Nancarrow
Fujitsu Australia
v2-0001-PG14-Fix-parallel-worker-failed-assertion-and-coredump.patch
Description: Binary data
On Thu, May 20, 2021 at 11:18 AM Pengchengliu wrote:
>
> Hi Greg,
>Thanks a lot for you explanation and your fix.
>
>I think your fix can resolve the core dump issue. As with your fix,
> parallel process reset Transaction Xmin from ActiveSnapshot.
>But it will change Transaction snaps
--Original Message-
From: Greg Nancarrow
Sent: 2021年5月18日 17:15
To: Pengchengliu
Cc: Andres Freund ; PostgreSQL-development
Subject: Re: Re: Parallel scan with SubTransGetTopmostTransaction assert
coredump
On Tue, May 18, 2021 at 11:27 AM Pengchengliu wrote:
>
> Hi Greg,
>
>
On Tue, May 18, 2021 at 9:41 PM houzj.f...@fujitsu.com
wrote:
>
> > To: Pengchengliu
> > Cc: Greg Nancarrow ; Andres Freund
> > ; PostgreSQL-development
> > Subject: Re: Re: Parallel scan with SubTransGetTopmostTransaction assert
> > coredump
>
> &
> To: Pengchengliu
> Cc: Greg Nancarrow ; Andres Freund ;
> PostgreSQL-development
> Subject: Re: Re: Parallel scan with SubTransGetTopmostTransaction assert
> coredump
> I've also seen the reports of the same
> Assert(TransactionIdFollowsOrEquals(xid, Transacti
On Tue, May 18, 2021 at 11:27 AM Pengchengliu wrote:
>
> Hi Greg,
>
>Actually I am very confused about ActiveSnapshot and TransactionSnapshot.
> I don't know why main process send ActiveSnapshot and TransactionSnapshot
> separately. And what is exact difference between them?
>If you kno
I've also seen the reports of the same Assert(TransactionIdFollowsOrEquals(xid,
TransactionXmin)) with a subsequent crash in a parallel worker in
PostgreSQL v11-based build, Though I was unable to investigate deeper and
reproduce the issue. The details above in the thread make me think it is a
real
om: Greg Nancarrow
Sent: 2021年5月17日 20:59
To: 刘鹏程
Cc: Andres Freund ; PostgreSQL-development
Subject: Re: Re: Parallel scan with SubTransGetTopmostTransaction assert
coredump
On Sat, May 15, 2021 at 12:37 PM 刘鹏程 wrote:
>
>
> BTW, I test it in a high performance server. It is verly easily
On Sat, May 15, 2021 at 12:37 PM 刘鹏程 wrote:
>
>
> BTW, I test it in a high performance server. It is verly easily be
> reproduced. My colleague and me use different environment both can reproduce
> it.
>
Hi Pengcheng,
Although the issue won't reproduce easily in my system, I can
certainly se
with-SubTransGetTopmostTransaction-assert-coredump-td6197408.html
Thanks
Pengcheng
-Original Message-
From: Greg Nancarrow
Sent: 2021年5月15日 0:44
To: Pengchengliu
Cc: Andres Freund ; PostgreSQL-development
Subject: Re: Parallel scan with SubTransGetTopmostTransaction assert coredump
On Fri, May 14,
Hi Greg,
It is really weird. Could you make sure is the SnapShot overflow in you ENV?
It is very impoint.
Abount SnapShot overflow and Subtrans, you can refer this
https://www.cybertec-postgresql.com/en/subtransactions-and-performance-in-postgresql/.
In the script sub_120.sql, for one
On Fri, May 14, 2021 at 8:36 PM Pengchengliu wrote:
> Did you use pgbench with the script sub_120.sql which I provide in
> attachment?
yes
>
> Did you increase the number PGPROC_MAX_CACHED_SUBXIDS? Please don't change
> any codes, now we just use the origin codes in PG13.2.
>
No, I have m
sue should still exist.
Thanks
Pengcheng
-Original Message-
From: Greg Nancarrow
Sent: 2021年5月14日 16:47
To: Pengchengliu
Cc: Andres Freund ; PostgreSQL-development
Subject: Re: Parallel scan with SubTransGetTopmostTransaction assert coredump
On Fri, May 14, 2021 at
On Fri, May 14, 2021 at 12:25 PM Pengchengliu wrote:
>
> Hi Greg,
>
> Thanks for you replay and test.
>
>
>
> When main process gets the transaction snapshot in
> InitializeParallelDSM->GetTransactionSnapshot, the transaction snapshot xmin
> is very likely follows active snapshot xmin.
>
>
elopment
Subject: Re: Parallel scan with SubTransGetTopmostTransaction assert coredump
On Thu, May 13, 2021 at 11:25 AM Pengchengliu <
<mailto:pengcheng...@tju.edu.cn> pengcheng...@tju.edu.cn> wrote:
>
> Hi Andres,
> Thanks for you replay.
Er, it
On Thu, May 13, 2021 at 11:25 AM Pengchengliu wrote:
>
> Hi Andres,
> Thanks for you replay.
Er, it's Greg who has replied so far (not Andres).
>
> And If you still cannot reproduce it in 2 minitus. Could you run pgbench
> longer time, such as 30 or 60 minutes.
>
Actually, I did run it, mu
--Original Message-
From: Greg Nancarrow
Sent: 2021年5月11日 19:08
To: Pengchengliu
Cc: Andres Freund ; PostgreSQL-development
Subject: Re: Parallel scan with SubTransGetTopmostTransaction assert coredump
On Tue, May 11, 2021 at 11:28 AM Pengchengliu <
<mailto:pengcheng...@tju.edu.
On Tue, May 11, 2021 at 11:28 AM Pengchengliu wrote:
>
> Hi Andres,
>Reproduce steps.
>
> 1, Modify and adjust NUM_SUBTRANS_BUFFERS to 128 from 32 in the file
> "src/include/access/subtrans.h" line number 15.
> 2, configure with enable assert and build it.
> 3, init a new database cluster.
>
al Message-
From: Andres Freund
Sent: 2021年5月7日 11:55
To: Pengchengliu
Cc: pgsql-hack...@postgresql.org
Subject: Re: Parallel scan with SubTransGetTopmostTransaction assert coredump
Hi,
On 2021-05-07 11:32:57 +0800, Pengchengliu wrote:
> Hi Hackers,
>
> Last email, format error, missi
al Message-
From: Andres Freund
Sent: 2021年5月7日 11:55
To: Pengchengliu
Cc: pgsql-hack...@postgresql.org
Subject: Re: Parallel scan with SubTransGetTopmostTransaction assert coredump
Hi,
On 2021-05-07 11:32:57 +0800, Pengchengliu wrote:
> Hi Hackers,
>
> Last email, format error, missi
Hi,
On 2021-05-07 11:32:57 +0800, Pengchengliu wrote:
> Hi Hackers,
>
> Last email, format error, missing some information, so I resend this email.
>
> With PG 13.2(3fb4c75e857adee3da4386e947ba58a75f3e74b7), I tested
> subtransaction with parallel scan, I got a subtransaction coredump as bel
Hi Hackers,
Last email, format error, missing some information, so I resend this email.
With PG 13.2(3fb4c75e857adee3da4386e947ba58a75f3e74b7), I tested
subtransaction with parallel scan, I got a subtransaction coredump as below:
```
(gdb) bt
#0 0x1517ce61f7ff in raise () from /lib
Hi hackers,
With PG 13.2(3fb4c75e857adee3da4386e947ba58a75f3e74b7), I tested
subtransaction with parallel scan, I got a subtransaction coredump as below:
(gdb) bt
#0 0x1517ce61f7ff in raise () from /lib64/libc.so.6
#1 0x1517ce609c35 in abort () from /lib64/libc.so.6
#2 0x00aa
63 matches
Mail list logo