On 12/23/2017 11:23 PM, Erik Rijkers wrote:
> On 2017-12-23 21:06, Tomas Vondra wrote:
>> On 12/23/2017 03:03 PM, Erikjan Rijkers wrote:
>>> On 2017-12-23 05:57, Tomas Vondra wrote:
>>>> Hi all,
>>>>
>>>> Attached is a patch series that implements two features to the logical
>>>> replication - ability to define a memory limit for the reorderbuffer
>>>> (responsible for building the decoded transactions), and ability to
>>>> stream large in-progress transactions (exceeding the memory limit).
>>>>
>>>
>>> logical replication of 2 instances is OK but 3 and up fail with:
>>>
>>> TRAP: FailedAssertion("!(last_lsn < change->lsn)", File:
>>> "reorderbuffer.c", Line: 1773)
>>>
>>> I can cobble up a script but I hope you have enough from the assertion
>>> to see what's going wrong...
>>
>> The assertion says that the iterator produces changes in order that does
>> not correlate with LSN. But I have a hard time understanding how that
>> could happen, particularly because according to the line number this
>> happens in ReorderBufferCommit(), i.e. the current (non-streaming) case.
>>
>> So instructions to reproduce the issue would be very helpful.
>
> Using:
>
> 0001-Introduce-logical_work_mem-to-limit-ReorderBuffer-v2.patch
> 0002-Issue-XLOG_XACT_ASSIGNMENT-with-wal_level-logical-v2.patch
> 0003-Issue-individual-invalidations-with-wal_level-log-v2.patch
> 0004-Extend-the-output-plugin-API-with-stream-methods-v2.patch
> 0005-Implement-streaming-mode-in-ReorderBuffer-v2.patch
> 0006-Add-support-for-streaming-to-built-in-replication-v2.patch
>
> As you expected the problem is the same with these new patches.
>
> I have now tested more, and seen that it does not always fail. I guess
> that it fails here 3 times out of 4. But the laptop I'm using at the
> moment is old and slow -- it may well be a factor, as we've seen
> before [1].
>
> Attached is the bash script that I put together. I tested with
> NUM_INSTANCES=2, which yields success, and NUM_INSTANCES=3, which fails
> often. The same script run against HEAD never seems to fail (I tried a
> few dozen times).
>
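For anyone following along, the failing check is essentially an ordering
invariant on the changes the reorderbuffer iterator hands back during apply:
each change must carry a strictly larger LSN than the previous one. Below is a
minimal, self-contained sketch of that invariant (plain C, not the actual
reorderbuffer.c code; the toy iterator and the Change struct are made up for
illustration, only the last_lsn < change->lsn check mirrors the assertion from
the TRAP message):

    #include <assert.h>
    #include <stdint.h>
    #include <stdio.h>

    typedef uint64_t XLogRecPtr;            /* WAL position, as in PostgreSQL */
    #define InvalidXLogRecPtr ((XLogRecPtr) 0)

    /* toy stand-in for ReorderBufferChange: only the LSN matters here */
    typedef struct Change
    {
        XLogRecPtr  lsn;
    } Change;

    /* toy iterator over an array of changes */
    static Change *
    next_change(Change *changes, int nchanges, int *pos)
    {
        if (*pos >= nchanges)
            return NULL;
        return &changes[(*pos)++];
    }

    int
    main(void)
    {
        /* changes as the iterator is expected to produce them: LSN-ascending */
        Change      changes[] = {{100}, {120}, {130}, {180}};
        int         nchanges = sizeof(changes) / sizeof(changes[0]);
        int         pos = 0;
        XLogRecPtr  last_lsn = InvalidXLogRecPtr;
        Change     *change;

        while ((change = next_change(changes, nchanges, &pos)) != NULL)
        {
            /*
             * The invariant behind the reported failure: strictly increasing
             * LSNs. If the iterator ever goes backwards (or repeats an LSN),
             * this trips, just like the Assert in ReorderBufferCommit() did.
             */
            assert(last_lsn < change->lsn);

            printf("apply change at lsn %llu\n",
                   (unsigned long long) change->lsn);
            last_lsn = change->lsn;
        }
        return 0;
    }

In the real code the iterator merges a transaction's changes from memory and
from spill files, which is where an ordering problem could plausibly creep in;
the sketch only shows the invariant itself, not that merge.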
Thanks. Unfortunately I still can't reproduce the issue. I even tried
running it in valgrind, to see if there are some memory access issues
(which should also slow it down significantly).

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services