On Mon, Dec 7, 2020 at 10:02 AM Craig Ringer
<craig.rin...@enterprisedb.com> wrote:
>
> On Mon, 7 Dec 2020 at 11:44, Peter Smith <smithpb2...@gmail.com> wrote:
>>
>> Basically, I was wondering why can't the "tablesync" worker just
>> gather messages in a similar way to how the current streaming feature
>> gathers messages into a "changes" file, so that they can be replayed
>> later.
>>
>
> See the related thread "Logical archiving"
>
> https://www.postgresql.org/message-id/20d9328b-a189-43d1-80e2-eb25b9284...@yandex-team.ru
>
> where I addressed some parts of this topic in detail earlier today.
>
>> A) The "tablesync" worker (after the COPY) does not ever apply any of
>> the incoming messages, but instead it just gobbles them into a
>> "changes" file until it decides it has reached SYNCDONE state and
>> exits.
>
> This has a few issues.
>
> Most importantly, the sync worker must cooperate with the main apply worker
> to achieve a consistent end-of-sync cutover.
>
In this idea, there is no need to change the end-of-sync cutover. It
will work as it is now. I am not sure what makes you think so.

> The sync worker must have replayed the pending changes in order to make this
> cut-over, because the non-sync apply worker will need to start applying
> changes on top of the resync'd table potentially as soon as the next
> transaction it starts applying, so it needs to see the rows there.
>

The change here is that the apply worker will check for the changes
file and, if it exists, apply the changes from it before it sets the
relstate to SUBREL_STATE_READY in process_syncing_tables_for_apply().
So it will not miss seeing any rows.

> Doing this would also add another round of write multiplication since the
> data would get spooled then applied to WAL then heap. Write multiplication is
> already an issue for logical replication so adding to it isn't particularly
> desirable without a really compelling reason.
>

It will solve our problem of allowing decoding of prepared xacts in
pgoutput. I have explained the problem above [1].

The other idea we discussed is to allow an additional state in
pg_subscription_rel, make the slot permanent in the tablesync worker,
and then process transaction-by-transaction in the apply worker. Does
that approach sound better? Is there any bigger change involved in
that approach (making the tablesync slot permanent) that I am missing?

> With the write multiplication comes disk space management issues for big
> transactions as well as the obvious performance/throughput impact.
>
> It adds even more latency between upstream commit and downstream apply,
> something that is again already an issue for logical replication.
>
> Right now we don't have any concept of a durable and locally flushed spool.
>

I think we have a concept quite close to it for writing the changes of
in-progress xacts, as done in PG-14. It is not durable, but that
shouldn't be a big problem if we allow syncing the changes file.
> It's not impossible to do as you suggest but the cutover requirement makes it
> far from simple. As discussed in the logical archiving thread I think it'd be
> good to have something like this, and there are times the write
> multiplication price would be well worth paying. But it's not easy.
>
>> B) Then, when the "apply" worker proceeds, if it detects the existence
>> of the "changes" file it will replay/apply_dispatch all those gobbled
>> messages before just continuing as normal.
>
> That's going to introduce a really big stall in the apply worker's progress
> in many cases. During that time it won't be receiving from upstream (since we
> don't spool logical changes to disk at this time) so the upstream lag will
> grow. That will impact synchronous replication, pg_wal size management,
> catalog bloat, etc. It'll also leave the upstream logical decoding session
> idle, so when it resumes it may create a spike of I/O and CPU load as it
> catches up, as well as a spike of network traffic. And depending on how close
> the upstream write rate is to the max decode speed, network throughput max,
> and downstream apply speed max, it may take some time to catch up over the
> resulting lag.
>

This is just for the initial tablesync phase. I think it is equivalent
to saying that during basebackup, we need to start physical
replication in parallel. I agree that sometimes it can take a lot of
time to copy large tables, but it will be just one time and no worse
than other situations like basebackup.

[1] - https://www.postgresql.org/message-id/CAA4eK1KFsjf6x-S7b0dJLvEL3tcn9x-voBJiFoGsccyH5xgDzQ%40mail.gmail.com

--
With Regards,
Amit Kapila.