Hi there, I hope this finds you well.

I have a follow-up question to this already long thread.

Upon deploying my application, which is fed by PostgreSQL logical replication, 
against a stale database, I ended up running out of disk space, as the 
replication slot is held back until the next time we receive a data-changing 
event and advance to that new LSN offset.
I think the solution is to advance our LSN offset every time a keep-alive 
message is received ('k' // 107).
My doubt is: can keep-alive messages be received in between the events of an 
open transaction? I think not, but I would like your input to be extra sure, 
because if that can happen and I commit that offset, I may again introduce 
faulty logic leading to data loss.
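
For concreteness, here is roughly how I plan to handle it on my side. This is 
only a sketch against the raw streaming-replication wire format as I read the 
docs; the names (handle_copy_data, StreamState, send_standby_status_update) are 
placeholders of mine, it ignores protocol v2 streaming of in-progress 
transactions, and it assumes the confirmed position may only move on a 
keep-alive while no transaction is open:

import struct

# Streaming-replication message tags carried in CopyData payloads.
XLOG_DATA = ord('w')   # 119: wraps a pgoutput change message
KEEPALIVE = ord('k')   # 107: primary keep-alive, carries no row data

# pgoutput tags marking transaction boundaries (non-streaming protocol).
PGOUTPUT_BEGIN = ord('B')
PGOUTPUT_COMMIT = ord('C')

class StreamState:
    """Tracks whether a transaction is open and the last LSN safe to confirm."""
    def __init__(self):
        self.in_transaction = False
        self.safe_flush_lsn = 0

def handle_copy_data(state: StreamState, buf: bytes) -> None:
    tag = buf[0]
    if tag == KEEPALIVE:
        # 'k': Int64 server WAL end, Int64 server clock, Byte1 reply-requested.
        wal_end, _clock, reply_requested = struct.unpack('!QQB', buf[1:18])
        # Only acknowledge the keep-alive position while no transaction is
        # open; otherwise keep the old value so the slot still retains it.
        if not state.in_transaction:
            state.safe_flush_lsn = max(state.safe_flush_lsn, wal_end)
        if reply_requested:
            send_standby_status_update(state.safe_flush_lsn)  # hypothetical helper
    elif tag == XLOG_DATA:
        # 'w': Int64 WAL start, Int64 WAL end, Int64 server clock, then pgoutput.
        _wal_start, wal_end, _clock = struct.unpack('!QQQ', buf[1:25])
        pgoutput = buf[25:]
        if pgoutput[0] == PGOUTPUT_BEGIN:
            state.in_transaction = True
        elif pgoutput[0] == PGOUTPUT_COMMIT:
            state.in_transaction = False
            # After COMMIT it is safe to move the confirmed position forward.
            state.safe_flush_lsn = max(state.safe_flush_lsn, wal_end)
        # ... hand the decoded change off to the ETL pipeline here ...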

In sum, something like this wouldn't happen:
BEGIN LSN001
INSERT LSN002
KEEP ALIVE LSN003
UPDATE LSN004
COMMIT LSN005

Correct? It has to be either:

KEEP ALIVE LSN001
BEGIN LSN002
INSERT LSN003
UPDATE LSN004
COMMIT LSN005

Or:
BEGIN LSN001
INSERT LSN002
UPDATE LSN004
COMMIT LSN005
KEEP ALIVE LSN006

LSNXXX are mere representations of LSN offsets.
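
And for completeness, the feedback message I intend to send once an offset is 
known to be safe to acknowledge. Again just a sketch of the Standby Status 
Update ('r') message as I understand it from the protocol docs; the helper 
name is mine, and it reports the same LSN as written, flushed, and applied for 
simplicity:

import struct, time

# PostgreSQL timestamps count microseconds since 2000-01-01 00:00:00 UTC.
POSTGRES_EPOCH_US = 946_684_800_000_000

def standby_status_update(flush_lsn: int, reply_requested: bool = False) -> bytes:
    """Build an 'r' Standby Status Update acknowledging flush_lsn."""
    now_us = int(time.time() * 1_000_000) - POSTGRES_EPOCH_US
    return struct.pack('!cQQQQB',
                       b'r',
                       flush_lsn,   # last WAL byte + 1 received and written
                       flush_lsn,   # last WAL byte + 1 flushed to disk
                       flush_lsn,   # last WAL byte + 1 applied
                       now_us,      # client clock, microseconds since 2000-01-01
                       1 if reply_requested else 0)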

Thank you again.
Regards,
José Neves

________________________________
From: Amit Kapila <amit.kapil...@gmail.com>
Sent: 8 August 2023 14:37
To: José Neves <rafanev...@msn.com>
Cc: Andres Freund <and...@anarazel.de>; pgsql-hack...@postgresql.org 
<pgsql-hack...@postgresql.org>
Subject: Re: CDC/ETL system on top of logical replication with pgoutput, custom 
client

On Mon, Aug 7, 2023 at 1:46 PM José Neves <rafanev...@msn.com> wrote:
>
> Humm, that's... challenging. I faced some issues after "the fix" because I 
> had a couple of transactions with 25k updates each, and I had to split them 
> to be able to push to our event messaging system, as our max message size is 
> 10MB. Relying on commit time would mean that all of a transaction's 
> operations have the same timestamp. If something goes wrong while my worker 
> is pushing that transaction's data chunks, I will duplicate some data in the 
> next run, so this wouldn't allow me to deal with data duplication.
> Is there any other way that you see to deal with it?
>
> Right now I only see an option, which is to store all processed LSNs on the 
> other side of the ETL. I'm trying to avoid that overhead.
>

Sorry, I don't understand your system well enough to give you suggestions,
but if you have any questions about how logical replication works, then I
might be able to help.

--
With Regards,
Amit Kapila.
