Hi there, hope to find you well. I have a follow-up question to this already long thread.
Upon deploying my PostgreSQL logical replication fed application on a stale database, I ended up running out of space, as the replication slot is being held back till the next time that we receive a data-changing event, and we advance to that new LSN offset. I think that the solution for this is to advance our LSN offset every time a keep-alive message is received ('k' // 107). My doubt is, can the keep-alive messages be received in between open transaction events? I think not, but I would like to get your input to be extra sure as if this happens, and I commit that offset, I may introduce again faulty logic leading to data loss. In sum, something like this wouldn't happen: BEGIN LSN001 INSERT LSN002 KEEP LIVE LSN003 UPDATE LSN004 COMMIT LSN005 Correct? It has to be either: KEEP LIVE LSN001 BEGIN LSN002 INSERT LSN003 UPDATE LSN004 COMMIT LSN005 Or: BEGIN LSN001 INSERT LSN002 UPDATE LSN004 COMMIT LSN005 KEEP LIVE LSN006 LSNXXX are mere representations of LSN offsets. Thank you again. Regards, José Neves ________________________________ De: Amit Kapila <amit.kapil...@gmail.com> Enviado: 8 de agosto de 2023 14:37 Para: José Neves <rafanev...@msn.com> Cc: Andres Freund <and...@anarazel.de>; pgsql-hack...@postgresql.org <pgsql-hack...@postgresql.org> Assunto: Re: CDC/ETL system on top of logical replication with pgoutput, custom client On Mon, Aug 7, 2023 at 1:46 PM José Neves <rafanev...@msn.com> wrote: > > Humm, that's... challenging. I faced some issues after "the fix" because I > had a couple of transactions with 25k updates, and I had to split it to be > able to push to our event messaging system, as our max message size is 10MB. > Relying on commit time would mean that all transaction operations will have > the same timestamp. If something goes wrong while my worker is pushing that > transaction data chunks, I will duplicate some data in the next run, so... > this wouldn't allow me to deal with data duplication. > Is there any other way that you see to deal with it? > > Right now I only see an option, which is to store all processed LSNs on the > other side of the ETL. I'm trying to avoid that overhead. > Sorry, I don't understand your system enough to give you suggestions but if you have any questions related to how logical replication work then I might be able to help. -- With Regards, Amit Kapila.