Re: Logical replication from 11.x to 12.x and "unique key violations"
I have had this problem with logical replication on PG 10 repeatedly. In a clean build, no problem. But if I am restarting replication because of some problem, I've seen errors from rows that are already present.

My own fix, which has worked in my shop, is to add replica triggers that check whether the row is already present. If it is, they drop the incoming row on the floor. This lets data come in in whatever order it happens to arrive. Sample code:

    if TG_OP = 'INSERT' then
        select id into id1 from table1 where id = new.id;
        if id1 is not null then
            -- log error for analysis
            return null;
        end if;
    end if;

In an ideal world this would probably not be necessary. But it can be tricky to restart replication in an absolutely clean way across all tables, and in that case it can be better to allow for a bit of overlap in the rows being replicated.

FWIW,

John

> On Jul 20, 2020, at 1:47 PM, Tom Lane wrote:
>
> Thomas Kellerer writes:
>>> I have a strange error when using logical replication between an 11.2
>>> source database and a 12.3 target.
>>>
>>> If I create the publication with all needed tables (about 50) at
>>> once, I get "duplicate key value violates unique constraint xxx_pkey"
>>> errors during the initial replication (when creating the
>>> subscription).
>>>
>>> When I create the publication with only a few tables, the initial data
>>> sync works without problems. To replicate all tables, I add the
>>> tables incrementally to the publication and refresh the
>>> subscription.
>>>
>>> If I do it like that (step by step), everything works fine. Tables
>>> that generated the "duplicate key value" error previously will
>>> replicate just fine. The tables are quite small, some of them less
>>> than 100 rows.
>
> I have not looked at the code, but it wouldn't surprise me if the initial
> replication just copies all the specified tables in some random order.
> If there are FK references involved, the replication would have to be
> done with referenced tables first, and I bet there's no logic for that.
> (Even if there was, it could not cope with circular references or
> self-references.)
>
> Best bet might be to not install the subscriber's foreign key
> constraints till after the initial sync is done.
>
> regards, tom lane
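For anyone who wants to adapt John's duplicate-dropping approach, a fuller version of the trigger might look like the sketch below. It is only a sketch: table1, its id column, and the replication_conflict_log table are placeholder names, not objects from the post above.

    -- Hypothetical example: drop incoming rows that already exist, logging them for analysis.
    create table if not exists replication_conflict_log (
        logged_at  timestamptz not null default now(),
        table_name text,
        row_id     bigint
    );

    create or replace function skip_duplicate_insert() returns trigger as $$
    declare
        id1 bigint;
    begin
        if TG_OP = 'INSERT' then
            select id into id1 from table1 where id = new.id;
            if id1 is not null then
                -- Row already present: log it and drop the incoming copy on the floor.
                insert into replication_conflict_log (table_name, row_id)
                values (TG_TABLE_NAME, new.id);
                return null;
            end if;
        end if;
        return new;
    end;
    $$ language plpgsql;

    create trigger table1_skip_dup
        before insert on table1
        for each row execute procedure skip_duplicate_insert();

    -- ENABLE REPLICA makes the trigger fire for rows applied by the logical
    -- replication worker, which runs with session_replication_role = replica.
    alter table table1 enable replica trigger table1_skip_dup;

Because the trigger is enabled as a replica trigger, it stays out of the way of normal local inserts and only fires during replication apply.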
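Tom's suggestion of holding back the subscriber's foreign keys until the initial sync is done could be scripted roughly as below. The table, constraint, subscription, and publication names are made up for illustration; NOT VALID / VALIDATE CONSTRAINT is used only so the re-check of existing rows happens after the fact, with a weaker lock.

    -- On the subscriber, before creating the subscription: drop the FK.
    alter table orders drop constraint orders_customer_id_fkey;

    create subscription my_sub
        connection 'host=... dbname=... user=...'
        publication my_pub;

    -- Once the initial sync has finished (pg_subscription_rel shows srsubstate = 'r'
    -- for the tables), put the FK back.  NOT VALID skips checking rows already there;
    -- VALIDATE CONSTRAINT checks them afterwards.
    alter table orders
        add constraint orders_customer_id_fkey
        foreign key (customer_id) references customers (id) not valid;

    alter table orders validate constraint orders_customer_id_fkey;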
How can you find out what point logical replication is at?
I have logical replication set up from a factory in Zhuhai, China, to a data warehouse in New Jersey. We are using PostgreSQL 10.13 on both sides, on Red Hat Linux 7.6.

The logical replication has been in "catchup" mode for several days now, stuck at a specific LSN (9EF/89ADF7E0). The slave side seems to be chugging along, generating lots of WAL files, but not actually getting any new records in.

The logical replication is being used to move some fairly large files: averaging about 1 MB, but with a maximum of up to about 250 MB. (I think I want to fix that longer term, but that's not a quick fix.)

My working hypothesis is that we are stuck on a long transaction: that we can't get some abnormally large blob over before we drop the line. In that case fixing the connection should fix the problem.

Is there a way I can see what is going on? The slave is working hard, but what is it working on?

And are there any suggestions on how to handle this? I could restart the logical replication with "copy_data = false", then fill in the holes by hand. But I would rather not!

Thanks in advance!

John

PS. Output of pg_stat_replication and pg_stat_subscription on master and slave respectively. (Some proprietary information X'd out.)

select * from pg_stat_replication;
-[ RECORD 1 ]----+------------------------------
pid              | 42451
usesysid         | 10
usename          | postgres
application_name | china_to_nj_sub
client_addr      | XXX.XXX.XXX.XXX
client_hostname  |
client_port      | 54300
backend_start    | 2020-08-03 09:07:07.257454-04
backend_xmin     | 16574498
state            | catchup
sent_lsn         | 9EF/89ADF7E0
write_lsn        | 9EF/89ADF7E0
flush_lsn        | 9EF/89ADF7E0
replay_lsn       | 9EF/89ADF7E0
write_lag        |
flush_lag        |
replay_lag       |
sync_priority    | 0
sync_state       | async

select * from pg_stat_subscription;
-[ RECORD 1 ]---------+------------------------------
subid                 | 7222699
subname               | china_to_nj_sub
pid                   | 14764
relid                 |
received_lsn          | 9EF/89ADF7E0
last_msg_send_time    | 2020-08-03 10:15:48.644575-04
last_msg_receipt_time | 2020-08-03 10:14:57.247993-04
latest_end_lsn        | 9EF/89ADF7E0
latest_end_time       | 2020-08-03 09:30:57.974223-04
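A few starting points for seeing what the catchup is actually working on. These are all stock views and functions in PostgreSQL 10; the only assumption below is that the replication slot is named after the subscription (china_to_nj_sub), which is the default.

    -- On the publisher: how far the WAL sender is behind current WAL, and where
    -- the slot would restart decoding from after a disconnect.
    select application_name, state, sent_lsn,
           pg_wal_lsn_diff(pg_current_wal_lsn(), sent_lsn) as send_lag_bytes
    from pg_stat_replication;

    select slot_name, active, restart_lsn, confirmed_flush_lsn
    from pg_replication_slots
    where slot_name = 'china_to_nj_sub';

    -- On the publisher: any long-open transaction that logical decoding may be
    -- working its way through.
    select pid, xact_start, now() - xact_start as xact_age, state, left(query, 60) as query
    from pg_stat_activity
    where xact_start is not null
    order by xact_start
    limit 5;

    -- On the subscriber: what the apply worker has actually committed locally.
    select * from pg_replication_origin_status;
    select * from pg_stat_subscription;

Comparing sent_lsn with pg_current_wal_lsn() on the publisher at least shows how much WAL is still waiting to be decoded and shipped, even when the subscriber-side numbers are not moving.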
Re: How can you find out what point logical replication is at? -- or weird, slow, infinite loop
This has gotten a bit weirder.

The replication gets up to a specific LSN. Then it moves sent_lsn back to a lower value, for a WAL file from about an hour or more earlier, and slowly walks forward again until it hits the same high-water mark. So the replication seems to be stuck in some kind of slow infinite loop.

Corrupted WAL file? Any other ideas of what to look for?

TIA,

John

> On Aug 3, 2020, at 10:38 AM, John Ashmead wrote:
>
> I have logical replication set up from a factory in Zhuhai, China, to a data
> warehouse in New Jersey. We are using PostgreSQL 10.13 on both sides, on
> Red Hat Linux 7.6.
>
> The logical replication has been in "catchup" mode for several days now,
> stuck at a specific LSN (9EF/89ADF7E0). The slave side seems to be chugging
> along, generating lots of WAL files, but not actually getting any new
> records in.
>
> The logical replication is being used to move some fairly large files:
> averaging about 1 MB, but with a maximum of up to about 250 MB. (I think I
> want to fix that longer term, but that's not a quick fix.)
>
> My working hypothesis is that we are stuck on a long transaction: that we
> can't get some abnormally large blob over before we drop the line. In that
> case fixing the connection should fix the problem.
>
> Is there a way I can see what is going on? The slave is working hard, but
> what is it working on?
>
> And are there any suggestions on how to handle this?
>
> I could restart the logical replication with "copy_data = false", then fill
> in the holes by hand. But I would rather not!
>
> Thanks in advance!
>
> John
>
> PS. Output of pg_stat_replication and pg_stat_subscription on master and
> slave respectively. (Some proprietary information X'd out.)
>
> select * from pg_stat_replication;
> -[ RECORD 1 ]----+------------------------------
> pid              | 42451
> usesysid         | 10
> usename          | postgres
> application_name | china_to_nj_sub
> client_addr      | XXX.XXX.XXX.XXX
> client_hostname  |
> client_port      | 54300
> backend_start    | 2020-08-03 09:07:07.257454-04
> backend_xmin     | 16574498
> state            | catchup
> sent_lsn         | 9EF/89ADF7E0
> write_lsn        | 9EF/89ADF7E0
> flush_lsn        | 9EF/89ADF7E0
> replay_lsn       | 9EF/89ADF7E0
> write_lag        |
> flush_lag        |
> replay_lag       |
> sync_priority    | 0
> sync_state       | async
>
> select * from pg_stat_subscription;
> -[ RECORD 1 ]---------+------------------------------
> subid                 | 7222699
> subname               | china_to_nj_sub
> pid                   | 14764
> relid                 |
> received_lsn          | 9EF/89ADF7E0
> last_msg_send_time    | 2020-08-03 10:15:48.644575-04
> last_msg_receipt_time | 2020-08-03 10:14:57.247993-04
> latest_end_lsn        | 9EF/89ADF7E0
> latest_end_time       | 2020-08-03 09:30:57.974223-04
>
> John
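One hedged guess at the loop, rather than WAL corruption: if a single very large transaction takes longer to stream over the slow link than wal_sender_timeout / wal_receiver_timeout allow, the connection gets dropped, the apply worker reconnects, and decoding resumes from the slot's restart_lsn. From the outside that looks exactly like sent_lsn jumping back to an earlier WAL position and crawling forward to the same high-water mark again. A quick way to check the positions and timeouts involved (standard views and settings; nothing specific to this setup is assumed beyond the output quoted above):

    -- On the publisher: the point decoding restarts from after a disconnect.
    select slot_name, active, restart_lsn, confirmed_flush_lsn
    from pg_replication_slots;

    show wal_sender_timeout;

    -- On the subscriber:
    show wal_receiver_timeout;
    show wal_receiver_status_interval;

If that is what is happening, the publisher's log will usually contain "terminating walsender process due to replication timeout" at each iteration, and raising the timeouts (or fixing the link) would let the big transaction finish once instead of being re-sent forever.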