Repeated, never-ending deadlock

2022-04-19 Thread andrew cooke


Hi All,

OK, apologies in advance, I feel like this is a rather unintelligent
question that I probably know the answer to, but I am banging my head
against a wall trying to understand what is happening.

I have an ETL program, written in Python, that uses multiple processes
to read and load a bunch of files into a database.  Sometimes
(especially early on, when the database is new) common information
that occurs in several files, and which is normalized out into
sub-tables, causes deadlocks.  That's OK; the code should receive an
exception, back off, and try again (reading first this time).  And I
see exceptions occurring in the Python code's log as expected.
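
For reference, the retry logic amounts to roughly the following (a
minimal sketch, not the actual code; the table name is taken from the
log below, everything else is invented):

    import random
    import time

    import psycopg2
    from psycopg2 import errors

    def insert_channel_with_retry(conn, net, sta, chan, loc, max_tries=5):
        # On deadlock: roll back, back off, and retry, reading first in
        # case the other process committed the row in the meantime.
        for attempt in range(max_tries):
            try:
                with conn.cursor() as cur:
                    cur.execute(
                        "SELECT rowid FROM channel_nscl WHERE net=%s "
                        "AND sta=%s AND chan=%s AND loc=%s",
                        (net, sta, chan, loc))
                    row = cur.fetchone()
                    if row is None:
                        cur.execute(
                            "INSERT INTO channel_nscl (net, sta, chan, loc) "
                            "VALUES (%s, %s, %s, %s) RETURNING rowid",
                            (net, sta, chan, loc))
                        row = cur.fetchone()
                conn.commit()
                return row[0]
            except errors.DeadlockDetected:
                conn.rollback()
                time.sleep(2 ** attempt * random.random())  # jittered backoff
        raise RuntimeError("gave up after repeated deadlocks")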

However, today I saw the same deadlock error repeated in the Postgres
log every second for minutes: the same two processes, with the same
data, and the same tables.  I can't work out how this can happen
without a crazy tight loop in the Python code that just repeatedly
tries to commit the same value (which, afaik, doesn't exist!).  A
fragment of the Postgres log is below; the Python logs show nothing.

So my question is - is there some circumstance where instead of
raising an error to the calling code, Postgres instead goes round in
circles?

An answer of "nope, it must be a bug in your code" is fine / expected.
It's just one of those days...

This is Postgres 14 with a fairly default config (except that logical
replication is enabled).

Thanks,
Andrew


2022-04-19 19:55:54.482 UTC [28560] ERROR:  deadlock detected
2022-04-19 19:55:54.482 UTC [28560] DETAIL:  Process 28560 waits for ShareLock 
on transaction 609318483; blocked by process 28559.
Process 28559 waits for ShareLock on transaction 609318682; blocked by 
process 28560.
Process 28560: INSERT INTO channel_nscl (net, sta, chan, loc) VALUES 
('AZ', 'KNW', 'HHZ', '--') RETURNING channel_nscl.rowid, channel_nscl.publisher
Process 28559: INSERT INTO channel_nscl (net, sta, chan, loc) VALUES 
('AZ', 'CRY', 'HNZ', '--') RETURNING channel_nscl.rowid, channel_nscl.publisher
2022-04-19 19:55:54.482 UTC [28560] HINT:  See server log for query details.
2022-04-19 19:55:54.482 UTC [28560] CONTEXT:  while inserting index tuple 
(25,3) in relation "channel_nscl_nscl_idx"
2022-04-19 19:55:54.482 UTC [28560] STATEMENT:  INSERT INTO channel_nscl (net, 
sta, chan, loc) VALUES ('AZ', 'KNW', 'HHZ', '--') RETURNING channel_nscl.rowid, 
channel_nscl.publisher
2022-04-19 19:55:55.488 UTC [28560] ERROR:  deadlock detected
2022-04-19 19:55:55.488 UTC [28560] DETAIL:  Process 28560 waits for ShareLock 
on transaction 609318483; blocked by process 28559.
Process 28559 waits for ShareLock on transaction 609318682; blocked by 
process 28560.
Process 28560: INSERT INTO channel_nscl (net, sta, chan, loc) VALUES 
('AZ', 'KNW', 'HHZ', '--') RETURNING channel_nscl.rowid, channel_nscl.publisher
Process 28559: INSERT INTO channel_nscl (net, sta, chan, loc) VALUES 
('AZ', 'CRY', 'HNZ', '--') RETURNING channel_nscl.rowid, channel_nscl.publisher
2022-04-19 19:55:55.488 UTC [28560] HINT:  See server log for query details.
2022-04-19 19:55:55.488 UTC [28560] CONTEXT:  while inserting index tuple 
(25,4) in relation "channel_nscl_nscl_idx"
2022-04-19 19:55:55.488 UTC [28560] STATEMENT:  INSERT INTO channel_nscl (net, 
sta, chan, loc) VALUES ('AZ', 'KNW', 'HHZ', '--') RETURNING channel_nscl.rowid, 
channel_nscl.publisher
2022-04-19 19:55:56.495 UTC [28560] ERROR:  deadlock detected
2022-04-19 19:55:56.495 UTC [28560] DETAIL:  Process 28560 waits for ShareLock 
on transaction 609318483; blocked by process 28559.
Process 28559 waits for ShareLock on transaction 609318682; blocked by 
process 28560.
Process 28560: INSERT INTO channel_nscl (net, sta, chan, loc) VALUES 
('AZ', 'KNW', 'HHZ', '--') RETURNING channel_nscl.rowid, channel_nscl.publisher
Process 28559: INSERT INTO channel_nscl (net, sta, chan, loc) VALUES 
('AZ', 'CRY', 'HNZ', '--') RETURNING channel_nscl.rowid, channel_nscl.publisher
2022-04-19 19:55:56.495 UTC [28560] HINT:  See server log for query details.
2022-04-19 19:55:56.495 UTC [28560] CONTEXT:  while inserting index tuple 
(25,5) in relation "channel_nscl_nscl_idx"
2022-04-19 19:55:56.495 UTC [28560] STATEMENT:  INSERT INTO channel_nscl (net, 
sta, chan, loc) VALUES ('AZ', 'KNW', 'HHZ', '--') RETURNING channel_nscl.rowid, 
channel_nscl.publisher





Logical subscription / publication lifetimes

2022-04-22 Thread andrew cooke


If I define a publication at time Tp, then load some data on the
publisher, then start a subscription at time Ts, then load some more
data on the publisher, does the subscriber get data from Tp or Ts
onwards?
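
To make the timeline concrete, here is the scenario scripted with
psycopg2 (a sketch; table names and connection strings are invented):

    import psycopg2

    pub = psycopg2.connect("dbname=source")   # publisher
    pub.autocommit = True
    with pub.cursor() as cur:
        cur.execute("CREATE PUBLICATION my_pub FOR ALL TABLES")  # time Tp
        cur.execute("INSERT INTO t (x) VALUES (1)")              # first load

    sub = psycopg2.connect("dbname=replica")  # subscriber
    sub.autocommit = True  # CREATE SUBSCRIPTION refuses to run inside
                           # a transaction block
    with sub.cursor() as cur:
        cur.execute("CREATE SUBSCRIPTION my_sub "
                    "CONNECTION 'host=source dbname=source' "
                    "PUBLICATION my_pub")                        # time Ts

    with pub.cursor() as cur:
        cur.execute("INSERT INTO t (x) VALUES (2)")              # second load

    # Does the replica end up with both rows (Tp onwards) or only the
    # second row (Ts onwards)?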

Also, if a subscription is disabled and then re-enabled, does it lose
the data in between, or is it back-filled?

I am not finding the answers to these questions in the docs at
https://www.postgresql.org/docs/current/logical-replication.html but
maybe I am overlooking something.  The page above does mention copying
an existing table, which may imply Ts?

Thanks,
Andrew




Re: Logical subscription / publication lifetimes

2022-04-22 Thread andrew cooke


Ah, thanks!  I should have read the documentation of all the
parameters!

So the portion of data covered by "copy_data" is going to reflect
updates and deletes made prior to the creation of the slot, even if
"publish" is set to "insert" only?

This makes sense, because I can't see how else it could practically be
implemented, but I just want to be sure I am understanding.  The idea
that there are two phases (copy existing data, then replicate
operations) is a big help.
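
In code terms, I understand the combination as follows (a sketch with
invented names):

    import psycopg2

    pub = psycopg2.connect("dbname=source")    # publisher
    sub = psycopg2.connect("dbname=replica")   # subscriber
    pub.autocommit = sub.autocommit = True     # subscription DDL cannot
                                               # run inside a transaction

    with pub.cursor() as cur:
        cur.execute("CREATE PUBLICATION ins_only FOR TABLE orders "
                    "WITH (publish = 'insert')")   # stream INSERTs only

    with sub.cursor() as cur:
        cur.execute("CREATE SUBSCRIPTION sub_ins "
                    "CONNECTION 'host=source dbname=source' "
                    "PUBLICATION ins_only "
                    "WITH (copy_data = true)")
        # Phase 1 (copy_data): copy the table as it stands at slot
        # creation, i.e. including the effects of earlier UPDATEs and
        # DELETEs.  Phase 2 (streaming): apply only new INSERTs, per
        # the publish setting.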

Thanks again,
Andrew

On Fri, Apr 22, 2022 at 09:13:15AM -0700, David G. Johnston wrote:
> On Fri, Apr 22, 2022 at 5:00 AM andrew cooke  wrote:
> 
> >
> > If I define a publication at time Tp, then load some data on the
> > publisher, then start a subscription at time Ts, then load some more
> > data on the publisher, does the subscriber get data from Tp or Ts
> > onwards?
> >
> >
> It depends.  By default, neither: the publisher publishes the entire
> contents of the table, and the subscriber will do everything
> necessary to replicate those contents in their entirety.
> 
> If you specify copy_data = false I'm not sure what you end up with
> initially or after a disable.  My guess is that the subscription
> defines the first transaction it cares about when it connects to the
> publisher, defaulting to the most recent publisher transaction (all
> older transactions would be handled via copy_data = true).  But so
> long as the slot remains active, the publisher will place data into
> the slot even while the subscriber is not active, and the subscriber
> will receive all of it the next time it comes online / re-enables.
> 
> David J.




How many max_replication_slots?

2022-04-26 Thread andrew cooke


Hi,

How can I estimate the correct value for max_replication_slots
(logical replication, pg 14)?

As far as I understand, they're needed to make the publisher retain
WAL until subscribers have consumed it (e.g. when a subscriber falls
behind or disconnects).  But we seem to need way, way more slots than
publishers.  Is it one slot per publisher per table?

We have 14 publishers and seem to need more than 100 slots.  Is this
reasonable, or does it indicate some underlying error?  The publishers
all share the same schema, with ~60 tables, and logical replication
should produce the union of the publishers' data on the subscriber.
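
For what it's worth, this is how I am inspecting slot usage on a
publisher (a sketch; the connection string is invented):

    import psycopg2

    with psycopg2.connect("dbname=source") as conn, conn.cursor() as cur:
        cur.execute("SELECT slot_name, slot_type, active "
                    "FROM pg_replication_slots ORDER BY slot_name")
        for slot_name, slot_type, active in cur.fetchall():
            # If it is one slot per table during the initial copy,
            # per-table sync slots (names like pg_%_sync_%) should
            # show up here alongside the per-subscription slots.
            print(slot_name, slot_type, active)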

Thanks,
Andrew




Appending data locally to a logical replication subscriber

2022-05-30 Thread andrew cooke


Hi All,

I would appreciate some advice on adding data to logically replicated
tables on the subscriber.  I am worried about contention between
writes from local data loading and replication.

We have 14 publisher databases (all with identical schema) that are
constantly receiving new data.  The tables have a composite primary
key where one column identifies the publisher machine.

We also have a single subscriber that subscribes to all these
machines.

In addition to the real-time data described above, we would like to
back-load some archive data.  For practical (i.e. political) reasons
it is easier to do this on the subscriber than on each publisher.  The
archive data will use a composite primary key that combines the
publisher code (used to differentiate publishers) with an incremental
ID that will start at a "sufficiently large number" to avoid
conflicts.
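
(For illustration, the keying scheme amounts to something like this;
the offset is a made-up value:)

    ARCHIVE_ID_START = 10 ** 12   # "sufficiently large", made-up value

    def archive_key(publisher_code, seq):
        # (publisher_code, id) is the composite primary key; archive
        # rows take ids above the offset so they cannot collide with
        # ids generated by the live publishers.
        return (publisher_code, ARCHIVE_ID_START + seq)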

The loading process is not a simple append; some data are normalized
out into separate tables.  So, for example, there may be a table of
orders, where each line is unique, but the orders reference companies
in another table, and usually those companies already exist.
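
For the company step I am considering something like the following (a
sketch; the table and column names are invented, and it assumes a
unique constraint on the company name):

    def company_id(cur, name):
        # Insert if absent, then read back; ON CONFLICT avoids the
        # read-then-insert race between the archive loader and the
        # subscription's apply worker.
        cur.execute("INSERT INTO companies (name) VALUES (%s) "
                    "ON CONFLICT (name) DO NOTHING",
                    (name,))
        cur.execute("SELECT id FROM companies WHERE name = %s", (name,))
        return cur.fetchone()[0]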

My primary concern is that there is going to be contention between
replication and archive loading.

All tables on the subscriber may be queried by users (so disabling
indices is not trivial, although maybe we could drop constraints and
load archive data at night).

Really, after all that, I guess I am asking the following:

 - Is this just a bad idea?  Is it implicit (or maybe explicit and
   I've missed it) that the subscriber should be dedicated only to
   reading subscriptions?

 - Do subscriber transactions reflect publisher transactions?  For
   example, if the publisher receives many rows in a single
   transaction (which can happen if we know/guess that the table is
   only being written to by one process) does that mean that the
   subscriber also writes those rows in a single transaction?  Or is
   it "simplified" to a transaction per row?  Or something else (like
   one transaction per "batch")?

 - Is there anything else I should be concerned about?

Thanks,
Andrew