Repeated, never-ending deadlock
Hi All,

OK, apologies in advance - I feel like this is a rather unintelligent question that I probably know the answer to, but I am banging my head against a wall trying to understand what is happening.

I have an ETL program, written in Python, that uses multiple processes to read and load a bunch of files into a database. Sometimes (especially early on, when the database is new) common information that occurs in several files, and which is normalized out into sub-tables, causes deadlocks. That's OK; the code should receive an exception, back off, and try again, reading first, etc. And I see exceptions occurring in the Python log as expected.

However, today I saw the same deadlock error repeated in the Postgres log every second for minutes - the same two processes, with the same data, and the same tables. I can't work out how this can happen without a crazy tight loop in the Python code that just repeatedly retries the commit (and, afaik, no such loop exists!).

A fragment of the Postgres log is below. The Python logs show nothing.

So my question is: is there some circumstance where, instead of raising an error to the calling code, Postgres goes round in circles? An answer of "nope, it must be a bug in your code" is fine / expected. It's just one of those days...

This is Postgres 14 with a fairly default config (except that logical replication is enabled).

Thanks,
Andrew

2022-04-19 19:55:54.482 UTC [28560] ERROR:  deadlock detected
2022-04-19 19:55:54.482 UTC [28560] DETAIL:  Process 28560 waits for ShareLock on transaction 609318483; blocked by process 28559.
        Process 28559 waits for ShareLock on transaction 609318682; blocked by process 28560.
        Process 28560: INSERT INTO channel_nscl (net, sta, chan, loc) VALUES ('AZ', 'KNW', 'HHZ', '--') RETURNING channel_nscl.rowid, channel_nscl.publisher
        Process 28559: INSERT INTO channel_nscl (net, sta, chan, loc) VALUES ('AZ', 'CRY', 'HNZ', '--') RETURNING channel_nscl.rowid, channel_nscl.publisher
2022-04-19 19:55:54.482 UTC [28560] HINT:  See server log for query details.
2022-04-19 19:55:54.482 UTC [28560] CONTEXT:  while inserting index tuple (25,3) in relation "channel_nscl_nscl_idx"
2022-04-19 19:55:54.482 UTC [28560] STATEMENT:  INSERT INTO channel_nscl (net, sta, chan, loc) VALUES ('AZ', 'KNW', 'HHZ', '--') RETURNING channel_nscl.rowid, channel_nscl.publisher
2022-04-19 19:55:55.488 UTC [28560] ERROR:  deadlock detected
2022-04-19 19:55:55.488 UTC [28560] DETAIL:  Process 28560 waits for ShareLock on transaction 609318483; blocked by process 28559.
        Process 28559 waits for ShareLock on transaction 609318682; blocked by process 28560.
        Process 28560: INSERT INTO channel_nscl (net, sta, chan, loc) VALUES ('AZ', 'KNW', 'HHZ', '--') RETURNING channel_nscl.rowid, channel_nscl.publisher
        Process 28559: INSERT INTO channel_nscl (net, sta, chan, loc) VALUES ('AZ', 'CRY', 'HNZ', '--') RETURNING channel_nscl.rowid, channel_nscl.publisher
2022-04-19 19:55:55.488 UTC [28560] HINT:  See server log for query details.
2022-04-19 19:55:55.488 UTC [28560] CONTEXT:  while inserting index tuple (25,4) in relation "channel_nscl_nscl_idx"
2022-04-19 19:55:55.488 UTC [28560] STATEMENT:  INSERT INTO channel_nscl (net, sta, chan, loc) VALUES ('AZ', 'KNW', 'HHZ', '--') RETURNING channel_nscl.rowid, channel_nscl.publisher
2022-04-19 19:55:56.495 UTC [28560] ERROR:  deadlock detected
2022-04-19 19:55:56.495 UTC [28560] DETAIL:  Process 28560 waits for ShareLock on transaction 609318483; blocked by process 28559.
        Process 28559 waits for ShareLock on transaction 609318682; blocked by process 28560.
        Process 28560: INSERT INTO channel_nscl (net, sta, chan, loc) VALUES ('AZ', 'KNW', 'HHZ', '--') RETURNING channel_nscl.rowid, channel_nscl.publisher
        Process 28559: INSERT INTO channel_nscl (net, sta, chan, loc) VALUES ('AZ', 'CRY', 'HNZ', '--') RETURNING channel_nscl.rowid, channel_nscl.publisher
2022-04-19 19:55:56.495 UTC [28560] HINT:  See server log for query details.
2022-04-19 19:55:56.495 UTC [28560] CONTEXT:  while inserting index tuple (25,5) in relation "channel_nscl_nscl_idx"
2022-04-19 19:55:56.495 UTC [28560] STATEMENT:  INSERT INTO channel_nscl (net, sta, chan, loc) VALUES ('AZ', 'KNW', 'HHZ', '--') RETURNING channel_nscl.rowid, channel_nscl.publisher
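PS: for reference, the back-off logic around each of these inserts is intended to look roughly like this (a simplified sketch, using psycopg2 for illustration; the real code is more involved, and also re-reads first to see whether another process already inserted the row):

    import logging
    import random
    import time

    import psycopg2
    from psycopg2 import errors

    log = logging.getLogger(__name__)

    def insert_with_retry(conn, sql, params, max_attempts=5):
        """Run an INSERT ... RETURNING, backing off and retrying on deadlock."""
        for attempt in range(1, max_attempts + 1):
            try:
                with conn:  # commits on success, rolls back on exception
                    with conn.cursor() as cur:
                        cur.execute(sql, params)
                        return cur.fetchone()
            except errors.DeadlockDetected:
                log.warning("deadlock on attempt %d; backing off", attempt)
                time.sleep(random.uniform(0.1, 0.5) * attempt)
        raise RuntimeError("gave up after %d deadlocked attempts" % max_attempts)

So even if every retry deadlocked, I would expect at most a handful of attempts per value, not one per second for minutes.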
Logical subscription / publication lifetimes
If I define a publication at time Tp, then load some data on the publisher, then start a subscription at time Ts, then load some more data on the publisher, does the subscriber get data from Tp or Ts onwards?

Also, if a subscription is disabled and then re-enabled, does it lose the data in between, or is it back-filled?

I am not finding the answers to these questions in the docs at https://www.postgresql.org/docs/current/logical-replication.html but maybe I am overlooking something. The page does mention copying an existing table, which may imply Ts?

Thanks,
Andrew
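PS: to make the timeline concrete, this is roughly the sequence I mean, sketched in Python with psycopg2 (all names and connection strings are made up; assume the orders table already exists on both sides):

    import psycopg2

    # Time Tp - on the publisher: define the publication, then load data.
    pub = psycopg2.connect("dbname=source")
    pub.autocommit = True
    with pub.cursor() as cur:
        cur.execute("CREATE PUBLICATION my_pub FOR TABLE orders")
        cur.execute("INSERT INTO orders VALUES (1, 'loaded between Tp and Ts')")

    # Time Ts - on the subscriber: start the subscription.
    sub = psycopg2.connect("dbname=sink")
    sub.autocommit = True  # CREATE SUBSCRIPTION cannot run inside a transaction
    with sub.cursor() as cur:
        cur.execute("CREATE SUBSCRIPTION my_sub "
                    "CONNECTION 'host=source-host dbname=source' "
                    "PUBLICATION my_pub")

    # After Ts - on the publisher: load more data.
    with pub.cursor() as cur:
        cur.execute("INSERT INTO orders VALUES (2, 'loaded after Ts')")

    # Question: does the subscriber end up with row 1 as well as row 2?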
Re: Logical subscription / publication lifetimes
Ah, thanks! I should have read the documentation of all the parameters!

So the portion of data that is covered by "copy_data" is going to reflect updates and deletes prior to the creation of the slot, even if "publish=insert" (only)? This makes sense, because I can't see how else it could be practically implemented, but I just want to be sure I am understanding. The idea that there are two phases (copy existing data, then replicate operations) is a big help.

Thanks again,
Andrew

On Fri, Apr 22, 2022 at 09:13:15AM -0700, David G. Johnston wrote:
> On Fri, Apr 22, 2022 at 5:00 AM andrew cooke wrote:
>
> > If I define a publication at time Tp, then load some data on the
> > publisher, then start a subscription at time Ts, then load some more
> > data on the publisher, does the subscriber get data from Tp or Ts
> > onwards?
>
> It depends. By default, neither; the publisher is publishing the entire
> contents of the table and the subscriber will do everything necessary to
> replicate those contents in their entirety.
>
> If you specify copy_data = false I'm not sure what you end up with
> initially or after disable. My guess is the subscription defines the first
> transaction it cares about when it connects to the publisher, defaulting to
> the most recent publisher transaction (all older transactions would be
> handled via copy_data = true) but then so long as the slot remains active
> the publisher will place the data into the slot even while the subscriber
> is not active and the subscriber will receive all of it next time it comes
> online/re-enables.
>
> David J.
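PS for the archives: the two-phase model as I now understand it, sketched with made-up names in Python/psycopg2 (this is my reading, so treat it as a guess rather than gospel):

    import psycopg2

    # On the publisher: only INSERTs will be *streamed* from this publication.
    pub = psycopg2.connect("dbname=source")
    pub.autocommit = True
    with pub.cursor() as cur:
        cur.execute("CREATE PUBLICATION ins_only FOR TABLE orders "
                    "WITH (publish = 'insert')")
        # These run before any subscription / slot exists...
        cur.execute("UPDATE orders SET status = 'closed' WHERE id = 1")
        cur.execute("DELETE FROM orders WHERE id = 2")

    # On the subscriber:
    # Phase 1: copy_data = true (the default) copies a snapshot of the table
    #   as it stands at slot creation, so it reflects the UPDATE and DELETE
    #   above even though publish = 'insert'.
    # Phase 2: from the snapshot onwards, only INSERTs are replicated.
    sub = psycopg2.connect("dbname=sink")
    sub.autocommit = True  # CREATE SUBSCRIPTION cannot run inside a transaction
    with sub.cursor() as cur:
        cur.execute("CREATE SUBSCRIPTION ins_sub "
                    "CONNECTION 'host=source-host dbname=source' "
                    "PUBLICATION ins_only")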
How many max_replication_slots?
Hi,

How can I estimate the correct value for max_replication_slots (logical replication, pg 14)?

As far as I understand, replication slots are what make the publisher retain WAL until the subscriber has consumed it, so that nothing is lost when something goes wrong. But we seem to need way, way more slots than publishers. Is it one slot per publisher per table?

We have 14 publishers, all sharing the same schema of ~60 tables, and the logical replication should generate the union of the publishers on the subscriber. Yet we seem to need more than 100 slots. Is this reasonable, or does it indicate some underlying error?

Thanks,
Andrew
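PS: for what it's worth, my back-of-envelope arithmetic gets nowhere near 100. It assumes one durable slot per subscription plus temporary slots for initial table sync, which is my guess from the docs, not something I have confirmed:

    # The counts below are our setup; the per-slot assumptions are my guess.
    publishers = 14   # one subscription per publisher on the subscriber
    tables = 60       # tables in the shared schema

    durable = publishers                 # one slot per subscription -> 14
    # Initial sync can hold extra temporary slots per subscription, bounded
    # by max_sync_workers_per_subscription (default 2):
    sync_worst_case = publishers * 2     # -> 28
    print(durable + sync_worst_case)     # 42, well short of the >100 we need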
Appending data locally to a logical replication subscriber
Hi All,

I would appreciate some advice on adding data to logically replicated tables on the subscriber. I am worried about contention between writes from local data loading and replication.

We have 14 publisher databases (all with identical schema) that are constantly receiving new data. The tables have a composite primary key where one column identifies the publisher machine. We also have a single subscriber that subscribes to all these machines.

In addition to the realtime data described above, we would like to back-load some archive data. For practical (ie political) reasons it is easier to do this on the subscriber than on each publisher. The archive data will use a composite primary key that combines the publisher code (used to differentiate publishers) with an incremental ID that will start at a "sufficiently large number" to avoid conflicts.

The loading process is not a simple append; some data are normalized in separate tables. So, for example, there may be a table with orders, where each line is unique, but the orders reference companies in another table, and usually those companies already exist (a sketch of what I mean is below).

My primary concern is that there is going to be contention between replication and archive loading. All tables on the subscriber may be queried by users, so disabling indices is not trivial (although maybe we could drop constraints and load archive data at night).

Really, after all that, I guess I am asking the following:

- Is this just a bad idea? Is it implicit (or maybe explicit and I've missed it) that the subscriber should be dedicated only to reading subscriptions?

- Do subscriber transactions reflect publisher transactions? For example, if the publisher receives many rows in a single transaction (which can happen if we know/guess that the table is only being written to by one process), does that mean that the subscriber also writes those rows in a single transaction? Or is it "simplified" to a transaction per row? Or something else (like one transaction per "batch")?

- Is there anything else I should be concerned about?

Thanks,
Andrew
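PS: to be concrete about the normalization step, the archive loader does something like this for each line (a simplified sketch in Python/psycopg2; table and column names are made up):

    import psycopg2

    def load_order(conn, publisher_code, order_id, company_name, amount):
        with conn:  # one transaction per archive line
            with conn.cursor() as cur:
                # The company usually exists already; create it only if not.
                # This shared table is where I expect contention with
                # incoming replicated writes.
                cur.execute("INSERT INTO companies (name) VALUES (%s) "
                            "ON CONFLICT (name) DO NOTHING",
                            (company_name,))
                cur.execute("SELECT id FROM companies WHERE name = %s",
                            (company_name,))
                company_id = cur.fetchone()[0]
                # order_id starts at a "sufficiently large number" so the
                # composite key (publisher, id) never collides with
                # replicated rows.
                cur.execute("INSERT INTO orders "
                            "(publisher, id, company, amount) "
                            "VALUES (%s, %s, %s, %s)",
                            (publisher_code, order_id, company_id, amount))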