Hi Thomas! On Tue, Feb 4, 2025 at 10:22 PM Thomas Munro <thomas.mu...@gmail.com> wrote: > > On Sun, Feb 2, 2025 at 3:44 AM Ants Aasma <ants.aa...@cybertec.at> wrote: > > The other direction is to split off WAL decoding, buffer lookup and maybe > > even pinning to a separate process from the main redo loop. > > Hi Ants, > [..] > An assumption I just made, in remembering all that: OP didn't > mention it but I guess that this COPY replay is probably repeatedly > hammering the same pages from separate records here, because otherwise > multi-insert stuff would already avoid a lot of mapping table lookups > already?
Basic COPY (he mentioned basic CREATE SUBSCRIPTION table copy, so I assume it's fresh): ends emitting stuff like this (note the "blk" is increasing): rmgr: Heap2 len (rec/tot): 3666/ 3666, tx: 759, lsn: 0/040069A0, prev 0/04004950, desc: MULTI_INSERT+INIT ntuples: 226, flags: 0x00, blkref #0: rel 1663/5/16393 blk 5 rmgr: Heap2 len (rec/tot): 3666/ 3666, tx: 759, lsn: 0/040077F8, prev 0/040069A0, desc: MULTI_INSERT+INIT ntuples: 226, flags: 0x00, blkref #0: rel 1663/5/16393 blk 6 rmgr: Heap2 len (rec/tot): 3666/ 3666, tx: 759, lsn: 0/04008668, prev 0/040077F8, desc: MULTI_INSERT+INIT ntuples: 226, flags: 0x00, blkref #0: rel 1663/5/16393 blk 7 rmgr: Heap2 len (rec/tot): 3122/ 3122, tx: 759, lsn: 0/040094C0, prev 0/04008668, desc: MULTI_INSERT+INIT ntuples: 192, flags: 0x02, blkref #0: rel 1663/5/16393 blk 8 now if the table would have PK , we end up doing *massive* INSERT_LEAFs due to lack of batched btree emision like just in heap2 case: rmgr: Heap2 len (rec/tot): 3666/ 3666, tx: 763, lsn: 0/05000028, prev 0/041C8198, desc: MULTI_INSERT+INIT ntuples: 226, flags: 0x00, blkref #0: rel 1663/5/16396 blk 0 rmgr: Heap2 len (rec/tot): 3666/ 3666, tx: 763, lsn: 0/05000E80, prev 0/05000028, desc: MULTI_INSERT+INIT ntuples: 226, flags: 0x00, blkref #0: rel 1663/5/16396 blk 1 [..] rmgr: Btree len (rec/tot): 64/ 64, tx: 763, lsn: 0/05004050, prev 0/05003FD8, desc: INSERT_LEAF off: 1, blkref #0: rel 1663/5/16398 blk 1 rmgr: Btree len (rec/tot): 64/ 64, tx: 763, lsn: 0/05004090, prev 0/05004050, desc: INSERT_LEAF off: 2, blkref #0: rel 1663/5/16398 blk 1 rmgr: Btree len (rec/tot): 64/ 64, tx: 763, lsn: 0/050040D0, prev 0/05004090, desc: INSERT_LEAF off: 3, blkref #0: rel 1663/5/16398 blk 1 rmgr: Btree len (rec/tot): 64/ 64, tx: 763, lsn: 0/05004110, prev 0/050040D0, desc: INSERT_LEAF off: 4, blkref #0: rel 1663/5/16398 blk 1 rmgr: Btree len (rec/tot): 64/ 64, tx: 763, lsn: 0/05004150, prev 0/05004110, desc: INSERT_LEAF off: 5, blkref #0: rel 1663/5/16398 blk 1 rmgr: Btree len (rec/tot): 64/ 64, tx: 763, lsn: 0/05004190, prev 0/05004150, desc: INSERT_LEAF off: 6, blkref #0: rel 1663/5/16398 blk 1 rmgr: Btree len (rec/tot): 64/ 64, tx: 763, lsn: 0/050041D0, prev 0/05004190, desc: INSERT_LEAF off: 7, blkref #0: rel 1663/5/16398 blk 1 rmgr: Btree len (rec/tot): 64/ 64, tx: 763, lsn: 0/05004210, prev 0/050041D0, desc: INSERT_LEAF off: 8, blkref #0: rel 1663/5/16398 blk 1 rmgr: Btree len (rec/tot): 64/ 64, tx: 763, lsn: 0/05004250, prev 0/05004210, desc: INSERT_LEAF off: 9, blkref #0: rel 1663/5/16398 blk 1 [..] I don't know exactly which situation Dmitry has hit while the second scenario would be much easier to optimize. This of course reminds me of Your earlier work on recet_buffer optimization too and Andres mentioned some form of LRU cache to just protect the hash table for buffer mapping lookups. Also I was under the impression that work by Bhrath [0] could help here too to lower the number of WAL records emitted. @Dmitry : So if you're reading then this is a known problem for a while (max performance ceiling, see [1]), but there are a myriad of possible long-term solutions in the code. Outside of hacking PG changes, you could only probably split the big table on publisher into many smaller partitions (but not too many), and then having COPY running on those smaller ones with some delays so that replication lag doesn't grow too much. The worst alternative is trying to get the fastest possible cores (it's a single threaded bottleneck). As Andres noted you probably could try to recompile with some better -march flag on that ARM and see how much that helps. -J. [0] - https://www.postgresql.org/message-id/flat/CALj2ACVi9eTRYR=gdca5wxtj3kk_9q9qvccxss1hngtgocj...@mail.gmail.com [1] - https://www.postgresql.org/message-id/VI1PR0701MB69608CBCE44D80857E59572EF6CA0%40VI1PR0701MB6960.eurprd07.prod.outlook.com