Yeah, indexes could slow things down, thanks. BTW I'm not using logical
replication for the upgrade, that's not supported for 9.3. It was more
complicated, but that's beside the point.

I could just delete the publication and all that belongs to it and
start over. But since I'm trying out logical replication, I would like
to be more in control than that. Is there anything that I can dig into
to find out why the WAL is accumulating?
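For what it's worth, the only check I could come up with myself is to
measure how much WAL the slot is actually holding back, e.g. on the
publisher (assuming I'm reading the PG 13 docs on pg_wal_lsn_diff
correctly):

select slot_name,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn))
         as retained_wal
from pg_replication_slots;

But that only tells me how far behind restart_lsn is, not why it isn't
advancing.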
On Fri, 28 May 2021 at 22:20, Vijaykumar Jain <vijaykumarjain.git...@gmail.com> wrote:

> I am not too sure with 9.3.
> I tried an upgrade from 9.6 to 11 using logical replication (the
> pglogical extension).
>
> One thing to note: logical replication initiates a copy from a
> snapshot, then streams the changes from then on.
>
> I had a very high insert rate on my source tables (v9.6) and the
> destination (v11) could not keep up (it had tons of indexes when I
> copied the schema); the initial copy took around a day, as the table
> had around 12 indexes.
>
> So at the destination (v11), I dropped all but the primary key index
> for each table, started the subscription, and when it was almost
> caught up, rebuilt the indexes on the destination concurrently.
> It completed in 4-5 hours without stopping the source.
> The migration completed in a few mins :)
>
> Not sure if this would help, but just FYI.
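If I try that index trick on a next attempt, I suppose it amounts to
something like this on the subscriber (the table and index names here
are made up):

-- before the initial copy: drop the secondary indexes
drop index if exists invoices_created_at_idx;
-- ... create the subscription, let it catch up, then rebuild the
-- index without blocking writes:
create index concurrently invoices_created_at_idx
    on invoices (created_at);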
> On Sat, 29 May 2021 at 01:36, Willy-Bas Loos <willy...@gmail.com> wrote:
>
>> Hi, I'm upgrading a 1.5TB database from postgres 9.3 to postgres 13
>> on Debian 10. This is now in an Acceptance stage (DTAP). I have
>> encountered a problem: the WAL is not being deleted. I now have
>> 1.4 TB of WAL in pg_wal and my disks are getting full. The oldest WAL
>> file is 18 days old.
>> I use logical replication from the new cluster to another new
>> cluster, with 1 subscriber and 1 subscription.
>>
>> pg_stat_subscription tells me all recent timestamps, and this:
>>
>> db=# select * from pg_replication_slots;
>> -[ RECORD 1 ]-------+-------------
>> slot_name           | my_pub1
>> plugin              | pgoutput
>> slot_type           | logical
>> datoid              | 16401
>> database            | db
>> temporary           | f
>> active              | t
>> active_pid          | 9480
>> xmin                |
>> catalog_xmin        | 269168
>> restart_lsn         | D4/908BC268
>> confirmed_flush_lsn | E1/25BF5710
>> wal_status          | extended
>> safe_wal_size       |
>>
>> I've had problems with disk space on this server, with postgres
>> crashing because of it; I then added more disk space and postgres
>> recovered. That doesn't seem to be a problem now.
>>
>> The *publication* has the options publish = 'insert, update, delete,
>> truncate' and publish_via_partition_root = false.
>> The *subscription* has the options connect = true, enabled = true,
>> create_slot = false, slot_name = my_pub1, synchronous_commit = 'off'.
>>
>> The log on the publisher says:
>> 2021-05-25 21:25:18.973 CEST [4584] user@db LOG: starting logical decoding for slot "my_pub1"
>> 2021-05-25 21:25:18.973 CEST [4584] user@db DETAIL: Streaming transactions committing after D6/A82B5FE0, reading WAL from D4/908BC268.
>> 2021-05-25 21:25:18.973 CEST [4584] user@db LOG: logical decoding found consistent point at D4/908BC268
>> 2021-05-25 21:25:18.973 CEST [4584] user@db DETAIL: There are no running transactions.
>> 2021-05-25 21:29:49.456 CEST [4614] user@db ERROR: replication slot "my_pub1" is active for PID 4584
>> 2021-05-25 21:29:54.474 CEST [4615] user@db ERROR: replication slot "my_pub1" is active for PID 4584
>>
>> And on the subscriber:
>> 2021-05-28 21:23:46.702 CEST [40039] LOG: logical replication apply worker for subscription "my_pub1" has started
>> 2021-05-28 21:23:46.712 CEST [40039] ERROR: could not start WAL streaming: ERROR: replication slot "my_pub1" is active for PID 730
>> 2021-05-28 21:23:46.714 CEST [19794] LOG: background worker "logical replication worker" (PID 40039) exited with exit code 1
>>
>> The postgres settings on the *publisher* are:
>> max_connections = 100 # (change requires restart)
>> tcp_keepalives_idle = 120 # TCP_KEEPIDLE, in seconds;
>> shared_buffers = 50GB # min 128kB
>> work_mem = 1GB # min 64kB
>> maintenance_work_mem = 10GB # min 1MB
>> logical_decoding_work_mem = 5GB # min 64kB
>> dynamic_shared_memory_type = posix # the default is the first option
>> max_worker_processes = 20 # (change requires restart)
>> max_parallel_maintenance_workers = 10 # taken from max_parallel_workers
>> max_parallel_workers_per_gather = 5 # taken from max_parallel_workers
>> max_parallel_workers = 15 # maximum number of max_worker_processes that
>> wal_level = logical # minimal, replica, or logical
>> max_wal_size = 1GB
>> min_wal_size = 80MB
>> #archive_mode = off
>> max_wal_senders = 10 # max number of walsender processes
>> wal_sender_timeout = 1min # in milliseconds; 0 disables
>> max_replication_slots = 7 # max number of replication slots
>>
>> The postgres settings on the *subscriber* are:
>> max_connections = 100 # (change requires restart)
>> tcp_keepalives_idle = 120 # TCP_KEEPIDLE, in seconds;
>> shared_buffers = 25GB # min 128kB
>> work_mem = 1GB # min 64kB
>> maintenance_work_mem = 10GB # min 1MB
>> logical_decoding_work_mem = 5GB # min 64kB
>> dynamic_shared_memory_type = posix # the default is the first option
>> max_worker_processes = 20 # (change requires restart)
>> max_parallel_maintenance_workers = 10 # taken from max_parallel_workers
>> max_parallel_workers_per_gather = 5 # taken from max_parallel_workers
>> max_parallel_workers = 15 # maximum number of max_worker_processes that
>> wal_level = logical # minimal, replica, or logical
>> max_wal_size = 3GB
>> min_wal_size = 80MB
>> #archive_mode = off
>> wal_receiver_timeout = 1min # time that receiver waits for
>> max_logical_replication_workers = 10 # taken from max_worker_processes
>> max_sync_workers_per_subscription = 5 # taken from max_logical_replication_workers
>>
>> I've tried increasing wal_sender_timeout and wal_receiver_timeout to
>> 10 minutes each, but this had no positive effect.
>>
>> Some advice would be helpful.
>> --
>> Willy-Bas Loos
>
>
> --
> Thanks,
> Vijay
> Mumbai, India
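PS: both logs keep repeating that replication slot "my_pub1" is active
for another PID, so presumably an old walsender is still holding on to
the slot. I suppose I can at least see what that process is doing with
something like

select pid, state, sent_lsn, flush_lsn, replay_lsn
from pg_stat_replication
where pid = 4584;  -- the PID from my publisher log

and, if it really is stuck, free the slot with
select pg_terminate_backend(4584); so that the subscriber can reconnect
with a fresh walsender.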