Re: WAL files are not removed automatically
> Never, ever, remove WAL files manually.
> A crash might render your database unrecoverable.

You're right, it's not a good idea.

> You'll have to figure out what is blocking WAL removal.

This is my favorite dream!

> Are there archive failures reported in "pg_stat_archiver" and the log?

master=# select * from pg_stat_archiver;
 archived_count | last_archived_wal | last_archived_time            | failed_count | last_failed_wal | last_failed_time | stats_reset
----------------+-------------------+-------------------------------+--------------+-----------------+------------------+------------------------------
           3021 | 0001000B00C9      | 2021-03-11 08:53:56.133796+02 |            0 |                 |                  | 2021-01-06 14:33:40.70147+02
(1 row)

There are no errors in today's logs:

$ grep "FATAL\|WARNING" /opt/pglogs/master.log
$
$ grep "FATAL\|WARNING" /opt/pglogs/slave.log
$

> You say that replication is working fine, but try
>
>     SELECT pg_current_wal_lsn();
>
> on the primary and compare that with the "restart_lsn" of the replication slot.

The values differ:

master=# SELECT pg_current_wal_lsn();
 pg_current_wal_lsn
--------------------
 B/CB099398
(1 row)

master=# select * from pg_replication_slots;
  slot_name   | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn
--------------+--------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------
 standby_slot |        | physical  |        |          | f         | t      |    2631806 |      |              | B/CB0AC068  |
(1 row)

But when I add new data to a table on the master, it appears on the slave.

> Look for strange messages in the log file on both servers.

Looking for strange messages turns up nothing; today's logs contain nothing unusual, only application connections.

Regards,
Andrew

On Wed, 10 Mar 2021 at 19:40, Laurenz Albe wrote:
> On Wed, 2021-03-10 at 14:34 +0200, Forum Writer wrote:
> > I have streaming replication with two hosts, master and slave, on PostgreSQL 10.6.
> > Replication works fine and data from the master is copied to the slave without any problem,
> > but on the slave there is a problem: WAL files in ${PGDATA}/pg_wal are not removed
> > automatically and may keep accumulating for years.
> >
> > Both the master (1.1.1.1) and the slave (2.2.2.2) have similar replication slots:
> >
> > master=# select * from pg_replication_slots;
> >   slot_name   | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn
> > --------------+--------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------
> >  standby_slot |        | physical  |        |          | f         | t      |       1184 |      |              | 8/1C16C508  |
> > (1 row)
> >
> > and this is set in ${PGDATA}/recovery.conf on the slave:
> >
> > standby_mode = 'on'
> > primary_conninfo = 'user=pgadmin host=2.2.2.2 port=5432 user=pgadmin sslmode=prefer sslcompression=1'
> > recovery_target_timeline = 'latest'
> > trigger_file = 'failover'
> > restore_command = 'cp /opt/archivelog/%f %p'
> > archive_cleanup_command = '/usr/pgsql-10/bin/pg_archivecleanup /opt/archivelog %r'
> > primary_slot_name = 'standby_slot'
> >
> > In ${PGDATA}/postgresql.conf on both hosts we have:
> >
> > max_wal_size = 8GB
> > wal_keep_segments = 32
> >
> > But yesterday pg_wal was 16 GB and contained 1018 files, so there was no other way than to
> > delete some of them manually. Not all of them, just about six hundred files whose modification
> > time was before September 2020, except 00010034.0028.backup.
> > Trying to rebuild the slave from the master (pg_start_backup, copying the ${PGDATA} files from
> > the master to the slave without removing any files in ${PGDATA} on the slave, pg_stop_backup,
> > tuning the settings on the slave and starting it again) brought no success; WAL files are
> > still not removed automatically.
> > Does anybody know how to fix this?
>
> Never, ever, remove WAL files manually.
> A crash might render your database unrecoverable.
>
> You'll have to figure out what is blocking WAL removal.
>
> Are there archive failures reported in "pg_stat_archiver" and the log?
>
> You say that replication is working fine, but try
>
>     SELECT pg_current_wal_lsn();
>
> on the primary and compare that with the "restart_lsn" of the replication slot.
>
> Look for strange messages in the log file on both servers.
>
> Yours,
> Laurenz Albe
> --
> Cybertec | https://www.cybertec-postgresql.com
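For reference, the two numbers Laurenz asked to compare can be read side by side with one query on the primary. This is only a sketch based on the columns shown above; pg_wal_lsn_diff() and pg_size_pretty(numeric) are both available in PostgreSQL 10:

    -- On the primary: current WAL position versus each slot's restart_lsn,
    -- and how much WAL the slot is therefore holding back.
    SELECT slot_name,
           active,
           pg_current_wal_lsn() AS current_lsn,
           restart_lsn,
           pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS wal_retained
    FROM pg_replication_slots;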
Re: WAL files are not removed automatically
The situation didn't change; WAL segments are still not removed automatically. Streaming replication works fine. Here are the log files:

2021-03-16 06:32:16.562 EET [97583] [] [] [] []DEBUG: writing stats file "pg_stat_tmp/db_0.stat"
2021-03-16 06:32:16.575 EET [97575] [] [] [] []DEBUG: server process (PID 35768) exited with exit code 0
2021-03-16 06:32:17.583 EET [97575] [] [] [] []DEBUG: forked new backend, pid=35770 socket=11
2021-03-16 06:32:17.587 EET [97583] [] [] [] []DEBUG: received inquiry for database 13212
2021-03-16 06:32:17.587 EET [97583] [] [] [] []DEBUG: writing stats file "pg_stat_tmp/global.stat"
2021-03-16 06:32:17.587 EET [97583] [] [] [] []DEBUG: writing stats file "pg_stat_tmp/db_13212.stat"
2021-03-16 06:32:17.588 EET [97583] [] [] [] []DEBUG: writing stats file "pg_stat_tmp/db_0.stat"
2021-03-16 06:32:17.601 EET [97575] [] [] [] []DEBUG: server process (PID 35770) exited with exit code 0
2021-03-16 06:32:18.377 EET [97585] [] [] [] []DEBUG: sending write 145/ED1310C8 flush 145/ED1310C8 apply 145/ED1310C8 (reply requested)
2021-03-16 06:32:18.378 EET [97585] [] [] [] []DEBUG: sendtime 2021-03-16 06:32:18.378397+02 receipttime 2021-03-16 06:32:18.378306+02 replication apply delay 0 ms transfer latency 0 ms
2021-03-16 06:32:20.419 EET [97575] [] [] [] []DEBUG: forked new backend, pid=35773 socket=11
2021-03-16 06:32:20.428 EET [97575] [] [] [] []DEBUG: server process (PID 35773) exited with exit code 0
2021-03-16 06:32:20.995 EET [97575] [] [] [] []DEBUG: forked new backend, pid=35775 socket=11
2021-03-16 06:32:21.000 EET [97583] [] [] [] []DEBUG: received inquiry for database 13212
2021-03-16 06:32:21.000 EET [97583] [] [] [] []DEBUG: writing stats file "pg_stat_tmp/global.stat"
2021-03-16 06:32:21.000 EET [97583] [] [] [] []DEBUG: writing stats file "pg_stat_tmp/db_13212.stat"
2021-03-16 06:32:21.000 EET [97583] [] [] [] []DEBUG: writing stats file "pg_stat_tmp/db_0.stat"
2021-03-16 06:32:21.013 EET [97575] [] [] [] []DEBUG: server process (PID 35775) exited with exit code 0
2021-03-16 06:32:21.021 EET [97575] [] [] [] []DEBUG: forked new backend, pid=35777 socket=11

On Thu, 11 Mar 2021 at 11:18, Laurenz Albe wrote:
> On Thu, 2021-03-11 at 09:43 +0200, Andrew Anderson wrote:
> > > Never, ever, remove WAL files manually.
> > > A crash might render your database unrecoverable.
> >
> > You're right, it's not a good idea.
> >
> > > You'll have to figure out what is blocking WAL removal.
> >
> > This is my favorite dream!
> >
> > > Are there archive failures reported in "pg_stat_archiver" and the log?
> >
> > master=# select * from pg_stat_archiver;
> >  archived_count | last_archived_wal | last_archived_time            | failed_count | last_failed_wal | last_failed_time | stats_reset
> > ----------------+-------------------+-------------------------------+--------------+-----------------+------------------+------------------------------
> >            3021 | 0001000B00C9      | 2021-03-11 08:53:56.133796+02 |            0 |                 |                  | 2021-01-06 14:33:40.70147+02
> > (1 row)
> >
> > There are no errors in today's logs:
> > $ grep "FATAL\|WARNING" /opt/pglogs/master.log
> > $
> >
> > $ grep "FATAL\|WARNING" /opt/pglogs/slave.log
> > $
> >
> > > You say that replication is working fine, but try
> > >
> > >     SELECT pg_current_wal_lsn();
> > >
> > > on the primary and compare that with the "restart_lsn" of the replication slot.
> >
> > The values differ:
> > master=# SELECT pg_current_wal_lsn();
> >  pg_current_wal_lsn
> > --------------------
> >  B/CB099398
> > (1 row)
> >
> > master=# select * from pg_replication_slots;
> >   slot_name   | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn
> > --------------+--------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------
> >  standby_slot |        | physical  |        |          | f         | t      |    2631806 |      |              | B/CB0AC068  |
> > (1 row)
> >
> > But when I add new data to a table on the master, it appears on the slave.
> >
> > > Look for strange messages in the log file on both servers.
> >
> > Looking for strange messages turns up nothing; today's logs contain nothing unusual, only application connections.
>
> All that looks fine.
>
> If that happens again, set "log_min_messages" to "debug2".
> Then the log file will contain information about the WAL segments that
> PostgreSQL is trying to remove. That might give you some insight.
>
> Yours,
> Laurenz Albe
> --
> Cybertec | https://www.cybertec-postgresql.com
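One way to switch the suggested debug logging on and off without a restart is sketched below. The thread does not say how the parameter was actually changed (later logs suggest postgresql.conf was edited directly), so this is only an illustration using ALTER SYSTEM:

    -- On the server whose pg_wal keeps growing, as a superuser:
    ALTER SYSTEM SET log_min_messages = 'debug2';
    SELECT pg_reload_conf();

    -- Revert once the "attempting to remove WAL segments ..." lines have been captured:
    ALTER SYSTEM RESET log_min_messages;
    SELECT pg_reload_conf();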
Re: WAL files are not removed automatically
2021-03-16 09:44:18.794 EET [97585] [] [] [] []DEBUG: sending write 145/EEDA3C70 flush 145/EEDA3C70 apply 145/EEDA2550
2021-03-16 09:44:18.794 EET [97585] [] [] [] []DEBUG: sending write 145/EEDA3C70 flush 145/EEDA3C70 apply 145/EEDA3C70
2021-03-16 09:44:18.795 EET [97585] [] [] [] []DEBUG: sendtime 2021-03-16 09:44:18.794123+02 receipttime 2021-03-16 09:44:18.795082+02 replication apply delay 0 ms transfer latency 0 ms
2021-03-16 09:44:18.795 EET [97585] [] [] [] []DEBUG: sending write 145/EEDA5E50 flush 145/EEDA3C70 apply 145/EEDA3C70
2021-03-16 09:44:18.795 EET [97585] [] [] [] []DEBUG: sending write 145/EEDA5E50 flush 145/EEDA5E50 apply 145/EEDA3C70
2021-03-16 09:44:18.795 EET [97585] [] [] [] []DEBUG: sending write 145/EEDA5E50 flush 145/EEDA5E50 apply 145/EEDA5E50
2021-03-16 09:44:18.908 EET [97585] [] [] [] []DEBUG: sendtime 2021-03-16 09:44:18.907307+02 receipttime 2021-03-16 09:44:18.908412+02 replication apply delay 0 ms transfer latency 1 ms
2021-03-16 09:44:18.908 EET [97585] [] [] [] []DEBUG: sending write 145/EEDA5E90 flush 145/EEDA5E50 apply 145/EEDA5E50
2021-03-16 09:44:18.909 EET [97585] [] [] [] []DEBUG: sending write 145/EEDA5E90 flush 145/EEDA5E90 apply 145/EEDA5E50
2021-03-16 09:44:18.909 EET [97585] [] [] [] []DEBUG: sending write 145/EEDA5E90 flush 145/EEDA5E90 apply 145/EEDA5E90
2021-03-16 09:44:19.033 EET [97585] [] [] [] []DEBUG: sendtime 2021-03-16 09:44:19.032755+02 receipttime 2021-03-16 09:44:19.033807+02 replication apply delay 0 ms transfer latency 1 ms
2021-03-16 09:44:19.033 EET [97585] [] [] [] []DEBUG: sending write 145/EEDA5FC0 flush 145/EEDA5E90 apply 145/EEDA5E90
2021-03-16 09:44:19.034 EET [97585] [] [] [] []DEBUG: sending write 145/EEDA5FC0 flush 145/EEDA5FC0 apply 145/EEDA5E90
2021-03-16 09:44:19.034 EET [97585] [] [] [] []DEBUG: sending write 145/EEDA5FC0 flush 145/EEDA5FC0 apply 145/EEDA5FC0
2021-03-16 09:44:19.258 EET [97575] [] [] [] []DEBUG: forked new backend, pid=69051 socket=11
2021-03-16 09:44:19.264 EET [97583] [] [] [] []DEBUG: received inquiry for database 13212
2021-03-16 09:44:19.264 EET [97583] [] [] [] []DEBUG: writing stats file "pg_stat_tmp/global.stat"
2021-03-16 09:44:19.264 EET [97583] [] [] [] []DEBUG: writing stats file "pg_stat_tmp/db_13212.stat"
2021-03-16 09:44:19.265 EET [97583] [] [] [] []DEBUG: writing stats file "pg_stat_tmp/db_0.stat"
2021-03-16 09:44:19.278 EET [97575] [] [] [] []DEBUG: server process (PID 69051) exited with exit code 0
2021-03-16 09:44:19.849 EET [97577] [] [] [] []DEBUG: parameter "log_checkpoints" removed from configuration file, reset to default
2021-03-16 09:44:19.849 EET [97581] [] [] [] []DEBUG: parameter "log_checkpoints" removed from configuration file, reset to default
2021-03-16 09:44:19.849 EET [97582] [] [] [] []DEBUG: parameter "log_checkpoints" removed from configuration file, reset to default
2021-03-16 09:44:19.850 EET [97583] [] [] [] []DEBUG: parameter "log_checkpoints" removed from configuration file, reset to default
2021-03-16 09:44:19.851 EET [97576] [] [] [] []DEBUG: parameter "log_checkpoints" removed from configuration file, reset to default
2021-03-16 09:44:19.937 EET [97585] [] [] [] []DEBUG: parameter "log_checkpoints" removed from configuration file, reset to default

On Tue, 16 Mar 2021 at 09:40, Laurenz Albe wrote:
> On Tue, 2021-03-16 at 09:33 +0200, Andrew Anderson wrote:
> > The situation didn't change; WAL segments are still not removed automatically.
> > The streaming replication works fine, here are the log files:
> >
> > from master:
> > [log with debug2, but without WAL segment delete messages]
>
> Could you set "log_checkpoints = on" and tell us what "checkpoint_timeout"
> is set to? WAL segments are only deleted after checkpoints.
>
> Yours,
> Laurenz Albe
> --
> Cybertec | https://www.cybertec-postgresql.com
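Since WAL segments are only removed or recycled at checkpoint (and, on a standby, restartpoint) time, it is worth confirming how often checkpoints actually happen. A small sketch of the relevant checks; the counter query assumes nothing beyond the standard pg_stat_bgwriter view:

    -- Time-based checkpoint interval (default 5min) and the size ceiling:
    SHOW checkpoint_timeout;
    SHOW max_wal_size;

    -- Checkpoints completed since the statistics were last reset:
    SELECT checkpoints_timed, checkpoints_req, stats_reset
    FROM pg_stat_bgwriter;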
Re: WAL files are not removed automatically
There is no file with that name in the pg_wal directory:

$ ls pg_wal/ | grep 013E0097
$

but there are files with very old creation times:

$ ls -lart pg_wal/
total 30933220
-rw------- 1 postgres postgres       299 Dec 30 05:44 0002013E0097.0108.backup
-rw------- 1 postgres postgres  16777216 Dec 30 06:12 00010001
-rw------- 1 postgres postgres  16777216 Dec 30 06:12 0002013E0098
-rw------- 1 postgres postgres        44 Dec 30 06:29 0002.history
-rw------- 1 postgres postgres  16777216 Dec 30 06:29 0002013E0099
-rw------- 1 postgres postgres  16777216 Dec 30 09:01 0002013E009A
-rw------- 1 postgres postgres  16777216 Dec 30 09:57 0002013E009B
... till now ...
-rw------- 1 postgres postgres  16777216 Mar 16 12:59 0002014500F2
-rw------- 1 postgres postgres  16777216 Mar 16 13:50 0002014500F3
-rw------- 1 postgres postgres  16777216 Mar 16 14:39 0002014500F4
drwx------ 2 postgres postgres    118784 Mar 16 14:39 archive_status
-rw------- 1 postgres postgres  16777216 Mar 16 15:19 0002014500F6
drwx------ 3 postgres postgres     94208 Mar 16 15:19 .
-rw------- 1 postgres postgres  16777216 Mar 16 15:25 0002014500F5
drwx------ 19 postgres postgres      4096 Mar 16 15:25 ..

and on the master the current WAL file is:

postgres=# select pg_walfile_name(pg_current_wal_lsn());
 pg_walfile_name
--------------------------
 0002014500F6
(1 row)

On Tue, 16 Mar 2021 at 14:21, Laurenz Albe wrote:
> On Tue, 2021-03-16 at 09:49 +0200, Andrew Anderson wrote:
> > 2021-03-16 09:44:03.997 EET [97581] [] [] [] []DEBUG: attempting to remove WAL segments older than log file 013E0097
>
> That was the entry I was hoping to see.
>
> So, are all older WAL segments deleted or recycled?
> Is that a reasonably recent WAL segment?
>
> Yours,
> Laurenz Albe
> --
> Cybertec | https://www.cybertec-postgresql.com
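Since the server holding the old segments is the standby, the WAL pinned by its own slot can be measured there directly. A sketch; on a standby pg_current_wal_lsn() and pg_walfile_name() cannot be used during recovery, so the last replayed position stands in for the current one:

    -- On the standby: how far each local slot's restart_lsn trails the replay position.
    SELECT slot_name,
           active,
           restart_lsn,
           pg_size_pretty(pg_wal_lsn_diff(pg_last_wal_replay_lsn(), restart_lsn)) AS wal_held_back
    FROM pg_replication_slots;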
Re: WAL files are not removed automatically
> - replication slot (you said that is not it)

The replication slot is the same on the master:

postgres=# select * from pg_replication_slots ;
  slot_name  | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn  | confirmed_flush_lsn
-------------+--------+-----------+--------+----------+-----------+--------+------------+------+--------------+--------------+---------------------
 stanby_slot |        | physical  |        |          | f         | t      |      94341 |      |              | 145/F6ECC190 |
(1 row)

$ grep slot ${PGDATA}/recovery.done
primary_slot_name = 'stanby_slot'

and on the slave:

postgres=# select * from pg_replication_slots ;
  slot_name  | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn  | confirmed_flush_lsn
-------------+--------+-----------+--------+----------+-----------+--------+------------+------+--------------+--------------+---------------------
 stanby_slot |        | physical  |        |          | f         | f      |            |      |              | 13E/981E2DD0 |
(1 row)

$ grep slot ${PGDATA}/recovery.conf
primary_slot_name = 'stanby_slot'

postgres=# show wal_keep_segments;
 wal_keep_segments
-------------------
 32
(1 row)

but on the slave:

$ ls pg_wal/ | wc -l
1892

On Tue, 16 Mar 2021 at 15:53, Laurenz Albe wrote:
> On Tue, 2021-03-16 at 15:31 +0200, Andrew Anderson wrote:
> > On Tue, 16 Mar 2021 at 14:21, Laurenz Albe wrote:
> > > On Tue, 2021-03-16 at 09:49 +0200, Andrew Anderson wrote:
> > > > 2021-03-16 09:44:03.997 EET [97581] [] [] [] []DEBUG: attempting to remove WAL segments older than log file 013E0097
> > >
> > > That was the entry I was hoping to see.
> > >
> > > So, are all older WAL segments deleted or recycled?
> > > Is that a reasonably recent WAL segment?
> >
> > but there are files with very old creation times:
> > $ ls -lart pg_wal/
> > total 30933220
> > -rw------- 1 postgres postgres       299 Dec 30 05:44 0002013E0097.0108.backup
> > -rw------- 1 postgres postgres  16777216 Dec 30 06:12 00010001
> > -rw------- 1 postgres postgres  16777216 Dec 30 06:12 0002013E0098
> > -rw------- 1 postgres postgres        44 Dec 30 06:29 0002.history
> > -rw------- 1 postgres postgres  16777216 Dec 30 06:29 0002013E0099
> > -rw------- 1 postgres postgres  16777216 Dec 30 09:01 0002013E009A
> > -rw------- 1 postgres postgres  16777216 Dec 30 09:57 0002013E009B
> > ... till now ...
> > -rw------- 1 postgres postgres  16777216 Mar 16 12:59 0002014500F2
> > -rw------- 1 postgres postgres  16777216 Mar 16 13:50 0002014500F3
> > -rw------- 1 postgres postgres  16777216 Mar 16 14:39 0002014500F4
> > drwx------ 2 postgres postgres    118784 Mar 16 14:39 archive_status
> > -rw------- 1 postgres postgres  16777216 Mar 16 15:19 0002014500F6
> > drwx------ 3 postgres postgres     94208 Mar 16 15:19 .
> > -rw------- 1 postgres postgres  16777216 Mar 16 15:25 0002014500F5
> > drwx------ 19 postgres postgres      4096 Mar 16 15:25 ..
> >
> > and on the master the current WAL file is:
> > postgres=# select pg_walfile_name(pg_current_wal_lsn());
> >  pg_walfile_name
> > --------------------------
> >  0002014500F6
> > (1 row)
>
> These things hold up deletion of WAL segments:
>
> - checkpoint (not the problem here)
>
> - replication slot (you said that is not it)
>
> - archiver failure (also doesn't seem to be the problem)
>
> - wal_keep_segments / wal_keep_size: what is your setting?
>
> Yours,
> Laurenz Albe
> --
> Cybertec | https://www.cybertec-postgresql.com
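For scale: wal_keep_segments = 32 accounts for only about 32 x 16 MB = 512 MB of retained WAL, while 1892 files of 16 MB each come to roughly 30 GB, so something else must be pinning the rest. The totals can also be read directly from the server; a sketch using pg_ls_waldir(), which was added in PostgreSQL 10:

    -- Files and total size currently kept in pg_wal:
    SELECT count(*) AS files,
           pg_size_pretty(sum(size)) AS total_size
    FROM pg_ls_waldir();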
Re: WAL files are not removed automatically
You're right, the restart_lsn on the slave does not change.

> What's using it?

As far as I can tell, streaming replication is using this slot. Does anybody know how to fix it?

On Tue, 16 Mar 2021 at 17:44, Tom Lane wrote:
> Andrew Anderson writes:
> >> - replication slot (you said that is not it)
> >
> > The replication slot is the same on the master:
> >
> > postgres=# select * from pg_replication_slots ;
> >   slot_name  | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn  | confirmed_flush_lsn
> > -------------+--------+-----------+--------+----------+-----------+--------+------------+------+--------------+--------------+---------------------
> >  stanby_slot |        | physical  |        |          | f         | t      |      94341 |      |              | 145/F6ECC190 |
> > (1 row)
> >
> > $ grep slot ${PGDATA}/recovery.done
> > primary_slot_name = 'stanby_slot'
> >
> > and on the slave:
> >
> > postgres=# select * from pg_replication_slots ;
> >   slot_name  | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn  | confirmed_flush_lsn
> > -------------+--------+-----------+--------+----------+-----------+--------+------------+------+--------------+--------------+---------------------
> >  stanby_slot |        | physical  |        |          | f         | f      |            |      |              | 13E/981E2DD0 |
> > (1 row)
> >
> > $ grep slot ${PGDATA}/recovery.conf
> > primary_slot_name = 'stanby_slot'
>
> Uh ... that slot's restart_lsn seems to correspond to where the
> standby is refusing to truncate WAL beyond. So I think your
> problem is exactly that this slot isn't advancing. What's
> using it?
>
> regards, tom lane
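A quick way to see whether anything is actually streaming from the standby, and therefore advancing its local slot, is to look at pg_stat_replication on the standby itself. This is a sketch, not something run in the thread; if it returns no rows, nothing ever moves that slot's restart_lsn forward:

    -- On the standby: each row is a client streaming WAL *from* this server.
    SELECT pid, application_name, state, sent_lsn, replay_lsn
    FROM pg_stat_replication;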
Re: WAL files are not removed automatically
So, right now I have this replication slot configuration:

On the master:

postgres=# select * from pg_replication_slots ;
  slot_name  | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn  | confirmed_flush_lsn
-------------+--------+-----------+--------+----------+-----------+--------+------------+------+--------------+--------------+---------------------
 stanby_slot |        | physical  |        |          | f         | t      |      94341 |      |              | 145/FBAACBA8 |
(1 row)

On the slave:

postgres=# select * from pg_replication_slots ;
  slot_name  | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn  | confirmed_flush_lsn
-------------+--------+-----------+--------+----------+-----------+--------+------------+------+--------------+--------------+---------------------
 stanby_slot |        | physical  |        |          | f         | f      |            |      |              | 13E/981E2DD0 |
(1 row)

So can I run 'select pg_drop_replication_slot('stanby_slot');' on the slave without any risk of breaking streaming replication? And will that fix the automatic removal of WAL files?

On Tue, 16 Mar 2021 at 19:39, Tom Lane wrote:
> Andrew Anderson writes:
> >> What's using it?
> >
> > As far as I can tell, streaming replication is using this slot. Does anybody know
> > how to fix it?
>
> Unless you need another replica that's downstream of the standby,
> you should not be maintaining a replication slot on the standby.
>
> There may be a way to have a slot that's not actually holding back
> WAL cleanup while doing nothing, but I don't know what it is.
>
> regards, tom lane
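If nothing replicates from the standby, dropping its local slot is a single call. The sketch below touches only the slot defined on the standby, not the 'stanby_slot' on the primary that the standby itself streams through; pg_ls_waldir() (PostgreSQL 10) gives a quick way to watch pg_wal shrink afterwards:

    -- On the standby only, and only if no other server streams from this standby:
    SELECT pg_drop_replication_slot('stanby_slot');

    -- Old segments are removed or recycled at the next restartpoint; watch the count fall:
    SELECT count(*) FROM pg_ls_waldir();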
Re: WAL files are not removed automatically
But maybe there is still a way to fix this? Rebuilding the slave from the master, even after erasing ${PGDATA} on the slave, does not help.

On Tue, 16 Mar 2021 at 16:56, Laurenz Albe wrote:
> On Tue, 2021-03-16 at 16:11 +0200, Andrew Anderson wrote:
> > postgres=# show wal_keep_segments;
> >  wal_keep_segments
> > -------------------
> >  32
> > (1 row)
>
> Ok, then I am out of ideas.
>
> Yours,
> Laurenz Albe
> --
> Cybertec | https://www.cybertec-postgresql.com
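One plausible explanation for why the rebuild did not help, offered here as an assumption rather than something stated in the thread: replication slots are stored as files under ${PGDATA}/pg_replslot, so copying the whole data directory from the master also copies the master's slot, which then reappears as an unused local slot on the standby (pg_basebackup skips pg_replslot for exactly this reason). A quick check on a freshly rebuilt standby:

    -- Any entries here on a just-rebuilt standby were carried over by the file copy:
    SELECT pg_ls_dir('pg_replslot');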
Re: WAL files are not removed automatically
Deleting the replication slot on the slave resolved the issue; WAL files are now removed automatically!
Thanks a lot for your answers!

On Wed, 17 Mar 2021 at 10:59, Andrew Anderson wrote:
> But maybe there is still a way to fix this? Rebuilding the slave from the master, even after
> erasing ${PGDATA} on the slave, does not help.
>
> On Tue, 16 Mar 2021 at 16:56, Laurenz Albe wrote:
> > On Tue, 2021-03-16 at 16:11 +0200, Andrew Anderson wrote:
> > > postgres=# show wal_keep_segments;
> > >  wal_keep_segments
> > > -------------------
> > >  32
> > > (1 row)
> >
> > Ok, then I am out of ideas.
> >
> > Yours,
> > Laurenz Albe
> > --
> > Cybertec | https://www.cybertec-postgresql.com
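As a follow-up, a small check like the one below can be scheduled on every server to catch forgotten slots before pg_wal fills up again; it is a sketch, not part of the thread. Any slot that stays inactive while its restart_lsn stops advancing is a candidate for investigation or removal:

    -- Inactive slots pin WAL indefinitely; review any rows this returns.
    SELECT slot_name, slot_type, active, restart_lsn
    FROM pg_replication_slots
    WHERE NOT active;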