Hello All, Ok, so maybe something helpful? On my master node I am seeing a bunch of:
2019-03-28 23:54:44 GMT [2611]: [1-1] user=admin,db=admin,client=17210.10.37 172.10.10.37(57552) (0:3D000)FATAL: database "admin" does not exist 2019-03-28 23:54:46 GMT [2613]: [1-1] user=admin,db=admin,client=172.10.10.18 172.10.10.18(56970) (0:3D000)FATAL: database "admin" does not exist Which is accurate, as there is no admin database... I usually connect with psql -h localhost -U admin postgres Should there be? Will this fix my issue with pgsql filling up the disk? Thanks, - QBR On Mon, Mar 25, 2019 at 10:21 AM Foo Bar <qubitreneg...@gmail.com> wrote: > Hello All, > > Wow! Lots of awesome replies, Went away for the weekend thinking my email > had been rejected and come back to a full inbox. Thanks for all the help! > > >> Postgres version? > > 9.6.11 > > Doh. Fairly important detail there. :) > > >> FYI, psql is the Postgres client program, Postgres(ql) is the server. > >> "psql" is the name of a specific command line tool used to connect to > a PostgreSQL database server, it is not the name of the database itself. > The database is usually abbreviated "pgsql". > > Duly noted, thanks for the correction. > > >> It's the standby that has not seen any traffic? > > There's really no traffic. I built three nodes, connected them, created a > test table and inserted some values, then left the cluster be for a couple > weeks. > > >> And "restartpoint" is usually spelled as one work in technical > discussions of it. Or at least, searching for it that way avoids finding > things which mention each word separately in different senses. > > Ah ha. I had seen it that way but thought it was a typo. Thanks for the > clarification! > > >> Are you sure it is the archive > directory (/hab/svc/postgresql/data/archive) which is filling up, and not > the live directory (pg_wal or pg_xlog)? This is often a point of confusion. > > Right before I sent the mail last week I deleted everything in > /hab/svc/postgresql/data/archive, this morning I'm seeing: > > # du -h --max=1 /hab/svc/postgresql/data/ > 198M /hab/svc/postgresql/data/pgdata > 8.9G /hab/svc/postgresql/data/archive > 9.1G /hab/svc/postgresql/data/ > # du -hs /hab/svc/postgresql/data/pgdata/pg_xlog/ > 177M /hab/svc/postgresql/data/pgdata/pg_xlog/ > # ls -lah /hab/svc/postgresql/data/archive/ | wc -l > 571 > > There is no pg_wal directory though (should there be?) > > # find /hab/svc/postgresql/ -name '*pg*wal*' > # > > >> If the only reason you want an archive is for replication, then use > streaming replication and do away with the archive completely > > To be honest, I thought it was required for streaming replication based on > the guides linked above. > > >> There are reasons other than replication that one might want to keep a > WAL archive, but those reasons don't seem to apply to you > > Like backup maybe? A wholly other topic, we recently had a power outage > and I lost a pgsql node... having an archive would allow me to "replay" any > transactions? > > >> What needs to be determined here is why the standby never consumed the > WAL's from the master? > Ok, so it the standby that's the problem. > > >> Do you still have the logs from the standby and do they show anything > relevant? > > Sure, what am I looking for? I see a bunch of entries like: > > 2019-03-08 17:06:11 GMT [1813]: [815-1] user=,db=,client= (0:00000)LOG: > restartpoint complete: wrote 22 buffers (0.0%); 0 transaction log file(s) > added, 0 removed, 1 recycled; write=2.211 s, sync=0.062 s, total=2.281 s; > sync files=18, longest=0.062 s, average=0.003 s; distance=16383 kB, > estimate=16383 kB > 2019-03-08 17:06:11 GMT [1813]: [816-1] user=,db=,client= (0:00000)LOG: > recovery restart point at 0/8D000028 > > On the 15th, around when I think I filled the disk, I see a bunch of: > > cp: cannot stat '/00000002.history': No such file or directory > cp: cannot stat '/000000010000000400000049': No such file or directory > 2019-03-15 23:59:49 GMT [16691]: [1-1] user=,db=,client= (0:XX000)FATAL: > could not connect to the primary server: could not connect to server: > Connection refused > Is the server running on host "172.16.10.23" and accepting > TCP/IP connections on port 5432? > > Which makes sense since the pgsql service was down. > > This appears to be when I recovered the master on Thursday: > > cp: cannot stat '/00000002.history': No such file or directory > cp: cannot stat '/00000001000000040000004D': No such file or directory > 2019-03-21 17:37:31 GMT [31338]: [1-1] user=,db=,client= (0:XX000)FATAL: > could not connect to the primary server: could not connect to server: > Connection refused > Is the server running on host "172.16.10.23" and accepting > TCP/IP connections on port 5432? > > cp: cannot stat '/00000002.history': No such file or directory > cp: cannot stat '/00000001000000040000004D': No such file or directory > 2019-03-21 17:37:36 GMT [31343]: [1-1] user=,db=,client= (0:00000)LOG: > started streaming WAL from primary at 4/4D000000 on timeline 1 > 2019-03-21 17:37:47 GMT [30711]: [3151-1] user=,db=,client= > (0:00000)LOG: restartpoint starting: time > 2019-03-21 17:37:47 GMT [30711]: [3152-1] user=,db=,client= > (0:00000)LOG: restartpoint complete: wrote 0 buffers (0.0%); 0 transaction > log file(s) added, 0 removed, 1 recycled; write=0.003 s, sync=0.000 s, > total=0.007 s; sync files=0, longest=0.000 s, average=0.000 s; > distance=16384 kB, estimate=16384 kB > > Then there's a bunch of the same entries where about the only thing (other > than the timestamp) that seems to change is the index in: > > 2019-03-23 23:33:40 GMT [30711]: [5094-1] user=,db=,client= > (0:00000)LOG: recovery restart point at 5/90000140 > 2019-03-23 23:38:40 GMT [30711]: [5097-1] user=,db=,client= > (0:00000)LOG: recovery restart point at 5/91000028 > 2019-03-23 23:43:40 GMT [30711]: [5100-1] user=,db=,client= > (0:00000)LOG: recovery restart point at 5/91000108 > > I see effectively the same logs on the other hotstandby pgsql node. > > This is my config on my standby node: > > listen_addresses = '0.0.0.0' > port = 5432 > max_connections = 100 > external_pid_file = '/hab/svc/postgresql/var/postgresql.pid' > authentication_timeout = 1min > max_files_per_process = 1000 > max_locks_per_transaction = 64 > logging_collector = on > log_directory = '/hab/svc/postgresql/var/pg_log' > log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log' > log_line_prefix = '%t [%p]: [%l-1] user=%u,db=%d,client=%h %r (%x:%e)' > log_min_messages = ERROR > datestyle = 'iso, mdy' > default_text_search_config = 'pg_catalog.english' > data_directory = '/hab/svc/postgresql/data/pgdata' > hba_file = '/hab/svc/postgresql/config/pg_hba.conf' > wal_level = hot_standby > wal_log_hints = 'on' > hot_standby = 'on' > hot_standby_feedback = true > max_wal_senders = 5 > max_replication_slots = 5 > checkpoint_completion_target = 0.9 > max_wal_size = 1GB > min_wal_size = 128MB > wal_keep_segments = 8 > log_checkpoints = on > log_lock_waits = on > log_temp_files = 0 > log_autovacuum_min_duration = 0 > track_activity_query_size = 2048 > track_io_timing=on > dynamic_shared_memory_type = 'none' > archive_mode = 'on' > archive_command = 'cp %p /hab/svc/postgresql/data/archive/%f' > archive_timeout = '10min' > max_standby_archive_delay = '30s' > synchronous_commit = local > include '/hab/svc/postgresql/config/postgresql.local.conf' > > Should I disable archive mode? Even though I'm not currently using it, it > seems like there's a use-case for having it? And if I can configure out > what causing the backup of archive files it should still be manageable? > > Thanks again for all the replies, while it hasn't solved the problem yet, > this was incredibly helpful! Also, please don't hate me for munging all > your replies into one reply... I thought it might be easier to follow than > having three different branches...? > > Best Regards, > - QBR > > On Fri, Mar 22, 2019 at 7:14 PM Michael Paquier <mich...@paquier.xyz> > wrote: > >> On Fri, Mar 22, 2019 at 12:26:33PM -0400, Jeff Janes wrote: >> > archive_cleanup_command is pretty much obsolete. The modern way to do >> this >> > is with streaming replication, using either replication slots or >> > wal_keep_segments. If the only reason you want an archive is for >> > replication, then use streaming replication and do away with the archive >> > completely. There are reasons other than replication that one might >> want >> > to keep a WAL archive, but those reasons don't seem to apply to you. >> And >> > if they did you almost certainly wouldn't want to run >> > archive_cleanup_command on it. >> >> Personally, I still find archives also very valuable when a standby >> creation takes a long time because of a large initial base backup and >> that the partition dedicated to pg_wal is not large enough to support >> the retention associated with a slot, and it is easier to have larger >> retention policies in the archives. >> -- >> Michael >> >