That looks like a misconfigured client application that explicitly tries to connect to a non-existent database 'admin' (via db=admin). Rather than creating that database, it makes more sense to fix the client configuration.
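For what it's worth, libpq-based clients (psql included) default the database name to the user name when no database is specified, which is exactly how connecting as user "admin" turns into a request for a database named "admin". A minimal sketch of the client-side fix, with placeholder host and user values rather than anything taken from your actual application:

    # name an existing database explicitly instead of letting it default to the user name
    psql -h 172.16.10.23 -U admin -d postgres

    # or, in libpq connection-URI form
    psql 'postgresql://admin@172.16.10.23:5432/postgres'

The same idea applies to whatever application or monitoring agent is generating those connections: set its dbname/database parameter to postgres (or to the database it is actually meant to use) instead of leaving it unset.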
On Tue, 2 Apr 2019 at 09:53, Foo Bar <qubitreneg...@gmail.com> wrote:
> Hello All,
>
> Ok, so maybe something helpful? On my master node I am seeing a bunch of:
>
> 2019-03-28 23:54:44 GMT [2611]: [1-1]
> user=admin,db=admin,client=172.10.10.37 172.10.10.37(57552) (0:3D000)FATAL:
> database "admin" does not exist
> 2019-03-28 23:54:46 GMT [2613]: [1-1]
> user=admin,db=admin,client=172.10.10.18 172.10.10.18(56970)
> (0:3D000)FATAL: database "admin" does not exist
>
> Which is accurate, as there is no admin database... I usually connect
> with
>
> psql -h localhost -U admin postgres
>
> Should there be? Will this fix my issue with pgsql filling up the disk?
>
> Thanks,
> - QBR
>
> On Mon, Mar 25, 2019 at 10:21 AM Foo Bar <qubitreneg...@gmail.com> wrote:
>
>> Hello All,
>>
>> Wow! Lots of awesome replies! Went away for the weekend thinking my
>> email had been rejected and came back to a full inbox. Thanks for all the
>> help!
>>
>>
>> Postgres version?
>>
>> 9.6.11
>>
>> Doh. Fairly important detail there. :)
>>
>>
>> FYI, psql is the Postgres client program; Postgres(ql) is the server.
>>
>> "psql" is the name of a specific command line tool used to connect to
>> a PostgreSQL database server; it is not the name of the database itself.
>> The database is usually abbreviated "pgsql".
>>
>> Duly noted, thanks for the correction.
>>
>>
>> It's the standby that has not seen any traffic?
>>
>> There's really no traffic. I built three nodes, connected them, created
>> a test table and inserted some values, then left the cluster be for a
>> couple weeks.
>>
>>
>> And "restartpoint" is usually spelled as one word in technical
>> discussions of it. Or at least, searching for it that way avoids finding
>> things which mention each word separately in different senses.
>>
>> Ah ha. I had seen it that way but thought it was a typo. Thanks for the
>> clarification!
>>
>>
>> Are you sure it is the archive
>> directory (/hab/svc/postgresql/data/archive) which is filling up, and not
>> the live directory (pg_wal or pg_xlog)? This is often a point of confusion.
>>
>> Right before I sent the mail last week I deleted everything in
>> /hab/svc/postgresql/data/archive; this morning I'm seeing:
>>
>> # du -h --max=1 /hab/svc/postgresql/data/
>> 198M /hab/svc/postgresql/data/pgdata
>> 8.9G /hab/svc/postgresql/data/archive
>> 9.1G /hab/svc/postgresql/data/
>> # du -hs /hab/svc/postgresql/data/pgdata/pg_xlog/
>> 177M /hab/svc/postgresql/data/pgdata/pg_xlog/
>> # ls -lah /hab/svc/postgresql/data/archive/ | wc -l
>> 571
>>
>> There is no pg_wal directory though (should there be?)
>>
>> # find /hab/svc/postgresql/ -name '*pg*wal*'
>> #
>>
>>
>> If the only reason you want an archive is for replication, then use
>> streaming replication and do away with the archive completely
>>
>> To be honest, I thought it was required for streaming replication based
>> on the guides linked above.
>>
>>
>> There are reasons other than replication that one might want to keep a
>> WAL archive, but those reasons don't seem to apply to you
>>
>> Like backup maybe? A whole other topic: we recently had a power outage
>> and I lost a pgsql node... having an archive would allow me to "replay" any
>> transactions?
>>
>>
>> What needs to be determined here is why the standby never consumed
>> the WALs from the master?
>> Ok, so it's the standby that's the problem.
>>
>>
>> Do you still have the logs from the standby and do they show anything
>> relevant?
>>
>> Sure, what am I looking for?
>> I see a bunch of entries like:
>>
>> 2019-03-08 17:06:11 GMT [1813]: [815-1] user=,db=,client= (0:00000)LOG:
>> restartpoint complete: wrote 22 buffers (0.0%); 0 transaction log file(s)
>> added, 0 removed, 1 recycled; write=2.211 s, sync=0.062 s, total=2.281 s;
>> sync files=18, longest=0.062 s, average=0.003 s; distance=16383 kB,
>> estimate=16383 kB
>> 2019-03-08 17:06:11 GMT [1813]: [816-1] user=,db=,client= (0:00000)LOG:
>> recovery restart point at 0/8D000028
>>
>> On the 15th, around when I think I filled the disk, I see a bunch of:
>>
>> cp: cannot stat '/00000002.history': No such file or directory
>> cp: cannot stat '/000000010000000400000049': No such file or directory
>> 2019-03-15 23:59:49 GMT [16691]: [1-1] user=,db=,client=
>> (0:XX000)FATAL: could not connect to the primary server: could not connect
>> to server: Connection refused
>> Is the server running on host "172.16.10.23" and accepting
>> TCP/IP connections on port 5432?
>>
>> Which makes sense since the pgsql service was down.
>>
>> This appears to be when I recovered the master on Thursday:
>>
>> cp: cannot stat '/00000002.history': No such file or directory
>> cp: cannot stat '/00000001000000040000004D': No such file or directory
>> 2019-03-21 17:37:31 GMT [31338]: [1-1] user=,db=,client=
>> (0:XX000)FATAL: could not connect to the primary server: could not connect
>> to server: Connection refused
>> Is the server running on host "172.16.10.23" and accepting
>> TCP/IP connections on port 5432?
>>
>> cp: cannot stat '/00000002.history': No such file or directory
>> cp: cannot stat '/00000001000000040000004D': No such file or directory
>> 2019-03-21 17:37:36 GMT [31343]: [1-1] user=,db=,client= (0:00000)LOG:
>> started streaming WAL from primary at 4/4D000000 on timeline 1
>> 2019-03-21 17:37:47 GMT [30711]: [3151-1] user=,db=,client=
>> (0:00000)LOG: restartpoint starting: time
>> 2019-03-21 17:37:47 GMT [30711]: [3152-1] user=,db=,client=
>> (0:00000)LOG: restartpoint complete: wrote 0 buffers (0.0%); 0 transaction
>> log file(s) added, 0 removed, 1 recycled; write=0.003 s, sync=0.000 s,
>> total=0.007 s; sync files=0, longest=0.000 s, average=0.000 s;
>> distance=16384 kB, estimate=16384 kB
>>
>> Then there's a bunch of the same entries where about the only thing
>> (other than the timestamp) that seems to change is the index in:
>>
>> 2019-03-23 23:33:40 GMT [30711]: [5094-1] user=,db=,client=
>> (0:00000)LOG: recovery restart point at 5/90000140
>> 2019-03-23 23:38:40 GMT [30711]: [5097-1] user=,db=,client=
>> (0:00000)LOG: recovery restart point at 5/91000028
>> 2019-03-23 23:43:40 GMT [30711]: [5100-1] user=,db=,client=
>> (0:00000)LOG: recovery restart point at 5/91000108
>>
>> I see effectively the same logs on the other hotstandby pgsql node.
>>
>> This is my config on my standby node:
>>
>> listen_addresses = '0.0.0.0'
>> port = 5432
>> max_connections = 100
>> external_pid_file = '/hab/svc/postgresql/var/postgresql.pid'
>> authentication_timeout = 1min
>> max_files_per_process = 1000
>> max_locks_per_transaction = 64
>> logging_collector = on
>> log_directory = '/hab/svc/postgresql/var/pg_log'
>> log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
>> log_line_prefix = '%t [%p]: [%l-1] user=%u,db=%d,client=%h %r (%x:%e)'
>> log_min_messages = ERROR
>> datestyle = 'iso, mdy'
>> default_text_search_config = 'pg_catalog.english'
>> data_directory = '/hab/svc/postgresql/data/pgdata'
>> hba_file = '/hab/svc/postgresql/config/pg_hba.conf'
>> wal_level = hot_standby
>> wal_log_hints = 'on'
>> hot_standby = 'on'
>> hot_standby_feedback = true
>> max_wal_senders = 5
>> max_replication_slots = 5
>> checkpoint_completion_target = 0.9
>> max_wal_size = 1GB
>> min_wal_size = 128MB
>> wal_keep_segments = 8
>> log_checkpoints = on
>> log_lock_waits = on
>> log_temp_files = 0
>> log_autovacuum_min_duration = 0
>> track_activity_query_size = 2048
>> track_io_timing = on
>> dynamic_shared_memory_type = 'none'
>> archive_mode = 'on'
>> archive_command = 'cp %p /hab/svc/postgresql/data/archive/%f'
>> archive_timeout = '10min'
>> max_standby_archive_delay = '30s'
>> synchronous_commit = local
>> include '/hab/svc/postgresql/config/postgresql.local.conf'
>>
>> Should I disable archive mode? Even though I'm not currently using it,
>> it seems like there's a use case for having it? And if I can figure out
>> what's causing the backlog of archive files it should still be manageable?
>>
>> Thanks again for all the replies; while it hasn't solved the problem yet,
>> this was incredibly helpful! Also, please don't hate me for munging all
>> your replies into one reply... I thought it might be easier to follow than
>> having three different branches...?
>>
>> Best Regards,
>> - QBR
>>
>> On Fri, Mar 22, 2019 at 7:14 PM Michael Paquier <mich...@paquier.xyz>
>> wrote:
>>
>>> On Fri, Mar 22, 2019 at 12:26:33PM -0400, Jeff Janes wrote:
>>> > archive_cleanup_command is pretty much obsolete. The modern way to do this
>>> > is with streaming replication, using either replication slots or
>>> > wal_keep_segments. If the only reason you want an archive is for
>>> > replication, then use streaming replication and do away with the archive
>>> > completely. There are reasons other than replication that one might want
>>> > to keep a WAL archive, but those reasons don't seem to apply to you. And
>>> > if they did you almost certainly wouldn't want to run
>>> > archive_cleanup_command on it.
>>>
>>> Personally, I still find archives also very valuable when a standby
>>> creation takes a long time because of a large initial base backup and
>>> the partition dedicated to pg_wal is not large enough to support
>>> the retention associated with a slot, and it is easier to have larger
>>> retention policies in the archives.
>>> --
>>> Michael
>>>
>>

--
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.
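P.S. Since the archive question is still open further up the thread: here is a rough sketch of the slot-based streaming setup Jeff describes, assuming a 9.6 primary/standby pair like this one. The slot name, host, and replication user below are made up for illustration; the quoted config already sets max_replication_slots = 5, which the primary needs as well.

    -- on the primary, create one physical slot per standby:
    SELECT pg_create_physical_replication_slot('standby1');

    # on each standby, recovery.conf:
    standby_mode = 'on'
    primary_conninfo = 'host=172.16.10.23 port=5432 user=replication'
    primary_slot_name = 'standby1'

    # on the primary, postgresql.conf: if the archive exists only for
    # replication, archiving can then be switched off (requires a restart):
    archive_mode = off

    -- on the primary, check how much WAL each slot is holding back:
    SELECT slot_name, active, restart_lsn FROM pg_replication_slots;

The trade-off Michael points out still applies: a slot retains WAL in pg_xlog while its standby is disconnected, so a standby that stays down can fill the data partition instead of the archive directory. Keep an eye on pg_replication_slots if you go this route.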