Re: Incredibly slow queries on information_schema.constraint_column_usage?
Hello Viktor,

There was a known slowness issue in this view. It was fixed in PG 10: https://www.postgresql.org/message-id/flat/2d533e5b-687a-09fa-a772-dac9e6cf9...@imap.cc#2d533e5b-687a-09fa-a772-dac9e6cf9...@imap.cc

You can try to use a solution from there to create a faster view that returns the same data as information_schema.constraint_column_usage. I wouldn't be surprised if it was safe to replace information_schema.constraint_column_usage itself, but I cannot guarantee it.

Best, Alex
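P.S. In case it helps, a rough sketch of how you could wire in such a replacement view without touching information_schema itself. The schema name "workarounds" is just an example, and the SELECT body is only a placeholder for the faster query from the thread above:

CREATE SCHEMA IF NOT EXISTS workarounds;

CREATE VIEW workarounds.constraint_column_usage AS
    SELECT * FROM information_schema.constraint_column_usage;  -- substitute the faster definition here

-- then query the replacement instead of the information_schema view:
SELECT table_name, column_name, constraint_name
FROM workarounds.constraint_column_usage
WHERE table_name = 'my_table';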
Re: partition table query allocate much memory
Hello Tao,

> I'm not sure it was a bug and I also could not explain why it allocated so much memory. Doesn't each sub-partition table allocate work_mem worth of memory and not free it?

It can, and it did for hashed subPlans at least in PG 9.4, see https://www.slideshare.net/AlexeyBashtanov/postgresql-and-ram-usage/41
Generally, work_mem is per operation, not per query -- that's not a bug.

> When work_mem = 1GB or more, the query plan is a HashAggregate; otherwise it was Unique and running in parallel mode.

I would appreciate it if you could send the full plans with ANALYZE. I suspect it's a hashed subPlan hashing goodsh.gid separately for each partition, but it may be something else. NOT IN is generally tricky, both semantically and for planning; rewriting it into NOT EXISTS or LEFT JOIN may change the plan dramatically (see the rough sketch in the P.S. below).

Best, Alexey
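P.S. A rough sketch of the kind of rewrite I mean. goodsh.gid is taken from your query, but the other table ("orders") and its column are made up, and the two forms are only equivalent when gid cannot be NULL on either side:

-- NOT IN: may be planned as a hashed subPlan, hashing goodsh.gid per partition
SELECT *
FROM orders o
WHERE o.gid NOT IN (SELECT g.gid FROM goodsh g);

-- NOT EXISTS: the planner can turn this into an anti-join instead
SELECT *
FROM orders o
WHERE NOT EXISTS (SELECT 1 FROM goodsh g WHERE g.gid = o.gid);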
Re: after restore the size of the database is increased
> Hi all, this should be trivial, but if I dump and restore the very same database the restored one is bigger than the original one. I vacuumed the database foo, then dumped and restored it into bar, and the latter, even when vacuumed, remains bigger than the original one. No other activity was running on the cluster.

I can also think of toast data being rearranged after the dump-restore, accidentally in such a way that it's packed into pages less efficiently. Not that the probability of such behaviour is very high, though.

Best regards, Alexey
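P.S. If you'd like to narrow down where the extra space is, something like the query below, run in both foo and bar, compares per-table sizes including TOAST (the LIMIT is arbitrary):

SELECT relname,
       pg_size_pretty(pg_table_size(oid))   AS heap_and_toast,
       pg_size_pretty(pg_indexes_size(oid)) AS indexes
FROM pg_class
WHERE relkind = 'r'
ORDER BY pg_table_size(oid) DESC
LIMIT 20;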
Re: Function for Exception Logging
> Even more difficult in PG functions as they have no commit / rollback capability. I haven't played with stored procedures in PG 11 yet.

You can simulate the Oracle autonomous transaction feature in Postgres by connecting to the same DB using dblink (rough sketch below). As for implicit passing of error parameters, I don't think it's possible.

Best, Alex
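P.S. A rough sketch of the dblink approach. The error_log table, the function name and the connection string are made up for illustration; it assumes the dblink extension is installed and that local authentication allows the extra connection:

CREATE EXTENSION IF NOT EXISTS dblink;

CREATE TABLE error_log (
    logged_at timestamptz NOT NULL DEFAULT now(),
    message   text
);

CREATE OR REPLACE FUNCTION log_error_autonomously(p_message text) RETURNS void AS $$
BEGIN
    -- the INSERT runs in a separate session, so it stays committed
    -- even if the calling transaction later rolls back
    PERFORM dblink_exec(
        'dbname=' || current_database(),
        format('INSERT INTO error_log(message) VALUES (%L)', p_message)
    );
END;
$$ LANGUAGE plpgsql;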
Re: Question about index on different tablespace and rebuild it
> Yes, I know. My question is: will the pg server start at all if the NVMe tablespace is somehow broken and the indexes cannot be loaded -- not how to drop an index. If the Postgres server is not starting at all, I can try dropping my indexes on my pocket calculator all day long. Since the data is separated, is there any other tool that can rebuild my indexes located on the broken tablespace, keeping in mind of course that the data in the index is stored separately from the table? I guess the pg server won't start, and for that reason I wonder whether, since data and indexes are separated, the pg devs have a way to rebuild these indexes somehow while the pg server is offline.

Unfortunately it doesn't sound feasible to me, at least without deep hacking of Postgres itself.

To start after a crash, the server needs to apply the WAL records written since the last checkpoint, and those need the data files to be binarily the same as before the crash. So the index file not only needs to be there and not too corrupted, it also needs its pages and their contents to be located physically in the same order, which depends on the prior history of the table updates.

The Postgres manual explicitly warns that all tablespaces need to be preserved in order to start the server. The only possible exception may be temp_tablespaces and unlogged and temporary tables -- a kind of grey zone: people do it, but the manual doesn't allow this trick.

See also https://blog.2ndquadrant.com/postgresql-no-tablespaces-on-ramdisks/

Best regards, Alexey
Re: debugging intermittent slow updates under higher load
> The table has around 1.5M rows which have been updated/inserted around 121M times; the distribution of updates per row in alerts_alert will be quite uneven, from 1 insert up to 1 insert and 0.5M updates. Under high load (200-300 inserts/updates per second) we see occasional (~10 per hour) updates taking excessively long times (2-10s). These updates are always of the form:
> UPDATE "alerts_alert" SET ...bunch of fields... WHERE "alerts_alert"."id" = '...sha1 hash...';
> Here's a sample explain: https://explain.depesz.com/s/Fjq8
> What could be causing this? What could we do to debug? What config changes could we make to alleviate this?

Hello Chris,

One of the reasons could be that the row is already locked by another backend doing the same kind of update or something different. Are these updates performed in longer transactions? Can they hit the same row from two clients at the same time? Is there any other write or select-for-update/share load on the table?

Have you tried periodically logging the non-granted locks? Try querying pg_stat_activity and pg_locks (possibly joined and maybe repeatedly self-joined, google for it) to find the backends that wait for one another while competing to lock the same row or object. A rough example is in the P.S. below.

Best, Alex
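P.S. A rough sketch of such a query, assuming PG 9.6 or later, where pg_blocking_pids() spares you the manual self-join of pg_locks:

SELECT blocked.pid    AS blocked_pid,
       blocked.query  AS blocked_query,
       blocking.pid   AS blocking_pid,
       blocking.query AS blocking_query
FROM pg_stat_activity AS blocked
JOIN LATERAL unnest(pg_blocking_pids(blocked.pid)) AS b(pid) ON true
JOIN pg_stat_activity AS blocking ON blocking.pid = b.pid;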
Re: debugging intermittent slow updates under higher load
> Is there any existing tooling that does this?

There must be some; google for queries involving pg_locks.

> I'm loath to start hacking something up when I'd hope others have done a better job already...

If you log all queries that take more than a second to complete, is your update the only one logged, or does something else (the would-be blocker) get logged together with it? A sketch of the relevant settings is below.
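P.S. A rough sketch of the settings I mean, assuming superuser access; the thresholds are only examples:

ALTER SYSTEM SET log_min_duration_statement = '1s';  -- log statements that take longer than one second
ALTER SYSTEM SET log_lock_waits = on;                -- log waits that exceed deadlock_timeout
SELECT pg_reload_conf();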
Re: Array_agg and dimensions in Array
Hi Mike,

> I have come across a problem which I can't seem to solve in a nice way. Basically I have a (small) table of tags. What I need to do is combine two concatenated fields with a literal value as an array element.

You can create a custom aggregate function like this:

alexey@[local]/alexey=# create aggregate array_cat_agg(anyarray) (SFUNC = array_cat, STYPE = anyarray, COMBINEFUNC = array_cat, PARALLEL = SAFE);
CREATE AGGREGATE

And use it like this:

alexey@[local]/alexey=# select grp, array_cat_agg(array['--foo', bar || '=' || baz]) from (values ('g1', 'a', 'b'), ('g1', 'c', 'd'), ('g2', 'e', 'f')) _ (grp, bar, baz) group by grp;
┌─────┬───────────────────────┐
│ grp │     array_cat_agg     │
├─────┼───────────────────────┤
│ g2  │ {--foo,e=f}           │
│ g1  │ {--foo,a=b,--foo,c=d} │
└─────┴───────────────────────┘
(2 rows)

Is that what you need?

Best, Alex
pg_rewind
Hi,

I'm trying to get my head around pg_rewind. Why does it need full_page_writes and wal_log_hints on the target? As far as I can see, it only needs the old target WAL to see which pages have been touched since the last checkpoint before the divergence point. Why can't it get this data from partial pages?

Best, Alex
Streaming replication: PANIC on tertiary when secondary promoted
Hi,

I had a cascade serverA->serverB->serverC->serverD of Postgres 10.14 servers connected with streaming replication. There was no archive shipping set up, but there was an empty directory /data/pg_archive/10/dedupe_shard1_10/ mentioned in config for it on each of the servers.

When I promoted serverB, serverC crashed:

Jun 16 14:45:43 serverC postgresql-10-dedupe_shard1_10[43934]: [7-1] 2021-06-16 14:45:43.717 UTC [43934] {-} LOG: replication terminated by primary server
Jun 16 14:45:43 serverC postgresql-10-dedupe_shard1_10[43934]: [7-2] 2021-06-16 14:45:43.717 UTC [43934] {-} DETAIL: End of WAL reached on timeline 1 at 190B0/3600.
Jun 16 14:45:43 serverC postgresql-10-dedupe_shard1_10[43934]: [8-1] 2021-06-16 14:45:43.717 UTC [43934] {-} LOG: fetching timeline history file for timeline 2 from primary server
cp: cannot stat '/data/pg_archive/10/dedupe_shard1_10/0002.history': No such file or directory
cp: cannot stat '/data/pg_archive/10/dedupe_shard1_10/0003.history': No such file or directory
cp: cannot stat '/data/pg_archive/10/dedupe_shard1_10/0002.history': No such file or directory
Jun 16 14:45:43 serverC postgresql-10-dedupe_shard1_10[33222]: [9-1] 2021-06-16 14:45:43.736 UTC [33222] {-} LOG: new target timeline is 2
cp: cannot stat '/data/pg_archive/10/dedupe_shard1_10/0002000190B00036': No such file or directory
cp: cannot stat '/data/pg_archive/10/dedupe_shard1_10/0001000190B00036': No such file or directory
Jun 16 14:45:43 serverC postgresql-10-dedupe_shard1_10[43934]: [9-1] 2021-06-16 14:45:43.746 UTC [43934] {-} LOG: restarted WAL streaming at 190B0/3600 on timeline 2
Jun 16 14:45:43 serverC postgresql-10-dedupe_shard1_10[43934]: [10-1] 2021-06-16 14:45:43.746 UTC [43934] {-} FATAL: could not receive data from WAL stream: ERROR: requested starting point 190B0/3600 is ahead of the WAL flush position of this server 190B0/35E8
cp: cannot stat '/data/pg_archive/10/dedupe_shard1_10/0003.history': No such file or directory
cp: cannot stat '/data/pg_archive/10/dedupe_shard1_10/0002000190B00036': No such file or directory
Jun 16 14:45:43 serverC postgresql-10-dedupe_shard1_10[93365]: [7-1] 2021-06-16 14:45:43.764 UTC [93365] {-} LOG: started streaming WAL from primary at 190B0/3600 on timeline 2
Jun 16 14:45:43 serverC postgresql-10-dedupe_shard1_10[93365]: [8-1] 2021-06-16 14:45:43.764 UTC [93365] {-} FATAL: could not receive data from WAL stream: ERROR: requested starting point 190B0/3600 is ahead of the WAL flush position of this server 190B0/35E8
cp: cannot stat '/data/pg_archive/10/dedupe_shard1_10/0003.history': No such file or directory
cp: cannot stat '/data/pg_archive/10/dedupe_shard1_10/0002000190B00036': No such file or directory
Jun 16 14:45:48 serverC postgresql-10-dedupe_shard1_10[93421]: [7-1] 2021-06-16 14:45:48.771 UTC [93421] {-} LOG: started streaming WAL from primary at 190B0/3600 on timeline 2
Jun 16 14:45:48 serverC postgresql-10-dedupe_shard1_10[33222]: [10-1] 2021-06-16 14:45:48.792 UTC [33222] {-} LOG: invalid contrecord length 1585 at 190B0/35B8
Jun 16 14:45:48 serverC postgresql-10-dedupe_shard1_10[93421]: [8-1] 2021-06-16 14:45:48.793 UTC [93421] {-} FATAL: terminating walreceiver process due to administrator command
cp: cannot stat '/data/pg_archive/10/dedupe_shard1_10/0003.history': No such file or directory
cp: cannot stat '/data/pg_archive/10/dedupe_shard1_10/0002000190B00035': No such file or directory
Jun 16 14:45:48 serverC postgresql-10-dedupe_shard1_10[33222]: [11-1] 2021-06-16 14:45:48.803 UTC [33222] {-} PANIC: could not open file "pg_wal/0002000190B00035": No such file or directory

After I started it, it seems to have caught up:

Jun 16 14:59:09 serverB postgresql-10-dedupe_shard1_10[100187]: [8-1] 2021-06-16 14:59:09.826 UTC [100187] {-} LOG: consistent recovery state reached at 190B0/35B8
Jun 16 14:59:09 serverB postgresql-10-dedupe_shard1_10[100172]: [5-1] 2021-06-16 14:59:09.827 UTC [100172] {-} LOG: database system is ready to accept read only connections
cp: cannot stat '/data/pg_archive/10/dedupe_shard1_10/0002000190B00036': No such file or directory
Jun 16 14:59:10 serverB postgresql-10-dedupe_shard1_10[100187]: [9-1] 2021-06-16 14:59:10.281 UTC [100187] {-} LOG: invalid record length at 190B0/36000898: wanted 24, got 0
Jun 16 14:59:10 serverB postgresql-10-dedupe_shard1_10[101331]: [6-1] 2021-06-16 14:59:10.292 UTC [101331] {-} LOG: started streaming WAL from primary at 190B0/3600 on timeline 2
Jun 16 14:59:10 serverB postgresql-10-dedupe_shard1_10[101332]: [6-1] 2021-06-16 14:59:10.332 UTC [101332] {[unknown]-[local]} [unknown]@[unknown] LOG: incomplete startup packet
Jun 16 14:59:10 serverB postgresql-10-dedupe_shard1_10[101334]: [6-1] 2021-06-16 14:59:10.508 UTC [101334] {walreceiver-10.1.10.12(43648)} replication@[unknown] ERROR: requested star
Re: Streaming replication: PANIC on tertiary when secondary promoted
I had it "latest" as well. I'll try to reproduce it again tomorrow. On 16/06/2021 17:20, Vijaykumar Jain wrote: What is your recovery_target_timeline set to on replicas ? I just did a primary -> replica -> cascading replica setup. and then promoted replica as new primary. cascading replica was working fine, no restarts required. for me recovery_target_timeline was set to 'latest' i have pg14beta installed btw. initdb -D primary The files belonging to this database system will be owned by user "postgres". This user must also own the server process. postgres@db:~/playground$ pg_ctl -D primary -l logfile start waiting for server to start done server started postgres@db:~/playground$ psql psql (14beta1) Type "help" for help. postgres=# select pg_create_physical_replication_slot('replica'); pg_create_physical_replication_slot - (replica,) (1 row) postgres=# create table t(id int primary key); CREATE TABLE postgres=# insert into t select x from generate_series(1, 100) x; checkpoint; INSERT 0 100 postgres=# \q -- create a replica postgres@db:~/playground$ pg_basebackup -D replica -R -X stream -S replica -v -d "dbname=postgres port=5432" -U postgres pg_basebackup: initiating base backup, waiting for checkpoint to complete pg_basebackup: checkpoint completed pg_basebackup: write-ahead log start point: 0/228 on timeline 1 pg_basebackup: starting background WAL receiver pg_basebackup: write-ahead log end point: 0/2000138 pg_basebackup: waiting for background process to finish streaming ... pg_basebackup: syncing data to disk ... pg_basebackup: renaming backup_manifest.tmp to backup_manifest pg_basebackup: base backup completed postgres@db:~/playground$ vim replica/postgresql.conf --start the replica (port 5433) postgres@db:~/playground$ pg_ctl -D replica -l replicalog start waiting for server to start done server started postgres@db:~/playground$ psql -p 5433 -c 'select count(1) from t;' count --- 100 (1 row) postgres@db:~/playground$ psql -p 5433 psql (14beta1) Type "help" for help. -- create a replica slot for cascading streaming replication postgres=# select pg_create_physical_replication_slot('cascading_replica'); pg_create_physical_replication_slot - (cascading_replica,) (1 row) postgres=# \q -- create a cascading replica off replica postgres@db:~/playground$ pg_basebackup -D cascading_replica -R -X stream -S cascading_replica -v -d "dbname=postgres port=5433" -U postgres pg_basebackup: initiating base backup, waiting for checkpoint to complete pg_basebackup: checkpoint completed pg_basebackup: write-ahead log start point: 0/328 on timeline 1 pg_basebackup: starting background WAL receiver pg_basebackup: write-ahead log end point: 0/3D8 pg_basebackup: waiting for background process to finish streaming ... pg_basebackup: syncing data to disk ... pg_basebackup: renaming backup_manifest.tmp to backup_manifest pg_basebackup: base backup completed postgres@db:~/playground$ vim cascading_replica/postgresql.conf postgres@db:~/playground$ pg_ctl -D cascading_replica -l creplica start waiting for server to start done server started -- validate receiving data fine. postgres@db:~/playground$ psql -p 5434 -c 'select count(1) from t;' count --- 100 (1 row) -- stop primary postgres@db:~/playground$ pg_ctl -D primary -l logfile stop waiting for server to shut down done server stopped -- promote replica to new primary postgres@db:~/playground$ psql -p 5433 psql (14beta1) Type "help" for help. 
postgres=# select pg_promote(); pg_promote t (1 row) postgres=# select pg_is_in_recovery(); pg_is_in_recovery --- f (1 row) postgres=# \q --do some dml, validate changes replayed to new replica. postgres@db:~/playground$ psql -p 5434 -c 'select count(1) from t;' count --- 100 (1 row) postgres@db:~/playground$ psql -p 5433 -c 'select count(1) from t;' count --- 100 (1 row) postgres@db:~/playground$ psql -p 5433 -c 'delete from t where id < 50;' DELETE 49 postgres@db:~/playground$ psql -p 5433 -c 'select count(1) from t;' count --- 51 (1 row) postgres@db:~/playground$ psql -p 5434 -c 'select count(1) from t;' count --- 51 (1 row) in all my cases. recovery_timeline was set to 'latest'. i did not rx any panic messages in logs.
Re: Streaming replication: PANIC on tertiary when secondary promoted
On 16/06/2021 20:31, Alexey Bashtanov wrote:

I had it "latest" as well. I'll try to reproduce it again tomorrow.

I cannot quite reproduce it artificially. One more piece of detail: in the chain serverA->serverB->serverC->serverD, when serverB was promoted and serverC crashed, I had wal_compression enabled on serverA only.

Best regards, Alex