Diffs in PG output vs WAL
I am experimenting with pgoutput and logical replication. With wal2json I could capture the old values of a row before it was updated, but I cannot figure out how to do the same with pgoutput. Does pgoutput even support this?

Relevant info from the wal2json documentation:
> Also, UPDATE/DELETE old row versions can be accessed depending on the configured replica identity
> All of the new/old tuples are available in the JSON object.
https://github.com/eulerto/wal2json

How do I get the new/old tuples with pgoutput?

-- Avinash
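A minimal sketch of the usual setup (table and publication names here are hypothetical): pgoutput, like wal2json, emits old-row data for UPDATE/DELETE according to the table's replica identity. With the default identity only the old key columns are sent; with REPLICA IDENTITY FULL the complete old row is included in the messages.

-- Hypothetical names; REPLICA IDENTITY controls which old-row columns
-- pgoutput can emit for UPDATE/DELETE.
ALTER TABLE my_table REPLICA IDENTITY FULL;   -- send the whole old row
CREATE PUBLICATION my_pub FOR TABLE my_table; -- consumed through the pgoutput plugin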
FATAL: could not load library "/usr/pgsql-13/lib/libpqwalreceiver.so" during replication
Hi Team,

I have installed PostgreSQL 13 from packages (and tried lower versions as well), but replication is not working; I am getting the error below.

2020-12-19 02:48:22.346 PST [3574] FATAL: could not load library "/usr/pgsql-13/lib/libpqwalreceiver.so": /usr/pgsql-13/lib/libpq.so.5: symbol X509_get_signature_nid, version libcrypto.so.10 not defined in file libcrypto.so.10 with link time reference
2020-12-19 02:48:22.350 PST [3575] FATAL: could not load library "/usr/pgsql-13/lib/libpqwalreceiver.so": /usr/pgsql-13/lib/libpq.so.5: symbol X509_get_signature_nid, version libcrypto.so.10 not defined in file libcrypto.so.10 with link time reference
2020-12-19 02:48:27.362 PST [3576] FATAL: could not load library "/usr/pgsql-13/lib/libpqwalreceiver.so": /usr/pgsql-13/lib/libpq.so.5: symbol X509_get_signature_nid, version libcrypto.so.10 not defined in file libcrypto.so.10 with link time reference
2020-12-19 02:48:32.366 PST [3580] FATAL: could not load library "/usr/pgsql-13/lib/libpqwalreceiver.so": /usr/pgsql-13/lib/libpq.so.5: symbol X509_get_signature_nid, version libcrypto.so.10 not defined in file libcrypto.so.10 with link time reference

This happens even though I have exported LD_LIBRARY_PATH and created a link as well:

[root@localhost log]# which libpqwalreceiver.so
/usr/bin/libpqwalreceiver.so
[root@localhost log]# which libpq.so.5
/usr/bin/libpq.so.5

[postgres@localhost ~]$ export LD_LIBRARY_PATH=$PATH:/usr/pgsql-13/lib/
[postgres@localhost ~]$ echo $LD_LIBRARY_PATH
/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/postgres/.local/bin:/home/postgres/bin:/usr/pgsql-13/lib/

Every version shows the same issue with the packages; please help me resolve it.
Re: FATAL: could not load library "/usr/pgsql-13/lib/libpqwalreceiver.so" during replication
Hi Team, below are the os versions details. standalone machine running without any issues, coming to replication not working, even i tried from postgresql 10 onwards as well, no other versions running in the server now. [root@node2 ~]# openssl version OpenSSL 1.0.1e-fips 11 Feb 2013 [root@node2 ~]# uname -a Linux node2 3.10.0-327.el7.x86_64 #1 SMP Thu Oct 29 17:29:29 EDT 2015 x86_64 x86_64 x86_64 GNU/Linux [root@node2 ~]# cat /etc/redhat-release Red Hat Enterprise Linux Server release 7.2 (Maipo) [root@node2 ~]# On Sat, Dec 19, 2020 at 9:02 PM Adrian Klaver wrote: > On 12/19/20 2:54 AM, Rambabu V wrote: > > Hi Team, > > > > i have installed postgresql 13 using packages/tried same using lower > > versions as well, but replication not working, getting below error. > > What OS and version? > > Where did you get packages? > > Do you have other versions of Postgres running on this machine? > > > > > 2020-12-19 02:48:22.346 PST [3574] FATAL: could not load library > > "/usr/pgsql-13/lib/libpqwalreceiver.so": /usr/pgsql-13/lib/libpq.so.5: > > symbol X509_get_signature_nid, version libcrypto.so.10 not defined in > > file libcrypto.so.10 with link time reference > > 2020-12-19 02:48:22.350 PST [3575] FATAL: could not load library > > "/usr/pgsql-13/lib/libpqwalreceiver.so": /usr/pgsql-13/lib/libpq.so.5: > > symbol X509_get_signature_nid, version libcrypto.so.10 not defined in > > file libcrypto.so.10 with link time reference > > 2020-12-19 02:48:27.362 PST [3576] FATAL: could not load library > > "/usr/pgsql-13/lib/libpqwalreceiver.so": /usr/pgsql-13/lib/libpq.so.5: > > symbol X509_get_signature_nid, version libcrypto.so.10 not defined in > > file libcrypto.so.10 with link time reference > > 2020-12-19 02:48:32.366 PST [3580] FATAL: could not load library > > "/usr/pgsql-13/lib/libpqwalreceiver.so": /usr/pgsql-13/lib/libpq.so.5: > > symbol X509_get_signature_nid, version libcrypto.so.10 not defined in > > file libcrypto.so.10 with link time reference > > > > even though i have exported LD_LIBRARY_PATH and created link also. > > > > [root@localhost log]# which libpqwalreceiver.so > > /usr/bin/libpqwalreceiver.so > > [root@localhost log]# which libpq.so.5 > > /usr/bin/libpq.so.5 > > > > [postgres@localhost ~]$ export LD_LIBRARY_PATH=$PATH:/usr/pgsql-13/lib/ > > [postgres@localhost ~]$ echo $LD_LIBRARY_PATH > > > /usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/postgres/.local/bin:/home/postgres/bin:/usr/pgsql-13/lib/ > > > > every version has the same issue with packages, please help me to > > resolve this issue. > > > -- > Adrian Klaver > adrian.kla...@aklaver.com >
Re: FATAL: could not load library "/usr/pgsql-13/lib/libpqwalreceiver.so" during replication
Hi Team, below are the os versions details. standalone machine running without any issues, coming to replication not working, even i tried from postgresql 10 onwards as well. [root@node2 ~]# openssl version OpenSSL 1.0.1e-fips 11 Feb 2013 [root@node2 ~]# uname -a Linux node2 3.10.0-327.el7.x86_64 #1 SMP Thu Oct 29 17:29:29 EDT 2015 x86_64 x86_64 x86_64 GNU/Linux [root@node2 ~]# cat /etc/redhat-release Red Hat Enterprise Linux Server release 7.2 (Maipo) [root@node2 ~]# On Sat, Dec 19, 2020 at 10:18 PM Tom Lane wrote: > Rambabu V writes: > > i have installed postgresql 13 using packages/tried same using lower > > versions as well, but replication not working, getting below error. > > > 2020-12-19 02:48:22.346 PST [3574] FATAL: could not load library > > "/usr/pgsql-13/lib/libpqwalreceiver.so": /usr/pgsql-13/lib/libpq.so.5: > > symbol X509_get_signature_nid, version libcrypto.so.10 not defined in > file > > libcrypto.so.10 with link time reference > > Apparently this copy of libpq was built against a newer version of > OpenSSL than what you have installed locally. The man page for > X509_get_signature_nid says it was added in OpenSSL 1.0.2, in case that > helps. You need to find or build a version of Postgres that's intended > for your specific platform version. Alternatively, maybe you would be > better off moving to a newer platform version, or at least installing > a newer OpenSSL version. Anything older than 1.0.1 is considered > outright insecure these days. > > regards, tom lane >
Re: FATAL: could not load library "/usr/pgsql-13/lib/libpqwalreceiver.so" during replication
Hi Team, below is the link im getting packages. https://www.postgresql.org/download/linux/redhat/ On Sun, Dec 20, 2020 at 6:33 AM Adrian Klaver wrote: > On 12/19/20 4:53 PM, Rambabu V wrote: > > Hi Team, > > > > below are the os versions details. standalone machine running without > > any issues, coming to replication not working, even i tried from > > postgresql 10 onwards as well, no other versions running in the server > now. > > > > [root@node2 ~]# openssl version > > OpenSSL 1.0.1e-fips 11 Feb 2013 > > [root@node2 ~]# uname -a > > Linux node2 3.10.0-327.el7.x86_64 #1 SMP Thu Oct 29 17:29:29 EDT 2015 > > x86_64 x86_64 x86_64 GNU/Linux > > [root@node2 ~]# cat /etc/redhat-release > > Red Hat Enterprise Linux Server release 7.2 (Maipo) > > [root@node2 ~]# > > > > Where are you getting the packages? > > > -- > Adrian Klaver > adrian.kla...@aklaver.com >
Sequence vs UUID
Hello,

We are trying to understand whether we should use UUIDs or sequences in general for primary keys. Many of the blogs I have seen on the internet (one is linked below), across multiple databases, mostly state that a sequence is better than a UUID. But I think that in concurrent data-load scenarios a UUID will spread the contention point, whereas a sequence can become a single point of contention.

So we want to understand from the experts here whether there are any clear rules, or a pros-and-cons list for each option, that would tell us the exact scenarios in which to choose one over the other. Basically, I wanted to see whether we can run some tests on sample data to measure the read and write overhead of UUID vs. sequence and draw a general conclusion. And since open-source Postgres is the base for many other databases (Redshift etc.), results that apply to Postgres should largely apply to those as well.

https://www.percona.com/blog/2019/11/22/uuids-are-popular-but-bad-for-performance-lets-discuss/
Re: Sequence vs UUID
I have a question, As i understand here, usage wise there are multiple benefits of UUID over sequences like, in case of distributed app where we may not be able to rely on one point generator like sequences, in case of multi master architecture, sharding. If we just look in terms of performance wise, the key advantage of sequence is that for read queries, because of the storage size it will be smaller and thus it will cache more index rows and so will be beneficial during read queries and should also be beneficial even on joins because of its smaller size. Also fetching a value from sequence is cheaper than calculating the UUIDS. But the downside is during write operation, it can be a point of contention in case of concurrent data load as every incoming request will try to modify same table/index page/block. But as its mentioned in this blog ( https://www.2ndquadrant.com/en/blog/sequential-uuid-generators/), state this UUID can be made sequential so even this can be sequential using prefix-timestamp etc. However isn't it that making the UUID sequential will again actually be a disadvantage and can be contention point for this unique index as each incoming write will now fight for same block/page while doing concurrent data load and will contend for the same table block or say one side of the index branch/leaf block etc, whereas in case of random UUIDs the write was spreading across multiple blocks so there was no contention on any specific blocks? Please correct if my understanding is wrong? On Sun, 29 Jan, 2023, 10:33 am Miles Elam, wrote: > On Sat, Jan 28, 2023 at 8:02 PM Ron wrote: > > > > Then it's not a Type 4 UUID, which is perfectly fine; just not random. > > Yep, which is why it really should be re-versioned to UUIDv8 to be > pedantic. In everyday use though, almost certainly doesn't matter. > > > Also, should now() be replaced by clock_timestamp(), so that it can be > > called multiple times in the same transaction? > > Not necessary. Instead of 122 bits of entropy, you get 106 bits of > entropy and a new incremented prefix every minute. now() vs > clock_timestamp() wouldn't make a substantive difference. Should still > be reasonably safe against the birthday paradox for more than a > century when creating more than a million UUIDs per second. > > >
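As a rough illustration of the timestamp-prefix idea discussed above (this is not the extension from the blog post; the function name and the two-byte minute prefix are assumptions made for the sketch), something like the following keeps values generated close in time clustered together in the index while the remaining bytes stay random:

CREATE OR REPLACE FUNCTION gen_seq_uuid() RETURNS uuid
LANGUAGE sql VOLATILE AS $$
  -- Overwrite the first two bytes of a random UUID with a minute counter,
  -- so consecutive inserts land near each other in the index; the prefix
  -- wraps after ~45 days, which also spreads writes out over time.
  SELECT encode(
           set_byte(
             set_byte(
               decode(replace(gen_random_uuid()::text, '-', ''), 'hex'),
               0, (((extract(epoch FROM clock_timestamp())::bigint / 60) >> 8) & 255)::int),
             1, ((extract(epoch FROM clock_timestamp())::bigint / 60) & 255)::int),
           'hex')::uuid;
$$;

The trade-off raised above still applies: a coarse, wrapping prefix concentrates inserts on the "current" part of the index much like a sequence does, which is exactly what buys back cache locality at the price of a hotter insertion point.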
Re: Sequence vs UUID
Was trying to test the performance for simple read/write for the bigint vs UUID. What we see is , ~3 times performance degradation while joining on bigint vs UUID columns. Also even just generation of sequence vs bigint itself is degrading by ~3times too. Also even insert performance on same table for ~10million rows is ~1min 39sec for bigint vs ~3minute 11 sec in case of UUID. Is such extent of degradation in performance this expected for UUID? CREATE TABLE test1_UUID ( id bigint,source_id varchar(36) PRIMARY KEY, Name varchar(20) ); CREATE TABLE test2_UUID (id bigint,source_id varchar(36) PRIMARY KEY,Name varchar(20) ); CREATE TABLE test1_bigint ( id bigint PRIMARY KEY, source_id varchar(36) , Name varchar(20)); CREATE TABLE test2_bigint ( id bigint PRIMARY KEY, source_id varchar(36) , Name varchar(20)); Loaded same 10million rows. explain Analyze select * from test1_bigint a , test2_bigint b where a.id = b.id Merge Join (cost=12.31..875534.52 rows=1021 width=100) (actual time=0.042..6974.575 rows=1000 loops=1) Merge Cond: (a.id = b.id) -> Index Scan using test1_bigint_pkey on test1_bigint a (cost=0.43..362780.75 rows=1021 width=50) (actual time=0.020..2070.079 rows=1000 loops=1) -> Index Scan using test2_bigint_2_pkey on test2_bigint b (cost=0.43..362780.75 rows=1021 width=50) (actual time=0.019..2131.086 rows=1000 loops=1) Planning Time: 0.207 ms Execution Time: 7311.210 ms set enable_seqscan=off; explain Analyze select * from test1_UUID a , test2_UUID b where a.source_id = b.source_id; Merge Join (cost=2.75..2022857.05 rows=1021 width=100) (actual time=0.043..21954.213 rows=1000 loops=1) Merge Cond: ((a.source_id)::text = (b.source_id)::text) -> Index Scan using test1_uuid_pkey on test1_UUID a (cost=0.56..936420.18 rows=1021 width=50) (actual time=0.022..7854.143 rows=1000 loops=1) -> Index Scan using test2_uuid_2_pkey on test2_UUID b (cost=0.56..936437.90 rows=1021 width=50) (actual time=0.017..7971.187 rows=1000 loops=1) Planning Time: 0.516 ms Execution Time: 22292.801 ms ** create sequence myseq cache 32767; select count(nextval('myseq') ) from generate_series(1,1000) 1 row retrieved starting from 1 in 4 s 521 ms (execution: 4 s 502 ms, fetching: 19 ms) select count(gen_random_uuid()) from generate_series(1,1000) 1 row retrieved starting from 1 in 11 s 145 ms (execution: 11 s 128 ms, fetching: 17 ms) On Mon, 30 Jan, 2023, 4:59 pm veem v, wrote: > I have a question, As i understand here, usage wise there are multiple > benefits of UUID over sequences like, in case of distributed app where we > may not be able to rely on one point generator like sequences, in case of > multi master architecture, sharding. > > If we just look in terms of performance wise, the key advantage of > sequence is that for read queries, because of the storage size it will be > smaller and thus it will cache more index rows and so will be beneficial > during read queries and should also be beneficial even on joins because of > its smaller size. Also fetching a value from sequence is cheaper than > calculating the UUIDS. But the downside is during write operation, it can > be a point of contention in case of concurrent data load as every incoming > request will try to modify same table/index page/block. But as its > mentioned in this blog ( > https://www.2ndquadrant.com/en/blog/sequential-uuid-generators/), state > this UUID can be made sequential so even this can be sequential using > prefix-timestamp etc. 
However isn't it that making the UUID sequential will > again actually be a disadvantage and can be contention point for this > unique index as each incoming write will now fight for same block/page > while doing concurrent data load and will contend for the same table block > or say one side of the index branch/leaf block etc, whereas in case of > random UUIDs the write was spreading across multiple blocks so there was no > contention on any specific blocks? Please correct if my understanding is > wrong? > > > > On Sun, 29 Jan, 2023, 10:33 am Miles Elam, > wrote: > >> On Sat, Jan 28, 2023 at 8:02 PM Ron wrote: >> > >> > Then it's not a Type 4 UUID, which is perfectly fine; just not random. >> >> Yep, which is why it really should be re-versioned to UUIDv8 to be >> pedantic. In everyday use though, almost certainly doesn't matter. >> >> > Also, should now() be replaced by clock_timestamp(), so that it can be >> > called multiple times in the same transaction? >> >> Not necessary. Instead of 122 bits of entropy, you get 106 bits of >> entropy and a new incremented prefix every minute. now() vs >> clock_timestamp() wouldn't make a substantive difference. Should still >> be reasonably safe against the birthday paradox for more than a >> century when creating more than a million UUIDs per second. >> >> >>
Re: Sequence vs UUID
Thank You So much for the details. I am a bit new to postgres. And these test results I picked were from a dev system. If I understand it correctly, do you mean these settings(usage of C locale or "native" 16-byte uuid) which you mentioned should be there in a production system and thus we should test the performance of the UUID vs sequence on a similar setup? Or say if this sort of degradation of UUID performance is not expected then , how to get these settings tweaked ON, so as to see the best string type or UUID performance, can you please guide me here? On Mon, 30 Jan 2023 at 22:18, Tom Lane wrote: > Dominique Devienne writes: > > On Mon, Jan 30, 2023 at 5:11 PM veem v wrote: > >> CREATE TABLE test1_UUID ( id bigint,source_id varchar(36) PRIMARY KEY, > Name varchar(20) ); > > > Maybe if you used a "native" 16-byte uuid, instead of its textual > > serialization with dashes (36 bytes + length overhead), the gap would > > narrow. > > Yeah, especially if your database is not using C locale. The > strcoll or ICU-based comparisons done on string types can be > enormously more expensive than the memcmp() used for binary > types like native uuid. > > regards, tom lane >
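A minimal sketch of what that suggestion looks like in practice (the table name is made up): store the key as a native 16-byte uuid instead of varchar(36), and check the collation, since non-C collations make text comparisons far more expensive than the memcmp() used for uuid.

SHOW lc_collate;  -- 'C' avoids strcoll/ICU costs for text keys

CREATE TABLE test1_uuid_native (
    source_id uuid PRIMARY KEY,   -- 16 bytes vs. 36+ bytes as text
    name      varchar(20)
);
INSERT INTO test1_uuid_native (source_id)
SELECT gen_random_uuid() FROM generate_series(1, 100000);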
Re: Sequence vs UUID
I tried to test quickly below on dbfiddle, below with the UUID as data type and in each of the below cases the UUID performance seems drastically reduced as compared to sequence performance. Let me know if anything is wrong in my testing here? 1) sequence generation vs UUID generation, execution time increased from ~291ms to 5655ms. 2) Insert performance of "sequence" vs "UUID" execution time increased from ~2031ms to 10599ms. 3) Index performance for sequence vs UUID, execution time increased from ~.3ms to .5ms. PostgreSQL 15.0 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-10), 64-bit CREATE UNLOGGED TABLE test_bigint ( id bigint PRIMARY KEY); CREATE UNLOGGED TABLE test_uuid ( id uuid PRIMARY KEY); create sequence myseq cache 32767; sequence generation vs UUID generation Test** explain analyze select count(nextval('myseq') ) from generate_series(1,10); QUERY PLAN Aggregate (cost=1500.00..1500.01 rows=1 width=8) (actual time=291.030..291.030 rows=1 loops=1) -> Function Scan on generate_series (cost=0.00..1000.00 rows=10 width=0) (actual time=53.332..63.941 rows=10 loops=1) Planning Time: 0.155 ms Execution Time: 291.719 ms explain analyze select count(gen_random_uuid()) from generate_series(1,10); QUERY PLAN Aggregate (cost=1500.00..1500.01 rows=1 width=8) (actual time=5654.453..5654.454 rows=1 loops=1) -> Function Scan on generate_series (cost=0.00..1000.00 rows=10 width=0) (actual time=84.328..514.214 rows=10 loops=1) Planning Time: 0.082 ms Execution Time: 5655.158 ms *** Insert Test*** explain analyze INSERT INTO test_bigint select nextval('myseq') from generate_series(1,10); QUERY PLAN Insert on test_bigint (cost=0.00..2250.00 rows=0 width=0) (actual time=2030.960..2030.961 rows=0 loops=1) -> Function Scan on generate_series (cost=0.00..1250.00 rows=10 width=8) (actual time=48.102..636.311 rows=10 loops=1) Planning Time: 0.065 ms Execution Time: 2031.469 ms explain analyze INSERT INTO test_uuid select gen_random_uuid() from generate_series(1,10); QUERY PLAN Insert on test_uuid (cost=0.00..2250.00 rows=0 width=0) (actual time=10599.230..10599.230 rows=0 loops=1) -> Function Scan on generate_series (cost=0.00..1250.00 rows=10 width=16) (actual time=36.975..6289.811 rows=10 loops=1) Planning Time: 0.056 ms Execution Time: 10599.805 ms Index performance Explain analyze select * from test_bigint where id in (select id from test_bigint limit 100); QUERY PLAN Nested Loop (cost=2.98..734.71 rows=100 width=8) (actual time=0.083..0.269 rows=100 loops=1) -> HashAggregate (cost=2.69..3.69 rows=100 width=8) (actual time=0.046..0.062 rows=100 loops=1) Group Key: test_bigint_1.id Batches: 1 Memory Usage: 24kB -> Limit (cost=0.00..1.44 rows=100 width=8) (actual time=0.011..0.025 rows=100 loops=1) -> Seq Scan on test_bigint test_bigint_1 (cost=0.00..1444.18 rows=100118 width=8) (actual time=0.011..0.017 rows=100 loops=1) -> Index Only Scan using test_bigint_pkey on test_bigint (cost=0.29..7.31 rows=1 width=8) (actual time=0.002..0.002 rows=1 loops=100) Index Cond: (id = test_bigint_1.id) Heap Fetches: 100 Planning Time: 0.279 ms Execution Time: 0.302 ms Explain analyze select * from test_uuid where id in (select id from test_uuid limit 100); QUERY PLAN Nested Loop (cost=3.21..783.31 rows=100 width=16) (actual time=0.080..0.474 rows=100 loops=1) -> HashAggregate (cost=2.79..3.79 rows=100 width=16) (actual time=0.046..0.066 rows=100 loops=1) Group Key: test_uuid_1.id Batches: 1 Memory Usage: 24kB -> Limit (cost=0.00..1.54 rows=100 width=16) (actual time=0.010..0.025 
rows=100 loops=1) -> Seq Scan on test_uuid test_uuid_1 (cost=0.00..1541.85 rows=100085 width=16) (actual time=0.009..0.016 rows=100 loops=1) -> Index Only Scan using test_uuid_pkey on test_uuid (cost=0.42..7.79 rows=1 width=16) (actual time=0.004..0.004 rows=1 loops=100) Index Cond: (id = test_uuid_1.id) Heap Fetches: 100 Planning Time: 0.180 ms Execution Time: 0.510 ms On Tue, 31 Jan 2023 at 03:28, Ron wrote: > > And populate that column with UUIDs generated by the gen_random_uuid() > function. > > (Requires v13.) > > On 1/30/23 13:46, Adrian Klaver wrote: > > On 1/30/23 11:43, veem v wrote: > >> Thank You So much for the details. I am a bit new to postgres. And > these > >> test results I picked were from a dev system. If I understand it > >> correctly, do you mean these settings(usage of C locale or "native" > >> 16-byte uuid) which you mentioned should be there in a production > system > >> and thus we should test the performance of the UUID vs seque
Re: Sequence vs UUID
1..3.355 rows=1000 loops=1) -> HashAggregate (cost=27.91..29.91 rows=200 width=16) (actual time=0.399..0.556 rows=1000 loops=1) Group Key: test_uuid_1.id Batches: 1 Memory Usage: 145kB -> Limit (cost=0.00..15.41 rows=1000 width=16) (actual time=0.011..0.185 rows=1000 loops=1) -> Seq Scan on test_uuid test_uuid_1 (cost=0.00..1541.85 rows=100085 width=16) (actual time=0.010..0.093 rows=1000 loops=1) -> Index Only Scan using test_uuid_pkey on test_uuid (cost=0.42..7.21 rows=1 width=16) (actual time=0.002..0.002 rows=1 loops=1000) Index Cond: (id = test_uuid_1.id) Heap Fetches: 1000 Planning Time: 0.234 ms Execution Time: 3.419 ms Explain analyze select * from test_uuid7 where id in (select id from test_uuid7 limit 1000); QUERY PLAN Nested Loop (cost=28.32..1416.01 rows=1000 width=16) (actual time=0.403..2.586 rows=1000 loops=1) -> HashAggregate (cost=27.91..29.91 rows=200 width=16) (actual time=0.371..0.546 rows=1000 loops=1) Group Key: test_uuid7_1.id Batches: 1 Memory Usage: 145kB -> Limit (cost=0.00..15.41 rows=1000 width=16) (actual time=0.011..0.161 rows=1000 loops=1) -> Seq Scan on test_uuid7 test_uuid7_1 (cost=0.00..1541.85 rows=100085 width=16) (actual time=0.010..0.091 rows=1000 loops=1) -> Index Only Scan using test_uuid7_pkey on test_uuid7 (cost=0.42..6.99 rows=1 width=16) (actual time=0.002..0.002 rows=1 loops=1000) Index Cond: (id = test_uuid7_1.id) Heap Fetches: 1000 Planning Time: 0.101 ms Execution Time: 2.661 ms On Thu, 2 Feb 2023 at 20:52, Benedict Holland wrote: > Well... until two processes generate an identical UUID. That happened to > me several times. It's rare but when that happens, oh boy that is a mess to > figure out. > > Thanks, > Ben > > On Thu, Feb 2, 2023, 10:17 AM Miles Elam > wrote: > >> On Wed, Feb 1, 2023 at 10:48 AM Kirk Wolak wrote: >> >>> >>> >>> On Wed, Feb 1, 2023 at 1:34 PM veem v wrote: >>> >>>> >>>> 1) sequence generation vs UUID generation, execution time increased >>>> from ~291ms to 5655ms. >>>> 2) Insert performance of "sequence" vs "UUID" execution time increased >>>> from ~2031ms to 10599ms. >>>> 3) Index performance for sequence vs UUID, execution time increased >>>> from ~.3ms to .5ms. >>>> >>>> >>> Yes, assuming that UUIDs would be efficient as keys when they are >>> randomly generated, versus sequences (which tend to expand in one >>> direction, and have been relatively optimized for years). >>> >>> This article explains in detail what is going on. If I were doing this, >>> I would strongly consider a ULID because of getting the best of both worlds. >>> https://blog.daveallie.com/ulid-primary-keys >>> >>> Of course, YMMV... And since ULIDs are not native to PG, there is >>> overhead, but it is far more performant, IMO... >>> >> >> Biased comparison. ULIDs have a timestamp component. The closest UUID >> equivalent in Postgres is UUIDv1 from the uuid-ossp extension, not v4. >> Another difference not mentioned in the blog article is that UUID is >> versioned, meaning you can figure out what kind of data is in the UUID, >> whereas ULIDs are a "one size fits all" solution. >> >> There is an implementation of sequential UUIDs for Postgres I posted >> earlier in this thread. In addition, here is an implementation of UUIDv7 >> for Postgres: >> >> https://gist.github.com/kjmph/5bd772b2c2df145aa645b837da7eca74 >> >> I would suggest running your tests against v1, v7, and sequential UUID >> before jumping on ULID, which has no native type/indexing in Postgres. 
>> >> It should also be noted that apps cannot provide a bigint ID due to >> collisions, but an app can generate UUIDs and ULIDs without fear, >> essentially shifting the generation time metric in UUID/ULID's favor over a >> bigserial. >> >> - Miles >> >> >>
Re: Sequence vs UUID
Actually I did the testing by connecting to "https://dbfiddle.uk/"; postgres version -15. PostgreSQL 15.0 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-10), 64-bit Am I doing it wrong, please confirm? On Fri, 3 Feb 2023 at 21:28, Dominique Devienne wrote: > Copying the list... > > -- Forwarded message - > From: Dominique Devienne > Date: Fri, Feb 3, 2023 at 4:57 PM > Subject: Re: Sequence vs UUID > To: veem v > > On Thu, Feb 2, 2023 at 8:47 PM veem v wrote: > >> Tested the UUIDv7 generator for postgres as below. >> With regards to performance , It's still way behind the sequence. [...] >> explain analyze select count(nextval('myseq') ) from >> generate_series(1,10); >> Execution Time: 59.687 ms >> >> explain analyze select count(gen_random_uuid()) from >> generate_series(1,100'000); >> Execution Time: 904.868 ms >> >> explain analyze select count(uuid_generate_v7()) from >> generate_series(1,10); >> Execution Time: 1711.187 ms >> > > Something's off regarding Guid generations IMHO... > > You generate 100K Guids in ~1s. While we generate (in C++, Windows > Release, using Boost) 16M of them in +/- the same time: > > Enabling Performance tests >>>> >>> > >> generate 16'000'000 guids in 0.980s (user: 0.984s) 12 MB >>>> >>> generate 16'000'000 guids in parallel on 4 CPUs in 0.309s (user: 1.188s) >>>> 12 MB >>>> >>> > That's 2 orders of magnitude faster. Sure there's some overhead from the > SQL, but still. Something seems fishy. > And that's on a 2.5y old desktop. --DD >
Re: Sequence vs UUID
So, it may be the machine on which the code is getting executed behind the scene , in the site "https://dbfiddle.uk/"; is playing a key role in the speed, however, the comparative performance of UUID vs sequence should stay the same. So I think, after this test we can safely conclude that if we compare the performance of the UUID(both version-4, version 7) VS sequence. The UUID performs a lot worse as compared to sequence. So unless there exists some strong reason/justification for UUID, we should default use the sequence. Correct me if I'm wrong. And also I understand the cases of multi master replication/sharding etc, may be a factor but other than that I can't think of any scenario where sequences can be used. On Fri, 3 Feb 2023 at 23:07, Dominique Devienne wrote: > On Fri, Feb 3, 2023 at 5:48 PM veem v wrote: > >> Actually I did the testing by connecting to "https://dbfiddle.uk/"; >> postgres version -15. >> >> PostgreSQL 15.0 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0 >> 20210514 (Red Hat 8.5.0-10), 64-bit >> >> Am I doing it wrong, please confirm? >> >> > No clue. But even SQLite can generate 1M uuid (16 bytes random blobs are > equivalent) is 1/3 of the time, so 30x faster. > So your timings of generating 100K uuids and counting them seems way too > slow to me. --DD > > sqlite> select count(randomblob(16)) from generate_series(1,1000*1000); > QUERY PLAN > `--SCAN generate_series VIRTUAL TABLE INDEX 3: > addr opcode p1p2p3p4 p5 comment > - - -- - > 0 Init 0 1500 Start at 15 > 1 Null 0 1 10 r[1..1]=NULL > 2 VOpen 0 0 0 vtab:274D3E0 0 > 3 Integer1 4 00 r[4]=1 > 4 Multiply 6 6 50 r[5]=r[6]*r[6] > 5 Integer3 2 00 r[2]=3 > 6 Integer2 3 00 r[3]=2 > 7 VFilter0 1120 iplan=r[2] > zplan='' > 8 Function 1 8 7 randomblob(1) 0 r[7]=func(r[8]) > 9 AggStep0 7 1 count(1) 1 accum=r[1] > step(r[7]) > 10VNext 0 8 00 > 11AggFinal 1 1 0 count(1) 0 accum=r[1] N=1 > 12Copy 1 9 00 r[9]=r[1] > 13ResultRow 9 1 00 output=r[9] > 14Halt 0 0 00 > 15Transaction0 0 1 0 1 usesStmtJournal=0 > 16Integer1000 6 00 r[6]=1000 > 17Integer168 00 r[8]=16 > 18Goto 0 1 00 > ┌───┐ > │ count(randomblob(16)) │ > ├───┤ > │ 100 │ > └───┘ > Run Time: real 0.278 user 0.25 sys 0.00 > > >> On Fri, 3 Feb 2023 at 21:28, Dominique Devienne >> wrote: >> >>> Something's off regarding Guid generations IMHO... >>> You generate 100K Guids in ~1s. While we generate (in C++, Windows >>> Release, using Boost) 16M of them in +/- the same time: >>> >>
Re: Sequence vs UUID
Thank you So much all for such valuable feedback. As "Julian" was pointing, I also tried to test the INSERT independently(as in below test case) without keeping the "generate_series" in the inline query. But in all the cases sequence is performing better as compared to both UUID V4 and UUID V7. And same with Index access path observed i.e. an index on sequence performs better as compared to an index on UUID column. So i believe , its understood that the performance wise sequence is better as compared to even sequential UUID (like UUID-V7). And it seems that, even the UUID V7 maintains better locality because of its sequential nature, but still it all may tied to the higher space/memory consumption of UUID because of its bigger size as compared to sequence and thus the performance hit. But as i saw few comments in this discussion regarding the scenarios which will mandate the usage of UUID like "multi-master replication", "sharding", "Need to be able to move data between databases" Etc..So wanted to understand , why cant we use sequences as PK in these scenarios? Say for e.g. in case of multimaster replication we can use sequence someway as below.., So wanted to know from experts here, is there really exists any scenario in which UUID really cant be avoided? Sequence Number = n*d+m+offset. Where n is the sequence order number, d is the dimensions of the multi-master replication, m ranges from 0 to n-1 is the number assigned to each node in the replication, and offset is the number to offset the sequence numbers. For a 4-ways multi-master replication where m=4, y is in (0, 1, 2, 3), and offset is 100. Node #1 (m=0) :Sequence number = n*4+100 Node #2 (m=1): Sequence number = n*4+101 Node #3 (m=2): Sequence number = n*4+102 Node #4 (m=3): Sequence number = n*4+103 Each sequence will have: 100, 104, 108,112, 116, 120,... 101, 105, 109, 113, 117, 121,... 102, 106, 110, 114, 118, 122... 
103, 107, 111, 115, 119, 123 * Test case * CREATE UNLOGGED TABLE test_bigint ( id bigint PRIMARY KEY); CREATE UNLOGGED TABLE test_uuid ( id uuid PRIMARY KEY); CREATE UNLOGGED TABLE test_uuid7 ( id uuid PRIMARY KEY); CREATE UNLOGGED TABLE test_bigint_1 ( id bigint PRIMARY KEY); CREATE UNLOGGED TABLE test_uuid_1 ( id uuid PRIMARY KEY); CREATE UNLOGGED TABLE test_uuid7_1 ( id uuid PRIMARY KEY); create sequence myseq cache 32767; *** Insert Test*** INSERT INTO test_bigint select nextval('myseq') from generate_series(1,10); INSERT INTO test_uuid select gen_random_uuid() from generate_series(1,10); INSERT INTO test_uuid7 select uuid_generate_v7() from generate_series(1,10); explain analyze INSERT INTO test_bigint_1 select id from test_bigint; EXPLAIN QUERY PLAN Insert on test_bigint_1 (cost=0.00..1444.18 rows=0 width=0) (actual time=220.689..220.690 rows=0 loops=1) -> Seq Scan on test_bigint (cost=0.00..1444.18 rows=100118 width=8) (actual time=0.012..17.488 rows=10 loops=1) Planning Time: 0.137 ms Execution Time: 220.714 ms explain analyze INSERT INTO test_uuid_1 select id from test_uuid; EXPLAIN QUERY PLAN Insert on test_uuid_1 (cost=0.00..1541.85 rows=0 width=0) (actual time=311.949..311.950 rows=0 loops=1) -> Seq Scan on test_uuid (cost=0.00..1541.85 rows=100085 width=16) (actual time=0.010..20.173 rows=10 loops=1) Planning Time: 0.082 ms Execution Time: 311.973 ms explain analyze INSERT INTO test_uuid7_1 select id from test_uuid7; EXPLAIN QUERY PLAN Insert on test_uuid7_1 (cost=0.00..1541.85 rows=0 width=0) (actual time=244.531..244.532 rows=0 loops=1) -> Seq Scan on test_uuid7 (cost=0.00..1541.85 rows=100085 width=16) (actual time=0.011..16.390 rows=10 loops=1) Planning Time: 0.084 ms Execution Time: 244.554 ms Explain analyze select * from test_bigint where id in (select id from test_bigint limit 10); Nested Loop (cost=2692.77..3793.23 rows=50059 width=8) (actual time=53.739..368.110 rows=10 loops=1) -> HashAggregate (cost=2692.48..2694.48 rows=200 width=8) (actual time=53.681..93.044 rows=10 loops=1) Group Key: test_bigint_1.id Batches: 5 Memory Usage: 11073kB Disk Usage: 208kB -> Limit (cost=0.00..1442.48 rows=10 width=8) (actual time=0.020..18.985 rows=10 loops=1) -> Seq Scan on test_bigint test_bigint_1 (cost=0.00..1444.18 rows=100118 width=8) (actual time=0.019..11.330 rows=10 loops=1) -> Index Only Scan using test_bigint_pkey on test_bigint (cost=0.29..6.53 rows=1 width=8) (actual time=0.002..0.002 rows=1 loops=10) Index Cond: (id = test_bigint_1.id) Heap Fetches: 10 Planning Time: 0.373 ms Execution Time: 373.440 ms EXPLAIN Explain analyze select * from test_uuid where id in (select id from test_uuid limit 10); QUERY PLAN Nested Loop (cost=2790.96..4006.29 rows=50042 width=16) (actual time=48.251..410.786 rows=10 loops=1) -> HashAggregate (cost=2790.54..2792.54 rows=200 width=16) (actual time=48.157..76.176 rows=10 loops=1) Group Key: test_uuid_1.id Batches: 1 Memory Usage
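For illustration, the interleaving scheme described earlier in this message (Sequence Number = n*d + m + offset) maps directly onto per-node sequences with a shared INCREMENT and staggered START values; the sequence name here is made up, and each node only ever generates values from its own residue class:

-- Node #1 (m=0) generates 100, 104, 108, ...
CREATE SEQUENCE order_id_seq START WITH 100 INCREMENT BY 4;
-- Node #2 (m=1) generates 101, 105, 109, ...
-- CREATE SEQUENCE order_id_seq START WITH 101 INCREMENT BY 4;
-- Node #3 (m=2) generates 102, 106, 110, ...
-- CREATE SEQUENCE order_id_seq START WITH 102 INCREMENT BY 4;
-- Node #4 (m=3) generates 103, 107, 111, ...
-- CREATE SEQUENCE order_id_seq START WITH 103 INCREMENT BY 4;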
Re: Table partition with primary key in 11.3
>> CREATE TABLE public.test1 (
>> x1 integer NOT NULL,
>> x2 integer NOT NULL,
>> CONSTRAINT test1_pkey PRIMARY KEY (x1) INCLUDE(x2)
>> ) PARTITION BY RANGE (x2);
>> This query works in 11.1 but fails in 11.3 with messages:
>> ERROR: insufficient columns in PRIMARY KEY constraint definition
>> DETAIL: PRIMARY KEY constraint on table "test1" lacks column "x2" which
>> is part of the partition key.
>> SQL state: 0A000

> Indeed, that primary key is no good. It was a bug that 11.1
> allowed it, which was fixed here:
> Author: Alvaro Herrera
> Branch: master [0ad41cf53] 2019-01-14 19:28:10 -0300
> Branch: REL_11_STABLE Release: REL_11_2 [74aa7e046] 2019-01-14 19:25:19 -0300
> Fix unique INCLUDE indexes on partitioned tables
> We were considering the INCLUDE columns as part of the key, allowing
> unicity-violating rows to be inserted in different partitions.
> Concurrent development conflict in eb7ed3f30634 and 8224de4f42cc.
> Reported-by: Justin Pryzby
> Discussion: https://postgr.es/m/20190109065109.ga4...@telsasoft.com
> regards, tom lane

I think this position on primary keys in partitioned tables is not right. With a regular table, a single-column primary key is unique across the whole table. From the user's point of view, a partitioned table should offer the same thing: a unique single-column primary key. That is the user requirement, and other kinds of keys break the equivalence between regular and partitioned tables and make partitioning much less useful. For the administrators of the table, a single-column unique primary key enforced across all partitions should be a MUST, not a MAY. All other realizations are plugging holes in a hurry and are not logical when looking at the engine as a whole.
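For reference, a sketch of the definition that 11.2 and later do accept: because uniqueness is enforced per-partition, the partition key column has to be part of the key itself (not just an INCLUDE column), so that two rows with equal keys can never land in different partitions.

CREATE TABLE public.test1 (
    x1 integer NOT NULL,
    x2 integer NOT NULL,
    CONSTRAINT test1_pkey PRIMARY KEY (x1, x2)   -- partition key column in the key proper
) PARTITION BY RANGE (x2);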
Question on Partition key
Hello Friends,

We are trying to create a monthly range-partitioned table, partitioned on the column PART_DATE. It will hold orders, and PART_DATE is simply the invoice date. Some teammates are asking to make the "PART_DATE" column an INTEGER in "YYYYMM" format and to define the partitions as shown below. We would like to know the experts' views on this: does the data type of the partition key matter here, and is there any downside to either approach down the road?

-- With the date data type it would look like this:
CREATE TABLE TAB1 (
    COLUMN1   VARCHAR(36) NOT NULL,
    PART_DATE DATE NOT NULL
) PARTITION BY RANGE (PART_DATE);

CREATE TABLE TAB1_202309 PARTITION OF TAB1 FOR VALUES FROM ('2023-09-01') TO ('2023-10-01');
CREATE TABLE TAB1_202310 PARTITION OF TAB1 FOR VALUES FROM ('2023-10-01') TO ('2023-11-01');
CREATE TABLE TAB1_202311 PARTITION OF TAB1 FOR VALUES FROM ('2023-11-01') TO ('2023-12-01');

ALTER TABLE TAB1 ADD CONSTRAINT PK_TAB1 PRIMARY KEY (COLUMN1, PART_DATE);

VS

-- With the integer data type it would look like this:
CREATE TABLE TAB1 (
    COLUMN1         VARCHAR(36) NOT NULL,
    PART_DATE_YM_NM INTEGER NOT NULL
) PARTITION BY RANGE (PART_DATE_YM_NM);

CREATE TABLE TAB1_202309 PARTITION OF TAB1 FOR VALUES FROM ('202309') TO ('202310');
CREATE TABLE TAB1_202310 PARTITION OF TAB1 FOR VALUES FROM ('202310') TO ('202311');
CREATE TABLE TAB1_202311 PARTITION OF TAB1 FOR VALUES FROM ('202311') TO ('202312');

ALTER TABLE TAB1 ADD CONSTRAINT PK_TAB1 PRIMARY KEY (COLUMN1, PART_DATE_YM_NM);
Re: Question on Partition key
Thank you so much for the clarification. The team has actually used a similar partitioning strategy on integer columns in the past, so they are inclined towards that. I will still double-check with the others whether any business restriction exists. But as you already mentioned, it is not good from a data-quality perspective, and I agree with that point.

Additionally, is it true that the optimizer can also be fooled in its cardinality estimates, because comparing or subtracting two date values is very different from doing the same with two integer values, and storing dates in integer columns hides that from the optimizer? Is my understanding correct here?

On Sun, 3 Sept, 2023, 2:02 pm Deep, wrote:

> Have your friends also mentioned how it is going to help to convert date
> field to integer !???
>
> On Sun, Sep 3, 2023 at 3:51 AM Erik Wienhold wrote:
>
>> On 03/09/2023 00:35 CEST veem v wrote:
>>
>> > We are trying to create a monthly range partition table , partitioned on
>> > column PART_DATE. This will hold Orders and part_date is nothing but invoice
>> > date. Some Team mates are asking to use the "PART_DATE" column as data type
>> > "INTEGER" with "YYYYMM" format [...]
>>
>> Why do your team mates favor integer over date?
>>
>> > Want to know experts' views on this. If the data type of the partition key
>> > matters here or not?
>>
>> Both integer and date are stored as 4 bytes. There should be no difference
>> regarding index size. I don't know if the data type makes a difference in
>> partition pruning performance in this case, but I'd be surprised if it were
>> the case.
>>
>> > Or if there is any downside of each approach in future?
>>
>> The downside of integer is that it allows invalid dates (e.g. 202313) unless
>> you also add check constraints. But then just use date if you want to store
>> dates. You get input validation and can use the date operators and functions
>> that Postgres offers.
>>
>> --
>> Erik
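To make both points above concrete (table and column names are taken from the earlier DDL; the constraint name is made up): the integer key needs its own guard against impossible months, and the arithmetic the planner sees really is different for the two types.

-- Guard against values like 202313 on the integer variant:
ALTER TABLE tab1
    ADD CONSTRAINT part_date_ym_nm_valid
    CHECK (part_date_ym_nm % 100 BETWEEN 1 AND 12);

-- Adjacent months as dates differ by roughly 30 days, uniformly:
SELECT DATE '2023-10-01' - DATE '2023-09-01';                            -- 30
-- Adjacent months as YYYYMM integers differ by 1, except at year
-- boundaries where the gap jumps to 89, which skews range and
-- cardinality estimates that treat the column as evenly spaced integers:
SELECT 202310 - 202309 AS same_year, 202401 - 202312 AS year_boundary;   -- 1, 89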
Database selection
Hello All,

I see multiple documents on the internet describing the differences between MySQL and Postgres, but I want to understand the real-world views of the experienced folks here.

We are on AWS and planning to pick one of the two relational databases, MySQL or Postgres, for our application. It is a hybrid workload: mostly batch processing, but it also supports OLTP-style UI screens, and ACID transactions are preferred. What parameters should we keep in mind when choosing a database here? In other words, how should we evaluate which database is suitable for our application?

Regards
Veem
Re: Database selection
Thank you. Yes, feature-wise Postgres seems rich, judging from the blogs I have read, and right now I can't think of any other reason to choose MySQL over Postgres.

However, one thing about transaction management that I see in most Postgres blogs is that MVCC (multiversion concurrency control) can become an issue in a heavily concurrent system: Postgres keeps old versions of updated rows around (in case someone is still reading them), and cleaning them up (vacuuming) can become a pain. Does AWS Aurora Postgres rely on the same vacuuming mechanism for this? Does Aurora MySQL use a similar strategy for transaction management, or something different/better?

On Thu, 21 Sept 2023 at 02:09, Laurenz Albe wrote:

> On Thu, 2023-09-21 at 01:17 +0530, veem v wrote:
> > I see multiple docs on the internet, stating the difference between Mysql and postgres.
> > But I want to understand the real views of the experienced folks here.
> >
> > While we are on AWS and planning to opt for one of the relational databases out of
> > mysql and postgres for our application(It is a Hybrid type, mostly dealing with
> > batch processing but also supporting OLTP type UI screens etc. and ACID transactions preferred).
> >
> > What all things/parameters should we keep in mind here for opting a database?
> > or in other words how to evaluate the suitable database for our application?
>
> You are asking the wrong people. We here on the PostgreSQL mailing list will tell
> you that you shouldn't touch MySQL except with a long stick, and for almost all
> use cases PostgreSQL is preferable. The people on MySQL mailing lists or forums
> might tell an entirely different story.
>
> I would say something about license and free software, but if you plan to lock
> yourself into the cage of a cloud hosted database, that probably doesn't
> matter much.
>
> Yours,
> Laurenz Albe
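On the vacuum question, a small sketch of how this is usually watched in community Postgres (the same statistics views are generally exposed by the managed offerings as well, though whether Aurora's storage layer changes the underlying vacuum work is a separate question):

-- Tables with the most dead row versions and their last autovacuum runs:
SELECT relname,
       n_live_tup,
       n_dead_tup,
       last_autovacuum,
       autovacuum_count
FROM   pg_stat_user_tables
ORDER  BY n_dead_tup DESC
LIMIT  10;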
Question on Aurora postgres
Hello all,

We are planning to use Aurora Postgres for a few applications, but we would like some guidance on which instance class type we should use for the lower/non-prod environments (Dev/QA/UAT/perf) and what we should use for the production database. Is there a recommendation based on usage etc. for this?

Regards
Veem
Re: Question on Aurora postgres
Thank you so much. So basically this is an enterprise application with 24/7 users (doing both DML and queries), currently running on an on-premise Oracle database. It holds data in the range of ~20-30 TB, and queries run against it around the clock. It is an OLTP app. Of course, the lower environments (Dev/QA/UAT) do not have the kind of usage and data volume that prod has.

In the document below I see a lot of instance types, such as the memory-optimized X family, the R family, and T-based instance classes, so I am a bit confused about how to get started, and whether we can move from one type to another seamlessly later if required. Will the CPU, memory, and storage listed against each of these instance classes map directly to the CPU, memory, and storage footprint of the data we currently have in our on-premise Oracle database?

https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/Concepts.DBInstanceClass.html

On Wed, 1 Nov 2023 at 04:55, Ben Chobot wrote:

> veem v wrote on 10/31/23 2:49 PM:
>
> Hello all,
> We are planning to use aurora Postgres for a few applications. But wanted
> some guidance on which instance, class type should we use for lower
> environment/non prod like Dev/Qa/Uat/perf and what should we use for
> production database? Is there some recommendation based on usage etc. for
> this?
>
>
> Regardless of if you are going to be making databases in a cloud provider
> or on physical hardware, the first step is to figure out what your load
> will likely be and what variables will be influencing it. Will you have a
> lot of data? Will it churn very much? Will you have a lot of concurrent
> reads? Will they be simple queries or memory intensive ones?
>
> Only you can answer these questions, and once you have, the many (many)
> options available to you will start to separate out and there will be some
> obvious things to try. One of the great things about making databases in
> the cloud is that you aren't locked into any sizing choice like you can by
> if you go build a server.
Question on overall design
Hello All,

Although this is not strictly about open-source Postgres, I want to ask it here to get the views of colleagues with decades of experience in the database world. We would like some guidance on whether the design below looks okay for our customer use case.

We currently stream financial-system transaction data to an on-premise Oracle Exadata (X9) database. This database processes about 400 million transactions per day. A single transaction for us is a combination of 7-8 inserts into different transaction tables, with indexes, unique constraints etc. defined on them. Transactions are processed/committed in batches (~1000 per batch), and the system persists data for ~6 months. All sorts of OLAP applications (daily/monthly batch reports) run on the same database, alongside user-facing UI applications that show customer transactions. So it currently serves a hybrid workload and is a one-stop solution for all use cases.

Many of the applications are moving from on-premise to the AWS cloud as part of a modernization journey, with AWS as the chosen cloud partner. The product is also expected to expand into more regions, so this system is expected to serve an increase in transaction volume. In addition, we have a requirement to persist transaction data for ~10 years so that it remains available for analytics/data-science use cases.

So the team is thinking of splitting it into two parts:

1) An OLTP-type use case in which we persist/write the transaction data quickly and show it to the UI-related apps in near real time. This database would store at most 60-90 days of transaction data. We are not sure there is an Oracle Exadata equivalent on AWS, so the team is planning to use/experiment with Aurora Postgres. Please correct me if there are other options we should consider instead.

2) Data older than ~90 days would then move into another database or into object storage (S3), where it would be kept for ~10 years and remain queryable through the necessary APIs. That part is supposed to cater to the OLAP/analytics/data-science use cases.

Is the above design okay? Also, regarding the second point, i.e. keeping the historical data in a queryable state: should we go for a database like Snowflake, or just keep it on S3 as-is and make it queryable through APIs? Please advise.
Re: Question on overall design
Ron Johnson wrote: > "OK" is relative, but it's what we did in a similar situation: two years > of data on-line and 5 years of data in compressed files in S3. (We're > required to keep data for 7 years, but they *never* ask for records more > than 2 years old. If they ever do, we'll manually load the relevant data > back into PG.) > (I can't imagine that querying billions of unindexed flat-file records via > S3 would be fast.) > How often do end users look for data more than 90 days old? Two years old? > How quickly does the old data need to be available? > Maybe a four-tiered system of PG-Snowflake-S3-S3_Glacier would balance > speed and cost (or maybe Snowflake would just drive up costs). Thank You so much Ron. When you said "*two years of data on-line and 5 years of data in compressed files in S3*." So do you mean two years of data in aurora postgre which will be OLTP database and rest just dump as is in S3 bucket and when we need any such query for those data , just simply read and dump those S3 files back in the aurora postgre? In the currently running oracle exadata system , it has SIX months of data (which is ~200TB) and the transaction tables are all range partitioned on a daily basis. And out of that ~2months of data gets frequently queried and other ~4months of data gets queried less frequently. However, in the target architecture which we want to build on cloud here, there are some requirements for the analytics/data science team to query ~3years history. Beyond ~3years we may need that data rarely. We were initially thinking of just having one database to serve both OLTP and OLAP use cases(somewhat like oracle exadata currently doing for us) but it seems we don't have such an option here on AWS. Postgre will serve OLTP use case whereas Snowflake will serve OLAP use case. So do you suggest having both the databases in use, like recent transaction data for last 3 months should be streamed to aurora postgre, then from 3months till 3years of data should be parked in snowflake which will serve OLAP/analytics use case. and from 3years till 10years will be kept in S3 (as parquet or Iceberg format) so that even Snowflake can query those directly when needed. OR Do you suggest just keeping last ~3months of data on Aurora postgre and rest everything on snowflake considering it will store those as compressed format and also storage is cheap(23$ per TB per month)? Few colleagues are pointing to databricks for the analytics use case. Is that a good option here? On Sat, 9 Dec 2023 at 16:43, veem v wrote: > Hello All, > Although it's not exactly related to opensource postgre but want to ask > this question here to understand colleagues' view, considering having > decades of experience in the database world, We want some guidance, if the > below design looks okay for our customer use case. > > We currently have financial systems transaction data streams to Oracle > exadata(X9) on-premise. This database supports processing of 400million > transactions per day. A single transaction for us is a combination of 7-8 > inserts into different transaction tables with Indexes , unique constraints > etc defined on those. The transactions processed/committed in batches(~1000 > batch size) in the database. And this system persists data for ~6 months. > We do have all sorts of OLAP(daily/monthly batch reports running) > applications run on the same database along with some user facing UI > applications showing customer transactions. 
So it's basically currently > serving a hybrid workload and is one stop solution for all use cases. > > Many of the applications are moving from on premise to AWS cloud as part > of modernization journey and AWS being chosen cloud partner also the > product is expected to expand across more regions and this system is > expected to serve increase in the transaction volume. And also we have a > requirement to persist transaction data for ~10years to have those > available for analytics/data science use cases. > > So the team is thinking of splitting it into two parts > 1)OLTP type use case in which we will persist/write the transaction data > faster and show it to the UI related apps , in near real time/quickest > possible time. and this database will store Max 60-90 days of transaction > data. Not sure if we have an option of Oracle exadata equivalent on AWS, so > team planning of using/experimenting with Aurora postgres. Please correct > me, if there are any other options we should use otherwise? > > 2)Then move the data beyond ~90 days into another database or object > storage S3 which will keep it there for ~10 years and will be queryable > using the necessary API's. That is supposed to cater to Olap/analytics/data > science use cases etc. > > Is the above des
Re: Question on overall design
Thank you so much for the response. Got your point, will check if we really need details or summary for the historical data. But it looks like we will need detailed transaction data for ~2 years at least. My understanding was that AWS has two different offerings and "aurora postgresql" is more performant and costlier than "RDS postgresql". Will double check on this though. However , how to prove RDS/aurora postgresql is going to serve our OLTP requirement here , similar to the current on premise Oracle exadata. For the OLTP use case we are expecting ~15K TPS write and 2K TPS read and the response for the UI queries are expected to be within subseconds. But yes, as i mentioned individual transactions will be batched and then will be written to the database, so this should have lesser resource consumption and contention created. To test if Aurora postgresql will be comparable to cater the above needs (in regards to the expected performance with nominal cost) ,how should we test it? As we won't be able to test everything right away, Should we test basic read and write performance and benchmark to have some confidence and go ahead with development? Say for example if one transaction consists of ~8 Inserts we can create a sample target table on aurora Postgre with required indexes/constraints and try running those inserts from multiple threads(for concurrency) using blazemeter and see/compare the response time, CPU, IO, Memory usage for the Postgresql database with set TPS. Similarly to see read performance, we can run multiple select queries from blazemeter and compare the response time. Is this the correct approach for validating the database here or any other approach exists? And another question coming to mind, I read in past Vaccum to be a problem in postgresql, is it going to give trouble in Aurora postgresql too, for such a highly transactional read/write system? How to test/validate that? On Sun, 10 Dec 2023 at 01:29, Ron Johnson wrote: > On Sat, Dec 9, 2023 at 2:13 PM veem v wrote: > >> >> Ron Johnson >> wrote: >> >>> "OK" is relative, but it's what we did in a similar situation: two years >>> of data on-line and 5 years of data in compressed files in S3. (We're >>> required to keep data for 7 years, but they *never* ask for records >>> more than 2 years old. If they ever do, we'll manually load the relevant >>> data back into PG.) >>> (I can't imagine that querying billions of unindexed flat-file records >>> via S3 would be fast.) >>> How often do end users look for data more than 90 days old? Two years >>> old? >>> How quickly does the old data need to be available? >>> Maybe a four-tiered system of PG-Snowflake-S3-S3_Glacier would balance >>> speed and cost (or maybe Snowflake would just drive up costs). >> >> >> Thank You so much Ron. >> >> When you said "*two years of data on-line and 5 years of data in >> compressed files in S3*." So do you mean two years of data in aurora >> postgre which will be OLTP database and rest just dump as is in S3 bucket >> and when we need any such query for those data , just simply read and dump >> those S3 files back in the aurora postgre? >> > > (RDS Postgresql, not Aurora, but that's beside the point.) > > Yes. > > But keep reading... > > In the currently running oracle exadata system , it has SIX months of data >> (which is ~200TB) and the transaction tables are all range partitioned on a >> daily basis. And out of that ~2months of data gets frequently queried and >> other ~4months of data gets queried less frequently. 
However, in the >> target architecture which we want to build on cloud here, there are some >> requirements for the analytics/data science team to query ~3years history. >> Beyond ~3years we may need that data rarely. >> >> We were initially thinking of just having one database to serve both OLTP >> and OLAP use cases(somewhat like oracle exadata currently doing for us) but >> it seems we don't have such an option here on AWS. Postgre will serve OLTP >> use case whereas Snowflake will serve OLAP use case. >> >> So do you suggest having both the databases in use, like recent >> transaction data for last 3 months should be streamed to aurora postgre, >> then from 3months till 3years of data should be parked in snowflake which >> will serve OLAP/analytics use case. and from 3years till 10years will be >> kept in S3 (as parquet or Iceberg format) so that even Snowflake can query >> those directly when needed. >> >> OR >> >> Do you suggest just keeping last ~3months of data on Aurora postgre and >> rest every
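A minimal sketch of the kind of batched write test described earlier in this thread (all table and column names are invented for illustration): one business transaction becomes one header row plus several detail rows committed together, and the load tool runs many of these blocks concurrently.

CREATE TABLE txn_header (
    txn_id  bigint PRIMARY KEY,
    acct_id bigint NOT NULL,
    txn_ts  timestamptz NOT NULL
);
CREATE TABLE txn_detail (
    txn_id  bigint REFERENCES txn_header (txn_id),
    line_no int,
    amount  numeric(12,2),
    PRIMARY KEY (txn_id, line_no)
);

-- One batched "business transaction": 1 header + 7 detail inserts.
BEGIN;
INSERT INTO txn_header VALUES (1, 42, now());
INSERT INTO txn_detail
SELECT 1, g, round((random() * 100)::numeric, 2)
FROM generate_series(1, 7) AS g;
COMMIT;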
Re: Question on overall design
Thank you so much Ron. I have some more doubts related to this. We were thinking , if there is any utility in PG with which we can create/generate large sample data volume which we can use to run it on our on premise Oracle exadata box and use the same on the aurora postgresql in cloud to see the read and write performance comparison. Is there any such exists, to quickly get some large data volume? But i think , what you are saying is, we should use real data to get actual or closest possible benchmarks, correct me if wrong? We used to see the data dictionary views (called AWR views) in Oracle to see the current and historical performance statistics like CPU, IO , Memory usage, object level contentions etc. in the oracle database. Do we have such a thing available in Aurora postgre, so as to monitor the performance and get some idea of how well the load test goes and what capacity is available or are we saturating it? When you said "*Beware of excessive partitioning. We had to "departion" most tables, because of the impact on performance.*" , as I understand partitioning helps in reducing IO while reading , as it scans less data because of partition pruning. And while writing there is almost minimal overhead in identifying the required partition to which the INSERTed/Updated/Deleted data will be landing ON. So can you please help me understand what exact performance impact you are highlighting here? Are you talking about the size of the partitions or total count of the partitions? In our case we are planning to do either daily range partition or hourly based on data data volume, not sure if there exists some sweet spot in regards to what should be the size of each partition in postgresql be. If you are pointing to higher count of partitions of table , then in our case if we persist ~90 days data then for a transaction table it would be ~90 daily range partitions or ~2160 hourly range partitions in the aurora postgresql. It would be helpful , if you can explain a bit regarding what exact performance impact you faced in regards to the partitioning in aurora postgresql. *"Use ora2pg to export views in the Oracle database. It's very easy; a small EC2 VM running Linux with enough disk space lets you automate the extraction from Oracle and importation into AWS Postgresql.)"* Need to explore a bit more on this I believe. We have an oracle on premise database, so we can move data directly to aurora postgresql in the cloud. Another thing , is we have some sample data available in the AWS snowflake but then not sure if some mechanism is available to move the same data to the aurora postgresql ? On Sun, 10 Dec 2023 at 02:27, Ron Johnson wrote: > I don't know anything about Aurora, only have experience with RDS > Postgresql. > > We successfully migrated from on-prem Oracle (12c, I think) to RDS > Postgresql 12, and were very happy: little down time (I take pride in > designing/implementing that; thanks, ora2pg!), with lower disk (8TB, down > to 5TB) and CPU usage. > > I'm not sure what the TPS was in Oracle, but the server level "we" are on > (I'm no longer with that company, and don't remember the specifics (48 vCPU > / 132 GB RAM, I think, with 10,000 IOPs) is over-powered for *our* needs. > > You're going to have to spin up a full-sized instance, import a *lot* of > real data(*) into a production-identical schema and then run your batch > load process using test data (or copies of real batches). That's the only > way you'll *really* know. > > Beware of excessive partitioning. 
We had to "departion" most tables, > because of the impact on performance. > > (*) Use ora2pg to export views in the Oracle database. It's *very* easy; > a small EC2 VM running Linux with enough disk space lets you automate the > extraction from Oracle and importation into AWS Postgresql.) > > On Sat, Dec 9, 2023 at 3:36 PM veem v wrote: > >> Thank you so much for the response. >> >> Got your point, will check if we really need details or summary for the >> historical data. But it looks like we will need detailed transaction data >> for ~2 years at least. >> >> My understanding was that AWS has two different offerings and "aurora >> postgresql" is more performant and costlier than "RDS postgresql". Will >> double check on this though. >> >> However , how to prove RDS/aurora postgresql is going to serve our OLTP >> requirement here , similar to the current on premise Oracle exadata. For >> the OLTP use case we are expecting ~15K TPS write and 2K TPS read and the >> response for the UI queries are expected to be within subseconds. But yes, >> as i mentioned individual transactions will be batched and then will be >> written to the database, so
Re: Question on overall design
Thank you for your response. Apology if it sounds silly, but is it advisable to use just one database in this flow, say for e.g snowflake for olap usecase and use a cache layer(like reddish) on top of it to cater our oltp Usecase, rather having a dedicated oltp database like Aurora postgresql? Basically wanted to know, if anythings we would be missing if not having an oltp database here for our usecase. On Sun, 10 Dec, 2023, 10:25 pm Ron Johnson, wrote: > * PG has pgbench; *maybe* you can hack it to work on Oracle. > * If you want to know how well an RDBMS will work on your workload, then > you must provide it with a simulated workload. Right? > * AWS RDS Postgresql has a dashboard that *might* be similar to AWR. Or > it might not... > * We departitioned because SELECT statements were *slow*. All partitions > were scanned, even when the partition key was specified in the WHERE clause. > > On Sun, Dec 10, 2023 at 8:45 AM veem v wrote: > >> Thank you so much Ron. I have some more doubts related to this. >> >> We were thinking , if there is any utility in PG with which we can >> create/generate large sample data volume which we can use to run it on our >> on premise Oracle exadata box and use the same on the aurora postgresql in >> cloud to see the read and write performance comparison. Is there any such >> exists, to quickly get some large data volume? But i think , what you are >> saying is, we should use real data to get actual or closest possible >> benchmarks, correct me if wrong? >> >> We used to see the data dictionary views (called AWR views) in Oracle to >> see the current and historical performance statistics like CPU, IO , Memory >> usage, object level contentions etc. in the oracle database. Do we have >> such a thing available in Aurora postgre, so as to monitor the >> performance and get some idea of how well the load test goes and what >> capacity is available or are we saturating it? >> >> When you said "*Beware of excessive partitioning. We had to "departion" >> most tables, because of the impact on performance.*" , as I understand >> partitioning helps in reducing IO while reading , as it scans less data >> because of partition pruning. And while writing there is almost minimal >> overhead in identifying the required partition to which the >> INSERTed/Updated/Deleted data will be landing ON. So can you please help me >> understand what exact performance impact you are highlighting here? Are you >> talking about the size of the partitions or total count of the partitions? >> In our case we are planning to do either daily range partition or hourly >> based on data data volume, not sure if there exists some sweet spot in >> regards to what should be the size of each partition in postgresql be. If >> you are pointing to higher count of partitions of table , then in our case >> if we persist ~90 days data then for a transaction table it would be ~90 >> daily range partitions or ~2160 hourly range partitions in the aurora >> postgresql. It would be helpful , if you can explain a bit regarding what >> exact performance impact you faced in regards to the partitioning in aurora >> postgresql. >> >> *"Use ora2pg to export views in the Oracle database. It's very easy; a >> small EC2 VM running Linux with enough disk space lets you automate the >> extraction from Oracle and importation into AWS Postgresql.)"* >> >> Need to explore a bit more on this I believe. We have an oracle on >> premise database, so we can move data directly to aurora postgresql in the >> cloud. 
Another thing , is we have some sample data available in the AWS >> snowflake but then not sure if some mechanism is available to move the same >> data to the aurora postgresql ? >> >> On Sun, 10 Dec 2023 at 02:27, Ron Johnson >> wrote: >> >>> I don't know anything about Aurora, only have experience with RDS >>> Postgresql. >>> >>> We successfully migrated from on-prem Oracle (12c, I think) to RDS >>> Postgresql 12, and were very happy: little down time (I take pride in >>> designing/implementing that; thanks, ora2pg!), with lower disk (8TB, down >>> to 5TB) and CPU usage. >>> >>> I'm not sure what the TPS was in Oracle, but the server level "we" are >>> on (I'm no longer with that company, and don't remember the specifics (48 >>> vCPU / 132 GB RAM, I think, with 10,000 IOPs) is over-powered for *our* >>> needs. >>> >>> You're going to have to spin up a full-sized instance, import a *lot* of >>> real data(*)
Read write performance check
Hello All, This is an Aurora PostgreSQL database in AWS. We have a couple of custom tables created with ~100-odd columns, plus the required indexes and constraints, representing a certain business use case. Currently there is no data in them. The requirement is to test read and write query performance for these sample use cases/tables. We need to evaluate both row-by-row and batch read/write performance, and also performance with concurrent reads/writes from multiple threads/sessions. As I understand it, the following needs to be done: 1) For write performance, rows need to be inserted from multiple sessions at the same time, with suitable random values matching the data types (character, number, date columns), and this needs to be tested for both row-by-row inserts and batched inserts. 2) For the read performance test, the table first has to be populated with those ~100 million rows; then it will be queried row by row and in batches from concurrent sessions. I am new to PostgreSQL and have mostly worked with Oracle, so I wanted to understand whether the above can be achieved with a simple procedure or whether a major effort would be needed. I also saw a few blogs suggesting pgbench for concurrency tests; will that still work against Aurora PostgreSQL, driven from an IntelliJ client worksheet? Regards Veem
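For reference, a minimal sketch of driving such a test with a pgbench custom script (the script name, table and column names are assumptions, not from the thread); since pgbench only needs a network connection to the Aurora endpoint, it can run from any machine that has it installed rather than from the IntelliJ worksheet:

-- insert_test.sql (custom pgbench script)
\set v random(1, 100000)
INSERT INTO test_table (col1, col2, col3, created_at)
VALUES (md5(random()::text), :v, :v * 1.5, now());

# then, from a shell: 16 concurrent clients for 5 minutes, TPS is reported at the end
pgbench -h <aurora-endpoint> -U app_user -d testdb -n -f insert_test.sql -c 16 -j 4 -T 300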
Re: Read write performance check
Thank you for the confirmation. So first we need to populate the base tables with the necessary data (say 100 million rows), with the required skew, using random functions to generate variation in the values of the different data types. Then for the row-by-row write/read test we can traverse in a cursor loop, and for the batch write/insert we would use a bulk-collect-style loop. Something like below, and this code can then be wrapped into a procedure and passed to pgbench and executed from there. Please correct me if I'm wrong. Also, can you please guide me on how the batch insert (say with a batch size of ~1000) can be written?

-- Row by row write
DO $$
DECLARE
    total_rows int := 100000;
BEGIN
    FOR i IN 1..total_rows LOOP
        INSERT INTO base_table (column1, column2, column3, column4)
        VALUES (md5(random()::text),
                floor(random() * 100)::int,
                (random() * 1000)::numeric,
                now());
    END LOOP;
END $$;

-- Row by row read
DO $$
DECLARE
    total_rows int := 100000;
    data_row   base_table%ROWTYPE;
BEGIN
    FOR i IN 1..total_rows LOOP
        SELECT * INTO data_row
        FROM base_table
        WHERE column2 = (i % 100)
        LIMIT 1;
    END LOOP;
END $$;

-- Batch read (a single set-based statement; PostgreSQL has no BULK COLLECT)
DO $$
DECLARE
    batch_rows bigint;
BEGIN
    SELECT count(*) INTO batch_rows
    FROM (SELECT * FROM base_table LIMIT 100000) AS s;
END $$;

On Tue, 19 Dec 2023 at 12:58, Ilya Kosmodemiansky wrote: > Hi Veem, > > On Tue, Dec 19, 2023 at 7:36 AM veem v wrote: > > 1)For write performance , the rows needs to be inserted from multiple > sessions at same time, with required random values as per the data types > i.e. Character, Number, date columns. And this needs to be tested for row > by row insert and batched insert. > > > > 2)For the read performance test , the table first has to be populated > with those ~100million rows. Then querying will happen on that table row by > row and batched way from the concurrent session. > > > > I am new to postgresql but mostly worked with Oracle, so I wanted to > understand if the above can be achieved by creating a simple procedure or > will a major effort be needed? And I saw a few blogs , pgbench to be used > for concurrency tests. > > Yes, you are right, pgbench with customized script is what you are looking > for > > >I want to know if this will still work on Aurora postgresql from intellij > client worksheet. > > pgbench would work with aurora as with normal postgres, it is > basically a shell script which connects to the database. Not sure if > idea worksheet would help you in such case however, you can run it > just from any machine with pgbench installed > > > Best regards, > Ilya > > > -- > Ilya Kosmodemiansky > CEO, Founder > > Data Egret GmbH > Your remote PostgreSQL DBA team > T.: +49 6821 919 3297 > i...@dataegret.com >
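On the question above about how a batched insert (say a batch size of ~1000) can be written: PostgreSQL has no BULK COLLECT, but a single multi-row statement plays the same role. A minimal sketch against the same hypothetical BASE_TABLE:

-- one statement writes a batch of 1000 rows
INSERT INTO base_table (column1, column2, column3, column4)
SELECT md5(random()::text),
       floor(random() * 100)::int,
       (random() * 1000)::numeric,
       now()
FROM generate_series(1, 1000);

Wrapped in a loop or a pgbench script, each execution then writes one batch instead of one row.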
Re: Read write performance check
Thank you. Yes, actually we are trying to compare and see what maximum TPS are we able to reach with both of these row by row and batch read/write test. And then afterwards, this figure may be compared with other databases etc with similar setups. So wanted to understand from experts here, if this approach is fine? Or some other approach is advisable? I agree to the point that , network will play a role in real world app, but here, we are mainly wanted to see the database capability, as network will always play a similar kind of role across all databases. Do you suggest some other approach to achieve this objective? On Wed, 20 Dec, 2023, 2:42 am Peter J. Holzer, wrote: > On 2023-12-20 00:44:48 +0530, veem v wrote: > > So at first, we need to populate the base tables with the necessary > data (say > > 100million rows) with required skewness using random functions to > generate the > > variation in the values of different data types. Then in case of row by > row > > write/read test , we can traverse in a cursor loop. and in case of batch > write/ > > insert , we need to traverse in a bulk collect loop. Something like > below and > > then this code can be wrapped into a procedure and passed to the pgbench > and > > executed from there. Please correct me if I'm wrong. > > One important point to consider for benchmarks is that your benchmark > has to be similar to the real application to be useful. If your real > application runs on a different node and connects to the database over > the network, a benchmark running within a stored procedure may not be > very indicative of real performance. > > hp > > -- >_ | Peter J. Holzer| Story must make more sense than reality. > |_|_) || > | | | h...@hjp.at |-- Charles Stross, "Creative writing > __/ | http://www.hjp.at/ | challenge!" >
Re: Read write performance check
Thank you. That would really be helpful if such test scripts or similar setups are already available. Can someone please guide me to some docs or blogs or sample scripts, on same please. On Wed, 20 Dec, 2023, 10:34 am Lok P, wrote: > As Rob mentioned, the syntax you posted is not correct. You need to > process or read a certain batch of rows like 1000 or 10k etc. Not all 100M > at one shot. > > But again your uses case seems common one considering you want to compare > the read and write performance on multiple databases with similar table > structure as per your usecase. So in that case, you may want to use some > test scripts which others must have already done rather reinventing the > wheel. > > > On Wed, 20 Dec, 2023, 10:19 am veem v, wrote: > >> Thank you. >> >> Yes, actually we are trying to compare and see what maximum TPS are we >> able to reach with both of these row by row and batch read/write test. And >> then afterwards, this figure may be compared with other databases etc with >> similar setups. >> >> So wanted to understand from experts here, if this approach is fine? Or >> some other approach is advisable? >> >> I agree to the point that , network will play a role in real world app, >> but here, we are mainly wanted to see the database capability, as network >> will always play a similar kind of role across all databases. Do you >> suggest some other approach to achieve this objective? >> >> >> On Wed, 20 Dec, 2023, 2:42 am Peter J. Holzer, wrote: >> >>> On 2023-12-20 00:44:48 +0530, veem v wrote: >>> > So at first, we need to populate the base tables with the necessary >>> data (say >>> > 100million rows) with required skewness using random functions to >>> generate the >>> > variation in the values of different data types. Then in case of row >>> by row >>> > write/read test , we can traverse in a cursor loop. and in case of >>> batch write/ >>> > insert , we need to traverse in a bulk collect loop. Something like >>> below and >>> > then this code can be wrapped into a procedure and passed to the >>> pgbench and >>> > executed from there. Please correct me if I'm wrong. >>> >>> One important point to consider for benchmarks is that your benchmark >>> has to be similar to the real application to be useful. If your real >>> application runs on a different node and connects to the database over >>> the network, a benchmark running within a stored procedure may not be >>> very indicative of real performance. >>> >>> hp >>> >>> -- >>>_ | Peter J. Holzer| Story must make more sense than reality. >>> |_|_) || >>> | | | h...@hjp.at |-- Charles Stross, "Creative writing >>> __/ | http://www.hjp.at/ | challenge!" >>> >>
Re: Read write performance check
Can someone please guide me, if any standard scripting is available for doing such read/write performance test? Or point me to any available docs? On Wed, 20 Dec, 2023, 10:39 am veem v, wrote: > Thank you. > > That would really be helpful if such test scripts or similar setups are > already available. Can someone please guide me to some docs or blogs or > sample scripts, on same please. > > On Wed, 20 Dec, 2023, 10:34 am Lok P, wrote: > >> As Rob mentioned, the syntax you posted is not correct. You need to >> process or read a certain batch of rows like 1000 or 10k etc. Not all 100M >> at one shot. >> >> But again your uses case seems common one considering you want to compare >> the read and write performance on multiple databases with similar table >> structure as per your usecase. So in that case, you may want to use some >> test scripts which others must have already done rather reinventing the >> wheel. >> >> >> On Wed, 20 Dec, 2023, 10:19 am veem v, wrote: >> >>> Thank you. >>> >>> Yes, actually we are trying to compare and see what maximum TPS are we >>> able to reach with both of these row by row and batch read/write test. And >>> then afterwards, this figure may be compared with other databases etc with >>> similar setups. >>> >>> So wanted to understand from experts here, if this approach is fine? Or >>> some other approach is advisable? >>> >>> I agree to the point that , network will play a role in real world app, >>> but here, we are mainly wanted to see the database capability, as network >>> will always play a similar kind of role across all databases. Do you >>> suggest some other approach to achieve this objective? >>> >>> >>> On Wed, 20 Dec, 2023, 2:42 am Peter J. Holzer, wrote: >>> >>>> On 2023-12-20 00:44:48 +0530, veem v wrote: >>>> > So at first, we need to populate the base tables with the necessary >>>> data (say >>>> > 100million rows) with required skewness using random functions to >>>> generate the >>>> > variation in the values of different data types. Then in case of row >>>> by row >>>> > write/read test , we can traverse in a cursor loop. and in case of >>>> batch write/ >>>> > insert , we need to traverse in a bulk collect loop. Something like >>>> below and >>>> > then this code can be wrapped into a procedure and passed to the >>>> pgbench and >>>> > executed from there. Please correct me if I'm wrong. >>>> >>>> One important point to consider for benchmarks is that your benchmark >>>> has to be similar to the real application to be useful. If your real >>>> application runs on a different node and connects to the database over >>>> the network, a benchmark running within a stored procedure may not be >>>> very indicative of real performance. >>>> >>>> hp >>>> >>>> -- >>>>_ | Peter J. Holzer| Story must make more sense than reality. >>>> |_|_) || >>>> | | | h...@hjp.at |-- Charles Stross, "Creative writing >>>> __/ | http://www.hjp.at/ | challenge!" >>>> >>>
Moving to Postgresql database
Hello Experts, Suppose a team is well versed in Oracle database architecture, in how its optimizer works, and in designing applications on top of it, and that team now moves to design/development projects on AWS Aurora PostgreSQL. Are there key design or architectural changes the application development team or the database design team should be aware of, so that they can take the right decisions on any new development project in Aurora PostgreSQL? Or is there a list of differences (compared to the Oracle database) in key concepts, for example basic design concepts, normalization, partitioning, clustering, backup and recovery, indexing strategy, isolation levels, and performance, that one should definitely be aware of?
Query running longer
Hello All, We have the below query which is running for ~45 seconds on postgres aurora reader instance. I have captured the explain analyze. Want to understand, where exactly the resources are getting spent and if we can be able to optimize it further. It's a UI query showing top 50 rows and is supposed to finish in sub seconds but it takes around ~45 seconds to finish. Also seeing multiple workers in the plan, does that mean the query is running in parallel somehow? explain (analyze,verbose,costs,buffers) select TAB1.PRI from SCHEMA1.TAB1 TAB1 inner join SCHEMA1.TAB4 TAB4 on TAB4.PRI = TAB1.PRI inner join SCHEMA1."TAB2" TAB2 on TAB2.PRI = TAB1.PRI inner join SCHEMA1.TAB3 a2 on a2.AID = TAB2.AID where TAB2.MID = X and TAB4.TAB4_code = 'XX' and TAB2.TAB2_time between '2024-01-01' and '2024-01-31' order by TAB2.TAB2_time desc limit 50; Limit (cost=13052924.01..13052924.14 rows=50 width=45) (actual time=45211.971..45224.720 rows=50 loops=1) " Output: TAB1.PRI, TAB2.TAB2_time" Buffers: shared hit=1980943 read=2335820 I/O Timings: shared/local read=112477.014 -> Sort (cost=13052924.01..13052924.19 rows=70 width=45) (actual time=45211.969..45224.713 rows=50 loops=1) "Output: TAB1.PRI, TAB2.TAB2_time" Sort Key: TAB2.TAB2_time DESC Sort Method: top-N heapsort Memory: 32kB Buffers: shared hit=1980943 read=2335820 I/O Timings: shared/local read=112477.014 -> Gather (cost=92917.38..13052921.87 rows=70 width=45) (actual time=947.004..45221.915 rows=5428 loops=1) " Output: TAB1.PRI, TAB2.TAB2_time" Workers Planned: 4 Workers Launched: 4 Buffers: shared hit=1980943 read=2335820 I/O Timings: shared/local read=112477.014 -> Nested Loop (cost=91917.38..13051914.87 rows=18 width=45) (actual time=945.946..45195.224 rows=1086 loops=5) "Output: TAB1.PRI, TAB2.TAB2_time" Inner Unique: true Buffers: shared hit=1980943 read=2335820 I/O Timings: shared/local read=112477.014 Worker 0: actual time=936.808..45193.518 rows=1036 loops=1 Buffers: shared hit=382606 read=465076 I/O Timings: shared/local read=22452.028 Worker 1: actual time=947.246..45194.168 rows=1055 loops=1 Buffers: shared hit=383165 read=484189 I/O Timings: shared/local read=22617.135 Worker 2: actual time=933.623..45192.534 rows=1145 loops=1 Buffers: shared hit=415758 read=473182 I/O Timings: shared/local read=22741.488 Worker 3: actual time=965.639..45193.603 rows=1078 loops=1 Buffers: shared hit=398009 read=449053 I/O Timings: shared/local read=1.094 -> Nested Loop (cost=91916.81..13051828.80 rows=18 width=81) (actual time=945.917..43729.931 rows=1086 loops=5) " Output: TAB1.PRI, TAB2.TAB2_time, TAB2.AID" Inner Unique: true Join Filter: ((TAB4.PRI)::text = (TAB1.PRI)::text) Buffers: shared hit=1962289 read=2328363 I/O Timings: shared/local read=105246.220 Worker 0: actual time=936.781..43732.652 rows=1036 loops=1 Buffers: shared hit=379077 read=463587 I/O Timings: shared/local read=21008.508 Worker 1: actual time=947.212..43699.507 rows=1055 loops=1 Buffers: shared hit=379573 read=482704 I/O Timings: shared/local read=21142.572 Worker 2: actual time=933.589..43696.710 rows=1145 loops=1 Buffers: shared hit=411836 read=471634 I/O Timings: shared/local read=21266.581 Worker 3: actual time=965.608..43768.535 rows=1078 loops=1 Buffers: shared hit=394288 read=447583 I/O Timings: shared/local read=20814.288 -> Parallel Hash Join (cost=91916.24..13051765.39 rows=18 width=117) (actual time=945.879..42758.939 rows=1086 loops=5) "Output: TAB4.PRI, TAB2.TAB2_time, TAB2.PRI, TAB2.AID" Hash Cond: ((TAB2.PRI)::text = (TAB4.PRI)::text) Buffers: shared hit=1943792 
read=2322814 I/O Timings: shared/local read=100496.787 Worker 0: actual time=936.743..42798.247 rows=1036 loops=1 Buffers: shared hit=375573 read=462501 I/O Timings: shared/local
Re: Query running longer
On Fri, 2 Feb 2024 at 02:43, Laurenz Albe wrote: > On Fri, 2024-02-02 at 02:27 +0530, veem v wrote: > > We have the below query which is running for ~45 seconds on postgres > aurora reader instance. > > I have captured the explain analyze. Want to understand, where exactly > the resources are > > getting spent and if we can be able to optimize it further. > > Aurora <> PostgreSQL, but here is what I can see: > > - The index scan on SCHEMA1."TAB2" has to check 2 million extra > rows because "work_mem" is too small. Almost the complete time > is spent there. > > - You may be getting a bad plan, because the statistics on > SCHEMA1.TAB4 are either out of date or not detailed enough, > which makes PostgreSQL underestimate the result size. > > Thank you so much. I can read and understand the execution plan in Oracle database easily and can also interpret database parameters too. But new to postgres, are there some good resources to quickly get up to speed to understand the postgres execution path here and also various key parameters(like for e.g. one you mentioned as work_mem)? As you mentioned below, So wondering how you got to know, if this is the step where majority of the DB resources get spent. And as total time the query ran was ~45 seconds, and out of that how much time it spent in this step, how can i get that? And to fix this one line with regards to TAB2, should we increase "work_mem" here? *- The index scan on SCHEMA1."TAB2" has to check 2 million extra rows because "work_mem" is too small. Almost the complete time is spent there.* -> Parallel Bitmap Heap Scan on SCHEMA1.""TAB2"" TAB2 (cost=84860.50..13040301.00 rows=1175611 width=80) (actual time=713.054..26942.082 rows=956249 loops=5)" " Output: TAB2.TAB2_time, TAB2.PRI, TAB2.AID" Recheck Cond: (TAB2.MID = 'X'::numeric) Rows Removed by Index Recheck: 2137395 Filter: ((TAB2.TAB2_time >= '2024-01-01 00:00:00+00'::timestamp with time zone) AND (TAB2.TAB2_time <= '2024-01-31 00:00:00+00'::timestamp with time zone)) Heap Blocks: exact=5300 lossy=782577 Buffers: shared hit=1651569 read=2245157 I/O Timings: shared/local read=29063.286 Another point you mentioned as below , for this , should we run vacuum analyze on the table TAB4? 
*- You may be getting a bad plan, because the statistics on SCHEMA1.TAB4 are either out of date or not detailed enough, which makes PostgreSQL underestimate the result size.* Output: TAB4.PRI Buckets: 262144 (originally 2048) Batches: 1 (originally 1) Memory Usage: 13936kB Buffers: shared hit=134917 Worker 0: actual time=214.981..214.982 rows=27779 loops=1 Buffers: shared hit=27133 Worker 1: actual time=215.455..215.456 rows=27805 loops=1 Buffers: shared hit=27159 Worker 2: actual time=215.774..215.774 rows=27330 loops=1 Buffers: shared hit=26706 Worker 3: actual time=215.776..215.777 rows=26880 loops=1 Buffers: shared hit=26245 -> Parallel Bitmap Heap Scan on SCHEMA1.TAB4 TAB4 (cost=26.39..7042.63 rows=1049 width=37) (actual time=23.650..201.606 rows=27613 loops=5) Output: TAB4.PRI Recheck Cond: ((TAB4.TAB4_code)::text = 'XX'::text) Rows Removed by Index Recheck: 616610 Heap Blocks: exact=11978 lossy=15624 Buffers: shared hit=134917 Worker 0: actual time=20.627..199.852 rows=27779 loops=1 Buffers: shared hit=27133 Worker 1: actual time=21.065..199.786 rows=27805 loops=1 Buffers: shared hit=27159 Worker 2: actual time=21.445..198.582 rows=27330 loops=1 Buffers: shared hit=26706 Worker 3: actual time=21.470..195.915 rows=26880 loops=1 Buffers: shared hit=26245 -> Bitmap Index Scan on TAB4_idx1 (cost=0.00..25.95 rows=1784 width=0) (actual time=23.938..23.938 rows=138067 loops=1) Index Cond: ((TAB4.TAB4_code)::text = 'XX'::text) Buffers: shared hit=72 > >
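The ~27 seconds of actual time on the Parallel Bitmap Heap Scan node (713..26942 ms per worker, out of the ~45 s total) is what points to that step as the main consumer. A minimal sketch of acting on both suggestions, using the object names from the plan; the work_mem value is only an example and is best tried per session before changing anything globally:

-- try a larger work_mem for this session only, then re-run EXPLAIN (ANALYZE, BUFFERS)
SET work_mem = '256MB';

-- refresh (and widen) the statistics on TAB4 so the planner estimates its result size better
ALTER TABLE SCHEMA1.TAB4 ALTER COLUMN TAB4_code SET STATISTICS 1000;
ANALYZE SCHEMA1.TAB4;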
Question on partitioning
Hello All, In PostgreSQL, is it possible to partition an existing non-partitioned table that already has data residing in it and indexes and constraints defined on it (in order to make it faster), without having to manually move the data around? Similarly, can multiple partitions be merged into one, or a single partition be split into multiple partitions? I would appreciate any reference doc or sample syntax for the same. Thanks and Regards Veem
Why doesn't Postgres apply limit on groups when retrieving N results per group using WHERE IN + ORDER BY
This is related to a question I asked on dbs.stackexchange.com: https://dba.stackexchange.com/questions/335501/why-doesnt-postgres-apply-limit-on-groups-when-retrieving-n-results-per-group But to reiterate - I have a query like this: SELECT "orders".* FROM "orders" WHERE (user_id IN ?, ?, ?) ORDER BY "orders"."created_at" LIMIT 50 I have two indexes - `(user_id)` and `(user_id, created_at)`. Only the first index is ever used with this query. If I query for a specific user_id, as expected it uses the composite index. If I use an inner select and trick the query planner into doing a nested loop for each user, it no longer uses the composite index. I can understand that the query planner would not know ahead of time which users would have those 50 newest orders. I imagined that it would be clever enough to determine that only 50 results are needed, and that it could use the `(user_id, created_at)` index to get 50 orders for each user. Then sort and filter those few hundred results in memory. Instead what I'm seeing is that it gets all orders for each user using the `user_id` index and then sorts/filters them all in memory. Here is an example query plan: Limit (cost=45271.94..45272.06 rows=50 width=57) (actual time=13.221..13.234 rows=50 loops=1) Buffers: shared hit=12321 -> Sort (cost=45271.94..45302.75 rows=12326 width=57) (actual time=13.220..13.226 rows=50 loops=1) Sort Key: orders.created_at Sort Method: top-N heapsort Memory: 36kB Buffers: shared hit=12321 -> Bitmap Heap Scan on orders orders (cost=180.85..44862.48 rows=12326 width=57) (actual time=3.268..11.485 rows=12300 loops=1) Recheck Cond: (orders.user_id = ANY ('{11,1000,3000}'::bigint[])) Heap Blocks: exact=12300 Buffers: shared hit=12321 -> Bitmap Index Scan on index_orders_on_user_id (cost=0.00..177.77 rows=12326 width=0) (actual time=1.257..1.258 rows=12300 loops=1) Index Cond: (orders.user_id = ANY ('{11,1000,3000}'::bigint[])) Buffers: shared hit=21 Planning: Buffers: shared hit=6 Execution time: 13.263 ms The table I'm querying has roughly 50,000,000 orders, with an even distribution of ~4000 orders per user. I have found that I can speed this up significantly using CROSS JOIN LATERAL and it will use the composite index, but I'm struggling to understand WHY the CROSS JOIN LATERAL is needed here for it to use the index. I've tried tweaking costs, disabling bitmap scans, etc, so it seems like this is a functional limitation rather than something to do with cost/statistics. So my question is twofold: - why doesn't Postgres use the composite index, and then retrieve only the minimum necessary amount of rows (50 per user) using the query I posted above? - If it is a functional limitation, is it lack of implementation, or is there a deeper incompatibility with how the query planner works that would prevent it from being able to do this? Thanks!
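For readers following along, the CROSS JOIN LATERAL form mentioned above looks roughly like this (same user ids as in the plan); each lateral subquery can walk the (user_id, created_at) index for one user and stop after 50 rows, after which only a few hundred rows need sorting:

SELECT o.*
FROM (VALUES (11), (1000), (3000)) AS u(user_id)
CROSS JOIN LATERAL (
    SELECT *
    FROM orders
    WHERE orders.user_id = u.user_id
    ORDER BY orders.created_at
    LIMIT 50
) AS o
ORDER BY o.created_at
LIMIT 50;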
Re: Question on partitioning
On Mon, 5 Feb 2024 at 17:52, Laurenz Albe wrote: > On Mon, 2024-02-05 at 03:09 +0530, veem v wrote: > > In postgresql, Is it possible to partition an existing nonpartitioned > table having data > > already residing in it and indexes and constraints defined in it, > without the need of > > manually moving the data around, to make it faster? Similarly merging > multiple partitions > > to one partition or splitting a single partition into multiple > partitions? > > There is no way to do that. > > Yours, > Laurenz Albe Thank you very much, Laurenz. In other databases (Oracle, for example) there is SQL syntax to split one partition into multiple partitions and to merge multiple partitions back into one, so I was hoping there might be some way to do it in Postgres. Anyway, thanks for clarifying my doubt on this. That means for any such operation we need to create a new partitioned table with the desired structure and load it with the data from the existing partitioned table. Likewise, to convert a non-partitioned table into a partitioned one, we first have to create the empty partitioned table structure as per our needs and then pump the data from the non-partitioned table into the newly created partitioned table, which means we need to take some downtime to switch from the non-partitioned table to the partitioned one. Please correct me if I am wrong. Additionally, I see a number of other restrictions, such as: 1) When creating indexes on the partitioned table, the "concurrently" keyword is not allowed. 2) When creating a foreign key, a "not valid" clause is not allowed if the table is partitioned. 3) While creating indexes on this table or running any ALTER command, SELECT queries from other sessions run longer. Does DDL on the base table take a lock in Postgres?
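One pattern that keeps the switch-over window short is to attach the existing table as the first partition of a new partitioned table instead of copying all rows up front. A rough sketch, with txn/txn_legacy/txn_date as hypothetical names (primary keys and unique indexes need extra care, since on a partitioned table they must include the partition key):

ALTER TABLE txn RENAME TO txn_legacy;

CREATE TABLE txn (LIKE txn_legacy INCLUDING DEFAULTS)
    PARTITION BY RANGE (txn_date);

-- a CHECK constraint matching the bounds lets ATTACH skip (or at least shorten) its validation scan
ALTER TABLE txn_legacy
    ADD CONSTRAINT txn_legacy_range
    CHECK (txn_date IS NOT NULL AND txn_date < DATE '2024-03-01');

ALTER TABLE txn ATTACH PARTITION txn_legacy
    FOR VALUES FROM (MINVALUE) TO ('2024-03-01');

-- new data lands in regular daily partitions from here on
CREATE TABLE txn_p20240301 PARTITION OF txn
    FOR VALUES FROM ('2024-03-01') TO ('2024-03-02');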
Re: How to do faster DML
I have not much experience with postgres on this scale though, others may provide better suggestions. However, with this scale you will have a single table size ~30TB+. And someone just pointed out in this thread, ~180 partitions/table as the good to have limit, and if that is true it would be ~170GB per partition. Looks bulky for a system where readresponse time is expected to be in milliseconds. On Mon, 5 Feb 2024 at 16:51, Lok P wrote: > > > On Mon, 5 Feb, 2024, 2:21 am Lok P, wrote: > > > On Sun, Feb 4, 2024 at 9:18 PM Greg Sabino Mullane > wrote: > > Partitioning will depend a lot on how you access the data. Is it always > using that transaction_date column in the where clause? Can you share the > table definition via \d? We've talked about this table quite a bit, but not > actually seen what it looks like. We are flying blind a little bit. You > mentioned your queries are slow, but can you share an explain analyze on > one of these slow queries? > > 45 billion rows is quite manageable. How many rows are in each day? You > may want to do a larger bucket than 24 hours per partition. > > 1)Even after partitioning the target table , to speed up the data load on > this table , Is there an option to disable the primary and foreign keys and > re-enable them post data load finishes. Will that be a good idea > > > No. > > 3)As the size of the table or each partition is going to be very large and > this will be a read intensive application, compressing the historical > partition will help us save the storage space and will also help the read > queries performance > > > I am not sure what this means. If you are not going to need the data > anymore, dump the data to deep storage and drop the partition. > > Cheers, > Greg > > > > Thank you. > > The table has ~127 columns of different data types , combinations of > Numeric, varchar, date etc. And is having current size ~1TB holding > ~3billion rows currently and the row size is ~300bytes. > > Currently it has lesser volume , but in future the daily transaction per > day which will be inserted into this table will be Max ~500million > rows/day. And the plan is to persist at least ~3months of transaction data > which will be around 45billion rows in total. And if all works well , we > may need to persist ~6 months worth of data in this database in future and > that will be ~90 billion. > > This table will always be queried on the transaction_date column as one of > the filters criteria. But the querying/search criteria can span from a day > to a month worth of transaction date data. > > When you said "*You may want to do a larger bucket than 24 hours per > partition.*", do you mean to say partition by weekly or so? Currently as > per math i.e. 1TB of storage for ~3billion rows. So the daily range > partition size( to hold ~500million transactions/day) will be around > ~100-150GB. Won't that be too much data for a single partition to operate > on, and increasing the granularity further(say weekly) will make the > partition more bulkier? > > What I mean was, we will definitely need the data for querying purposes by > the users, but just to keep the storage space incontrol (and to help the > read queries), we were thinking of having the historical partitions > compressed. And for that , if any compression strategy should we follow on > postgres? > > With regards to loading data to the table faster, wondering why you said ' > *NO*' to load the data first and enabling/Creating the Primary key and > Foreign key constraint later approach. 
Because this table is a child table > and the parent is already having data in it, loading data to this table in > presence of PK and FK makes it too slow as it tries to validate those for > each set of rows. So we were thinking if doing it at a later stage at > oneshot will be a faster approach. Please suggest. > > I will try to collect some SELECT query and post the explain analyze. > Currently we are trying to get rid of the duplicates. > > Regards > Lok > > > Any thoughts, based on above usage pattern? > > While I did the maths based on the current stats with limited data sets. > The partitions size coming as 100 to 150gb as I explained above, if we keep > it daily range partition. Should we have to make it hourly then? > > So want some experts view if this going to work fine for a read latency > intensive applications or we should have some different strategy? >
Re: How to do faster DML
On Tue, 6 Feb 2024 at 10:45, Lok P wrote: > Thank you Greg. > > *"and keeping all your active stuff in cache. Since you have 127 columns, > only pull back the columns you absolutely need for each query."* > > Understood the point. As postgres is a "row store" database, so keeping > the size of the row lesser by making the number of columns to minimum, > should definitely help as more number of rows can be packed then into one > page or block (which I believe is 8KB default size per block). And that > mean more number of blocks can be cached while reading, so better cache hit > ratio. > > As you rightly pointed out, Another thing I noticed the shared_buffer > parameters set as 2029684 in this instance, which comes to ~21MB and that > seems very small for a database operating in large scale. And I see we have > RAM in the instance showing as ~256GB. So thinking of bumping it to > something as ~40-50GB. Hope that will help to some extent. Not sure if > there is methods to manually, cache some objects(tables/indexes) which > were getting used frequently by the read queries. > > > On Tue, Feb 6, 2024 at 10:27 AM Greg Sabino Mullane > wrote: > >> >> So when you said *"In that case, and based on the numbers you provided, >>> daily partitioning seems a decent solution."* >>> , does it mean that size of the partition (which will be ~100-150GB per >>> partition here) is not an issue for the postgres to serve read latency >>> sensitive queries? >>> >> >> Yes, that should be fine. At the end of the day, what's going to be more >> important is making sure you are using good indexing, and keeping all your >> active stuff in cache. Since you have 127 columns, only pull back the >> columns you absolutely need for each query. >> >> Cheers, >> Greg >> >> > Good discussion!! Don't want to divert the thread. And apology for comparing with another RDBMS, but curious to know the difference in behaviour. With respect to the storage part:- In Oracle database we were supposed to keep the frequently accessed column first and less frequently accessed columns towards last in the table while creating table DDL. This used to help the query performance while accessing those columns as it's a row store database. Are there any such performance implications exists in postgres? And there the data types like varchar used to only occupy the space which the real data/value contains. But here in postgres, it seems a bit different as it has significant overhead in storage, as these seem like being fixed length data types and will occupy the space no matter what the read data holds. So curious to know, if in this case will there be any performance impact accessing those columns, if such a column which is frequently accessed but has been put towards the end of the table because of storage space saving?
Re: Question on partitioning
Thank you Laurenz. Got it. So basically , you mean to say any DDL on a table won't allow the table to be read by other processes. I was under the assumption that it should allow the read queries to move ahead at least. I must be wrong here. Thanks for correcting me. On Tue, 6 Feb 2024 at 15:46, Laurenz Albe wrote: > On Tue, 2024-02-06 at 00:26 +0530, veem v wrote: > > On Mon, 5 Feb 2024 at 17:52, Laurenz Albe > wrote: > > > On Mon, 2024-02-05 at 03:09 +0530, veem v wrote: > > > > In postgresql, Is it possible to partition an existing > nonpartitioned table having data > > > > already residing in it and indexes and constraints defined in it, > without the need of > > > > manually moving the data around, to make it faster? > > > > > > There is no way to do that. > > > > Which means for any such operation we need to create a new partition > table with that > > structure and load that with the data from the existing partition table. > > You could use logical replication to keep the downtime short. > > > Additionally I see a lot of other restrictions like > > 1)When creating indexes on the partition table "concurrently" keywords > are not allowed. > > That is no problem. > You create the index on ONLY the partitioned table, > then you create indexes CONCURRENTLY on the partitions, > then you attach these indexes as partitions to the index you created in > the beginning. > As soon as the last index is attached, the index on the partitioned table > becomes valid. > > > 2)While creating foreign key , it does not allow a "not valid" > clause if the table > > is partitioned. > > Create the foreign key on the partitions; the effect is the same. > > > 3) While creating indexes on this table or running any ALTER command, > the SELECT > >queries running from other processes run longer. Does it take any > lock while > >doing DDL on the base table in postgres? > > I am not sure what the "base table" is. Most DDL always locks the table in > ACCESS EXCLUSIVE mode; that is not specific to partitioned tables. > > Yours, > Laurenz Albe >
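To make the index technique described above concrete, a minimal sketch (continuing the hypothetical txn example from earlier; the mid column and all names are invented):

-- created invalid; it becomes valid once an index from every partition is attached
CREATE INDEX txn_mid_idx ON ONLY txn (mid);

-- build each partition's index without blocking writes on that partition
CREATE INDEX CONCURRENTLY txn_p20240301_mid_idx ON txn_p20240301 (mid);

ALTER INDEX txn_mid_idx ATTACH PARTITION txn_p20240301_mid_idx;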
Re: Partitioning options
On Thu, 8 Feb 2024 at 20:08, Greg Sabino Mullane wrote: > On Thu, Feb 8, 2024 at 12:42 AM sud wrote: > ... > >> The key transaction table is going to have ~450 Million transactions per >> day and the data querying/filtering will always happen based on the >> "transaction date" column. >> > ... > >> Should we go for simple daily range partitioning on the transaction_date >> column? >> > > This one gets my vote. That and some good indexes. > > Cheers, > Greg > > Hello Greg, Out of curiosity: since the OP mentioned that there will be joins and also filters on the customer_id column, why don't you think subpartitioning by customer_id would be a good option? I understand that LIST subpartitioning may not be an option, because new customer_ids get added gradually over time (and a DEFAULT list partition may not be workable here), and the OP also mentioned that the data distribution for customer_id is skewed. But what is the problem if the OP opts for HASH subpartitioning on customer_id in this situation? Is it that the number of partitions becomes much higher? With simple range partitioning, 5 months of data gives ~150 daily range partitions, and every index multiplies the count: with 10 indexes that is ~150 table partitions + 10*150 = 1500 index partitions, i.e. ~1650 partitions in total. With range-hash partitioning, where the hash modulus is usually a power of two, say 8 hash subpartitions, the total becomes 8*150 = 1200 table partitions + 8*150*10 = 12000 index partitions, i.e. ~13200 partitions. Though there is no theoretical limit on the number of partitions in Postgres, serious issues have been reported in the past with large partition counts; one such case is below. Is this the reason? https://www.kylehailey.com/post/postgres-partition-pains-lockmanager-waits Regards Veem
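For reference, this is roughly how range partitioning with hash subpartitioning is declared in Postgres; the names are hypothetical, and the layout makes the partition-count arithmetic above easy to see:

CREATE TABLE trans (
    trans_id    bigint,
    customer_id bigint,
    trans_date  date,
    amount      numeric(12,2)
) PARTITION BY RANGE (trans_date);

-- one daily range partition, itself hash-partitioned on customer_id
CREATE TABLE trans_20240301 PARTITION OF trans
    FOR VALUES FROM ('2024-03-01') TO ('2024-03-02')
    PARTITION BY HASH (customer_id);

CREATE TABLE trans_20240301_h0 PARTITION OF trans_20240301
    FOR VALUES WITH (MODULUS 8, REMAINDER 0);
-- ...and so on for remainders 1 through 7, for every day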
How should we design our tables and indexes
Hello, We want to have the response time in <1 sec for our UI search query requirement. These will be pagination queries. These read queries will be on big transaction tables (will have ~500+ attributes approx will have approx. rows size of ~1KB) having a continuous stream of inserts consumed from source. And these tables will be a daily range partitioned on the processing_date column. Each partition is going to hold approx ~450million rows when it will serve in full capacity to all the customers transactions. The customer will be given the capability to search on a Max date range of 30 days of transaction data i.e ~30 range partitions and are supposed to get the query response back in <1 sec as it will be UI from which those queries will be submitted. 1)Is there any way in postgres to influence the optimizer for the "first_row" optimization, so that it won't go for evaluating all the rows from such UI search queries. As because these will be pagination queries and the user will be interested in seeing top 100 transactions in the first page asap? 2) For e.g below is one sample query from a similar system but in a different database. Want to understand , what would be the appropriate indexes to make this above search query run in the quickest possible time? one Index on table1(MID) , one index Table1(CID), one index on table2(ACN_NBR)? OR Should we create a composite index here combining PR_ID i.e (PR_ID, MID), (PR_ID, CID), (PR_ID, ACN_NBR) as that is the most unique attribute here? select count(*) over() as total_record, * from (select ... from TABLE1 Left join schema1.TABLE2 on TABLE2.PR_ID = TABLE1.PR_ID and TABLE2.MID = TABLE1.MID and TABLE2.processing_date=TABLE1.processing_date where TABLE2.processing_date between '2023-04-20' and '2023-05-21'-- Partition pruning and TABLE2.ACN_NBR = '' and ( TABLE1.MID in (XX) OR TABLE1.CID in (XX)) order by TABLE1.PR_TIME DESC ) limit 100 offset 0; The data pattern for the columns used in predicate are as below:- Table1 will be the driving table. count(distinct ACN_NBR) - 25million count(distinct MID) - 223k count(distinct CID) - 59k count(*)from table1 and table2- ~350 million PR_ID is a unique key. 3)One of the use cases is that the customer should be able to search on certain attributes and should be able to see the transactions in "desc by processing_date" i.e. latest transactions on the first page on the UI. And in such scenario, if the search attribute is less unique and the customer puts a date range of a month i.e. over 30 partitions , it may results in scanning and sorting billions of rows to get the top/recent ~100 transactions and most likely not going to respond back in <1 sec, even goes for the index. So how should we handle or design indexes for catering such queries? For e.g. if we have the only filter on column "TABLE1.CID" in the above query, which is very less unique then how to handle it? 4)And finally, the parameter "settings" which we have currently is as below for the current system. So I wanted to understand, if they are accurate or any oddity and we should change those to cater such requirements? For e.g. if we set "max_parallel_workers_per_gather"=4 to speed up the queries, then we will be able to serve only 32/4=8 concurrent user requests at any point in time. If we are targeting to serve ~100 concurrent users , will it be advisable to change or we should test the system with default i.e. not setting this parallel parameter? 
*Settings: * effective_cache_size = '176237472kB', maintenance_io_concurrency = '1', max_parallel_workers = '32', max_parallel_workers_per_gather = '4', search_path = 'public, public, "$user"', temp_buffers = '16MB', work_mem = '2GB', enable_partitionwise_join = 'on' Regards Veem
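On question 1: Postgres has no first_rows-style hint, but a LIMIT by itself already steers the planner toward fast-start plans, and keyset pagination keeps later pages as cheap as the first. A sketch of the idea using the column names from the query above (the literal values stand in for the last row of the previous page; the real query would keep its joins and filters):

-- first page
SELECT PR_ID, PR_TIME
FROM SCHEMA1.TABLE1
WHERE processing_date BETWEEN '2023-04-20' AND '2023-05-21'
ORDER BY PR_TIME DESC, PR_ID DESC
LIMIT 100;

-- next page: seek past the last row seen instead of using a growing OFFSET
SELECT PR_ID, PR_TIME
FROM SCHEMA1.TABLE1
WHERE processing_date BETWEEN '2023-04-20' AND '2023-05-21'
  AND (PR_TIME, PR_ID) < ('2023-05-20 11:22:33', 'PR12345')
ORDER BY PR_TIME DESC, PR_ID DESC
LIMIT 100;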
Re: How should we design our tables and indexes
Thank you So much Greg. Will try to test the things as max as possible. I was trying to see basically, if any obvious things we should take care of before designing a system for satisfying such requirements. As you pointed few things , i am trying t answer those below On Sun, 11 Feb 2024 at 10:43, Greg Sabino Mullane wrote: > There is a lot to unpack here. I'm going to take a quick pass, but you > ought to consider getting some custom expert help. > > On Sat, Feb 10, 2024 at 2:39 PM veem v wrote: > >> >> Pagination is already a hard problem, and does not even make sense when > combined with "a continuous stream of inserts". What should the user see > when they click on page 2? > When the user clicks to the second page , it will see the next set of rows i.e 100 to 200 and next will see 200 to 300 and so on till the result set finishes. > Also, 500 attributes?! What data types are those? If boolean, you may want > to look at using bitfields. > >> Using LIMIT does allow for this, with certain caveats. The best answer >> for a lot of these questions is "try it and see" with a simplified version >> of your queries against some dummy data. >> > > All those attributes are majorly Varchar and numeric in nature , so not sure if any options exist there for these? Additionally, in other databases like Oracle we use hints like '/*+first_rows*/ to let optimizers favour index paths to favor these types of UI search queries , so I was wondering if anything like those exists over here. > one Index on table1(MID) , one index Table1(CID), one index on >> table2(ACN_NBR)? >> > > This. One index for each important column. Especially if they vary between > queries, as you alluded to later. > > If PR_ID is a must in the Join criteria between these table tables table1, table2 in all the queries, then is it advisable to have a composite index like (pr_id, mid), (pr_id,cid) etc rather than having index on individual columns? > select count(*) over() as total_record, * [large query omitted] >> > > Queries starting with select count(*) are also a red flag, but it's hard > to tell from this. Probably best to explain what you think the query is > doing using regular words. > > Actually this inner query is doing the main work, i.e finding the search results based on the input search criteria. The outer query is just fetching the results from the inner query along with count(*), to pass on to the API , so as to calculate and show the user how many pages there total with a full result set. basically it will count(*)/N records per page, and that figure will be displayed in the first page of the UI screen. > 3)One of the use cases is that the customer should be able to search on >> certain attributes and should be able to see the transactions in "desc by >> processing_date" i.e. latest transactions on the first page on the UI. And >> in such scenario, if the search attribute is less unique and the customer >> puts a date range of a month i.e. over 30 partitions , it may results in >> scanning and sorting billions of rows to get the top/recent ~100 >> transactions and most likely not going to respond back in <1 sec, even goes >> for the index. So how should we handle or design indexes for catering such >> queries? For e.g. if we have the only filter on column "TABLE1.CID" in the >> above query, which is very less unique then how to handle it? >> > > Well, there is only so much a database can do if you are pulling from 500 > different attributes. 
But the whole purpose of an indexed column is to > prevent having to scan *all* the rows. If it's a common value, then > hopefully there is something else in the where clause more specific that is > also indexed. As mentioned above, a LIMIT and ORDER BY, with the > appropriate indexes, can return early. > > So here in case , if there is just the filter on column CID and the join condition on PR_ID column between the two tables, No other filter exists to help make the result set smaller. Lets say the query really results in millions of rows. And it's because the CID column is having very less distinct values. Also the results are getting ordered on the PR_TIME column to show the latest transactions in the first page. So indexes on CID won't be very effective. So is there any option to index any other way or a different set of columns to make this type of query run faster (not sure if including the column which is there in order by clause 'pr_time' in the index will make a difference) ? Or any other way to store the results in a sorted way already in the table, so as to not do heavy sorting to show the first 100 rows to the customer out of millions satisfying the search criter
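On the low-selectivity CID case raised above: putting the ORDER BY column into the index does make a difference, because the rows for one CID can then be read already ordered by PR_TIME and the scan can stop at the LIMIT instead of sorting millions of rows first. A hedged sketch with the names used above ('C123' is a placeholder value):

CREATE INDEX table1_cid_prtime_idx ON SCHEMA1.TABLE1 (CID, PR_TIME DESC);

-- with the index above, the newest 100 rows for one CID can be returned
-- without sorting the whole matching set
SELECT PR_ID, PR_TIME
FROM SCHEMA1.TABLE1
WHERE CID = 'C123'
ORDER BY PR_TIME DESC
LIMIT 100;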
Re: How to do faster DML
Thank You so much for the detailed explanation. On Sun, 11 Feb 2024 at 05:55, Peter J. Holzer wrote: > > Yes. Numbers in Oracle are variable length, so most Oracle tables > wouldn't contain many fixed length columns. In PostgreSQL most numeric > types are fixed length, so you'll have quite a lot of them. > > So does that mean that in other databases (the Oracle database, say), if we were careless in choosing the data length, for example Varchar2(4000), and the real value inserted into the table is a varchar string of only 20 bytes, then Oracle trims it so that it occupies only 20 bytes of storage; but in Postgres we need to be cautious and define the length as the maximum the attribute can contain, because that fixed amount of space would be allocated for every value inserted for that attribute/data element? And similarly for the Number/Numeric data type? Please correct me if my understanding is wrong. Regards Veem
Re: How to do faster DML
On Sun, 11 Feb 2024 at 19:02, Peter J. Holzer wrote: > > Similarly for Number/Numeric data type. > > Number in Oracle and numeric in PostgreSQL are variable length types. > But in PostgreSQL you also have a lot of fixed length numeric types > (from boolean to bigint as well as float4 and float8) and you would > normally prefer those over numeric (unless you really need a decimal or > very long type). So padding is something you would encounter in a > typical PostgreSQL database while it just wouldn't happen in a typical > Oracle database. > > > When you said *"you would normally prefer those over numeric"* I was thinking the opposite. As you mentioned, integer is a fixed-length data type and will occupy 4 bytes regardless of the value stored, but a variable-length type like Number or numeric resizes itself based on the actual data. So is there any downside to always using variable-length data types like Numeric and Varchar for defining the data elements? Regards Veem
Re: How to do faster DML
Thank you . On Mon, 12 Feb 2024 at 03:52, Peter J. Holzer wrote: > On 2024-02-11 12:08:47 -0500, Ron Johnson wrote: > > On Sun, Feb 11, 2024 at 11:54 AM veem v wrote: > > When you said "you would normally prefer those over numeric " I was > > thinking the opposite. As you mentioned integer is a fixed length > data type > > and will occupy 4 bytes whether you store 15 or .But in case > of > > variable length type like Number or numeric , it will resize itself > based > > on the actual data, So is there any downside of going with the > variable > > length data type like Numeric, > > > > > > Consider a table with a bunch of NUMERIC fields. One of those records > has > > small values (aka three bytes). It fits neatly in 2KiB. > > > > And then you update all those NUMERIC fields to big numbers that take 15 > > bytes. Suddenly (or eventually, if you update them at different times), > the > > record does not fit in 2KiB, and so must be moved to its own.page. > That causes > > extra IO. > > I think that's not much of a concern with PostgreSQL because you can't > update a row in-place anyway because of MVCC. So in any case you're > writing a new row. If you're lucky there is enough free space in the same > page and you can do a HOT update, but that's quite independent on > whether the row changes size. > > > Good to know. So it means here in postgres, there is no such concern like "row chaining", "row migration" etc. which we normally have in a non mvcc database (like Oracle say). And there its not advisable to have more than ~255 columns in a table even its technically possible. And if such requirement arises, we normally break the table into 2 different tables with some columns in common to join them. https://jonathanlewis.wordpress.com/2015/02/19/255-columns/ So we were thinking, adding many column to a table should be fine in postgres (as here we have a use case in which total number of columns may go till ~500+). But then, considering the access of columns towards the end of a row is going to add more time as compared to the column which is at the start of the row. As you mentioned, accessing 100th column may add 4 to 5 times more as compared to the access of first column. So , is it advisable here to go for similar approach of breaking the table into two , if the total number of column reaches certain number/threshold for a table? Regards Veem
Re: How should we design our tables and indexes
Thank You. On Mon, 12 Feb 2024 at 22:17, Greg Sabino Mullane wrote: > Sure will try to test and see how it behaves when the number of >> simultaneous queries (here 32/4=8 concurrent queries) exceed the >> max_parallel_workers limit. Though I am expecting the further queries >> exceeding the limit might get serialized. >> > > Yes - if there are not enough workers available, it will run with a > reduced number of workers, including possibly zero. You can see that when > you run an explain analyze, it will show you the number of workers it wants > and the number it actually was able to get. > > Thank you, got the point. Suppose these quick queries run within a certain time frame (say over a duration of ~1hr) and a few of the executions run longer, which might be because fewer parallel workers were available (the max limit being exhausted by concurrent executions), or because the execution plan changed for certain executions. Is there any way to track those historical executions and find the exact root cause of the slow executions with confidence? Regards Veem
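For tracking this after the fact, pg_stat_statements keeps cumulative per-statement timing history and auto_explain can log the actual plan of any execution that crosses a duration threshold; both are generally available on RDS/Aurora through the parameter group. A minimal sketch:

-- requires pg_stat_statements in shared_preload_libraries (a parameter-group setting on Aurora)
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

SELECT queryid, calls, mean_exec_time, max_exec_time, stddev_exec_time
FROM pg_stat_statements
ORDER BY max_exec_time DESC
LIMIT 20;

-- auto_explain settings (postgresql.conf / parameter group) to capture plans of slow runs:
--   shared_preload_libraries = 'pg_stat_statements,auto_explain'
--   auto_explain.log_min_duration = '5s'
--   auto_explain.log_analyze = on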
Re: How to do faster DML
Thank you so much for the clarification. On Mon, 12 Feb 2024 at 03:40, Peter J. Holzer wrote: > The fixed width types are those that the CPU can directly process: > Integers with 16, 32 and 64 bits, floating point numbers with 32 and 64 > bits. The CPU can read and write them with a single memory access, it > can do arithmetic with a single instruction, etc. > > Number/Numeric are not native types on any CPU. To read them the CPU > needs several memory accesses (probably one per byte unless you get > really clever) and then it can't do any calculations with them > directly, instead it has run a subroutine which does operations on > little chunks and then puts those chunks together again - basically the > same as you do when you're doing long addition or multiplication on > paper. So that's not very efficient. > > So it looks like the fixed length data type(like integer, float) should be the first choice while choosing the data type of the attributes wherever possible, as these are native types. (Like choosing "Integer/float" over "Numeric", "Char" over "Varchar" etc). However I do see even in Oracle databases, we have Integer type too, but it's suggesting(For e.g. in below blog) to rather go with Number types over Integer and Varchar2 over Char, which is opposite of what we are discussing here. Is the fixed length data type behaves differently in postgres vs oracle and thus should be treated differently? https://www.databasestar.com/oracle-data-types/ >From above blog:- *When to use CHAR: There should be no reason to use the CHAR data type, as it is similar to a VARCHAR2 and it’s better to be consistent.* *When to use INTEGER: You should use the NUMBER data type instead.* Regards Veem
Re: How to do faster DML
On Tue, 13 Feb 2024 at 02:01, Ron Johnson wrote: > On Mon, Feb 12, 2024 at 3:23 PM veem v wrote: > [snip] > >> So it looks like the fixed length data type(like integer, float) should >> be the first choice while choosing the data type of the attributes >> wherever possible, as these are native types. >> > > Correct. > > >> (Like choosing "Integer/float" over "Numeric", "Char" over "Varchar" >> etc). >> > However I do see even in Oracle databases, we have Integer type too, but >> it's suggesting(For e.g. in below blog) to rather go with Number types over >> Integer and Varchar2 over Char, which is opposite of what we are discussing >> here. Is the fixed length data type behaves differently in postgres vs >> oracle and thus should be treated differently? >> >> https://www.databasestar.com/oracle-data-types/ >> > >> >> From above blog:- >> > > Oracle is not Postgresql. WTH are you using an Oracle blog to decide on > Postgresql data types > > >> >> >> *When to use CHAR: There should be no reason to use the CHAR data type, >> as it is similar to a VARCHAR2 and it’s better to be consistent.* >> *When to use INTEGER: You should use the NUMBER data type instead.* >> >> > Did you actually read that blog post? > > Have you even read the Postgresql documentation on data types? > > My Apology, If interpreting wrongly. My thought was that , as fixed length data types are native type ones, ideally it should be faster in all the databases. So was comparing with different databases. And mainly as I had worked mainly in Oracle database in the past and so it was a key learning here and seemed totally opposite, so was curious to know. Regards Veem
Re: How should we design our tables and indexes
On Tue, 13 Feb 2024 at 20:59, Peter J. Holzer wrote: > For some kinds of queries a composite index can be dramatically faster. > While Postgres can combine indexes that means scanning both indexes and > combining the result, which may need a lot more disk I/O than scanning a > composite index. Indeed, in the cases where a composite index would be > useful but doesn't exist, PostgreSQL usually just chooses the best of > the single column indexes and ignores the rest. > > That said, my rule of thumb is to create just single column indexes at > first and only create composite indexes if they are necessary. > > Thank you so much. As I understand optimizer uses indexed column as "access criteria" and rest of the predicate as "filter criteria" while evaluating the query predicate. And if the majority of the rows are getting eliminated in the filtered step , that means adding that filtered criteria column to the index could give us better performance. So I was trying to understand say in below query having TABLE1 as driving table ( if we forget about column selectivity for a moment), Can the optimizer, only scan the TABLE1 using ACCESS criteria " TABLE1.MID in ()" or "TABLE1.CID in ()" which will be catered by two different index i.e one index on column "MID" and other on column "CID"? OR It can utilize other columns as access criteria those used in join conditions like MID, PR_ID, in which case a composite index on the columns(CID,PR_ID) (MID, PR_ID) will provide better selectivity and faster access? Similarly for TABLE2 a composite index on (ACN_NBR,PR_ID,MID) or just an index on (ACN_NBR)? select ... from TABLE1 Left join schema1.TABLE2 on TABLE2.PR_ID = TABLE1.PR_ID and TABLE2.MID = TABLE1.MID and TABLE2.processing_date=TABLE1.processing_date where TABLE1.processing_date between '2023-04-20' and '2023-05-21' -- Considering processing_date here as partition key. and TABLE2.ACN_NBR = '' and ( TABLE1.MID in () OR TABLE1.CID in ()) order by TABLE1.PR_TIME DESC
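As a hedged sketch only (index names and column orders below are illustrative, not a recommendation), the composite indexes the question is weighing could be created and then compared against the single-column ones with EXPLAIN (ANALYZE, BUFFERS) on representative data:

CREATE INDEX table1_mid_prid_idx ON TABLE1 (MID, PR_ID);
CREATE INDEX table1_cid_prid_idx ON TABLE1 (CID, PR_ID);
CREATE INDEX table2_acn_prid_mid_idx ON schema1.TABLE2 (ACN_NBR, PR_ID, MID);
-- run the query above under EXPLAIN (ANALYZE, BUFFERS) with and without these indexes
-- to see which ones the planner actually uses and how many buffers each plan reads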
Re: How to do faster DML
On Tue, 13 Feb 2024 at 20:32, Peter J. Holzer wrote: > Please do not conflate "char(n)" with native machine types like int or > float. These are very different things. A char(n) is string of fixed but > arbitrary length. This is not something a CPU can process in a single > instruction. It has to go over it character by character. > > There is almost never a reason to use char(n). Just use varchar(n) or in > the case of PostgreSQL just varchar or text. > > > However I do see even in Oracle databases, we have Integer type too, > > Not really. INTEGER is just an alias for NUMBER(38) in Oracle (see for > example > > https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlqr/Data-Types.html > ). > It's not the same as an INTEGER in PostgreSQL. > > Thank you so much for the clarification. Basically as i understood, we can follow below steps in serial, Step-1)First wherever possible use Smallint,Integer,bigint,float data types rather than numeric. This will give better performance. Step-2)Use the frequently queried columns first and least frequently queried columns towards last in the row while creating the table. This is too intended for better performance. Step-3)Define the columns with typlen desc as per below formula( column tetris symptom). This is for better storage space utilization. SELECT a.attname, t.typname, t.typalign, t.typlen FROM pg_class c JOIN pg_attribute a ON (a.attrelid = c.oid) JOIN pg_type t ON (t.oid = a.atttypid) WHERE c.relname = 'user_order' AND a.attnum >= 0 ORDER BY t.typlen DESC; One question here, if we have defined one column as a fixed length data type "integer" and slowly we noticed the length of data keeps increasing (in case of a sequence generated PK column which will keep increasing), and we want to alter the column to "bigint" now. In such scenario, will it append/pad the additional spaces to all the existing values which were already stored with integer type initially in the table. And that would be then an increase to the existing table storage. Please correct me if I'm wrong. Regards Veem
Re: How to do faster DML
On Wed, 14 Feb, 2024, 10:30 am Ron Johnson, wrote: > On Tue, Feb 13, 2024 at 4:17 PM veem v wrote: > [snip] > >> One question here, if we have defined one column as a fixed length data >> type "integer" and slowly we noticed the length of data keeps increasing >> (in case of a sequence generated PK column which will keep increasing), and >> we want to alter the column to "bigint" now. In such a scenario, will it >> append/pad the additional space to all the existing values which were >> already stored with the integer type initially in the table? And would that >> then increase the existing table storage? Please correct me if I'm >> wrong. >> > > ALTER TABLE foo ALTER COLUMN bar TYPE BIGINT; rewrites the whole table. Do > yourself a favor, and start with BIGINT. > Not yet tried, but that's interesting. I was initially thinking that since it only increases the length, it would be just a metadata change and finish within seconds. But as you mentioned, it seems to be an effect of the "fixed length data type", which is why it's going to rewrite the whole table even though we only increase the column length. I hope that won't be the case for variable length data types.
Re: How to do faster DML
On Thu, 15 Feb 2024 at 00:43, Adrian Klaver wrote: > It depends: > > https://www.postgresql.org/docs/current/sql-altertable.html > > "Adding a column with a volatile DEFAULT or changing the type of an > existing column will require the entire table and its indexes to be > rewritten. As an exception, when changing the type of an existing column, > if the USING clause does not change the column contents and the old type > is either binary coercible to the new type or an unconstrained domain over > the new type, a table rewrite is not needed. However, indexes must always > be rebuilt unless the system can verify that the new index would be > logically equivalent to the existing one. For example, if the collation for > a column has been changed, an index rebuild is always required because the > new sort order might be different. However, in the absence of a collation > change, a column can be changed from text to varchar (or vice versa) > without rebuilding the indexes because these data types sort identically. > Table and/or index rebuilds may take a significant amount of time for a > large table; and will temporarily require as much as double the disk space. > > " > > > create table int_test(int_fld integer); > > insert into int_test select * from generate_series(1, 1, 1); > > > select ctid, int_fld from int_test ; > > ctid | int_fld > --+- > (0,1)| 1 > (0,2)| 2 > (0,3)| 3 > (0,4)| 4 > (0,5)| 5 > (0,6)| 6 > (0,7)| 7 > (0,8)| 8 > (0,9)| 9 > (0,10) | 10 > > > alter table int_test alter column int_fld set data type bigint; > > select ctid, int_fld from int_test ; > > ctid | int_fld > --+- > (0,1)| 1 > (0,2)| 2 > (0,3)| 3 > (0,4)| 4 > (0,5)| 5 > (0,6)| 6 > (0,7)| 7 > (0,8)| 8 > (0,9)| 9 > (0,10) | 10 > > update int_test set int_fld = int_fld; > > select ctid, int_fld from int_test order by int_fld; > > (63,1)| 1 > (63,2)| 2 > (63,3)| 3 > (63,4)| 4 > (63,5)| 5 > (63,6)| 6 > (63,7)| 7 > (63,8)| 8 > (63,9)| 9 > (63,10) | 10 > > > Where ctid is: > > https://www.postgresql.org/docs/current/ddl-system-columns.html > > > Thank you so much. So as I also tested the same as you posted, there has been no change in "ctid" , when I altered the column data type from 'int' to 'bigint' in the table, so that means full table rewriting won't happen in such a scenario. Regards Veem
Re: How to do faster DML
On Thu, 15 Feb 2024 at 22:40, Adrian Klaver wrote: > On 2/15/24 09:00, Greg Sabino Mullane wrote: > > On Thu, Feb 15, 2024 at 11:43 AM Adrian Klaver > > mailto:adrian.kla...@aklaver.com>> wrote: > > > > That is a mixed bag: > > > > > > Ha! Good point. Our contrived example table does suffer from that, so > > perhaps the test should be: > > > > create table int_test(c1 int, c2 int); > > Alright now I see: > > test=# create table int_test(c1 int, c2 int); > CREATE TABLE > > test=# select pg_relation_filenode('int_test'); > pg_relation_filenode > -- > 70021 > (1 row) > > > test=# insert into int_test select a, a+1 from generate_series(1, > 1, 1) as t(a); > INSERT 0 1 > > test=# select pg_relation_size('int_test'); > pg_relation_size > -- > 368640 > (1 row) > > test=# alter table int_test alter column c2 set data type bigint; > ALTER TABLE > > test=# select pg_relation_filenode('int_test'); > pg_relation_filenode > -- > 70024 > (1 row) > > test=# select pg_relation_size('int_test'); > pg_relation_size > -- > 450560 > (1 row) > > Thank you. Did a similar test as below using DB fiddle. Same results for fixed length data type i.e the size is getting increased. However for variable length types (like numeric) , it remains the same, so it must be just metadata change and thus should be quick enough even for a big table. So one learning for me, i.e. one of the downside of fixed length data type is, with fixed length data types any future changes to it , will be a full table rewrite. And thus this sort of change for big tables will be a nightmare. https://dbfiddle.uk/_gNknf0D Regards Veem
Re: How to do faster DML
On Fri, 16 Feb 2024 at 06:04, Peter J. Holzer wrote: > On 2024-02-15 16:51:56 -0700, David G. Johnston wrote: > > On Thu, Feb 15, 2024 at 4:31 PM Peter J. Holzer > wrote: > > On 2024-02-14 22:55:01 -0700, David G. Johnston wrote: > > > On Tuesday, February 13, 2024, veem v wrote: > > > > > > float data types rather than numeric. This will give better > > > performance. > > > > > > > > > Only use an inexact floating-point data type if you truly > > > understand what you are getting yourself into. Quickly getting > > > the wrong answer isn't a tolerable solution. > > > > Do NOT assume that a decimal type (even if it can grow to ridiculous > > lengths like PostgreSQL's numeric) is exact in the mathematical sense. > > It isn't. It cannot represent almost all real numbers. > > > > > > Thank You. So it looks like for the use cases where we don't need precision or decimal point values to be stored in Postgres, the integer data type is the way to go without a doubt. However, where precision is required, as you all mentioned there are certain issues (rounding errors etc.) with the "float" data type, so from a normal developer's usage point of view it should be the numeric type that we use. I think the consistent behaviour of an application takes precedence over performance. And I believe that in most real life scenarios, when we need precision we expect it to behave consistently across all application and database platforms (mainly in the banking industry), and thus it seems the numeric data type is the safest one to use as a multi-database platform type. Please correct me if I'm wrong. Regards Veem
Performance issue debugging
Hi All, As I understand it, we have pg_stat_activity, which shows the real-time activity of sessions currently running in the database, and pg_stat_statements, which provides aggregated information about the historical execution of all the queries in the database. But I don't see any sampling or timing information in those views, for example what queries were being executed in the database at a certain point in time in the past, the overall wait events at that time, etc. So is there any other view which provides such information to dig into the past and diagnose historical performance issues? Or should we create our own table and flush the information from the pg_stat_activity view into it with the current timestamp, so it would be helpful in analyzing performance issues? Or is any extension available for this? Also, even explain analyze can only provide the exact run time stats of a completed query. If we want to see what's going on for a long running query, and at what step in the execution path the query is spending most of its resources and time while it keeps running, is there any option available in Postgres? For example, if in a SELECT query the index access path is taking most of the time, or in an INSERT query an index block is causing contention while inserting data into the table, how would we be able to find that for a currently running query or for a historical query? Regards Veem
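If no extension is available, a do-it-yourself sampler along the lines the question suggests could look like the sketch below; the table name is made up, and pg_cron is assumed to be installed (it is available as an extension on RDS):

CREATE TABLE activity_history AS
SELECT now() AS sample_time, * FROM pg_stat_activity WHERE false;

-- one sample; schedule it e.g. every minute with pg_cron
INSERT INTO activity_history
SELECT now(), * FROM pg_stat_activity WHERE state <> 'idle';

SELECT cron.schedule('activity-sampler', '* * * * *',
       $$INSERT INTO activity_history SELECT now(), * FROM pg_stat_activity WHERE state <> 'idle'$$);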
Re: Performance issue debugging
On Fri, 23 Feb, 2024, 2:54 am Vick Khera, wrote: > On Thu, Feb 22, 2024 at 4:03 PM veem v wrote: > >> Hi All, >> As i understand we have pg_stats_activity which shows the real time >> activity of sessions currently running in the database. And the >> pg_stats_statement provides the aggregated information of the historical >> execution of all the queries in the database. But I don't see any sampling >> or timing information in those views. For example at a certain point in >> time in the past , what queries were getting executed in the database and >> overall wait events etc. >> >> So is there any other view which provides such information to dig into >> the past to diagnose any historical performance issues ? or should we >> create our own table and flush the information from the pg_stats_activity >> view to that with the current timestamp and that would be helpful in >> analyzing performance issues or any extension available for such? >> >> > Look at the auto_explain setting. Taking a random interval snapshot of > running queries likely will not teach you anything useful. > > >> Also even the explain analyze can only provide the exact run time stats >> of a completed query. If we want to see what's going on for a long running >> query and at what step in the execution path the query is spending most >> resources and time when it keeps running in the database, is there any >> available option in postgres database? for e.g. in a SELECT query index >> access path if taking most of the time OR in an INSERT query INDEX block is >> causing contention while inserting data into the table , how would we be >> able to find that for a currently running query or a for a historical query? >> > > You can see locking contention in the pg_locks table. In my experience I > rarely ever saw anything in there even when I was cranking tens of millions > of inserts and updates per day. I don't think there's anything for > historical queries or to probe anything more about a currently running > query's progress. > > Take some time to think about your queries and how you can reduce any > locking they need. If you do need some locking, consider using the FOR > UPDATE clause in SELECT to limit what you do lock. > Thank you. Actually in Oracle Database there used to be a view (v$active_session_history)which used to collect the stats of session activity in certain time period and from that we used to be able to track and debug historical performance issues at exact point in time. So I was thinking, if such thing not available by default in postgres, and if we can do it manually and that will be helpful in long-term.
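For the locking side Vick mentions, one small snippet using only built-in functions shows which sessions are currently blocked and by whom:

SELECT pid,
       pg_blocking_pids(pid) AS blocked_by,
       wait_event_type,
       wait_event,
       state,
       query
FROM pg_stat_activity
WHERE cardinality(pg_blocking_pids(pid)) > 0;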
Aligning grants and privileges
Hi, We have two different types of DBA groups for our current on-premise Oracle databases. One deploys code in prod and has the highest level of privileges (i.e. write access to the database objects, performing backup/recovery, export/import and other database maintenance jobs etc.). The other group has read-only privileges (the select catalog role, i.e. access to the data dictionary views and the run-time and historical performance views), mainly looks into performance issues, and aligns more towards the functional part of the application: they understand the database design, work closely with the application development team, and have read-only access to the databases. They are experts in reading query execution paths and have the privilege to run the explain command, read sqlmonitor reports, create profiles and baselines, add hints to queries after understanding an inefficient execution path, fetch AWR reports, trace sql queries, gather object statistics, and access OEM (Oracle Enterprise Manager) to monitor performance, suggesting appropriate indexes and partitioning strategies for tables etc. Now we are moving a few of the applications to on-premise Postgres and a few to AWS RDS Postgres. What kind of privileges should we provide to these two groups? Mainly, we don't want the performance folks to have elevated privileges. I understand that read-only access to the tables is something we can give to those users. Apart from that, are there any specific privileges we should grant so that they can look into and debug all types of performance issues without obstruction, while still not having elevated privileges? Regards Veem
Re: Aligning grants and privileges
On Wed, 28 Feb, 2024, 2:14 am Adrian Klaver, wrote: > On 2/27/24 12:40, veem v wrote: > > Hi, > > We have two different types of DBA group in current on-premise Oracle > > databases, one who deploy code in prod and highest level of privileges > > (i.e having write access to the database objects, performing backup > > recovery, export/import and performing other database maintenance jobs > etc). > > > > and other groups, who only have read only privileges (select catalogue > > role i.e access to data dictionary views, run time and historical > > performance views along) mainly look into the performance issues and > > more alligns towards the functional part of the application , > > understands the database design working closely with Application > > Development team and they have read only access to the databases. They > > are expert in reading query execution path and have privilege to run the > > explain command, reading sqlmonitor report, creating profiles, baselines > > , adding hints to the query by understanding inefficient execution path, > > fetching AWR report, tracing sql queries, gathering object statistics, > > accessing OEM(oracle enterprise manager ) to monitor performance. > > Suggesting appropriate indexes and partitioning strategies for tables > etc. > > > > Now that we are moving few of the applications to Postgres on-premise > > database and few to aws RDS postgres. What kind of privileges should we > > provide to these two groups? Mainly we don't want the performance guys > > to have the elevated privileges. I can understand , readonly access to > > the table is something we can give to those users. Apart from that, are > > there any specific privileges which we should provide, so as to enable > > look and debug into all types of performance issues without any > > obstruction and at the same time not giving elevated privileges? > > Read: > > https://www.postgresql.org/docs/current/predefined-roles.html > > > > > Regards > > Veem > > -- > Adrian Klaver > adrian.kla...@aklaver.com Thank you so much. All the reader roles mentioned in the above url looks appropriate for someone who needs to have high levels read privilege but no write privilege. Hope same available in RDS too. Vacuum analyze looks similar to stats gather , will those need special privilege or the read privilege like pg_read_all_data, pg_read_all_stats, pg_read_all_setting will be enough to be able to run vaccum on a table? Is there some facilities to trace a running session and see the trace files to analyze performance issues. And if any special role needed for that? Regards Veem
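A hedged sketch of how the read-only performance group could be set up with the predefined roles Adrian points to; the role name is made up, and note that the read-only roles by themselves do not allow running VACUUM or ANALYZE, which normally requires table ownership (or a scheduled job run by the owner):

CREATE ROLE perf_analyst LOGIN;
GRANT pg_monitor       TO perf_analyst;  -- includes pg_read_all_settings and pg_read_all_stats
GRANT pg_read_all_data TO perf_analyst;  -- SELECT on all tables and views (PostgreSQL 14+)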
Re: Question on Table creation
On Wed, 28 Feb 2024 at 01:24, sud wrote: > While testing the pg_partman extension I see it by default creates a > "default partition" even if we provide the parameter p_start_partition as > "current date". But if someone purposely doesn't want to have the default > partitions , so as to avoid any unwanted data entering the table , which > may be required from a data quality perspective. Is there any option in the > create_parent function to achieve that? Or do we have to manually drop the > default partition after getting this created through create_parent function? > > I am not able to understand the significance of the "p_template_table" > parameter. When we create the partitions without passing this parameter the > template table gets created automatically in the partman schema. Checking > the details of the template table, it doesn't show any indexes or > constraints present in it, but still the child partitions get created with > all the necessary indexes and constraints as those are there defined for > the main table. So wondering , in what real life scenario do we really need > the template table to be defined and how will it help if the partitions are > different then from the main table structure? > > Lastly , the template table is by default created in the partman schema > but it also works without error, if we pass the template table to be > created in the application schema. So is there any downside of having the > template table reside in the application schema? > > As per my understanding , you can't control the creation of the default partition. If you somehow don't want to keep it then, you can drop it post creation of the partition for the first time i.e after the create_parent function call. Template_table is necessary when someone needs the child partitions to have a difference in structure as compared to main table like difference in indexes or constraints. But I think the number and data types of columns should be the same in main table and template tables. I have not tried though. I think keeping a template table in an application schema is not a good idea considering the fact that it's not a business table but a technical one, and nobody should accidentally alter this. > >
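On the default-partition question, a hedged sketch; the partition name pattern and the p_default_table parameter both depend on the installed pg_partman version, so verify against its documentation before relying on this:

-- newer pg_partman versions can skip the default partition at creation time
SELECT partman.create_parent(
    p_parent_table  := 'schema1.some_table',
    p_control       := 'part_col',
    p_type          := 'native',
    p_interval      := '1 day',
    p_premake       := 5,
    p_default_table := false
);

-- otherwise, drop the automatically created default partition afterwards
DROP TABLE schema1.some_table_default;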
When manual analyze is needed
Hi, In one of the RDS Postgres instances we see from pg_stat_user_tables that auto vacuum and auto analyze are happening on the tables without our manual intervention. So is autovacuum/autoanalyze sufficient to keep optimal stats and an unbloated table structure in place, or should we also do it manually depending on the type of table: for example, an insert-only table receiving a similar amount of data throughout the day 24*7, vs a table getting a bulk data load only once a day, vs a volatile table with truncate+load style operations, vs a table with heavy updates/deletes happening along with inserts throughout the day 24*7. Will autovacuum/autoanalyze take care of all such tables, or do we need to do it manually in certain scenarios? And if the optimizer suddenly picks a suboptimal plan for a specific query because of missing stats, how do we catch that? Basically, how do we get the old plan hash and compare it with the current plan hash? Regards Veem
Re: When manual analyze is needed
On Mon, 4 Mar 2024 at 09:42, Laurenz Albe wrote: > > The only things that require manual ANALYZE are > > 1. partitioned tables (autoanalyze will collect statistics on the > partitions, but not the partitioned table itself) > > So the partitioned table's stats are essentially the rolled-up stats of all its partitions. As you mentioned, autoanalyze only works for the child partitions but not for the partitioned table itself, so does that mean we have to schedule jobs manually (through some scheduler like pg_cron) to analyze these partitioned tables at certain intervals to keep their stats up to date? And won't that scan all the child partitions again in order to aggregate/roll up the stats from the child partition level to the table level?
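A hedged sketch of what that manual step could look like with pg_cron; the schedule and names are illustrative, and ANALYZE on the partitioned table reads a sample of rows from the partitions rather than scanning them in full:

SELECT cron.schedule('analyze-partitioned-parent', '30 2 * * *',
       $$ANALYZE schema1.parent_table$$);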
Re: When manual analyze is needed
Additionally, if a query was working fine but suddenly takes a suboptimal plan because of missing stats, do we have any hash value column on any performance view, associated with the queryid, which we can use to see the difference between past and current plans and identify such issues quickly and fix them? I am not seeing any such column holding a hash value of the plan in pg_stat_activity or pg_stat_statements, and also the query column shows "" for many of the entries; why is that?
Re: When manual analyze is needed
On Mon, 4 Mar 2024 at 21:46, Greg Sabino Mullane wrote: > On Mon, Mar 4, 2024 at 12:23 AM veem v wrote: > >> Additionally if a query was working fine but suddenly takes a >> suboptimal plan because of missing stats , do we have any hash value column >> on any performance view associated with the queryid which we can refer to >> see past vs current plans difference and identify such issues quickly and >> fix it? >> > > You can use auto_explain; nothing else tracks things at that fine a level. > You can use pg_stat_statements to track the average and max time for each > query. Save and reset periodically to make it more useful. > > https://www.postgresql.org/docs/current/auto-explain.html > > https://www.postgresql.org/docs/current/pgstatstatements.html > > Thank you so much Greg. That helps. We were planning to have the auto_explain extension added and set the log_min_duration to ~5 seconds and log_analyze to true. So that all the queries going above that time period will be logged and provide detailed information on the exact point of bottleneck. Will it be a good idea to set it on production DB which is a highly active database? or should we only have the extension added but only set the parameters while we debug some performance issue and then reset it back after we are done.
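For reference, a sketch of the settings being discussed, in postgresql.conf / parameter-group syntax; the values are examples rather than recommendations. auto_explain.log_analyze together with log_timing instruments every statement that runs, not just the ones that end up logged, so busy systems often keep log_timing off to limit the overhead:

shared_preload_libraries = 'pg_stat_statements,auto_explain'   # restart required
auto_explain.log_min_duration = '5s'
auto_explain.log_analyze = on
auto_explain.log_timing = off
auto_explain.log_buffers = on
auto_explain.log_nested_statements = on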
Not able to purge partition
Hello All, We created two partitioned tables one being parent and other child. Both are range partitioned and one table has a foreign key to the other parent table. Now , when we run the partition maintenance job to drop the older partitions which meet the partition retention criteria, it's not letting us drop any of the partitions from the parent table. Child table works fine though. And when we try using cascade , it drops the foreign key constraint from all the partitions. Can you please let me know how to make the partition maintenance work in such a scenario while maintaining the foreign key intact, as we are expecting the foreign key to be attached to the respective partition only but not the full table? And this partition maintenance job which runs through cron scheduler in an automated way is failing on a daily basis for us without being able to drop any of the historical partitions from the parent partitioned table. Below is the test case:- CREATE TABLE schema1.test_part_drop_parent ( c1 varchar(36) NOT NULL , c2_part_date timestamp with time zone NOT NULL , c3 numeric(12,0) NOT NULL , CONSTRAINT test_part_drop_parent_PK PRIMARY KEY (c1,c2_part_date) ) PARTITION BY RANGE (c2_part_date); CREATE INDEX idx1_test_part_parent ON schema1.test_part_drop_parent( C3 ASC); CREATE TABLE schema1.test_part_drop_child ( C1_child varchar(36) NOT NULL , C1 varchar(36) NOT NULL , c2_part_date timestamp with time zone NOT NULL , C3 numeric(12,0) NOT NULL , CONSTRAINT test_part_drop_child_PK PRIMARY KEY (C1_child,c2_part_date), CONSTRAINT test_part_drop_child_FK1 FOREIGN KEY (C1,c2_part_date) REFERENCES schema1.test_part_drop_parent(c1,c2_part_date) ) PARTITION BY RANGE (c2_part_date); CREATE INDEX test_part_drop_child_fk ON schema1.test_part_drop_child( c1, c2_part_date); CREATE INDEX test_part_drop_child_idx ON schema1.test_part_drop_child( c1_child, c2_part_date); select partman.create_parent( p_parent_table := 'schema1.test_part_drop_child', p_control := 'c2_part_date', p_type := 'native', p_interval := '1 day', p_premake := 5, p_start_partition => '2024-02-01 00:00:00' ); update partman.part_config set infinite_time_partitions = 'true' , retention = '1 months', retention_keep_table='false', retention_keep_index='false' where parent_table = 'schema1.test_part_drop_child'; select partman.create_parent( p_parent_table := 'schema1.test_part_drop_parent', p_control := 'c2_part_date', p_type := 'native', p_interval := '1 day', p_premake := 5, p_start_partition => '2024-02-01 00:00:00' ); update partman.part_config set infinite_time_partitions = 'true' , retention = '1 months', retention_keep_table='false', retention_keep_index='false' where parent_table = 'schema1.test_part_drop_parent'; select partman.run_maintenance('schema1.test_part_drop_child'); select partman.run_maintenance('schema1.test_part_drop_parent'); SQL Error [P0001]: ERROR: cannot drop table schema1.test_part_drop_parent_p2024_02_01 because other objects depend on it CONTEXT: SQL statement "DROP TABLE schema1.test_part_drop_parent_p2024_02_01" PL/pgSQL function drop_partition_time(text,interval,boolean,boolean,text,timestamp with time zone) line 213 at EXECUTE PL/pgSQL function partman.run_maintenance(text,boolean,boolean) line 195 at assignment DETAIL: constraint test_part_drop_child_fk1 on table schema1.test_part_drop_child depends on table schema1.test_part_drop_parent_p2024_02_01 HINT: Use DROP ... CASCADE to drop the dependent objects too. 
CONTEXT: PL/pgSQL function drop_partition_time(text,interval,boolean,boolean,text,timestamp with time zone) line 308 at RAISE PL/pgSQL function partman.run_maintenance(text,boolean,boolean) line 195 at assignment DETAIL: HINT: Where: PL/pgSQL function partman.run_maintenance(text,boolean,boolean) line 413 at RAISE
Re: Not able to purge partition
Also, I see it's not allowing us to drop the parent table partition even after all the respective child partitions are dropped and no child records exist in the child table. But this started working when we detach the parent partition and then drop it. So it means that if we just change the setup of the parent partitioned table in part_config to retention_keep_table='true', retention_keep_index='true' and retention_schema set to some schema, it will work without giving an error, and then we would have to drop the partitions from that retention schema through another procedure, maybe. Correct me if anyone has found some other workaround. However, I have one question now: if child records exist in the child partition table, it won't even allow detaching the respective parent partition. So does "CALL partman.run_maintenance_proc()" automatically ensure the ordering of the child and parent tables during partition maintenance? Or do we need to write our own procedure for that and schedule it through cron, something like below?

CREATE OR REPLACE PROCEDURE part_maintenance_proc()
LANGUAGE plpgsql
AS $$
DECLARE
    drop_partition_record RECORD;
BEGIN
    -- run maintenance on the child first, then the parent
    PERFORM partman.run_maintenance('cpod.test_part_drop_child');
    PERFORM partman.run_maintenance('cpod.test_part_drop_parent');
    -- then drop the detached parent partitions parked in the retention schema
    FOR drop_partition_record IN
        SELECT table_schema, table_name
        FROM information_schema.tables
        WHERE table_name LIKE '%test_part_drop_parent%'
          AND table_schema LIKE '%retention_schema%'
    LOOP
        EXECUTE format('DROP TABLE %I.%I',
                       drop_partition_record.table_schema,
                       drop_partition_record.table_name);
    END LOOP;
END;
$$;

SELECT cron.schedule('@hourly', );

On Thu, 21 Mar, 2024, 11:07 am veem v, wrote: > Hello All, > We created two partitioned tables one being parent and other child. Both > are range partitioned and one table has a foreign key to the other parent > table. Now , when we run the partition maintenance job to drop the older > partitions which meet the partition retention criteria, it's not letting us > drop any of the partitions from the parent table. Child table works fine > though. > > And when we try using cascade , it drops the foreign key constraint from > all the partitions. Can you please let me know how to make the partition > maintenance work in such a scenario while maintaining the foreign key > intact, as we are expecting the foreign key to be attached to the > respective partition only but not the full table? > > And this partition maintenance job which runs through cron scheduler in > an automated way is failing on a daily basis for us without being able to > drop any of the historical partitions from the parent partitioned table. 
> > Below is the test case:- > > CREATE TABLE schema1.test_part_drop_parent > ( > c1 varchar(36) NOT NULL , > c2_part_date timestamp with time zone NOT NULL , > c3 numeric(12,0) NOT NULL , > CONSTRAINT test_part_drop_parent_PK PRIMARY KEY (c1,c2_part_date) > ) PARTITION BY RANGE (c2_part_date); > > CREATE INDEX idx1_test_part_parent ON schema1.test_part_drop_parent( C3 > ASC); > > > CREATE TABLE schema1.test_part_drop_child > ( > C1_child varchar(36) NOT NULL , > C1 varchar(36) NOT NULL , > c2_part_date timestamp with time zone NOT NULL , > C3 numeric(12,0) NOT NULL , > CONSTRAINT test_part_drop_child_PK PRIMARY KEY (C1_child,c2_part_date), > CONSTRAINT test_part_drop_child_FK1 FOREIGN KEY (C1,c2_part_date) > REFERENCES schema1.test_part_drop_parent(c1,c2_part_date) > ) PARTITION BY RANGE (c2_part_date); > > CREATE INDEX test_part_drop_child_fk ON schema1.test_part_drop_child( c1, > c2_part_date); > > CREATE INDEX test_part_drop_child_idx ON schema1.test_part_drop_child( > c1_child, c2_part_date); > > > select partman.create_parent( >p_parent_table := 'schema1.test_part_drop_child', >p_control := 'c2_part_date', >p_type := 'native', >p_interval := '1 day', >p_premake := 5, >p_start_partition => '2024-02-01 00:00:00' > ); > > update partman.part_config set infinite_time_partitions = 'true' , > retention = '1 months', retention_keep_table='false', > retention_keep_index='false' > where parent_table = 'schema1.test_part_drop_child'; > > > select partman.create_parent( >p_parent_table := 'schema1.test_part_drop_parent', >p_control := 'c2_part_date', >p_type := 'native', >p_interval := '1 day', >p_premake := 5, >p_start_partition => '2024-02-01 00:00:00' > ); > > update partman.part_config set infinite_time_partitions =
Re: Not able to purge partition
On Thu, 21 Mar, 2024, 6:18 pm Laurenz Albe, wrote: What you *can* do is detach the partition and then drop it, but detatching will be slow because PostgreSQL has to check for referencing rows. The best solution is to create the foreign key *not* between the partitioned tables, but between the individual table partitions. That should be easy if you have the same partition boundaries for both. Then you can simply drop a partition from both tables at the same time. Thank you Laurenz. With the existing foreign key setup, the drop partitions works fine for the child tables when they are triggered through the cron job by calling "run_maintenance_proc". But its failing for parent table partition drop. And as you mentioned if we do the detach and then drop it works fine for the parent table. However, we are creating partitions using the create_parent procedure something as below ,and thus it creates the default template table based on the main table structure informations and also the subsequent partitions are created/dropped accordingly using the scheduled cron as below. So when you mentioned "*to create the foreign keys *not* between the * *partitioned** table but between the individual partitions*" , can that be done using the same "partman.create_parent" procedure and automated cron job schedule or has to be done any other way manually ? Additionally , do we have to give call to the cron job in orderly fashion, i.e child first and then parent table? As because, currently we were just , scheduling "run_maintenance_proc" once and all the partition tables maintenance(i.e. creation of new partition and dropping of old partitions) was being taken care automatically by that one call. select partman.create_parent( p_parent_table := 'schema1.test_part_drop_child', p_control := 'c2_part_date', p_type := 'native', p_interval := '1 day', p_premake := 5, p_start_partition => '2024-02-01 00:00:00' ); update partman.part_config set infinite_time_partitions = 'true' , retention = '1 months', retention_keep_table='true', retention_keep_index='true',retention_schema='public' where parent_table = 'schema1.test_part_drop_child'; SELECT cron.schedule('@hourly', );
Re: Not able to purge partition
On Thu, 21 Mar 2024 at 23:39, Laurenz Albe wrote: > On Thu, 2024-03-21 at 22:50 +0530, veem v wrote: > > So when you mentioned "to create the foreign keys *not* between the > > partitioned table but between the individual partitions" , can that > > be done using the same "partman.create_parent" procedure and automated > > cron job schedule or has to be done any other way manually ? > > I don't know the capabilities of partmen, but I would be surprised if > it could automatically create foreign keys on the partitions. > > Yes, the constraints on each of the child partitions and parent partitions were getting created automatically. As I see from information_schema.table_constraints, it shows one foreign key constraint in each of the child partitions created through the partman procedure. It works smoothly without any issue, until we noticed this issue while trying to purge the partition from the parent table partition. But I believe this extension is extensively used , so I'm just wondering if I am missing something here with regards to foreign key creation using this automated partition creation/partman extension functionality.
Re: Not able to purge partition
Can someone please confirm if this behavior of foreign key is expected with the partition created through partman extension and thus we need to have our manual process written for partition purge (in order of child first and then parent) , rather relying on partman partition maintenance to take care drop partition automatically for us? On Fri, 22 Mar, 2024, 12:42 am veem v, wrote: > On Thu, 21 Mar 2024 at 23:39, Laurenz Albe > wrote: > >> On Thu, 2024-03-21 at 22:50 +0530, veem v wrote: >> > So when you mentioned "to create the foreign keys *not* between the >> > partitioned table but between the individual partitions" , can that >> > be done using the same "partman.create_parent" procedure and automated >> > cron job schedule or has to be done any other way manually ? >> >> I don't know the capabilities of partmen, but I would be surprised if >> it could automatically create foreign keys on the partitions. >> >> > Yes, the constraints on each of the child partitions and parent partitions > were getting created automatically. As I see from > information_schema.table_constraints, it shows one foreign key constraint > in each of the child partitions created through the partman procedure. > > It works smoothly without any issue, until we noticed this issue while > trying to purge the partition from the parent table partition. But I > believe this extension is extensively used , so I'm just wondering if I am > missing something here with regards to foreign key creation using this > automated partition creation/partman extension functionality. > > >
Re: Not able to purge partition
Trying to consolidate the main questions here: 1) As we see, having the foreign key defined makes the detach partition step run for minutes (in our case 5-10 minutes for a 60 million row partition). So how can we make the parent table partition detach and drop work fast in such a scenario while keeping the foreign key intact? 2) As we are using pg_partman for maintaining the partitions, do we need to schedule multiple cron jobs (one for the parent table and another for the child table), so that it first drops the child table partition and then the parent table partition (because detaching/dropping the parent partition first will error out), called in that order? Or can a single cron job call like the one below do the job for us and make sure the dependency between the tables is taken care of automatically? SELECT cron.schedule('@daily', $$SELECT partman.run_maintenance()$$);
Re: Not able to purge partition
On Sat, 23 Mar 2024 at 23:08, Laurenz Albe wrote: > On Sat, 2024-03-23 at 22:41 +0530, veem v wrote: > > 1)As we see having foreign key defined is making the detach partition run > > for minutes(in our case 5-10minutes for 60 million rows partition), so > > how to make the parent table partition detach and drop work fast in > such > > a scenario while maintaining the foreign key intact? > > I told you: don't do it. > Instead, use foreign keys between the partitions. > I am struggling to understand how to maintain those partitions then? As because we were planning to use pg_partman for creating and dropping partitions automatically without much hassle. So do you mean to say do the partition maintenance(create/drop) by creating our own jobs and not to use the pg_partman extension for this. Say for example in our case the parent table has 3-4 child table and all are partitioned on same keys/columns, so how we can identify the child partitions and then create all foreign keys to the respective parent table partitions and attach those partitions to parent table and also make this process automated? Appreciate any guidance on this. Actually, using pg_partman was taking care of everything starting from creating partitions with different names and creating respective indexes, constraints also with different names for each partitions without us being worrying anything about those.
Re: Not able to purge partition
On Sun, 24 Mar 2024 at 20:29, Laurenz Albe wrote: > On Sun, 2024-03-24 at 00:37 +0530, veem v wrote: > > > Instead, use foreign keys between the partitions. > > > > I am struggling to understand how to maintain those partitions then? > > As because we were planning to use pg_partman for creating and dropping > > partitions automatically without much hassle. > > I understand. > > But do you want to pursue a way that is not working well, just because > the tool you chose cannot handle it? > > Yours, > Laurenz Albe Thank you so much Laurenz. Can you please suggest some docs which show the way we should do the partition maintenance (if not using pg_partman)? Actually, I am not able to visualize the steps here. Do you mean to say that we should first create the partitioned tables without any foreign key constraints, then create the parent table's future partitions (say 10 days' worth) and then the child table's future 10 days of partitions manually, and while doing that define the foreign key constraints, indexes etc.? And here, how do we find the exactly matching parent partitions so as to create the foreign keys one to one?
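To make Laurenz's suggestion concrete, a hand-written sketch for one day's pair of partitions; the partition names follow pg_partman's naming pattern and are illustrative, and this foreign key replaces the table-level one between the partitioned tables:

ALTER TABLE schema1.test_part_drop_child_p2024_02_01
    ADD CONSTRAINT test_part_drop_child_p2024_02_01_fk
    FOREIGN KEY (c1, c2_part_date)
    REFERENCES schema1.test_part_drop_parent_p2024_02_01 (c1, c2_part_date);

-- with matching boundaries, retiring a day is then just two drops, child first
DROP TABLE schema1.test_part_drop_child_p2024_02_01;
DROP TABLE schema1.test_part_drop_parent_p2024_02_01;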
Re: Moving delta data faster
On Fri, 5 Apr 2024 at 06:10, Adrian Klaver wrote: > > > S3 is not a database. You will need to be more specific about '... > then > > from the S3 it will be picked and gets merged to the target postgres > > database.' > > > > The data from S3 will be dumped into the stage table and then the > > upsert/merge from that table to the actual table. > > The S3 --> staging table would be helped by having the data as CSV and > then using COPY. The staging --> final table step could be done as > either ON CONFLICT or MERGE, you would need to test in your situation to > verify which works better. > Just a thought: in case the delta changes are really high (say >30-40% of the total number of rows in the table), could the OP also evaluate the "truncate target table + load target table" strategy here? Considering that DDL/TRUNCATE is transactional in Postgres, it can be done online without impacting the ongoing read queries, and performance-wise it would be faster than the traditional update/insert/upsert/merge.
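A hedged sketch of the staging flow Adrian describes; table and column names are hypothetical, the CSV is assumed to have already been pulled down from S3, and ON CONFLICT requires a unique constraint on the key column:

CREATE TEMP TABLE stage_orders (LIKE target_orders INCLUDING DEFAULTS);

\copy stage_orders FROM 'delta.csv' WITH (FORMAT csv, HEADER)

INSERT INTO target_orders AS t
SELECT * FROM stage_orders
ON CONFLICT (order_id) DO UPDATE
SET col1 = EXCLUDED.col1,
    col2 = EXCLUDED.col2
WHERE t.col1 IS DISTINCT FROM EXCLUDED.col1
   OR t.col2 IS DISTINCT FROM EXCLUDED.col2;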
Question on trigger
Hi, We used to use Oracle database in which we had audit triggers(something as below) mandated for all tables by the control team. Now we are going to use the postgresql 15.4 database for one of our applications. So,wanted to understand if there exists any downside of such audit trigger setup for all the tables? Will it impact the bulk data insert/update/delete OR slowdown of any of the DML operations significantly (and thus will not be advisable to use for all tables but selected ones)? CREATE OR REPLACE TRIGGER TAB_AUD_TRG BEFORE DELETE OR INSERT OR UPDATE ON tab FOR EACH ROW BEGIN IF inserting THEN :NEW.create_timestamp := systimestamp; :NEW.create_userid := sys_context('USERENV','SESSION_USER'); :NEW.update_timestamp := systimestamp; :NEW.update_userid := sys_context('USERENV','SESSION_USER'); ELSIF updating THEN IF updating('create_userid') OR updating('create_timestamp') THEN :new.create_userid := :old.create_userid; :new.create_timestamp := :old.create_timestamp; END IF; :NEW.update_timestamp := systimestamp; :NEW.update_userid := sys_context('USERENV','SESSION_USER'); END IF; END; / Regards Veem
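For readers comparing the two systems, a rough PostgreSQL translation of the Oracle trigger above might look like the sketch below (column names are taken from the Oracle version; the DELETE branch is omitted because the Oracle body does nothing on delete):

CREATE OR REPLACE FUNCTION tab_aud_trg_fn() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        NEW.create_timestamp := current_timestamp;
        NEW.create_userid    := session_user;
    ELSIF TG_OP = 'UPDATE' THEN
        NEW.create_timestamp := OLD.create_timestamp;  -- keep the original values
        NEW.create_userid    := OLD.create_userid;
    END IF;
    NEW.update_timestamp := current_timestamp;
    NEW.update_userid    := session_user;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER tab_aud_trg
    BEFORE INSERT OR UPDATE ON tab
    FOR EACH ROW EXECUTE FUNCTION tab_aud_trg_fn();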
Re: Question on trigger
Thank you Adrian. So it seems the heavy DML tables will see an impact if having triggers (mainly for each row trigger) created on them. And also the bulk DML/array based insert (which inserts multiple rows in one short or one batch) , in those cases it seems the trigger will not make that happen as it will force it to make it happen row by row, as the trigger is row based. Will test anyway though. On Thu, 11 Apr 2024 at 22:00, Adrian Klaver wrote: > On 4/11/24 07:31, veem v wrote: > > Hi, We used to use Oracle database in which we had audit > > triggers(something as below) mandated for all tables by the control > > team. Now we are going to use the postgresql 15.4 database for one of > > our applications. So,wanted to understand if there exists any downside > > of such audit trigger setup for all the tables? Will it impact the bulk > > data insert/update/delete OR slowdown of any of the DML operations > > significantly (and thus will not be advisable to use for all tables but > > selected ones)? > > Triggers are overhead in Postgres as they where in Oracle. If they > didn't cause an issue in Oracle I would suspect that would also be the > case in Postgres. To confirm you would need to create a test setup and > run some common operations and see what the overhead is. > > Some potential performance improvements: > > https://www.postgresql.org/docs/current/sql-createtrigger.html > > "...a trigger that is marked FOR EACH STATEMENT only executes once for > any given operation, regardless of how many rows it modifies (in > particular, an operation that modifies zero rows will still result in > the execution of any applicable FOR EACH STATEMENT triggers)." > > <...> > > "The REFERENCING option enables collection of transition relations, > which are row sets that include all of the rows inserted, deleted, or > modified by the current SQL statement. This feature lets the trigger see > a global view of what the statement did, not just one row at a time. > This option is only allowed for an AFTER trigger that is not a > constraint trigger; also, if the trigger is an UPDATE trigger, it must > not specify a column_name list. OLD TABLE may only be specified once, > and only for a trigger that can fire on UPDATE or DELETE; it creates a > transition relation containing the before-images of all rows updated or > deleted by the statement. Similarly, NEW TABLE may only be specified > once, and only for a trigger that can fire on UPDATE or INSERT; it > creates a transition relation containing the after-images of all rows > updated or inserted by the statement." > > > As example: > > https://www.postgresql.org/docs/current/plpgsql-trigger.html > > Example 43.7. Auditing with Transition Tables > > > > > CREATE OR REPLACE TRIGGER TAB_AUD_TRG > >BEFORE DELETE OR INSERT OR UPDATE > >ON tab > >FOR EACH ROW > > BEGIN > >IF inserting THEN > > :NEW.create_timestamp := systimestamp; > > :NEW.create_userid := sys_context('USERENV','SESSION_USER'); > > :NEW.update_timestamp := systimestamp; > > :NEW.update_userid := sys_context('USERENV','SESSION_USER'); > >ELSIF updating THEN > > IF updating('create_userid') OR updating('create_timestamp') > THEN > > :new.create_userid := :old.create_userid; > > :new.create_timestamp := :old.create_timestamp; > > END IF; > > :NEW.update_timestamp := systimestamp; > > :NEW.update_userid := sys_context('USERENV','SESSION_USER'); > >END IF; > >END; > > / > > > > Regards > > Veem > > -- > Adrian Klaver > adrian.kla...@aklaver.com > >
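A hedged sketch of the statement-level alternative that the quoted documentation describes, using transition tables; the audit table and its columns are made up for illustration:

CREATE TABLE tab_audit (
    op         text,
    changed_at timestamptz DEFAULT current_timestamp,
    changed_by text DEFAULT session_user,
    row_data   jsonb
);

CREATE OR REPLACE FUNCTION tab_audit_stmt_fn() RETURNS trigger AS $$
BEGIN
    INSERT INTO tab_audit (op, row_data)
    SELECT TG_OP, to_jsonb(n) FROM new_rows n;
    RETURN NULL;  -- the return value of an AFTER ... FOR EACH STATEMENT trigger is ignored
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER tab_audit_stmt
    AFTER INSERT OR UPDATE ON tab
    REFERENCING NEW TABLE AS new_rows
    FOR EACH STATEMENT EXECUTE FUNCTION tab_audit_stmt_fn();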
Re: Question on trigger
On Sat, 13 Apr 2024 at 21:44, Adrian Klaver wrote: > On 4/13/24 00:03, veem v wrote: > > Thank you Adrian. > > > > So it seems the heavy DML tables will see an impact if triggers > > (mainly for-each-row triggers) are created on them. > > > > And also for bulk DML/array based inserts (which insert multiple rows in > > one shot or one batch), it seems the trigger will not let that happen, as > > it will force it to happen row by row, since the trigger is row based. > > Will test anyway though. > > You said you have triggers in the Oracle database and I assumed they > worked and were not a show-stopping issue there. What makes you think > that would be different in Postgres? > > What type of triggers were there in Oracle, per row, per statement or a > mix? > > Actually we have row-level triggers in Oracle which run for smaller-volume DML, and in their presence direct-path inserts fall back to conventional row-by-row inserts. So I was wondering whether in Postgres we will encounter a similar issue and batch inserts may be converted back to row-by-row automatically. And here we are going to process higher-volume DML in the PostgreSQL database.
How to create efficient index in this scenario?
Hi , It's postgres version 15.4. A table is daily range partitioned on a column transaction_timestamp. It has a unique identifier which is the ideal for primary key (say transaction_id) , however as there is a limitation in which we have to include the partition key as part of the primary key, so it has to be a composite index. Either it has to be (transaction_id,transaction_timestamp) or ( transaction_timestamp, transaction_id). But which one should we go for, if both of the columns get used in all the queries? We will always be using transaction_timestamp as mostly a range predicate filter/join in the query and the transaction_id will be mostly used as a join condition/direct filter in the queries. So we were wondering, which column should we be using as a leading column in this index? There is a blog below (which is for oracle), showing how the index should be chosen and it states , "*Stick the columns you do range scans on last in the index, filters that get equality predicates should come first.* ", and in that case we should have the PK created as in the order (transaction_id,transaction_timestamp). It's because making the range predicate as a leading column won't help use that as an access predicate but as an filter predicate thus will read more blocks and thus more IO. Does this hold true in postgres too? https://ctandrewsayer.wordpress.com/2017/03/24/the-golden-rule-of-indexing/ Additionally there is another scenario in which we have the requirement to have another timestamp column (say create_timestamp) to be added as part of the primary key along with transaction_id and we are going to query this table frequently by the column create_timestamp as a range predicate. And ofcourse we will also have the range predicate filter on partition key "transaction_timestamp". But we may or may not have join/filter on column transaction_id, so in this scenario we should go for (create_timestamp,transaction_id,transaction_timestamp). because "transaction_timestamp" is set as partition key , so putting it last doesn't harm us. Will this be the correct order or any other index order is appropriate? Thanks and Regards Veem
Re: How to create efficient index in this scenario?
On Sun, 9 Jun 2024 at 09:45, Lok P wrote: > > On Sat, Jun 8, 2024 at 7:03 PM veem v wrote: > >> >> There is a blog below (which is for oracle), showing how the index should >> be chosen and it states , "*Stick the columns you do range scans on >> last in the index, filters that get equality predicates should come first.* >> ", and in that case we should have the PK created as in the order >> (transaction_id,transaction_timestamp). It's because making the range >> predicate as a leading column won't help use that as an access predicate >> but as an filter predicate thus will read more blocks and thus more IO. >> Does this hold true in postgres too? >> >> >> https://ctandrewsayer.wordpress.com/2017/03/24/the-golden-rule-of-indexing/ >> > > I believe the analogy holds true here in postgres too and the index in > this case should be on (transaction_id, transaction_timestamp). > > >> >> >> Additionally there is another scenario in which we have the requirement >> to have another timestamp column (say create_timestamp) to be added as part >> of the primary key along with transaction_id and we are going to query this >> table frequently by the column create_timestamp as a range predicate. And >> ofcourse we will also have the range predicate filter on partition key >> "transaction_timestamp". But we may or may not have join/filter on column >> transaction_id, so in this scenario we should go for >> (create_timestamp,transaction_id,transaction_timestamp). because >> "transaction_timestamp" is set as partition key , so putting it last >> doesn't harm us. Will this be the correct order or any other index order is >> appropriate? >> >> >> > In this case , the index should be on ( > create_timestamp,transaction_id,transaction_timestamp), considering the > fact that you will always have queries with "create_timestamp" as predicate > and may not have transaction_id in the query predicate. > So in the second scenario, if we keep the create_timestamp as the leading column ,is it not against the advice which the blog provides i.e. to not have the range predicate as the leading column in the index?
Does trigger only accept functions?
Hi, It's version 15.4 of postgres. We have a requirement to have the audit enabled for the delete queries on the base table. And for that we are planning to have one audit table created for each base table and have triggers on each of the base tables to be fired on delete which will insert records into the audit table. But I see the trigger is not accepting the insert query directly, rather it's asking to call a function and to put the business logic inside that function, something as below. So does that mean, to enable audit on the ~50 base table , we will have ~50 functions to be created and also they need to be called from ~50 triggers? or any other better approach exists to handle this? CREATE OR REPLACE FUNCTION log_deletes_source_table() RETURNS TRIGGER AS $$ BEGIN INSERT INTO source_table_delete_history (record_id, delete_timestamp, col1, col2,col3) VALUES (OLD.id, current_timestamp, old.col1,old.col2,old.col3); RETURN OLD; END; $$ LANGUAGE plpgsql; -- Trigger for deletes CREATE TRIGGER before_delete BEFORE DELETE ON source_table FOR EACH ROW EXECUTE FUNCTION log_deletes_source_table (); Regards Veem
Re: Does trigger only accept functions?
On Tue, 11 Jun 2024 at 17:03, hubert depesz lubaczewski wrote: > On Tue, Jun 11, 2024 at 12:47:14AM +0530, veem v wrote: > > to be called from ~50 triggers? or any other better approach exists to > > handle this? > > pgaudit extension? > > Or just write all the changes to single table? > > Or use dynamic queries that will build the insert based on the name of > table the event happened on? > > Or pass arguments? > > Best regards, > > depesz > > Thank you so much. I hope you mean something as below when you say making it dynamic. Because we have the audit tables having more number of columns as compared to the source table and for a few the column name is a bit different. -- Trigger for deletes CREATE TRIGGER before_delete BEFORE DELETE ON source_table FOR EACH ROW EXECUTE FUNCTION log_deletes(); -- Trigger for source_table1 CREATE TRIGGER before_delete_source_table1 BEFORE DELETE ON source_table1 FOR EACH ROW EXECUTE FUNCTION log_deletes(); -- Trigger for source_table2 CREATE TRIGGER before_delete_source_table2 BEFORE DELETE ON source_table2 FOR EACH ROW EXECUTE FUNCTION log_deletes(); CREATE OR REPLACE FUNCTION log_deletes() RETURNS TRIGGER AS $$ BEGIN IF TG_TABLE_NAME = 'source_table1' THEN INSERT INTO delete_audit1 ( col1, col2, col3) VALUES (OLD.col1, OLD.col2, OLD.col3); ELSIF TG_TABLE_NAME = 'source_table2' THEN INSERT INTO delete_audit2 ( col4, col5, col6) VALUES (OLD.col4, OLD.col5, OLD.col6); -- Add more conditions for other tables ELSE RAISE EXCEPTION 'Audit table not defined for %', TG_TABLE_NAME; END IF; RETURN OLD; END; $$ LANGUAGE plpgsql;
Re: Does trigger only accept functions?
On Wed, 12 Jun 2024 at 00:26, hubert depesz lubaczewski wrote:
> No, I meant building dynamic queries and then EXECUTE-ing, like docs show:
> https://www.postgresql.org/docs/current/plpgsql-statements.html#PLPGSQL-STATEMENTS-EXECUTING-DYN
>
> Best regards,
> depesz

My apologies if I am interpreting it the wrong way. It doesn't seem to make much difference though, but do you mean something like below?

CREATE OR REPLACE FUNCTION log_deletes() RETURNS TRIGGER AS $$
DECLARE
    audit_table_name TEXT;
    audit_query      TEXT;
BEGIN
    IF TG_TABLE_NAME = 'source_table1' THEN
        audit_table_name := 'delete_audit1';
        audit_query := 'INSERT INTO ' || audit_table_name
                    || ' (record_id, delete_timestamp, col1, col2, col3)'
                    || ' VALUES ($1, current_timestamp, $2, $3, $4)';
        EXECUTE audit_query USING OLD.id, OLD.col1, OLD.col2, OLD.col3;
    ELSIF TG_TABLE_NAME = 'source_table2' THEN
        audit_table_name := 'delete_audit2';
        audit_query := 'INSERT INTO ' || audit_table_name
                    || ' (record_id, delete_timestamp, col4, col5, col6)'
                    || ' VALUES ($1, current_timestamp, $2, $3, $4)';
        EXECUTE audit_query USING OLD.id, OLD.col4, OLD.col5, OLD.col6;
    ELSE
        RAISE EXCEPTION 'Audit table not defined for %', TG_TABLE_NAME;
    END IF;
    RETURN OLD;
END;
$$ LANGUAGE plpgsql;
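For what it's worth, depesz's "write all the changes to a single table" suggestion can avoid the per-table IF/ELSIF ladder entirely. A minimal sketch, not something proposed in this thread (the audit table delete_audit_log and its columns are assumptions), that captures the whole deleted row as jsonb from one shared trigger function:

CREATE TABLE delete_audit_log (
    table_name text        NOT NULL,
    deleted_at timestamptz NOT NULL DEFAULT current_timestamp,
    old_row    jsonb       NOT NULL
);

CREATE OR REPLACE FUNCTION log_deletes_generic() RETURNS trigger AS $$
BEGIN
    -- to_jsonb(OLD) works for any table, so one function can serve all ~50 triggers
    INSERT INTO delete_audit_log (table_name, old_row)
    VALUES (TG_TABLE_NAME, to_jsonb(OLD));
    RETURN OLD;
END;
$$ LANGUAGE plpgsql;

-- one trigger per table is still needed, but they all point at the same function
CREATE TRIGGER before_delete_source_table1
BEFORE DELETE ON source_table1
FOR EACH ROW EXECUTE FUNCTION log_deletes_generic();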
Issue while creating index dynamically
Hi, It's postgres version 15.4. We have a requirement to create an index on a big partitioned table and want to do it online. And to make the script runnable in an automated way on any day through our CI/CD pipeline, we were trying to write it as below inside a begin/end block, i.e. create the index using the "ON ONLY" option, then create the index on each partition using the "CONCURRENTLY" keyword, and then attach the index partitions to the main index, something as below. But we are getting an error while executing, saying it can't be executed in a transaction block with "CONCURRENTLY". So I want to understand, is there any alternate way to get around this?

EXECUTE format('CREATE INDEX %I ON ONLY %I (%I);', index_name, table_name, column_name);

FOR partition_name IN
    SELECT inhrelid::regclass::text
    FROM pg_inherits
    WHERE inhparent = table_name::regclass
LOOP
    partition_index_name := partition_name || '_' || index_name || '_idx';

    EXECUTE format('CREATE INDEX CONCURRENTLY %I ON %I (%I);',
                   partition_index_name, partition_name, column_name);

    EXECUTE format('ALTER INDEX %I ATTACH PARTITION %I;',
                   index_name, partition_index_name);
END LOOP;

ERROR: CREATE INDEX CONCURRENTLY cannot run inside a transaction block
CONTEXT: SQL statement " CREATE INDEX CONCURRENTLY partitioned_table_0_index1_idx ON partitioned_table_0 (id);"
PL/pgSQL function inline_code_block line 20 at EXECUTE

Regards
Veem
Re: Issue while creating index dynamically
On Wed, 24 Jul 2024 at 02:02, Tom Lane wrote: > Ron Johnson writes: > > On Tue, Jul 23, 2024 at 4:10 PM veem v wrote: > >> But we are getting an error while executing saying it cant be executed > in > >> transaction block with "CONCURRENTLY". So I want to understand , is > there > >> any alternate way to get away with this? > > > I'd write that in bash, not in a DO block. > > Yeah. I thought for a bit about using contrib/dblink to carry out > the commands in a different session, but I don't think that'll work: > CREATE INDEX CONCURRENTLY would think it has to wait out the > transaction running the DO block at some steps. Shove the logic > over to the client side and you're good to go. > > regards, tom lane > > > Thank you . I was thinking the individual statement will work fine if I pull out those from the begin..end block, as those will then be not bounded by any outer transaction. However, When I was trying it from dbeaver by extracting individual index creation statements rather from within the "begin ..end;" block, it still failed with a different error as below. Why is it so? "SQL Error [25001]: Error: create index concurrently cannot be executed within a pipeline "
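One client-side way to do what Tom and Ron describe, without leaving SQL tooling, is to let psql generate and run the per-partition statements with \gexec; each generated statement then runs as its own top-level command (with autocommit on), so CONCURRENTLY is allowed. A rough sketch, assuming the parent is partitioned_table, the index is index1 and the column is id; adjust the names, this is not tested against your schema:

-- run in psql, *not* inside BEGIN/END or a DO block
CREATE INDEX index1 ON ONLY partitioned_table (id);

SELECT format('CREATE INDEX CONCURRENTLY %I ON %s (id);',
              c.relname || '_index1_idx', i.inhrelid::regclass)
FROM pg_inherits i
JOIN pg_class c ON c.oid = i.inhrelid
WHERE i.inhparent = 'partitioned_table'::regclass;
\gexec

SELECT format('ALTER INDEX index1 ATTACH PARTITION %I;', c.relname || '_index1_idx')
FROM pg_inherits i
JOIN pg_class c ON c.oid = i.inhrelid
WHERE i.inhparent = 'partitioned_table'::regclass;
\gexec

Graphical clients such as DBeaver tend to wrap or pipeline statements (hence the "cannot be executed within a pipeline" error), which is why a plain psql session or a shell script is the usual vehicle for this.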
Getting specific partition from the partition name
Hi, We are using postgres version 15.4. We have a range partitioned table and the partition naming convention, generated by pg_partman, is something like "<table_name>_p_YYYY_MM_DD". We have a requirement of extracting specific partitions ordered by the date criteria and also doing some operations on that specific date. But I am struggling and it's not working as expected. I tried something as below but it's not working. Can somebody guide me here please?

to_date( substring('table_part_p2024_08_08' from '_p(\d{4})_(\d{2})_(\d{2})'), 'YYYY_MM_DD' ) < current_date

Or is there any ready-made data dictionary which will give us the order of the partitions by date, so we can get hold of the specific nth partition in that table?

Regards
Veem
Re: Getting specific partition from the partition name
This helps. Thank you very much.

On Fri, 9 Aug 2024 at 02:15, Greg Sabino Mullane wrote:
> YYYY_MM_DD is already set up for sorting, so just do:
>
> SELECT table_name FROM information_schema.tables WHERE table_name ~ 'table_part_p' ORDER BY 1 DESC;
>
> If you need to grab the numbers:
>
> SELECT substring('table_part_p2022_03_04' from '([\d_]+)$');
>
> Cheers,
> Greg
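Building on Greg's substring idea, a hedged sketch of filtering the partitions of one specific parent by the date embedded in their names (the parent name table_part is an assumption; going through pg_inherits avoids picking up unrelated tables that merely match the regex):

SELECT c.relname AS partition_name,
       to_date(substring(c.relname from '_p(\d{4}_\d{2}_\d{2})$'), 'YYYY_MM_DD') AS partition_date
FROM pg_inherits i
JOIN pg_class c ON c.oid = i.inhrelid
WHERE i.inhparent = 'table_part'::regclass
  AND to_date(substring(c.relname from '_p(\d{4}_\d{2}_\d{2})$'), 'YYYY_MM_DD') < current_date
ORDER BY partition_date;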
Re: Column type modification in big tables
On Tue, 13 Aug 2024 at 19:39, Greg Sabino Mullane wrote:
> On Sat, Aug 10, 2024 at 5:06 PM Lok P wrote:
>
>> Can someone throw some light: in case we get 5-6 hrs downtime for this change, then what method should we choose for this ALTER operation?
>
> We can't really answer that. Only you know what resources you have, what risk/reward you are willing to handle, and how long things may take. For that latter item, your best bet is to try this out on the same/similar hardware and see how long it takes. Do a smaller table and extrapolate if you need to. Or promote one of your replicas offline and modify that. I've given you a low-risk / medium-reward option with check constraints, but for the ALTER TABLE options you really need to try it and see (on non-prod).

What about if the OP opts for a strategy something as below?
1) Detach the partitions
2) Alter the individual partitions to the required column type and length
3) Alter the table
4) Attach the partitions back to the main table
This should be faster and also done in a controlled fashion for each partition individually.
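For illustration only, the detach/alter/attach sequence sketched above might look roughly like the following; table, partition, column names and target type are placeholders, and whether this is actually faster (and what it does to indexes, constraints and foreign keys) still needs testing on a non-prod copy, as Greg says:

-- 1) Detach one partition (repeat per partition)
ALTER TABLE big_table DETACH PARTITION big_table_p2024_08_08;

-- 2) Alter the column on the detached partition
ALTER TABLE big_table_p2024_08_08 ALTER COLUMN col1 TYPE numeric(15,2);

-- 3) Alter the now-childless parent
ALTER TABLE big_table ALTER COLUMN col1 TYPE numeric(15,2);

-- 4) Attach the partition back
ALTER TABLE big_table ATTACH PARTITION big_table_p2024_08_08
    FOR VALUES FROM ('2024-08-08') TO ('2024-08-09');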
Partitioning and unique key
Hello, We have our transaction tables daily range partitioned on the transaction_timestamp column, which is of timestamptz data type, and they have a composite primary key on (transaction_id, transaction_timestamp). We were using an "insert on conflict" for loading data into our system, which means that if another record comes into the system with the same transaction_id and transaction_timestamp, it gets updated. This way we already have 60 days worth of data stored in our system, with approx. 70 million transactions per day. But we just got to know from business that the data should be unique by transaction_id only, not by transaction_timestamp: any incoming data with the same transaction_id (even with a different transaction_timestamp) should get updated, not inserted. Also, these daily partitions are going to hold 400 million rows in future and will be queried with a transaction_timestamp filter, so we can't really avoid partitioning here considering future growth. But due to postgres limitations we are unable to have this unique constraint or primary key only on the transaction_id column; we have to include transaction_timestamp with it as a composite key. So I want to understand from the experts if there is any possible way to satisfy both partitioning on the transaction_timestamp column and a unique key or PK on transaction_id only? Note - it's a 15.4 postgres database. Regards Veem
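For anyone wanting to reproduce the limitation being described, a minimal sketch (only the column names come from this thread; the table name is a placeholder) of what Postgres accepts and rejects on a partitioned table:

CREATE TABLE transactions (
    transaction_id        bigint      NOT NULL,
    transaction_timestamp timestamptz NOT NULL,
    -- accepted: the PK includes the partition key column
    PRIMARY KEY (transaction_id, transaction_timestamp)
) PARTITION BY RANGE (transaction_timestamp);

-- A variant with PRIMARY KEY (transaction_id) alone is rejected with an error to the
-- effect that a unique constraint on a partitioned table must include all columns of
-- the partition key, i.e. transaction_timestamp.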
Re: Partitioning and unique key
On Sun, 1 Sept 2024 at 11:38, veem v wrote: > > On Sun, 1 Sept 2024 at 10:03, veem v wrote: > >> >> On Sun, 1 Sept 2024 at 09:13, David G. Johnston < >> david.g.johns...@gmail.com> wrote: >> >>> On Saturday, August 31, 2024, veem v wrote: >>> >>>> >>>> >>>> iii)And then alter the datatype of the partition key transaction_date >>>> to DATE in one shot at the table level(which should be fast as its having >>>> more granularity as compare to existing timestamptype, so should be catalog >>>> or dictionary change only), and that will remain the part of composite PK >>>> (transaction_id,transaction_date). >>>> >>> >>> While this might seem logical, in reality date and timestamptz are >>> different fixed-width data types and thus any attempt to change from one to >>> the other will involve a table rewrite. Best you could do is leave the >>> timestamptz in place and just truncate to day so the time is always >>> midnight UTC. >>> >>> >>>> >>>> >> Here , if we keep the PK column as is i.e. the transaction_timestamp as >> timestamptz but truncate the time component , in that case again in future >> if someone tries to insert(using insert on conflict) data into the table >> with time component , it will get consumed and will not be restricted by >> the PK constraint. So I was trying to make the data type also as DATE for >> the transaction_timestap column. >> >> As in this case anyway we have to create another column to populate the >> date+timestamp values as we cant throw those values away per business need, >> so we will be kind of rewriting the table.So is it okay if if we will >> >> 1) Detach all the partitions. >> 2)Do the alter using "only" key word in table level. (For adding new >> column transaction_timestamp_new to hold date+timestamp value and also >> altering the existing transaction_timestamp column to DATE from type >> timestamptz). >> 3)Then do the data fix(delete the duplicates) and alter the column, one >> partition at a time for all of the partitions and once done , attach those >> partitions one by one. >> 5)Rename the columns at table level.Hope this won't need any table >> rewrite. >> >> Is there any downside if we go by the above approach? >> > > Or do you mean to say there is no way we can modify the data type of a > partition key even by detaching the partitions one by one? And thus we may > have only way left is to create the table from scratch with partitions and > populate the data to it? I was avoiding this because we have many indexes > also in it , so creating from scratch means creating those indexes again. > So I wanted to achieve it by detaching partitions, doing the required > change and attaching it again. > > I tried by detaching the partitions, but still then it's not allowing me to alter the DAT TYPE of the partition key and throwing error as below. ERROR: cannot alter column "" because it is part of the partition key of relation "" Now I am thinking if it's really going to get too complex if we try to stick with the partition detach and attach strategy. As a few teammates say , having a new column added with just a date type and then drop the existing FK and PK first and then detach all the partitions, and attach the partitions back using the new DATE column. and then recreate the PK again. Btw we have ~5 partition tables with parent child relationship on which this fix has to be applied. So I'm still wondering the best way possible for fixing this issue.
Re: Partitioning and unique key
On Sun, 1 Sept 2024 at 03:58, Adrian Klaver wrote: > > The model is at odds with itself and untenable. If the tables hold > multiple rows for a given transaction_id then you cannot have a > PK/Unique constraint on that column. Seems there is a decided lack of > any planning. The only way I can see this happening is consolidating all > the duplicate transaction_id rows into a single row for each > transaction_id. That then leads to the question of how to do that and > retain the 'correct' information from the selection of rows for each > transaction_id. > > Yes we had messed up the data for now and have multiple records for each transaction_id persisted and thus we need to fix the data. But more than that , as I stated , I wanted to understand first 1) if it's technically possible to have a unique key on only the transaction_id column having the partition key on the transaction_timestamp, because the table is going to be queried/purged based on the transaction_timestamp? 2) Additionally we were thinking if above is technically not possible, then the maximum granularity which we can have for each transaction_id will be a day, so the partition key transaction_timestmp can be truncated to have only date component but no time component. So the primary key will be (transaction_id, transaction_date). But we also don't want to lose the time component and persist the existing data of transaction_timestmp (which will have a time component in it, in a separate column). And in above case , for fixing the existing data in least disruptive way, as we have currently duplicate transaction_id inserted into the table already because of the composite primary key(transaction_id, transaction_timestmp).Can we simply i)rename the existing column transaction_timestmp to transaction_date and then add new column transaction_timestmp using the values of existing column partition by partition. ii)And then delete the duplicate data using query something as below , each partition by partition. iii)And then alter the datatype of the partition key transaction_date to DATE in one shot at the table level(which should be fast as its having more granularity as compare to existing timestamptype, so should be catalog or dictionary change only), and that will remain the part of composite PK (transaction_id,transaction_date). iv) Repeat this step for all child partition tables and then for the parent partition tables. Will this technique be the most efficient way of fixing this mess? WITH ranked_records AS ( SELECT column1_id, column2_timestamptz, ROW_NUMBER() OVER (PARTITION BY column1_id, date_trunc('day', column2_timestamptz) ORDER BY column2_timestamptz DESC) AS rn FROM partition_name ) DELETE FROM partition_name T1 WHERE EXISTS ( SELECT 1 FROM ranked_records T2 WHERE T1.column1_id = T2.column1_id AND T1.column2_timestamptz = T2.column2_timestamptz AND T2.rn > 1 )
Re: Partitioning and unique key
On Sun, 1 Sept 2024 at 09:13, David G. Johnston wrote: > On Saturday, August 31, 2024, veem v wrote: > >> >> >> iii)And then alter the datatype of the partition key transaction_date to >> DATE in one shot at the table level(which should be fast as its having more >> granularity as compare to existing timestamptype, so should be catalog or >> dictionary change only), and that will remain the part of composite PK >> (transaction_id,transaction_date). >> > > While this might seem logical, in reality date and timestamptz are > different fixed-width data types and thus any attempt to change from one to > the other will involve a table rewrite. Best you could do is leave the > timestamptz in place and just truncate to day so the time is always > midnight UTC. > > >> >> Here , if we keep the PK column as is i.e. the transaction_timestamp as timestamptz but truncate the time component , in that case again in future if someone tries to insert(using insert on conflict) data into the table with time component , it will get consumed and will not be restricted by the PK constraint. So I was trying to make the data type also as DATE for the transaction_timestap column. As in this case anyway we have to create another column to populate the date+timestamp values as we cant throw those values away per business need, so we will be kind of rewriting the table.So is it okay if if we will 1) Detach all the partitions. 2)Do the alter using "only" key word in table level. (For adding new column transaction_timestamp_new to hold date+timestamp value and also altering the existing transaction_timestamp column to DATE from type timestamptz). 3)Then do the data fix(delete the duplicates) and alter the column, one partition at a time for all of the partitions and once done , attach those partitions one by one. 5)Rename the columns at table level.Hope this won't need any table rewrite. Is there any downside if we go by the above approach?
Re: Partitioning and unique key
On Sun, 1 Sept 2024 at 10:03, veem v wrote: > > On Sun, 1 Sept 2024 at 09:13, David G. Johnston < > david.g.johns...@gmail.com> wrote: > >> On Saturday, August 31, 2024, veem v wrote: >> >>> >>> >>> iii)And then alter the datatype of the partition key transaction_date to >>> DATE in one shot at the table level(which should be fast as its having more >>> granularity as compare to existing timestamptype, so should be catalog or >>> dictionary change only), and that will remain the part of composite PK >>> (transaction_id,transaction_date). >>> >> >> While this might seem logical, in reality date and timestamptz are >> different fixed-width data types and thus any attempt to change from one to >> the other will involve a table rewrite. Best you could do is leave the >> timestamptz in place and just truncate to day so the time is always >> midnight UTC. >> >> >>> >>> > Here , if we keep the PK column as is i.e. the transaction_timestamp as > timestamptz but truncate the time component , in that case again in future > if someone tries to insert(using insert on conflict) data into the table > with time component , it will get consumed and will not be restricted by > the PK constraint. So I was trying to make the data type also as DATE for > the transaction_timestap column. > > As in this case anyway we have to create another column to populate the > date+timestamp values as we cant throw those values away per business need, > so we will be kind of rewriting the table.So is it okay if if we will > > 1) Detach all the partitions. > 2)Do the alter using "only" key word in table level. (For adding new > column transaction_timestamp_new to hold date+timestamp value and also > altering the existing transaction_timestamp column to DATE from type > timestamptz). > 3)Then do the data fix(delete the duplicates) and alter the column, one > partition at a time for all of the partitions and once done , attach those > partitions one by one. > 5)Rename the columns at table level.Hope this won't need any table rewrite. > > Is there any downside if we go by the above approach? > Or do you mean to say there is no way we can modify the data type of a partition key even by detaching the partitions one by one? And thus we may have only way left is to create the table from scratch with partitions and populate the data to it? I was avoiding this because we have many indexes also in it , so creating from scratch means creating those indexes again. So I wanted to achieve it by detaching partitions, doing the required change and attaching it again.
Re: Partitioning and unique key
On Mon, 2 Sept 2024 at 19:13, Laurenz Albe wrote:
> On Sun, 2024-09-01 at 01:32 +0530, veem v wrote:
> > due to postgres limitations we are unable to have this unique constraint or primary key
> > only on the transaction_id column, we have to include transaction_timestamp with it as
> > a composite key. So I want to understand from experts if there is any possible way to
> > satisfy both partitioning on transaction_timestamp column and unique key or pk just on
> > transaction_id only?
>
> No, you cannot have both.
>
> Usually the solution is to *not* create a primary key on the partitioned table
> and instead create a primary key on each partition.
>
> That won't guarantee global uniqueness (and there is no way to do that), but it
> goes a long way by ensuring that the column is unique within each partition.
>
> Yours,
> Laurenz Albe

Thank you so much. So it means that in our case we should drop the existing table-level PK on the columns (txn_id, txn_timestamp) and create a unique index at each partition level, and in the same way map the foreign keys to the parent table's partitions. In that case, can we follow this as a best practice, i.e. not have primary keys defined at the table level at all but only at the partition level, or does that have any downside too? Please suggest.

Also, what I then see is that it will make the data load query fail, which uses "insert on conflict" to insert data into the table and requires the primary key on both columns at the table level. And the partition maintenance job, which uses the partman extension, uses a template table that in turn relies on table-level properties for creating new partitions, so the new partitions will not have these unique indexes created, because the unique index property is at the partition level rather than the table level. Can you share your thoughts on these?
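A rough sketch of the per-partition approach Laurenz describes; the partition names are placeholders, and the "insert on conflict" concern raised above still applies, since conflict inference against the parent needs an index defined on the parent:

-- no PK at the partitioned-table level; instead, one per partition:
ALTER TABLE transactions_p2024_09_01
    ADD CONSTRAINT transactions_p2024_09_01_pk PRIMARY KEY (transaction_id);
ALTER TABLE transactions_p2024_09_02
    ADD CONSTRAINT transactions_p2024_09_02_pk PRIMARY KEY (transaction_id);
-- ...and the same for every new partition pg_partman creates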
Re: Partitioning and unique key
On Tue, 3 Sept 2024 at 01:14, Laurenz Albe wrote: > > You can keep the primary key defined on both columns if it is good enough > for you. > But it will give you lower guarantees of uniqueness: with that primary > key, there could > be two rows with a different timestamp, but the same "txn_id", and these > two rows could > be in the same partition... > > Also, if you need a foreign key pointing *to* the partitioned table, you > cannot do without > a primary key. But I recommend that you do *not* define such foreign > keys: they will make > it more difficult to detach a partition. > > If you partition two tables in the same way, you can use foreign keys > between the partitions > instead of foreign keys between the partitioned tables. Such foreign keys > won't be a problem. > > Thank You so much. As you rightly said *"they will make it more difficult to detach a partition." , *we are really seeing a longer time when detaching parent table partitions.It runs forever sometimes. So do you mean it's because we have primary key defined table level or it's because we have FK defined in table level (for multiple child tables which are also partitioned)? We were thinking it's because we have FK defined on tablelevel , so we were planning to make the FK on partition level. But as you just pointed now , even keeping the PK on table level will also make the detach partition slow? I understand, for detaching partitions , it may be scanning while child because of the FK defined on the table level. but i am unable to understand how the table level PK impacts the detach partition from parent here. My understanding is PK can only be created on table level but not on the partition level. On the partition level we only can have a "unique index" defined. Correct me if my understanding is wrong.
IO related waits
Hi, One of our applications is using RDS postgres. In one of our streaming applications (using Flink), which processes hundreds of millions of transactions each day, we are using row-by-row transaction processing for inserting data into the postgres database, and a commit is performed for each row. We are seeing heavy IO:XactSynch wait events during the data load and also a high overall response time. The architecture team is suggesting enabling async I/O if possible, so that the streaming client will not wait for the commit confirmation from the database. So I want to understand how async I/O can be enabled and whether there are any downsides to doing this? Regards Veem
Re: IO related waits
On Tue, 17 Sept 2024 at 03:41, Adrian Klaver wrote:
> Are you referring to this?:
> https://nightlies.apache.org/flink/flink-docs-release-1.20/docs/dev/datastream/operators/asyncio/
>
> If not then you will need to be more specific.

Yes, I was referring to this one. So what can be the caveats of this approach, considering the transactions are meant to be ACID compliant, being financial transactions? Additionally, I was not aware of the "synchronous_commit" parameter on the DB side, which, when turned off, mimics an asynchronous commit. Would both of these mimic the same asynchronous behaviour and achieve the same thing, i.e. the client data load throughput will increase because the DB will not wait for the data to be written to the WAL before giving a confirmation back to the client, and the client will not wait for the DB to confirm whether the data has been persisted or not? Also, since in the backend the flushing of the WAL to disk has to happen anyway (just that it will be delayed now), can this method cause contention on the database storage side if the speed at which data gets ingested from the client outpaces what is getting written to disk, and can it in some way impact data consistency for the read queries?
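For reference, synchronous_commit is an ordinary setting, so a hedged sketch of turning it off only for the loading sessions rather than server-wide (loader_app is a placeholder role name); the durability trade-off is that the most recently acknowledged transactions can be lost on a crash, but committed data remains consistent:

-- per session (or per transaction), so only the bulk loader is affected
SET synchronous_commit = off;
-- ... the batch/row inserts run here ...

-- alternatively, scope it to one role instead of the whole cluster
ALTER ROLE loader_app SET synchronous_commit = off;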
Re: IO related waits
On Tue, 17 Sept 2024 at 18:43, Greg Sabino Mullane wrote:
> This is a better place to optimize. Batch many rows per transaction. Remove unused indexes.
>
>> flushing of the WAL to the disk has to happen anyway(just that it will be delayed now), so can this method cause contention in the database storage side if the speed in which the data gets ingested from the client is not getting written to the disk , and if it can someway impact the data consistency for the read queries?
>
> Not quite clear what you are asking here re data consistency. The data will always be consistent, even if synchronous_commit is disabled. The only danger window is on a server crash.
>
> (Keep in mind that RDS is not Postgres, so take tuning recommendations and advice with a grain of salt.)

Thank you Greg. Yes, our Java application was doing row-by-row commits, and we saw that in pg_stat_database from the "xact_commit" column, which was nearly the same as the sum of the tup_inserted, tup_updated and tup_deleted columns. We also verified in pg_stat_statements that the "calls" column was matching the "rows" column for the INSERT queries, which means we are inserting exactly the same number of rows as the number of DB calls, so it also suggests we are doing row-by-row operations/DMLs. We then asked the application team to make the inserts in batches, but we are still seeing that the figures in these two views are not changing much: "xact_commit" is staying almost the same, and the "calls" and "rows" columns in pg_stat_statements are also staying almost the same. So does it mean that the application code is somehow still doing the same row-by-row processing, or are we doing something wrong in the above analysis?

Another thing we noticed: even after the data load finished, "xact_commit" kept increasing along with "tup_fetched". So does it mean that some kind of implicit commit is happening even for the fetch-type queries, which must be "select" queries I believe? Also, not sure if it's expected, but here in postgres I have seen that unless we put code within a begin/end block, it gets committed by default even when we just run it on the console; it doesn't ask for an explicit commit/rollback. So I'm not sure if that is in some way playing a role here.

Regards Veem
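To make the comparison concrete, a hedged sketch of what "batching" means on the SQL side: many rows per transaction (and ideally per statement) instead of one transaction per row. If the driver really batches, xact_commit in pg_stat_database should grow roughly once per batch rather than once per row. Table and column names below are placeholders:

-- row-by-row with autocommit: N rows => N transactions, N statements
INSERT INTO tab1 (id, col1) VALUES (1, 'a');
INSERT INTO tab1 (id, col1) VALUES (2, 'b');

-- batched: N rows => 1 transaction (and, with multi-row VALUES, 1 statement)
BEGIN;
INSERT INTO tab1 (id, col1) VALUES
    (1, 'a'),
    (2, 'b'),
    (3, 'c');
COMMIT;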
Question on pg_stat* views
Hi, As per my understanding, pg_stat_statements stores the aggregated statistics of the query execution history (nearly in real time), and the maximum number of statements or queries it can store depends on the pg_stat_statements.max value (which I am seeing as ~5000 in pg_settings). I have the below questions:

1) Doing a count(*) on pg_stat_statements gives ~4818, but pg_stat_statements_info is showing ~1990 as "dealloc", which means more SQL queries are coming in and getting flushed out of pg_stat_statements. So does it mean that we should increase pg_stat_statements.max to a higher value?

2) The stats_reset column in the pg_stat_statements_info view is showing 16th August. So does it mean that whatever query stats are getting logged in pg_stat_statements are the aggregated stats for all executions that have happened from 16th August till today? Also, as we have not done any "stats reset" manually, does it happen automatically when the DB instance restarts or on any other database events?

3) As pg_stat_statements holds the aggregated stats of all executions of a particular SQL query, it's not easy to identify whether at some point in the past the same query suffered and went for a high response time. So, to debug such performance issue scenarios, is it advisable to insert the records from these pg_stat* views into another table periodically through a cron job?

Regards Veem
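On question 3, a hedged sketch of the kind of periodic snapshot that could be taken (the history table name is an assumption; scheduling it is left to whatever cron or pg_cron mechanism is available):

-- create the snapshot table with the same columns plus a capture timestamp
CREATE TABLE pg_stat_statements_history AS
SELECT now() AS captured_at, s.*
FROM pg_stat_statements s
WHERE false;   -- structure only, no rows

-- run periodically; deltas between consecutive snapshots show when a query turned slow
INSERT INTO pg_stat_statements_history
SELECT now(), s.*
FROM pg_stat_statements s;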
Re: IO related waits
On Wed, 18 Sept 2024 at 05:07, Adrian Klaver wrote: > On 9/17/24 12:34, veem v wrote: > > > > It does if autocommit is set in the client, that is common to other > databases also: > > https://dev.mysql.com/doc/refman/8.4/en/commit.html > > > https://docs.oracle.com/en/database/oracle/developer-tools-for-vscode/getting-started/disabling-and-enabling-auto-commit.html > > > https://learn.microsoft.com/en-us/sql/t-sql/statements/set-implicit-transactions-transact-sql?view=sql-server-ver16 > > You probably need to take a closer look at the client/driver you are > using and the code that interacting with it. > > In fact I would say you need to review the entire data transfer process > to see if there are performance gains that can be obtained without > adding an entirely new async component. > > > You were spot on. When we turned off the "auto commit" we started seeing less number of commits as per the number of batches. However we also started seeing deadlock issues. We have foreign key relationships between the tables and during the batch we do insert into the parent first and then to the child , but this does happen from multiple sessions for different batches. So why do we see below error, as we ensure in each batch we first insert into parent and then into the child tables? caused by: org.postgresql.util.PSQLException: ERROR: deadlock detected Detail: Process 10443 waits for ShareLock on transaction 220972157; blocked by process 10454. Process 10454 waits for ShareLock on transaction 220972155; blocked by process 10443. Hint: See server log for query details. Where: while inserting index tuple (88736,28) in relation "TAB1_p2024_08_29"
Re: IO related waits
On Thu, 19 Sept 2024 at 02:01, veem v wrote: > > On Wed, 18 Sept 2024 at 05:07, Adrian Klaver > wrote: > >> On 9/17/24 12:34, veem v wrote: >> > >> >> It does if autocommit is set in the client, that is common to other >> databases also: >> >> https://dev.mysql.com/doc/refman/8.4/en/commit.html >> >> >> https://docs.oracle.com/en/database/oracle/developer-tools-for-vscode/getting-started/disabling-and-enabling-auto-commit.html >> >> >> https://learn.microsoft.com/en-us/sql/t-sql/statements/set-implicit-transactions-transact-sql?view=sql-server-ver16 >> >> You probably need to take a closer look at the client/driver you are >> using and the code that interacting with it. >> >> In fact I would say you need to review the entire data transfer process >> to see if there are performance gains that can be obtained without >> adding an entirely new async component. >> >> >> > You were spot on. When we turned off the "auto commit" we started seeing > less number of commits as per the number of batches. > > However we also started seeing deadlock issues. We have foreign key > relationships between the tables and during the batch we do insert into the > parent first and then to the child , but this does happen from multiple > sessions for different batches. So why do we see below error, as we > ensure in each batch we first insert into parent and then into the child > tables? > > caused by: org.postgresql.util.PSQLException: ERROR: deadlock detected > Detail: Process 10443 waits for ShareLock on transaction 220972157; > blocked by process 10454. > Process 10454 waits for ShareLock on transaction 220972155; blocked by > process 10443. > Hint: See server log for query details. > Where: while inserting index tuple (88736,28) in relation > "TAB1_p2024_08_29" > As we are able to get hold of one session, we see "insert into " was blocked by "insert into ". And the "insert into " was experiencing a "client read" wait event. Still unable to understand why it's happening and how to fix it?
Re: IO related waits
On Sat, 21 Sept 2024 at 03:47, Peter J. Holzer wrote:
> On 2024-09-20 14:11:38 -0700, Adrian Klaver wrote:
> > On 9/20/24 1:01 PM, veem v wrote:
> > > Able to reproduce this deadlock graph as below. Now my question is,
> > > this is a legitimate scenario in which the same ID can get inserted from
> > > multiple sessions, and in such cases it's expected to skip that row (thus
> > > "ON CONFLICT DO NOTHING" is used). But as we see it's breaking the code
> >
> > Yeah, as I see it that would not work with concurrent uncommitted sessions
> > as it would be unresolved whether a conflict actually exists until at least
> > one of the sessions completes.
> >
> > > with deadlock error during race conditions where a lot of parallel
> > > threads are operating. So how should we handle this scenario? Will
> > > setting the "lock_timeout" parameter at session level help us
> > > in any way here?
> >
> > Serializable transaction?:
> > https://www.postgresql.org/docs/current/transaction-iso.html#XACT-SERIALIZABLE
>
> Doesn't help here, at least not directly. It would help indirectly
> because isolation level serializable makes it very probable that
> serialization errors occur. So an application designed for serializable
> would have some kind of retry logic already in place.
>
> So that leads us to another solution:
>
> Retry each batch (possibly after reducing the batch size) until it
> succeeds.

Actually, here the application is using Kafka and Flink streams and reuses existing code which was doing row-by-row commits and has now been changed to batch commits, i.e. the commit point has shifted from per row to per batch. There are multiple sessions spawning at the same time to process the incoming messages 24/7. Also, as described in another ongoing thread, and as we saw in the past, we did not see much difference between "batch commit" and "batch insert" performance; we only realized that row-by-row commit performs worse. Now, in such a scenario where the messages come from a streaming framework using Kafka + Flink and the inserts still happen row by row (just that the commit is shifted to the batch), I don't see any way to sort the ID columns in this streaming process so that they won't overlap across sessions. In such a situation, the only way seems to be to have the messages replayed for which the deadlock error happens, as I think during a deadlock error one session gets terminated by postgres, and those messages perhaps we can save in some exception table and then replay?
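Even if the stream itself can't be ordered, each batch can still be written to Postgres in a deterministic key order once it has been staged, which removes the classic "A waits for B, B waits for A" pattern between concurrent batches; combined with retrying a failed batch, as Peter suggests, this is a common mitigation rather than a guaranteed fix. A rough sketch under the assumption that each batch is first staged (temp table, unnest()-ed arrays, or similar); all table, column and constraint names are placeholders:

BEGIN;
-- parent first, sorted by key so every session acquires row/index locks in the same order
INSERT INTO parent_tab (id, col1)
SELECT id, col1 FROM batch_stage_parent ORDER BY id
ON CONFLICT (id) DO NOTHING;

-- then children, again in a deterministic order
INSERT INTO child_tab (id, parent_id, col2)
SELECT id, parent_id, col2 FROM batch_stage_child ORDER BY parent_id, id
ON CONFLICT (id) DO NOTHING;
COMMIT;
-- on a deadlock or serialization failure, roll back and replay the whole batch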