Diffs in PG output vs WAL

2022-09-01 Thread V
I am playing with pgoutput and logical replication. With wal2json, I could
capture the old values before the row was updated. I am unable to figure
out how to do that with pgoutput. Does pgoutput even support this?

Relevant info from wal2json:

> Also, UPDATE/DELETE old row versions can be accessed depending on the
configured replica identity
> All of the new/old tuples are available in the JSON object.

https://github.com/eulerto/wal2json

How can I get the old and new tuples with pgoutput?
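
For reference, pgoutput does send the old tuple for UPDATE/DELETE, but only
to the extent the table's REPLICA IDENTITY allows; with the default identity
you only get the old key columns (and on UPDATE only when the key changed).
A minimal sketch, with a hypothetical table and publication name:

-- Make the whole old row available to the output plugin.
ALTER TABLE orders REPLICA IDENTITY FULL;

-- pgoutput streams changes for the tables in a publication.
CREATE PUBLICATION mypub FOR TABLE orders;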

-- 
Avinash


FATAL: could not load library "/usr/pgsql-13/lib/libpqwalreceiver.so" during replication

2020-12-19 Thread Rambabu V
Hi Team,

I have installed PostgreSQL 13 using packages (and tried the same with lower
versions as well), but replication is not working; I am getting the error below.

2020-12-19 02:48:22.346 PST [3574] FATAL:  could not load library
"/usr/pgsql-13/lib/libpqwalreceiver.so": /usr/pgsql-13/lib/libpq.so.5:
symbol X509_get_signature_nid, version libcrypto.so.10 not defined in file
libcrypto.so.10 with link time reference
2020-12-19 02:48:22.350 PST [3575] FATAL:  could not load library
"/usr/pgsql-13/lib/libpqwalreceiver.so": /usr/pgsql-13/lib/libpq.so.5:
symbol X509_get_signature_nid, version libcrypto.so.10 not defined in file
libcrypto.so.10 with link time reference
2020-12-19 02:48:27.362 PST [3576] FATAL:  could not load library
"/usr/pgsql-13/lib/libpqwalreceiver.so": /usr/pgsql-13/lib/libpq.so.5:
symbol X509_get_signature_nid, version libcrypto.so.10 not defined in file
libcrypto.so.10 with link time reference
2020-12-19 02:48:32.366 PST [3580] FATAL:  could not load library
"/usr/pgsql-13/lib/libpqwalreceiver.so": /usr/pgsql-13/lib/libpq.so.5:
symbol X509_get_signature_nid, version libcrypto.so.10 not defined in file
libcrypto.so.10 with link time reference

This happens even though I have exported LD_LIBRARY_PATH and also created a link.

[root@localhost log]# which libpqwalreceiver.so
/usr/bin/libpqwalreceiver.so
[root@localhost log]# which libpq.so.5
/usr/bin/libpq.so.5

[postgres@localhost ~]$ export LD_LIBRARY_PATH=$PATH:/usr/pgsql-13/lib/
[postgres@localhost ~]$ echo $LD_LIBRARY_PATH
/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/postgres/.local/bin:/home/postgres/bin:/usr/pgsql-13/lib/

Every version has the same issue with packages. Please help me resolve this
issue.


Re: FATAL: could not load library "/usr/pgsql-13/lib/libpqwalreceiver.so" during replication

2020-12-19 Thread Rambabu V
Hi Team,

Below are the OS version details. The standalone machine runs without any
issues; it is only replication that is not working. I have tried from
PostgreSQL 10 onwards as well, and no other versions are running on the server now.

[root@node2 ~]# openssl version
OpenSSL 1.0.1e-fips 11 Feb 2013
[root@node2 ~]# uname -a
Linux node2 3.10.0-327.el7.x86_64 #1 SMP Thu Oct 29 17:29:29 EDT 2015
x86_64 x86_64 x86_64 GNU/Linux
[root@node2 ~]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.2 (Maipo)
[root@node2 ~]#

On Sat, Dec 19, 2020 at 9:02 PM Adrian Klaver 
wrote:

> On 12/19/20 2:54 AM, Rambabu V wrote:
> > Hi Team,
> >
> > i have installed postgresql 13 using packages/tried same using lower
> > versions as well, but replication not working, getting below error.
>
> What OS and version?
>
> Where did you get packages?
>
> Do you have other versions of Postgres running on this machine?
>
> >
> > 2020-12-19 02:48:22.346 PST [3574] FATAL:  could not load library
> > "/usr/pgsql-13/lib/libpqwalreceiver.so": /usr/pgsql-13/lib/libpq.so.5:
> > symbol X509_get_signature_nid, version libcrypto.so.10 not defined in
> > file libcrypto.so.10 with link time reference
> > 2020-12-19 02:48:22.350 PST [3575] FATAL:  could not load library
> > "/usr/pgsql-13/lib/libpqwalreceiver.so": /usr/pgsql-13/lib/libpq.so.5:
> > symbol X509_get_signature_nid, version libcrypto.so.10 not defined in
> > file libcrypto.so.10 with link time reference
> > 2020-12-19 02:48:27.362 PST [3576] FATAL:  could not load library
> > "/usr/pgsql-13/lib/libpqwalreceiver.so": /usr/pgsql-13/lib/libpq.so.5:
> > symbol X509_get_signature_nid, version libcrypto.so.10 not defined in
> > file libcrypto.so.10 with link time reference
> > 2020-12-19 02:48:32.366 PST [3580] FATAL:  could not load library
> > "/usr/pgsql-13/lib/libpqwalreceiver.so": /usr/pgsql-13/lib/libpq.so.5:
> > symbol X509_get_signature_nid, version libcrypto.so.10 not defined in
> > file libcrypto.so.10 with link time reference
> >
> > even though i have exported LD_LIBRARY_PATH and created link also.
> >
> > [root@localhost log]# which libpqwalreceiver.so
> > /usr/bin/libpqwalreceiver.so
> > [root@localhost log]# which libpq.so.5
> > /usr/bin/libpq.so.5
> >
> > [postgres@localhost ~]$ export LD_LIBRARY_PATH=$PATH:/usr/pgsql-13/lib/
> > [postgres@localhost ~]$ echo $LD_LIBRARY_PATH
> >
> /usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/postgres/.local/bin:/home/postgres/bin:/usr/pgsql-13/lib/
> >
> > every version has the same issue with packages, please help me to
> > resolve this issue.
>
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>


Re: FATAL: could not load library "/usr/pgsql-13/lib/libpqwalreceiver.so" during replication

2020-12-19 Thread Rambabu V
Hi Team,

Below are the OS version details. The standalone machine runs without any
issues; it is only replication that is not working. I have tried from
PostgreSQL 10 onwards as well.

[root@node2 ~]# openssl version
OpenSSL 1.0.1e-fips 11 Feb 2013
[root@node2 ~]# uname -a
Linux node2 3.10.0-327.el7.x86_64 #1 SMP Thu Oct 29 17:29:29 EDT 2015
x86_64 x86_64 x86_64 GNU/Linux
[root@node2 ~]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.2 (Maipo)
[root@node2 ~]#

On Sat, Dec 19, 2020 at 10:18 PM Tom Lane  wrote:

> Rambabu V  writes:
> > i have installed postgresql 13 using packages/tried same using lower
> > versions as well, but replication not working, getting below error.
>
> > 2020-12-19 02:48:22.346 PST [3574] FATAL:  could not load library
> > "/usr/pgsql-13/lib/libpqwalreceiver.so": /usr/pgsql-13/lib/libpq.so.5:
> > symbol X509_get_signature_nid, version libcrypto.so.10 not defined in
> file
> > libcrypto.so.10 with link time reference
>
> Apparently this copy of libpq was built against a newer version of
> OpenSSL than what you have installed locally.  The man page for
> X509_get_signature_nid says it was added in OpenSSL 1.0.2, in case that
> helps.  You need to find or build a version of Postgres that's intended
> for your specific platform version.  Alternatively, maybe you would be
> better off moving to a newer platform version, or at least installing
> a newer OpenSSL version.  Anything older than 1.0.1 is considered
> outright insecure these days.
>
> regards, tom lane
>


Re: FATAL: could not load library "/usr/pgsql-13/lib/libpqwalreceiver.so" during replication

2020-12-19 Thread Rambabu V
Hi Team,

Below is the link from which I am getting the packages.

https://www.postgresql.org/download/linux/redhat/

On Sun, Dec 20, 2020 at 6:33 AM Adrian Klaver 
wrote:

> On 12/19/20 4:53 PM, Rambabu V wrote:
> > Hi Team,
> >
> > below are the os versions details. standalone machine running without
> > any issues, coming to replication not working, even i tried from
> > postgresql 10 onwards as well, no other versions running in the server
> now.
> >
> > [root@node2 ~]# openssl version
> > OpenSSL 1.0.1e-fips 11 Feb 2013
> > [root@node2 ~]# uname -a
> > Linux node2 3.10.0-327.el7.x86_64 #1 SMP Thu Oct 29 17:29:29 EDT 2015
> > x86_64 x86_64 x86_64 GNU/Linux
> > [root@node2 ~]# cat /etc/redhat-release
> > Red Hat Enterprise Linux Server release 7.2 (Maipo)
> > [root@node2 ~]#
> >
>
> Where are you getting the packages?
>
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>


Sequence vs UUID

2023-01-26 Thread veem v
Hello, We were trying to understand whether we should use UUID or Sequence
in general for primary keys. In many of the blogs I saw over the internet
(one is below), across multiple databases, most state that a sequence is
better compared to a UUID. But I think that in concurrent data load
scenarios a UUID will spread the contention point, whereas a sequence can
be a single point of contention.

So we want to understand from the experts here whether there are any clear
rules, or a pros-vs-cons list for each, to understand the exact scenarios
in which we should go for one over the other. Basically, I wanted to see if
we can perform some tests on sample data to measure the percentage of
overhead on read and write performance of queries with UUID vs Sequence,
and draw some general conclusions. Also, considering that open source
Postgres is the base for many databases like Redshift etc., the results
that apply to Postgres should apply to the others as well.

https://www.percona.com/blog/2019/11/22/uuids-are-popular-but-bad-for-performance-lets-discuss/


Re: Sequence vs UUID

2023-01-30 Thread veem v
I have a question. As I understand it, usage-wise there are multiple
benefits of UUID over sequences: for example, in a distributed app where we
may not be able to rely on a single-point generator like a sequence, in a
multi-master architecture, or with sharding.

If we look purely at performance, the key advantage of a sequence is its
smaller storage size: for read queries more index rows will fit in cache,
which should benefit reads and also joins. Fetching a value from a sequence
is also cheaper than calculating a UUID. The downside is during write
operations: under concurrent data load the sequence-based key can become a
point of contention, as every incoming request tries to modify the same
table/index page/block. But as mentioned in this blog
(https://www.2ndquadrant.com/en/blog/sequential-uuid-generators/), UUIDs
can be made sequential, for example by using a timestamp prefix. However,
isn't making the UUID sequential actually a disadvantage again, turning the
unique index into a contention point? Each incoming write will now fight
for the same block/page during concurrent data load and will contend for
the same table block, or for one side of the index branch/leaf blocks,
whereas with random UUIDs the writes were spread across multiple blocks so
there was no contention on any specific block. Please correct me if my
understanding is wrong.
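
For illustration, a minimal sketch (not the extension from that blog post,
and the function name is made up) of a timestamp-prefixed UUID of the kind
discussed above: the first 8 hex digits encode minutes since the epoch and
the remaining 24 stay random, so writes cluster near the right-hand edge of
the index within each minute.

CREATE OR REPLACE FUNCTION uuid_time_prefixed() RETURNS uuid AS $$
  -- 32-bit minute counter as an 8-char hex prefix, then 24 random hex digits
  SELECT (lpad(to_hex((extract(epoch FROM clock_timestamp())::bigint / 60) % 4294967296), 8, '0')
          || substr(replace(gen_random_uuid()::text, '-', ''), 9))::uuid;
$$ LANGUAGE sql VOLATILE;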



On Sun, 29 Jan, 2023, 10:33 am Miles Elam, 
wrote:

> On Sat, Jan 28, 2023 at 8:02 PM Ron  wrote:
> >
> > Then it's not a Type 4 UUID, which is perfectly fine; just not random.
>
> Yep, which is why it really should be re-versioned to UUIDv8 to be
> pedantic. In everyday use though, almost certainly doesn't matter.
>
> > Also, should now() be replaced by clock_timestamp(), so that it can be
> > called multiple times in the same transaction?
>
> Not necessary. Instead of 122 bits of entropy, you get 106 bits of
> entropy and a new incremented prefix every minute. now() vs
> clock_timestamp() wouldn't make a substantive difference. Should still
> be reasonably safe against the birthday paradox for more than a
> century when creating more than a million UUIDs per second.
>
>
>


Re: Sequence vs UUID

2023-01-30 Thread veem v
I was trying to test the performance of simple reads/writes for bigint vs
UUID. What we see is ~3 times performance degradation when joining on UUID
columns vs bigint columns. Even just generating UUIDs vs fetching sequence
values is ~3 times slower. Also, the insert performance on the same table
for ~10 million rows is ~1 min 39 sec for bigint vs ~3 min 11 sec for UUID.
Is this extent of performance degradation expected for UUID?

CREATE TABLE test1_UUID ( id bigint,source_id varchar(36) PRIMARY KEY, Name
varchar(20) );
CREATE TABLE test2_UUID (id bigint,source_id varchar(36) PRIMARY KEY,Name
varchar(20) );

CREATE TABLE test1_bigint ( id bigint PRIMARY KEY, source_id varchar(36) ,
Name varchar(20));
CREATE TABLE test2_bigint ( id bigint PRIMARY KEY, source_id varchar(36) ,
Name varchar(20));

Loaded same 10million rows.

explain Analyze select * from test1_bigint a , test2_bigint b where a.id =
b.id

Merge Join (cost=12.31..875534.52 rows=1021 width=100) (actual
time=0.042..6974.575 rows=1000 loops=1)
  Merge Cond: (a.id = b.id)
  -> Index Scan using test1_bigint_pkey on test1_bigint a
(cost=0.43..362780.75 rows=1021 width=50) (actual time=0.020..2070.079
rows=1000 loops=1)
  -> Index Scan using test2_bigint_2_pkey on test2_bigint b
(cost=0.43..362780.75 rows=1021 width=50) (actual time=0.019..2131.086
rows=1000 loops=1)
Planning Time: 0.207 ms
Execution Time: 7311.210 ms

set enable_seqscan=off;

explain Analyze select * from test1_UUID a , test2_UUID b where a.source_id
= b.source_id;

Merge Join (cost=2.75..2022857.05 rows=1021 width=100) (actual
time=0.043..21954.213 rows=1000 loops=1)
  Merge Cond: ((a.source_id)::text = (b.source_id)::text)
  -> Index Scan using test1_uuid_pkey on test1_UUID a (cost=0.56..936420.18
rows=1021 width=50) (actual time=0.022..7854.143 rows=1000 loops=1)
  -> Index Scan using test2_uuid_2_pkey on test2_UUID b
(cost=0.56..936437.90 rows=1021 width=50) (actual time=0.017..7971.187
rows=1000 loops=1)
Planning Time: 0.516 ms
Execution Time: 22292.801 ms

**

create sequence myseq cache 32767;

select count(nextval('myseq') ) from generate_series(1,1000)
 1 row retrieved starting from 1 in 4 s 521 ms (execution: 4 s 502 ms,
fetching: 19 ms)

 select count(gen_random_uuid()) from generate_series(1,1000)
 1 row retrieved starting from 1 in 11 s 145 ms (execution: 11 s 128 ms,
fetching: 17 ms)



On Mon, 30 Jan, 2023, 4:59 pm veem v,  wrote:

> I have a question, As i understand here, usage wise there are multiple
> benefits of UUID over sequences like, in case of distributed app where we
> may not be able to rely on one point generator like sequences, in case of
> multi master architecture, sharding.
>
> If we just look in terms of performance wise, the key advantage of
> sequence is that for read queries, because of the storage size it will be
> smaller and thus it will cache more index rows and so will be beneficial
> during read queries and should also be beneficial even on joins because of
> its smaller size. Also fetching a value from sequence is cheaper than
> calculating the UUIDS. But the downside is during write operation, it can
> be a point of contention in case of concurrent data load as every incoming
> request will try to modify same table/index page/block. But as its
> mentioned in this blog (
> https://www.2ndquadrant.com/en/blog/sequential-uuid-generators/), state
> this UUID can be made sequential so even this can be sequential using
> prefix-timestamp etc. However isn't it that making the UUID sequential will
> again actually be a disadvantage and can be contention point for this
> unique index as each incoming write will now fight for same block/page
> while doing concurrent data load and will contend for the same table block
> or say one side of the index branch/leaf block etc, whereas in case of
> random UUIDs the write was spreading across multiple blocks so there was no
> contention on any specific blocks? Please correct if my understanding is
> wrong?
>
>
>
> On Sun, 29 Jan, 2023, 10:33 am Miles Elam, 
> wrote:
>
>> On Sat, Jan 28, 2023 at 8:02 PM Ron  wrote:
>> >
>> > Then it's not a Type 4 UUID, which is perfectly fine; just not random.
>>
>> Yep, which is why it really should be re-versioned to UUIDv8 to be
>> pedantic. In everyday use though, almost certainly doesn't matter.
>>
>> > Also, should now() be replaced by clock_timestamp(), so that it can be
>> > called multiple times in the same transaction?
>>
>> Not necessary. Instead of 122 bits of entropy, you get 106 bits of
>> entropy and a new incremented prefix every minute. now() vs
>> clock_timestamp() wouldn't make a substantive difference. Should still
>> be reasonably safe against the birthday paradox for more than a
>> century when creating more than a million UUIDs per second.
>>
>>
>>


Re: Sequence vs UUID

2023-01-30 Thread veem v
Thank you so much for the details. I am a bit new to Postgres, and these
test results I posted were from a dev system. If I understand correctly, do
you mean that these settings (use of the C locale, or a "native" 16-byte
uuid) should be in place on a production system, and that we should
therefore test the UUID vs sequence performance on a similar setup? Or, if
this sort of UUID performance degradation is not expected, how do we get
these settings turned on so as to see the best string-type or UUID
performance? Can you please guide me here?

On Mon, 30 Jan 2023 at 22:18, Tom Lane  wrote:

> Dominique Devienne  writes:
> > On Mon, Jan 30, 2023 at 5:11 PM veem v  wrote:
> >> CREATE TABLE test1_UUID ( id bigint,source_id varchar(36) PRIMARY KEY,
> Name varchar(20) );
>
> > Maybe if you used a "native" 16-byte uuid, instead of its textual
> > serialization with dashes (36 bytes + length overhead), the gap would
> > narrow.
>
> Yeah, especially if your database is not using C locale.  The
> strcoll or ICU-based comparisons done on string types can be
> enormously more expensive than the memcmp() used for binary
> types like native uuid.
>
> regards, tom lane
>
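
For anyone wanting to check the two things mentioned above, a minimal
sketch (the table name is hypothetical):

-- Which collation/ctype the current database was created with; C/POSIX
-- avoids the more expensive strcoll/ICU comparisons for text keys.
SELECT datname, datcollate, datctype
FROM pg_database
WHERE datname = current_database();

-- Store the key as a native 16-byte uuid instead of varchar(36).
CREATE TABLE test1_uuid_native (
  source_id uuid PRIMARY KEY,
  name      varchar(20)
);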


Re: Sequence vs UUID

2023-02-01 Thread veem v
I quickly ran the tests below on dbfiddle with UUID as the data type, and
in each case the UUID performance seems drastically reduced compared to the
sequence performance. Let me know if anything is wrong in my testing here.

1) Sequence generation vs UUID generation: execution time increased from
~291 ms to 5655 ms.
2) Insert performance of "sequence" vs "UUID": execution time increased
from ~2031 ms to 10599 ms.
3) Index performance for sequence vs UUID: execution time increased from
~0.3 ms to 0.5 ms.


 PostgreSQL 15.0 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0
20210514 (Red Hat 8.5.0-10), 64-bit

CREATE UNLOGGED TABLE test_bigint (  id bigint  PRIMARY KEY);
CREATE UNLOGGED TABLE test_uuid (   id uuid  PRIMARY KEY);
create sequence myseq cache 32767;

*** Sequence generation vs UUID generation Test ***
explain analyze  select count(nextval('myseq') ) from
generate_series(1,10);

QUERY PLAN
Aggregate  (cost=1500.00..1500.01 rows=1 width=8) (actual
time=291.030..291.030 rows=1 loops=1)
  ->  Function Scan on generate_series  (cost=0.00..1000.00 rows=10
width=0) (actual time=53.332..63.941 rows=10 loops=1)
Planning Time: 0.155 ms
Execution Time: 291.719 ms

explain analyze  select count(gen_random_uuid()) from
generate_series(1,10);

QUERY PLAN
Aggregate  (cost=1500.00..1500.01 rows=1 width=8) (actual
time=5654.453..5654.454 rows=1 loops=1)
  ->  Function Scan on generate_series  (cost=0.00..1000.00 rows=10
width=0) (actual time=84.328..514.214 rows=10 loops=1)
Planning Time: 0.082 ms
Execution Time: 5655.158 ms

*** Insert Test***

explain analyze INSERT INTO test_bigint select nextval('myseq')  from
generate_series(1,10);

QUERY PLAN
Insert on test_bigint  (cost=0.00..2250.00 rows=0 width=0) (actual
time=2030.960..2030.961 rows=0 loops=1)
  ->  Function Scan on generate_series  (cost=0.00..1250.00 rows=10
width=8) (actual time=48.102..636.311 rows=10 loops=1)
Planning Time: 0.065 ms
Execution Time: 2031.469 ms

explain analyze  INSERT INTO test_uuid  select gen_random_uuid()  from
generate_series(1,10);


QUERY PLAN
Insert on test_uuid  (cost=0.00..2250.00 rows=0 width=0) (actual
time=10599.230..10599.230 rows=0 loops=1)
  ->  Function Scan on generate_series  (cost=0.00..1250.00 rows=10
width=16) (actual time=36.975..6289.811 rows=10 loops=1)
Planning Time: 0.056 ms
Execution Time: 10599.805 ms

*** Index performance Test ***

Explain analyze select * from test_bigint where id in (select id from
test_bigint limit 100);

QUERY PLAN
Nested Loop  (cost=2.98..734.71 rows=100 width=8) (actual time=0.083..0.269
rows=100 loops=1)
  ->  HashAggregate  (cost=2.69..3.69 rows=100 width=8) (actual
time=0.046..0.062 rows=100 loops=1)
Group Key: test_bigint_1.id
Batches: 1  Memory Usage: 24kB
->  Limit  (cost=0.00..1.44 rows=100 width=8) (actual
time=0.011..0.025 rows=100 loops=1)
  ->  Seq Scan on test_bigint test_bigint_1
 (cost=0.00..1444.18 rows=100118 width=8) (actual time=0.011..0.017
rows=100 loops=1)
  ->  Index Only Scan using test_bigint_pkey on test_bigint
 (cost=0.29..7.31 rows=1 width=8) (actual time=0.002..0.002 rows=1
loops=100)
Index Cond: (id = test_bigint_1.id)
Heap Fetches: 100
Planning Time: 0.279 ms
Execution Time: 0.302 ms

Explain analyze  select * from test_uuid where  id in (select id from
test_uuid limit 100);

QUERY PLAN
Nested Loop  (cost=3.21..783.31 rows=100 width=16) (actual
time=0.080..0.474 rows=100 loops=1)
  ->  HashAggregate  (cost=2.79..3.79 rows=100 width=16) (actual
time=0.046..0.066 rows=100 loops=1)
Group Key: test_uuid_1.id
Batches: 1  Memory Usage: 24kB
->  Limit  (cost=0.00..1.54 rows=100 width=16) (actual
time=0.010..0.025 rows=100 loops=1)
  ->  Seq Scan on test_uuid test_uuid_1  (cost=0.00..1541.85
rows=100085 width=16) (actual time=0.009..0.016 rows=100 loops=1)
  ->  Index Only Scan using test_uuid_pkey on test_uuid  (cost=0.42..7.79
rows=1 width=16) (actual time=0.004..0.004 rows=1 loops=100)
Index Cond: (id = test_uuid_1.id)
Heap Fetches: 100
Planning Time: 0.180 ms
Execution Time: 0.510 ms

On Tue, 31 Jan 2023 at 03:28, Ron  wrote:

>
> And populate that column with UUIDs generated by the gen_random_uuid()
> function.
>
> (Requires v13.)
>
> On 1/30/23 13:46, Adrian Klaver wrote:
> > On 1/30/23 11:43, veem v wrote:
> >> Thank You So much for the details. I am a bit new to postgres. And
> these
> >> test results I picked were from a dev system. If I understand it
> >> correctly, do you mean these settings(usage of C locale or "native"
> >> 16-byte uuid) which you mentioned should be there in a production
> system
> >>   and thus we should test the performance of the UUID vs seque

Re: Sequence vs UUID

2023-02-02 Thread veem v
1..3.355 rows=1000 loops=1)
  ->  HashAggregate  (cost=27.91..29.91 rows=200 width=16) (actual
time=0.399..0.556 rows=1000 loops=1)
Group Key: test_uuid_1.id
Batches: 1  Memory Usage: 145kB
->  Limit  (cost=0.00..15.41 rows=1000 width=16) (actual
time=0.011..0.185 rows=1000 loops=1)
  ->  Seq Scan on test_uuid test_uuid_1  (cost=0.00..1541.85
rows=100085 width=16) (actual time=0.010..0.093 rows=1000 loops=1)
  ->  Index Only Scan using test_uuid_pkey on test_uuid  (cost=0.42..7.21
rows=1 width=16) (actual time=0.002..0.002 rows=1 loops=1000)
Index Cond: (id = test_uuid_1.id)
Heap Fetches: 1000
Planning Time: 0.234 ms
Execution Time: 3.419 ms

Explain analyze  select * from test_uuid7 where  id in (select id from
test_uuid7 limit 1000);

QUERY PLAN
Nested Loop  (cost=28.32..1416.01 rows=1000 width=16) (actual
time=0.403..2.586 rows=1000 loops=1)
  ->  HashAggregate  (cost=27.91..29.91 rows=200 width=16) (actual
time=0.371..0.546 rows=1000 loops=1)
Group Key: test_uuid7_1.id
Batches: 1  Memory Usage: 145kB
->  Limit  (cost=0.00..15.41 rows=1000 width=16) (actual
time=0.011..0.161 rows=1000 loops=1)
  ->  Seq Scan on test_uuid7 test_uuid7_1  (cost=0.00..1541.85
rows=100085 width=16) (actual time=0.010..0.091 rows=1000 loops=1)
  ->  Index Only Scan using test_uuid7_pkey on test_uuid7  (cost=0.42..6.99
rows=1 width=16) (actual time=0.002..0.002 rows=1 loops=1000)
Index Cond: (id = test_uuid7_1.id)
Heap Fetches: 1000
Planning Time: 0.101 ms
Execution Time: 2.661 ms

On Thu, 2 Feb 2023 at 20:52, Benedict Holland 
wrote:

> Well... until two processes generate an identical UUID. That happened to
> me several times. It's rare but when that happens, oh boy that is a mess to
> figure out.
>
> Thanks,
> Ben
>
> On Thu, Feb 2, 2023, 10:17 AM Miles Elam 
> wrote:
>
>> On Wed, Feb 1, 2023 at 10:48 AM Kirk Wolak  wrote:
>>
>>>
>>>
>>> On Wed, Feb 1, 2023 at 1:34 PM veem v  wrote:
>>>
>>>>
>>>> 1) sequence generation vs UUID generation, execution time increased
>>>> from ~291ms to 5655ms.
>>>> 2) Insert performance of "sequence" vs "UUID"  execution time increased
>>>> from ~2031ms to 10599ms.
>>>> 3) Index performance for sequence vs UUID,  execution time increased
>>>> from ~.3ms to .5ms.
>>>>
>>>>
>>> Yes, assuming that UUIDs would be efficient as keys when they are
>>> randomly generated, versus sequences (which tend to expand in one
>>> direction, and have been relatively optimized for years).
>>>
>>> This article explains in detail what is going on.  If I were doing this,
>>> I would strongly consider a ULID because of getting the best of both worlds.
>>> https://blog.daveallie.com/ulid-primary-keys
>>>
>>> Of course, YMMV...  And since ULIDs are not native to PG, there is
>>> overhead, but it is far more performant, IMO...
>>>
>>
>> Biased comparison. ULIDs have a timestamp component. The closest UUID
>> equivalent in Postgres is UUIDv1 from the uuid-ossp extension, not v4.
>> Another difference not mentioned in the blog article is that UUID is
>> versioned, meaning you can figure out what kind of data is in the UUID,
>> whereas ULIDs are a "one size fits all" solution.
>>
>> There is an implementation of sequential UUIDs for Postgres I posted
>> earlier in this thread. In addition, here is an implementation of UUIDv7
>> for Postgres:
>>
>> https://gist.github.com/kjmph/5bd772b2c2df145aa645b837da7eca74
>>
>> I would suggest running your tests against v1, v7, and sequential UUID
>> before jumping on ULID, which has no native type/indexing in Postgres.
>>
>> It should also be noted that apps cannot provide a bigint ID due to
>> collisions, but an app can generate UUIDs and ULIDs without fear,
>> essentially shifting the generation time metric in UUID/ULID's favor over a
>> bigserial.
>>
>> - Miles
>>
>>
>>


Re: Sequence vs UUID

2023-02-03 Thread veem v
Actually I did the testing by connecting to "https://dbfiddle.uk/",
Postgres version 15.

PostgreSQL 15.0 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0
20210514 (Red Hat 8.5.0-10), 64-bit

Am I doing it wrong, please confirm?


On Fri, 3 Feb 2023 at 21:28, Dominique Devienne  wrote:

> Copying the list...
>
> -- Forwarded message -
> From: Dominique Devienne 
> Date: Fri, Feb 3, 2023 at 4:57 PM
> Subject: Re: Sequence vs UUID
> To: veem v 
>
> On Thu, Feb 2, 2023 at 8:47 PM veem v  wrote:
>
>> Tested the UUIDv7 generator for postgres as below.
>> With regards to performance , It's still way behind the sequence. [...]
>> explain analyze  select count(nextval('myseq') ) from
>> generate_series(1,10);
>> Execution Time: 59.687 ms
>>
>> explain analyze  select count(gen_random_uuid()) from
>> generate_series(1,100'000);
>> Execution Time: 904.868 ms
>>
>> explain analyze  select count(uuid_generate_v7()) from
>> generate_series(1,10);
>> Execution Time: 1711.187 ms
>>
>
> Something's off regarding Guid generations IMHO...
>
> You generate 100K Guids in ~1s. While we generate (in C++, Windows
> Release, using Boost) 16M of them in +/- the same time:
>
> Enabling Performance tests
>>>>
>>>
>
>> generate 16'000'000 guids in 0.980s (user: 0.984s) 12 MB
>>>>
>>> generate 16'000'000 guids in parallel on 4 CPUs in 0.309s (user: 1.188s)
>>>> 12 MB
>>>>
>>>
> That's 2 orders of magnitude faster. Sure there's some overhead from the
> SQL, but still. Something seems fishy.
> And that's on a 2.5y old desktop. --DD
>


Re: Sequence vs UUID

2023-02-06 Thread veem v
So it may be that the machine on which the code gets executed behind the
scenes at "https://dbfiddle.uk/" plays a key role in the absolute speed;
however, the comparative performance of UUID vs sequence should stay the
same.

So I think, after this test, we can safely conclude that if we compare the
performance of UUID (both version 4 and version 7) vs sequence, the UUID
performs a lot worse. So unless there exists some strong
reason/justification for UUID, we should use a sequence by default. Correct
me if I'm wrong. I also understand that cases such as multi-master
replication/sharding etc. may be a factor, but other than those I can't
think of any scenario where a UUID really cannot be avoided.



On Fri, 3 Feb 2023 at 23:07, Dominique Devienne  wrote:

> On Fri, Feb 3, 2023 at 5:48 PM veem v  wrote:
>
>> Actually I did the testing by connecting to "https://dbfiddle.uk/",
>> Postgres version 15.
>>
>> PostgreSQL 15.0 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0
>> 20210514 (Red Hat 8.5.0-10), 64-bit
>>
>> Am I doing it wrong, please confirm?
>>
>>
> No clue. But even SQLite can generate 1M uuids (16-byte random blobs are
> equivalent) in 1/3 of the time, so 30x faster.
> So your timings of generating 100K uuids and counting them seem way too
> slow to me. --DD
>
> sqlite> select count(randomblob(16)) from generate_series(1,1000*1000);
> QUERY PLAN
> `--SCAN generate_series VIRTUAL TABLE INDEX 3:
> addr  opcode p1p2p3p4 p5  comment
>   -        -  --  -
> 0 Init   0 1500   Start at 15
> 1 Null   0 1 10   r[1..1]=NULL
> 2 VOpen  0 0 0 vtab:274D3E0   0
> 3 Integer1 4 00   r[4]=1
> 4 Multiply   6 6 50   r[5]=r[6]*r[6]
> 5 Integer3 2 00   r[2]=3
> 6 Integer2 3 00   r[3]=2
> 7 VFilter0 1120   iplan=r[2]
> zplan=''
> 8   Function   1 8 7 randomblob(1)  0   r[7]=func(r[8])
> 9   AggStep0 7 1 count(1)   1   accum=r[1]
> step(r[7])
> 10VNext  0 8 00
> 11AggFinal   1 1 0 count(1)   0   accum=r[1] N=1
> 12Copy   1 9 00   r[9]=r[1]
> 13ResultRow  9 1 00   output=r[9]
> 14Halt   0 0 00
> 15Transaction0 0 1 0  1   usesStmtJournal=0
> 16Integer1000  6 00   r[6]=1000
> 17Integer168 00   r[8]=16
> 18Goto   0 1 00
> ┌───┐
> │ count(randomblob(16)) │
> ├───┤
> │ 100   │
> └───┘
> Run Time: real 0.278 user 0.25 sys 0.00
>
>
>> On Fri, 3 Feb 2023 at 21:28, Dominique Devienne 
>> wrote:
>>
>>> Something's off regarding Guid generations IMHO...
>>> You generate 100K Guids in ~1s. While we generate (in C++, Windows
>>> Release, using Boost) 16M of them in +/- the same time:
>>>
>>


Re: Sequence vs UUID

2023-02-08 Thread veem v
Thank you So much all for such valuable feedback.

As "Julian" was pointing, I also tried to test the INSERT independently(as
in below test case) without keeping the "generate_series" in the inline
query. But in all the cases sequence is performing better as compared to
both UUID V4 and UUID V7. And same with Index access path observed i.e. an
index on sequence performs better as compared to an index on UUID column.
So i believe , its understood that the performance wise sequence is better
as compared to even sequential UUID (like UUID-V7). And it seems that, even
the UUID V7 maintains better locality because of its sequential nature, but
still it all may tied to the higher space/memory consumption of UUID
because of its bigger size as compared to sequence and thus the performance
hit.

But I saw a few comments in this discussion about scenarios that would
mandate the use of UUIDs, like "multi-master replication", "sharding",
"need to be able to move data between databases", etc. So I wanted to
understand: why can't we use sequences as the PK in these scenarios? For
example, in the case of multi-master replication we could use a sequence as
shown below.

So I wanted to know from the experts here: does there really exist any
scenario in which a UUID truly cannot be avoided?

Sequence Number = n*d + m + offset, where n is the sequence order number,
d is the number of nodes (dimensions) in the multi-master replication, m
(ranging from 0 to d-1) is the number assigned to each node, and offset is
the number used to offset the sequence numbers.

For a 4-way multi-master replication where d=4, m is in (0, 1, 2, 3), and
the offset is 100:
Node #1 (m=0): Sequence number = n*4+100
Node #2 (m=1): Sequence number = n*4+101
Node #3 (m=2): Sequence number = n*4+102
Node #4 (m=3): Sequence number = n*4+103

Each sequence will have:
100, 104, 108,112, 116, 120,...
101, 105, 109, 113, 117, 121,...
102, 106, 110, 114, 118, 122...
103, 107, 111, 115, 119, 123
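
As a minimal sketch of how the scheme above could be declared in Postgres
(the sequence name and the four-node assumption are illustrative), each
node gets its own start value and a common increment equal to the node
count:

-- Node #1 (m = 0)
CREATE SEQUENCE order_id_seq START WITH 100 INCREMENT BY 4;
-- Node #2 (m = 1) would instead use: START WITH 101 INCREMENT BY 4
-- Node #3 (m = 2):                   START WITH 102 INCREMENT BY 4
-- Node #4 (m = 3):                   START WITH 103 INCREMENT BY 4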

* Test case *

CREATE UNLOGGED TABLE test_bigint ( id bigint PRIMARY KEY);
CREATE UNLOGGED TABLE test_uuid ( id uuid PRIMARY KEY);
CREATE UNLOGGED TABLE test_uuid7 ( id uuid PRIMARY KEY);
CREATE UNLOGGED TABLE test_bigint_1 ( id bigint PRIMARY KEY);
CREATE UNLOGGED TABLE test_uuid_1 ( id uuid PRIMARY KEY);
CREATE UNLOGGED TABLE test_uuid7_1 ( id uuid PRIMARY KEY);
create sequence myseq cache 32767;

*** Insert Test***
INSERT INTO test_bigint select nextval('myseq') from
generate_series(1,10);
INSERT INTO test_uuid select gen_random_uuid() from
generate_series(1,10);
INSERT INTO test_uuid7 select uuid_generate_v7() from
generate_series(1,10);


explain analyze INSERT INTO test_bigint_1 select id from test_bigint;

EXPLAIN
QUERY PLAN
Insert on test_bigint_1 (cost=0.00..1444.18 rows=0 width=0) (actual
time=220.689..220.690 rows=0 loops=1)
-> Seq Scan on test_bigint (cost=0.00..1444.18 rows=100118 width=8) (actual
time=0.012..17.488 rows=10 loops=1)
Planning Time: 0.137 ms
Execution Time: 220.714 ms

explain analyze INSERT INTO test_uuid_1 select id from test_uuid;

EXPLAIN
QUERY PLAN
Insert on test_uuid_1 (cost=0.00..1541.85 rows=0 width=0) (actual
time=311.949..311.950 rows=0 loops=1)
-> Seq Scan on test_uuid (cost=0.00..1541.85 rows=100085 width=16) (actual
time=0.010..20.173 rows=10 loops=1)
Planning Time: 0.082 ms
Execution Time: 311.973 ms


explain analyze INSERT INTO test_uuid7_1 select id from test_uuid7;

EXPLAIN
QUERY PLAN
Insert on test_uuid7_1 (cost=0.00..1541.85 rows=0 width=0) (actual
time=244.531..244.532 rows=0 loops=1)
-> Seq Scan on test_uuid7 (cost=0.00..1541.85 rows=100085 width=16) (actual
time=0.011..16.390 rows=10 loops=1)
Planning Time: 0.084 ms
Execution Time: 244.554 ms

Explain analyze select * from test_bigint where id in (select id from
test_bigint limit 10);

Nested Loop (cost=2692.77..3793.23 rows=50059 width=8) (actual
time=53.739..368.110 rows=10 loops=1)
-> HashAggregate (cost=2692.48..2694.48 rows=200 width=8) (actual
time=53.681..93.044 rows=10 loops=1)
Group Key: test_bigint_1.id
Batches: 5 Memory Usage: 11073kB Disk Usage: 208kB
-> Limit (cost=0.00..1442.48 rows=10 width=8) (actual
time=0.020..18.985 rows=10 loops=1)
-> Seq Scan on test_bigint test_bigint_1 (cost=0.00..1444.18 rows=100118
width=8) (actual time=0.019..11.330 rows=10 loops=1)
-> Index Only Scan using test_bigint_pkey on test_bigint (cost=0.29..6.53
rows=1 width=8) (actual time=0.002..0.002 rows=1 loops=10)
Index Cond: (id = test_bigint_1.id)
Heap Fetches: 10
Planning Time: 0.373 ms
Execution Time: 373.440 ms
EXPLAIN

Explain analyze select * from test_uuid where id in (select id from
test_uuid limit 10);

QUERY PLAN
Nested Loop (cost=2790.96..4006.29 rows=50042 width=16) (actual
time=48.251..410.786 rows=10 loops=1)
-> HashAggregate (cost=2790.54..2792.54 rows=200 width=16) (actual
time=48.157..76.176 rows=10 loops=1)
Group Key: test_uuid_1.id
Batches: 1 Memory Usage

Re: Table partition with primary key in 11.3

2019-06-06 Thread Alex V.
>> CREATE TABLE public.test1 (
>>  x1 integer NOT NULL,
>>  x2 integer NOT NULL,
>>  CONSTRAINT test1_pkey PRIMARY KEY (x1) INCLUDE(x2)
>> ) PARTITION BY RANGE (x2);

>> This query works in 11.1 but fails in 11.3 with messages:

>> ERROR: insufficient columns in PRIMARY KEY constraint definition
>> DETAIL: PRIMARY KEY constraint on table "test1" lacks column "x2" which 
>> is part of the partition key.
>> SQL state: 0A000

> Indeed, that primary key is no good.  It was a bug that 11.1
> allowed it, which was fixed here:

> Author: Alvaro Herrera 
> Branch: master [0ad41cf53] 2019-01-14 19:28:10 -0300
> Branch: REL_11_STABLE Release: REL_11_2 [74aa7e046] 2019-01-14 19:25:19 -0300

> Fix unique INCLUDE indexes on partitioned tables

> We were considering the INCLUDE columns as part of the key, allowing
> unicity-violating rows to be inserted in different partitions.

> Concurrent development conflict in eb7ed3f30634 and 8224de4f42cc.

> Reported-by: Justin Pryzby
> Discussion: https://postgr.es/m/20190109065109.ga4...@telsasoft.com

>   regards, tom lane

I think that your position about primary keys in partitioned tables is not
right.

With a regular table, a one-column primary key is unique across the whole
table. In partitioned tables, from the user's point of view, we MUST also
see a unique one-column primary key, because this is the user requirement;
other kinds of keys destroy the equivalence between regular and partitioned
tables, and the functionality of partitioning becomes useless. For
administrators of the table it is not "may" but MUST: a one-column unique
primary key should be enforced across all partitions. All other
realizations are plugging holes in a hurry and are not logical when
reviewing the engine as a whole.
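
For completeness, the primary-key form that 11.2 and later do accept for
the quoted example keeps the partition key column in the key itself rather
than in the INCLUDE list:

CREATE TABLE public.test1 (
 x1 integer NOT NULL,
 x2 integer NOT NULL,
 CONSTRAINT test1_pkey PRIMARY KEY (x1, x2)
) PARTITION BY RANGE (x2);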




Question on Partition key

2023-09-02 Thread veem v
Hello Friends,
We are trying to create a monthly range-partitioned table, partitioned on
the column PART_DATE. This will hold orders, and part_date is nothing but
the invoice date. Some teammates are asking to use the "PART_DATE" column
with data type "INTEGER" in "YYYYMM" format and to define the partitions as
below. I want to know the experts' views on this: does the data type of the
partition key matter here or not? And is there any downside to either
approach in the future?

-- With date data type, It will look like as below
CREATE TABLE TAB1(  COLUMN1 VARCHAR(36) NOT NULL ,   PART_DATE DATE NOT
NULL )PARTITION BY RANGE (PART_DATE);

CREATE TABLE  TAB1_202309  PARTITION OF TAB1 FOR VALUES FROM ('2023-09-01')
TO ('2023-10-01');
CREATE TABLE  TAB1_202310  PARTITION OF TAB1 FOR VALUES FROM ('2023-10-01')
TO ('2023-11-01');
CREATE TABLE  TAB1_202311  PARTITION OF TAB1 FOR VALUES FROM ('2023-11-01')
TO ('2023-12-01');

ALTER TABLE TAB1  ADD  CONSTRAINT PK_TAB1 PRIMARY KEY  ( COLUMN1 ,
PART_DATE  );

VS

-- With integer data type,  It will look like as below

CREATE TABLE TAB1(  COLUMN1 VARCHAR(36) NOT NULL ,   PART_DATE_YM_NM
Integer NOT NULL  )  PARTITION BY RANGE (PART_DATE_YM_NM);
CREATE TABLE  TAB1_202309  PARTITION OF TAB1 FOR VALUES FROM ('202309') TO
('202310');
CREATE TABLE  TAB1_202310  PARTITION OF TAB1 FOR VALUES FROM ('202310') TO
('202311');
CREATE TABLE  TAB1_202311  PARTITION OF TAB1 FOR VALUES FROM ('202311') TO
('202312');

ALTER TABLE TAB1   ADD  CONSTRAINT PK_TAB1 PRIMARY KEY  ( COLUMN1 ,
PART_DATE_YM_NM  );


Re: Question on Partition key

2023-09-03 Thread veem v
Thank you so much for the clarification.

Actually the team has used a similar partitioning strategy on integer
columns in the past, so they are inclined towards that. I will still
double-check with others whether any business restriction exists. But as
you already mentioned, it's not good from a data quality perspective; I
agree with that point.

Additionally, is it true that the optimizer can also get the math wrong
during cardinality estimation, because there is a big difference between
comparing or subtracting two date values vs two number values, and storing
dates in a number column will pose this problem for the optimizer? Is my
understanding correct here?

On Sun, 3 Sept, 2023, 2:02 pm Deep,  wrote:

> Have your friends also mentioned how it is going to help to convert date
> field to integer !???
>
> On Sun, Sep 3, 2023 at 3:51 AM Erik Wienhold  wrote:
>
>> On 03/09/2023 00:35 CEST veem v  wrote:
>>
>> > We are trying to create a monthly range partition table , partitioned on
>> > column PART_DATE. This will hold Orders and part_date is nothing but
>> invoice
>> > date. Some Team mates are asking to use the "PART_DATE" column as data
>> type
>> > "INTEGER" with "MM" format [...]
>>
>> Why do your team mates favor integer over date?
>>
>> > Want to know experts' views on this. If the data type of the partition
>> key
>> > matters here or not?
>>
>> Both integer and date are stored as 4 bytes.  There should be no
>> difference
>> regarding index size.  I don't know if the data type makes a difference in
>> partition pruning performance in this case, but I'd be surprised if it
>> were
>> the case.
>>
>> > Or if there is any downside of each approach in future?
>>
>> The downside of integer is that it allows invalid dates (e.g. 202313)
>> unless
>> you also add check constraints.  But then just use date if you want to
>> store
>> dates.  You get input validation and can use the date operators and
>> functions
>> that Postgres offers.
>>
>> --
>> Erik
>>
>>
>>
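
To make the point about invalid values concrete, here is a sketch of the
extra validation the integer approach would need (the constraint name is
hypothetical), which a DATE column gets for free:

ALTER TABLE TAB1
  ADD CONSTRAINT TAB1_PART_DATE_YM_NM_CHK
  CHECK (PART_DATE_YM_NM % 100 BETWEEN 1 AND 12);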


Database selection

2023-09-20 Thread veem v
Hello All,

I see multiple docs on the internet stating the differences between MySQL
and Postgres, but I want to understand the real views of the experienced
folks here.

We are on AWS and planning to opt for one of the relational databases,
MySQL or Postgres, for our application (it is a hybrid type, mostly dealing
with batch processing but also supporting OLTP-type UI screens etc., with
ACID transactions preferred).

What things/parameters should we keep in mind when opting for a database?
In other words, how do we evaluate the suitable database for our
application?

Regards
Veem


Re: Database selection

2023-09-20 Thread veem v
Thank you.

Yes, feature-wise Postgres seems rich, judging from the multiple blogs I
have read. And right now I can't think of any other reason for opting for
MySQL as opposed to Postgres.

However, one thing regarding transaction management that I see in most
Postgres blogs is MVCC (multiversion concurrency control) being an issue at
times in a heavily concurrent system, since Postgres keeps exact copies of
all the old versions of rows (in case someone is still reading them), and
at times cleaning these up (vacuuming) becomes a pain.

Does AWS Aurora Postgres depend on the same vacuuming technology for
maintaining transactions? Does Aurora MySQL opt for a similar strategy for
transaction management, or a different/better one?
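
As a side note, a minimal sketch of how the vacuum situation can be watched
from SQL on stock Postgres (whether Aurora exposes the same counters should
be verified):

-- Tables with the most dead row versions and when autovacuum last ran.
SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 20;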

On Thu, 21 Sept 2023 at 02:09, Laurenz Albe 
wrote:

> On Thu, 2023-09-21 at 01:17 +0530, veem v wrote:
> > I see multiple docs on the internet, stating the difference between
> Mysql and postgres.
> > But I want to understand the real views of the experienced folks here.
> >
> > While we are on AWS and planning to opt for one of the relational
> databases out of
> > mysql and postgres for our application(It is a Hybrid type, mostly
> dealing with
> > batch processing but also supporting OLTP type UI screens etc. and ACID
> transactions preferred).
> >
> > What all things/parameters should we keep in mind here for opting a
> database?
> > or in other words how to evaluate the suitable database for our
> application?
>
> You are asking the wrong people.  We here on the PostgreSQL mailing list
> will tell
> you that you shouldn't touch MySQL except with a long stick, and for
> almost all
> use cases PostgreSQL is preferable.  The people on MySQL mailing lists or
> forums
> might tell an entirely different story.
>
> I would say something about license and free software, but if you plan to
> lock
> yourself into the cage of a cloud hosted database, that probably doesn't
> matter much.
>
> Yours,
> Laurenz Albe
>


Question on Aurora postgres

2023-10-31 Thread veem v
Hello all,
We are planning to use Aurora Postgres for a few applications, but wanted
some guidance on which instance class type we should use for the lower,
non-prod environments like Dev/QA/UAT/Perf, and what we should use for the
production database. Is there some recommendation based on usage etc. for
this?

Regards
Veem


Re: Question on Aurora postgres

2023-10-31 Thread veem v
Thank you so much. So basically this application is an enterprise app with
24/7 users (doing both DML and queries) and currently runs on an on-premise
Oracle database. It has data in the range of ~20-30 TB, and queries will be
running 24/7 against this database. It's an OLTP app. And of course the
lower environments Dev/QA/UAT don't have that kind of usage and data volume
compared to prod.

In the document below I see a lot of instance types, like memory-optimized
X family, R family and T-based instance classes. So I'm a bit confused
about how to get started with them, and whether we can move from one type
to another seamlessly if required in the future.

Will the CPU, memory and storage listed against each of these instance
types be directly proportional to the CPU, memory and storage size of the
data we currently have in our on-premise Oracle database?
https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/Concepts.DBInstanceClass.html

On Wed, 1 Nov 2023 at 04:55, Ben Chobot  wrote:

> veem v wrote on 10/31/23 2:49 PM:
>
> Hello all,
> We are planning to use aurora Postgres for a few applications. But wanted
> some guidance on which instance, class type should we use for lower
> environment/non prod like Dev/Qa/Uat/perf and what should we use for
> production database? Is there some recommendation based on usage etc. for
> this?
>
>
> Regardless of if you are going to be making databases in a cloud provider
> or on physical hardware, the first step is to figure out what your load
> will likely be and what variables will be influencing it. Will you have a
> lot of data? Will it churn very much? Will you have a lot of concurrent
> reads? Will they be simple queries or memory intensive ones?
>
> Only you can answer these questions, and once you have, the many (many)
> options available to you will start to separate out and there will be some
> obvious things to try. One of the great things about making databases in
> the cloud is that you aren't locked into any sizing choice like you can by
> if you go build a server.
>


Question on overall design

2023-12-09 Thread veem v
Hello All,
Although it's not exactly related to open source Postgres, I want to ask
this question here to understand colleagues' views, given the decades of
experience in the database world present here. We want some guidance on
whether the design below looks okay for our customer use case.

We currently have financial-system transaction data streaming into Oracle
Exadata (X9) on-premise. This database supports processing of 400 million
transactions per day. A single transaction for us is a combination of 7-8
inserts into different transaction tables, with indexes, unique constraints
etc. defined on those. The transactions are processed/committed in batches
(~1000 batch size) in the database, and this system persists data for ~6
months. We do have all sorts of OLAP applications (daily/monthly batch
reports) running on the same database, along with some user-facing UI
applications showing customer transactions. So it's basically serving a
hybrid workload today and is a one-stop solution for all use cases.

Many of the applications are moving from on-premise to the AWS cloud as
part of a modernization journey, with AWS being the chosen cloud partner.
The product is also expected to expand across more regions, and this system
is expected to serve an increased transaction volume. We also have a
requirement to persist transaction data for ~10 years so that it is
available for analytics/data science use cases.

So the team is thinking of splitting it into two parts:
1) An OLTP-type use case, in which we persist/write the transaction data
quickly and show it to the UI-related apps in near real time / the quickest
possible time. This database will store at most 60-90 days of transaction
data. I'm not sure whether there is an Oracle Exadata equivalent on AWS, so
the team is planning to use/experiment with Aurora Postgres. Please correct
me if there are other options we should consider.

2) Then move the data beyond ~90 days into another database or object
storage (S3), which will keep it for ~10 years and make it queryable
through the necessary APIs. That is supposed to cater to the
OLAP/analytics/data science use cases.

Is the above design okay? Also, in regard to the second point, i.e.
persisting the historical data (in a queryable state), should we go for a
database like Snowflake, or just keep it on S3 as-is and make it queryable
through APIs? Please advise.


Re: Question on overall design

2023-12-09 Thread veem v
Ron Johnson 
 wrote:

> "OK" is relative, but it's what we did in a similar situation: two years
> of data on-line and 5 years of data in compressed files in S3.  (We're
> required to keep data for 7 years, but they *never* ask for records more
> than 2 years old.  If they ever do, we'll manually load the relevant data
> back into PG.)
> (I can't imagine that querying billions of unindexed flat-file records via
> S3 would be fast.)
> How often do end users look for data more than 90 days old?  Two years old?
> How quickly does the old data need to be available?
> Maybe a four-tiered system of PG-Snowflake-S3-S3_Glacier would balance
> speed and cost (or maybe Snowflake would just drive up costs).


Thank You so much Ron.

When you said "*two years of data on-line and 5 years of data in compressed
files in S3*", do you mean two years of data in Aurora Postgres, which will
be the OLTP database, with the rest just dumped as-is into an S3 bucket,
and when we need to query that data we simply read those S3 files and load
them back into Aurora Postgres?

In the currently running Oracle Exadata system, we have SIX months of data
(which is ~200 TB), and the transaction tables are all range-partitioned on
a daily basis. Out of that, ~2 months of data gets queried frequently and
the other ~4 months less frequently. However, in the target architecture we
want to build in the cloud, there are some requirements for the
analytics/data science team to query ~3 years of history. Beyond ~3 years
we may need that data only rarely.

We were initially thinking of having one database serve both the OLTP and
OLAP use cases (somewhat like Oracle Exadata currently does for us), but it
seems we don't have such an option here on AWS: Postgres would serve the
OLTP use case whereas Snowflake would serve the OLAP use case.

So do you suggest having both databases in use, i.e. recent transaction
data for the last 3 months streamed to Aurora Postgres, data from 3 months
to 3 years parked in Snowflake to serve the OLAP/analytics use case, and
data from 3 years to 10 years kept in S3 (in Parquet or Iceberg format) so
that even Snowflake can query it directly when needed?

OR

Do you suggest just keeping the last ~3 months of data in Aurora Postgres
and everything else in Snowflake, considering it stores data in a
compressed format and storage is cheap ($23 per TB per month)?

A few colleagues are pointing to Databricks for the analytics use case. Is
that a good option here?




On Sat, 9 Dec 2023 at 16:43, veem v  wrote:

> Hello All,
> Although it's not exactly related to opensource postgre but want to ask
> this question here to understand colleagues' view, considering  having
> decades of experience in the database world, We want some guidance, if the
> below design looks okay for our customer use case.
>
> We currently have financial systems transaction data streams to Oracle
> exadata(X9) on-premise. This database supports processing of 400million
> transactions per day. A single transaction for us is a combination of 7-8
> inserts into different transaction tables with Indexes , unique constraints
> etc defined on those. The transactions processed/committed in batches(~1000
> batch size) in the database. And this system persists data for ~6 months.
> We do have all sorts of OLAP(daily/monthly batch reports running)
> applications run on the same database along with some user facing UI
> applications showing customer transactions. So it's basically currently
> serving a hybrid workload and is one stop solution for all use cases.
>
> Many of the applications are moving from on premise to AWS cloud as part
> of modernization journey and AWS being chosen cloud partner also the
> product is expected to expand across more regions and this system is
> expected to serve increase in the transaction volume. And also we have a
> requirement to persist transaction data for ~10years to have those
> available for analytics/data science use cases.
>
> So the team is thinking of splitting it into two parts
> 1)OLTP type use case in which we will persist/write the transaction data
> faster and show it to the UI related apps , in near real time/quickest
> possible time. and this database will store Max 60-90 days of transaction
> data. Not sure if we have an option of Oracle exadata equivalent on AWS, so
> team planning of using/experimenting with Aurora postgres. Please correct
> me, if there are any other options we should use otherwise?
>
> 2)Then move the data beyond ~90 days into another database or object
> storage S3 which will keep it there for ~10 years and will be queryable
> using the necessary API's. That is supposed to cater to Olap/analytics/data
> science use cases etc.
>
> Is the above des

Re: Question on overall design

2023-12-09 Thread veem v
Thank you so much for the response.

Got your point; I will check whether we really need detail or just a
summary for the historical data. But it looks like we will need detailed
transaction data for ~2 years at least.

My understanding was that AWS has two different offerings, and "Aurora
PostgreSQL" is more performant and costlier than "RDS PostgreSQL". I will
double-check on this though.

However, how do we prove that RDS/Aurora PostgreSQL is going to serve our
OLTP requirement here, similar to the current on-premise Oracle Exadata?
For the OLTP use case we are expecting ~15K TPS of writes and ~2K TPS of
reads, and the response time for the UI queries is expected to be
sub-second. But yes, as I mentioned, individual transactions will be
batched and then written to the database, so this should create less
resource consumption and contention.

To test whether Aurora PostgreSQL will be able to cater to the above needs
(with the expected performance at a nominal cost), how should we test it?
As we won't be able to test everything right away, should we test basic
read and write performance and benchmark it, to have some confidence before
going ahead with development?

Say, for example, if one transaction consists of ~8 inserts, we can create
a sample target table in Aurora Postgres with the required
indexes/constraints and try running those inserts from multiple threads
(for concurrency) using BlazeMeter, and see/compare the response time, CPU,
IO and memory usage of the PostgreSQL database at the set TPS. Similarly,
to see read performance, we can run multiple SELECT queries from BlazeMeter
and compare the response times. Is this the correct approach for validating
the database here, or does another approach exist?
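
One commonly used alternative (or complement) to BlazeMeter for the write
test is pgbench, which ships with Postgres; a minimal sketch, where the
table and column names are made up to stand in for the real 8-insert
transaction:

-- bench_txn.sql: one business transaction as a pgbench custom script
\set k random(1, 1000000000)
BEGIN;
INSERT INTO txn_detail_1 (id, created_at) VALUES (:k, now());
INSERT INTO txn_detail_2 (id, created_at) VALUES (:k, now());
-- ... the remaining inserts for the other transaction tables ...
COMMIT;

Running it with something like "pgbench -n -c 50 -j 8 -T 300 -f
bench_txn.sql" drives concurrent clients for a fixed duration and reports
TPS and latency, which can then be compared against the CPU/IO/memory
metrics.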

And another question that comes to mind: I have read in the past that
vacuum can be a problem in PostgreSQL. Is it going to cause trouble in
Aurora PostgreSQL too, for such a highly transactional read/write system?
How do we test/validate that?

On Sun, 10 Dec 2023 at 01:29, Ron Johnson  wrote:

> On Sat, Dec 9, 2023 at 2:13 PM veem v  wrote:
>
>>
>> Ron Johnson 
>>  wrote:
>>
>>> "OK" is relative, but it's what we did in a similar situation: two years
>>> of data on-line and 5 years of data in compressed files in S3.  (We're
>>> required to keep data for 7 years, but they *never* ask for records
>>> more than 2 years old.  If they ever do, we'll manually load the relevant
>>> data back into PG.)
>>> (I can't imagine that querying billions of unindexed flat-file records
>>> via S3 would be fast.)
>>> How often do end users look for data more than 90 days old?  Two years
>>> old?
>>> How quickly does the old data need to be available?
>>> Maybe a four-tiered system of PG-Snowflake-S3-S3_Glacier would balance
>>> speed and cost (or maybe Snowflake would just drive up costs).
>>
>>
>> Thank You so much Ron.
>>
>> When you said "*two years of data on-line and 5 years of data in
>> compressed files in S3*." So do you mean two years of data in aurora
>> postgre which will be OLTP database and rest just dump as is in S3 bucket
>> and when we need any such query for those data , just simply read and dump
>> those S3 files back in the aurora postgre?
>>
>
> (RDS Postgresql, not Aurora, but that's beside the point.)
>
> Yes.
>
> But keep reading...
>
> In the currently running oracle exadata system , it has SIX months of data
>> (which is ~200TB) and the transaction tables are all range partitioned on a
>> daily basis. And out of that ~2months of data gets frequently queried and
>> other ~4months of data gets queried less frequently.  However, in the
>> target architecture which we want to build on cloud here, there are some
>> requirements for the analytics/data science team to query ~3years history.
>> Beyond ~3years we may need that data rarely.
>>
>> We were initially thinking of just having one database to serve both OLTP
>> and OLAP use cases(somewhat like oracle exadata currently doing for us) but
>> it seems we don't have such an option here on AWS. Postgre will serve OLTP
>> use case whereas Snowflake will serve OLAP use case.
>>
>> So do you suggest having both the databases in use, like recent
>> transaction data for last 3 months should be streamed to aurora postgre,
>> then  from 3months till 3years of data should be parked in snowflake which
>> will serve OLAP/analytics use case. and from 3years till 10years will be
>> kept in S3 (as parquet or Iceberg format) so that even Snowflake can query
>> those directly when needed.
>>
>> OR
>>
>> Do you suggest just keeping last ~3months of data on Aurora postgre and
>> rest every

Re: Question on overall design

2023-12-10 Thread veem v
Thank you so much Ron.  I have some more doubts related to this.

We were wondering if there is any utility in PG with which we can
create/generate a large sample data volume, which we could run on our
on-premise Oracle Exadata box and also on Aurora PostgreSQL in the cloud to
compare read and write performance. Does any such thing exist, to quickly
get a large data volume? But I think what you are saying is that we should
use real data to get actual, or the closest possible, benchmarks; correct me
if I am wrong.

We used to use the data dictionary views (the AWR views) in Oracle to see
current and historical performance statistics like CPU, IO and memory usage,
object-level contention etc. in the Oracle database. Do we have such a thing
available in Aurora Postgres, so as to monitor the performance, get some
idea of how well the load test goes, and see what capacity is available or
whether we are saturating it?
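
For cumulative statement-level statistics (the closest analogue to AWR's SQL
reports), the pg_stat_statements extension can be enabled; on Aurora/RDS
that is done through the cluster parameter group, and the instance also
offers Performance Insights for CPU/IO/memory graphs. A minimal sketch,
assuming the extension is available (column names per PostgreSQL 13+):

CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

SELECT query,
       calls,
       round(total_exec_time::numeric, 2) AS total_ms,
       round(mean_exec_time::numeric, 2)  AS mean_ms,
       rows,
       shared_blks_hit,
       shared_blks_read
FROM   pg_stat_statements
ORDER  BY total_exec_time DESC
LIMIT  10;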

When you said "*Beware of excessive partitioning.  We had to "departion"
most tables, because of the impact on performance.*" , as I understand
partitioning helps in reducing IO while reading , as it scans less data
because of partition pruning. And while writing there is almost minimal
overhead in identifying the required partition to which the
INSERTed/Updated/Deleted data will be landing ON. So can you please help me
understand what exact performance impact you are highlighting here? Are you
talking about the size of the partitions or total count of the partitions?
In our case we are planning to do either daily range partition or hourly
based on data data volume, not sure if there exists some sweet spot in
regards to what should be the size of each partition in postgresql be. If
you are pointing to higher count of partitions of table , then in our case
if we persist ~90 days data then for a transaction table it would be ~90
daily range partitions or ~2160 hourly range partitions in the aurora
postgresql. It would be helpful , if you can explain a bit regarding what
exact performance impact you faced in regards to the partitioning in aurora
postgresql.
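
For reference, partition pruning only works when the planner (or executor)
can match the query's filter to the partition key expression. A minimal
sketch of the daily range layout being discussed, with placeholder names:

CREATE TABLE txn (
    txn_id   bigint      NOT NULL,
    txn_time timestamptz NOT NULL,
    payload  text
) PARTITION BY RANGE (txn_time);

CREATE TABLE txn_2023_12_10 PARTITION OF txn
    FOR VALUES FROM ('2023-12-10') TO ('2023-12-11');
CREATE TABLE txn_2023_12_11 PARTITION OF txn
    FOR VALUES FROM ('2023-12-11') TO ('2023-12-12');

-- Pruned to a single partition because txn_time is constrained:
EXPLAIN SELECT count(*) FROM txn
WHERE  txn_time >= '2023-12-10' AND txn_time < '2023-12-11';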

*"Use ora2pg to export views in the Oracle database. It's very easy; a
small EC2 VM running Linux with enough disk space lets you automate the
extraction from Oracle and importation into AWS Postgresql.)"*

I need to explore this a bit more, I believe. We have an on-premise Oracle
database, so we can move data directly to Aurora PostgreSQL in the cloud.
Another thing: we have some sample data available in AWS Snowflake, but I am
not sure whether some mechanism is available to move that same data to
Aurora PostgreSQL?

On Sun, 10 Dec 2023 at 02:27, Ron Johnson  wrote:

> I don't know anything about Aurora, only have experience with RDS
> Postgresql.
>
> We successfully migrated from on-prem Oracle (12c, I think) to RDS
> Postgresql 12, and were very happy: little down time (I take pride in
> designing/implementing that; thanks, ora2pg!), with lower disk (8TB, down
> to 5TB) and CPU usage.
>
> I'm not sure what the TPS was in Oracle, but the server level "we" are on
> (I'm no longer with that company, and don't remember the specifics (48 vCPU
> / 132 GB RAM, I think, with 10,000 IOPs) is over-powered for *our* needs.
>
> You're going to have to spin up a full-sized instance, import a *lot* of
> real data(*) into a production-identical schema and then run your batch
> load process using test data (or copies of real batches).  That's the only
> way you'll *really* know.
>
> Beware of excessive partitioning.  We had to "departion" most tables,
> because of the impact on performance.
>
> (*) Use ora2pg to export views in the Oracle database.  It's *very* easy;
> a small EC2 VM running Linux with enough disk space lets you automate the
> extraction from Oracle and importation into AWS Postgresql.)
>
> On Sat, Dec 9, 2023 at 3:36 PM veem v  wrote:
>
>> Thank you so much for the response.
>>
>> Got your point, will check if we really need details or summary for the
>> historical data. But it looks like we will need detailed transaction data
>> for ~2 years at least.
>>
>> My understanding was that AWS has two different offerings and "aurora
>> postgresql" is more performant and costlier than "RDS postgresql". Will
>> double check on this though.
>>
>> However , how to prove RDS/aurora postgresql is going to serve our OLTP
>> requirement here , similar to the current on premise Oracle exadata. For
>> the OLTP use case we are expecting ~15K TPS write and 2K TPS read and the
>> response for the UI queries are expected to be within subseconds. But yes,
>> as i mentioned individual transactions will be batched and then will be
>> written to the database, so 

Re: Question on overall design

2023-12-11 Thread veem v
Thank you for your response.

Apologies if it sounds silly, but is it advisable to use just one database
in this flow, say Snowflake for the OLAP use case with a cache layer (like
Redis) on top of it to cater to our OLTP use case, rather than having a
dedicated OLTP database like Aurora PostgreSQL?

Basically I wanted to know whether we would be missing anything by not
having an OLTP database here for our use case.


On Sun, 10 Dec, 2023, 10:25 pm Ron Johnson,  wrote:

> * PG has pgbench; *maybe* you can hack it to work on Oracle.
> * If you want to know how well an RDBMS will work on your workload, then
> you must provide it with a simulated workload.  Right?
> * AWS RDS Postgresql has a dashboard that *might* be similar to AWR.  Or
> it might not...
> * We departitioned because SELECT statements were *slow*.  All partitions
> were scanned, even when the partition key was specified in the WHERE clause.
>
> On Sun, Dec 10, 2023 at 8:45 AM veem v  wrote:
>
>> Thank you so much Ron.  I have some more doubts related to this.
>>
>> We were thinking , if there is any utility in PG with which we can
>> create/generate large sample data volume which we can use to run it on our
>> on premise Oracle exadata box and use the same on the aurora postgresql in
>> cloud to see the read and write performance comparison. Is there any such
>> exists, to quickly get some large data volume?  But i think , what you are
>> saying is, we should use real data to get actual or closest possible
>> benchmarks, correct me if wrong?
>>
>> We used to see the data dictionary views (called AWR views) in Oracle to
>> see the current and historical performance statistics like CPU, IO , Memory
>> usage, object level contentions etc. in the oracle database. Do we have
>> such a thing available in Aurora postgre, so as to monitor the
>> performance and get some idea of how well the load test goes and what
>> capacity is available or are we saturating it?
>>
>> When you said "*Beware of excessive partitioning.  We had to "departion"
>> most tables, because of the impact on performance.*" , as I understand
>> partitioning helps in reducing IO while reading , as it scans less data
>> because of partition pruning. And while writing there is almost minimal
>> overhead in identifying the required partition to which the
>> INSERTed/Updated/Deleted data will be landing ON. So can you please help me
>> understand what exact performance impact you are highlighting here? Are you
>> talking about the size of the partitions or total count of the partitions?
>> In our case we are planning to do either daily range partition or hourly
>> based on data data volume, not sure if there exists some sweet spot in
>> regards to what should be the size of each partition in postgresql be. If
>> you are pointing to higher count of partitions of table , then in our case
>> if we persist ~90 days data then for a transaction table it would be ~90
>> daily range partitions or ~2160 hourly range partitions in the aurora
>> postgresql. It would be helpful , if you can explain a bit regarding what
>> exact performance impact you faced in regards to the partitioning in aurora
>> postgresql.
>>
>> *"Use ora2pg to export views in the Oracle database. It's very easy; a
>> small EC2 VM running Linux with enough disk space lets you automate the
>> extraction from Oracle and importation into AWS Postgresql.)"*
>>
>> Need to explore a bit more on this I believe. We have an oracle on
>> premise database, so we can move data directly to aurora postgresql in the
>> cloud. Another thing , is we have some sample data available in the AWS
>> snowflake but then not sure if some mechanism is available to move the same
>> data to the aurora postgresql ?
>>
>> On Sun, 10 Dec 2023 at 02:27, Ron Johnson 
>> wrote:
>>
>>> I don't know anything about Aurora, only have experience with RDS
>>> Postgresql.
>>>
>>> We successfully migrated from on-prem Oracle (12c, I think) to RDS
>>> Postgresql 12, and were very happy: little down time (I take pride in
>>> designing/implementing that; thanks, ora2pg!), with lower disk (8TB, down
>>> to 5TB) and CPU usage.
>>>
>>> I'm not sure what the TPS was in Oracle, but the server level "we" are
>>> on (I'm no longer with that company, and don't remember the specifics (48
>>> vCPU / 132 GB RAM, I think, with 10,000 IOPs) is over-powered for *our*
>>>  needs.
>>>
>>> You're going to have to spin up a full-sized instance, import a *lot* of
>>> real data(*) 

Read write performance check

2023-12-18 Thread veem v
Hello All,
It's an Aurora PostgreSQL database in AWS. We have a couple of custom
tables created with ~100-odd columns and the required indexes and
constraints, representing a certain business use case. Currently we don't
have any data in them. The requirement is to test read and write query
performance for these sample use cases/tables. We need to evaluate both
row-by-row and batch read/write performance, as well as performance with
concurrent reads/writes from multiple threads/sessions.

As I understand it, the following needs to be done here:
1) For the write performance test, rows need to be inserted from multiple
sessions at the same time, with suitable random values for each data type,
i.e. character, number and date columns. This needs to be tested for both
row-by-row and batched inserts.

2) For the read performance test, the table first has to be populated with
~100 million rows. Then that table will be queried row by row and in a
batched way from concurrent sessions.

I am new to PostgreSQL but have mostly worked with Oracle, so I wanted to
understand whether the above can be achieved by creating a simple procedure
or whether a major effort will be needed. I also saw a few blogs mentioning
pgbench for concurrency tests. I want to know if this will still work on
Aurora PostgreSQL from an IntelliJ client worksheet.

Regards
Veem
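
For what it's worth, pgbench can drive exactly this kind of concurrent test
with a custom script; the table and column names below are placeholders for
the custom table, and the connection options are the standard ones:

-- write_test.sql (pgbench custom script; :c1 is a fresh random value per
-- transaction)
\set c1 random(1, 100000000)
INSERT INTO base_table (column1, column2, column3, column4)
VALUES (md5(cast(:c1 as text)), :c1 % 100, :c1 * 0.37, now());

# 20 concurrent client sessions for 5 minutes against the writer endpoint
pgbench -h <aurora-endpoint> -U <user> -n -c 20 -j 4 -T 300 \
        -f write_test.sql <dbname>

The -n flag skips the vacuum of pgbench's own sample tables (which don't
exist here), and the TPS and latency summary printed at the end is the
figure to compare across runs. The IntelliJ worksheet cannot run pgbench
itself; it has to be run from a shell on a machine that can reach the
database.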


Re: Read write performance check

2023-12-19 Thread veem v
Thank you for the confirmation.

So at first we need to populate the base tables with the necessary data
(say 100 million rows), with the required skewness, using random functions
to generate variation in the values of the different data types. Then, for
the row-by-row write/read test, we can traverse in a cursor loop, and for
the batch write/insert we can traverse in a bulk-collect-style loop.
Something like below, and then this code can be wrapped into a procedure and
passed to pgbench and executed from there. Please correct me if I'm wrong.

Also, can you please guide me on how a batch insert (say with a batch size
of ~1000) can be written?

-- Row by row write (PL/pgSQL sketch; table and column names are placeholders)
DO $$
DECLARE
    total_rows int := 100000000;   -- ~100 million
BEGIN
    FOR i IN 1..total_rows LOOP
        INSERT INTO base_table (column1, column2, column3, column4)
        VALUES (md5(random()::text),
                floor(random() * 100)::int,
                random() * 1000,
                now());
    END LOOP;
END $$;

-- Row by row read (one single-row fetch per iteration)
DO $$
DECLARE
    total_rows int := 100000000;
    data_row   base_table%ROWTYPE;
BEGIN
    FOR i IN 1..total_rows LOOP
        -- add a WHERE filter on the key being tested; elided here
        SELECT * INTO data_row FROM base_table LIMIT 1;
    END LOOP;
END $$;

-- Batch read (open a cursor over the whole result set in one statement)
DO $$
DECLARE
    data_set   refcursor;
    total_rows int := 100000000;
BEGIN
    OPEN data_set FOR SELECT * FROM base_table LIMIT total_rows;
    CLOSE data_set;
END $$;
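
Not sure if this is what was meant by batch insert, but one common way to
write ~1000 rows per statement in plain SQL (same placeholder table and
columns as above) is an INSERT ... SELECT over generate_series:

INSERT INTO base_table (column1, column2, column3, column4)
SELECT md5(random()::text),
       floor(random() * 100)::int,
       random() * 1000,
       now()
FROM   generate_series(1, 1000);

Wrapping that in a loop, or calling it repeatedly from a pgbench script,
gives a batched-write test to compare against the row-by-row version.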






On Tue, 19 Dec 2023 at 12:58, Ilya Kosmodemiansky  wrote:

> Hi Veem,
>
> On Tue, Dec 19, 2023 at 7:36 AM veem v  wrote:
> > 1)For write performance , the rows needs to be inserted from multiple
> sessions at same time, with required random values as per the data types
> i.e. Character, Number, date columns. And this needs to be tested for row
> by row insert and batched insert.
> >
> > 2)For the read performance test , the table first has to be populated
> with those ~100million rows. Then querying will happen on that table row by
> row and batched way from the concurrent session.
> >
> > I am new to postgresql but mostly worked with Oracle, so I wanted to
> understand if the above can be achieved by creating a simple procedure or
> will a major effort be needed? And I saw a few blogs , pgbench to be used
> for concurrency tests.
>
> Yes, you are right, pgbench with customized script is what you are looking
> for
>
> >I want to know if this will still work on Aurora postgresql from intellij
> client worksheet.
>
> pgbench would work with aurora as with normal postgres, it is
> basically a shell script which connects to the database. Not sure if
> idea worksheet would help you in such case however, you can run it
> just from any machine with pgbench installed
>
>
> Best regards,
> Ilya
>
>
> --
> Ilya Kosmodemiansky
> CEO, Founder
>
> Data Egret GmbH
> Your remote PostgreSQL DBA team
> T.: +49 6821 919 3297
> i...@dataegret.com
>


Re: Read write performance check

2023-12-19 Thread veem v
Thank you.

Yes, actually we are trying to compare and see what maximum TPS we are able
to reach with both the row-by-row and the batch read/write tests. Afterwards
this figure may be compared with other databases with similar setups.

So I wanted to understand from the experts here whether this approach is
fine, or whether some other approach is advisable.

I agree with the point that the network will play a role in a real-world
app, but here we mainly want to see the database's capability, as the
network will play a similar role across all databases. Do you suggest some
other approach to achieve this objective?


On Wed, 20 Dec, 2023, 2:42 am Peter J. Holzer,  wrote:

> On 2023-12-20 00:44:48 +0530, veem v wrote:
> >  So at first, we need to populate the base tables with the necessary
> data (say
> > 100million rows) with required skewness using random functions to
> generate the
> > variation in the values of different data types. Then in case of row by
> row
> > write/read test , we can traverse in a cursor loop. and in case of batch
> write/
> > insert , we need to traverse in a bulk collect loop. Something like
> below and
> > then this code can be wrapped into a procedure and passed to the pgbench
> and
> > executed from there. Please correct me if I'm wrong.
>
> One important point to consider for benchmarks is that your benchmark
> has to be similar to the real application to be useful. If your real
> application runs on a different node and connects to the database over
> the network, a benchmark running within a stored procedure may not be
> very indicative of real performance.
>
> hp
>
> --
>_  | Peter J. Holzer| Story must make more sense than reality.
> |_|_) ||
> | |   | h...@hjp.at |-- Charles Stross, "Creative writing
> __/   | http://www.hjp.at/ |   challenge!"
>


Re: Read write performance check

2023-12-19 Thread veem v
Thank you.

It would really be helpful if such test scripts or similar setups are
already available. Can someone please point me to some docs, blogs or sample
scripts on the same?

On Wed, 20 Dec, 2023, 10:34 am Lok P,  wrote:

> As Rob mentioned, the syntax you posted is not correct. You need to
> process or read a certain batch of rows like 1000 or 10k etc. Not all 100M
> at one shot.
>
> But again your uses case seems common one considering you want to compare
> the read and write performance on multiple databases with similar table
> structure as per your usecase. So in that case, you may want to use some
> test scripts which others must have already done rather reinventing the
> wheel.
>
>
> On Wed, 20 Dec, 2023, 10:19 am veem v,  wrote:
>
>> Thank you.
>>
>> Yes, actually we are trying to compare and see what maximum TPS are we
>> able to reach with both of these row by row and batch read/write test. And
>> then afterwards, this figure may be compared with other databases etc with
>> similar setups.
>>
>>  So wanted to understand from experts here, if this approach is fine? Or
>> some other approach is advisable?
>>
>> I agree to the point that , network will play a role in real world app,
>> but here, we are mainly wanted to see the database capability, as network
>> will always play a similar kind of role across all databases. Do you
>> suggest some other approach to achieve this objective?
>>
>>
>> On Wed, 20 Dec, 2023, 2:42 am Peter J. Holzer,  wrote:
>>
>>> On 2023-12-20 00:44:48 +0530, veem v wrote:
>>> >  So at first, we need to populate the base tables with the necessary
>>> data (say
>>> > 100million rows) with required skewness using random functions to
>>> generate the
>>> > variation in the values of different data types. Then in case of row
>>> by row
>>> > write/read test , we can traverse in a cursor loop. and in case of
>>> batch write/
>>> > insert , we need to traverse in a bulk collect loop. Something like
>>> below and
>>> > then this code can be wrapped into a procedure and passed to the
>>> pgbench and
>>> > executed from there. Please correct me if I'm wrong.
>>>
>>> One important point to consider for benchmarks is that your benchmark
>>> has to be similar to the real application to be useful. If your real
>>> application runs on a different node and connects to the database over
>>> the network, a benchmark running within a stored procedure may not be
>>> very indicative of real performance.
>>>
>>> hp
>>>
>>> --
>>>_  | Peter J. Holzer| Story must make more sense than reality.
>>> |_|_) ||
>>> | |   | h...@hjp.at |-- Charles Stross, "Creative writing
>>> __/   | http://www.hjp.at/ |   challenge!"
>>>
>>


Re: Read write performance check

2023-12-21 Thread veem v
Can someone please guide me on whether any standard scripting is available
for doing such read/write performance tests, or point me to any available
docs?

On Wed, 20 Dec, 2023, 10:39 am veem v,  wrote:

> Thank you.
>
> That would really be helpful if such test scripts or similar setups are
> already available. Can someone please guide me to some docs or blogs or
> sample scripts, on same please.
>
> On Wed, 20 Dec, 2023, 10:34 am Lok P,  wrote:
>
>> As Rob mentioned, the syntax you posted is not correct. You need to
>> process or read a certain batch of rows like 1000 or 10k etc. Not all 100M
>> at one shot.
>>
>> But again your uses case seems common one considering you want to compare
>> the read and write performance on multiple databases with similar table
>> structure as per your usecase. So in that case, you may want to use some
>> test scripts which others must have already done rather reinventing the
>> wheel.
>>
>>
>> On Wed, 20 Dec, 2023, 10:19 am veem v,  wrote:
>>
>>> Thank you.
>>>
>>> Yes, actually we are trying to compare and see what maximum TPS are we
>>> able to reach with both of these row by row and batch read/write test. And
>>> then afterwards, this figure may be compared with other databases etc with
>>> similar setups.
>>>
>>>  So wanted to understand from experts here, if this approach is fine? Or
>>> some other approach is advisable?
>>>
>>> I agree to the point that , network will play a role in real world app,
>>> but here, we are mainly wanted to see the database capability, as network
>>> will always play a similar kind of role across all databases. Do you
>>> suggest some other approach to achieve this objective?
>>>
>>>
>>> On Wed, 20 Dec, 2023, 2:42 am Peter J. Holzer,  wrote:
>>>
>>>> On 2023-12-20 00:44:48 +0530, veem v wrote:
>>>> >  So at first, we need to populate the base tables with the necessary
>>>> data (say
>>>> > 100million rows) with required skewness using random functions to
>>>> generate the
>>>> > variation in the values of different data types. Then in case of row
>>>> by row
>>>> > write/read test , we can traverse in a cursor loop. and in case of
>>>> batch write/
>>>> > insert , we need to traverse in a bulk collect loop. Something like
>>>> below and
>>>> > then this code can be wrapped into a procedure and passed to the
>>>> pgbench and
>>>> > executed from there. Please correct me if I'm wrong.
>>>>
>>>> One important point to consider for benchmarks is that your benchmark
>>>> has to be similar to the real application to be useful. If your real
>>>> application runs on a different node and connects to the database over
>>>> the network, a benchmark running within a stored procedure may not be
>>>> very indicative of real performance.
>>>>
>>>> hp
>>>>
>>>> --
>>>>_  | Peter J. Holzer| Story must make more sense than reality.
>>>> |_|_) ||
>>>> | |   | h...@hjp.at |-- Charles Stross, "Creative writing
>>>> __/   | http://www.hjp.at/ |   challenge!"
>>>>
>>>


Moving to Postgresql database

2024-01-14 Thread veem v
Hello Experts,
Suppose some teams are well versed in the Oracle database architecture and
in how its optimizer works, and have been designing applications on top of
it. Now the same teams are moving to AWS Aurora PostgreSQL database
design/development projects. Are there any key design/architectural changes
the app development team or the database design team should be aware of, so
as to take the right decisions on any new development project on AWS Aurora
PostgreSQL?
Or
Is there any list of differences (compared to the Oracle database) in key
concepts, for example basic design concepts, normalization, partitioning,
clustering, backup and recovery, indexing strategy, isolation levels and
performance, which one should definitely be aware of?


Query running longer

2024-02-01 Thread veem v
Hello All,
We have the below query which is running for ~45 seconds on a Postgres
Aurora reader instance. I have captured the explain analyze output. I want
to understand where exactly the resources are being spent and whether we can
optimize it further. It's a UI query showing the top 50 rows and is supposed
to finish in sub-seconds, but it takes around ~45 seconds to finish.
Also, seeing multiple workers in the plan, does that mean the query is
running in parallel somehow?

explain (analyze, verbose, costs, buffers)
select TAB1.PRI
from SCHEMA1.TAB1 TAB1
inner join SCHEMA1.TAB4 TAB4 on TAB4.PRI = TAB1.PRI
inner join SCHEMA1."TAB2" TAB2 on TAB2.PRI = TAB1.PRI
inner join SCHEMA1.TAB3 a2 on a2.AID = TAB2.AID
where TAB2.MID = X
  and TAB4.TAB4_code = 'XX'
  and TAB2.TAB2_time between '2024-01-01' and '2024-01-31'
order by TAB2.TAB2_time desc
limit 50;

Limit  (cost=13052924.01..13052924.14 rows=50 width=45) (actual
time=45211.971..45224.720 rows=50 loops=1)
"  Output: TAB1.PRI, TAB2.TAB2_time"
  Buffers: shared hit=1980943 read=2335820
  I/O Timings: shared/local read=112477.014
  ->  Sort  (cost=13052924.01..13052924.19 rows=70 width=45) (actual
time=45211.969..45224.713 rows=50 loops=1)
"Output: TAB1.PRI, TAB2.TAB2_time"
Sort Key: TAB2.TAB2_time DESC
Sort Method: top-N heapsort  Memory: 32kB
Buffers: shared hit=1980943 read=2335820
I/O Timings: shared/local read=112477.014
->  Gather  (cost=92917.38..13052921.87 rows=70 width=45) (actual
time=947.004..45221.915 rows=5428 loops=1)
"  Output: TAB1.PRI, TAB2.TAB2_time"
  Workers Planned: 4
  Workers Launched: 4
  Buffers: shared hit=1980943 read=2335820
  I/O Timings: shared/local read=112477.014
  ->  Nested Loop  (cost=91917.38..13051914.87 rows=18
width=45) (actual time=945.946..45195.224 rows=1086 loops=5)
"Output: TAB1.PRI, TAB2.TAB2_time"
Inner Unique: true
Buffers: shared hit=1980943 read=2335820
I/O Timings: shared/local read=112477.014
Worker 0:  actual time=936.808..45193.518 rows=1036
loops=1
  Buffers: shared hit=382606 read=465076
  I/O Timings: shared/local read=22452.028
Worker 1:  actual time=947.246..45194.168 rows=1055
loops=1
  Buffers: shared hit=383165 read=484189
  I/O Timings: shared/local read=22617.135
Worker 2:  actual time=933.623..45192.534 rows=1145
loops=1
  Buffers: shared hit=415758 read=473182
  I/O Timings: shared/local read=22741.488
Worker 3:  actual time=965.639..45193.603 rows=1078
loops=1
  Buffers: shared hit=398009 read=449053
  I/O Timings: shared/local read=1.094
->  Nested Loop  (cost=91916.81..13051828.80 rows=18
width=81) (actual time=945.917..43729.931 rows=1086 loops=5)
"  Output: TAB1.PRI, TAB2.TAB2_time, TAB2.AID"
  Inner Unique: true
  Join Filter: ((TAB4.PRI)::text = (TAB1.PRI)::text)
  Buffers: shared hit=1962289 read=2328363
  I/O Timings: shared/local read=105246.220
  Worker 0:  actual time=936.781..43732.652
rows=1036 loops=1
Buffers: shared hit=379077 read=463587
I/O Timings: shared/local read=21008.508
  Worker 1:  actual time=947.212..43699.507
rows=1055 loops=1
Buffers: shared hit=379573 read=482704
I/O Timings: shared/local read=21142.572
  Worker 2:  actual time=933.589..43696.710
rows=1145 loops=1
Buffers: shared hit=411836 read=471634
I/O Timings: shared/local read=21266.581
  Worker 3:  actual time=965.608..43768.535
rows=1078 loops=1
Buffers: shared hit=394288 read=447583
I/O Timings: shared/local read=20814.288
  ->  Parallel Hash Join
 (cost=91916.24..13051765.39 rows=18 width=117) (actual
time=945.879..42758.939 rows=1086 loops=5)
"Output: TAB4.PRI, TAB2.TAB2_time,
TAB2.PRI, TAB2.AID"
Hash Cond: ((TAB2.PRI)::text =
(TAB4.PRI)::text)
Buffers: shared hit=1943792 read=2322814
I/O Timings: shared/local read=100496.787
Worker 0:  actual time=936.743..42798.247
rows=1036 loops=1
  Buffers: shared hit=375573 read=462501
  I/O Timings: shared/local 

Re: Query running longer

2024-02-01 Thread veem v
On Fri, 2 Feb 2024 at 02:43, Laurenz Albe  wrote:

> On Fri, 2024-02-02 at 02:27 +0530, veem v wrote:
> > We have the below query which is running for ~45 seconds on postgres
> aurora reader instance.
> > I have captured the explain analyze. Want to understand, where exactly
> the resources are
> > getting spent and if we can be able to optimize it further.
>
> Aurora <> PostgreSQL, but here is what I can see:
>
> - The index scan on SCHEMA1."TAB2" has to check 2 million extra
>   rows because "work_mem" is too small.  Almost the complete time
>   is spent there.
>
> - You may be getting a bad plan, because the statistics on
>   SCHEMA1.TAB4 are either out of date or not detailed enough,
>   which makes PostgreSQL underestimate the result size.
>
>
Thank you so much.

I can read and understand execution plans in the Oracle database easily and
can also interpret its database parameters. But being new to Postgres, are
there some good resources to quickly get up to speed on reading the Postgres
execution plan here, and also on the various key parameters (like, for
example, the work_mem you mentioned)?

As you mentioned below, I am wondering how you determined that this is the
step where the majority of the DB resources get spent. The total time the
query ran was ~45 seconds; how can I find out how much of that time was
spent in this step? And to fix this one item with regards to TAB2, should we
increase "work_mem" here?



*- The index scan on SCHEMA1."TAB2" has to check 2 million extra  rows
because "work_mem" is too small.  Almost the complete time  is spent there.*

 ->  Parallel Bitmap Heap Scan on SCHEMA1.""TAB2"" TAB2
 (cost=84860.50..13040301.00 rows=1175611 width=80) (actual
time=713.054..26942.082 rows=956249 loops=5)"
" Output: TAB2.TAB2_time, TAB2.PRI, TAB2.AID"
 Recheck Cond: (TAB2.MID = 'X'::numeric)
 Rows Removed by Index Recheck: 2137395
 Filter: ((TAB2.TAB2_time >= '2024-01-01 00:00:00+00'::timestamp with time
zone) AND (TAB2.TAB2_time <= '2024-01-31 00:00:00+00'::timestamp with time
zone))
 Heap Blocks: exact=5300 lossy=782577
 Buffers: shared hit=1651569 read=2245157
 I/O Timings: shared/local read=29063.286
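
The "lossy=782577" heap blocks and the "Rows Removed by Index Recheck" line
are the usual signs that the bitmap did not fit in work_mem and degraded to
per-page (lossy) mode, which forces the recheck of ~2 million extra rows.
The per-node "actual time" values (roughly 27 seconds on this bitmap heap
scan out of the ~45 seconds total) are how one sees where the time is spent.
A session-level experiment (the value is only an example; size it to what
the instance's RAM and concurrency allow):

SHOW work_mem;
SET work_mem = '256MB';   -- affects this session only
-- then re-run the EXPLAIN (ANALYZE, BUFFERS) from the first mail and check
-- whether the lossy blocks and the recheck disappear and the node time drops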

For the other point you mentioned below, should we run VACUUM ANALYZE on
the table TAB4?



*- You may be getting a bad plan, because the statistics on  SCHEMA1.TAB4
are either out of date or not detailed enough,  which makes PostgreSQL
underestimate the result size.*

Output: TAB4.PRI
 Buckets: 262144 (originally 2048)  Batches: 1 (originally 1)  Memory
Usage: 13936kB
 Buffers: shared hit=134917
 Worker 0:  actual time=214.981..214.982 rows=27779 loops=1
Buffers: shared hit=27133
 Worker 1:  actual time=215.455..215.456 rows=27805 loops=1
Buffers: shared hit=27159
 Worker 2:  actual time=215.774..215.774 rows=27330 loops=1
Buffers: shared hit=26706
 Worker 3:  actual time=215.776..215.777 rows=26880 loops=1
Buffers: shared hit=26245
 ->  Parallel Bitmap Heap Scan on SCHEMA1.TAB4 TAB4  (cost=26.39..7042.63
rows=1049 width=37) (actual time=23.650..201.606 rows=27613 loops=5)
Output: TAB4.PRI
Recheck Cond: ((TAB4.TAB4_code)::text = 'XX'::text)
Rows Removed by Index Recheck: 616610
Heap Blocks: exact=11978 lossy=15624
Buffers: shared hit=134917
Worker 0:  actual time=20.627..199.852 rows=27779 loops=1
 Buffers: shared hit=27133
Worker 1:  actual time=21.065..199.786 rows=27805 loops=1
 Buffers: shared hit=27159
Worker 2:  actual time=21.445..198.582 rows=27330 loops=1
 Buffers: shared hit=26706
Worker 3:  actual time=21.470..195.915 rows=26880 loops=1
 Buffers: shared hit=26245
->  Bitmap Index Scan on TAB4_idx1  (cost=0.00..25.95 rows=1784 width=0)
(actual time=23.938..23.938 rows=138067 loops=1)
 Index Cond: ((TAB4.TAB4_code)::text = 'XX'::text)
 Buffers: shared hit=72
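
Regarding the statistics on TAB4: a plain ANALYZE is enough to refresh them
(VACUUM ANALYZE additionally deals with dead rows), and if the row-count
estimate for TAB4_code stays far off after that, the per-column statistics
target can be raised. A minimal sketch with the names from the plan above:

ANALYZE SCHEMA1.TAB4;

-- only if the estimate is still badly off afterwards:
ALTER TABLE SCHEMA1.TAB4 ALTER COLUMN TAB4_code SET STATISTICS 1000;
ANALYZE SCHEMA1.TAB4;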




Question on partitioning

2024-02-04 Thread veem v
Hello All,
In postgresql, Is it possible to partition an existing nonpartitioned table
having data already residing in it and indexes and constraints defined in
it, without the need of manually moving the data around, to make it faster?
Similarly merging multiple partitions to one partition or splitting a
single partition into multiple partitions?
Appreciate any reference doc or sample syntax for the same.

Thanks and Regards
Veem


Why doesn't Postgres apply limit on groups when retrieving N results per group using WHERE IN + ORDER BY

2024-02-05 Thread Sean v
This is related to a question I asked on dbs.stackexchange.com:
https://dba.stackexchange.com/questions/335501/why-doesnt-postgres-apply-limit-on-groups-when-retrieving-n-results-per-group

But to reiterate - I have a query like this:

SELECT "orders".*

FROM "orders"

WHERE (user_id IN ?, ?, ?)

ORDER BY "orders"."created_at" LIMIT 50

I have two indexes - `(user_id)` and `(user_id, created_at)`. Only the
first index is ever used with this query.

If I query for a specific user_id, as expected it uses the composite index.
If I use an inner select and trick the query planner into doing a nested
loop for each user, it no longer uses the composite index.

I can understand that the query planner would not know ahead of time which
users would have those 50 newest orders.

I imagined that it would be clever enough to determine that only 50 results
are needed, and that it could use the `(user_id, created_at)` index to get
50 orders for each user. Then sort and filter those few hundred results in
memory.

Instead what I'm seeing is that it gets all orders for each user using the
`user_id` index and then sorts/filters them all in memory.

Here is an example query plan:

Limit  (cost=45271.94..45272.06 rows=50 width=57) (actual
time=13.221..13.234 rows=50 loops=1)
  Buffers: shared hit=12321
  ->  Sort  (cost=45271.94..45302.75 rows=12326 width=57) (actual
time=13.220..13.226 rows=50 loops=1)
  Sort Key: orders.created_at
  Sort Method: top-N heapsort Memory: 36kB
Buffers: shared hit=12321
->  Bitmap Heap Scan on orders orders  (cost=180.85..44862.48
rows=12326 width=57) (actual time=3.268..11.485 rows=12300 loops=1)
Recheck Cond: (orders.user_id = ANY
('{11,1000,3000}'::bigint[]))
Heap Blocks: exact=12300
  Buffers: shared hit=12321
  ->  Bitmap Index Scan on index_orders_on_user_id
 (cost=0.00..177.77 rows=12326 width=0) (actual time=1.257..1.258
rows=12300 loops=1)
  Index Cond: (orders.user_id = ANY
('{11,1000,3000}'::bigint[]))
Buffers: shared hit=21
Planning:
  Buffers: shared hit=6
Execution time: 13.263 ms

The table I'm querying has roughly 50,000,000 orders, with an even
distribution of ~4000 orders per user.

I have found that I can speed this up significantly using CROSS JOIN
LATERAL and it will use the composite index, but I'm struggling to
understand WHY the CROSS JOIN LATERAL is needed here for it to use the
index.
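
For reference, the CROSS JOIN LATERAL form mentioned above looks roughly
like this (same table and columns as the original query; the array holds the
target user ids), and it lets each per-user subquery walk the
(user_id, created_at) index and stop after 50 rows:

SELECT o.*
FROM   unnest(ARRAY[11, 1000, 3000]::bigint[]) AS u(user_id)
CROSS  JOIN LATERAL (
    SELECT *
    FROM   orders
    WHERE  orders.user_id = u.user_id
    ORDER  BY orders.created_at
    LIMIT  50
) AS o
ORDER  BY o.created_at
LIMIT  50;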

I've tried tweaking costs, disabling bitmap scans, etc, so it seems like
this is a functional limitation rather than something to do with
cost/statistics.

So my question is twofold:
- why doesn't Postgres use the composite index, and then retrieve only the
minimum necessary amount of rows (50 per user) using the query I posted
above?

 - If it is a functional limitation, is it lack of implementation, or is
there a deeper incompatibility with how the query planner works that would
prevent it from being able to do this?

Thanks!


Re: Question on partitioning

2024-02-05 Thread veem v
On Mon, 5 Feb 2024 at 17:52, Laurenz Albe  wrote:

> On Mon, 2024-02-05 at 03:09 +0530, veem v wrote:
> > In postgresql, Is it possible to partition an existing nonpartitioned
> table having data
> > already residing in it and indexes and constraints defined in it,
> without the need of
> > manually moving the data around, to make it faster? Similarly merging
> multiple partitions
> > to one partition or splitting a single partition into multiple
> partitions?
>
> There is no way to do that.
>
> Yours,
> Laurenz Albe


Thank you very much Laurenz.

Actually, in other databases (for example Oracle) there exists SQL syntax
to split one partition into multiple partitions and to merge multiple
partitions back into one, so I was hoping there might be some way to do it
in Postgres. Anyway, thanks for clarifying my doubt on this.

This means that for any such operation we need to create a new partitioned
table with that structure and load it with the data from the existing
partitioned table. Also, for converting a non-partitioned table to a
partitioned one, we have to first create the empty partitioned table
structure as per our need and then pump the data from the non-partitioned
table into the newly created partitioned table, which means we need to take
some downtime to switch from the non-partitioned table to the partitioned
one. Please correct me if that is wrong.

Additionally I see a lot of other restrictions, like:
1) When creating indexes on a partitioned table, the "concurrently" keyword
is not allowed.
2) When creating a foreign key, a "not valid" clause is not allowed if the
table is partitioned.
3) While creating indexes on this table or running any ALTER command, the
SELECT queries running from other processes run longer. Does DDL on the base
table take a lock in Postgres?




Re: How to do faster DML

2024-02-05 Thread veem v
I don't have much experience with Postgres at this scale, though; others
may provide better suggestions. However, at this scale you will have a
single table of ~30TB+. Someone just pointed out in this thread ~180
partitions/table as a good-to-have limit, and if that is true it would be
~170GB per partition. That looks bulky for a system where read response time
is expected to be in milliseconds.

On Mon, 5 Feb 2024 at 16:51, Lok P  wrote:

>
>
> On Mon, 5 Feb, 2024, 2:21 am Lok P,  wrote:
>
>
> On Sun, Feb 4, 2024 at 9:18 PM Greg Sabino Mullane 
> wrote:
>
> Partitioning will depend a lot on how you access the data. Is it always
> using that transaction_date column in the where clause? Can you share the
> table definition via \d? We've talked about this table quite a bit, but not
> actually seen what it looks like. We are flying blind a little bit. You
> mentioned your queries are slow, but can you share an explain analyze on
> one of these slow queries?
>
> 45 billion rows is quite manageable. How many rows are in each day? You
> may want to do a larger bucket than 24 hours per partition.
>
> 1)Even after partitioning the target table , to speed up the data load on
> this table , Is there an option to disable the primary and foreign keys and
> re-enable them post data load finishes. Will that be a good idea
>
>
> No.
>
> 3)As the size of the table or each partition is going to be very large and
> this will be a read intensive application, compressing the historical
> partition will help us save the storage space and will also help the read
> queries performance
>
>
> I am not sure what this means. If you are not going to need the data
> anymore, dump the data to deep storage and drop the partition.
>
> Cheers,
> Greg
>
>
>
> Thank you.
>
> The table has ~127 columns of different data types , combinations of
> Numeric, varchar, date etc. And is having current size ~1TB holding
> ~3billion rows currently and the row size is ~300bytes.
>
> Currently it has lesser volume , but in future the daily transaction per
> day which will be inserted into this table will be Max ~500million
> rows/day. And the plan is to persist at least ~3months of transaction data
> which will be around 45billion rows in total. And if all works well , we
> may need to persist ~6 months worth of data in this database in future and
> that will be ~90 billion.
>
> This table will always be queried on the transaction_date column as one of
> the filters criteria. But the querying/search criteria can span from a day
> to a month worth of transaction date data.
>
> When you said "*You may want to do a larger bucket than 24 hours per
> partition.*", do you mean to say partition by weekly or so? Currently as
> per math i.e. 1TB of storage for ~3billion rows. So the daily range
> partition size( to hold ~500million transactions/day) will be around
> ~100-150GB. Won't that be too much data for a single partition to operate
> on, and increasing the granularity further(say weekly) will make the
> partition more bulkier?
>
> What I mean was, we will definitely need the data for querying purposes by
> the users, but just to keep the storage space incontrol (and to help the
> read queries), we were thinking of having the historical partitions
> compressed. And for that , if any compression strategy should we follow on
> postgres?
>
> With regards to loading data to the table faster, wondering why you said '
> *NO*' to load the data first and enabling/Creating the Primary key and
> Foreign key constraint later approach. Because this table is a child table
> and the parent is already having data in it, loading data to this table in
> presence of PK and FK makes it too slow as it tries to validate those for
> each set of rows. So we were thinking if doing it at a later stage at
> oneshot will be a faster approach. Please suggest.
>
> I will try to collect some SELECT query and post the explain analyze.
> Currently we are trying to get rid of the duplicates.
>
> Regards
> Lok
>
>
> Any thoughts, based on above usage pattern?
>
> While I did the maths based on the current stats with limited data sets.
> The partitions size coming as 100 to 150gb as I explained above, if we keep
> it daily range partition. Should we have to make it hourly then?
>
>  So want some experts view if this going to work fine for a read latency
> intensive applications or we should have some different strategy?
>


Re: How to do faster DML

2024-02-05 Thread veem v
On Tue, 6 Feb 2024 at 10:45, Lok P  wrote:

> Thank you Greg.
>
> *"and keeping all your active stuff in cache. Since you have 127 columns,
> only pull back the columns you absolutely need for each query."*
>
> Understood the point. As postgres is a "row store" database, so keeping
> the size of the row lesser by making the number of columns to minimum,
> should definitely help as more number of rows can be packed then into one
> page or block (which I believe is 8KB default size per block). And that
> mean more number of blocks can be cached while reading, so better cache hit
> ratio.
>
> As you rightly pointed out, Another thing I noticed the shared_buffer
> parameters set as 2029684 in this instance, which comes to ~21MB and that
> seems very small for a database operating in large scale. And I see we have
> RAM in the instance showing as ~256GB. So thinking of bumping it to
> something as ~40-50GB. Hope that will help to some extent. Not sure if
> there is methods to manually,  cache some objects(tables/indexes) which
> were getting used frequently by the read queries.
>
>
> On Tue, Feb 6, 2024 at 10:27 AM Greg Sabino Mullane 
> wrote:
>
>>
>> So when you said *"In that case, and based on the numbers you provided,
>>> daily partitioning seems a decent solution."*
>>> , does it mean that size of the partition (which will be ~100-150GB per
>>> partition here) is not an issue for the postgres to serve read latency
>>> sensitive queries?
>>>
>>
>> Yes, that should be fine. At the end of the day, what's going to be more
>> important is making sure you are using good indexing, and keeping all your
>> active stuff in cache. Since you have 127 columns, only pull back the
>> columns you absolutely need for each query.
>>
>> Cheers,
>> Greg
>>
>>
>
Good discussion!! I don't want to divert the thread, and apologies for
comparing with another RDBMS, but I am curious to know the difference in
behaviour.

With respect to the storage part: in the Oracle database we were supposed to
keep the frequently accessed columns first and the less frequently accessed
columns towards the end of the table while creating the table DDL. This used
to help query performance when accessing those columns, as it's a row-store
database. Do any such performance implications exist in Postgres? And there,
data types like varchar used to occupy only the space that the real
data/value contains.

But here in Postgres it seems a bit different, as there appears to be
significant storage overhead; these seem to be fixed-length data types that
occupy the declared space no matter what the stored data holds. So I am
curious to know whether in this case there will be any performance impact
when accessing a frequently accessed column that has been put towards the
end of the table to save storage space.


Re: Question on partitioning

2024-02-06 Thread veem v
Thank you Laurenz. Got it.

So basically, you mean to say that such DDL on a table won't allow the
table to be read by other processes. I was under the assumption that it
would at least allow the read queries to proceed. I must be wrong here.
Thanks for correcting me.

On Tue, 6 Feb 2024 at 15:46, Laurenz Albe  wrote:

> On Tue, 2024-02-06 at 00:26 +0530, veem v wrote:
> > On Mon, 5 Feb 2024 at 17:52, Laurenz Albe 
> wrote:
> > > On Mon, 2024-02-05 at 03:09 +0530, veem v wrote:
> > > > In postgresql, Is it possible to partition an existing
> nonpartitioned table having data
> > > > already residing in it and indexes and constraints defined in it,
> without the need of
> > > > manually moving the data around, to make it faster?
> > >
> > > There is no way to do that.
> >
> > Which means for any such operation we need to create a new partition
> table with that
> > structure and load that with the data from the existing partition table.
>
> You could use logical replication to keep the downtime short.
>
> > Additionally I see a lot of other restrictions like
> > 1)When creating indexes on the partition table "concurrently" keywords
> are not allowed.
>
> That is no problem.
> You create the index on ONLY the partitioned table,
> then you create indexes CONCURRENTLY on the partitions,
> then you attach these indexes as partitions to the index you created in
> the beginning.
> As soon as the last index is attached, the index on the partitioned table
> becomes valid.
>
> > 2)While creating foreign key ,  it does not allow a "not valid"
> clause if the table
> >   is partitioned.
>
> Create the foreign key on the partitions; the effect is the same.
>
> > 3) While creating indexes on this table or running any ALTER command,
> the SELECT
> >queries running from other processes run longer. Does it take any
> lock while
> >doing DDL on the base table in postgres?
>
> I am not sure what the "base table" is.  Most DDL always locks the table in
> ACCESS EXCLUSIVE mode; that is not specific to partitioned tables.
>
> Yours,
> Laurenz Albe
>
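
Put as DDL, the three-step index build described above looks like this
(placeholder names):

CREATE INDEX txn_time_idx ON ONLY txn (txn_time);   -- stays invalid for now

CREATE INDEX CONCURRENTLY txn_p20240201_time_idx ON txn_p20240201 (txn_time);
ALTER INDEX txn_time_idx ATTACH PARTITION txn_p20240201_time_idx;

-- repeat the CONCURRENTLY build + ATTACH PARTITION for every other partition;
-- once the last partition index is attached, txn_time_idx becomes valid.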


Re: Partitioning options

2024-02-08 Thread veem v
On Thu, 8 Feb 2024 at 20:08, Greg Sabino Mullane  wrote:

> On Thu, Feb 8, 2024 at 12:42 AM sud  wrote:
> ...
>
>> The key transaction table is going to have ~450 Million transactions per
>> day and the data querying/filtering will always happen based on the
>> "transaction date" column.
>>
> ...
>
>> Should we go for simple daily range partitioning on the transaction_date
>> column?
>>
>
> This one gets my vote. That and some good indexes.
>
> Cheers,
> Greg
>
>
Hello Greg,

Out of curiosity: as the OP mentioned that there will be joins and also
filters on the customer_id column, why don't you think that sub-partitioning
by customer_id would be a good option? I understand that LIST
sub-partitioning may not be an option, considering that new customer_ids get
added slowly over time (and a DEFAULT list may not be allowed), and the OP
also mentioned that the data distribution for the customer_id column is
skewed. However, what is the problem if the OP opts for HASH
sub-partitioning on customer_id in this situation?

Is it because the number of partitions will be higher? i.e.

If you go with simple range partitioning, for 5 months you will have ~150
daily range partitions, and each index adds another set of 150 index
partitions; for example with 10 indexes the total will be ~150 table
partitions + (10*150) index partitions = 1650 partitioned objects.

If the OP goes for range-hash, and hash mostly has to be 2^N, say 8 hash
sub-partitions, then the total will be (8*150) table partitions + (8*150*10)
index partitions = ~13200 partitioned objects.

Though there is no theoretical limit to the number of partitions in
Postgres, some serious issues have been noted in the past with a higher
number of table partitions. One such case is below. Is this the reason?

https://www.kylehailey.com/post/postgres-partition-pains-lockmanager-waits

Regards
Veem
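
For reference, the range-hash layout being weighed here would be declared
roughly as below (placeholder names); the partition counts in the arithmetic
above are just these objects multiplied out per day and per index:

CREATE TABLE txn (
    txn_id      bigint NOT NULL,
    txn_date    date   NOT NULL,
    customer_id bigint NOT NULL
) PARTITION BY RANGE (txn_date);

CREATE TABLE txn_2024_02_08 PARTITION OF txn
    FOR VALUES FROM ('2024-02-08') TO ('2024-02-09')
    PARTITION BY HASH (customer_id);

CREATE TABLE txn_2024_02_08_h0 PARTITION OF txn_2024_02_08
    FOR VALUES WITH (MODULUS 8, REMAINDER 0);
-- ...remainders 1 through 7, repeated for every daily partition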


How should we design our tables and indexes

2024-02-10 Thread veem v
 Hello,
We want a response time of <1 second for our UI search query requirement.
These will be pagination queries. These read queries will run against big
transaction tables (~500+ attributes, approximate row size of ~1KB) that
receive a continuous stream of inserts consumed from the source. These
tables will be daily range partitioned on the processing_date column. Each
partition is going to hold approximately ~450 million rows when it serves
all customers' transactions at full capacity.

The customer will be given the capability to search over a maximum date
range of 30 days of transaction data, i.e. ~30 range partitions, and is
supposed to get the query response back in <1 second, as these queries will
be submitted from a UI.

1) Is there any way in Postgres to influence the optimizer towards a
"first rows" style optimization, so that it won't evaluate all the rows for
such UI search queries? These will be pagination queries and the user will
be interested in seeing the top 100 transactions on the first page as soon
as possible.

2) For example, below is one sample query from a similar system, but in a
different database. I want to understand what the appropriate indexes would
be to make this search query run in the quickest possible time.

  One index on table1(MID), one index on table1(CID), one index on
table2(ACN_NBR)?
  OR
  Should we create composite indexes here combining PR_ID, i.e. (PR_ID,
MID), (PR_ID, CID), (PR_ID, ACN_NBR), as that is the most unique attribute
here?

select  count(*) over() as total_record, *
from
(select  ...
from   TABLE1
Left join schema1.TABLE2  on TABLE2.PR_ID = TABLE1.PR_ID  and
TABLE2.MID = TABLE1.MID
and TABLE2.processing_date=TABLE1.processing_date
where   TABLE2.processing_date between '2023-04-20' and
'2023-05-21'-- Partition pruning
and TABLE2.ACN_NBR = ''
and ( TABLE1.MID in (XX) OR TABLE1.CID in (XX))
order by   TABLE1.PR_TIME DESC
)
limit  100 offset 0;

The data pattern for the columns used in predicate are as below:-  Table1
will be the driving table.

count(distinct ACN_NBR) - 25million
count(distinct MID) - 223k
count(distinct CID) - 59k
count(*)from table1 and table2- ~350 million
PR_ID is a unique key.

3) One of the use cases is that the customer should be able to search on
certain attributes and see the transactions ordered "desc by
processing_date", i.e. the latest transactions on the first page of the UI.
In such a scenario, if the search attribute is not very selective and the
customer puts in a date range of a month, i.e. over 30 partitions, it may
result in scanning and sorting billions of rows to get the top/most recent
~100 transactions, and it is most likely not going to respond back in <1
second even if it uses an index. So how should we handle or design indexes
for catering to such queries? For example, if the only filter in the above
query is on the column "TABLE1.CID", which has very few distinct values,
then how do we handle it? (See the index sketch at the end of this mail.)

 4)And finally, the parameter "settings" which we have currently is as
below for the current system. So I wanted to understand, if they are
accurate or any oddity and we should change those to cater such
requirements?

 For e.g. if we set "max_parallel_workers_per_gather"=4 to speed up the
queries, then we will be able to serve only 32/4=8 concurrent user requests
at any point in time. If we are targeting to serve ~100 concurrent users ,
will it be advisable to change or we should test the system with default
i.e. not setting this parallel parameter?

*Settings: *
effective_cache_size = '176237472kB', maintenance_io_concurrency = '1',
max_parallel_workers = '32', max_parallel_workers_per_gather = '4',
search_path = 'public, public, "$user"', temp_buffers = '16MB', work_mem =
'2GB', enable_partitionwise_join = 'on'

Regards
Veem
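
Regarding question 3 above, one pattern worth testing is a composite index
whose trailing column matches the ORDER BY, so each partition can return its
newest rows from the index instead of sorting every row for that CID first;
a hedged sketch with the names used above (whether the planner actually
produces a limited, early-stopping plan across 30 partitions has to be
verified with EXPLAIN):

-- on each daily partition the planner can then read the newest rows for a
-- given CID directly in pr_time order and stop early under the LIMIT
CREATE INDEX ON table1 (cid, pr_time DESC);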


Re: How should we design our tables and indexes

2024-02-10 Thread veem v
Thank you So much Greg.

I will try to test things as much as possible. I was basically trying to
see whether there are any obvious things we should take care of before
designing a system to satisfy such requirements. As you pointed out a few
things, I am trying to answer those below.

On Sun, 11 Feb 2024 at 10:43, Greg Sabino Mullane 
wrote:

> There is a lot to unpack here. I'm going to take a quick pass, but you
> ought to consider getting some custom expert help.
>
> On Sat, Feb 10, 2024 at 2:39 PM veem v  wrote:
>
>>
>> Pagination is already a hard problem, and does not even make sense when
> combined with "a continuous stream of inserts". What should the user see
> when they click on page 2?
>

When the user clicks to the second page, they will see the next set of
rows, i.e. 100 to 200, then 200 to 300, and so on until the result set
finishes.


> Also, 500 attributes?! What data types are those? If boolean, you may want
> to look at using bitfields.
>
>> Using LIMIT does allow for this, with certain caveats. The best answer
>> for a lot of these questions is "try it and see" with a simplified version
>> of your queries against some dummy data.
>>
>
>

All those attributes are mostly varchar and numeric in nature, so I am not
sure whether any options exist there for these. Additionally, in other
databases like Oracle we use hints like /*+ first_rows */ to let the
optimizer favour index paths for these types of UI search queries, so I was
wondering whether anything like that exists over here.


>   one Index on table1(MID) , one index Table1(CID), one index on
>> table2(ACN_NBR)?
>>
>
> This. One index for each important column. Especially if they vary between
> queries, as you alluded to later.
>
>
If PR_ID is a must in the join criteria between these two tables, table1
and table2, in all the queries, then is it advisable to have composite
indexes like (pr_id, mid), (pr_id, cid) etc. rather than indexes on the
individual columns?


> select  count(*) over() as total_record, * [large query omitted]
>>
>
> Queries starting with select count(*) are also a red flag, but it's hard
> to tell from this. Probably best to explain what you think the query is
> doing using regular words.
>
>
Actually, the inner query is doing the main work, i.e. finding the search
results based on the input search criteria. The outer query just fetches the
results from the inner query along with count(*), to pass on to the API so
it can calculate and show the user how many pages there are in total for the
full result set. Basically it will be count(*)/N records per page, and that
figure will be displayed on the first page of the UI screen.


> 3)One of the use cases is that the customer should be able to search on
>> certain attributes and should be able to see the transactions in "desc by
>> processing_date" i.e. latest transactions on the first page on the UI. And
>> in such scenario, if the search attribute is less unique and the customer
>> puts a date range of a month i.e. over 30 partitions , it may results in
>> scanning and sorting billions of rows to get the top/recent ~100
>> transactions and most likely not going to respond back in <1 sec, even goes
>> for the index. So how should we handle or design indexes for catering such
>> queries?  For e.g. if we have the only filter on column "TABLE1.CID" in the
>> above query, which is very less unique then how to handle it?
>>
>
> Well, there is only so much a database can do if you are pulling from 500
> different attributes. But the whole purpose of an indexed column is to
> prevent having to scan *all* the rows. If it's a common value, then
> hopefully there is something else in the where clause more specific that is
> also indexed. As mentioned above, a LIMIT and ORDER BY, with the
> appropriate indexes, can return early.
>
>
So in this case, if there is just the filter on column CID and the join
condition on the PR_ID column between the two tables, no other filter exists
to make the result set smaller. Let's say the query really results in
millions of rows because the CID column has very few distinct values. Also,
the results are ordered on the PR_TIME column to show the latest
transactions on the first page, so an index on CID alone won't be very
effective. So is there any option to index another way, or a different set
of columns, to make this type of query run faster (not sure whether
including the ORDER BY column 'pr_time' in the index will make a
difference)? Or any other way to store the results in an already sorted way
in the table, so as to not do heavy sorting to show the first 100 rows to
the customer out of millions satisfying the search criter

Re: How to do faster DML

2024-02-10 Thread veem v
Thank You so much for the detailed explanation.

On Sun, 11 Feb 2024 at 05:55, Peter J. Holzer  wrote:

>
> Yes. Numbers in Oracle are variable length, so most Oracle tables
> wouldn't contain many fixed length columns. In PostgreSQL must numeric
> types are fixed length, so you'll have quite a lot of them.
>
>
So it means that in other databases (like the Oracle database) we were
careless in choosing the data length, say for example VARCHAR2(4000); if the
real data inserted into the table holds a varchar string of length 20 bytes,
then Oracle trims it to occupy only the 20 bytes in storage. But in Postgres
here we need to be cautious and define the length as the maximum the data
attribute can contain, because that amount of fixed space is allocated to
every value inserted into the table for that attribute/data element.
Similarly for the Number/Numeric data type. Please correct me if my
understanding is wrong.

Regards
Veem


Re: How to do faster DML

2024-02-11 Thread veem v
On Sun, 11 Feb 2024 at 19:02, Peter J. Holzer  wrote:

> > Similarly for Number/Numeric data type.
>
> Number in Oracle and numeric in PostgreSQL are variable length types.
> But in PostgreSQL you also have a lot of fixed length numeric types
> (from boolean to bigint as well as float4 and float8) and you would
> normally prefer those over numeric (unless you really need a decimal or
> very long type). So padding is something you would encounter in a
> typical PostgreSQL database while it just wouldn't happen in a typical
> Oracle database.
>
>
> When you said *"you would normally prefer those over numeric " *I was
thinking the opposite. As you mentioned integer is a fixed length data type
and will occupy 4 bytes whether you store 15 or .But in case of
variable length type like Number or numeric , it will resize itself based
on the actual data, So is there any downside of going with the variable
length data type like Numeric, Varchar type always for defining the data
elements?
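
As a side note, one way to see the storage behaviour being discussed is the
built-in pg_column_size() function, e.g. (a quick sketch; exact byte counts
depend on the value and alignment):

SELECT pg_column_size(42::integer) AS int_bytes,
       pg_column_size(42::numeric) AS numeric_small_bytes,
       pg_column_size(123456789012345678901234567890::numeric) AS numeric_big_bytes;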

Regards
Veem


Re: How to do faster DML

2024-02-11 Thread veem v
Thank you .

On Mon, 12 Feb 2024 at 03:52, Peter J. Holzer  wrote:

> On 2024-02-11 12:08:47 -0500, Ron Johnson wrote:
> > On Sun, Feb 11, 2024 at 11:54 AM veem v  wrote:
> > When you said "you would normally prefer those over numeric " I was
> > thinking the opposite. As you mentioned integer is a fixed length
> data type
> > and will occupy 4 bytes whether you store a small or a large value. But in case
> of
> > variable length type like Number or numeric , it will resize itself
> based
> > on the actual data, So is there any downside of going with the
> variable
> > length data type like Numeric,
> >
> >
> > Consider a table with a bunch of NUMERIC fields.  One of those records
> has
> > small values (aka three bytes).  It fits neatly in 2KiB.
> >
> > And then you update all those NUMERIC fields to big numbers that take 15
> > bytes.  Suddenly (or eventually, if you update them at different times),
> the
> > record does not fit in 2KiB, and so must be moved to its own page.
> That causes
> > extra IO.
>
> I think that's not much of a concern with PostgreSQL because you can't
> update a row in-place anyway because of MVCC. So in any case you're
> writing a new row. If you're lucky there is enough free space in the same
> page and you can do a HOT update, but that's quite independent on
> whether the row changes size.
>
>
>
Good to know. So it means that here in Postgres there is no such concern as
"row chaining", "row migration" etc., which we normally have in a non-MVCC
database (like Oracle, say). And there it's not advisable to have more than
~255 columns in a table even if it's technically possible, and if such a
requirement arises, we normally break the table into two different tables
with some columns in common to join them.

https://jonathanlewis.wordpress.com/2015/02/19/255-columns/

So we were thinking that adding many columns to a table should be fine in
Postgres (here we have a use case in which the total number of columns may
go to ~500+). But then, accessing columns towards the end of a row is going
to take more time compared to a column at the start of the row; as you
mentioned, accessing the 100th column may cost 4 to 5 times more than
accessing the first column. So, is it advisable here to go for a similar
approach of breaking the table into two, if the total number of columns
reaches a certain threshold for a table?

Regards
Veem


Re: How should we design our tables and indexes

2024-02-12 Thread veem v
Thank You.

On Mon, 12 Feb 2024 at 22:17, Greg Sabino Mullane 
wrote:

> Sure will try to test and see how it behaves when the number of
>> simultaneous queries (here 32/4=8 concurrent queries) exceed the
>> max_parallel_workers limit. Though I am expecting the further queries
>> exceeding the limit might get serialized.
>>
>
> Yes - if there are not enough workers available, it will run with a
> reduced number of workers, including possibly zero. You can see that when
> you run an explain analyze, it will show you the number of workers it wants
> and the number it actually was able to get.
>
>
Thank you . Got the point.

If these quick queries ran within a certain time frame (say for a duration
of ~1 hr) and a few of the executions ran longer, that might be because of
fewer parallel workers (the max limit being exhausted by the concurrent
executions),
OR
it may be because of a change in the execution path for certain executions
of the queries.

Is there any way to track those historical executions and be able to find
the exact root cause of the slow executions confidently?

Regards
Veem


Re: How to do faster DML

2024-02-12 Thread veem v
Thank you so much for the clarification.

On Mon, 12 Feb 2024 at 03:40, Peter J. Holzer  wrote:

> The fixed width types are those that the CPU can directly process:
> Integers with 16, 32 and 64 bits, floating point numbers with 32 and 64
> bits. The CPU can read and write them with a single memory access, it
> can do arithmetic with a single instruction, etc.
>
> Number/Numeric are not native types on any CPU. To read them the CPU
> needs several memory accesses (probably one per byte unless you get
> really clever) and then it can't do any calculations with them
> directly, instead it has run a subroutine which does operations on
> little chunks and then puts those chunks together again - basically the
> same as you do when you're doing long addition or multiplication on
> paper. So that's not very efficient.
>
So it looks like the fixed length data types (like integer, float) should be
the first choice while choosing the data type of the attributes wherever
possible, as these are native types (like choosing "Integer/float" over
"Numeric", or "Char" over "Varchar" etc.).
However, I see that even in Oracle databases we have an Integer type too,
but the blog below suggests going with Number types over Integer and
Varchar2 over Char, which is the opposite of what we are discussing here.
Do fixed length data types behave differently in Postgres vs Oracle and thus
should be treated differently?

https://www.databasestar.com/oracle-data-types/

From the above blog:-


*When to use CHAR: There should be no reason to use the CHAR data type, as
it is similar to a VARCHAR2 and it’s better to be consistent.*
*When to use INTEGER: You should use the NUMBER data type instead.*

Regards
Veem


Re: How to do faster DML

2024-02-12 Thread veem v
On Tue, 13 Feb 2024 at 02:01, Ron Johnson  wrote:

> On Mon, Feb 12, 2024 at 3:23 PM veem v  wrote:
> [snip]
>
>> So it looks like the fixed length data type(like integer, float) should
>> be the first choice while choosing the data type of the attributes
>> wherever possible, as these are native types.
>>
>
> Correct.
>
>
>> (Like choosing "Integer/float" over "Numeric", "Char" over "Varchar"
>> etc).
>>
> However I do see even in Oracle databases, we have Integer type too, but
>> it's suggesting(For e.g. in below blog) to rather go with Number types over
>> Integer and Varchar2 over Char, which is opposite of what we are discussing
>> here. Is the fixed length data type behaves differently in postgres vs
>> oracle and thus should be treated differently?
>>
>> https://www.databasestar.com/oracle-data-types/
>>
>
>>
>> From above blog:-
>>
>
> Oracle is not Postgresql.  WTH are you using an Oracle blog to decide on
> Postgresql data types
>
>
>>
>>
>> *When to use CHAR: There should be no reason to use the CHAR data type,
>> as it is similar to a VARCHAR2 and it’s better to be consistent.*
>> *When to use INTEGER: You should use the NUMBER data type instead.*
>>
>>
> Did you actually read that blog post?
>
> Have you even read the Postgresql documentation on data types?
>
>

My apologies if I am interpreting it wrongly. My thought was that, as fixed
length data types are native types, they should ideally be faster in all
databases, so I was comparing across different databases. And mainly, as I
had worked mostly with Oracle databases in the past, this was a key learning
for me and seemed totally opposite, so I was curious to know.

Regards
Veem


Re: How should we design our tables and indexes

2024-02-13 Thread veem v
On Tue, 13 Feb 2024 at 20:59, Peter J. Holzer  wrote:

> For some kinds of queries a composite index can be dramatically faster.
> While Postgres can combine indexes that means scanning both indexes and
> combining the result, which may need a lot more disk I/O than scanning a
> composite index. Indeed, in the cases where a composite index would be
> useful but doesn't exist, PostgreSQL usually just chooses the best of
> the single column indexes and ignores the rest.
>
> That said, my rule of thumb is to create just single column indexes at
> first and only create composite indexes if they are necessary.
>
>
Thank you so much. As I understand it, the optimizer uses the indexed column
as the "access criteria" and the rest of the predicate as "filter criteria"
while evaluating the query predicates. And if the majority of the rows are
getting eliminated in the filter step, that means adding that filter-criteria
column to the index could give us better performance.

So I was trying to understand, say in the below query having TABLE1 as the
driving table (if we forget about column selectivity for a moment):

Can the optimizer only scan TABLE1 using the access criteria "TABLE1.MID
in ()" or "TABLE1.CID in ()", which would be catered for by two different
indexes, i.e. one index on column "MID" and the other on column "CID"?
OR
Can it utilize the other columns used in the join conditions, like MID and
PR_ID, as access criteria, in which case composite indexes on the columns
(CID,PR_ID) and (MID,PR_ID) would provide better selectivity and faster
access?

Similarly for TABLE2, a composite index on (ACN_NBR,PR_ID,MID) or just an
index on (ACN_NBR)?

select  ...
from   TABLE1
Left join schema1.TABLE2  on TABLE2.PR_ID = TABLE1.PR_ID  and
TABLE2.MID = TABLE1.MID
and TABLE2.processing_date=TABLE1.processing_date
where TABLE1.processing_date between '2023-04-20' and '2023-05-21'
-- Considering processing_date here as partition key.
   and TABLE2.ACN_NBR = ''
and ( TABLE1.MID in () OR TABLE1.CID in ())
order by   TABLE1.PR_TIME DESC
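
For reference, the candidate composite indexes being compared above would be
created along these lines (index names are made up; whether they actually
help depends on the selectivity, as discussed):

CREATE INDEX table1_cid_prid_idx ON TABLE1 (CID, PR_ID);
CREATE INDEX table1_mid_prid_idx ON TABLE1 (MID, PR_ID);
CREATE INDEX table2_acn_prid_mid_idx ON schema1.TABLE2 (ACN_NBR, PR_ID, MID);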


Re: How to do faster DML

2024-02-13 Thread veem v
On Tue, 13 Feb 2024 at 20:32, Peter J. Holzer  wrote:

> Please do not conflate "char(n)" with native machine types like int or
> float. These are very different things. A char(n) is string of fixed but
> arbitrary length. This is not something a CPU can process in a single
> instruction. It has to go over it character by character.
>
> There is almost never a reason to use char(n). Just use varchar(n) or in
> the case of PostgreSQL just varchar or text.
>
> > However I do see even in Oracle databases, we have Integer type too,
>
> Not really. INTEGER is just an alias for NUMBER(38) in Oracle (see for
> example
>
> https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlqr/Data-Types.html
> ).
> It's not the same as an INTEGER in PostgreSQL.
>
>
Thank you so much for the clarification.

Basically, as I understood it, we can follow the below steps in order:

Step-1) Wherever possible, use smallint, integer, bigint or float data types
rather than numeric. This will give better performance.

Step-2) Put the frequently queried columns first and the least frequently
queried columns towards the end of the row while creating the table. This
too is intended for better performance.

Step-3) Define the columns in descending order of typlen as per the below
query (the "column tetris" technique). This is for better storage space
utilization.

SELECT a.attname, t.typname, t.typalign, t.typlen
  FROM pg_class c
  JOIN pg_attribute a ON (a.attrelid = c.oid)
  JOIN pg_type t ON (t.oid = a.atttypid)
 WHERE c.relname = 'user_order'
   AND a.attnum >= 0
 ORDER BY t.typlen DESC;

One question here: if we have defined a column as the fixed length data
type "integer" and we slowly notice the length of the data keeps increasing
(in the case of a sequence-generated PK column which will keep increasing),
and we want to alter the column to "bigint" now, will it in such a scenario
append/pad the additional space to all the existing values which were
already stored with the integer type in the table? And would that then be
an increase to the existing table storage? Please correct me if I'm
wrong.

Regards
Veem


Re: How to do faster DML

2024-02-14 Thread veem v
On Wed, 14 Feb, 2024, 10:30 am Ron Johnson,  wrote:

> On Tue, Feb 13, 2024 at 4:17 PM veem v  wrote:
> [sni[]
>
>> One question here, if we have defined one column as a fixed length data
>> type "integer" and slowly we noticed the length of data keeps increasing
>> (in case of a sequence generated PK column which will keep increasing), and
>> we want to alter the column to "bigint" now. In such scenario, will it
>> append/pad the additional spaces to all the existing values which were
>> already stored  with integer type initially in the table. And that would be
>> then an increase to the existing table storage. Please correct me if I'm
>> wrong.
>>
>
> ALTER TABLE foo ALTER COLUMN bar BIGINT; rewrites the whole table.  Do
> yourself a favor, and start with BIGINT.
>


Not yet tried, but that's interesting. I was initially thinking that since
it only increases the length, it would be just a metadata change and finish
within seconds.

But as you mentioned, it seems to be an effect of the "fixed length data
type", which is why it's going to rewrite the whole table even though we
just increase the column length. Hopefully that won't be the case for
variable length data types.

>


Re: How to do faster DML

2024-02-14 Thread veem v
On Thu, 15 Feb 2024 at 00:43, Adrian Klaver 
wrote:

> It depends:
>
> https://www.postgresql.org/docs/current/sql-altertable.html
>
> "Adding a column with a volatile DEFAULT or changing the type of an
> existing column will require the entire table and its indexes to be
> rewritten. As an exception, when changing the type of an existing column,
> if the USING clause does not change the column contents and the old type
> is either binary coercible to the new type or an unconstrained domain over
> the new type, a table rewrite is not needed. However, indexes must always
> be rebuilt unless the system can verify that the new index would be
> logically equivalent to the existing one. For example, if the collation for
> a column has been changed, an index rebuild is always required because the
> new sort order might be different. However, in the absence of a collation
> change, a column can be changed from text to varchar (or vice versa)
> without rebuilding the indexes because these data types sort identically.
> Table and/or index rebuilds may take a significant amount of time for a
> large table; and will temporarily require as much as double the disk space.
>
> "
>
>
> create table int_test(int_fld integer);
>
> insert into int_test select * from generate_series(1, 1, 1);
>
>
> select ctid, int_fld from int_test ;
>
> ctid   | int_fld
> ---------+---------
>  (0,1)|   1
>  (0,2)|   2
>  (0,3)|   3
>  (0,4)|   4
>  (0,5)|   5
>  (0,6)|   6
>  (0,7)|   7
>  (0,8)|   8
>  (0,9)|   9
>  (0,10)   |  10
>
>
> alter table int_test alter column int_fld set data type bigint;
>
> select ctid, int_fld from int_test ;
>
>   ctid   | int_fld
> ---------+---------
>  (0,1)|   1
>  (0,2)|   2
>  (0,3)|   3
>  (0,4)|   4
>  (0,5)|   5
>  (0,6)|   6
>  (0,7)|   7
>  (0,8)|   8
>  (0,9)|   9
>  (0,10)   |  10
>
> update int_test set  int_fld = int_fld;
>
> select ctid, int_fld from int_test  order by int_fld;
>
> (63,1)|   1
>  (63,2)|   2
>  (63,3)|   3
>  (63,4)|   4
>  (63,5)|   5
>  (63,6)|   6
>  (63,7)|   7
>  (63,8)|   8
>  (63,9)|   9
>  (63,10)   |  10
>
>
> Where ctid is:
>
> https://www.postgresql.org/docs/current/ddl-system-columns.html
>
>
Thank you so much.
So as I also tested the same as you posted, there was no change in
"ctid" when I altered the column data type from 'int' to 'bigint' in the
table, so that suggests a full table rewrite won't happen in such a scenario.

Regards
Veem


Re: How to do faster DML

2024-02-15 Thread veem v
On Thu, 15 Feb 2024 at 22:40, Adrian Klaver 
wrote:

> On 2/15/24 09:00, Greg Sabino Mullane wrote:
> > On Thu, Feb 15, 2024 at 11:43 AM Adrian Klaver
> > mailto:adrian.kla...@aklaver.com>> wrote:
> >
> > That is a mixed bag:
> >
> >
> > Ha! Good point. Our contrived example table does suffer from that, so
> > perhaps the test should be:
> >
> > create table int_test(c1 int, c2 int);
>
> Alright now I see:
>
> test=# create table int_test(c1 int, c2 int);
> CREATE TABLE
>
> test=# select pg_relation_filenode('int_test');
>   pg_relation_filenode
> ----------------------
>  70021
> (1 row)
>
>
> test=# insert into int_test select a, a+1  from generate_series(1,
> 10000, 1) as t(a);
> INSERT 0 10000
>
> test=# select pg_relation_size('int_test');
>   pg_relation_size
> ------------------
> 368640
> (1 row)
>
> test=# alter table int_test alter column c2 set data type bigint;
> ALTER TABLE
>
> test=# select pg_relation_filenode('int_test');
>   pg_relation_filenode
> ----------------------
>  70024
> (1 row)
>
> test=# select pg_relation_size('int_test');
>   pg_relation_size
> ------------------
> 450560
> (1 row)
>
>
Thank you.

Did a similar test as below using DB Fiddle. Same result for fixed length
data types, i.e. the size is getting increased. However for variable
length types (like numeric) it remains the same, so it must be just a
metadata change and thus should be quick enough even for a big table.
So one learning for me: one of the downsides of fixed length data types
is that any future change to them will be a full table rewrite, and thus
this sort of change for big tables will be a nightmare.

https://dbfiddle.uk/_gNknf0D

Regards
Veem


Re: How to do faster DML

2024-02-15 Thread veem v
On Fri, 16 Feb 2024 at 06:04, Peter J. Holzer  wrote:

> On 2024-02-15 16:51:56 -0700, David G. Johnston wrote:
> > On Thu, Feb 15, 2024 at 4:31 PM Peter J. Holzer 
> wrote:
> > On 2024-02-14 22:55:01 -0700, David G. Johnston wrote:
> > > On Tuesday, February 13, 2024, veem v  wrote:
> > >
> > > float data types rather than numeric. This will give better
> > > performance.
> > >
> > >
> > > Only use an inexact floating-point data type if you truly
> > > understand what you are getting yourself into.  Quickly getting
> > > the wrong answer isn’t tolerable solution.
> >
> > Do NOT assume that a decimal type (even if it can grow to ridiculuous
> > lengths like PostgreSQL's numeric) is exact in the mathematical
> sense.
> > It isn't. It cannot represent almost all real numbers
> >
> >
>
> Thank You.

So it looks like for the use cases where we don't need precision or decimal
point values to be stored in Postgres, the integer data type is the way to
go without a doubt.

However, in cases where precision is required, as you all mentioned there
are certain issues (rounding errors etc.) with the "float" data types, and
from a normal developer's usage point of view it should be the numeric type
that we use. I think the consistent working or functionality of an
application takes precedence over performance. And I believe, in most real
life scenarios where we need precision, we expect it to behave
consistently across all the application and database platforms (mainly in
the banking industry), and thus it seems the numeric data type is the safest
one to use as a multi-database-platform type. Please correct me if I'm wrong.
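
Just to illustrate the rounding behaviour mentioned above, a quick check one
can run (float8 is binary floating point, numeric is exact decimal):

SELECT 0.1::float8  + 0.2::float8  AS float_sum,    -- not exactly 0.3
       0.1::numeric + 0.2::numeric AS numeric_sum;  -- exactly 0.3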

Regards
Veem


Performance issue debugging

2024-02-22 Thread veem v
Hi All,
As I understand it, we have pg_stat_activity which shows the real time
activity of sessions currently running in the database, and
pg_stat_statements provides aggregated information about the historical
execution of all the queries in the database. But I don't see any sampling
or timing information in those views, for example which queries were being
executed in the database at a certain point in time in the past, and the
overall wait events etc. at that time.

So is there any other view which provides such information to dig into the
past to diagnose historical performance issues? Or should we create our own
table and flush the information from the pg_stat_activity view into it with
the current timestamp, so that it would be helpful in analyzing performance
issues? Or is there any extension available for such a thing?
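
In case it helps frame the question, the "roll our own sampling" idea would
look roughly like the sketch below (table name, job name and the one-minute
schedule are made up; it assumes pg_cron or some external scheduler):

CREATE TABLE activity_snapshot AS
SELECT now() AS sample_time, a.*
  FROM pg_stat_activity a
 WHERE false;   -- creates an empty table with matching columns

SELECT cron.schedule('activity-sample', '* * * * *',
       $$INSERT INTO activity_snapshot
         SELECT now(), a.* FROM pg_stat_activity a$$);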

Also, even explain analyze can only provide the exact run time stats of
a completed query. If we want to see what's going on for a long running
query, and at which step in the execution path the query is spending most of
its resources and time while it keeps running in the database, is there any
option available in Postgres? For example, in a SELECT query, is the index
access path taking most of the time; or in an INSERT query, is an index
block causing contention while inserting data into the table? How would we
be able to find that for a currently running query or for a historical query?

Regards
Veem


Re: Performance issue debugging

2024-02-26 Thread veem v
On Fri, 23 Feb, 2024, 2:54 am Vick Khera,  wrote:

> On Thu, Feb 22, 2024 at 4:03 PM veem v  wrote:
>
>> Hi All,
>> As i understand we have pg_stats_activity which shows the real time
>> activity of sessions currently running in the database. And the
>> pg_stats_statement provides the aggregated information of the historical
>> execution of all the queries in the database. But I don't see any sampling
>> or timing information in those views. For example at a certain point in
>> time in the past , what queries were getting executed in the database and
>> overall wait events etc.
>>
>> So is there any other view which provides such information to dig into
>> the past to diagnose any historical performance issues ? or should we
>> create our own table and flush the information from the pg_stats_activity
>> view to that with the current timestamp and that would be helpful in
>> analyzing performance issues or any extension available for such?
>>
>>
> Look at the auto_explain setting. Taking a random interval snapshot of
> running queries likely will not teach you anything useful.
>
>
>> Also even the explain analyze can only provide the exact run time stats
>> of a completed query. If we want to see what's going on for a long running
>> query and at what step in the execution path the query is spending most
>> resources and time when it keeps running in the database, is there any
>> available option in postgres database?  for e.g. in a SELECT query index
>> access path if taking most of the time OR in an INSERT query INDEX block is
>> causing contention while inserting data into the table , how would we be
>> able to find that for a currently running query or a for a historical query?
>>
>
> You can see locking contention in the pg_locks table. In my experience I
> rarely ever saw anything in there even when I was cranking tens of millions
> of inserts and updates per day. I don't think there's anything for
> historical queries or to probe anything more about a currently running
> query's progress.
>
> Take some time to think about your queries and how you can reduce any
> locking they need. If you do need some locking, consider using the FOR
> UPDATE clause in SELECT to limit what you do lock.
>

Thank you. Actually, in Oracle there is a view (v$active_session_history)
which collects stats about session activity over time, and from that we were
able to track and debug historical performance issues at an exact point in
time. So I was thinking that, since such a thing is not available by default
in Postgres, we could do it manually, and that would be helpful in the long
term.


Aligning grants and privileges

2024-02-27 Thread veem v
Hi,
We have two different types of DBA groups in our current on-premise Oracle
databases: one which deploys code in prod and has the highest level of
privileges (i.e. write access to the database objects, performing backup and
recovery, export/import and other database maintenance jobs etc.).

The other group only has read only privileges (select catalog role,
i.e. access to data dictionary views plus run time and historical performance
views). They mainly look into performance issues and align more towards the
functional part of the application, understand the database design working
closely with the Application Development team, and have read only access to
the databases. They are experts in reading query execution plans and have
privileges to run the explain command, read sqlmonitor reports, create
profiles and baselines, add hints to queries after understanding an
inefficient execution path, fetch AWR reports, trace sql queries, gather
object statistics, and access OEM (Oracle Enterprise Manager) to monitor
performance, suggesting appropriate indexes and partitioning strategies for
tables etc.

Now that we are moving a few of the applications to on-premise Postgres
databases and a few to AWS RDS Postgres, what kind of privileges should we
provide to these two groups? Mainly, we don't want the performance people to
have elevated privileges. I understand that read-only access to the tables
is something we can give to those users. Apart from that, are there any
specific privileges we should provide, so as to enable them to look into and
debug all types of performance issues without any obstruction, while at the
same time not giving elevated privileges?

Regards
Veem


Re: Aligning grants and privileges

2024-02-27 Thread veem v
On Wed, 28 Feb, 2024, 2:14 am Adrian Klaver, 
wrote:

> On 2/27/24 12:40, veem v wrote:
> > Hi,
> > We have two different types of DBA group in current on-premise Oracle
> > databases,  one who deploy code in prod and highest level of privileges
> > (i.e having write access to the database objects, performing backup
> > recovery, export/import and performing other database maintenance jobs
> etc).
> >
> > and other groups, who only have read only privileges (select catalogue
> > role i.e access to data dictionary views, run time and historical
> > performance views along) mainly look into the performance issues and
> > more alligns towards the functional part of the application ,
> > understands the database design working closely with Application
> > Development team and they have read only access to the databases. They
> > are expert in reading query execution path and have privilege to run the
> > explain command, reading sqlmonitor report, creating profiles, baselines
> > , adding hints to the query by understanding inefficient execution path,
> > fetching AWR report, tracing sql queries, gathering object statistics,
> > accessing OEM(oracle enterprise manager ) to monitor performance.
> > Suggesting appropriate indexes and partitioning strategies for tables
> etc.
> >
> > Now that we are moving few of the applications to Postgres on-premise
> > database and few to aws RDS postgres. What kind of privileges should we
> > provide to these two groups? Mainly we don't want the performance guys
> > to have the elevated privileges. I can understand , readonly access to
> > the table is something we can give to those users. Apart from that, are
> > there any specific privileges which we should provide,  so as to enable
> > look and debug into all types of performance issues without any
> > obstruction and at the same time not giving elevated privileges?
>
> Read:
>
> https://www.postgresql.org/docs/current/predefined-roles.html
>
> >
> > Regards
> > Veem
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com




Thank you so much.

All the reader roles mentioned in the above URL look appropriate for
someone who needs high level read privileges but no write privilege.
Hopefully the same are available in RDS too.
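
For instance, a read-only "performance" role based on those predefined roles
might be set up roughly like this (the role name perf_readonly is made up):

CREATE ROLE perf_readonly LOGIN;
GRANT pg_read_all_data TO perf_readonly;  -- read access to all tables/views
GRANT pg_monitor       TO perf_readonly;  -- includes pg_read_all_stats and pg_read_all_settings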

Vacuum/analyze looks similar to a stats gather; will that need a special
privilege, or will read privileges like pg_read_all_data, pg_read_all_stats
and pg_read_all_settings be enough to be able to run vacuum on a table?

Are there any facilities to trace a running session and see the trace files
to analyze performance issues? And is any special role needed for that?

Regards
Veem


Re: Question on Table creation

2024-02-28 Thread veem v
On Wed, 28 Feb 2024 at 01:24, sud  wrote:

> While testing the pg_partman extension I see it by default creates a
> "default partition" even if we provide the parameter p_start_partition as
> "current date". But if someone purposely doesn't want to have the default
> partitions , so as to avoid any unwanted data entering the table , which
> may be required from a data quality perspective. Is there any option in the
> create_parent function to achieve that? Or do we have to manually drop the
> default partition after getting this created through create_parent function?
>
> I am not able to understand the significance of the "p_template_table"
> parameter. When we create the partitions without passing this parameter the
> template table gets created automatically in the partman schema. Checking
> the details of the template table, it doesn't show any indexes or
> constraints present in it, but still the child partitions get created with
> all the necessary indexes and constraints as those are there defined for
> the main table. So wondering , in what real life scenario do we really need
> the template table to be defined and how will it help if the partitions are
> different then from the main table structure?
>
> Lastly , the template table is by default created in the partman schema
> but it also works without error,  if we pass the template table to be
> created in the application schema. So is there any downside of having the
> template table reside in the application schema?
>
>

As per my understanding, you can't control the creation of the default
partition. If you don't want to keep it, you can drop it after the partitions
are first created, i.e. after the create_parent function call.

The template table is necessary when someone needs the child partitions to
have a different structure compared to the main table, like a difference in
indexes or constraints. But I think the number and data types of the columns
should be the same in the main table and the template table. I have not
tried it though.

I think keeping the template table in an application schema is not a good
idea, considering the fact that it's not a business table but a technical
one, and nobody should accidentally alter it.

>
>


When manual analyze is needed

2024-03-03 Thread veem v
Hi,
In one of our RDS Postgres instances we can see from pg_stat_user_tables
that auto vacuum and auto analyze are happening on the tables without our
manual intervention.

So is auto vacuum/analyze sufficient to ensure optimal stats and an
unbloated table structure, or should we do it manually based on the type of
table, for e.g.
a table that is insert-only and gets a similar amount of data throughout
the day 24*7
VS
a table which gets a bulk data load once a day only
VS
a volatile table with truncate-and-load kind of operations
VS
a table with heavy updates/deletes also happening along with inserts
throughout the day 24*7.

Will auto vacuum/analyze take care of all such tables, or do we need to do
it manually in certain scenarios?
And if a suboptimal plan is suddenly taken by the optimizer for a specific
query because of missing stats, how do we catch that? Basically, how do we
get the old plan hash and compare it with the current plan hash?

Regards
Veem


Re: When manual analyze is needed

2024-03-03 Thread veem v
On Mon, 4 Mar 2024 at 09:42, Laurenz Albe  wrote:

>
> The only things that require manual ANALYZE are
>
> 1. partitioned tables (autoanalyze will collect statistics on the
>partitions, but not the partitioned table itself)
>
>
So the partitioned table stats are nothing but the rolled-up stats of all
the partitions. As you mentioned, autoanalyze only works for the child
partitions but not the partitioned table itself, so does that mean we have
to schedule some jobs manually (through some scheduler like pg_cron) to
analyze these partitioned tables at certain time intervals to keep their
stats up to date? And won't that scan all the child partitions again to have
the stats aggregated/rolled up from the child partition level to the table
level?
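
For example, the manual step being discussed might be scheduled roughly like
this (job name, schedule and table name are made up; it assumes pg_cron is
installed):

SELECT cron.schedule('analyze-parent-nightly', '0 2 * * *',
       $$ANALYZE my_partitioned_table$$);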


Re: When manual analyze is needed

2024-03-03 Thread veem v
Additionally, if a query was working fine but suddenly takes a suboptimal
plan because of missing stats, do we have any hash value column in any
performance view, associated with the queryid, which we can refer to in
order to see the difference between past and current plans and identify
such issues quickly and fix them?
I am not seeing any such column in pg_stat_activity or pg_stat_statements
to hold a hash value of the plan, and also the query column is showing
"" for many of the entries; why so?


Re: When manual analyze is needed

2024-03-04 Thread veem v
On Mon, 4 Mar 2024 at 21:46, Greg Sabino Mullane  wrote:

> On Mon, Mar 4, 2024 at 12:23 AM veem v  wrote:
>
>> Additionally if a query was working fine but suddenly takes a
>> suboptimal plan because of missing stats , do we have any hash value column
>> on any performance view associated with the queryid which we can refer to
>> see past vs current plans difference and identify such issues quickly and
>> fix it?
>>
>
> You can use auto_explain; nothing else tracks things at that fine a level.
> You can use pg_stat_statements to track the average and max time for each
> query. Save and reset periodically to make it more useful.
>
> https://www.postgresql.org/docs/current/auto-explain.html
>
> https://www.postgresql.org/docs/current/pgstatstatements.html
>
>

Thank you so much Greg. That helps.

We were planning to have the auto_explain extension added, with
log_min_duration set to ~5 seconds and log_analyze set to true, so that all
queries running above that time will be logged and provide detailed
information on the exact point of the bottleneck. Will it be a good idea to
set this on a production DB which is a highly active database? Or should we
only have the extension added, and only set the parameters while we debug a
performance issue, and then reset them after we are done?
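
For reference, the settings being described would be applied roughly like
this (a sketch for a self-managed instance; on RDS the library and parameters
would typically go into the DB parameter group instead, and note that
log_analyze adds overhead to every statement it times):

LOAD 'auto_explain';
ALTER SYSTEM SET session_preload_libraries = 'auto_explain';
ALTER SYSTEM SET auto_explain.log_min_duration = '5s';
ALTER SYSTEM SET auto_explain.log_analyze = on;
SELECT pg_reload_conf();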


Not able to purge partition

2024-03-20 Thread veem v
Hello All,
 We created two partitioned tables one being parent and other child. Both
are range partitioned and one table has a foreign key to the other parent
table. Now , when we run the partition maintenance job to drop the older
partitions which meet the partition retention criteria, it's not letting us
drop any of the partitions from the parent table. Child table works fine
though.

 And when we try using cascade , it drops the foreign key constraint from
all the partitions.  Can you please let me know how to make the partition
maintenance work in such a scenario while maintaining the foreign key
intact, as we are expecting the foreign key to be attached to the
respective partition only but not the full table?

 And this partition maintenance job which runs through cron scheduler in an
automated way is failing on a daily basis for us without being able to drop
any of the historical partitions from the parent partitioned table.

 Below is the test case:-

CREATE TABLE schema1.test_part_drop_parent
(
c1 varchar(36)  NOT NULL ,
c2_part_date timestamp with time zone  NOT NULL ,
c3  numeric(12,0)  NOT NULL ,
CONSTRAINT test_part_drop_parent_PK PRIMARY KEY (c1,c2_part_date)
) PARTITION BY RANGE (c2_part_date);

CREATE INDEX idx1_test_part_parent ON schema1.test_part_drop_parent( C3
ASC);


CREATE TABLE schema1.test_part_drop_child
(
C1_child   varchar(36)  NOT NULL ,
C1   varchar(36)  NOT NULL ,
c2_part_date timestamp with time zone  NOT NULL ,
C3  numeric(12,0)  NOT NULL ,
CONSTRAINT test_part_drop_child_PK PRIMARY KEY (C1_child,c2_part_date),
CONSTRAINT test_part_drop_child_FK1 FOREIGN KEY (C1,c2_part_date)
REFERENCES schema1.test_part_drop_parent(c1,c2_part_date)
) PARTITION BY RANGE (c2_part_date);

CREATE INDEX test_part_drop_child_fk ON schema1.test_part_drop_child( c1,
c2_part_date);

CREATE INDEX test_part_drop_child_idx ON schema1.test_part_drop_child(
c1_child, c2_part_date);


select partman.create_parent(
   p_parent_table := 'schema1.test_part_drop_child',
   p_control := 'c2_part_date',
   p_type := 'native',
   p_interval := '1 day',
   p_premake := 5,
   p_start_partition => '2024-02-01 00:00:00'
);

update partman.part_config set infinite_time_partitions = 'true' ,
retention = '1 months', retention_keep_table='false',
retention_keep_index='false'
where parent_table = 'schema1.test_part_drop_child';


select partman.create_parent(
   p_parent_table := 'schema1.test_part_drop_parent',
   p_control := 'c2_part_date',
   p_type := 'native',
   p_interval := '1 day',
   p_premake := 5,
   p_start_partition => '2024-02-01 00:00:00'
);

update partman.part_config set infinite_time_partitions = 'true' ,
retention = '1 months', retention_keep_table='false',
retention_keep_index='false'
where parent_table = 'schema1.test_part_drop_parent';


select partman.run_maintenance('schema1.test_part_drop_child');

select partman.run_maintenance('schema1.test_part_drop_parent');


SQL Error [P0001]: ERROR: cannot drop table
schema1.test_part_drop_parent_p2024_02_01 because other objects depend on it
CONTEXT: SQL statement "DROP TABLE
schema1.test_part_drop_parent_p2024_02_01"
PL/pgSQL function
drop_partition_time(text,interval,boolean,boolean,text,timestamp with time
zone) line 213 at EXECUTE
PL/pgSQL function partman.run_maintenance(text,boolean,boolean) line 195 at
assignment
DETAIL: constraint test_part_drop_child_fk1 on table
schema1.test_part_drop_child depends on table
schema1.test_part_drop_parent_p2024_02_01
HINT: Use DROP ... CASCADE to drop the dependent objects too.
CONTEXT: PL/pgSQL function
drop_partition_time(text,interval,boolean,boolean,text,timestamp with time
zone) line 308 at RAISE
PL/pgSQL function partman.run_maintenance(text,boolean,boolean) line 195 at
assignment
DETAIL:
HINT:
Where: PL/pgSQL function partman.run_maintenance(text,boolean,boolean) line
413 at RAISE


Re: Not able to purge partition

2024-03-21 Thread veem v
Also, i see, its not allowing to drop the parent table partition even all
the respective child partitions are dropped and no child record exists in
the child table. But this started working , when we detach parent partition
and then drop the partitions. So it means if we just change the setup of
the parent partition table in the part_config as
retention_keep_table='true', retention_keep_index='true' and
retention_schema=, it will work without giving error. And
then we have to drop the partitions from that retention_schema through
another procedure may be. Correct me if anyone has done with some other
workaround.

However, I have one question now: if child records exist in the child
partition table, it won't even allow detaching the respective parent
partition. So does "CALL partman.run_maintenance_proc()" automatically
ensure the ordering of the child and parent tables during partition
maintenance? Or do we need to write our own procedure for that and schedule
it through cron?

Something as below,

CREATE OR REPLACE PROCEDURE part_maintenance_proc()
LANGUAGE plpgsql
AS $$
DECLARE
    drop_partition_record RECORD;
BEGIN
    -- run pg_partman maintenance, child table first and then the parent
    PERFORM partman.run_maintenance('cpod.test_part_drop_child');
    PERFORM partman.run_maintenance('cpod.test_part_drop_parent');

    -- then drop the detached parent partitions moved to the retention schema
    FOR drop_partition_record IN
        SELECT table_schema, table_name
          FROM information_schema.tables
         WHERE table_name LIKE '%test_part_drop_parent%'
           AND table_schema LIKE '%retention_schema%'
    LOOP
        EXECUTE format('DROP TABLE %I.%I',
                       drop_partition_record.table_schema,
                       drop_partition_record.table_name);
    END LOOP;
END;
$$;

SELECT cron.schedule('@hourly', $$CALL part_maintenance_proc()$$);


On Thu, 21 Mar, 2024, 11:07 am veem v,  wrote:

> Hello All,
>  We created two partitioned tables one being parent and other child. Both
> are range partitioned and one table has a foreign key to the other parent
> table. Now , when we run the partition maintenance job to drop the older
> partitions which meet the partition retention criteria, it's not letting us
> drop any of the partitions from the parent table. Child table works fine
> though.
>
>  And when we try using cascade , it drops the foreign key constraint from
> all the partitions.  Can you please let me know how to make the partition
> maintenance work in such a scenario while maintaining the foreign key
> intact, as we are expecting the foreign key to be attached to the
> respective partition only but not the full table?
>
>  And this partition maintenance job which runs through cron scheduler in
> an automated way is failing on a daily basis for us without being able to
> drop any of the historical partitions from the parent partitioned table.
>
>  Below is the test case:-
>
> CREATE TABLE schema1.test_part_drop_parent
> (
> c1 varchar(36)  NOT NULL ,
> c2_part_date timestamp with time zone  NOT NULL ,
> c3  numeric(12,0)  NOT NULL ,
> CONSTRAINT test_part_drop_parent_PK PRIMARY KEY (c1,c2_part_date)
> ) PARTITION BY RANGE (c2_part_date);
>
> CREATE INDEX idx1_test_part_parent ON schema1.test_part_drop_parent( C3
> ASC);
>
>
> CREATE TABLE schema1.test_part_drop_child
> (
> C1_child   varchar(36)  NOT NULL ,
> C1   varchar(36)  NOT NULL ,
> c2_part_date timestamp with time zone  NOT NULL ,
> C3  numeric(12,0)  NOT NULL ,
> CONSTRAINT test_part_drop_child_PK PRIMARY KEY (C1_child,c2_part_date),
> CONSTRAINT test_part_drop_child_FK1 FOREIGN KEY (C1,c2_part_date)
> REFERENCES schema1.test_part_drop_parent(c1,c2_part_date)
> ) PARTITION BY RANGE (c2_part_date);
>
> CREATE INDEX test_part_drop_child_fk ON schema1.test_part_drop_child( c1,
> c2_part_date);
>
> CREATE INDEX test_part_drop_child_idx ON schema1.test_part_drop_child(
> c1_child, c2_part_date);
>
>
> select partman.create_parent(
>p_parent_table := 'schema1.test_part_drop_child',
>p_control := 'c2_part_date',
>p_type := 'native',
>p_interval := '1 day',
>p_premake := 5,
>p_start_partition => '2024-02-01 00:00:00'
> );
>
> update partman.part_config set infinite_time_partitions = 'true' ,
> retention = '1 months', retention_keep_table='false',
> retention_keep_index='false'
> where parent_table = 'schema1.test_part_drop_child';
>
>
> select partman.create_parent(
>p_parent_table := 'schema1.test_part_drop_parent',
>p_control := 'c2_part_date',
>p_type := 'native',
>p_interval := '1 day',
>p_premake := 5,
>p_start_partition => '2024-02-01 00:00:00'
> );
>
> update partman.part_config set infinite_time_partitions = 

Re: Not able to purge partition

2024-03-21 Thread veem v
On Thu, 21 Mar, 2024, 6:18 pm Laurenz Albe, 
wrote:


> What you *can* do is detach the partition and then drop it, but detaching
> will be slow because PostgreSQL has to check for referencing rows.
>
> The best solution is to create the foreign key *not* between the partitioned
> tables, but between the individual table partitions.  That should be easy if
> you have the same partition boundaries for both.
> Then you can simply drop a partition from both tables at the same time.


Thank you Laurenz.

With the existing foreign key setup, dropping partitions works fine for the
child tables when it is triggered through the cron job by calling
"run_maintenance_proc", but it is failing for the parent table partition
drop. And, as you mentioned, if we do the detach and then the drop, it works
fine for the parent table.

However, we are creating partitions using the create_parent procedure,
something as below, and thus it creates the default template table based on
the main table structure information, and the subsequent partitions are also
created/dropped accordingly using the scheduled cron job as below.

So when you mentioned *"to create the foreign keys not between the
partitioned tables but between the individual partitions"*, can that be done
using the same "partman.create_parent" procedure and automated cron job
schedule, or does it have to be done some other way manually?

Additionally, do we have to call the cron jobs in order, i.e. child table
first and then parent table? Because currently we were just scheduling
"run_maintenance_proc" once, and maintenance of all the partitioned tables
(i.e. creation of new partitions and dropping of old partitions) was being
taken care of automatically by that one call.

select partman.create_parent(
p_parent_table := 'schema1.test_part_drop_child',
p_control := 'c2_part_date',
p_type := 'native',
p_interval := '1 day',
p_premake := 5,
p_start_partition => '2024-02-01 00:00:00'
);

update partman.part_config set infinite_time_partitions = 'true' ,
retention = '1 months', retention_keep_table='true',
retention_keep_index='true',retention_schema='public'
where parent_table = 'schema1.test_part_drop_child';

SELECT cron.schedule('@hourly', $$CALL partman.run_maintenance_proc()$$);


Re: Not able to purge partition

2024-03-21 Thread veem v
On Thu, 21 Mar 2024 at 23:39, Laurenz Albe  wrote:

> On Thu, 2024-03-21 at 22:50 +0530, veem v wrote:
> > So when you mentioned "to create the foreign keys *not* between the
> > partitioned table but between the individual partitions" , can that
> > be done using the same "partman.create_parent" procedure and automated
> > cron job schedule or has to be done any other way manually ?
>
> I don't know the capabilities of partman, but I would be surprised if
> it could automatically create foreign keys on the partitions.
>
>
Yes, the constraints on each of the child partitions and parent partitions
were getting created automatically. As I see from
information_schema.table_constraints, it shows one foreign key constraint
on each of the child partitions created through the partman procedure.

It worked smoothly without any issue until we noticed this problem while
trying to purge a partition from the parent partitioned table. But I believe
this extension is used extensively, so I'm just wondering if I am missing
something here with regard to the foreign key creation using this automated
partition creation/partman extension functionality.


Re: Not able to purge partition

2024-03-21 Thread veem v
Can someone please confirm whether this foreign key behavior is expected
with partitions created through the partman extension, and thus whether we
need to write our own manual process for partition purging (in order of
child first and then parent), rather than relying on partman partition
maintenance to take care of dropping partitions automatically for us?

On Fri, 22 Mar, 2024, 12:42 am veem v,  wrote:

> On Thu, 21 Mar 2024 at 23:39, Laurenz Albe 
> wrote:
>
>> On Thu, 2024-03-21 at 22:50 +0530, veem v wrote:
>> > So when you mentioned "to create the foreign keys *not* between the
>> > partitioned table but between the individual partitions" , can that
>> > be done using the same "partman.create_parent" procedure and automated
>> > cron job schedule or has to be done any other way manually ?
>>
>> I don't know the capabilities of partmen, but I would be surprised if
>> it could automatically create foreign keys on the partitions.
>>
>>
> Yes, the constraints on each of the child partitions and parent partitions
> were getting created automatically. As I see from
> information_schema.table_constraints, it shows one foreign key constraint
> in each of the child partitions created through the partman procedure.
>
> It works smoothly without any issue, until we noticed this issue while
> trying to purge the partition from the parent table partition. But I
> believe this extension is extensively used , so I'm just wondering if I am
> missing something here with regards to foreign key creation using this
> automated partition creation/partman extension functionality.
>
>
>


Re: Not able to purge partition

2024-03-23 Thread veem v
Trying to consolidate the main questions here as below.

1) As we see, having the foreign key defined makes the detach partition run
for minutes (in our case 5-10 minutes for a 60 million row partition), so
how can we make the parent table partition detach and drop work fast in such
a scenario while keeping the foreign key intact?

2) As we are using pg_partman for maintaining the partitions, do we need to
schedule multiple cron jobs (one for the parent table and another for the
child table), called in order, so that it first drops the child table
partition and then the parent table partition (because detaching/dropping
the parent partition first will error out)? Or can only one cron job call
like below do the job for us and make sure the dependency between the tables
is taken care of automatically?

SELECT cron.schedule('@daily', $$SELECT partman.run_maintenance()$$);


Re: Not able to purge partition

2024-03-23 Thread veem v
On Sat, 23 Mar 2024 at 23:08, Laurenz Albe  wrote:

> On Sat, 2024-03-23 at 22:41 +0530, veem v wrote:
> > 1)As we see having foreign key defined is making the detach partition run
> >   for minutes(in our case 5-10minutes for 60 million rows partition), so
> >   how to make the parent table partition detach and drop work fast in
> such
> >   a scenario while maintaining the foreign key intact?
>
> I told you: don't do it.
> Instead, use foreign keys between the partitions.
>

I am struggling to understand how to maintain those partitions then, because
we were planning to use pg_partman for creating and dropping partitions
automatically without much hassle. So do you mean we should do the partition
maintenance (create/drop) with our own jobs and not use the pg_partman
extension for this?

Say, for example, in our case the parent table has 3-4 child tables and all
are partitioned on the same keys/columns. So how can we identify the child
partitions, create all the foreign keys to the respective parent table
partitions, attach those partitions to the parent table, and also make this
process automated? Appreciate any guidance on this.

Actually, using pg_partman was taking care of everything, starting from
creating partitions with different names and creating the respective indexes
and constraints, also with different names for each partition, without us
having to worry about any of it.


Re: Not able to purge partition

2024-03-24 Thread veem v
On Sun, 24 Mar 2024 at 20:29, Laurenz Albe  wrote:

> On Sun, 2024-03-24 at 00:37 +0530, veem v wrote:
> > > Instead, use foreign keys between the partitions.
> >
> > I am struggling to understand how to maintain those partitions then?
> > As because we were planning to use pg_partman for creating and dropping
> > partitions automatically without much hassle.
>
> I understand.
>
> But do you want to pursue a way that is not working well, just because
> the tool you chose cannot handle it?
>
> Yours,
> Laurenz Albe



Thank you so much Laurenz.

Can you please suggest some docs which show the way we should do the
partition maintenance (if not using pg_partman)?

Actually, I am not able to visualize the steps here. Do you mean to say that
we should first create the partitioned tables without any foreign key
constraints, then create the parent table's future partitions (say 10 days
of partitions) and then the child table's future 10 days of partitions
manually, and at that point define the foreign key constraints, indexes
etc.? Here, how do we find the exactly matching parent partitions so as to
create the foreign keys one to one?
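
For what it's worth, a partition-to-partition foreign key as suggested would
be declared roughly like this (the parent partition name matches the one in
the earlier error message; the child partition name and constraint name are
assumed to follow pg_partman's usual suffix pattern):

ALTER TABLE schema1.test_part_drop_child_p2024_02_01
  ADD CONSTRAINT test_part_drop_child_p2024_02_01_fk1
  FOREIGN KEY (c1, c2_part_date)
  REFERENCES schema1.test_part_drop_parent_p2024_02_01 (c1, c2_part_date);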


Re: Moving delta data faster

2024-04-06 Thread veem v
On Fri, 5 Apr 2024 at 06:10, Adrian Klaver 
wrote:

>
> > S3 is not a database. You will need to be more specific about '...
> then
> > from the S3 it will be picked and gets merged to the target postgres
> > database.'
> >
> >
> > The data from S3 will be dumped into the stage table and then the
> > upsert/merge from that table to the actual table.
>
> The S3 --> staging table would be helped by having the data as CSV and
> then using COPY. The staging --> final table step could be done as
> either ON CONFLICT or MERGE, you would need to test in your situation to
> verify which works better.
>

Just a thought: in case the delta changes are really high (say >30-40% of
the total number of rows in the table), can the OP also evaluate the
"truncate target table + load target table" strategy here? Considering that
DDL/TRUNCATE is transactional in Postgres, it can be done online without
impacting ongoing read queries, and performance-wise it would be faster
compared to the traditional Update/Insert/Upsert/Merge.
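
The truncate-and-reload idea would look roughly like this, all inside one
transaction (table names are made up for illustration; note that TRUNCATE
takes an ACCESS EXCLUSIVE lock, so concurrent readers will wait until the
transaction commits):

BEGIN;
TRUNCATE TABLE target_table;
INSERT INTO target_table SELECT * FROM staging_table;
COMMIT;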


Question on trigger

2024-04-11 Thread veem v
Hi, we used to use an Oracle database in which we had audit triggers
(something as below) mandated for all tables by the control team. Now we are
going to use a PostgreSQL 15.4 database for one of our applications. So we
wanted to understand whether there is any downside to such an audit trigger
setup for all the tables. Will it impact bulk insert/update/delete, or
significantly slow down any of the DML operations (and thus not be advisable
for all tables but only selected ones)?

CREATE OR REPLACE TRIGGER TAB_AUD_TRG
  BEFORE DELETE OR INSERT OR UPDATE
  ON tab
  FOR EACH ROW
BEGIN
  IF inserting THEN
:NEW.create_timestamp := systimestamp;
:NEW.create_userid  := sys_context('USERENV','SESSION_USER');
:NEW.update_timestamp := systimestamp;
:NEW.update_userid := sys_context('USERENV','SESSION_USER');
  ELSIF updating THEN
IF  updating('create_userid') OR updating('create_timestamp') THEN
:new.create_userid   := :old.create_userid;
:new.create_timestamp  := :old.create_timestamp;
END IF;
:NEW.update_timestamp := systimestamp;
:NEW.update_userid := sys_context('USERENV','SESSION_USER');
  END IF;
  END;
/
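
For context, a rough PL/pgSQL equivalent of the above trigger would be along
these lines (a sketch, not tested; the DELETE branch is omitted since the
Oracle version sets no columns for deletes):

CREATE OR REPLACE FUNCTION tab_aud_trg_fn() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
  IF TG_OP = 'INSERT' THEN
    NEW.create_timestamp := current_timestamp;
    NEW.create_userid    := session_user;
  ELSIF TG_OP = 'UPDATE' THEN
    -- keep the original creation audit values
    NEW.create_userid    := OLD.create_userid;
    NEW.create_timestamp := OLD.create_timestamp;
  END IF;
  NEW.update_timestamp := current_timestamp;
  NEW.update_userid    := session_user;
  RETURN NEW;
END;
$$;

CREATE TRIGGER tab_aud_trg
  BEFORE INSERT OR UPDATE ON tab
  FOR EACH ROW EXECUTE FUNCTION tab_aud_trg_fn();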

Regards
Veem


Re: Question on trigger

2024-04-13 Thread veem v
Thank you Adrian.

So it seems heavy-DML tables will see an impact if triggers (mainly FOR EACH
ROW triggers) are created on them.

And also for bulk DML/array based inserts (which insert multiple rows in one
shot or one batch), it seems the trigger will not let that happen as it will
force the processing to happen row by row, since the trigger is row based.
Will test anyway though.

On Thu, 11 Apr 2024 at 22:00, Adrian Klaver 
wrote:

> On 4/11/24 07:31, veem v wrote:
> > Hi, We used to use Oracle database in which we had audit
> > triggers(something as below) mandated for all tables by the control
> > team. Now we are going to use the postgresql 15.4 database for one of
> > our applications. So,wanted to understand if there exists any downside
> > of such audit trigger setup for all the tables? Will it impact the bulk
> > data insert/update/delete OR slowdown of any of the DML operations
> > significantly (and thus will not be advisable to use for all tables but
> > selected ones)?
>
> Triggers are overhead in Postgres as they where in Oracle. If they
> didn't cause an issue in Oracle I would suspect that would also be the
> case in Postgres. To confirm you would need to create a test setup and
> run some common operations and see what the overhead is.
>
> Some potential performance improvements:
>
> https://www.postgresql.org/docs/current/sql-createtrigger.html
>
> "...a trigger that is marked FOR EACH STATEMENT only executes once for
> any given operation, regardless of how many rows it modifies (in
> particular, an operation that modifies zero rows will still result in
> the execution of any applicable FOR EACH STATEMENT triggers)."
>
> <...>
>
> "The REFERENCING option enables collection of transition relations,
> which are row sets that include all of the rows inserted, deleted, or
> modified by the current SQL statement. This feature lets the trigger see
> a global view of what the statement did, not just one row at a time.
> This option is only allowed for an AFTER trigger that is not a
> constraint trigger; also, if the trigger is an UPDATE trigger, it must
> not specify a column_name list. OLD TABLE may only be specified once,
> and only for a trigger that can fire on UPDATE or DELETE; it creates a
> transition relation containing the before-images of all rows updated or
> deleted by the statement. Similarly, NEW TABLE may only be specified
> once, and only for a trigger that can fire on UPDATE or INSERT; it
> creates a transition relation containing the after-images of all rows
> updated or inserted by the statement."
>
>
> As example:
>
> https://www.postgresql.org/docs/current/plpgsql-trigger.html
>
> Example 43.7. Auditing with Transition Tables
>
> >
> > CREATE OR REPLACE TRIGGER TAB_AUD_TRG
> >BEFORE DELETE OR INSERT OR UPDATE
> >ON tab
> >FOR EACH ROW
> > BEGIN
> >IF inserting THEN
> >  :NEW.create_timestamp := systimestamp;
> >  :NEW.create_userid  := sys_context('USERENV','SESSION_USER');
> >  :NEW.update_timestamp := systimestamp;
> >  :NEW.update_userid := sys_context('USERENV','SESSION_USER');
> >ELSIF updating THEN
> >  IF  updating('create_userid') OR updating('create_timestamp')
> THEN
> >  :new.create_userid   := :old.create_userid;
> >  :new.create_timestamp  := :old.create_timestamp;
> >  END IF;
> >  :NEW.update_timestamp := systimestamp;
> >  :NEW.update_userid := sys_context('USERENV','SESSION_USER');
> >END IF;
> >END;
> > /
> >
> > Regards
> > Veem
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>
>


Re: Question on trigger

2024-04-16 Thread veem v
On Sat, 13 Apr 2024 at 21:44, Adrian Klaver 
wrote:

> On 4/13/24 00:03, veem v wrote:
> > Thank you Adrian.
> >
> > So it seems the heavy DML tables will see an impact if having triggers
> > (mainly for each row trigger) created on them.
> >
> > And also the bulk DML/array based insert (which inserts multiple rows in
> > one short or one batch) , in those cases it seems the trigger will not
> > make that happen as it will force it to make it happen row by row, as
> > the trigger is row based. Will test anyway though.
>
> You said you have triggers in the Oracle database and I assumed they
> worked and were not a show-stopping issue there. What makes you think
> that would be different in Postgres?
>
> What type of triggers were there in Oracle, per row, per statement or a
> mix?
>
>
Actually, we have row-level triggers in Oracle which run for smaller-volume
DML, and in the presence of those triggers the direct-path inserts fall back
to conventional row-by-row inserts. So I was wondering whether in postgres
we will encounter a similar issue, i.e. batch inserts being converted back
to row-by-row automatically. And here we are going to process higher-volume
DML in the postgresql database.


How to create efficient index in this scenario?

2024-06-08 Thread veem v
Hi ,
It's postgres version 15.4. A table is daily range partitioned on a column
transaction_timestamp. It has a unique identifier which would be the ideal
primary key (say transaction_id); however, as there is a limitation that the
partition key must be included in the primary key, it has to be a composite
key: either (transaction_id, transaction_timestamp) or
(transaction_timestamp, transaction_id). But which one should we go for, if
both columns get used in all the queries?

We will always be using transaction_timestamp as mostly a range predicate
filter/join in the query and the transaction_id will be mostly used as a
join condition/direct filter in the queries. So we were wondering, which
column should we be using  as a leading column in this index?

There is a blog below (which is for Oracle) showing how the index should be
chosen, and it states, "*Stick the columns you do range scans on last in the
index, filters that get equality predicates should come first.*" In that
case we should create the PK in the order
(transaction_id, transaction_timestamp), because making the range predicate
the leading column means it can only be used as a filter predicate rather
than an access predicate, which reads more blocks and thus more IO. Does
this hold true in postgres too?

https://ctandrewsayer.wordpress.com/2017/03/24/the-golden-rule-of-indexing/

Additionally, there is another scenario in which we have a requirement to
add another timestamp column (say create_timestamp) to the primary key along
with transaction_id, and we are going to query this table frequently with
create_timestamp as a range predicate. And of course we will also have the
range predicate filter on the partition key "transaction_timestamp". But we
may or may not have a join/filter on column transaction_id, so in this
scenario we should go for
(create_timestamp, transaction_id, transaction_timestamp), because
"transaction_timestamp" is the partition key, so putting it last doesn't
harm us. Will this be the correct order, or is another index order more
appropriate?
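
To make the comparison concrete, a minimal sketch of the first scenario
(hypothetical table, partition and literal values), where EXPLAIN shows how
the composite primary-key index is used when the equality column leads:

CREATE TABLE txn (
    transaction_id        bigint      NOT NULL,
    transaction_timestamp timestamptz NOT NULL,
    PRIMARY KEY (transaction_id, transaction_timestamp)
) PARTITION BY RANGE (transaction_timestamp);

CREATE TABLE txn_p2024_06_01 PARTITION OF txn
    FOR VALUES FROM ('2024-06-01') TO ('2024-06-02');

EXPLAIN
SELECT *
FROM txn
WHERE transaction_id = 12345
  AND transaction_timestamp >= '2024-06-01'
  AND transaction_timestamp <  '2024-06-02';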

Thanks and Regards
Veem


Re: How to create efficient index in this scenario?

2024-06-08 Thread veem v
On Sun, 9 Jun 2024 at 09:45, Lok P  wrote:

>
> On Sat, Jun 8, 2024 at 7:03 PM veem v  wrote:
>
>>
>> There is a blog below (which is for oracle), showing how the index should
>> be chosen and it states ,  "*Stick the columns you do range scans on
>> last in the index, filters that get equality predicates should come first.*
>> ", and in that case we should have the PK created as in the order
>> (transaction_id,transaction_timestamp). It's because making the range
>> predicate as a leading column won't help use that as an access predicate
>> but as an filter predicate thus will read more blocks and thus more IO.
>> Does this hold true in postgres too?
>>
>>
>> https://ctandrewsayer.wordpress.com/2017/03/24/the-golden-rule-of-indexing/
>>
>
> I believe the analogy holds true here in postgres too and the index in
> this case should be on (transaction_id, transaction_timestamp).
>
>
>>
>>
>> Additionally there is another scenario in which we have the requirement
>> to have another timestamp column (say create_timestamp) to be added as part
>> of the primary key along with transaction_id and we are going to query this
>> table frequently by the column create_timestamp as a range predicate. And
>> ofcourse we will also have the range predicate filter on partition key
>> "transaction_timestamp". But we may or may not have join/filter on column
>> transaction_id, so in this scenario we should go for
>>  (create_timestamp,transaction_id,transaction_timestamp). because
>> "transaction_timestamp" is set as partition key , so putting it last
>> doesn't harm us. Will this be the correct order or any other index order is
>> appropriate?
>>
>>
>>
> In this case , the index should be on (
> create_timestamp,transaction_id,transaction_timestamp), considering the
> fact that you will always have queries with "create_timestamp" as predicate
> and may not have transaction_id in the query predicate.
>

So in the second scenario, if we keep create_timestamp as the leading
column, isn't that against the advice the blog provides, i.e. to not have
the range predicate as the leading column in the index?


Does trigger only accept functions?

2024-06-10 Thread veem v
Hi, It's version 15.4 of postgres. We have a requirement to have the audit
enabled for the delete queries on the base table. And for that we are
planning to have one audit table created for each base table and have
triggers on each of the base tables to be fired on delete which will insert
records into the audit table.

But I see that a trigger does not accept the insert query directly; rather,
it requires calling a function and putting the business logic inside that
function, something as below. So does that mean that, to enable audit on the
~50 base tables, we will have ~50 functions to be created and they will need
to be called from ~50 triggers? Or does any better approach exist to handle
this?

CREATE OR REPLACE FUNCTION log_deletes_source_table()
RETURNS TRIGGER AS $$
BEGIN
INSERT INTO source_table_delete_history (record_id, delete_timestamp,
col1, col2,col3)
VALUES (OLD.id, current_timestamp, old.col1,old.col2,old.col3);
RETURN OLD;
END;
$$ LANGUAGE plpgsql;

-- Trigger for deletes
CREATE TRIGGER before_delete
BEFORE DELETE ON source_table
FOR EACH ROW EXECUTE FUNCTION  log_deletes_source_table  ();

Regards
Veem


Re: Does trigger only accept functions?

2024-06-11 Thread veem v
On Tue, 11 Jun 2024 at 17:03, hubert depesz lubaczewski 
wrote:

> On Tue, Jun 11, 2024 at 12:47:14AM +0530, veem v wrote:
> > to be called from ~50 triggers? or any other better approach exists to
> > handle this?
>
> pgaudit extension?
>
> Or just write all the changes to single table?
>
> Or use dynamic queries that will build the insert based on the name of
> table the event happened on?
>
> Or pass arguments?
>
> Best regards,
>
> depesz
>
>
Thank you so much. I hope you mean something like the below when you say
making it dynamic, because our audit tables have more columns than the
source tables and, for a few of them, the column names are a bit different.

-- Trigger for deletes
CREATE TRIGGER before_delete
BEFORE DELETE ON source_table
FOR EACH ROW EXECUTE FUNCTION log_deletes();


-- Trigger for source_table1
CREATE TRIGGER before_delete_source_table1
BEFORE DELETE ON source_table1
FOR EACH ROW EXECUTE FUNCTION log_deletes();

-- Trigger for source_table2
CREATE TRIGGER before_delete_source_table2
BEFORE DELETE ON source_table2
FOR EACH ROW EXECUTE FUNCTION log_deletes();


CREATE OR REPLACE FUNCTION log_deletes()
RETURNS TRIGGER AS $$
BEGIN
IF TG_TABLE_NAME = 'source_table1' THEN
INSERT INTO delete_audit1 ( col1, col2, col3)
VALUES (OLD.col1, OLD.col2, OLD.col3);
ELSIF TG_TABLE_NAME = 'source_table2' THEN
INSERT INTO delete_audit2 ( col4, col5, col6)
VALUES (OLD.col4, OLD.col5, OLD.col6);
-- Add more conditions for other tables
ELSE
RAISE EXCEPTION 'Audit table not defined for %', TG_TABLE_NAME;
END IF;
RETURN OLD;
END;
$$ LANGUAGE plpgsql;


Re: Does trigger only accept functions?

2024-06-11 Thread veem v
On Wed, 12 Jun 2024 at 00:26, hubert depesz lubaczewski 
wrote:

>
>
> No, I meant building dynamic queries and then EXECUTE-ing, like docs
> show:
>
> https://www.postgresql.org/docs/current/plpgsql-statements.html#PLPGSQL-STATEMENTS-EXECUTING-DYN
>
> Best regards,
>
> depesz
>
>
My apologies if I am interpreting it the wrong way. It doesn't make much
difference though, but do you mean something like below?

CREATE OR REPLACE FUNCTION log_deletes()
RETURNS TRIGGER AS $$
DECLARE
    audit_table_name TEXT;
    audit_query TEXT;
BEGIN
    IF TG_TABLE_NAME = 'source_table1' THEN
        audit_table_name := 'delete_audit1';
        -- the $1..$4 placeholders must line up with the USING arguments
        audit_query := format(
            'INSERT INTO %I (record_id, delete_timestamp, col1, col2, col3)
             VALUES ($1, current_timestamp, $2, $3, $4)', audit_table_name);
        EXECUTE audit_query USING OLD.id, OLD.col1, OLD.col2, OLD.col3;
    ELSIF TG_TABLE_NAME = 'source_table2' THEN
        audit_table_name := 'delete_audit2';
        audit_query := format(
            'INSERT INTO %I (col4, col5, col6)
             VALUES ($1, $2, $3)', audit_table_name);
        EXECUTE audit_query USING OLD.col4, OLD.col5, OLD.col6;
    ELSE
        RAISE EXCEPTION 'Audit table not defined for %', TG_TABLE_NAME;
    END IF;

    RETURN OLD;
END;
$$ LANGUAGE plpgsql;


Issue while creating index dynamically

2024-07-23 Thread veem v
Hi,
It's postgres version 15.4. We have a requirement to create an index on a
big partitioned table and want to do it online. And to make the script run
in an automated way on any day through our CI/CD pipeline, we were trying to
write it as below inside a begin/end block, i.e. create the index using the
"ON ONLY" option, then create an index on each partition using the
"CONCURRENTLY" keyword, and then attach the index partitions to the main
index, something as below.

But we are getting an error while executing, saying it can't be executed in
a transaction block with "CONCURRENTLY". So I want to understand, is there
any alternate way to get around this?

  EXECUTE format('CREATE INDEX %I ON ONLY %I (%I);',
                 index_name, table_name, column_name);

  FOR partition_name IN
      SELECT inhrelid::regclass::text
      FROM pg_inherits
      WHERE inhparent = table_name::regclass
  LOOP
      partition_index_name := partition_name || '_' || index_name || '_idx';

      EXECUTE format('CREATE INDEX CONCURRENTLY %I ON %I (%I);',
                     partition_index_name, partition_name, column_name);

      EXECUTE format('ALTER INDEX %I ATTACH PARTITION %I;',
                     index_name, partition_index_name);
  END LOOP;


ERROR:  CREATE INDEX CONCURRENTLY cannot run inside a transaction block
CONTEXT:  SQL statement "
CREATE INDEX CONCURRENTLY partitioned_table_0_index1_idx ON
partitioned_table_0 (id);"
PL/pgSQL function inline_code_block line 20 at EXECUTE

Regards
Veem


Re: Issue while creating index dynamically

2024-07-24 Thread veem v
On Wed, 24 Jul 2024 at 02:02, Tom Lane  wrote:

> Ron Johnson  writes:
> > On Tue, Jul 23, 2024 at 4:10 PM veem v  wrote:
> >> But we are getting an error while executing saying it cant be executed
> in
> >> transaction block with "CONCURRENTLY". So I want to understand , is
> there
> >> any alternate way to get away with this?
>
> > I'd write that in bash, not in a DO block.
>
> Yeah.  I thought for a bit about using contrib/dblink to carry out
> the commands in a different session, but I don't think that'll work:
> CREATE INDEX CONCURRENTLY would think it has to wait out the
> transaction running the DO block at some steps.  Shove the logic
> over to the client side and you're good to go.
>
> regards, tom lane
>
>
>
Thank you.
I was thinking the individual statements would work fine if I pulled them
out of the begin..end block, as they would then not be bound by any outer
transaction.
However, when I tried it from DBeaver by extracting the individual index
creation statements from the "begin ..end;" block, it still failed, with a
different error as below. Why is that?

 "SQL Error [25001]: Error: create index concurrently cannot be executed
within a pipeline "
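
For what it's worth, a minimal client-side sketch of the approach suggested
above: run the generated statements through psql in autocommit mode, e.g.
with \gexec, so each CREATE INDEX CONCURRENTLY gets its own implicit
transaction (partitioned_table, id and partitioned_table_id_idx are
placeholders, and the parent index is assumed to have been created earlier
with CREATE INDEX ... ON ONLY):

-- run in psql, not inside DO/BEGIN
SELECT format('CREATE INDEX CONCURRENTLY %I ON %s (id)',
              c.relname || '_id_idx', c.oid::regclass)
FROM pg_inherits i
JOIN pg_class c ON c.oid = i.inhrelid
WHERE i.inhparent = 'partitioned_table'::regclass
\gexec

-- then attach each partition index to the parent index
SELECT format('ALTER INDEX partitioned_table_id_idx ATTACH PARTITION %I',
              c.relname || '_id_idx')
FROM pg_inherits i
JOIN pg_class c ON c.oid = i.inhrelid
WHERE i.inhparent = 'partitioned_table'::regclass
\gexec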


Getting specific partition from the partition name

2024-08-08 Thread veem v
Hi ,
We are using postgres version 15.4. We have a range partitioned table and
the partition naming convention, generated by pg_partman, is something
like "<table_name>_pYYYY_MM_DD".

We have a requirement to extract specific partitions ordered by the date
criteria and also do some operations for that specific date. But I am
struggling and it's not working as expected. I tried something as below but
it's not working. Can somebody guide me here please.

 to_date( substring('table_part_p2024_08_08' from
'_p(\d{4})_(\d{2})_(\d{2})'),  'YYYY_MM_DD'
) < current_date

Or is there any ready-made data dictionary which will give us the
partitions ordered by date, so that we can get hold of the specific nth
partition of that table?

Regards
Veem


Re: Getting specific partition from the partition name

2024-08-09 Thread veem v
This helps. Thank you very much.
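
Putting the two suggestions from the quoted reply below together, something
like this (using the pg_partman naming from the earlier mail; table_part_p
is a placeholder prefix) lists the partitions in date order:

SELECT table_name,
       to_date(substring(table_name FROM '([\d_]+)$'), 'YYYY_MM_DD') AS part_date
FROM information_schema.tables
WHERE table_name ~ '^table_part_p\d{4}_\d{2}_\d{2}$'
ORDER BY part_date DESC;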

On Fri, 9 Aug 2024 at 02:15, Greg Sabino Mullane  wrote:

> YYYY_MM_DD is already set up for sorting, so just do:
>
> SELECT table_name FROM information_schema.tables WHERE table_name ~
> 'table_part_p' ORDER BY 1 DESC;
>
> If you need to grab the numbers:
>
> SELECT substring('table_part_p2022_03_04' from '([\d_]+)$');
>
> Cheers,
> Greg
>
>


Re: Column type modification in big tables

2024-08-13 Thread veem v
On Tue, 13 Aug 2024 at 19:39, Greg Sabino Mullane 
wrote:

> On Sat, Aug 10, 2024 at 5:06 PM Lok P  wrote:
>
>
>> Can someone through some light , in case we get 5-6hrs downtime for this
>> change , then what method should we choose for this Alter operation?
>>
>
> We can't really answer that. Only you know what resources you have, what
> risk/reward you are willing to handle, and how long things may take. For
> that latter item, your best bet is to try this out on the same/similar
> hardware and see how long it takes. Do a smaller table and extrapolate if
> you need to. Or promote one of your replicas offline and modify that. I've
> given you a low-risk / medium-reward option with check constraints, but for
> the ALTER TABLE options you really need to try it and see (on non-prod).
>
>>
>>
What if the OP opts for a strategy something like below?
1) Detach the partitions
2) Alter the individual partitions with the required column type and length
3) Alter the table
4) Attach the partitions back to the main table

This should be faster and also handles each partition individually in a
controlled fashion.
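
A minimal sketch of that sequence for a single partition (big_tab, col1, the
bound values and the target type are placeholders; the same steps would be
repeated for every partition):

ALTER TABLE big_tab DETACH PARTITION big_tab_p2024_08_01;
-- ... repeat the detach for every partition ...

-- with no partitions attached, the parent alter has nothing to rewrite
ALTER TABLE big_tab ALTER COLUMN col1 TYPE varchar(100);

-- each detached partition is altered (and rewritten) independently
ALTER TABLE big_tab_p2024_08_01 ALTER COLUMN col1 TYPE varchar(100);

-- re-attach; a CHECK constraint matching the bound, added beforehand,
-- lets ATTACH skip the validation scan
ALTER TABLE big_tab ATTACH PARTITION big_tab_p2024_08_01
    FOR VALUES FROM ('2024-08-01') TO ('2024-08-02');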


Partitioning and unique key

2024-08-31 Thread veem v
Hello,

We have our transaction tables daily range partitioned based on
transaction_timestamp column which is timestamptz data type and these are
having composite primary key on (transaction_id, transaction_timestamp).
And we were using an "insert on conflict" for loading data to our system ,
which means if another record comes to the system with the same
transaction_id and transaction_timestamp, it will get updated. This way we
already have 60 days worth of data stored in our system with approx. 70
million transactions per day.

But we just got to know from business that the data should be unique by
only transaction_id but not transaction_timestamp. Any incoming data with
the same transaction_id(even different transaction_timestamp) should get
updated but not inserted.

Also these daily partitions are going to hold 400million rows in future and
will be queried on the transaction_timestamp filter so we can't really
avoid the partitioning option here considering future growth.

But due to a postgres limitation we are unable to have this unique
constraint or primary key only on the transaction_id column; we have to
include transaction_timestamp with it as a composite key. So I want to
understand from experts if there is any possible way to satisfy both
partitioning on the transaction_timestamp column and a unique key or PK on
transaction_id only?

Note - it's a postgres 15.4 database.
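
For reference, a minimal sketch of the limitation being described
(hypothetical DDL; the exact error wording may differ):

CREATE TABLE txn_demo (
    transaction_id        bigint,
    transaction_timestamp timestamptz NOT NULL,
    PRIMARY KEY (transaction_id)
) PARTITION BY RANGE (transaction_timestamp);
-- fails with an error along the lines of:
--   unique constraint on partitioned table must include all partitioning columns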


Regards

Veem


Re: Partitioning and unique key

2024-09-01 Thread veem v
On Sun, 1 Sept 2024 at 11:38, veem v  wrote:

>
> On Sun, 1 Sept 2024 at 10:03, veem v  wrote:
>
>>
>> On Sun, 1 Sept 2024 at 09:13, David G. Johnston <
>> david.g.johns...@gmail.com> wrote:
>>
>>> On Saturday, August 31, 2024, veem v  wrote:
>>>
>>>>
>>>>
>>>> iii)And then alter the datatype of the partition key transaction_date
>>>> to DATE in one shot at the table level(which should be fast as its having
>>>> more granularity as compare to existing timestamptype, so should be catalog
>>>> or dictionary change only), and that will remain the part of composite PK
>>>> (transaction_id,transaction_date).
>>>>
>>>
>>> While this might seem logical, in reality date and timestamptz are
>>> different fixed-width data types and thus any attempt to change from one to
>>> the other will involve a table rewrite.  Best you could do is leave the
>>> timestamptz in place and just truncate to day so the time is always
>>> midnight UTC.
>>>
>>>
>>>>
>>>>
>> Here  , if we keep the PK column as is i.e. the transaction_timestamp as
>> timestamptz but truncate the time component , in that case again in future
>> if someone tries to insert(using insert on conflict) data into the table
>> with time component , it will get consumed and will not be restricted by
>> the PK constraint. So I was trying to make the data type also as DATE for
>> the transaction_timestap column.
>>
>> As in this case anyway we have to create another column to populate the
>> date+timestamp values as we cant throw those values away per business need,
>> so we will be kind of rewriting the table.So is it okay if if we will
>>
>> 1) Detach all the partitions.
>> 2)Do the alter using "only" key word in table level. (For adding new
>> column transaction_timestamp_new  to hold date+timestamp value and also
>> altering the existing transaction_timestamp column to DATE from type
>> timestamptz).
>> 3)Then do the data fix(delete the duplicates) and alter the column, one
>> partition at a time for all of the partitions and once done , attach  those
>> partitions one by one.
>> 5)Rename the columns at table level.Hope this won't need any table
>> rewrite.
>>
>> Is there any downside if we go by the above approach?
>>
>
> Or do you mean to say there is no way we can modify the data type of a
> partition key even by detaching the partitions one by one? And thus we may
> have only way left is to create the table from scratch with partitions and
> populate the data to it? I was avoiding this because we have many indexes
> also in it , so creating from scratch means creating those indexes again.
> So I wanted to achieve it by detaching partitions, doing the required
> change and attaching it again.
>
>
I tried detaching the partitions, but even then it's not allowing me to
alter the DATA TYPE of the partition key, throwing the error below.

ERROR:  cannot alter column "" because it is part of
the partition key of relation ""

Now I am wondering whether it's really going to get too complex if we try to
stick with the partition detach-and-attach strategy. A few teammates suggest
adding a new column with just a DATE type, then dropping the existing FK and
PK first, then detaching all the partitions, attaching the partitions back
using the new DATE column, and then recreating the PK again. Btw, we have ~5
partitioned tables with parent-child relationships on which this fix has to
be applied. So I'm still wondering about the best possible way of fixing
this issue.


Re: Partitioning and unique key

2024-08-31 Thread veem v
On Sun, 1 Sept 2024 at 03:58, Adrian Klaver 
wrote:

>
> The model is at odds with itself and untenable. If the tables hold
> multiple rows for a given transaction_id  then you cannot have a
> PK/Unique constraint on that column. Seems there is a decided lack of
> any planning. The only way I can see this happening is consolidating all
> the duplicate transaction_id rows into a single row for each
> transaction_id. That then leads to the question of how to do that and
> retain the 'correct' information from the selection of rows for each
> transaction_id.
>
>
Yes, we have messed up the data for now and have multiple records persisted
for each transaction_id, and thus we need to fix the data. But more than
that, as I stated, I wanted to understand first
1) whether it's technically possible to have a unique key on only the
transaction_id column while having the partition key on
transaction_timestamp, because the table is going to be queried/purged based
on transaction_timestamp?

2) Additionally, we were thinking that if the above is technically not
possible, then the maximum granularity which we can have for each
transaction_id will be a day, so the partition key transaction_timestamp can
be truncated to have only the date component and no time component. So the
primary key will be (transaction_id, transaction_date). But we also don't
want to lose the time component, so we would persist the existing
transaction_timestamp data (which has a time component in it) in a separate
column.

And in the above case, for fixing the existing data in the least disruptive
way (as we currently have duplicate transaction_id values inserted into the
table because of the composite primary key (transaction_id,
transaction_timestamp)), can we simply
 i) rename the existing column transaction_timestamp to transaction_date and
then add a new column transaction_timestamp populated from the values of the
existing column, partition by partition;
ii) then delete the duplicate data using a query something like below,
partition by partition;
iii) then alter the data type of the partition key transaction_date to DATE
in one shot at the table level (which should be fast as it's of coarser
granularity compared to the existing timestamp type, so it should be a
catalog or dictionary change only), and that will remain part of the
composite PK (transaction_id, transaction_date);
iv) repeat these steps for all child partitioned tables and then for the
parent partitioned tables.

Will this technique be the most efficient way of fixing this mess?

WITH ranked_records AS (
    SELECT column1_id, column2_timestamptz,
           ROW_NUMBER() OVER (
               PARTITION BY column1_id, date_trunc('day', column2_timestamptz)
               ORDER BY column2_timestamptz DESC) AS rn
    FROM partition_name
)
DELETE FROM partition_name T1
WHERE EXISTS (
    SELECT 1
    FROM ranked_records T2
    WHERE T1.column1_id = T2.column1_id
      AND T1.column2_timestamptz = T2.column2_timestamptz
      AND T2.rn > 1
);


Re: Partitioning and unique key

2024-08-31 Thread veem v
On Sun, 1 Sept 2024 at 09:13, David G. Johnston 
wrote:

> On Saturday, August 31, 2024, veem v  wrote:
>
>>
>>
>> iii)And then alter the datatype of the partition key transaction_date to
>> DATE in one shot at the table level(which should be fast as its having more
>> granularity as compare to existing timestamptype, so should be catalog or
>> dictionary change only), and that will remain the part of composite PK
>> (transaction_id,transaction_date).
>>
>
> While this might seem logical, in reality date and timestamptz are
> different fixed-width data types and thus any attempt to change from one to
> the other will involve a table rewrite.  Best you could do is leave the
> timestamptz in place and just truncate to day so the time is always
> midnight UTC.
>
>
>>
>>
Here, if we keep the PK column as is, i.e. transaction_timestamp as
timestamptz but with the time component truncated, then again in future if
someone tries to insert (using insert on conflict) data into the table with
a time component, it will get consumed and will not be restricted by the PK
constraint. So I was trying to make the data type of the
transaction_timestamp column DATE as well.

As in this case we anyway have to create another column to hold the
date+timestamp values (we can't throw those values away per business need),
we will be kind of rewriting the table. So is it okay if we

1) Detach all the partitions.
2) Do the alter using the "only" keyword at the table level (for adding the
new column transaction_timestamp_new to hold the date+timestamp value and
also altering the existing transaction_timestamp column from timestamptz to
DATE).
3) Then do the data fix (delete the duplicates) and alter the column, one
partition at a time for all of the partitions, and once done, attach those
partitions one by one.
4) Rename the columns at the table level. Hope this won't need any table
rewrite.

Is there any downside if we go by the above approach?


Re: Partitioning and unique key

2024-08-31 Thread veem v
On Sun, 1 Sept 2024 at 10:03, veem v  wrote:

>
> On Sun, 1 Sept 2024 at 09:13, David G. Johnston <
> david.g.johns...@gmail.com> wrote:
>
>> On Saturday, August 31, 2024, veem v  wrote:
>>
>>>
>>>
>>> iii)And then alter the datatype of the partition key transaction_date to
>>> DATE in one shot at the table level(which should be fast as its having more
>>> granularity as compare to existing timestamptype, so should be catalog or
>>> dictionary change only), and that will remain the part of composite PK
>>> (transaction_id,transaction_date).
>>>
>>
>> While this might seem logical, in reality date and timestamptz are
>> different fixed-width data types and thus any attempt to change from one to
>> the other will involve a table rewrite.  Best you could do is leave the
>> timestamptz in place and just truncate to day so the time is always
>> midnight UTC.
>>
>>
>>>
>>>
> Here  , if we keep the PK column as is i.e. the transaction_timestamp as
> timestamptz but truncate the time component , in that case again in future
> if someone tries to insert(using insert on conflict) data into the table
> with time component , it will get consumed and will not be restricted by
> the PK constraint. So I was trying to make the data type also as DATE for
> the transaction_timestap column.
>
> As in this case anyway we have to create another column to populate the
> date+timestamp values as we cant throw those values away per business need,
> so we will be kind of rewriting the table.So is it okay if if we will
>
> 1) Detach all the partitions.
> 2)Do the alter using "only" key word in table level. (For adding new
> column transaction_timestamp_new  to hold date+timestamp value and also
> altering the existing transaction_timestamp column to DATE from type
> timestamptz).
> 3)Then do the data fix(delete the duplicates) and alter the column, one
> partition at a time for all of the partitions and once done , attach  those
> partitions one by one.
> 5)Rename the columns at table level.Hope this won't need any table rewrite.
>
> Is there any downside if we go by the above approach?
>

Or do you mean to say there is no way we can modify the data type of a
partition key even by detaching the partitions one by one? And thus the only
way left may be to create the table from scratch with partitions and
populate the data into it? I was avoiding this because we also have many
indexes on it, so creating from scratch means creating those indexes again.
So I wanted to achieve it by detaching partitions, doing the required change
and attaching them again.


Re: Partitioning and unique key

2024-09-02 Thread veem v
On Mon, 2 Sept 2024 at 19:13, Laurenz Albe  wrote:

> On Sun, 2024-09-01 at 01:32 +0530, veem v wrote:
> > due to postgres limitations we are unable to have this unique constraint
> or primary key
> > only on the transaction_id column, we have to include
> transaction_timestamp with it as
> > a composite key. So I want to understand from experts if there is any
> possible way to
> > satisfy both partitioning on transaction_timestamp column and unique key
> or pk just on
> > transaction_id only?
>
> No, you cannot have both.
>
> Usually the solution is to *not* create a primary key on the partitioned
> table
> and instead create a primary key on each partition.
>
> That won't guarantee global uniqueness (and there is no way to do that),
> but it
> goes a long way by ensuring that the column is unique within each
> partition.
>
> Yours,
> Laurenz Albe
>

Thank you so much.

So it means that in our case we should drop the existing table-level PK on
(txn_id, txn_timestamp) and create a unique index at the partition level
instead, and in the same way map the foreign keys to the parent table's
partitions. And in that case, can we follow it as a best practice to not
define primary keys at the table level at all, but only at the partition
level, or does that have any downside too? Please suggest.

Also, what I see is that it will make the data load query fail, which uses
"insert on conflict" to insert data into the table and requires the
table-level primary key on both columns. Also, the partition maintenance
job, which uses the partman extension, relies on the template table, which
in turn uses table-level properties for creating new partitions, and those
new partitions will not have these unique indexes created because the unique
index property is at the partition level, not the table level. Can you share
your thoughts on these?


Re: Partitioning and unique key

2024-09-02 Thread veem v
On Tue, 3 Sept 2024 at 01:14, Laurenz Albe  wrote:

>
> You can keep the primary key defined on both columns if it is good enough
> for you.
> But it will give you lower guarantees of uniqueness: with that primary
> key, there could
> be two rows with a different timestamp, but the same "txn_id", and these
> two rows could
> be in the same partition...
>
> Also, if you need a foreign key pointing *to* the partitioned table, you
> cannot do without
> a primary key.  But I recommend that you do *not* define such foreign
> keys: they will make
> it more difficult to detach a partition.
>
> If you partition two tables in the same way, you can use foreign keys
> between the partitions
> instead of foreign keys between the partitioned tables.  Such foreign keys
> won't be a problem.
>
>
Thank you so much.
As you rightly said, *"they will make it more difficult to detach a
partition"*, we are really seeing longer times when detaching parent table
partitions. It sometimes runs forever. So do you mean it's because we have
the primary key defined at the table level, or because we have FKs defined
at the table level (for multiple child tables which are also partitioned)?

 We were thinking it's because we have FKs defined at the table level, so we
were planning to move the FKs to the partition level. But as you just
pointed out, will keeping the PK at the table level also make the detach
partition slow? I understand that while detaching partitions it may be
scanning the whole child table because of the FK defined at the table level,
but I am unable to understand how the table-level PK impacts detaching a
partition from the parent here.

My understanding is that a PK can only be created at the table level, not at
the partition level; at the partition level we can only have a "unique
index" defined. Correct me if my understanding is wrong.


IO related waits

2024-09-16 Thread veem v
Hi,
One of our applications is using RDS postgres. In one of our streaming
applications (using Flink) which processes hundreds of millions of
transactions each day, we are using row-by-row transaction processing for
inserting data into the postgres database, and a commit is performed for
each row. We are seeing heavy IO:XactSynch wait events during the data load
and also high overall response time.

The architecture team is suggesting enabling async I/O if possible, so that
the streaming client will not wait for the commit confirmation from the
database. So I want to understand how async I/O can be enabled and whether
there are any downsides to doing this?

Regards
Veem


Re: IO related waits

2024-09-16 Thread veem v
On Tue, 17 Sept 2024 at 03:41, Adrian Klaver 
wrote:

>
> Are you referring to this?:
>
>
> https://nightlies.apache.org/flink/flink-docs-release-1.20/docs/dev/datastream/operators/asyncio/
>
> If not then you will need to be more specific.
>
>
Yes, I was referring to this one. So what can be the caveats of this
approach, considering the transactions are meant to be ACID compliant, being
financial transactions? Additionally, I was not aware of the parameter
"synchronous_commit" on the DB side, which will mimic an asynchronous
commit.

Would both of these mimic the same asynchronous behaviour and achieve the
same thing, i.e. the client data load throughput will increase because the
DB will not wait for the data to be written to the WAL before giving a
confirmation back to the client, and the client will not wait for the DB's
confirmation that the data has been persisted? Also, as the flushing of the
WAL to disk has to happen in the backend anyway (just that it will be
delayed now), can this method cause contention on the database storage side
if the speed at which data gets ingested from the client outpaces what gets
written to disk, and can it in some way impact data consistency for the read
queries?
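
If the database-side setting is the route taken, a minimal sketch of
relaxing it per session or per transaction (txn_stage is a placeholder
table):

-- session level: COMMIT returns before the WAL flush reaches disk
SET synchronous_commit = off;

-- or scoped to a single transaction
BEGIN;
SET LOCAL synchronous_commit = off;
INSERT INTO txn_stage VALUES (1, now());
COMMIT;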


Re: IO related waits

2024-09-17 Thread veem v
On Tue, 17 Sept 2024 at 18:43, Greg Sabino Mullane 
wrote:

>
> This is a better place to optimize. Batch many rows per transaction.
> Remove unused indexes.
>
> flushing of the WAL to the disk has to happen anyway(just that it will be
>> delayed now), so can this method cause contention in the database storage
>> side if the speed in which the data gets ingested from the client is not
>> getting written to the disk , and if it can someway impact the data
>> consistency for the read queries?
>>
>
> Not quite clear what you are asking here re data consistency. The data
> will always be consistent, even if synchronous_commit is disabled. The only
> danger window is on a server crash.
>
> (Keep in mind that RDS is not Postgres, so take tuning recommendations and
> advice with a grain of salt.)
>
>
>
Thank you Greg.

Yes, our Java application was doing row-by-row commits, and we saw that
from pg_stat_database, where the "xact_commit" column was close to the sum
of the tup_inserted, tup_updated and tup_deleted columns. We also verified
in pg_stat_statements that the "calls" column matched the "rows" column for
the INSERT queries, so we are inserting exactly the same number of rows as
the number of DB calls, which also suggests we are doing row-by-row
operations/DML.

 We then asked the application team to make the inserts in batches, but the
figures in the above two views are still not changing much: the
"xact_commit" number stays almost the same, and the "calls" and "rows"
columns in pg_stat_statements also stay almost the same. So does it mean
that the application code is somehow still doing the same row-by-row
processing, or are we doing something wrong in the above analysis?

Another thing we noticed: even after the data load finished, "xact_commit"
kept increasing along with "tup_fetched". Does that mean some kind of
implicit commit happens even for fetch-type queries, which must be "select"
queries I believe? Also, not sure if it's expected, but here in postgres I
have seen that unless we put code within a begin and end block, it gets
committed by default even if we just run it on the console; it doesn't ask
for an explicit commit/rollback, so not sure if that is somehow playing a
role here.
Regards
Veem





Question on pg_stat* views

2024-10-06 Thread veem v
Hi,
As per my understanding, pg_stat_statements stores the aggregated
statistics of the query execution history (in near real time), and the max
number of statements or queries it can store depends on the
pg_stat_statements.max value (which I am seeing as ~5000 in pg_settings). I
have the below questions.

1) Doing a count(*) on pg_stat_statements gives ~4818, but
pg_stat_statements_info still shows ~1990 as "dealloc", which means more SQL
queries are coming in and getting flushed out of pg_stat_statements. So does
that mean we should increase pg_stat_statements.max to a higher value?

2) The stats_reset column in the pg_stat_statements_info view shows 16th
August. Does that mean whatever query stats are logged in
pg_stat_statements are the aggregated stats for all executions that have
happened from 16th August until today? Also, as we have not done any "stats
reset" manually, does it happen automatically when the DB instance restarts
or with any other database events?

3) As pg_stat_statements holds the aggregated stats of all executions for a
particular SQL query, it's not easy to identify whether at some point in the
past the same query suffered and had a high response time. So to debug such
performance-issue scenarios, is it advisable to insert the records from
these pg_stat* views into another table periodically through a cron job?
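
If you do go the snapshot route, a minimal sketch (hypothetical table name)
of what a periodic cron/pg_cron job could run:

-- one-time setup: same shape as the view plus a capture timestamp
CREATE TABLE pg_stat_statements_snap AS
SELECT now() AS snap_time, *
FROM pg_stat_statements
WHERE false;

-- periodic job: append the current aggregates
INSERT INTO pg_stat_statements_snap
SELECT now(), *
FROM pg_stat_statements;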

Regards
Veem


Re: IO related waits

2024-09-18 Thread veem v
On Wed, 18 Sept 2024 at 05:07, Adrian Klaver 
wrote:

> On 9/17/24 12:34, veem v wrote:
> >
>
> It does if autocommit is set in the client, that is common to other
> databases also:
>
> https://dev.mysql.com/doc/refman/8.4/en/commit.html
>
>
> https://docs.oracle.com/en/database/oracle/developer-tools-for-vscode/getting-started/disabling-and-enabling-auto-commit.html
>
>
> https://learn.microsoft.com/en-us/sql/t-sql/statements/set-implicit-transactions-transact-sql?view=sql-server-ver16
>
> You probably need to take a closer look at the client/driver you are
> using and the code that interacting with it.
>
> In fact I would say you need to review the entire data transfer process
> to see if there are performance gains that can be obtained without
> adding an entirely new async component.
>
>
>
You were spot on. When we turned off "auto commit" we started seeing fewer
commits, in line with the number of batches.

However, we also started seeing deadlock issues. We have foreign key
relationships between the tables, and within each batch we insert into the
parent first and then into the child, but this happens from multiple
sessions for different batches. So why do we see the error below, given that
in each batch we ensure we first insert into the parent and then into the
child tables?

caused by: org.postgresql.util.PSQLException: ERROR: deadlock detected
  Detail: Process 10443 waits for ShareLock on transaction 220972157;
blocked by process 10454.
Process 10454 waits for ShareLock on transaction 220972155; blocked by
process 10443.
  Hint: See server log for query details.
  Where: while inserting index tuple (88736,28) in relation
"TAB1_p2024_08_29"


Re: IO related waits

2024-09-18 Thread veem v
On Thu, 19 Sept 2024 at 02:01, veem v  wrote:

>
> On Wed, 18 Sept 2024 at 05:07, Adrian Klaver 
> wrote:
>
>> On 9/17/24 12:34, veem v wrote:
>> >
>>
>> It does if autocommit is set in the client, that is common to other
>> databases also:
>>
>> https://dev.mysql.com/doc/refman/8.4/en/commit.html
>>
>>
>> https://docs.oracle.com/en/database/oracle/developer-tools-for-vscode/getting-started/disabling-and-enabling-auto-commit.html
>>
>>
>> https://learn.microsoft.com/en-us/sql/t-sql/statements/set-implicit-transactions-transact-sql?view=sql-server-ver16
>>
>> You probably need to take a closer look at the client/driver you are
>> using and the code that interacting with it.
>>
>> In fact I would say you need to review the entire data transfer process
>> to see if there are performance gains that can be obtained without
>> adding an entirely new async component.
>>
>>
>>
> You were spot on. When we turned off the "auto commit" we started seeing
> less number of commits as per the number of batches.
>
> However we also started seeing deadlock issues. We have foreign key
> relationships between the tables and during the batch we do insert into the
> parent first and then to the child , but this does happen from multiple
> sessions for different batches. So why do we see below error, as we
> ensure in each batch we first insert into parent and then into the child
> tables?
>
> caused by: org.postgresql.util.PSQLException: ERROR: deadlock detected
>   Detail: Process 10443 waits for ShareLock on transaction 220972157;
> blocked by process 10454.
> Process 10454 waits for ShareLock on transaction 220972155; blocked by
> process 10443.
>   Hint: See server log for query details.
>   Where: while inserting index tuple (88736,28) in relation
> "TAB1_p2024_08_29"
>

As we are able to get hold of one session, we see "insert into " was blocked by "insert into ". And
the "insert into   " was experiencing a "client
read" wait event. Still unable to understand why it's happening and how to
fix it?


Re: IO related waits

2024-09-21 Thread veem v
On Sat, 21 Sept 2024 at 03:47, Peter J. Holzer  wrote:

> On 2024-09-20 14:11:38 -0700, Adrian Klaver wrote:
> > On 9/20/24 1:01 PM, veem v wrote:
> > > Able to reproduce this deadlock graph as below.  Now my question is ,
> > > this is a legitimate scenario in which the same ID can get inserted
> from
> > > multiple sessions and in such cases it's expected to skip that (thus
> "On
> > > conflict Do nothing" is used) row. But as we see it's breaking the code
> >
> > Yeah, as I see it that would not work with concurrent uncommitted
> sessions
> > as it would be unresolved whether a conflict actually exists until at
> least
> > one of the sessions completes.
> >
> > > with deadlock error during race conditions where a lot of parallel
> > > threads are operating. So how should we handle this scenario? Will
> > > setting the "lock_timeout" parameter at session level will help us
> > > anyway here?
> >
> > Serializable transaction?:
> >
> >
> https://www.postgresql.org/docs/current/transaction-iso.html#XACT-SERIALIZABLE
>
> Doesn't help here, at least not directly. It would help indirectly
> because isolation level serializable makes it very probable that
> serialization errors occur. So an application designed for serializable
> would have some kind of retry logic already in place.
>
> SO that leads as to another solution:
>
> Retry each batch (possibly after reducing the batch size) until it
> succeeds.
>
>
Actually, here the application uses Kafka and Flink streaming and reuses
existing code which was doing row-by-row commits and has now been changed to
batch commits, i.e. the commit point is shifted from row-by-row to batch.
There are multiple sessions spawning at the same time to process the
incoming messages 24/7. And also, as described in another ongoing thread and
as we saw in the past, we did not see much difference between "batch commit"
and "batch insert" performance; we only realized that row-by-row commit has
worse performance.

Now, in such a scenario, when the messages are coming from a streaming
framework using Kafka + Flink and the inserts still happen row by row (just
the commit is shifted to batch), I don't see any way to sort the ID columns
in this streaming process so that they won't overlap across sessions.

In such a situation, the only way seems to be to replay the messages for
which the deadlock error happens, as I think during a deadlock error one
session gets terminated by postgres, and those messages we could perhaps
save in some exception table and then replay?

