Re: Read write performance check

2023-12-19 Thread Lok P
As Rob mentioned, the syntax you posted is not correct. You need to process
or read a certain batch of rows, say 1000 or 10k at a time, not all 100M in
one shot.

But again, your use case seems a common one, considering you want to compare
the read and write performance on multiple databases with a similar table
structure. So you may want to use test scripts which others must have already
written rather than reinventing the wheel.


On Wed, 20 Dec, 2023, 10:19 am veem v,  wrote:

> Thank you.
>
> Yes, actually we are trying to compare and see what maximum TPS are we
> able to reach with both of these row by row and batch read/write test. And
> then afterwards, this figure may be compared with other databases etc with
> similar setups.
>
>  So wanted to understand from experts here, if this approach is fine? Or
> some other approach is advisable?
>
> I agree to the point that , network will play a role in real world app,
> but here, we are mainly wanted to see the database capability, as network
> will always play a similar kind of role across all databases. Do you
> suggest some other approach to achieve this objective?
>
>
> On Wed, 20 Dec, 2023, 2:42 am Peter J. Holzer,  wrote:
>
>> On 2023-12-20 00:44:48 +0530, veem v wrote:
>> >  So at first, we need to populate the base tables with the necessary
>> data (say
>> > 100million rows) with required skewness using random functions to
>> generate the
>> > variation in the values of different data types. Then in case of row by
>> row
>> > write/read test , we can traverse in a cursor loop. and in case of
>> batch write/
>> > insert , we need to traverse in a bulk collect loop. Something like
>> below and
>> > then this code can be wrapped into a procedure and passed to the
>> pgbench and
>> > executed from there. Please correct me if I'm wrong.
>>
>> One important point to consider for benchmarks is that your benchmark
>> has to be similar to the real application to be useful. If your real
>> application runs on a different node and connects to the database over
>> the network, a benchmark running within a stored procedure may not be
>> very indicative of real performance.
>>
>> hp
>>
>> --
>>_  | Peter J. Holzer| Story must make more sense than reality.
>> |_|_) ||
>> | |   | h...@hjp.at |-- Charles Stross, "Creative writing
>> __/   | http://www.hjp.at/ |   challenge!"
>>
>


Re: Read write performance check

2023-12-22 Thread Lok P
As I mentioned, your scenario looks like a generic one, but I don't have any
sample scripts/docs to share, sorry for that. Others may suggest sample
scripts if they have any, or you may post on the performance list in case
someone has done similar work in the past.

But in my view, the performance test scripts will look like 1) a row-by-row
insert/select in a cursor loop, as you were showing earlier in this thread,
and 2) another one with batch/bulk inserts, and then calling them through
pgbench or any other scheduler to create concurrency.
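
To make the idea concrete, below is a rough sketch of the two flavours (the
table, column and procedure names are made up for illustration; this is not a
ready-made benchmark):

-- assumed base table:  create table perf_tab(id bigint, val text);

-- 1) row-by-row style
create or replace procedure load_row_by_row(p_rows int)
language plpgsql
as $$
declare
  i int;
begin
  for i in 1 .. p_rows loop
    insert into perf_tab values (i, md5(random()::text));
  end loop;
end;
$$;

-- 2) batch/bulk style
create or replace procedure load_batch(p_rows int)
language plpgsql
as $$
begin
  insert into perf_tab
  select g, md5(random()::text)
  from generate_series(1, p_rows) g;
end;
$$;

-- driven concurrently via pgbench with a custom script file, e.g.
--   echo "CALL load_batch(1000);" > batch.sql
--   pgbench -n -c 8 -j 8 -T 300 -f batch.sql mydb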

On Thu, Dec 21, 2023 at 7:00 PM veem v  wrote:

> Can someone please guide me, if any standard scripting is available for
> doing such read/write performance test? Or point me to any available docs?
>
> On Wed, 20 Dec, 2023, 10:39 am veem v,  wrote:
>
>> Thank you.
>>
>> That would really be helpful if such test scripts or similar setups are
>> already available. Can someone please guide me to some docs or blogs or
>> sample scripts, on same please.
>>
>> On Wed, 20 Dec, 2023, 10:34 am Lok P,  wrote:
>>
>>> As Rob mentioned, the syntax you posted is not correct. You need to
>>> process or read a certain batch of rows, say 1000 or 10k at a time, not
>>> all 100M in one shot.
>>>
>>> But again, your use case seems a common one, considering you want to
>>> compare the read and write performance on multiple databases with a
>>> similar table structure. So you may want to use test scripts which others
>>> must have already written rather than reinventing the wheel.
>>>
>>>
>>> On Wed, 20 Dec, 2023, 10:19 am veem v,  wrote:
>>>
>>>> Thank you.
>>>>
>>>> Yes, actually we are trying to compare and see what maximum TPS are we
>>>> able to reach with both of these row by row and batch read/write test. And
>>>> then afterwards, this figure may be compared with other databases etc with
>>>> similar setups.
>>>>
>>>>  So wanted to understand from experts here, if this approach is fine?
>>>> Or some other approach is advisable?
>>>>
>>>> I agree to the point that , network will play a role in real world app,
>>>> but here, we are mainly wanted to see the database capability, as network
>>>> will always play a similar kind of role across all databases. Do you
>>>> suggest some other approach to achieve this objective?
>>>>
>>>>
>>>> On Wed, 20 Dec, 2023, 2:42 am Peter J. Holzer, 
>>>> wrote:
>>>>
>>>>> On 2023-12-20 00:44:48 +0530, veem v wrote:
>>>>> >  So at first, we need to populate the base tables with the necessary
>>>>> data (say
>>>>> > 100million rows) with required skewness using random functions to
>>>>> generate the
>>>>> > variation in the values of different data types. Then in case of row
>>>>> by row
>>>>> > write/read test , we can traverse in a cursor loop. and in case of
>>>>> batch write/
>>>>> > insert , we need to traverse in a bulk collect loop. Something like
>>>>> below and
>>>>> > then this code can be wrapped into a procedure and passed to the
>>>>> pgbench and
>>>>> > executed from there. Please correct me if I'm wrong.
>>>>>
>>>>> One important point to consider for benchmarks is that your benchmark
>>>>> has to be similar to the real application to be useful. If your real
>>>>> application runs on a different node and connects to the database over
>>>>> the network, a benchmark running within a stored procedure may not be
>>>>> very indicative of real performance.
>>>>>
>>>>> hp
>>>>>
>>>>> --
>>>>>_  | Peter J. Holzer| Story must make more sense than reality.
>>>>> |_|_) ||
>>>>> | |   | h...@hjp.at |-- Charles Stross, "Creative writing
>>>>> __/   | http://www.hjp.at/ |   challenge!"
>>>>>
>>>>


How to do faster DML

2024-02-03 Thread Lok P
Hello All,
We have a non-partitioned table with ~4.8 billion rows, a data size of
~1.4TB, a row size of ~313 bytes and ~127 columns. It has got approximately
~1 billion+ duplicate rows inserted into it, and we want to get the duplicate
data removed and then create a PK/unique constraint back so as to not have
duplicate values in future. We are struggling to do the same.

Teammates suggested doing this using the CTAS method, i.e. create a new table
with the unique record set and then drop the main table. Something like below:

create table TAB1_New
as
SELECT  * from TAB1 A
where ID in
  (select min(ID) from TAB1
  group by ID having count(ID)>=1 );

But for the above to work faster, they suggested creating an index on the
column on which the duplicate check will be performed, i.e. the ID column.
Creating that index itself took ~2 hrs+, and the index size now shows as
~116GB.


Create index idx1 on TAB1(ID)

And now running the SELECT subquery part of the CTAS statement to see if it's
giving the correct unique record count: it ran for 2.5 hrs and then we killed
it. Below is the plan for the same.

explain
select min(ID) from TAB1 A group by ID having count(ID)>=1

GroupAggregate  (cost=0.71..6025790113.87 rows=29432861 width=46)
  Group Key: ID
  Filter: (count(ID) >= 1)
  ->  Index Only Scan using idx1 on TAB1 a  (cost=0.71..5988060903.17
rows=4883397120 width=14)

I want to understand if this can be done faster in any way. I am also worried
that creating the PK constraint/index back after deleting the duplicates is
going to run forever. Is there any way we can make these heavy operations
faster on Postgres by giving the database more resources through some
parameter setup, like a parallel hint etc.? We have the pg_hint_plan
extension added, but we are not seeing the parallel hint enforced when adding
it to the query.

In Oracle we have parallel hints, direct path read/write for faster
read/write operations, parallel index scans etc. Is anything similar
available in Aurora Postgres to provide more horsepower and speed up these
batch operations? And how can we monitor the progress of any running query?
Just to note: it's a db.r7g.8xlarge Aurora Postgres instance, 32 vCPU, 256GB
RAM, PG version 15.4.
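
(For context, this is the kind of monitoring I have found so far: there is no
generic per-query progress view, but pg_stat_activity shows what is currently
running, and a few operations expose dedicated progress views. A sketch,
assuming these standard views behave the same on Aurora:)

-- what is running right now, and for how long
select pid, state, now() - query_start as runtime, left(query, 80) as query
from pg_stat_activity
where state <> 'idle'
order by runtime desc;

-- progress of a CREATE INDEX running in another session
select pid, phase, blocks_done, blocks_total, tuples_done, tuples_total
from pg_stat_progress_create_index;

-- progress of a manual or automatic ANALYZE
select pid, phase, sample_blks_scanned, sample_blks_total
from pg_stat_progress_analyze;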

Regards
Lok


Re: How to do faster DML

2024-02-03 Thread Lok P
Apologies. One correction: the query is like below, i.e. the filter will be
on ctid, which I believe is the equivalent of rowid in Oracle, so we will not
need the index on the ID column then.

But it still runs long, so is there any other way to make the duplicate
removal faster?

Also wondering: the index creation which took ~2.5 hrs+, could that have been
made faster in any way by allowing more DB resources through some
session-level parameter settings?

create table TAB1_New
as
SELECT  * from TAB1 A
where CTID in
  (select min(CTID) from TAB1
  group by ID having count(ID)>=1 );
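
(For comparison, one alternative form I am considering, a minimal sketch
assuming the ID column alone defines a duplicate: it keeps the first physical
row per ID in a single pass instead of the IN subquery, though it still has
to sort the whole table. Not tested at this scale.)

create table TAB1_New
as
select distinct on (ID) *
from TAB1
order by ID, CTID;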


On Sat, Feb 3, 2024 at 5:50 PM Lok P  wrote:

> Hello All,
> A non partitioned table having ~4.8 billion rows in it and having data
> size as ~1.4TB , row size as ~313 bytes having ~127 columns in it. This has
> got approx ~1billion+ duplicate rows inserted in it and we want to get the
> duplicate data removed for this table and create a PK/unique constraint
> back so as to not have the duplicate values in future. We are struggling to
> do the same.
>
> Teammates suggested doing this using CTAS method, i.e. create a new table
> with the unique record set and then drop the main table. Something as below
>
> create table TAB1_New
> as
> SELECT  * from TAB1 A
> where ID in
>   (select min(ID) from TAB1
>   group by ID having count(ID)>=1 );
>
> But for the above to work faster , they mentioned to have an index created
> on the column using which the duplicate check will be performed i.e ID
> column. So, creating the index itself took ~2hrs+ and the index size now
> shows as ~116GB.
>
>
> Create index idx1 on TAB1(ID)
> And now running the SELECT subquery part of the CTAS statement to see if
> its giving the correct unique records count. It ran for 2.5 hrs and then we
> killed it. Below is the plan for the same.
>
> explain
> select min(ID) from TAB1 A group by ID having count(ID)>=1
>
> GroupAggregate  (cost=0.71..6025790113.87 rows=29432861 width=46)
>   Group Key: ID
>   Filter: (count(ID) >= 1)
>   ->  Index Only Scan using idx1 on TAB1 a  (cost=0.71..5988060903.17
> rows=4883397120 width=14)
>
> I want to understand if by any way this can be done faster . Also I am
> worried that creating PK constraint/index back after deleting the duplicate
> is also going to run forever. Is there any way we can make these heavy
> operations faster on postgre by facilitating more database resources
> through some parameter setup, like parallel hint etc? We have pg_hint_plan
> extension added, but not seeing the parallel hint enforced when adding it
> to the query.
>
> In Oracle we have Parallel hints, Direct path read/write for faster
> read/write operations, parallel index scan etc. available, if anything
> similar to that available in aurora postgre to facilitate more
> horsepower and speed up the batch operations. And , how can we monitor
> progress of any running query ?
>  Just to note- It's a db.r7g.8xlarge aurora postgres instance, 32VCPU,
> 256GB RAM. PG version 15.4.
>
> Regards
> Lok
>


Re: How to do faster DML

2024-02-03 Thread Lok P
On Sat, Feb 3, 2024 at 7:37 PM Ron Johnson  wrote:

> On Sat, Feb 3, 2024 at 8:55 AM Lok P  wrote:
>
>> Apology. One correction, the query is like below. I. E filter will be on
>> on ctid which I believe is equivalent of rowid in oracle and we will not
>> need the index on Id column then.
>>
>>  But, it still runs long, so thinking any other way to make the duplicate
>> removal faster?
>>
>> Also wondering , the index creation which took ~2.5hrs+ , would that have
>> been made faster any possible way by allowing more db resource through some
>> session level db parameter setting?
>>
>> create table TAB1_New
>> as
>> SELECT  * from TAB1 A
>> where CTID in
>>   (select min(CTID) from TAB1
>>   group by ID having count(ID)>=1 );
>>
>>
>> On Sat, Feb 3, 2024 at 5:50 PM Lok P  wrote:
>>
>>> Hello All,
>>> A non partitioned table having ~4.8 billion rows in it and having data
>>> size as ~1.4TB , row size as ~313 bytes having ~127 columns in it. This has
>>> got approx ~1billion+ duplicate rows inserted in it and we want to get the
>>> duplicate data removed for this table and create a PK/unique constraint
>>> back so as to not have the duplicate values in future. We are struggling to
>>> do the same.
>>>
>>> Teammates suggested doing this using CTAS method, i.e. create a new
>>> table with the unique record set and then drop the main table. Something as
>>> below
>>>
>>> create table TAB1_New
>>> as
>>> SELECT  * from TAB1 A
>>> where ID in
>>>   (select min(ID) from TAB1
>>>   group by ID having count(ID)>=1 );
>>>
>>> But for the above to work faster , they mentioned to have an index
>>> created on the column using which the duplicate check will be performed i.e
>>> ID column. So, creating the index itself took ~2hrs+ and the index size now
>>> shows as ~116GB.
>>>
>>>
>>> Create index idx1 on TAB1(ID)
>>> And now running the SELECT subquery part of the CTAS statement to see if
>>> its giving the correct unique records count. It ran for 2.5 hrs and then we
>>> killed it. Below is the plan for the same.
>>>
>>> explain
>>> select min(ID) from TAB1 A group by ID having count(ID)>=1
>>>
>>> GroupAggregate  (cost=0.71..6025790113.87 rows=29432861 width=46)
>>>   Group Key: ID
>>>   Filter: (count(ID) >= 1)
>>>   ->  Index Only Scan using idx1 on TAB1 a  (cost=0.71..5988060903.17
>>> rows=4883397120 width=14)
>>>
>>> I want to understand if by any way this can be done faster . Also I am
>>> worried that creating PK constraint/index back after deleting the duplicate
>>> is also going to run forever. Is there any way we can make these heavy
>>> operations faster on postgre by facilitating more database resources
>>> through some parameter setup, like parallel hint etc? We have pg_hint_plan
>>> extension added, but not seeing the parallel hint enforced when adding it
>>> to the query.
>>>
>>> In Oracle we have Parallel hints, Direct path read/write for faster
>>> read/write operations, parallel index scan etc. available, if anything
>>> similar to that available in aurora postgre to facilitate more
>>> horsepower and speed up the batch operations. And , how can we monitor
>>> progress of any running query ?
>>>  Just to note- It's a db.r7g.8xlarge aurora postgres instance, 32VCPU,
>>> 256GB RAM. PG version 15.4.
>>>
>>
>  Aurora is not Postgresql, so configurations might not work.  Having said
> that...
> https://www.postgresql.org/docs/15/how-parallel-query-works.html
>
> And have you analyzed the table lately?  Also, what's your work_mem
> and maintenance_work_mem?
>

 Thank you .

Below are the values of the default parameters in this instance

SHOW max_worker_processes; - 128
show max_parallel_workers_per_gather;- 4
show max_parallel_workers;- 32
show max_parallel_maintenance_workers; - 2
show maintenance_work_mem; - 4155MB
show work_mem; - 8MB
show shared_buffers ; -22029684

When I ran the CTAS queries and the index creation, I did not have a very
clear idea of how these parameters relate to each other and help each of the
operations, but I set a few of them as below before triggering those in the
same session.

set max_parallel_workers_per_gather=16;
SET max_parallel_maintenance_workers TO 16;
SET maintenance_work_mem TO '16 GB';

The instance has a total of ~256 GB memory, so how should I adjust/bump these
values when running heavy SELECT queries doing a large sequential scan, OR a
large index creation process, OR any SELECT query with heavy sorting/"order
by" operations OR heavy JOINs?

I have not analyzed the table manually, but seeing the autovacuum and
autoanalyze columns getting populated in pg_stat_user_tables, I assumed it
must be doing that automatically.

By the way, if we run "analyze tab1" on this 1.5TB table, will that run for a
long time, and will any of the above parameters help to expedite that ANALYZE
operation if I run it manually?

Regards
Lok

>


Re: How to do faster DML

2024-02-03 Thread Lok P
On Sun, Feb 4, 2024 at 12:50 AM Francisco Olarte 
wrote:

> On Sat, 3 Feb 2024 at 19:29, Greg Sabino Mullane 
> wrote:
> ...
> > Given the size of your table, you probably want to divide that up.
> > As long as nothing is changing the original table, you could do:
> >
> > insert into mytable2 select * from mytable1 order by ctid limit
> 10_000_000 offset 0;
> > insert into mytable2 select * from mytable1 order by ctid limit
> 10_000_000 offset 10_000_000;
> > insert into mytable2 select * from mytable1 order by ctid limit
> 10_000_000 offset 20_000_000;
>
> Is it me or does this have the faint smell of quadratic behaviour? I
> mean, you need to read and discard the first 10M to do offset 10M (
> also I believe ctid order is implied in sequential scan, but no so
> sure, if not it would need a full table sort on each pass ).
>
> When doing things like this, I normally have some kind of unique index
> and do it by selecting with limit above the last read value( stored
> when doing it via code, as I normally load first, index later so I
> cannot select max() fast on the target ). Or, with some kind of
> "sparse" index (meaning, groups much less than the batch size ) and a
> little code you can select where index_col > last order by index_col
> limit 10M, remember last received index_col and reselect discarding
> missing ( or just reselect it when doing your on conflict do nothing,
> which also has a fast select max(id) due to the PK, it will work if it
> has an index on id column on the original even if not unique ) to
> avoid that.
>
> Also, I'm not sure if ctid is ordered and you can select where
> ctid>last ordered, if that works it probably is faster for immutable
> origins.
>
> Francisco Olarte.



Thank you.

Yes, I think the "on conflict do nothing" option looks promising, as it will
drop the duplicates on PK violation but keep the load going for the
subsequent rows.

However, as we have ~5 billion rows in the base table, we were expecting
almost half, i.e. ~2 billion, to be duplicates. And as you said, doing the
inserts using the "order by CTID offset" approach would cause one full
sequential scan of the whole table for loading each 10M-row chunk, and that
would take a long time, I believe.

I am still trying to understand the other approach which you suggested. I am
not able to fully follow "you can select where index_col > last order by
index_col limit 10M". However, to get the max ID value of the last 10M loaded
rows in the target, do you mean that having a PK index created on that target
table column (ID) will help, and that we can save the max(ID) value in
another table and keep loading from the source table (as ID > the max_ID
stored in the temp table)?
OR
Would it be better to do it in one shot only , but by setting a higher
value of some parameters like "maintenance_work_mem" or
"max_parallel_workers"?


Re: How to do faster DML

2024-02-04 Thread Lok P
On Sun, Feb 4, 2024 at 8:14 PM Dennis White  wrote:

> I'm surprised no one has mentioned perhaps it's a good idea to partition
> this table while adding the pk. By your own statements the table is
> difficult to work with as is. Without partitioning the table, row inserts
> would need to walk the pk index and could be a factor. If this is static
> table then perhaps that's ok but if not...
>
> Anyway I don't recall what type the ID was or how it's set but i suggest
> you seriously investigate using it to partition this table into manageable
> smaller tables.
> Best of luck.
>
> On Sun, Feb 4, 2024, 8:00 AM Marcos Pegoraro  wrote:
>
>> insert into mytable2 select * from mytable1 order by ctid limit
>>> 10_000_000 offset 20_000_000;
>>>
>>
>> You can use min/max values grouping them by 10_000_000 records, so you
>> don´t need that offset, then generate commands and run them.
>>
>> select format('insert into mytable2 select * from mytable1 where i
>> between %s and %s;', min(i), max(i)) from t group by i/10_000_000;
>>
>
Thank you so much.

You are correct. It is genuinely difficult to operate on this table. Every
read query is struggling, so partitioning is something we must think about,
hoping that Postgres will be able to handle this scale with a proper
partitioning and indexing strategy.

I have a few related questions.

1) Even after partitioning the target table, to speed up the data load, is
there an option to disable the primary and foreign keys and re-enable them
after the data load finishes? Will that be a good idea, or will it be
cumbersome/resource intensive to re-enable the constraints after persisting
all the data in the table?

2) I understand there is no theoretical limit on the number or size of
partitions a table can have in Postgres. But I want to know from the experts
here, from their real-life experience, whether there is anything we should
take care of before deciding the partitioning strategy, so as to stay within
the practical soft limits (both with respect to size and number of
partitions).
Note: currently this table will be around ~1TB in size and will hold approx
~3 billion rows (post removal of duplicates). But as per business needs it
may grow up to ~45 billion rows in future.

3) As the size of the table (or each partition) is going to be very large and
this will be a read-intensive application, compressing the historical
partitions would help us save storage space and would also help read query
performance. So, can you please throw some light on the compression strategy
we should follow here (considering a daily range-partitioned table based on
transaction_date as the partition key)?

Regards
Lok


Re: How to do faster DML

2024-02-04 Thread Lok P
On Sun, Feb 4, 2024 at 9:18 PM Greg Sabino Mullane 
wrote:

> Partitioning will depend a lot on how you access the data. Is it always
> using that transaction_date column in the where clause? Can you share the
> table definition via \d? We've talked about this table quite a bit, but not
> actually seen what it looks like. We are flying blind a little bit. You
> mentioned your queries are slow, but can you share an explain analyze on
> one of these slow queries?
>
> 45 billion rows is quite manageable. How many rows are in each day? You
> may want to do a larger bucket than 24 hours per partition.
>
> 1)Even after partitioning the target table , to speed up the data load on
>> this table , Is there an option to disable the primary and foreign keys and
>> re-enable them post data load finishes. Will that be a good idea
>>
>
> No.
>
> 3)As the size of the table or each partition is going to be very large and
>> this will be a read intensive application, compressing the historical
>> partition will help us save the storage space and will also help the read
>> queries performance
>
>
> I am not sure what this means. If you are not going to need the data
> anymore, dump the data to deep storage and drop the partition.
>
> Cheers,
> Greg
>
>

Thank you.

The table has ~127 columns of different data types, a combination of numeric,
varchar, date etc. It currently has a size of ~1TB, holds ~3 billion rows,
and the row size is ~300 bytes.

Currently it has a smaller volume, but in future the daily transactions
inserted into this table will be a max of ~500 million rows/day. The plan is
to persist at least ~3 months of transaction data, which will be around 45
billion rows in total. And if all works well, we may need to persist ~6
months' worth of data in this database in future, and that will be ~90
billion rows.

This table will always be queried with the transaction_date column as one of
the filter criteria. But the querying/search criteria can span from a day to
a month's worth of transaction_date data.

When you said "You may want to do a larger bucket than 24 hours per
partition", do you mean partitioning by week or so? Currently, as per the
math, i.e. 1TB of storage for ~3 billion rows, the daily range partition (to
hold ~500 million transactions/day) will be around ~100-150GB. Won't that be
too much data for a single partition to operate on, and won't making the
bucket even larger (say weekly) make the partitions bulkier still?

What I meant was, we will definitely need the data for querying purposes by
the users, but just to keep the storage space in control (and to help the
read queries), we were thinking of having the historical partitions
compressed. Is there any compression strategy we should follow for that on
Postgres?

With regards to loading data into the table faster, I was wondering why you
said 'NO' to the approach of loading the data first and enabling/creating the
primary key and foreign key constraints later. Because this table is a child
table and the parent already has data in it, loading data into this table in
the presence of the PK and FK makes it very slow, as those are validated for
each set of rows. So we were thinking that doing it at a later stage in one
shot would be a faster approach. Please suggest.

I will try to collect some SELECT queries and post the EXPLAIN ANALYZE
output. Currently we are trying to get rid of the duplicates.

Regards
Lok

>


Re: How to do faster DML

2024-02-04 Thread Lok P
On Sun, Feb 4, 2024 at 9:25 PM Ron Johnson  wrote:

> 1. Load the children before attaching them to the parent.
> 2. Create the child indices, PK and FKs before attaching to the parent.
> 3. Do step 2 in multiple parallel jobs.  (cron is your friend.)
> 4. Attach the children to the "naked" (no PK, no FK, no indices) parent.
> 5. Use the ONLY keyword on CREATE INDEX and ALTER TABLE to add the
> indices, PK and FK to the parent *after* step 4.
>
>>
>>
>>
>>
Thank You.

Can you please help me understand these steps in a bit more detail?

Say we have a parent table that already has data in it and a primary key
defined. We will load the child table first, dropping its primary key and
foreign keys so that the data load will be faster.

Then we will create the primary key index on the child table. When you said
to use multiple parallel jobs, do you mean creating the PK index on each
partition separately from different sessions, rather than creating it on the
table with a single statement, or is there some other faster way to create
the PK index?

Now the last step is attaching the PK and FK of the above child table to the
parent. This will validate each and every row for uniqueness and also ensure
the foreign key value is present in the parent table. Won't this step take a
long time?
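
(Restating steps 2 and 3 as I understand them, as a sketch with made-up
table/partition names; this is the plain-index form, since PostgreSQL 11+
lets a parent index become valid once a matching index is attached for every
partition. Whether the PK constraint itself can be added the same way is
something I still need to confirm.)

-- index shell on the partitioned (child) table only; it stays invalid for now
create index child_tab_txn_idx on only child_tab (txn_id, txn_timestamp);

-- build the matching index on each partition, from separate sessions/jobs
create index child_tab_p1_txn_idx on child_tab_p1 (txn_id, txn_timestamp);
create index child_tab_p2_txn_idx on child_tab_p2 (txn_id, txn_timestamp);

-- attach them; once every partition's index is attached, the parent index
-- becomes valid automatically
alter index child_tab_txn_idx attach partition child_tab_p1_txn_idx;
alter index child_tab_txn_idx attach partition child_tab_p2_txn_idx;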


Re: How to do faster DML

2024-02-05 Thread Lok P
On Mon, 5 Feb, 2024, 2:21 am Lok P,  wrote:


On Sun, Feb 4, 2024 at 9:18 PM Greg Sabino Mullane 
wrote:

Partitioning will depend a lot on how you access the data. Is it always
using that transaction_date column in the where clause? Can you share the
table definition via \d? We've talked about this table quite a bit, but not
actually seen what it looks like. We are flying blind a little bit. You
mentioned your queries are slow, but can you share an explain analyze on
one of these slow queries?

45 billion rows is quite manageable. How many rows are in each day? You may
want to do a larger bucket than 24 hours per partition.

1)Even after partitioning the target table , to speed up the data load on
this table , Is there an option to disable the primary and foreign keys and
re-enable them post data load finishes. Will that be a good idea


No.

3)As the size of the table or each partition is going to be very large and
this will be a read intensive application, compressing the historical
partition will help us save the storage space and will also help the read
queries performance


I am not sure what this means. If you are not going to need the data
anymore, dump the data to deep storage and drop the partition.

Cheers,
Greg



Thank you.

The table has ~127 columns of different data types , combinations of
Numeric, varchar, date etc. And is having current size ~1TB holding
~3billion rows currently and the row size is ~300bytes.

Currently it has lesser volume , but in future the daily transaction per
day which will be inserted into this table will be Max ~500million
rows/day. And the plan is to persist at least ~3months of transaction data
which will be around 45billion rows in total. And if all works well , we
may need to persist ~6 months worth of data in this database in future and
that will be ~90 billion.

This table will always be queried on the transaction_date column as one of
the filters criteria. But the querying/search criteria can span from a day
to a month worth of transaction date data.

When you said "*You may want to do a larger bucket than 24 hours per
partition.*", do you mean to say partition by weekly or so? Currently as
per math i.e. 1TB of storage for ~3billion rows. So the daily range
partition size( to hold ~500million transactions/day) will be around
~100-150GB. Won't that be too much data for a single partition to operate
on, and increasing the granularity further(say weekly) will make the
partition more bulkier?

What I mean was, we will definitely need the data for querying purposes by
the users, but just to keep the storage space incontrol (and to help the
read queries), we were thinking of having the historical partitions
compressed. And for that , if any compression strategy should we follow on
postgres?

With regards to loading data to the table faster, wondering why you said '
*NO*' to load the data first and enabling/Creating the Primary key and
Foreign key constraint later approach. Because this table is a child table
and the parent is already having data in it, loading data to this table in
presence of PK and FK makes it too slow as it tries to validate those for
each set of rows. So we were thinking if doing it at a later stage at
oneshot will be a faster approach. Please suggest.

I will try to collect some SELECT query and post the explain analyze.
Currently we are trying to get rid of the duplicates.

Regards
Lok


Any thoughts, based on the above usage pattern?

I did the math based on the current stats with a limited data set: the
partition size comes to 100 to 150GB, as I explained above, if we keep it as
a daily range partition. Should we make it hourly then?

So I would like some expert views on whether this is going to work fine for a
read-latency-intensive application, or whether we should have a different
strategy.


Re: How to do faster DML

2024-02-05 Thread Lok P
On Tue, Feb 6, 2024 at 8:34 AM Greg Sabino Mullane 
wrote:

> On Sun, Feb 4, 2024 at 3:52 PM Lok P  wrote:
>
>> This table will always be queried on the transaction_date column as one
>> of the filters criteria. But the querying/search criteria can span from a
>> day to a month worth of transaction date data.
>>
>
> In that case, and based on the numbers you provided, daily partitioning
> seems a decent solution.
>
> What I mean was, we will definitely need the data for querying purposes by
>> the users, but just to keep the storage space incontrol (and to help the
>> read queries), we were thinking of having the historical partitions
>> compressed. And for that , if any compression strategy should we follow on
>> postgres?
>>
>
> There is no compression strategy, per se. There are ways Postgres
> internally compresses the data (see "TOAST"), but in general, the table is
> either there or not. If space is a concern you should examine if you truly
> need 127 columns, make sure they are arranged correctly (search for
> 'postgres column tetris'), and move unused and older partitions elsewhere -
> like a separate cheaper Postgres server, or something offline.
>
> With regards to loading data to the table faster, wondering why you said '
>> *NO*' to load the data first and enabling/Creating the Primary key and
>> Foreign key constraint later approach.
>>
>
> For an initial load, this is fine, if you do things very carefully. I'd
> build the child table indexes post load but still feed things into the main
> table as an initial tradeoff, but YMMV.
>
> Just looked back and saw this is actually Aurora, not Postgres. Most of
> the advice on this thread should still apply, but be aware that things are
> not the same as Postgres, especially at the storage layer. For all the
> money you are giving them, don't forget to bug them as well.
>
> Cheers,
> Greg
>
>
Thank you so much Greg. That helps.

So when you said *"In that case, and based on the numbers you provided,
daily partitioning seems a decent solution."*
, does it mean that size of the partition (which will be ~100-150GB per
partition here) is not an issue for the postgres to serve read latency
sensitive queries?

And yes, I think you are spot on with regard to the storage. We actually
loaded CSV files whose sizes summed to ~200GB, and I was wondering how the
table size became ~1TB when that data was loaded into the database. I was not
aware of "column tetris" and how Postgres pads additional space in the column
data based on the subsequent column's type; I am still trying to digest the
concept. We do have many columns with NULL values in them. I will try to
reorder the columns in the table and hopefully we will be able to really
reduce the storage space through that. Will raise a case too.
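
(This is the kind of catalog query I plan to use for the reordering; it only
touches standard catalogs, so hopefully it behaves the same on Aurora. "tab1"
is a placeholder name.)

select a.attname, t.typname, t.typlen, t.typalign
from pg_attribute a
join pg_type t on t.oid = a.atttypid
where a.attrelid = 'tab1'::regclass
  and a.attnum > 0
  and not a.attisdropped
order by t.typlen desc, a.attname;
-- fixed-width 8-byte columns first, variable-length ones (typlen = -1) last,
-- which is the usual "column tetris" ordering heuristic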

Regards
Lok


Re: How to do faster DML

2024-02-05 Thread Lok P
Thank you Greg.

*"and keeping all your active stuff in cache. Since you have 127 columns,
only pull back the columns you absolutely need for each query."*

Understood the point. As Postgres is a "row store" database, keeping the row
size small by keeping the number of columns to a minimum should definitely
help, as more rows can then be packed into one page or block (which I believe
is 8KB by default). That means more useful rows are held per cached block
while reading, so a better cache hit ratio.

As you rightly pointed out, another thing I noticed is the shared_buffers
parameter set to 2029684 in this instance, which comes to ~21MB, and that
seems very small for a database operating at this scale. And I see we have
RAM in the instance showing as ~256GB. So I am thinking of bumping it up to
something like ~40-50GB; hopefully that will help to some extent. I am also
not sure if there is a method to manually cache some objects (tables/indexes)
that are used frequently by the read queries.
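
(On that last point, the closest thing I have found is the pg_prewarm
extension, which loads a relation into the buffer cache on demand; a sketch,
assuming the extension is available on this Aurora/RDS instance:)

create extension if not exists pg_prewarm;
select pg_prewarm('tab1');   -- warm the table into shared buffers
select pg_prewarm('idx1');   -- warm an index the read queries rely on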


On Tue, Feb 6, 2024 at 10:27 AM Greg Sabino Mullane 
wrote:

>
> So when you said *"In that case, and based on the numbers you provided,
>> daily partitioning seems a decent solution."*
>> , does it mean that size of the partition (which will be ~100-150GB per
>> partition here) is not an issue for the postgres to serve read latency
>> sensitive queries?
>>
>
> Yes, that should be fine. At the end of the day, what's going to be more
> important is making sure you are using good indexing, and keeping all your
> active stuff in cache. Since you have 127 columns, only pull back the
> columns you absolutely need for each query.
>
> Cheers,
> Greg
>
>


Re: How to do faster DML

2024-02-07 Thread Lok P
> On Tue, Feb 6, 2024 at 8:34 AM Greg Sabino Mullane 
> wrote:
>
>> On Sun, Feb 4, 2024 at 3:52 PM Lok P  wrote:
>>
>>> What I mean was, we will definitely need the data for querying purposes
>>> by the users, but just to keep the storage space incontrol (and to help the
>>> read queries), we were thinking of having the historical partitions
>>> compressed. And for that , if any compression strategy should we follow on
>>> postgres?
>>>
>>
>> There is no compression strategy, per se. There are ways Postgres
>> internally compresses the data (see "TOAST"), but in general, the table is
>> either there or not. If space is a concern you should examine if you truly
>> need 127 columns, make sure they are arranged correctly (search for
>> 'postgres column tetris'), and move unused and older partitions elsewhere -
>> like a separate cheaper Postgres server, or something offline.
>>
>> Cheers,
>> Greg
>>
>>
Rearranging the table columns by typlen desc didn't give much storage space
reduction.

So I tried TOAST compression by creating the table from scratch and defining
the LZ4 algorithm at the column level, just for the varchar columns, as it
seems this compression only works for varchar and text columns. The table has
7 columns defined as varchar out of 12 columns in total. I wrote the DDL
something like below:

Column1 varchar(50) COMPRESSION lz4 NOT NULL

However, when I loaded the table using INSERT ... SELECT from the main
(uncompressed) table, I see the size of the compressed table remains the
same, and when I applied the function pg_column_compression() to check
whether any column value is compressed using lz4, it returns all NULLs, which
means nothing got compressed.

So it seems the compression does not apply to rows inserted using CTAS or
INSERT ... SELECT. Does that mean it is only applicable to row-by-row inserts
but not batch inserts (which a bigger system will normally have)? I was not
expecting this, so it was disappointing.
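
(One thing I want to verify next: as far as I understand, column-level
compression only kicks in once an individual value crosses the TOAST
threshold of roughly 2 kB, irrespective of whether the row arrived via
CTAS/INSERT ... SELECT or a single-row INSERT, so short varchar(50) values
would stay uncompressed either way. A small sanity check I plan to run:)

create table lz4_test (c1 text compression lz4);
insert into lz4_test select repeat('a', 10000);   -- well above the threshold
insert into lz4_test values ('short value');      -- tiny value
select length(c1), pg_column_compression(c1) from lz4_test;
-- expectation: 'lz4' reported for the long value, NULL for the short one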

Regards
Lok


Users and object privileges maintenance

2024-02-17 Thread Lok P
Hello All,
 We have past experience with Oracle and are newly moving to Postgres. In
Oracle we have schemas, which are alternately also called users, and the
superuser was sys/system, through which the dev team never logs in but only
the DBA does. The DBA used to create some functional users, to which write
access on the tables/views is granted (via the GRANT command) when those
objects are created. These functional users are used by the applications to
do the DML etc. Other users are given read-only privileges, so they cannot do
DML but can just view the data.

 Again, these privileges are not granted directly on the objects but through
roles, for better maintenance and control. We also had some users who were
not supposed to see certain sensitive attributes, so we created views
(excluding the sensitive columns) on top of those tables and gave read-only
access only to the views, not to the underlying tables.

So I just wanted to understand whether these grants and privileges for
objects/users are given and maintained in a similar way in a Postgres
database, or whether a different strategy is needed. And is it different in
RDS Postgres as compared to the open-source one? Is there any document to
follow for the same?
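
(To make the question concrete, below is roughly the shape I would expect the
equivalent Postgres setup to take; the role, schema and table names are made
up:)

-- group roles hold the privileges
create role app_rw nologin;    -- functional role used by the application
create role app_ro nologin;    -- read-only role

grant usage on schema app_schema to app_rw, app_ro;
grant select, insert, update, delete
  on all tables in schema app_schema to app_rw;
grant select on all tables in schema app_schema to app_ro;

-- login users simply become members of the group roles
create role app_user    login password 'changeme' in role app_rw;
create role report_user login password 'changeme' in role app_ro;

-- users who must not see sensitive columns get access only via a view
create role restricted_ro nologin;
grant usage on schema app_schema to restricted_ro;
create view app_schema.customer_public as
  select customer_id, customer_name from app_schema.customer;
grant select on app_schema.customer_public to restricted_ro;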

Regards
Lok


Re: Question on Table creation

2024-02-20 Thread Lok P
*"1)In this situation , do we have to drop the  "Schema_Owner"  and
recreate it with all small letters? And then create the schema with small
letters again?"*

As far as the above question goes, I believe the OP is not required to drop
and recreate, but just has to rename the user as below, and that would remove
the case-sensitivity, making it all lower case. The same thing can be done
for the schema too.

ALTER USER "*Schema_Owner*"   RENAME TO  *Schema_Owner* ;
ALTER schema "Schema_Name"   RENAME TO Schema_Name;


Re: Creating table and indexes for new application

2024-02-22 Thread Lok P
My 2 cents...
Foreign key indexes are required to avoid locking when deleting or updating
the PK in the parent, but an insert-only table may not have that issue. We
used to do this in other relational databases like Oracle, but I am not sure
how different it is in PG.
However, considering your high transaction volume, having foreign keys on the
transaction tables, and the related indexes, might have some negative impact
on your data inserts in the first place, I think. So better to evaluate.
Others may comment on this.

On Fri, 23 Feb, 2024, 2:05 am yudhi s,  wrote:

> Hello Friends,
> We are newly creating tables for a system which is going to consume
> transaction data from customers and store in postgres version 15+ database.
> And it would be ~400million transactions/rows per day in the main
> transaction table and almost double in the multiple child tables and some
> child tables will hold lesser records too.
>
> Say TABLE1 is the main or parent transaction table which will hold
> ~400million transactions.
> TABLE2 is the child table which is going to hold ~800million rows/day. It
> has one to many relationships with table1.
> TABLe3 is another child table which is going to hold ~200million rows per
> day.
>
> We are considering all of these tables for partitioning by the same
> transaction_date column and it would be daily partitions. We have some
> questions,
>
> 1)While creating these tables and related indexes, do we need to be
> careful of defining any other storage parameters like tablespaces etc Or
> its fine to make those table/indexes aligned to the default tablespace
> only? and are there any constraints on tablespace size , as we will have
> 100's GB of data going to be stored in each of the daily partitions?
>
> 2)Should we be creating composite indexes on each foreign key for table2
> and table3, because any update or delete on parent is going to take lock on
> all child tables?
>
> 3)We were thinking of simple Btree indexes to be created on the columns
> based on the search criteria of the queries. but the indexes doc i see in
> postgres having INCLUDE keywords also available in them. So I'm struggling
> to understand a bit, how it's adding value to the read query performance if
> those additional columns are not added explicitly to the index but are part
> of the INCLUDE clause? Will it give some benefit in regards to the index
> storage space? or should we always keep all the columns in the index
> definition itself other than some exception scenario? Struggling to
> understand the real benefit of the INCLUDE clause.
>
> Below is a sample DDL of what it will look like.
>
>  Create table syntax:-
>
> CREATE TABLE TABLE1
> (
> TXN_ID varchar(36)  NOT NULL ,
> txn_timestamp date  NOT NULL ,
> CONSTRAINT TABLE1_PK PRIMARY KEY (TXN_ID,txn_timestamp)
> ) partition by range (txn_timestamp);
>
>
> CREATE TABLE TABLE2
> (
> table2_id varchar(36) not null,
> TXN_ID varchar(36)  NOT NULL ,
> txn_timestamp date  NOT NULL ,
> CONSTRAINT TABLE2_PK PRIMARY KEY (table2_id,txn_timestamp)
> ) partition by range (txn_timestamp);
>
> alter table table2 add constraint fk1 foreign key (TXN_ID,txn_timestamp)
>  references table1(TXN_ID,txn_timestamp);
> Create Index idx1 on TABLE2(TXN_ID,txn_timestamp); -- Index for foreign Key
>
> CREATE TABLE TABLE3
> (
> table3_id varchar(36) not null,
> TXN_ID varchar(36)  NOT NULL ,
> txn_timestamp date  NOT NULL ,
> CONSTRAINT TABLE3_PK PRIMARY KEY (table3_id,txn_timestamp)
> ) partition by range (txn_timestamp);
>
> alter table table3 add constraint fk2 foreign key (TXN_ID,txn_timestamp)
>  references table1(TXN_ID,txn_timestamp);
> Create Index idx2 on TABLE3(TXN_ID,txn_timestamp); -- Index for foreign
> key.
>
> Thanks And Regards
> Yudhi
>


Re: Creating table and indexes for new application

2024-02-25 Thread Lok P
On Sun, 25 Feb, 2024, 1:05 am yudhi s,  wrote:

>
> On Fri, Feb 23, 2024 at 5:26 PM sud  wrote:
>
>>
>>
>> On Fri, 23 Feb, 2024, 1:28 pm yudhi s, 
>> wrote:
>>
>>>
>>>
>>> On Fri, 23 Feb, 2024, 1:20 pm sud,  wrote:
>>>


 On Fri, 23 Feb, 2024, 12:41 pm Laurenz Albe, 
 wrote:

> On Fri, 2024-02-23 at 02:05 +0530, yudhi s
>
> > 2)Should we be creating composite indexes on each foreign key for
> table2 and table3, because
> >   any update or delete on parent is going to take lock on all child
> tables?
>
> Every foreign key needs its own index.  A composite index is only
> appropriate if the foreign
> key spans multiple columns.
>
>

 From the DDL which OP posted it's using composite foreign key thus a
 composite index would be needed.
 However, if someone doesn't delete or update the parent table PK   , is
 it still advisable to have all the FK indexed? Like in general I think
 transaction id should not get updated in a normal scenario unless some
 special case.



>>> Thank you. I can double check if we have confirmed use case of deleting
>>> the parent table or updating PK in the parent table. But anyway it can
>>> happen for data fix for sure in some scenario.
>>>
>>> But yes, we are certainly going to drop/purge partition from all the
>>> parent and child table after specific days. So isn't that need the FK to be
>>> indexed or else it will scan whole parent table partition?
>>>
>>
>>
>> I am not sure if drop partition of parent table, will have a lock or will
>> do a full scan on the child table while doing the partition maintenance or
>> dropping the partitions, in absence of foreign key index. Others may
>> comment here.
>>
>>>

> Can you please help me understand, If it's true that all the statements
> like Delete, Update and Drop partition of parent table will take lock on
> the child table and Full scan the child table , and thus foreign key index
> on all the child table is necessary irrespective of the performance
> overhead it has on all the INSERT queries into the child tables?
>


I am not sure about the lock, but I think you should be able to see a full
scan on the child table while doing a delete or update of the parent table
PK. EXPLAIN ANALYZE should show that, I believe. I am not sure whether
EXPLAIN ANALYZE will work for drop partition though.

>


Re: Is partition pruning impacted by data type

2024-03-05 Thread Lok P
On Tue, Mar 5, 2024 at 1:09 AM sud  wrote:

>
> However the question we have is ,
> 1)If there is any downside of having the partition key with "timestamp
> with timezone" type? Will it impact the partition pruning of the queries
> anyway by appending any run time "time zone" conversion function during the
> query planning/execution phase?
> 2) As it will take the default server times , so during daylight saving
> the server time will change, so in that case, can it cause any unforeseen
> issue?
> 3)Will this cause the data to be spread unevenly across partitions and
> make the partitions unevenly sized? If will go for UTC/GMT as db time, the
> user's one day transaction might span across two daily partitions.
>
>
My 2 cents.
We have cases which use a "timestamp with time zone" column as the partition
key, and partition pruning happens for the read queries without any issue, so
we don't see any conversion function applied to the predicate on the
partition key. I think if the users go global it's better to have the
database time in UTC. And it is to be expected that, in the case of global
users, the data will span multiple (UTC) days, as the days won't be as per
the users' time zone but UTC.
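
(A minimal way to verify this on any given setup, with made-up table and
partition names: the plan should list only the partitions overlapping the
predicate range.)

create table txn (id bigint, created_at timestamptz not null)
  partition by range (created_at);
create table txn_2024_03_01 partition of txn
  for values from ('2024-03-01 00:00+00') to ('2024-03-02 00:00+00');
create table txn_2024_03_02 partition of txn
  for values from ('2024-03-02 00:00+00') to ('2024-03-03 00:00+00');

explain select count(*) from txn
where created_at >= timestamptz '2024-03-02 00:00+00'
  and created_at <  timestamptz '2024-03-03 00:00+00';
-- only txn_2024_03_02 should appear in the plan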


Re: Not able to purge partition

2024-03-24 Thread Lok P
On Sun, Mar 24, 2024 at 12:38 AM veem v  wrote:

> On Sat, 23 Mar 2024 at 23:08, Laurenz Albe 
> wrote:
>
>> On Sat, 2024-03-23 at 22:41 +0530, veem v wrote:
>> > 1)As we see having foreign key defined is making the detach partition
>> run
>> >   for minutes(in our case 5-10minutes for 60 million rows partition), so
>> >   how to make the parent table partition detach and drop work fast in
>> such
>> >   a scenario while maintaining the foreign key intact?
>>
>> I told you: don't do it.
>> Instead, use foreign keys between the partitions.
>>
>
> I am struggling to understand how to maintain those partitions then? As
> because we were planning to use pg_partman for creating and dropping
> partitions automatically without much hassle. So do you mean to say do the
> partition maintenance(create/drop) by creating our own jobs and not to use
> the pg_partman extension for this.
>
> Say for example in our case the parent table has 3-4 child table and all
> are partitioned on same keys/columns, so how we can identify the child
> partitions and then create all foreign keys to the respective parent table
> partitions and attach those partitions to parent table and also make this
> process automated? Appreciate any guidance on this.
>
> Actually, using pg_partman was taking care of everything starting from
> creating partitions with different names and creating respective indexes,
> constraints also with different names for each partitions without us being
> worrying anything about those.
>

This appears to be a major issue if it's taking minutes to drop the parent
table partitions while not allowing read operations on the child table during
that time, because of the locks it takes on them. We have many databases in
Oracle with such referential constraints on partitioned tables, and we were
planning to move those to Postgres. I think in Oracle they were by default
created partition-to-partition, without needing to check the whole table or
all of the child table's partitions while dropping the parent partitions.


Cron not running

2024-03-28 Thread Lok P
Hello All,
In RDS Postgres (version 15.4), we have scheduled partition maintenance
through pg_partman, and it is scheduled through pg_cron as below. The same
script has been executed in the dev and test environments, and we see the
cron job is scheduled in both environments, because we see one record in the
cron.job table.

But we are not seeing any entries in cron.job_run_details for the test
database, and partitions are also not getting created/dropped as we expected
there. So it means it's not running in one of the databases. Wondering why
that is, and whether we are missing anything? How can we debug this issue,
find the cause if it's failing internally for some reason, and fix it?

select partman.create_parent(
   p_parent_table := 'TAB1',
   p_control := 'part_col',
   p_type := 'native',
   p_interval := '1 day',
   p_premake := 5,
   p_start_partition => '2024-02-01 00:00:00'
);

update partman.part_config set infinite_time_partitions = 'true' ,
retention = '1 months', retention_keep_table='true',
retention_keep_index='true'
where parent_table = 'TAB1';

SELECT cron.schedule('@daily',partman.run_maintenance());
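
(For reference, this is what I am planning to check next, assuming the
default "cron" schema. I also notice that cron.schedule() normally takes the
command as a quoted SQL string rather than a function-call result, which is
something I need to double-check in the script above.)

-- confirm what is scheduled and against which database
select jobid, schedule, command, database, active
from cron.job;

-- look at recent runs and their status/return message
select jobid, status, return_message, start_time, end_time
from cron.job_run_details
order by start_time desc
limit 20;

-- the schedule call written with the command as a string, e.g. (assumption,
-- adapt as needed):
select cron.schedule('partman-maint', '@daily',
                     $$select partman.run_maintenance()$$);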

Regards
Lok


Issue with date/timezone conversion function

2024-04-09 Thread Lok P
Hi All,
It's version 15.4 of the PostgreSQL database. Every "date/time" attribute
gets stored in the database in the UTC time zone only. One of the support
persons' local time zone is "Asia/Kolkata", and that support person needs to
fetch the count of transactions from a table (transaction_tab) and share it
with another person/customer who is in the EST time zone, so basically the
transaction counts have to be shown or displayed in the EST time zone.

We are using the below three queries for displaying 15-minute, hourly and
daily interval transaction counts from the table, based on the
create_timestamp column (in which data is stored in UTC in the database but
has to be displayed to the user in EST).

These tables are INSERT-only tables, and the data in the create_timestamp
column is populated using the now() function from the application, which
means it will always be incremental, and the historical per-day transaction
count should stay the same. However, surprisingly, the counts change each day
when the user fetches the result using the queries below. So my question is:
is there any issue with the way we are fetching the data that causes some
date/time shift, which is why the transaction count appears to change even
for past days' data? And also, why is the conversion "DATE_TRUNC('hour',
create_timestamp AT TIME ZONE 'EST')" showing the time in CST and not EST?

**For fetching 15 minutes interval transaction count

SELECT
DATE_TRUNC('hour', create_timestamp AT TIME ZONE 'EST') +
(EXTRACT(MINUTE FROM create_timestamp AT TIME ZONE 'EST') / 15 * 15) *
INTERVAL '15 minute' AS sample_time_interval,
COUNT(*) AS count_1
from transaction_tab
WHERE create_timestamp > CURRENT_TIMESTAMP - INTERVAL '10 day'
GROUP BY
DATE_TRUNC('hour', create_timestamp AT TIME ZONE 'EST') +
(EXTRACT(MINUTE FROM create_timestamp AT TIME ZONE 'EST') / 15 * 15) *
INTERVAL '15 minute'
ORDER BY sample_time_interval;

**For fetching hourly interval transaction count

SELECT
DATE_TRUNC('hour', create_timestamp AT TIME ZONE 'EST') AS
sample_time_interval,
COUNT(*) AS count_1
from  transaction_tab
WHERE create_timestamp > CURRENT_TIMESTAMP - INTERVAL '10 day'
GROUP BY
DATE_TRUNC('hour', create_timestamp AT TIME ZONE 'EST')
ORDER BY sample_time_interval;

**For fetching daily interval transaction count

SELECT
DATE_TRUNC('day', create_timestamp AT TIME ZONE 'EST') AS
sample_time_interval,
COUNT(*) AS count_1
from  transaction_tab
WHERE create_timestamp > CURRENT_TIMESTAMP - INTERVAL '10 day'
GROUP BY
DATE_TRUNC('day', create_timestamp AT TIME ZONE 'EST')
ORDER BY sample_time_interval;


Re: Issue with date/timezone conversion function

2024-04-09 Thread Lok P
On Tue, Apr 9, 2024 at 9:26 PM Adrian Klaver 
wrote:

> On 4/9/24 08:43, Lok P wrote:
> > Hi All,
> > It's version 15.4 of postgresql database. Every "date/time" data type
> > attribute gets stored in the database in UTC timezone only. One of the
> > support persons local timezone is "asia/kolkata" and  that support
> > person needs to fetch the count of transactions from a table-
> > transaction_tab and share it with another person/customer who is in the
> > EST timezone, so basically the transaction has to be shown or displayed
> > the EST timezone.
>
> What is the datatype for the create_timestamp?
>
> What does SHOW timezone; return on the server?
>
>
>
Thank you for the quick response.

The column data type for "create_timestamp" is "timestamptz".
SHOW timezone from the support user's client machine UI shows
"Asia/Calcutta".
I don't currently have access to run SHOW timezone on the server; I will try
to get it.
The output from pg_settings shows setting as "Asia/Calcutta", reset_val as
"Asia/Calcutta", and boot_val as "GMT".


Re: Issue with date/timezone conversion function

2024-04-09 Thread Lok P
On Tue, Apr 9, 2024 at 10:33 PM Tom Lane  wrote:

> Lok P  writes:
> > These tables are INSERT only tables and the data in the create_timestamp
> > column is populated using the now() function from the application, which
> > means it will always be incremental, and the historical day transaction
> > count is going to be the same. However surprisingly the counts are
> changing
> > each day when the user fetches the result using the below query. So my
> > question was , if there is any issue with the way we are fetching the
> data
> > and it's making some date/time shift which is why the transaction count
> > looks to be changing even for the past days data?
>
> Well, your cutoff time "CURRENT_TIMESTAMP - INTERVAL '10 day'" is
> constantly moving, so that'd account for shifts in what's perceived
> to belong to the oldest day.  Maybe you want "CURRENT_DATE - 10"
> instead?
>
> > And also somehow this
> > conversion function "DATE_TRUNC('hour', create_timestamp AT TIME ZONE
> > 'EST')" is showing time in CST but not EST, why so?
>
> 'EST' is going to rotate to UTC-5, but that's probably not what
> you want in the summer.  I'd suggest AT TIME ZONE 'America/New_York'
> or the like.  See
>
>
> https://www.postgresql.org/docs/current/datatype-datetime.html#DATATYPE-TIMEZONES
>
> regards, tom lane
>


 Thank you so much. You are correct. AT TIME ZONE 'America/New_York' gives
the correct EST/EDT conversion.

But regarding why it looks to be shifting, i.e. the same time window appears
to hold a different count of transactions while the base table is not getting
updated/inserted/deleted for its historical create_timestamp values, I
suspect the conversion part below.

The task is to count the transactions in each ~15-minute bucket and publish
them in order, i.e. something like below, but the way it has been written
seems convoluted. It's an existing script. It first gets the timestamp
truncated to the hour and then adds the minute component back to make a
~15-minute interval. Can it be written in some simpler way?

9-apr-2024 14:00 12340
9-apr-2024 14:15 12312
9-apr-2024 14:30 12323
9-apr-2024 14:45 12304

DATE_TRUNC('hour', create_timestamp AT TIME ZONE 'America/New_York') +
(EXTRACT(MINUTE FROM create_timestamp AT TIME ZONE 'America/New_York') / 15 *
15) * INTERVAL '15 minute'
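
(One simplification I am considering, since this is PostgreSQL 14+:
date_bin() can produce the same 15-minute buckets directly. The sketch below
reuses the table/column names from the earlier queries and the CURRENT_DATE
style cutoff suggested above.)

SELECT
date_bin('15 minutes',
         create_timestamp AT TIME ZONE 'America/New_York',
         timestamp '2000-01-01') AS sample_time_interval,
COUNT(*) AS count_1
from transaction_tab
WHERE create_timestamp >= CURRENT_DATE - 10
GROUP BY 1
ORDER BY 1;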


Logging statement having any threat?

2024-04-20 Thread Lok P
Hello All,
It's Postgres version 15.4 on RDS. Our dev team gets the infrastructure code
from another third-party team, which provides us the base infrastructure code
to build a Postgres database and in which we are able to change DB parameter
values etc., whatever is mentioned in the file with its possible values. But
surprisingly we don't see log_statement there. Below was our requirement.

For debugging and evaluating performance we have pg_stat_statements, but it
contains aggregated information about all the query executions. In cases
where we just want to debug point-in-time issues where a selected few queries
were performing badly (maybe because of a plan change), we were planning to
have the auto_explain extension added and to set log_min_duration to ~5
seconds, so that all the queries going above that time (5 seconds) get logged
and provide detailed information on the exact point of the bottleneck. But we
see the log_statement parameter has been removed from the base
infrastructure/terraform script given by the database team here, so we will
get it as the default, which is "none", meaning no statements
(SELECT/DML/DDL etc.) will be logged.

When we reached out to the infrastructure team, they said these variables
(pg_cluster_log_statement, pg_instance_log_statement) were removed because of
a potential security threat. So I want to understand from the experts here
how this is really a security threat, and whether there is any option to get
this logging enabled (which will help us debug performance issues) while
still addressing that threat.
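
For reference, the settings we were planning are roughly the below (a
sketch; on RDS these would be set through the DB parameter group rather
than postgresql.conf, and shared_preload_libraries needs an instance
restart):

shared_preload_libraries = 'pg_stat_statements,auto_explain'
auto_explain.log_min_duration = '5s'   # log plans only for statements slower than 5s
auto_explain.log_analyze = on
auto_explain.log_buffers = on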

Regards
Lok


Re: Logging statement having any threat?

2024-04-21 Thread Lok P
On Sat, Apr 20, 2024 at 10:02 PM Adrian Klaver 
wrote:

>
> Have you tried?:
>
>
> https://www.postgresql.org/docs/current/runtime-config-logging.html#RUNTIME-CONFIG-LOGGING-WHAT
>
> "
> log_statement (enum)
>
><...>
>
> The default is none. Only superusers and users with the appropriate SET
> privilege can change this setting.
> "
>
> Or
>
>
> https://www.postgresql.org/docs/current/functions-admin.html#FUNCTIONS-ADMIN-SET
>
> set_config ( setting_name text, new_value text, is_local boolean ) → text
>
>
> >
> > Now when we reach out to the infrastructure team , they are saying these
> > variables(pg_cluster_log_statement,pg_instance_log_statement) were
>
> Where are those variables coming from? I can not find them in RDS or
> Terraform docs.
>
>
 Thank You Adrian.

Actually, I was trying to understand whether auto_explain can only work and
show us the slow SQL statements in the log after we set the "log_statement"
parameter to a non-default value (like all, mod or ddl)?

And what is the exact threat in logging these queries? I think I got the
point you mentioned, that having access to the database itself already lets
someone see the object details. However, do you agree that in the case of RDS
the logs are also available through other mediums like CloudWatch, Datadog
agents etc., and that this may pose an additional threat, because a person
who has no direct database access may still have permission to see the logs?
So appropriate access control needs to be put in place there.

Additionally, I tried to execute "SELECT set_config('log_statement', 'all',
true);" but it fails with "permission denied to set parameter
"log_statement"", so it probably needs a higher-privileged user to run it.

To answer your question about those variables: they come from the terraform
module, which has been customized by the database infra team, so that is
probably why we see them there and why they don't exactly match what the RDS
docs for Postgres show.

https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/USER_LogAccess.Concepts.PostgreSQL.html


How you make efficient design for CDC and book marking

2024-04-25 Thread Lok P
Hello,
My understanding is that replication tools normally rely on the database
transaction logs to find the CDC/delta changes (insert/update/delete) for
tables and then move those delta changes to the target system/database,
whatever the source database may be (open source Postgres or AWS RDS). I have
never done it, but I assume manually scanning the DB logs is not easy, and we
may not even be given access to the DB logs for security reasons. I hope my
understanding is correct here.

So, in the absence of such a replication tool (maybe because of the
additional cost involved), if someone wants to find the delta changes
(insert/update/delete) in a database as efficiently as possible and move them
to the target database in a continuous data streaming setup, I can only think
of the option below,

i.e. maintaining audit columns like create_timestamp/update_timestamp in
every source table so that they can be used to get the CDC for insert and
update statements and also for bookmarking. But to find the delta for
deletes, there is not much option other than row-level triggers on the base
table that populate another audit table with the deleted rows, and that is
going to crawl if we get a lot of deletes (in the millions) on the source
tables.
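
For reference, the kind of row-level trigger I mean is sketched below
(table and column names are only illustrative):

CREATE TABLE source_tab_delete_audit (
    pk_id      bigint,
    deleted_at timestamptz NOT NULL DEFAULT now()
);

CREATE OR REPLACE FUNCTION capture_deletes() RETURNS trigger AS $$
BEGIN
    -- copy the key of the deleted row into the audit table
    INSERT INTO source_tab_delete_audit (pk_id) VALUES (OLD.pk_id);
    RETURN OLD;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_capture_deletes
AFTER DELETE ON source_tab
FOR EACH ROW EXECUTE FUNCTION capture_deletes();

A statement-level AFTER DELETE trigger with a transition table could
probably reduce the per-row overhead for large bulk deletes, but the basic
idea stays the same.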

Want to know from experts, if there exists any other way to have these
manual CDC and book marking more efficient for such continuous delta data
movement scenarios?

Regards
Lok


Re: Question on roles and privileges

2024-05-09 Thread Lok P
For the initial installation the extensions may need superuser privileges.

On Fri, May 10, 2024 at 10:04 AM yudhi s 
wrote:

> Hello All,
> We want to make sure to keep minimal privileges for the users based on
> their roles and responsibility. We have one user group who will be working
> on analyzing/debugging into performance issues in the databases. Basically
> this group will be operating on extensions like apg_plan_management,
> pg_hint_plan, auto_explain, plprofiler, pg_repack. So these extensions will
> already be installed for the group, but they will just need to use those
> appropriately. For example pg_hint_plan will not need any write privilege
> because the user just has to put the hint in the query and run it to see
> any performance variation.
>
> So like that , what kind of minimal privileges will each of these
> extensions need to make them work for this performance group? Basically if
> any of these will need write privilege or all works can be performed using
> Readonly roles/privilege only?
>
> And I understand pg_monitor role wraps up most of the key read only
> privileges within it to work on performance issues and also its a readonly
> privilege only. So I wanted to know from experts here , if it's true and
> pg_monitor role will suffice for all the above work?
>
> Regards
> Yudhi
>


Creating big indexes

2024-06-08 Thread Lok P
Hello,
We have a few tables of ~5TB each, partitioned on a timestamp column. They
have ~90 partitions and store 90 days of data. We want to create a couple of
indexes on those tables. They receive incoming transactions (mainly inserts)
24/7, mostly on the current-day/live partition. It's RDS Postgres version
15.4. So in this situation:

Should we go with below i.e one time create index command on the table..

CREATE INDEX CONCURRENTLY idx1 ON tab(column_name);
Or
create index on individual partitions from different sessions, say for
example create indexes on 30 partitions each from three different sessions
so as to finish all the 90 partitions faster?
CREATE INDEX CONCURRENTLY idx1 ON tab_part1(column_name);
CREATE INDEX CONCURRENTLY idx1 ON tab_part2(column_name);
.
.

Basically I have three questions:
1) Can we do this index creation activity online without impacting the
incoming transactions, or do we have to take downtime for it?
2) If we can't do it online, what is the fastest method for this index
creation activity?
3) Should we change any DB parameters to make the process faster? We
currently have the following set:

max_parallel_workers = 16
max_parallel_maintenance_workers = 2
maintenance_work_mem = 4GB

Regards
Lok


Re: How to create efficient index in this scenario?

2024-06-08 Thread Lok P
On Sat, Jun 8, 2024 at 7:03 PM veem v  wrote:

> Hi ,
> It's postgres version 15.4. A table is daily range partitioned on a column
> transaction_timestamp. It has a unique identifier which is the ideal for
> primary key (say transaction_id) , however as there is a limitation in
> which we have to include the partition key as part of the primary key, so
> it has to be a composite index. Either it has to be
> (transaction_id,transaction_timestamp) or ( transaction_timestamp,
> transaction_id). But which one should we go for, if both of the columns get
> used in all the queries?
>
> We will always be using transaction_timestamp as mostly a range predicate
> filter/join in the query and the transaction_id will be mostly used as a
> join condition/direct filter in the queries. So we were wondering, which
> column should we be using  as a leading column in this index?
>
> There is a blog below (which is for oracle), showing how the index should
> be chosen and it states ,  "*Stick the columns you do range scans on last
> in the index, filters that get equality predicates should come first.* ",
> and in that case we should have the PK created as in the order
> (transaction_id,transaction_timestamp). It's because making the range
> predicate as a leading column won't help use that as an access predicate
> but as an filter predicate thus will read more blocks and thus more IO.
> Does this hold true in postgres too?
>
> https://ctandrewsayer.wordpress.com/2017/03/24/the-golden-rule-of-indexing/
>

I believe the analogy holds true here in postgres too and the index in this
case should be on (transaction_id, transaction_timestamp).


>
>
> Additionally there is another scenario in which we have the requirement to
> have another timestamp column (say create_timestamp) to be added as part of
> the primary key along with transaction_id and we are going to query this
> table frequently by the column create_timestamp as a range predicate. And
> ofcourse we will also have the range predicate filter on partition key
> "transaction_timestamp". But we may or may not have join/filter on column
> transaction_id, so in this scenario we should go for
>  (create_timestamp,transaction_id,transaction_timestamp). because
> "transaction_timestamp" is set as partition key , so putting it last
> doesn't harm us. Will this be the correct order or any other index order is
> appropriate?
>
>
>
In this case, the index should be on
(create_timestamp, transaction_id, transaction_timestamp), considering that
you will always have queries with "create_timestamp" as a predicate and may
not have transaction_id in the query predicate.
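
i.e. something like the below (a sketch; table names are placeholders, and
on a partitioned table the partition key transaction_timestamp has to be
part of the primary key anyway):

-- scenario 1: equality filter on transaction_id, range on transaction_timestamp
ALTER TABLE tab1
    ADD CONSTRAINT tab1_pk
    PRIMARY KEY (transaction_id, transaction_timestamp);

-- scenario 2: create_timestamp is always a range predicate,
-- transaction_id may or may not be filtered on
ALTER TABLE tab2
    ADD CONSTRAINT tab2_pk
    PRIMARY KEY (create_timestamp, transaction_id, transaction_timestamp);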


Re: Creating big indexes

2024-06-08 Thread Lok P
On Sun, Jun 9, 2024 at 10:36 AM sud  wrote:

>
>  You can first create the index on the table using the "On ONLY"keyword,
> something as below.
>
> CREATE INDEX idx ON ONLY tab(col1);
>
> Then create indexes on each partition in "concurrently" from multiple
> sessions in chunks.
>
> CREATE INDEX CONCURRENTLY idx_1ON tab_part1(col1);
> CREATE INDEX CONCURRENTLY idx_2ON tab_part2(col1);
>
> After this step finishes the table level index which was created in the
> first step will be in valid state automatically.
>
>
Thank you so much.
Should we also tweak the parameters related to the parallelism and memory
as I mentioned in the first post?
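
For my own reference, the full sequence as I understand it from the docs
would be something like the below (index/partition names illustrative).
Note the explicit ATTACH PARTITION step; the parent index only becomes
valid once an index has been attached for every partition:

CREATE INDEX idx1 ON ONLY tab (col1);          -- parent index, initially invalid

CREATE INDEX CONCURRENTLY tab_part1_idx1 ON tab_part1 (col1);
ALTER INDEX idx1 ATTACH PARTITION tab_part1_idx1;

CREATE INDEX CONCURRENTLY tab_part2_idx1 ON tab_part2 (col1);
ALTER INDEX idx1 ATTACH PARTITION tab_part2_idx1;
-- ... repeat for the remaining partitions, from multiple sessions if desired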


Re: Creating big indexes

2024-06-09 Thread Lok P
On Sun, Jun 9, 2024 at 10:39 AM Lok P  wrote:

>
>
> On Sun, Jun 9, 2024 at 10:36 AM sud  wrote:
>
>>
>>  You can first create the index on the table using the "On ONLY"keyword,
>> something as below.
>>
>> CREATE INDEX idx ON ONLY tab(col1);
>>
>> Then create indexes on each partition in "concurrently" from multiple
>> sessions in chunks.
>>
>> CREATE INDEX CONCURRENTLY idx_1ON tab_part1(col1);
>> CREATE INDEX CONCURRENTLY idx_2ON tab_part2(col1);
>>
>> After this step finishes the table level index which was created in the
>> first step will be in valid state automatically.
>>
>>
> Thank you so much.
> Should we also tweak the parameters related to the parallelism and memory
> as I mentioned in the first post?
>

Additionally, is it also possible to drop the indexes from the big
partitioned table efficiently, so as to avoid "transaction id wraparound" or
"table bloat" issues if the index drop runs for a long duration?


Design strategy for table with many attributes

2024-07-04 Thread Lok P
Hello,
In one of the applications we are getting transactions in messages/events
format and also in files and then they are getting parsed and stored into
the relational database. The number of attributes/columns each transaction
has is ~900+. Logically they are part of one single transaction and should
be stored in one table as one single row. There will be ~500million such
transactions each day coming into the system. And there will be approx ~10K
peak write TPS and 5K read TPS in target state. This system has a postgres
database as a "source of truth" or OLTP store. And then data moves to
snowflakes for the olap store.

Initially, when the system was designed, the number of attributes per
transaction was <100, but over time the business wants to keep/persist more
attributes in the current system and the number of columns keeps growing.

However, having worked with some other database systems, we get suggestions
from DBAs not to have too many columns in a single table. For example, in
Oracle they say not to go beyond ~255 columns, because row chaining and row
migration start to haunt you. We are also afraid that concurrent DML on such
a wide table may become a contention point. So I wanted to understand: in
such a situation, what would be the best design approach irrespective of the
database? Or, to what maximum number of columns per table should we restrict
ourselves? Should we break the single transaction into multiple tables, like
one main table and other addenda tables sharing the same primary key, to be
joined to fetch the results wherever necessary?

Regards
Lok


Re: Design strategy for table with many attributes

2024-07-04 Thread Lok P
On Fri, Jul 5, 2024 at 1:26 AM David G. Johnston 
wrote:

> On Thu, Jul 4, 2024 at 12:38 PM Lok P  wrote:
>
>>
>> Should we break the single transaction into multiple tables like one main
>> table and other addenda tables with the same primary key to join and fetch
>> the results wherever necessary?
>>
>>
> I would say yes.  Find a way to logically group sets of columns together
> and place those groups into separate tables.  I'd also be looking for cases
> where multiple columns really should be multiple rows.  This is not
> uncommon.
>
> David J.
>
>
Thank you David.

As you said, to logically break this into multiple tables, I believe it
should be done such that most of the time there is no need to query multiple
tables and join them for fetching results; a query should normally fetch its
results from one table at a time.

But do you also suggest keeping those table pieces related to each other
through the same primary key? Won't there be a problem during data load: in
the normal scenario the load goes to one table, but when we break it into
multiple tables the load has to hit all the individual pieces; won't that put
an additional burden on the data load?

Also, I understand the technical limit on the number of columns per table is
~1600. But would you advise restricting ourselves to some much lower number,
long before reaching that limit, so that we don't face any anomalies as we
grow in future? And should we maintain any specific ordering of the columns
from the first to the last column of the table?


Re: Design strategy for table with many attributes

2024-07-04 Thread Lok P
On Fri, Jul 5, 2024 at 10:45 AM Guyren Howe  wrote:

> On Jul 4, 2024, at 22:07, Lok P  wrote:
>
> If you stick to the principle of grouping columns in a table when you use
> those columns together, you should be good.
>
> Note that you might want to split up the “parent” table if that naturally
> groups its columns together for certain uses. In that case, you could have
> the same pk on all the 1:1 tables you then have. In that case, the pk for
> each of those tables is also the fk.
>


Thank you.

When you said below,

*"Note that you might want to split up the “parent” table if that naturally
groups its columns together for certain uses. In that case, you could have
the same pk on all the 1:1 tables you then have. In that case, the pk for
each of those tables is also the fk."*
Do you mean having a real FK created through DDL and maintained, or just
assuming it without defining it on all the pieces/tables, i.e. only keeping
the same PK across all the pieces, since we know they belong to the same
transaction and are logically related?


Re: Design strategy for table with many attributes

2024-07-05 Thread Lok P
Some folks on the team suggested keeping the key business attributes, or say
the frequently queried attributes, as individual columns and clubbing the
rest into a single JSON column in the same table. Is that advisable, or can
issues occur with this approach? I am also not sure how efficiently Postgres
processes JSON (from both the read and write perspective) compared to normal
columns in an OLTP environment. Please advise.

As David suggested, things break if a row exceeds the 8k limit, i.e. a single
page size; will that still hold true if we have a JSON column in the row?

On Fri, 5 Jul, 2024, 12:04 pm Guyren Howe,  wrote:

> On Jul 4, 2024, at 23:28, Lok P  wrote:
>
>
>
> *"Note that you might want to split up the “parent” table if that
> naturally groups its columns together for certain uses. In that case, you
> could have the same pk on all the 1:1 tables you then have. In that case,
> the pk for each of those tables is also the fk."*
> Do you mean having a real FK created through DDL and maintaining it or
> just assume it and no need to define it for all the pieces/tables. Only
> keep the same PK across all the pieces and as we know these are related to
> the same transaction and are logically related?
>
>
> A primary key *names something*. Often it’s a kind of platonic
> representation of a real thing — say, a person.
>
> I might use a person’s login columns in some functions, and the person’s
> name, birth date, etc in other functions.
>
> Rather than have one table, I should split this into two, but use the same
> primary key (I would either name both id or both, say, person_id,
> irrespective of the name of the table, so it’s clear you’re doing this).
>
> You can just do a join on the mutual primary keys as you’d expect. In
> fact, if you name them the same, you can just use NATURAL JOIN.
>
> So you’d have person_details and person_login tables, and have a person_id
> pk for both.
>
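
Understood. So something like the below minimal sketch, with the same
person_id primary key on both 1:1 tables (columns are illustrative):

CREATE TABLE person_login (
    person_id     bigint PRIMARY KEY,
    username      text NOT NULL,
    password_hash text NOT NULL
);

CREATE TABLE person_details (
    person_id  bigint PRIMARY KEY REFERENCES person_login (person_id),
    full_name  text,
    birth_date date
);

-- the shared, identically named key column makes NATURAL JOIN possible
SELECT * FROM person_login NATURAL JOIN person_details;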


Re: Design strategy for table with many attributes

2024-07-05 Thread Lok P
On Fri, 5 Jul, 2024, 1:44 pm David Rowley,  wrote:

> On Fri, 5 Jul 2024 at 19:53, Lok P  wrote:
> > As David suggested it breaks if a row exceeds the 8k limit I. E a single
> page size , will that still holds true if we have a column with JSON in it?
>
> You wouldn't be at risk of the same tuple length problem if you
> reduced the column count and stored the additional details in JSON.
> Each varlena column is either stored in the tuple inline, or toasted
> and stored out of line. Out of line values need an 18-byte pointer to
> the toasted data. That pointer contributes to the tuple length.
>
>
> David
>

Got it. Thank you very much.

So there would be a performance overhead with JSON, and we need to validate
that carefully if we go in that direction at all.

However, out of curiosity: if the toasted/compressed component, i.e. the JSON
column itself, goes beyond 8k even after compression, will it break then?
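
For context, the hybrid layout we are considering is roughly the below
(a sketch; column names are illustrative):

CREATE TABLE transactions (
    txn_id        bigint PRIMARY KEY,
    txn_timestamp timestamptz NOT NULL,
    amount        numeric(12,2),
    -- ...other frequently queried business attributes as normal columns...
    other_attrs   jsonb              -- rarely queried attributes clubbed here
);

-- only if the JSON attributes themselves need to be searched:
CREATE INDEX ON transactions USING gin (other_attrs);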


Partition boundary messed up

2024-07-24 Thread Lok P
Hello All,
We normally operate in the UTC timezone, so we create partitions in UTC such
that each daily partition runs from today's midnight UTC to the next day's
midnight UTC. The script looks something like the one below. The referencing
partitioned tables are also created the same way, so, for example, the 29th
August partition of the child table lines up with the 29th August partition
of the parent table it references via FK. This was working fine, including
for partition maintenance (mainly dropping historical partitions), without
any issue.

For one newly introduced partitioned table, somebody mistakenly created the
initial partition definition without setting the timezone to UTC, and its
partitions got created with boundaries in a different timezone. As this table
references another partitioned table (the parent, whose partitions are
created in UTC), rows now span across two partitions, and it is breaking the
partition maintenance process when we try to drop a historical partition.

The issue now is that the newly introduced table already has billions of
rows in it, spread across 40 partitions. So, is there an easy way to just
alter the partition boundaries to make them midnight-to-midnight UTC ranges?
Or is the only way to create a new table from scratch with the correct
partition boundaries in UTC, move the data, and then recreate the foreign key
on it (which I believe is also going to take a lot of time)?

Another thing we noticed: the initial partitions have boundaries in non-UTC
(which we understand, because of the missing timezone setting), but then the
subsequent partitions suddenly started getting created in UTC again, and we
are not sure how that happened. I believe this will create issues for
incoming rows that fall in between these ranges. Wondering if there is any
easy way to correct this mess now? Note: this is Postgres version 15+.

Below partition creation script we use:-
set timesozne='UTC';
SELECT partman.create_parent(
   p_parent_table := 'schema1.tab1',
   p_control := 'TRAN_DATE',
   p_type := 'native',
   p_interval := '1 day',
   p_premake := 90,
   p_start_partition => '2024-02-15 00:00:00'
);
UPDATE partman.part_config SET infinite_time_partitions = 'true' ,
premake=20 WHERE parent_table = 'schema1.tab1';
CALL partman.run_maintenance_proc();

Below are the details I fetched from pg_class for the table that is messed up:

Partition_name   Partition Expressions
TAB1_p2024_08_29 FOR VALUES FROM ('2024-08-29 00:00:00+05:30') TO
('2024-08-30 00:00:00+05:30')
TAB1_p2024_08_30 FOR VALUES FROM ('2024-08-30 00:00:00+05:30') TO
('2024-08-31 00:00:00+05:30')
TAB1_p2024_08_31 FOR VALUES FROM ('2024-08-31 00:00:00+05:30') TO
('2024-09-01 00:00:00+05:30')
TAB1_p2024_09_01 FOR VALUES FROM ('2024-09-01 00:00:00+05:30') TO
('2024-09-02 00:00:00+05:30')

TAB1_p2024_09_02 FOR VALUES FROM ('2024-09-02 00:00:00+05:30') TO
('2024-09-03 00:00:00+05:30')
TAB1_p2024_09_03 FOR VALUES FROM ('2024-09-03 05:30:00+05:30') TO
('2024-09-04 05:30:00+05:30')
TAB1_p2024_09_04 FOR VALUES FROM ('2024-09-04 05:30:00+05:30') TO
('2024-09-05 05:30:00+05:30')
TAB1_p2024_09_05 FOR VALUES FROM ('2024-09-05 05:30:00+05:30') TO
('2024-09-06 05:30:00+05:30')
TAB1_p2024_09_06 FOR VALUES FROM ('2024-09-06 05:30:00+05:30') TO
('2024-09-07 05:30:00+05:30')
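
For reference, the boundaries above were pulled from pg_class with a query
along these lines:

SELECT c.relname                          AS partition_name,
       pg_get_expr(c.relpartbound, c.oid) AS partition_expression
FROM   pg_class    c
JOIN   pg_inherits i ON i.inhrelid = c.oid
WHERE  i.inhparent = 'schema1.tab1'::regclass
ORDER  BY 1;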

Regards
Lok


Re: Partition boundary messed up

2024-07-27 Thread Lok P
Thank you. I understand that is going to take a lot of time, as we already
have billions of rows in the main table spread across 40+ partitions.
Also, this table is a child of another parent partitioned table, so it will
take a long time to re-validate the FK constraint for the new table.
Is there a less disruptive way (zero to minimal downtime) to fix this mess?

On Sat, Jul 27, 2024 at 2:08 PM yudhi s  wrote:

> If you were thinking of some ALTER command which will just alter the
> boundaries of the incorrect partition and make it correct ,  I don't think
> there exists any such. You may have to create a whole new table and run the
> partman.create_parent block with the UTC time zone set so that all the
> subsequent partitions will be created with correct boundaries and then move
> the data into it from the existing table.
>
>
>


Column type modification in big tables

2024-08-07 Thread Lok P
Hello all,
We have a Postgres table that is range partitioned on a timestamp column,
with a total size of ~3TB holding ~5 billion rows spanning ~150 daily
partitions, and there are ~140+ columns in the table. This table is also a
child of another partitioned table, and partition creation is handled through
the pg_partman extension.

We have a requirement to modify some existing column types/lengths as below.
Doing it directly through a single ALTER command will probably scan and
rewrite the whole table, which may take hours of run time.

So I am trying to understand from the experts what the best possible way is
to tackle such changes in Postgres, and what side effects we may see given
that this table is a child of another one and also has the pg_partman
dependency.

two of the columns from varchar(20) to numeric(3)
one of the columns from varchar(20) to varchar(2)
one of the columns from Number(10,2) to Numeric(8,2)

Regards
Lok


Re: Column type modification in big tables

2024-08-07 Thread Lok P
On Wed, Aug 7, 2024 at 4:51 PM sud  wrote:

>
>
> Others may correct but i think, If you don't have the FK defined on these
> columns you can do below.
>
>
> --Alter table add column which will be very fast within seconds as it will
> just add it to the data dictionary.
>
> ALTER TABLE tab1 ADD COLUMN new_column1 NUMERIC(3),   new_column2
> varchar2(3);
>
>
> *-- Back populate the data partition wise and commit, if it's really
> needed*
>
> UPDATE tab1_part1 SET new_column1 = CAST(old_column1 AS NUMERIC(3)),
> new_column2 = CAST(old_column2 AS varchar2(3)) ;
> commit;
> UPDATE tab1_part2 SET new_column1 = CAST(old_column1 AS NUMERIC(3)),
> new_column2 = CAST(old_column2 AS varchar2(3)) ;
> commit;
> UPDATE tab1_part3 SET new_column1 = CAST(old_column1 AS NUMERIC(3)),
> new_column2 = CAST(old_column2 AS varchar2(3)) ;
> commit;
> .
>
>
> *--Alter table drop old columns which will be very fast within seconds as
> it will just drop it from the data dictionary.*
> ALTER TABLE your_table DROP COLUMN old_column1, DROP COLUMN old_column2;
>



Thank you so much.

I understand this will be the fastest possible way to achieve the column
modification.

But regarding the dropped columns, which will still be sitting in the table
and consuming storage space: is it fine to leave them as is, or will
autovacuum remove the old column values behind the scenes? And in any case,
once those partitions get purged, the dropped-column data goes with them. Is
this understanding correct?

Also, will this have any impact on the partition maintenance currently done
by pg_partman, given that the template table is now internally different
(though not from the outside)? Will those dropped columns on the main table
cause any conflict?


Re: Column type modification in big tables

2024-08-08 Thread Lok P
On Thu, Aug 8, 2024 at 1:06 AM sud  wrote:

>
>
> On Wed, Aug 7, 2024 at 5:00 PM Lok P  wrote:
>
>>
>>
>> On Wed, Aug 7, 2024 at 4:51 PM sud  wrote:
>>
>>>
>>>
>>> Others may correct but i think, If you don't have the FK defined on
>>> these columns you can do below.
>>>
>>>
>>> --Alter table add column which will be very fast within seconds as it
>>> will just add it to the data dictionary.
>>>
>>> ALTER TABLE tab1 ADD COLUMN new_column1 NUMERIC(3),   new_column2
>>> varchar2(3);
>>>
>>>
>>> *-- Back populate the data partition wise and commit, if it's really
>>> needed*
>>>
>>> UPDATE tab1_part1 SET new_column1 = CAST(old_column1 AS NUMERIC(3)),
>>> new_column2 = CAST(old_column2 AS varchar2(3)) ;
>>> commit;
>>> UPDATE tab1_part2 SET new_column1 = CAST(old_column1 AS NUMERIC(3)),
>>> new_column2 = CAST(old_column2 AS varchar2(3)) ;
>>> commit;
>>> UPDATE tab1_part3 SET new_column1 = CAST(old_column1 AS NUMERIC(3)),
>>> new_column2 = CAST(old_column2 AS varchar2(3)) ;
>>> commit;
>>> .
>>>
>>>
>>> *--Alter table drop old columns which will be very fast within seconds
>>> as it will just drop it from the data dictionary.*
>>> ALTER TABLE your_table DROP COLUMN old_column1, DROP COLUMN old_column2;
>>>
>>
>>
>>
>> Thank you so much.
>>
>> I understand this will be the fastest possible way to achieve the column
>> modification.
>>
>> But talking about the dropped column which will be sitting in the table
>> and consuming storage space, Is it fine to leave as is or auto vacuum will
>> remove the column values behind the scene and also anyway , once those
>> partitions will be purged they will be by default purged. Is this
>> understanding correct?
>>
>>  And also will this have any impact on the partition maintenance which is
>> currently done by pg_partman as because the template table is now different
>> internally(not from outside though). Will it cause conflict because of
>> those dropped columns from the main table?
>>
>
> I think leaving the table as is after the dropping column will be fine for
> you because your regular partition maintenance/drop will slowly purge the
> historical partitions and eventually they will be removed. But if you
> update those new columns with the old column values, then autovacuum should
> also take care of removing the rows with older column values (which are
> dead actually) .
>
> Not sure if pg_partman will cause any issue ,as because the table now has
> the column data type/length changed. Others may confirm.
>

Thank you so much.

Can anybody suggest any other possible way here? With this approach we also
need the existing values to be copied into the new columns using UPDATE
(even if we update one partition at a time), and since almost all the values
in these columns are not null, that means updating nearly ~5 billion rows
across all the partitions. So my question is: is there any parameter (like
work_mem, maintenance_work_mem etc.) we can set to make this update faster,
or any other way to get these columns altered apart from this method?

>


Re: Column type modification in big tables

2024-08-09 Thread Lok P
On Fri, Aug 9, 2024 at 2:06 AM Greg Sabino Mullane 
wrote:

> On Thu, Aug 8, 2024 at 2:39 PM Lok P  wrote:
>
>> Can anybody suggest any other possible way here.
>>
>
> Sure - how about not changing the column type at all?
>
> > one of the columns from varchar(20) to varchar(2)
>
> ALTER TABLE foobar ADD CONSTRAINT twocharplease CHECK (length(mycol) <= 2)
> NOT VALID;
>
> > one of the columns from Number(10,2) to Numeric(8,2)
>
> ALTER TABLE foobar ADD CONSTRAINT eightprecision CHECK (mycol <= 10^8) NOT
> VALID;
>
> > two of the columns from varchar(20) to numeric(3)
>
> This one is trickier, as we don't know the contents, nor why it is going
> to numeric(3) - not a terribly useful data type, but let's roll with it and
> assume the stuff in the varchar is a number of some sort, and that we don't
> allow nulls:
>
> ALTER TABLE foobar ADD CONSTRAINT onekorless CHECK (mycol::numeric(3) is
> not null) NOT VALID;
>
> You probably want to check on the validity of the existing rows: see the
> docs on VALIDATE CONSTRAINT here:
>
> https://www.postgresql.org/docs/current/sql-altertable.html
>
>
>
Thank you so much. I will definitely try to evaluate this approach. The only
concern I have is that this data moves downstream with exactly the same data
type and length, so will it cause the downstream code to break when using
these columns in join or filter criteria? Also, I believe the optimizer won't
be able to utilize this information while preparing the execution plan.

Another thing, correct me if I'm wrong: my understanding is that if we run
the VALIDATE CONSTRAINT command after adding the check constraint as NOT
VALID, it will do a full table scan across all the partitions, but it is
still better than updating the column values for every row.
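
i.e. the two-step sequence would be something like the below, using the
constraint from your example:

ALTER TABLE foobar ADD CONSTRAINT twocharplease
    CHECK (length(mycol) <= 2) NOT VALID;       -- metadata only, no scan
ALTER TABLE foobar VALIDATE CONSTRAINT twocharplease;
    -- scans the whole table, but per your note it does not block concurrent access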


Re: Column type modification in big tables

2024-08-09 Thread Lok P
On Fri, Aug 9, 2024 at 9:19 PM Greg Sabino Mullane 
wrote:

> On Fri, Aug 9, 2024 at 6:39 AM Lok P  wrote:
>
>> Thank you so much. Will definitely try to evaluate this approach. The
>> Only concern I have is , as this data is moving downstream with exactly the
>> same data type and length , so will it cause the downstream code to break
>> while using this column in the join or filter criteria. Also I believe the
>> optimizer won't be able to utilize this information while preparing the
>> execution plan.
>>
>
> Yes, this is not as ideal as rewriting the table, but you asked for
> another approaches :) As to the impact of your downstream stuff, I think
> you have to try and see. Not clear what you mean by the optimizer, it's not
> going to really care about numeric(10) versus numeric(8) or varchar(20) vs
> varchar(2). It's possible the varchar -> numeric could cause issues, but
> without real-world queries and data we cannot say.
>
>
>>  Another thing , correct me if wrong, My understanding is  , if we want
>> to run the "validate constraint" command after running this "check
>> constraint with not valid" command, this will do a full table scan across
>> all the partitions , but it's still beneficial as compared to updating the
>> columns values for each rows. Correct me if I'm wrong.
>>
>
> Yes, it needs to scan the entire table, but it's a lightweight lock, won't
> block concurrent access, will not need to detoast, and makes no table or
> index updates. Versus an entire table rewrite which will do heavy locking,
> take up tons of I/O, update all the indexes, and generate quite a lot of
> WAL.
>
>
Thank you so much Greg.

Considering the option where we are able to get a large downtime window for
this activity:

Some teammates suggested altering the columns with a USING clause. I am not
really able to understand the difference; also, when I tested on a simple
table, the USING clause seemed to take more time than a plain ALTER. And
again, I don't see any way to track the progress or the estimated completion
time. Can you share your thoughts on this?

ALTER TABLE foobar ALTER COLUMN mycol TYPE NUMERIC(3) USING
mycol::NUMERIC(3);
VS
ALTER TABLE foobar ALTER COLUMN mycol TYPE NUMERIC(3) ;

Another thing that comes to mind is whether we should just create a new
partitioned table (say new_part_table) from scratch using the DDL of the
existing table (say old_part_table), load the data into it with
"INSERT INTO new_part_table ... SELECT ... FROM old_part_table", and then
create the indexes and constraints etc., something like the steps below.

Will this approach be faster/better compared to the simple "alter table alter
column" approach above, considering we will have 4-6 hours of downtime for
altering three different columns on this ~5TB table?


-- Steps
Create table exactly same as existing partition table but with the modified
column types/lengths.

drop indexes ;  (Except PK and FK indexes may be..)
drop constraints;

insert into new_part_table (...) select (...) from old_part_table;

create indexes concurrently ;
create constraints; (But this table is also a child table to another
partition table, so creating the foreign key may be resource consuming here
too).

drop the old_part_table;
rename the new_part_table to old_part_table;
rename all the partitions;

VACUUM  old_part_table  ;
ANALYZE  old_part_table  ;


Re: Column type modification in big tables

2024-08-10 Thread Lok P
On Sat, Aug 10, 2024 at 5:47 PM sud  wrote:

>
>
> My 2cents.
> If you have enough time then from a simplicity point of view, your single
> line alter command may look good, but how are you going to see  the amount
> of progress it has made so far and how much time it's going to take to
> finish. And you got ~6hrs of down time but if it fails at 5th hour then you
> will be in a bad position.
>
>
>
Does it mean that, if we get enough downtime, we should rather go with the
option of recreating the table from scratch, populating the data from the
existing table and then renaming it back? It does look more complicated,
considering the many steps like recreating indexes and constraints, renaming,
and then running VACUUM and ANALYZE etc.

Can someone throw some light on this: if we get 5-6 hrs of downtime for this
change, which method should we choose for this ALTER operation?


Re: Column type modification in big tables

2024-08-13 Thread Lok P
On Tue, Aug 13, 2024 at 7:39 PM Greg Sabino Mullane 
wrote:

> On Sat, Aug 10, 2024 at 5:06 PM Lok P  wrote:
>
>
>> Can someone throw some light , in case we get 5-6hrs downtime for this
>> change , then what method should we choose for this Alter operation?
>>
>
> We can't really answer that. Only you know what resources you have, what
> risk/reward you are willing to handle, and how long things may take. For
> that latter item, your best bet is to try this out on the same/similar
> hardware and see how long it takes. Do a smaller table and extrapolate if
> you need to. Or promote one of your replicas offline and modify that. I've
> given you a low-risk / medium-reward option with check constraints, but for
> the ALTER TABLE options you really need to try it and see (on non-prod).
>
>
*"Do a smaller table and extrapolate if you need to. Or promote one of your
replicas offline and modify that. I've given you a low-risk / medium-reward
option with check constraints, but for the ALTER TABLE options you really
need to try it and see (on non-prod)."*

Is there any possible method(maybe by looking into the data dictionary
tables/views etc) to see the progress of the Alter statement by which we
can estimate the expected completion time of the "Alter" command? I
understand pg_stat_activity doesn't show any completion percentage of a
statement, but wondering if by any other possible way we can estimate the
amount of time it will take in prod for the completion of the ALTER command.
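
The only rough heuristic I could think of is something like the below,
since there seems to be no progress view for ALTER TABLE itself; please
correct me if there is a better way:

-- what the ALTER session is currently doing / waiting on
SELECT pid, state, wait_event_type, wait_event,
       now() - xact_start AS running_for, query
FROM   pg_stat_activity
WHERE  query ILIKE 'alter table%';

-- very rough proxy for rewrite progress: overall database size growth,
-- since the rewritten heap and indexes are written out as new files
SELECT pg_size_pretty(pg_database_size(current_database()));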


Re: Column type modification in big tables

2024-08-15 Thread Lok P
On Tue, Aug 13, 2024 at 7:39 PM Greg Sabino Mullane 
wrote:

> On Sat, Aug 10, 2024 at 5:06 PM Lok P  wrote:
>
>
>> Can someone throw some light , in case we get 5-6hrs downtime for this
>> change , then what method should we choose for this Alter operation?
>>
>
> We can't really answer that. Only you know what resources you have, what
> risk/reward you are willing to handle, and how long things may take. For
> that latter item, your best bet is to try this out on the same/similar
> hardware and see how long it takes.* Do a smaller table and extrapolate
> if you need to. *
>

Hello Greg,

In terms of testing on sample data and extrapolating: I took the average
partition size of the table (~20GB), created a non-partitioned table with
exactly the same columns, populated it with similar data, and created the
same set of indexes on it, on hardware exactly the same as production.
Altering all four columns on that table takes ~5 minutes. We have ~90
partitions with data in production, and the remaining few are future
partitions that are blank. (Note: I executed the ALTER with work_mem=4GB,
maintenance_work_mem=30GB, max_parallel_workers_per_gather=8,
max_parallel_maintenance_workers=16.)

So considering the above figures, can I safely assume it will take about
90 * 5 minutes = ~7.5 hours in production, and thus that many hours of
downtime are needed for this ALTER, or do we need to consider any other
factors or activity here?


Re: Column type modification in big tables

2024-08-15 Thread Lok P
On Thu, Aug 15, 2024 at 9:18 PM Alban Hertroys  wrote:

>
> > On 15 Aug 2024, at 14:15, Lok P  wrote:
>
> (…)
> Are all those partitions critical, or only a relative few?
>
> If that’s the case, you could:
> 1) detach the non-critical partitions
> 2) take the system down for maintenance
> 3) update the critical partitions
> 4) take the system up again
> 5) update the non-critical partitions
> 6) re-attach the non-critical partitions
>
> That could shave a significant amount of time off your down-time. I would
> script the detach and re-attach processes first, to save some extra.
>
>
Thank you so much.

The partitioned table we are planning to run the ALTER script on is a child
table of another big partitioned table, and the foreign key is defined at the
table level, not partition to partition. So, if we detach the partitions,
alter the column on each detached partition and then re-attach them, will the
foreign key be revalidated again? If that is the case, the re-attach step
might consume a lot of time. Is my understanding correct here?


Re: Column type modification in big tables

2024-08-15 Thread Lok P
On Fri, Aug 16, 2024 at 2:04 AM Lok P  wrote:

>
> On Thu, Aug 15, 2024 at 9:18 PM Alban Hertroys  wrote:
>
>>
>> > On 15 Aug 2024, at 14:15, Lok P  wrote:
>>
>> (…)
>> Are all those partitions critical, or only a relative few?
>>
>> If that’s the case, you could:
>> 1) detach the non-critical partitions
>> 2) take the system down for maintenance
>> 3) update the critical partitions
>> 4) take the system up again
>> 5) update the non-critical partitions
>> 6) re-attach the non-critical partitions
>>
>> That could shave a significant amount of time off your down-time. I would
>> script the detach and re-attach processes first, to save some extra.
>>
>>
> Thank you so much.
>
> The partition table which we are planning to apply the ALTER script is a
> child table to another big partition table. And we have foreign key
> defined on table level but not partition to partition. So will detaching
> the partitions and then altering column of each detached partition and then
> re-attaching will revalidate the foreign key again? If that is the case
> then the re-attaching partition step might consume a lot of time. Is my
> understanding correct here?
>

Additionally, if we are okay with ~7.5 hrs of downtime, is my
calculation/extrapolation of the total time for a direct ALTER, based on a
sample table, accurate? Because in that case I was thinking it's less complex
and less error prone to do it in a single ALTER command rather than going
through the multiple steps of detach, alter and attach.


Re: Column type modification in big tables

2024-08-22 Thread Lok P
On Thu, 15 Aug, 2024, 9:18 pm Alban Hertroys,  wrote:

>
> > On 15 Aug 2024, at 14:15, Lok P  wrote:
>
> (…)
>
> > Hello Greg,
> >
> > In terms of testing on sample data and extrapolating, as i picked the
> avg partition sizeof the table (which is ~20GB) and i created a non
> partitioned table with exactly same columns and populated with similar data
> and also created same set of indexes on it and the underlying hardware is
> exactly same as its on production. I am seeing it's taking ~5minutes to
> alter all the four columns on this table. So we have ~90 partitions in
> production with data in them and the other few are future partitions and
> are blank. (Note- I executed the alter with "work_mem=4GB,
> maintenance_work_mem=30gb, max_parallel_worker_per_gather=8,
> max_parallel_maintenance_worker =16" )
> >
> > So considering the above figures , can i safely assume it will take
> ~90*5minutes= ~7.5hours in production and thus that many hours of downtime
> needed for this alter OR do we need to consider any other factors or
> activity here?
>
> Are all those partitions critical, or only a relative few?
>
> If that’s the case, you could:
> 1) detach the non-critical partitions
> 2) take the system down for maintenance
> 3) update the critical partitions
> 4) take the system up again
> 5) update the non-critical partitions
> 6) re-attach the non-critical partitions
>
> That could shave a significant amount of time off your down-time. I would
> script the detach and re-attach processes first, to save some extra.
>
> Admittedly, I haven’t actually tried that procedure, but I see no reason
> why it wouldn’t work.
>
> Apart perhaps, from inserts happening that should have gone to some of
> those detached partitions. Maybe those could be sent to a ‘default’
> partition that gets detached at step 7, after which you can insert+select
> those from the default into the appropriate partitions?
>
> But you were going to test that first anyway, obviously.
>

We were checking this strategy, but what we found is that while re-attaching
any historical partition back to the child table, if there are inserts
running on the other live partitions of the same child table, the ATTACH just
hangs. Also, during this period an exclusive lock is taken on the parent
table (which is itself partitioned)!

Even detaching a partition CONCURRENTLY also waits for any in-flight inserts
to finish, even those on other partitions. Is this behaviour expected?


Faster data load

2024-09-05 Thread Lok P
Hi,

We have a requirement to create approx 50 billion rows in a partitioned
table (~1 billion rows per partition, 200+ GB daily partitions) for a
performance test. We are currently using an
"INSERT INTO <target_table> SELECT .. FROM <stage table or foreign table>;"
approach. We have dropped all indexes and constraints first and are then
doing the load. Still, it's taking 2-3 hours to populate one partition. Is
there a faster way to achieve this?

A few teammates are suggesting to use the COPY command with a file load
instead, which they say will be faster. So I wanted to understand what COPY
does differently behind the scenes compared to an INSERT ... SELECT, given
that the latter stays entirely within the SQL engine.
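
i.e. something like the below (a sketch; the partition name and file path
are just placeholders):

-- client-side load via psql, file readable by the client:
\copy schema1.tab1_p2024_09_05 FROM 'tab1_2024_09_05.csv' WITH (FORMAT csv)

-- or server-side COPY, if the file is accessible to the database server:
COPY schema1.tab1_p2024_09_05 FROM '/data/stage/tab1_2024_09_05.csv' WITH (FORMAT csv);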

Additionally, when we tried to create indexes on one partition after the data
load, it took 30+ minutes. Is there any way to make that faster?

Is there any way to drive the above operations in parallel so as to utilize
the full database resources?

It's postgres 15.4

Regards
Lok


Re: Faster data load

2024-09-05 Thread Lok P
On Fri, 6 Sept, 2024, 9:20 am Muhammad Usman Khan, 
wrote:

> Hi,
>
> You can use pg_partman. If your table is partitioned, you can manage
> partitions in parallel by distributing the load across partitions
> concurrently. Or you can use citus. It can be an excellent solution,
> especially for handling large data volumes and parallelizing data operations
>


Thank you.
The tables are partitioned. Also, during index creation we are trying to
build indexes on multiple partitions at the same time from multiple sessions,
but we see out-of-memory errors by the 5th or 6th session, and each session
still takes ~30 minutes per partition for the index creation. Attaching the
index partitions happens in seconds, though.

>
>


Re: How batch processing works

2024-10-04 Thread Lok P
On Mon, Sep 23, 2024 at 12:53 AM Peter J. Holzer  wrote:

>
> > Thank you so much.
> > I was expecting method-3(batch insert) to be the fastest or atleast as
> you said
> > perform with similar speed as method-2 (row by row insert with batch
> commit)
>
> Oops, sorry! I wrote that the wrong way around. Method 3 is the fastest.
> I guess I meant to write "method2 takes about twice as long as method3"
> or something like that.
>
>
In the case of batch inserts, the form below is the fastest, as it inserts
multiple rows in one statement, and I understand a DELETE can be batched
similarly. However, can you suggest how an UPDATE can be batched in a
simple/generic fashion in JDBC for an input data stream with multiple values?
Because if we write the update as below, it will only update one row at a
time.

UPDATE <table_name> SET column1 = ?, column2 = ?, column3 = ?
WHERE <primary_key_column> = ?;


INSERT INTO <table_name> VALUES (1, 'a'), (2, 'a'), (3, 'a');
DELETE FROM <table_name> WHERE column_name IN (<value1>, <value2>, ...);
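
The only batched form of the update I could think of is collapsing the
whole batch into a single statement with a VALUES list (from JDBC the
literals would be bound parameters, or addBatch()/executeBatch() could be
used); a sketch with made-up values is below. Is this the right approach,
or is there a better generic way?

UPDATE target_table t
SET    column1 = v.column1,
       column2 = v.column2,
       column3 = v.column3
FROM  (VALUES (1, 'a1', 'b1', 'c1'),
              (2, 'a2', 'b2', 'c2'),
              (3, 'a3', 'b3', 'c3')) AS v(id, column1, column2, column3)
WHERE  t.id = v.id;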


Re: How batch processing works

2024-09-21 Thread Lok P
On Sat, Sep 21, 2024 at 9:51 AM Michał Kłeczek  wrote:

> Hi,
>
> > On 19 Sep 2024, at 07:30, Lok P  wrote:
> >
> [snip]
> >
> > Method-4
> >
> > INSERT INTO parent_table VALUES  (1, 'a'), (2, 'a');
> > INSERT INTO child_table VALUES   (1,1, 'a'), (1,2, 'a');
> > commit;
>
> I’ve done some batch processing of JSON messages from Kafka in Java.
> By far the most performant way was to:
>
> 1. Use prepared statements
> 2. Parse JSON messages in Postgres
> 3. Process messages in batches
>
> All three can be achieved by using arrays to pass batches:
>
> WITH parsed AS (
>   SELECT msg::json FROM unnest(?)
> ),
> parents AS (
>   INSERT INTO parent SELECT … FROM parsed RETURNING ...
> )
> INSERT INTO child SELECT … FROM parsed…
>
> Not the single parameter that you can bind to String[]
>
> Hope that helps.
>
>
Got your point.
But I am wondering why we don't see any difference in performance between
method-2 and method-3 above. Does it mean I am testing this in the wrong way,
or is this the expected behaviour, and thus there is no point in converting
the row-by-row inserts into a bulk insert, because just changing the commit
frequency in a row-by-row approach will do the same job?


Re: How batch processing works

2024-09-21 Thread Lok P
On Sun, Sep 22, 2024 at 12:46 AM Adrian Klaver 
wrote:

> On 9/21/24 07:36, Peter J. Holzer wrote:
> > On 2024-09-21 16:44:08 +0530, Lok P wrote:
>
> >
> ---
> > #!/usr/bin/python3
> >
> > import time
> > import psycopg2
> >
> > num_inserts = 10_000
> > batch_size = 50
> >
> > db = psycopg2.connect()
> > csr = db.cursor()
> >
> > csr.execute("drop table if exists parent_table")
> > csr.execute("create table parent_table (id int primary key, t text)")
> > db.commit()
> >
> > start_time = time.monotonic()
> > for i in range(1, num_inserts+1):
> >  csr.execute("insert into parent_table values(%s, %s)", (i, 'a'))
> >  if i % batch_size == 0:
> >  db.commit()
> > db.commit()
> > end_time = time.monotonic()
> > elapsed_time = end_time - start_time
> > print(f"Method 2: Individual Inserts with Commit after {batch_size}
> Rows: {elapsed_time:.3} seconds")
> >
> > # vim: tw=99
> >
> ---
>
> FYI, this is less of problem with psycopg(3) and pipeline mode:
>
> import time
> import psycopg
>
> num_inserts = 10_000
> batch_size = 50
>
> db = psycopg.connect("dbname=test user=postgres host=104.237.158.68")
> csr = db.cursor()
>
> csr.execute("drop table if exists parent_table")
> csr.execute("create table parent_table (id int primary key, t text)")
> db.commit()
>
> start_time = time.monotonic()
> with db.pipeline():
>  for i in range(1, num_inserts+1):
>  csr.execute("insert into parent_table values(%s, %s)", (i, 'a'))
>  if i % batch_size == 0:
>  db.commit()
> db.commit()
> end_time = time.monotonic()
> elapsed_time = end_time - start_time
> print(f"Method 2: Individual Inserts(psycopg3 pipeline mode) with Commit
> after {batch_size}  Rows: {elapsed_time:.3} seconds")
>
>
> For remote to a database in another state that took the  time from:
>
> Method 2: Individual Inserts with Commit after 50  Rows: 2.42e+02 seconds
>
> to:
>
> Method 2: Individual Inserts(psycopg3 pipeline mode) with Commit after
> 50  Rows: 9.83 seconds
>
> > #!/usr/bin/python3
> >
> > import itertools
> > import time
> > import psycopg2
> >
> > num_inserts = 10_000
> > batch_size = 50
> >
> > db = psycopg2.connect()
> > csr = db.cursor()
> >
> > csr.execute("drop table if exists parent_table")
> > csr.execute("create table parent_table (id int primary key, t text)")
> > db.commit()
> >
> > start_time = time.monotonic()
> > batch = []
> > for i in range(1, num_inserts+1):
> >  batch.append((i, 'a'))
> >  if i % batch_size == 0:
> >  q = "insert into parent_table values" + ",".join(["(%s, %s)"] *
> len(batch))
> >  params = list(itertools.chain.from_iterable(batch))
> >  csr.execute(q, params)
> >  db.commit()
> >  batch = []
> > if batch:
> >  q = "insert into parent_table values" + ",".join(["(%s, %s)"] *
> len(batch))
> >  csr.execute(q, list(itertools.chain(batch)))
> >  db.commit()
> >  batch = []
> >
> > end_time = time.monotonic()
> > elapsed_time = end_time - start_time
> > print(f"Method 3: Batch Inserts ({batch_size})  with Commit after each
> batch: {elapsed_time:.3} seconds")
> >
> > # vim: tw=99
> >
> ---
>
> The above can also be handled with execute_batch() and execute_values()
> from:
>
> https://www.psycopg.org/docs/extras.html#fast-execution-helpers
>
> >
> > On my laptop, method2 is about twice as fast as method3. But if I
> > connect to a database on the other side of the city, method2 is now more
> > than 16 times faster than method3 . Simply because the delay in
> > communication is now large compared to the time it takes to insert those
> > rows.
> >
>
>
>
>
Thank you. So if I understand correctly, if the client app (from which the
data is being streamed/inserted) is in the same data center/zone as the
database (which is the case most of the time), then batch insert does not
appear to be much more beneficial.

Which also means that although people here were afraid of having triggers on
such a high-DML table, because triggers effectively turn a "batch insert"
into "row by row" processing behind the scenes, considering the above results
it looks fine to go with a row-by-row approach (just with batched commits in
place of per-row commits), and not to worry about implementing a true batch
insert, since that is not making a big difference in data load performance
here.


Re: How batch processing works

2024-09-21 Thread Lok P
On Sat, Sep 21, 2024 at 8:07 PM Peter J. Holzer  wrote:

> On 2024-09-21 16:44:08 +0530, Lok P wrote:
> > But wondering why we don't see any difference in performance between
> method-2
> > and method-3 above.
>
> The code runs completely inside the database. So there isn't much
> difference between a single statement which inserts 50 rows and 50
> statements which insert 1 row each. The work to be done is (almost) the
> same.
>
> This changes once you consider an application which runs outside of the
> database (maybe even on a different host). Such an application has to
> wait for the result of each statement before it can send the next one.
> Now it makes a difference whether you are waiting 50 times for a
> statement which does very little or just once for a statement which does
> more work.
>
> > So does it mean that,I am testing this in a wrong way or
>
> That depends on what you want to test. If you are interested in the
> behaviour of stored procedures, the test is correct. If you want to know
> about the performance of a database client (whether its written in Java,
> Python, Go or whatever), this is the wrong test. You have to write the
> test in your target language and run it on the client system to get
> realistic results (for example, the round-trip times will be a lot
> shorter if the client and database are on the same computer than when
> one is in Europe and the other in America).
>
> For example, here are the three methods as Python scripts:
>
>
> ---
> #!/usr/bin/python3
>
> import time
> import psycopg2
>
> num_inserts = 10_000
>
> db = psycopg2.connect()
> csr = db.cursor()
>
> csr.execute("drop table if exists parent_table")
> csr.execute("create table parent_table (id int primary key, t text)")
>
> start_time = time.monotonic()
> for i in range(1, num_inserts+1):
> csr.execute("insert into parent_table values(%s, %s)", (i, 'a'))
> db.commit()
> end_time = time.monotonic()
> elapsed_time = end_time - start_time
> print(f"Method 1: Individual Inserts with Commit after every Row:
> {elapsed_time:.3} seconds")
>
> # vim: tw=99
>
> ---
> #!/usr/bin/python3
>
> import time
> import psycopg2
>
> num_inserts = 10_000
> batch_size = 50
>
> db = psycopg2.connect()
> csr = db.cursor()
>
> csr.execute("drop table if exists parent_table")
> csr.execute("create table parent_table (id int primary key, t text)")
> db.commit()
>
> start_time = time.monotonic()
> for i in range(1, num_inserts+1):
> csr.execute("insert into parent_table values(%s, %s)", (i, 'a'))
> if i % batch_size == 0:
> db.commit()
> db.commit()
> end_time = time.monotonic()
> elapsed_time = end_time - start_time
> print(f"Method 2: Individual Inserts with Commit after {batch_size}  Rows:
> {elapsed_time:.3} seconds")
>
> # vim: tw=99
>
> ---
> #!/usr/bin/python3
>
> import itertools
> import time
> import psycopg2
>
> num_inserts = 10_000
> batch_size = 50
>
> db = psycopg2.connect()
> csr = db.cursor()
>
> csr.execute("drop table if exists parent_table")
> csr.execute("create table parent_table (id int primary key, t text)")
> db.commit()
>
> start_time = time.monotonic()
> batch = []
> for i in range(1, num_inserts+1):
> batch.append((i, 'a'))
> if i % batch_size == 0:
> q = "insert into parent_table values" + ",".join(["(%s, %s)"] *
> len(batch))
> params = list(itertools.chain.from_iterable(batch))
> csr.execute(q, params)
> db.commit()
> batch = []
> if batch:
> q = "insert into parent_table values" + ",".join(["(%s, %s)"] *
> len(batch))
> csr.execute(q, list(itertools.chain(batch)))
> db.commit()
> batch = []
>
> end_time = time.monotonic()
> elapsed_time = end_time - start_time
> print(f"Method 3: Batch Inserts ({batch_size})  with Commit after each
> batch: {elapsed_time:.3} seconds")
>
> # vim: tw=99
>
> ---
>
> On my laptop, method2 is about twice as fast as method3. But if I
> connect to a database on the other side of the city, method2 is now more

Re: Grants not working on partitions

2024-09-28 Thread Lok P
On Sat, Sep 28, 2024 at 8:46 PM Adrian Klaver 
wrote:

> On 9/28/24 04:02, Lok P wrote:
> > Hi,
> > While we are creating any new tables, we used to give SELECT privilege
> > on the newly created tables using the below command. But we are seeing
> > now , in case of partitioned tables even if we had given the privileges
> > in the same fashion, the user is not able to query specific partitions
> > but only the table. Commands like "select * from
> > schema1. " are erroring out with the "insufficient
> > privilege" error , even if the partition belongs to the same table.
> >
> > Grant SELECT ON  to ;
> >
> > Grant was seen as a one time command which needed while creating the
> > table and then subsequent partition creation for that table was handled
> > by the pg_partman extension. But that extension is not creating or
> > copying any grants on the table to the users. We were expecting , once
> > the base table is given a grant , all the inherited partitions will be
> > automatically applied to those grants. but it seems it's not working
> > that way. So is there any other way to handle this situation?
>
>
> The docs are there for a reason:
>
>
> https://github.com/pgpartman/pg_partman/blob/master/doc/pg_partman.md#child-table-property-inheritance
>
> "Privileges & ownership are NOT inherited by default. If enabled by
> pg_partman, note that this inheritance is only at child table creation
> and isn't automatically retroactive when changed (see
> reapply_privileges()). Unless you need direct access to the child
> tables, this should not be needed. You can set the inherit_privileges
> option if this is needed (see config table information below)."
>
>
> And:
>
> "reapply_privileges(
>  p_parent_table text
> )
> RETURNS void
>
>  This function is used to reapply ownership & grants on all child
> tables based on what the parent table has set.
>  Privileges that the parent table has will be granted to all child
> tables and privileges that the parent does not have will be revoked
> (with CASCADE).
>  Privileges that are checked for are SELECT, INSERT, UPDATE, DELETE,
> TRUNCATE, REFERENCES, & TRIGGER.
>  Be aware that for large partition sets, this can be a very long
> running operation and is why it was made into a separate function to run
> independently. Only privileges that are different between the parent &
> child are applied, but it still has to do system catalog lookups and
> comparisons for every single child partition and all individual
> privileges on each.
>  p_parent_table - parent table of the partition set. Must be schema
> qualified and match a parent table name already configured in pg_partman.
> "
>
>
>
Thank you. I was not aware of this function, which copies the grants from the
parent to the child tables, so we can call it at the end of the pg_partman job
that runs through the cron job. The only issue I see is that this function has
only one parameter, "p_parent_table", and nothing for a child table, which
means it will try to apply grants on all the children/partitions created so
far, even the ones that already have the privileges.
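
For example, a rough sketch of what that scheduled call might look like
(assuming pg_partman is installed in a schema named "partman", with
schema1.parent_table as a placeholder parent):

-- run the regular pg_partman maintenance first (creates/purges partitions)
SELECT partman.run_maintenance();

-- then re-sync ownership/grants from the parent to all of its children
SELECT partman.reapply_privileges('schema1.parent_table');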

We have only ~60 partitions in most of the tables, so hopefully that will not
take long. But considering we create/purge one partition daily for each
partitioned table using pg_partman, every time we call it, it will try to
apply/copy the grants on all the partitions (including the current day's live
partition). Will that cause the existing running queries on the live
partitions to hard parse, or will it cause any locking when it tries to apply
grants on the current/live partitions, which are being inserted into, updated,
deleted from, or queried by the users?


Grants not working on partitions

2024-09-28 Thread Lok P
Hi,
While creating any new table, we used to give SELECT privilege on the newly
created table using the command below. But we are now seeing that, in the case
of partitioned tables, even though we granted the privileges in the same
fashion, the user is able to query only the table itself and not specific
partitions. Commands like "select * from schema1. " error out with an
"insufficient privilege" error, even though the partition belongs to that same
table.

Grant SELECT ON  to ;

The grant was seen as a one-time command needed while creating the table, and
subsequent partition creation for that table was handled by the pg_partman
extension. But that extension is not creating or copying any grants on the
partitions for the users. We were expecting that, once the base table is
granted, all the inherited partitions would automatically pick up those
grants, but it seems it does not work that way. So is there any other way to
handle this situation?

In other databases (say Oracle) we used to create standard "roles" (read_role,
write_role etc.) and then provide grants to the users through those roles,
with the objects granted directly to those roles. Similarly, here in Postgres
we were granting "read" or "write" privileges on objects to the roles and
letting the users log in to the database using those roles, thereby getting
all the read/write privileges assigned to those roles. Are we doing anything
wrong?
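
This is roughly the pattern we have today, as a sketch (role, schema and user
names are placeholders):

-- one read-only role, granted to the actual login users
CREATE ROLE read_role NOLOGIN;
GRANT USAGE ON SCHEMA schema1 TO read_role;

-- note: this only covers tables that exist at the time it runs,
-- not partitions that pg_partman creates later
GRANT SELECT ON ALL TABLES IN SCHEMA schema1 TO read_role;

GRANT read_role TO report_user;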

Regards
Lok


Re: How batch processing works

2024-09-19 Thread Lok P
Below are the results for the posted methods. I tested them locally and saw no
difference in timing between method-2 and method-3. It failed to run in
dbfiddle somehow.

I was also initially worried that adding the trigger to our target table would
worsen the performance, because it would turn all the execution into "row by
row" processing rather than a true batch insert (method-3 as posted), with a
larger number of context switches. But it seems it will still do the batch
commits (the way method-2 does), so as per that we won't lose any performance
as such. Is this understanding correct?
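
As a small sanity check (hypothetical table and trigger names), a row-level
trigger fires once per row, but the multi-row INSERT is still a single
statement and a single round trip:

CREATE TABLE t (id int);
CREATE TABLE t_log (id int);

CREATE FUNCTION t_audit() RETURNS trigger AS $$
BEGIN
    INSERT INTO t_log VALUES (NEW.id);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER t_audit_trg AFTER INSERT ON t
    FOR EACH ROW EXECUTE FUNCTION t_audit();

-- one statement, one commit; the trigger body runs three times (once per row)
INSERT INTO t VALUES (1), (2), (3);
SELECT count(*) FROM t_log;   -- 3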


*Method-1- 00:01:44.48*

*Method-2- 00:00:02.67*

*Method-3- 00:00:02.39*

https://gist.github.com/databasetech0073/8e9106757d751358c0c0c65a2374dbc6

On Thu, Sep 19, 2024 at 6:42 PM Lok P  wrote:

>
>
> On Thu, Sep 19, 2024 at 5:40 PM Ron Johnson 
> wrote:
>
>> On Thu, Sep 19, 2024 at 5:24 AM Lok P  wrote:
>>
>>>
>>>
>>> [snip]
>>
>>> DO $$
>>> DECLARE
>>> num_inserts INTEGER := 10;
>>> batch_size INTEGER := 50;
>>> start_time TIMESTAMP;
>>> end_time TIMESTAMP;
>>> elapsed_time INTERVAL;
>>> i INTEGER;
>>> BEGIN
>>> -- Method 1: Individual Inserts with Commit after every Row
>>> start_time := clock_timestamp();
>>>
>>> FOR i IN 1..num_inserts LOOP
>>> INSERT INTO parent_table VALUES (i, 'a');
>>> COMMIT;
>>> END LOOP;
>>>
>>> end_time := clock_timestamp();
>>> elapsed_time := end_time - start_time;
>>> INSERT INTO debug_log (method1, start_time, end_time, elapsed_time)
>>> VALUES ('Method 1: Individual Inserts with Commit after every Row',
>>> start_time, end_time, elapsed_time);
>>>
>>> -- Method 2: Individual Inserts with Commit after 100 Rows
>>> start_time := clock_timestamp();
>>>
>>> FOR i IN 1..num_inserts LOOP
>>> INSERT INTO parent_table2 VALUES (i, 'a');
>>> -- Commit after every 100 rows
>>> IF i % batch_size = 0 THEN
>>> COMMIT;
>>> END IF;
>>> END LOOP;
>>>
>>> -- Final commit if not already committed
>>>commit;
>>>
>>> end_time := clock_timestamp();
>>> elapsed_time := end_time - start_time;
>>> INSERT INTO debug_log (method1, start_time, end_time, elapsed_time)
>>> VALUES ('Method 2: Individual Inserts with Commit after 100 Rows',
>>> start_time, end_time, elapsed_time);
>>>
>>> -- Method 3: Batch Inserts with Commit after all
>>> start_time := clock_timestamp();
>>>
>>> FOR i IN 1..(num_inserts / batch_size) LOOP
>>> INSERT INTO parent_table3 VALUES
>>> (1 + (i - 1) * batch_size, 'a'),
>>>
>> [snip]
>>
>>> (49 + (i - 1) * batch_size, 'a'),
>>> (50 + (i - 1) * batch_size, 'a'));
>>> COMMIT;
>>> END LOOP;
>>>
>>> COMMIT;  -- Final commit for all
>>> end_time := clock_timestamp();
>>> elapsed_time := end_time - start_time;
>>> INSERT INTO debug_log (method1, start_time, end_time, elapsed_time)
>>> VALUES ('Method 3: Batch Inserts with Commit after All', start_time,
>>> end_time, elapsed_time);
>>>
>>> END $$;
>>>
>>
>> Reproduce what behavior?
>>
>> Anyway, plpgsql functions (including anonymous DO statements) are -- to
>> Postgresql -- single statements.  Thus, they'll be faster than
>> individual calls..
>>
>> An untrusted language like plpython3u might speed things up even more, if
>> you have to read a heterogeneous external file and insert all the records
>> into the db.
>>
>
> Here if you see my script , the method-1 is doing commit after each row
> insert. And method-2 is doing a batch commit i.e. commit after every "50"
> row. And method-3 is doing a true batch insert i.e. combining all the 50
> values in one insert statement and submitting to the database in oneshot
> and then COMMIT it, so the context switching will be a lot less. So I was
> expecting Method-3 to be the fastest way to insert the rows here, but the
> response time shows the same response time for Method-2 and method-3.
> Method-1 is the slowest through.
>


Re: How batch processing works

2024-09-19 Thread Lok P
On Thu, Sep 19, 2024 at 5:40 PM Ron Johnson  wrote:

> On Thu, Sep 19, 2024 at 5:24 AM Lok P  wrote:
>
>>
>>
>> [snip]
>
>> DO $$
>> DECLARE
>> num_inserts INTEGER := 10;
>> batch_size INTEGER := 50;
>> start_time TIMESTAMP;
>> end_time TIMESTAMP;
>> elapsed_time INTERVAL;
>> i INTEGER;
>> BEGIN
>> -- Method 1: Individual Inserts with Commit after every Row
>> start_time := clock_timestamp();
>>
>> FOR i IN 1..num_inserts LOOP
>> INSERT INTO parent_table VALUES (i, 'a');
>> COMMIT;
>> END LOOP;
>>
>> end_time := clock_timestamp();
>> elapsed_time := end_time - start_time;
>> INSERT INTO debug_log (method1, start_time, end_time, elapsed_time)
>> VALUES ('Method 1: Individual Inserts with Commit after every Row',
>> start_time, end_time, elapsed_time);
>>
>> -- Method 2: Individual Inserts with Commit after 100 Rows
>> start_time := clock_timestamp();
>>
>> FOR i IN 1..num_inserts LOOP
>> INSERT INTO parent_table2 VALUES (i, 'a');
>> -- Commit after every 100 rows
>> IF i % batch_size = 0 THEN
>> COMMIT;
>> END IF;
>> END LOOP;
>>
>> -- Final commit if not already committed
>>commit;
>>
>> end_time := clock_timestamp();
>> elapsed_time := end_time - start_time;
>> INSERT INTO debug_log (method1, start_time, end_time, elapsed_time)
>> VALUES ('Method 2: Individual Inserts with Commit after 100 Rows',
>> start_time, end_time, elapsed_time);
>>
>> -- Method 3: Batch Inserts with Commit after all
>> start_time := clock_timestamp();
>>
>> FOR i IN 1..(num_inserts / batch_size) LOOP
>> INSERT INTO parent_table3 VALUES
>> (1 + (i - 1) * batch_size, 'a'),
>>
> [snip]
>
>> (49 + (i - 1) * batch_size, 'a'),
>> (50 + (i - 1) * batch_size, 'a'));
>> COMMIT;
>> END LOOP;
>>
>> COMMIT;  -- Final commit for all
>> end_time := clock_timestamp();
>> elapsed_time := end_time - start_time;
>> INSERT INTO debug_log (method1, start_time, end_time, elapsed_time)
>> VALUES ('Method 3: Batch Inserts with Commit after All', start_time,
>> end_time, elapsed_time);
>>
>> END $$;
>>
>
> Reproduce what behavior?
>
> Anyway, plpgsql functions (including anonymous DO statements) are -- to
> Postgresql -- single statements.  Thus, they'll be faster than
> individual calls..
>
> An untrusted language like plpython3u might speed things up even more, if
> you have to read a heterogeneous external file and insert all the records
> into the db.
>

Here, if you look at my script, method-1 commits after each row insert,
method-2 does a batch commit, i.e. a commit after every "50" rows, and
method-3 does a true batch insert, i.e. it combines all 50 values into one
insert statement, submits it to the database in one shot and then COMMITs, so
the context switching should be a lot less. So I was expecting method-3 to be
the fastest way to insert the rows here, but the results show the same
response time for method-2 and method-3. Method-1 is the slowest, though.
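
To actually see the difference, maybe we have to drive the inserts from a
client machine rather than from inside a DO block; a rough pgbench sketch
(hypothetical script file names and a helper sequence bench_seq) could be:

-- setup, run once:  CREATE SEQUENCE bench_seq;

-- method2.sql: one row per statement, i.e. one client round trip per row
INSERT INTO parent_table2 VALUES (nextval('bench_seq'), 'a');

-- method3.sql: 50 rows per statement, i.e. one client round trip per 50 rows
INSERT INTO parent_table3
SELECT nextval('bench_seq'), 'a' FROM generate_series(1, 50);

-- run each from the client host and compare rows/sec
-- (rows/sec = tps for method2, tps * 50 for method3):
--   pgbench -n -f method2.sql -c 4 -T 60 mydb
--   pgbench -n -f method3.sql -c 4 -T 60 mydb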


How batch processing works

2024-09-18 Thread Lok P
Hello,
I saw multiple threads around the same topic, so I want some clarification. As
we know, row-by-row is "slow by slow" processing, so in heavy write systems
(say the client app is in Java), people are asked to do DMLs in batches rather
than in a row-by-row fashion, to minimize the chatter or context switches
between database and client, which are resource intensive. What I understand
is that true batch processing means the client has to collect all the input
bind values, prepare the insert statement, submit it to the database in one
shot and then commit.

What does that actually mean, and if we divide the options as below, which
method truly does batch processing, or does some other method exist for doing
batch processing with Postgres as the backend database?

I understand the first method below is truly row-by-row processing, in which a
context switch happens between client and database for each row, whereas the
second method just batches the commits but is not true batch processing, as it
does the same amount of context switching between database and client. But
regarding the third and fourth methods, will both execute in a similar fashion
in the database, with the same number of context switches? Does any other,
better method exist to do these inserts in batches? I appreciate your guidance.


CREATE TABLE parent_table (
id SERIAL PRIMARY KEY,
name TEXT
);

CREATE TABLE child_table (
id SERIAL PRIMARY KEY,
parent_id INT REFERENCES parent_table(id),
value TEXT
);


Method-1

insert into parent_table values(1,'a');
commit;
insert into parent_table values(2,'a');
commit;
insert into child_table values(1,1,'a');
Commit;
insert into child_table values(2,2,'a');
commit;

VS

Method-2

insert into parent_table values(1,'a');
insert into parent_table values(2,'a');
insert into child_table values(1,1,'a');
insert into child_table values(2,2,'a');
Commit;

VS

Method-3

with
 a as ( insert into parent_table values(1,'a')  )
, a1 as (insert into parent_table values(2,'a') )
, b as (insert into child_table values(1,1,'a')  )
, b1 as (insert into child_table values(2,2,'a') )
select;
commit;

Method-4

INSERT INTO parent_table VALUES  (1, 'a'), (2, 'a');
INSERT INTO child_table VALUES   (1,1, 'a'), (2,2, 'a');
commit;
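
For completeness, the other bulk option mentioned elsewhere in this thread is
COPY, which loads many rows in a single statement; a minimal psql-style sketch
against the same tables (CSV data, terminated by \.) might look like:

COPY parent_table (id, name) FROM STDIN WITH (FORMAT csv);
1,a
2,a
\.

COPY child_table (id, parent_id, value) FROM STDIN WITH (FORMAT csv);
1,1,a
2,2,a
\.
commit;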

Regards
Lok


Re: How batch processing works

2024-09-19 Thread Lok P
On Thu, Sep 19, 2024 at 11:31 AM Ron Johnson 
wrote:

>
> [snip]
>
>>
>> Method-4
>>
>> INSERT INTO parent_table VALUES  (1, 'a'), (2, 'a');
>> INSERT INTO child_table VALUES   (1,1, 'a'), (1,2, 'a');
>> commit;
>>
>
> If I knew that I had to load a structured input data file (even if it had
> parent and child records), this is how I'd do it (but probably first try
> and see if "in-memory COPY INTO" is such a thing).
>
>

I was trying to reproduce this behaviour using row-by-row commit vs. batch
commit vs. true batch insert as you mentioned, but I am not able to see any
difference in response time between "batch commit" and "true batch insert". Am
I missing anything?

CREATE TABLE debug_log (
method1 TEXT,
start_time TIMESTAMP,
end_time TIMESTAMP,
elapsed_time INTERVAL
);

CREATE TABLE parent_table (
id SERIAL PRIMARY KEY,
name TEXT
);


CREATE TABLE parent_table2 (
id SERIAL PRIMARY KEY,
name TEXT
);


CREATE TABLE parent_table3 (
id SERIAL PRIMARY KEY,
name TEXT
);
DO $$
DECLARE
num_inserts INTEGER := 10;
batch_size INTEGER := 50;
start_time TIMESTAMP;
end_time TIMESTAMP;
elapsed_time INTERVAL;
i INTEGER;
BEGIN
-- Method 1: Individual Inserts with Commit after every Row
start_time := clock_timestamp();

FOR i IN 1..num_inserts LOOP
INSERT INTO parent_table VALUES (i, 'a');
COMMIT;
END LOOP;

end_time := clock_timestamp();
elapsed_time := end_time - start_time;
INSERT INTO debug_log (method1, start_time, end_time, elapsed_time)
VALUES ('Method 1: Individual Inserts with Commit after every Row',
start_time, end_time, elapsed_time);

-- Method 2: Individual Inserts with Commit after 100 Rows
start_time := clock_timestamp();

FOR i IN 1..num_inserts LOOP
INSERT INTO parent_table2 VALUES (i, 'a');
-- Commit after every batch_size rows
IF i % batch_size = 0 THEN
COMMIT;
END IF;
END LOOP;

-- Final commit if not already committed
   commit;

end_time := clock_timestamp();
elapsed_time := end_time - start_time;
INSERT INTO debug_log (method1, start_time, end_time, elapsed_time)
VALUES ('Method 2: Individual Inserts with Commit after 100 Rows',
start_time, end_time, elapsed_time);

-- Method 3: Batch Inserts with Commit after all
start_time := clock_timestamp();

FOR i IN 1..(num_inserts / batch_size) LOOP
INSERT INTO parent_table3 VALUES
(1 + (i - 1) * batch_size, 'a'),
(2 + (i - 1) * batch_size, 'a'),
(3 + (i - 1) * batch_size, 'a'),
(4 + (i - 1) * batch_size, 'a'),
(5 + (i - 1) * batch_size, 'a'),
(6 + (i - 1) * batch_size, 'a'),
(7 + (i - 1) * batch_size, 'a'),
(8 + (i - 1) * batch_size, 'a'),
(9 + (i - 1) * batch_size, 'a'),
(10 + (i - 1) * batch_size, 'a'),
(11 + (i - 1) * batch_size, 'a'),
(12 + (i - 1) * batch_size, 'a'),
(13 + (i - 1) * batch_size, 'a'),
(14 + (i - 1) * batch_size, 'a'),
(15 + (i - 1) * batch_size, 'a'),
(16 + (i - 1) * batch_size, 'a'),
(17 + (i - 1) * batch_size, 'a'),
(18 + (i - 1) * batch_size, 'a'),
(19 + (i - 1) * batch_size, 'a'),
(20 + (i - 1) * batch_size, 'a'),
(21 + (i - 1) * batch_size, 'a'),
(22 + (i - 1) * batch_size, 'a'),
(23 + (i - 1) * batch_size, 'a'),
(24 + (i - 1) * batch_size, 'a'),
(25 + (i - 1) * batch_size, 'a'),
(26 + (i - 1) * batch_size, 'a'),
(27 + (i - 1) * batch_size, 'a'),
(28 + (i - 1) * batch_size, 'a'),
(29 + (i - 1) * batch_size, 'a'),
(30 + (i - 1) * batch_size, 'a'),
(31 + (i - 1) * batch_size, 'a'),
(32 + (i - 1) * batch_size, 'a'),
(33 + (i - 1) * batch_size, 'a'),
(34 + (i - 1) * batch_size, 'a'),
(35 + (i - 1) * batch_size, 'a'),
(36 + (i - 1) * batch_size, 'a'),
(37 + (i - 1) * batch_size, 'a'),
(38 + (i - 1) * batch_size, 'a'),
(39 + (i - 1) * batch_size, 'a'),
(40 + (i - 1) * batch_size, 'a'),
(41 + (i - 1) * batch_size, 'a'),
(42 + (i - 1) * batch_size, 'a'),
(43 + (i - 1) * batch_size, 'a'),
(44 + (i - 1) * batch_size, 'a'),
(45 + (i - 1) * batch_size, 'a'),
(46 + (i - 1) * batch_size, 'a'),
(47 + (i - 1) * batch_size, 'a'),
(48 + (i - 1) * batch_size, 'a'),
(49 + (i - 1) * batch_size, 'a'),
(50 + (i - 1) * batch_size, 'a'));
COMMIT;
END LOOP;

COMMIT;  -- Final commit for all
end_time := clock_timestamp();
elapsed_time := end_time - start_time;
INSERT INTO debug_log (method1, start_time, end_time, elapsed_time)
VALUES ('Method 3: Ba