Re: Read write performance check
As Rob mentioned, the syntax you posted is not correct. You need to process or read a certain batch of rows like 1000 or 10k etc. Not all 100M at one shot. But again your uses case seems common one considering you want to compare the read and write performance on multiple databases with similar table structure as per your usecase. So in that case, you may want to use some test scripts which others must have already done rather reinventing the wheel. On Wed, 20 Dec, 2023, 10:19 am veem v, wrote: > Thank you. > > Yes, actually we are trying to compare and see what maximum TPS are we > able to reach with both of these row by row and batch read/write test. And > then afterwards, this figure may be compared with other databases etc with > similar setups. > > So wanted to understand from experts here, if this approach is fine? Or > some other approach is advisable? > > I agree to the point that , network will play a role in real world app, > but here, we are mainly wanted to see the database capability, as network > will always play a similar kind of role across all databases. Do you > suggest some other approach to achieve this objective? > > > On Wed, 20 Dec, 2023, 2:42 am Peter J. Holzer, wrote: > >> On 2023-12-20 00:44:48 +0530, veem v wrote: >> > So at first, we need to populate the base tables with the necessary >> data (say >> > 100million rows) with required skewness using random functions to >> generate the >> > variation in the values of different data types. Then in case of row by >> row >> > write/read test , we can traverse in a cursor loop. and in case of >> batch write/ >> > insert , we need to traverse in a bulk collect loop. Something like >> below and >> > then this code can be wrapped into a procedure and passed to the >> pgbench and >> > executed from there. Please correct me if I'm wrong. >> >> One important point to consider for benchmarks is that your benchmark >> has to be similar to the real application to be useful. If your real >> application runs on a different node and connects to the database over >> the network, a benchmark running within a stored procedure may not be >> very indicative of real performance. >> >> hp >> >> -- >>_ | Peter J. Holzer| Story must make more sense than reality. >> |_|_) || >> | | | h...@hjp.at |-- Charles Stross, "Creative writing >> __/ | http://www.hjp.at/ | challenge!" >> >
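To illustrate the "read/process a batch of 1000 or 10k rows, not all 100M in one shot" point, a minimal hedged sketch of a keyset-style batch loop (table and column names are assumptions, not from the original post):

-- Minimal sketch only: assumes a table test_tab(id bigint primary key, payload text).
DO $$
DECLARE
    last_id bigint := 0;
    batch   int    := 10000;
    n       int;
BEGIN
    LOOP
        -- read one batch at a time, keyed on the primary key
        SELECT count(*), max(id) INTO n, last_id
        FROM (SELECT id
              FROM test_tab
              WHERE id > last_id
              ORDER BY id
              LIMIT batch) b;
        -- ... process the batch here ...
        EXIT WHEN n = 0;
    END LOOP;
END $$;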
Re: Read write performance check
As I mentioned your scenario looks generic one, but I don't have any sample scripts/docs to share, sorry for that. Other people may suggest any sample scripts etc if any. or you may post it on performance group, if someone has done similar stuff in the past. But as per me, the performance test scripts will look like 1) row by row insert/select in cursor loop as you were showing earlier in this thread and another one with batch/bulk inserts. And then calling it through pgbench or any other scheduler for creating concurrency. On Thu, Dec 21, 2023 at 7:00 PM veem v wrote: > Can someone please guide me, if any standard scripting is available for > doing such read/write performance test? Or point me to any available docs? > > On Wed, 20 Dec, 2023, 10:39 am veem v, wrote: > >> Thank you. >> >> That would really be helpful if such test scripts or similar setups are >> already available. Can someone please guide me to some docs or blogs or >> sample scripts, on same please. >> >> On Wed, 20 Dec, 2023, 10:34 am Lok P, wrote: >> >>> As Rob mentioned, the syntax you posted is not correct. You need to >>> process or read a certain batch of rows like 1000 or 10k etc. Not all 100M >>> at one shot. >>> >>> But again your uses case seems common one considering you want to >>> compare the read and write performance on multiple databases with similar >>> table structure as per your usecase. So in that case, you may want to use >>> some test scripts which others must have already done rather reinventing >>> the wheel. >>> >>> >>> On Wed, 20 Dec, 2023, 10:19 am veem v, wrote: >>> >>>> Thank you. >>>> >>>> Yes, actually we are trying to compare and see what maximum TPS are we >>>> able to reach with both of these row by row and batch read/write test. And >>>> then afterwards, this figure may be compared with other databases etc with >>>> similar setups. >>>> >>>> So wanted to understand from experts here, if this approach is fine? >>>> Or some other approach is advisable? >>>> >>>> I agree to the point that , network will play a role in real world app, >>>> but here, we are mainly wanted to see the database capability, as network >>>> will always play a similar kind of role across all databases. Do you >>>> suggest some other approach to achieve this objective? >>>> >>>> >>>> On Wed, 20 Dec, 2023, 2:42 am Peter J. Holzer, >>>> wrote: >>>> >>>>> On 2023-12-20 00:44:48 +0530, veem v wrote: >>>>> > So at first, we need to populate the base tables with the necessary >>>>> data (say >>>>> > 100million rows) with required skewness using random functions to >>>>> generate the >>>>> > variation in the values of different data types. Then in case of row >>>>> by row >>>>> > write/read test , we can traverse in a cursor loop. and in case of >>>>> batch write/ >>>>> > insert , we need to traverse in a bulk collect loop. Something like >>>>> below and >>>>> > then this code can be wrapped into a procedure and passed to the >>>>> pgbench and >>>>> > executed from there. Please correct me if I'm wrong. >>>>> >>>>> One important point to consider for benchmarks is that your benchmark >>>>> has to be similar to the real application to be useful. If your real >>>>> application runs on a different node and connects to the database over >>>>> the network, a benchmark running within a stored procedure may not be >>>>> very indicative of real performance. >>>>> >>>>> hp >>>>> >>>>> -- >>>>>_ | Peter J. Holzer| Story must make more sense than reality. 
>>>>> |_|_) || >>>>> | | | h...@hjp.at |-- Charles Stross, "Creative writing >>>>> __/ | http://www.hjp.at/ | challenge!" >>>>> >>>>
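To make the "row by row vs. batch, driven through pgbench" idea concrete, a hedged sketch of what the two test procedures and the pgbench invocation might look like (table names, row counts and client counts are all assumptions):

CREATE TABLE perf_test (id bigserial, val text);

-- Row-by-row: one INSERT and one commit per row.
CREATE OR REPLACE PROCEDURE insert_row_by_row(n int)
LANGUAGE plpgsql AS $$
BEGIN
    FOR i IN 1..n LOOP
        INSERT INTO perf_test(val) VALUES (md5(random()::text));
        COMMIT;
    END LOOP;
END $$;

-- Batch: one set-based INSERT for the whole chunk.
CREATE OR REPLACE PROCEDURE insert_batch(n int)
LANGUAGE plpgsql AS $$
BEGIN
    INSERT INTO perf_test(val)
    SELECT md5(random()::text) FROM generate_series(1, n);
END $$;

-- row_by_row.sql contains:  CALL insert_row_by_row(1000);
-- batch.sql contains:       CALL insert_batch(1000);
-- Run each with concurrency, e.g.:
--   pgbench -n -c 10 -T 300 -f row_by_row.sql testdb
--   pgbench -n -c 10 -T 300 -f batch.sql testdb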
How to do faster DML
Hello All,
We have a non-partitioned table with ~4.8 billion rows, ~1.4 TB of data, a row size of ~313 bytes and ~127 columns. It has roughly 1 billion+ duplicate rows inserted into it, and we want to remove the duplicate data and then create a PK/unique constraint so duplicates cannot occur in future. We are struggling to do this.

Teammates suggested the CTAS method, i.e. create a new table with the unique record set and then drop the main table. Something like:

create table TAB1_New
as
SELECT * from TAB1 A
where ID in
(select min(ID) from TAB1
 group by ID having count(ID)>=1 );

But for the above to run faster, they suggested creating an index on the column used for the duplicate check, i.e. the ID column. Creating that index itself took ~2 hrs+ and the index size now shows as ~116 GB.

Create index idx1 on TAB1(ID);

We then ran just the SELECT subquery part of the CTAS statement, to check that it gives the correct count of unique records. It ran for 2.5 hrs and then we killed it. Below is the plan for it:

explain
select min(ID) from TAB1 A group by ID having count(ID)>=1

GroupAggregate (cost=0.71..6025790113.87 rows=29432861 width=46)
  Group Key: ID
  Filter: (count(ID) >= 1)
  ->  Index Only Scan using idx1 on TAB1 a (cost=0.71..5988060903.17 rows=4883397120 width=14)

I want to understand whether this can be done faster in any way. I am also worried that recreating the PK constraint/index after deleting the duplicates is going to run forever. Is there any way to make these heavy operations faster in Postgres by giving the database more resources through some parameter setup, like a parallel hint? We have the pg_hint_plan extension added, but the parallel hint is not being enforced when we add it to the query.

In Oracle we have parallel hints, direct-path read/write for faster read/write operations, parallel index scans etc.; is anything similar available in Aurora Postgres to provide more horsepower and speed up these batch operations? And how can we monitor the progress of a running query?

Just to note: it's a db.r7g.8xlarge Aurora Postgres instance, 32 vCPU, 256 GB RAM, PG version 15.4.

Regards
Lok
Re: How to do faster DML
Apology. One correction, the query is like below. I. E filter will be on on ctid which I believe is equivalent of rowid in oracle and we will not need the index on Id column then. But, it still runs long, so thinking any other way to make the duplicate removal faster? Also wondering , the index creation which took ~2.5hrs+ , would that have been made faster any possible way by allowing more db resource through some session level db parameter setting? create table TAB1_New as SELECT * from TAB1 A where CTID in (select min(CTID) from TAB1 group by ID having count(ID)>=1 ); On Sat, Feb 3, 2024 at 5:50 PM Lok P wrote: > Hello All, > A non partitioned table having ~4.8 billion rows in it and having data > size as ~1.4TB , row size as ~313 bytes having ~127 columns in it. This has > got approx ~1billion+ duplicate rows inserted in it and we want to get the > duplicate data removed for this table and create a PK/unique constraint > back so as to not have the duplicate values in future. We are struggling to > do the same. > > Teammates suggested doing this using CTAS method, i.e. create a new table > with the unique record set and then drop the main table. Something as below > > create table TAB1_New > as > SELECT * from TAB1 A > where ID in > (select min(ID) from TAB1 > group by ID having count(ID)>=1 ); > > But for the above to work faster , they mentioned to have an index created > on the column using which the duplicate check will be performed i.e ID > column. So, creating the index itself took ~2hrs+ and the index size now > shows as ~116GB. > > > *Create index idx1 on TAB1(ID)* > And now running the SELECT subquery part of the CTAS statement to see if > its giving the correct unique records count. It ran for 2.5 hrs and then we > killed it. Below is the plan for the same. > > explain > *select min(ID) from TAB1 Agroup by ID having count(ID)>=1* > > GroupAggregate (cost=0.71..6025790113.87 rows=29432861 width=46) > Group Key: ID > Filter: (count(ID) >= 1) > -> Index Only Scan using idx1 on TAB1 a (cost=0.71..5988060903.17 > rows=4883397120 width=14) > > I want to understand if by any way this can be done faster . Also I am > worried that creating PK constraint/index back after deleting the duplicate > is also going to run forever. Is there any way we can make these heavy > operations faster on postgre by facilitating more database resources > through some parameter setup, like parallel hint etc? We have pg_hint_plan > extension added, but not seeing the parallel hint enforced when adding it > to the query. > > In Oracle we have Parallel hints, Direct path read/write for faster > read/write operations, parallel index scan etc. available, if anything > similar to that available in aurora postgre to facilitate more > horsepower and speed up the batch operations. And , how can we monitor > progress of any running query ? > Just to note- It's a db.r7g.8xlarge aurora postgres instance, 32VCPU, > 256GB RAM. PG version 15.4. > > Regards > Lok >
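If the goal is simply one surviving row per ID, a hedged alternative to the min(CTID) subquery is DISTINCT ON, which Postgres can satisfy with a single sort or index scan; whether it beats the CTID approach at ~5 billion rows would need testing on this instance:

-- Keep exactly one row per ID; ORDER BY decides which physical row survives.
CREATE TABLE tab1_new AS
SELECT DISTINCT ON (id) *
FROM tab1
ORDER BY id, ctid;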
Re: How to do faster DML
Ron Johnson 7:37 PM (1 hour ago) to *pgsql-general* On Sat, Feb 3, 2024 at 7:37 PM Ron Johnson wrote: > On Sat, Feb 3, 2024 at 8:55 AM Lok P wrote: > >> Apology. One correction, the query is like below. I. E filter will be on >> on ctid which I believe is equivalent of rowid in oracle and we will not >> need the index on Id column then. >> >> But, it still runs long, so thinking any other way to make the duplicate >> removal faster? >> >> Also wondering , the index creation which took ~2.5hrs+ , would that have >> been made faster any possible way by allowing more db resource through some >> session level db parameter setting? >> >> create table TAB1_New >> as >> SELECT * from TAB1 A >> where CTID in >> (select min(CTID) from TAB1 >> group by ID having count(ID)>=1 ); >> >> >> On Sat, Feb 3, 2024 at 5:50 PM Lok P wrote: >> >>> Hello All, >>> A non partitioned table having ~4.8 billion rows in it and having data >>> size as ~1.4TB , row size as ~313 bytes having ~127 columns in it. This has >>> got approx ~1billion+ duplicate rows inserted in it and we want to get the >>> duplicate data removed for this table and create a PK/unique constraint >>> back so as to not have the duplicate values in future. We are struggling to >>> do the same. >>> >>> Teammates suggested doing this using CTAS method, i.e. create a new >>> table with the unique record set and then drop the main table. Something as >>> below >>> >>> create table TAB1_New >>> as >>> SELECT * from TAB1 A >>> where ID in >>> (select min(ID) from TAB1 >>> group by ID having count(ID)>=1 ); >>> >>> But for the above to work faster , they mentioned to have an index >>> created on the column using which the duplicate check will be performed i.e >>> ID column. So, creating the index itself took ~2hrs+ and the index size now >>> shows as ~116GB. >>> >>> >>> *Create index idx1 on TAB1(ID)* >>> And now running the SELECT subquery part of the CTAS statement to see if >>> its giving the correct unique records count. It ran for 2.5 hrs and then we >>> killed it. Below is the plan for the same. >>> >>> explain >>> *select min(ID) from TAB1 Agroup by ID having count(ID)>=1* >>> >>> GroupAggregate (cost=0.71..6025790113.87 rows=29432861 width=46) >>> Group Key: ID >>> Filter: (count(ID) >= 1) >>> -> Index Only Scan using idx1 on TAB1 a (cost=0.71..5988060903.17 >>> rows=4883397120 width=14) >>> >>> I want to understand if by any way this can be done faster . Also I am >>> worried that creating PK constraint/index back after deleting the duplicate >>> is also going to run forever. Is there any way we can make these heavy >>> operations faster on postgre by facilitating more database resources >>> through some parameter setup, like parallel hint etc? We have pg_hint_plan >>> extension added, but not seeing the parallel hint enforced when adding it >>> to the query. >>> >>> In Oracle we have Parallel hints, Direct path read/write for faster >>> read/write operations, parallel index scan etc. available, if anything >>> similar to that available in aurora postgre to facilitate more >>> horsepower and speed up the batch operations. And , how can we monitor >>> progress of any running query ? >>> Just to note- It's a db.r7g.8xlarge aurora postgres instance, 32VCPU, >>> 256GB RAM. PG version 15.4. >>> >> > Aurora is not Postgresql, so configurations might not work. Having said > that... > https://www.postgresql.org/docs/15t/how-parallel-query-works.html > > And have you analyzed the table lately? Also, what's your work_mem > and maintenance_work_mem? 
Thank you. Below are the values of the default parameters in this instance:

SHOW max_worker_processes;              -- 128
SHOW max_parallel_workers_per_gather;   -- 4
SHOW max_parallel_workers;              -- 32
SHOW max_parallel_maintenance_workers;  -- 2
SHOW maintenance_work_mem;              -- 4155MB
SHOW work_mem;                          -- 8MB
SHOW shared_buffers;                    -- 22029684

When I ran the CTAS query and the index creation, I did not have a very clear idea of how these relate to each other and to each of the operations, but I set a few of them as below in the same session before triggering those operations:

set max_parallel_workers_per_gather=16;
SET max_parallel_maintenance_workers TO 16;
SET maintenance_work_mem TO '16 GB';

The instance has ~256 GB of memory in total, so how should I adjust/bump these values when running heavy SELECT queries doing a large sequential scan, a large index creation, a SELECT with heavy sorting/"order by", or heavy JOINs?

I have not analyzed the table manually, but seeing the auto_vacuum and auto_analyze columns getting populated in pg_stat_user_tables, I assumed it is being done automatically. By the way, if we run "analyze tab1" on this 1.5 TB table, will that run for a long time, and will any of the above parameters help expedite that ANALYZE if I run it manually?

Regards
Lok
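As a rough, hedged reference (the exact values depend on what else is running on the instance), session-level settings for the two kinds of work look something like this:

-- Illustrative values only, for a 256 GB instance with little concurrent work.
-- For a big index build / VACUUM / manual ANALYZE (maintenance operations):
SET maintenance_work_mem = '8GB';
SET max_parallel_maintenance_workers = 8;

-- For a big parallel SELECT with heavy sorts/joins:
SET max_parallel_workers_per_gather = 8;
SET work_mem = '256MB';   -- applies per sort/hash node per worker, so keep it modest

-- Manual statistics refresh: ANALYZE samples a bounded number of rows
-- (driven by default_statistics_target), so it is normally much cheaper
-- than a full scan even on a 1.5 TB table.
ANALYZE VERBOSE tab1;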
Re: How to do faster DML
On Sun, Feb 4, 2024 at 12:50 AM Francisco Olarte wrote: > On Sat, 3 Feb 2024 at 19:29, Greg Sabino Mullane > wrote: > ... > > Given the size of your table, you probably want to divide that up. > > As long as nothing is changing the original table, you could do: > > > > insert into mytable2 select * from mytable1 order by ctid limit > 10_000_000 offset 0; > > insert into mytable2 select * from mytable1 order by ctid limit > 10_000_000 offset 10_000_000; > > insert into mytable2 select * from mytable1 order by ctid limit > 10_000_000 offset 20_000_000; > > Is it me or does this have the faint smell of quadratic behaviour? I > mean, you need to read and discard the first 10M to do offset 10M ( > also I believe ctid order is implied in sequential scan, but no so > sure, if not it would need a full table sort on each pass ). > > When doing things like this, I normally have some kind of unique index > and do it by selecting with limit above the last read value( stored > when doing it via code, as I normally load first, index later so I > cannot select max() fast on the target ). Or, with some kind of > "sparse" index (meaning, groups much less than the batch size ) and a > little code you can select where index_col > last order by index_col > limit 10M, remember last received index_col and reselect discarding > missing ( or just reselect it when doing your on conflict do nothing, > which also has a fast select max(id) due to the PK, it will work if it > has an index on id column on the original even if not unique ) to > avoid that. > > Also, I'm not sure if ctid is ordered and you can select where > ctid>last ordered, if that works it probably is faster for immutable > origins. > > Francisco Olarte. Thank you. Yes , I think the "on conflict do nothing;" option looks promising as it will remove the duplicate in case of PK violation but keepte load continue for subsequent rows. However , as we have ~5billion rows in the base table and out of that , we were expecting almost half i.e. ~2billion would be duplicates. And you said, doing the inserts using the "order by CTID Offset" approach must cause one full sequential scan of the whole table for loading each chunk/10M of rows and that would take a long time I believe. I am still trying to understand the other approach which you suggested. Not able to understand "y*ou can select where index_col > last order by index_col **limit 10M," .* However, to get the max ID value of the last 10M loaded rows in target, do you say that having an PK index created on that target table column(ID) will help, and we can save the max (ID) value subsequently in another table to fetch and keep loading from the source table (as ID>Max_ID stored in temp table)? OR Would it be better to do it in one shot only , but by setting a higher value of some parameters like "maintenance_work_mem" or "max_parallel_workers"?
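A minimal sketch of the keyset approach being described above (names are assumptions; it presumes an index on id in the source and a PK/unique index on id in the target, which is what lets ON CONFLICT DO NOTHING silently skip duplicates):

-- Copy the next chunk after the last id already loaded, then record the new high-water mark.
-- :last_id is a psql-style variable for illustration; start at 0 / the minimum id and repeat
-- until no rows are inserted.
INSERT INTO tab1_new
SELECT *
FROM tab1
WHERE id > :last_id
ORDER BY id
LIMIT 10000000
ON CONFLICT (id) DO NOTHING;

-- Fast via the PK index on the target; feed this back in as :last_id for the next batch.
SELECT max(id) FROM tab1_new;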
Re: How to do faster DML
On Sun, Feb 4, 2024 at 8:14 PM Dennis White wrote: > I'm surprised no one has mentioned perhaps it's a good idea to partition > this table while adding the pk. By your own statements the table is > difficult to work with as is. Without partitioning the table, row inserts > would need to walk the pk index and could be a factor. If this is static > table then perhaps that's ok but if not... > > Anyway I don't recall what type the ID was or how it's set but i suggest > you seriously investigate using it to partition this table into manageable > smaller tables. > Best of luck. > > On Sun, Feb 4, 2024, 8:00 AM Marcos Pegoraro wrote: > >> insert into mytable2 select * from mytable1 order by ctid limit >>> 10_000_000 offset 20_000_000; >>> >> >> You can use min/max values grouping them by 10_000_000 records, so you >> don´t need that offset, then generate commands and run them. >> >> select format('insert into mytable2 select * from mytable1 where i >> between %s and %s;', max(i), min(i)) from t group by i/10_000_000; >> > Thank you so much. You are correct. It was seemingly difficult to operate on this table. Every read query is struggling and so partitioning is something we must have to think of. And hoping that, postgres will be able to handle this scale, with proper partitioning and indexing strategy. I have a few related questions. 1)Even after partitioning the target table , to speed up the data load on this table , Is there an option to disable the primary and foreign keys and re-enable them post data load finishes. Will that be a good idea or will it be cumbersome/resource intensive to re-enable the constraints , after persisting all the data in the table? 2)I understand there is no limitation theoretically on the number or size of partitions a table can have in postgres. But I want to know from experts here, from their real life experience, if there exists any such thing which we should take care of before deciding the partitioning strategy, so as to have the soft limit (both with respect to size and number of partitions) obeyed. Note:- Currently this table will be around ~1TB in size and will hold Approx ~3billion rows(post removal of duplicates). But as per business need it may grow up to ~45 billion rows in future. 3)As the size of the table or each partition is going to be very large and this will be a read intensive application, compressing the historical partition will help us save the storage space and will also help the read queries performance. So, Can you please throw some light on the compression strategy which we should follow here (considering a daily range partition table based on transaction_date as partition key)? Regards Lok
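As a concrete reference for the partitioning discussion, a minimal sketch of a daily range-partitioned layout (column names and types are assumptions; transaction_date is taken from the thread as the key):

CREATE TABLE tab1 (
    id               varchar(36)  NOT NULL,
    transaction_date timestamptz  NOT NULL,
    -- ~125 further columns ...
    PRIMARY KEY (id, transaction_date)
) PARTITION BY RANGE (transaction_date);

-- One partition per day; tools like pg_partman can generate these automatically.
CREATE TABLE tab1_p2024_02_01 PARTITION OF tab1
    FOR VALUES FROM ('2024-02-01') TO ('2024-02-02');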
Re: How to do faster DML
On Sun, Feb 4, 2024 at 9:18 PM Greg Sabino Mullane wrote: > Partitioning will depend a lot on how you access the data. Is it always > using that transaction_date column in the where clause? Can you share the > table definition via \d? We've talked about this table quite a bit, but not > actually seen what it looks like. We are flying blind a little bit. You > mentioned your queries are slow, but can you share an explain analyze on > one of these slow queries? > > 45 billion rows is quite manageable. How many rows are in each day? You > may want to do a larger bucket than 24 hours per partition. > > 1)Even after partitioning the target table , to speed up the data load on >> this table , Is there an option to disable the primary and foreign keys and >> re-enable them post data load finishes. Will that be a good idea >> > > No. > > 3)As the size of the table or each partition is going to be very large and >> this will be a read intensive application, compressing the historical >> partition will help us save the storage space and will also help the read >> queries performance > > > I am not sure what this means. If you are not going to need the data > anymore, dump the data to deep storage and drop the partition. > > Cheers, > Greg > > Thank you. The table has ~127 columns of different data types , combinations of Numeric, varchar, date etc. And is having current size ~1TB holding ~3billion rows currently and the row size is ~300bytes. Currently it has lesser volume , but in future the daily transaction per day which will be inserted into this table will be Max ~500million rows/day. And the plan is to persist at least ~3months of transaction data which will be around 45billion rows in total. And if all works well , we may need to persist ~6 months worth of data in this database in future and that will be ~90 billion. This table will always be queried on the transaction_date column as one of the filters criteria. But the querying/search criteria can span from a day to a month worth of transaction date data. When you said "*You may want to do a larger bucket than 24 hours per partition.*", do you mean to say partition by weekly or so? Currently as per math i.e. 1TB of storage for ~3billion rows. So the daily range partition size( to hold ~500million transactions/day) will be around ~100-150GB. Won't that be too much data for a single partition to operate on, and increasing the granularity further(say weekly) will make the partition more bulkier? What I mean was, we will definitely need the data for querying purposes by the users, but just to keep the storage space incontrol (and to help the read queries), we were thinking of having the historical partitions compressed. And for that , if any compression strategy should we follow on postgres? With regards to loading data to the table faster, wondering why you said ' *NO*' to load the data first and enabling/Creating the Primary key and Foreign key constraint later approach. Because this table is a child table and the parent is already having data in it, loading data to this table in presence of PK and FK makes it too slow as it tries to validate those for each set of rows. So we were thinking if doing it at a later stage at oneshot will be a faster approach. Please suggest. I will try to collect some SELECT query and post the explain analyze. Currently we are trying to get rid of the duplicates. Regards Lok >
Re: How to do faster DML
On Sun, Feb 4, 2024 at 9:25 PM Ron Johnson wrote:

> 1. Load the children before attaching them to the parent.
> 2. Create the child indices, PK and FKs before attaching to the parent.
> 3. Do step 2 in multiple parallel jobs. (cron is your friend.)
> 4. Attach the children to the "naked" (no PK, no FK, no indices) parent.
> 5. Use the ONLY keyword on CREATE INDEX and ALTER TABLE to add the indices, PK and FK to the parent *after* step 4.

Thank you. Can you please help me understand these steps a bit more precisely? Say we have a parent table that already has data in it and a primary key defined. We will load the child table first, with its primary key and foreign keys dropped, so that the data load is faster. Then we will create the primary key index on the child table.

When you say "multiple parallel jobs", do you mean creating the PK index on each partition separately from different sessions, rather than with a single statement on the table, or is there some other faster way to create the PK index?

The last step is attaching the PK and FK of the child table to the parent. That will validate every row for uniqueness and also check that the foreign key value is present in the parent table. Won't that step take a long time?
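A hedged sketch of the load-then-attach flow being discussed (names and bounds are assumptions). The CHECK constraint that matches the partition bound is what lets ATTACH PARTITION skip the full validation scan:

-- 1. Build and load each child as a standalone table (no PK/FK yet), then index it.
CREATE TABLE tab1_p2024_02_01 (LIKE tab1 INCLUDING DEFAULTS);
-- ... bulk load the day's data ...
CREATE INDEX ON tab1_p2024_02_01 (id, transaction_date);

-- 2. Pre-prove the partition bound so ATTACH does not have to scan the table to validate it.
ALTER TABLE tab1_p2024_02_01 ADD CONSTRAINT p2024_02_01_chk
    CHECK (transaction_date >= '2024-02-01' AND transaction_date < '2024-02-02');

-- 3. Attach; each child can be prepared in its own session/job before this point.
ALTER TABLE tab1 ATTACH PARTITION tab1_p2024_02_01
    FOR VALUES FROM ('2024-02-01') TO ('2024-02-02');
ALTER TABLE tab1_p2024_02_01 DROP CONSTRAINT p2024_02_01_chk;  -- no longer needed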
Re: How to do faster DML
On Mon, 5 Feb, 2024, 2:21 am Lok P, wrote: On Sun, Feb 4, 2024 at 9:18 PM Greg Sabino Mullane wrote: Partitioning will depend a lot on how you access the data. Is it always using that transaction_date column in the where clause? Can you share the table definition via \d? We've talked about this table quite a bit, but not actually seen what it looks like. We are flying blind a little bit. You mentioned your queries are slow, but can you share an explain analyze on one of these slow queries? 45 billion rows is quite manageable. How many rows are in each day? You may want to do a larger bucket than 24 hours per partition. 1)Even after partitioning the target table , to speed up the data load on this table , Is there an option to disable the primary and foreign keys and re-enable them post data load finishes. Will that be a good idea No. 3)As the size of the table or each partition is going to be very large and this will be a read intensive application, compressing the historical partition will help us save the storage space and will also help the read queries performance I am not sure what this means. If you are not going to need the data anymore, dump the data to deep storage and drop the partition. Cheers, Greg Thank you. The table has ~127 columns of different data types , combinations of Numeric, varchar, date etc. And is having current size ~1TB holding ~3billion rows currently and the row size is ~300bytes. Currently it has lesser volume , but in future the daily transaction per day which will be inserted into this table will be Max ~500million rows/day. And the plan is to persist at least ~3months of transaction data which will be around 45billion rows in total. And if all works well , we may need to persist ~6 months worth of data in this database in future and that will be ~90 billion. This table will always be queried on the transaction_date column as one of the filters criteria. But the querying/search criteria can span from a day to a month worth of transaction date data. When you said "*You may want to do a larger bucket than 24 hours per partition.*", do you mean to say partition by weekly or so? Currently as per math i.e. 1TB of storage for ~3billion rows. So the daily range partition size( to hold ~500million transactions/day) will be around ~100-150GB. Won't that be too much data for a single partition to operate on, and increasing the granularity further(say weekly) will make the partition more bulkier? What I mean was, we will definitely need the data for querying purposes by the users, but just to keep the storage space incontrol (and to help the read queries), we were thinking of having the historical partitions compressed. And for that , if any compression strategy should we follow on postgres? With regards to loading data to the table faster, wondering why you said ' *NO*' to load the data first and enabling/Creating the Primary key and Foreign key constraint later approach. Because this table is a child table and the parent is already having data in it, loading data to this table in presence of PK and FK makes it too slow as it tries to validate those for each set of rows. So we were thinking if doing it at a later stage at oneshot will be a faster approach. Please suggest. I will try to collect some SELECT query and post the explain analyze. Currently we are trying to get rid of the duplicates. Regards Lok Any thoughts, based on above usage pattern? While I did the maths based on the current stats with limited data sets. 
The partition size comes to 100-150 GB, as I explained above, if we keep daily range partitions. Should we make them hourly instead? So I would like some expert views on whether this will work fine for a read-latency-sensitive application, or whether we should take a different strategy.
Re: How to do faster DML
On Tue, Feb 6, 2024 at 8:34 AM Greg Sabino Mullane wrote: > On Sun, Feb 4, 2024 at 3:52 PM Lok P wrote: > >> This table will always be queried on the transaction_date column as one >> of the filters criteria. But the querying/search criteria can span from a >> day to a month worth of transaction date data. >> > > In that case, and based on the numbers you provided, daily partitioning > seems a decent solution. > > What I mean was, we will definitely need the data for querying purposes by >> the users, but just to keep the storage space incontrol (and to help the >> read queries), we were thinking of having the historical partitions >> compressed. And for that , if any compression strategy should we follow on >> postgres? >> > > There is no compression strategy, per se. There are ways Postgres > internally compresses the data (see "TOAST"), but in general, the table is > either there or not. If space is a concern you should examine if you truly > need 127 columns, make sure they are arranged correctly (search for > 'postgres column tetris'), and move unused and older partitions elsewhere - > like a separate cheaper Postgres server, or something offline. > > With regards to loading data to the table faster, wondering why you said ' >> *NO*' to load the data first and enabling/Creating the Primary key and >> Foreign key constraint later approach. >> > > For an initial load, this is fine, if you do things very carefully. I'd > build the child table indexes post load but still feed things into the main > table as an initial tradeoff, but YMMV. > > Just looked back and saw this is actually Aurora, not Postgres. Most of > the advice on this thread should still apply, but be aware that things are > not the same as Postgres, especially at the storage layer. For all the > money you are giving them, don't forget to bug them as well. > > Cheers, > Greg > > Thank you so much Greg. That helps. So when you said *"In that case, and based on the numbers you provided, daily partitioning seems a decent solution."* , does it mean that size of the partition (which will be ~100-150GB per partition here) is not an issue for the postgres to serve read latency sensitive queries? And yes I think you are spot on in regards to the storage. Actually we loaded csv files which were having size summed to ~200GB and I was wondering howcome the table size becomes ~1TB when that data is loaded to the database. I was not aware about the "column tetris" and how postgres padding additional spaces to the column data based on subsequent columns type, still trying to digest the concept. Though we have many columns with NULL values in them. Will try to reorder the column in the table and hopefully , we will be able to really reduce the storage space through that. Will raise a case too. Regards Lok
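In case it helps with the "column tetris" exercise mentioned above, a hedged sketch of a catalog query that lists columns by their type length, so wider fixed-width columns can be placed first (table name is illustrative):

SELECT a.attname, t.typname, t.typlen, t.typalign
FROM pg_attribute a
JOIN pg_type t ON t.oid = a.atttypid
WHERE a.attrelid = 'tab1'::regclass
  AND a.attnum > 0
  AND NOT a.attisdropped
ORDER BY t.typlen DESC;   -- typlen = -1 means variable-length (text, varchar, numeric, ...)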
Re: How to do faster DML
Thank you Greg.

*"and keeping all your active stuff in cache. Since you have 127 columns, only pull back the columns you absolutely need for each query."*

Understood the point. As Postgres is a "row store" database, keeping the row size small by keeping the number of columns to a minimum should definitely help, since more rows can then be packed into one page/block (which I believe is 8 KB by default). That means more blocks can be cached while reading, so a better cache hit ratio.

Another thing I noticed: the shared_buffers parameter is set to 2029684 in this instance, which comes to ~21MB, and that seems very small for a database operating at this scale, while the instance shows ~256 GB of RAM. So I am thinking of bumping it to something like ~40-50 GB; hopefully that will help to some extent. I am also not sure whether there is a way to manually cache some objects (tables/indexes) that are used frequently by the read queries.

On Tue, Feb 6, 2024 at 10:27 AM Greg Sabino Mullane wrote:

>> So when you said *"In that case, and based on the numbers you provided, daily partitioning seems a decent solution."*, does it mean that size of the partition (which will be ~100-150GB per partition here) is not an issue for the postgres to serve read latency sensitive queries?
>
> Yes, that should be fine. At the end of the day, what's going to be more important is making sure you are using good indexing, and keeping all your active stuff in cache. Since you have 127 columns, only pull back the columns you absolutely need for each query.
>
> Cheers,
> Greg
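On the "manually cache some objects" point, the pg_prewarm extension may be worth a look (a sketch only, assuming the extension is available on the instance; relation names are illustrative):

CREATE EXTENSION IF NOT EXISTS pg_prewarm;

-- Load a hot table and index into shared buffers ahead of the read workload.
SELECT pg_prewarm('tab1');
SELECT pg_prewarm('idx1');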
Re: How to do faster DML
> On Tue, Feb 6, 2024 at 8:34 AM Greg Sabino Mullane wrote:
>> On Sun, Feb 4, 2024 at 3:52 PM Lok P wrote:
>>> What I mean was, we will definitely need the data for querying purposes by the users, but just to keep the storage space incontrol (and to help the read queries), we were thinking of having the historical partitions compressed. And for that , if any compression strategy should we follow on postgres?
>>
>> There is no compression strategy, per se. There are ways Postgres internally compresses the data (see "TOAST"), but in general, the table is either there or not. If space is a concern you should examine if you truly need 127 columns, make sure they are arranged correctly (search for 'postgres column tetris'), and move unused and older partitions elsewhere - like a separate cheaper Postgres server, or something offline.
>>
>> Cheers,
>> Greg

Rearranging the table columns by typlen desc didn't give much storage space reduction. So I tried TOAST compression by creating the table from scratch and defining the LZ4 algorithm at the column level, just for the varchar columns, as it seems this compression only works for varchar and text columns. The table had 7 varchar columns out of 12 in total. I wrote the DDL something like below:

Column1 varchar(50) compression(lz4) not null

However, when I loaded the table with INSERT AS SELECT from the main (uncompressed) table, the size of the compressed table stayed the same, and when I applied the function pg_column_compression() to check whether any column value is compressed using lz4, it returned all NULLs, which means nothing was compressed. So it seems the compression does not apply to rows inserted using CTAS or INSERT AS SELECT. Does that mean it only applies to row-by-row inserts and not batch inserts (which a bigger system will normally have)? I was not expecting this, so it was disappointing.

Regards
Lok
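For what it's worth, TOAST compression should not depend on how the rows are inserted; it only kicks in for values large enough to be TOASTed (roughly when the tuple exceeds ~2 kB), so short varchar(50) values will report NULL from pg_column_compression() regardless of CTAS vs row-by-row. A hedged sketch of the column-level syntax and a session-wide default (table and column names are illustrative):

-- Per-column setting (note: COMPRESSION lz4, without parentheses).
CREATE TABLE tab1_lz4 (
    col1 varchar(50) COMPRESSION lz4 NOT NULL,
    col2 text        COMPRESSION lz4
);

-- Or make lz4 the default for columns created afterwards (e.g. by a CTAS):
SET default_toast_compression = 'lz4';
CREATE TABLE tab1_new AS SELECT * FROM tab1;

-- NULL here means the value was stored uncompressed (usually: too small to be TOASTed).
SELECT pg_column_compression(col2) AS method, count(*)
FROM tab1_new
GROUP BY 1;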
Users and object privileges maintenance
Hello All,
We have past experience with Oracle and are newly moving to Postgres. In Oracle we have schemas, which are effectively also users, and the superuser is sys/system, which the dev team never logs in with; only the DBA does. The DBA used to create functional users, which are granted write access to tables/views as those objects are created, through GRANT commands. These functional users are used by the applications to do the DML etc. Other users are given read-only privileges, so they cannot do DML but can just view the data. These privileges are not granted directly on the objects but through roles, for easier maintenance and control.

We also had some users who were not supposed to see certain sensitive attributes, so we created views (excluding the sensitive columns) on top of those tables and gave read-only access to the views but not to the underlying tables.

So I just wanted to understand whether these grants and privileges for objects/users are given and maintained in a similar way in a Postgres database, or whether a different strategy is needed. And is it different in RDS Postgres compared to the open-source one? Is there any document to follow for this?

Regards
Lok
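A hedged sketch of the usual Postgres pattern for this, in case it helps map the Oracle setup: group roles carry the privileges, and login users are granted membership in them (all names are illustrative):

CREATE ROLE app_readwrite NOLOGIN;
CREATE ROLE app_readonly  NOLOGIN;

GRANT USAGE ON SCHEMA app TO app_readwrite, app_readonly;
GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA app TO app_readwrite;
GRANT SELECT ON ALL TABLES IN SCHEMA app TO app_readonly;
ALTER DEFAULT PRIVILEGES IN SCHEMA app
    GRANT SELECT ON TABLES TO app_readonly;   -- covers tables created in future

-- The functional/application user and a read-only support user just inherit the roles.
CREATE ROLE app_user LOGIN PASSWORD '...';
GRANT app_readwrite TO app_user;

CREATE ROLE support_user LOGIN PASSWORD '...';
GRANT app_readonly TO support_user;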
Re: Question on Table creation
*"1)In this situation , do we have to drop the "Schema_Owner" and recreate it with all small letters? And then create the schema with small letters again?"* As per above question goes, I believe OP is not required to drop and recreate but has to just Rename the user something as below and that would remove the case sensitiveness making all lower case. and the same thing can be done for the schema too. ALTER USER "*Schema_Owner*" RENAME TO *Schema_Owner* ; ALTER schema "Schema_Name" RENAME TO Schema_Name;
Re: Creating table and indexes for new application
My 2 cents... Foreign key indexes are required for avoiding locking when deleting and updating the pk in parent. But insert only table may not have any issue. And we used to do this in other relational databases like oracle but not sure how different is this in PG. However considering your high transaction volume it might have some negative impact on your data inserts having foreign keys on the txn tables and also related indexes in first place I think. So better evaluate. Others may comment on this. On Fri, 23 Feb, 2024, 2:05 am yudhi s, wrote: > Hello Friends, > We are newly creating tables for a system which is going to consume > transaction data from customers and store in postgres version 15+ database. > And it would be ~400million transactions/rows per day in the main > transaction table and almost double in the multiple child tables and some > child tables will hold lesser records too. > > Say TABLE1 is the main or parent transaction table which will hold > ~400million transactions. > TABLE2 is the child table which is going to hold ~800million rows/day. It > has one to many relationships with table1. > TABLe3 is another child table which is going to hold ~200million rows per > day. > > We are considering all of these tables for partitioning by the same > transaction_date column and it would be daily partitions. We have some > questions, > > 1)While creating these tables and related indexes, do we need to be > careful of defining any other storage parameters like tablespaces etc Or > its fine to make those table/indexes aligned to the default tablespace > only? and are there any constraints on tablespace size , as we will have > 100's GB of data going to be stored in each of the daily partitions? > > 2)Should we be creating composite indexes on each foreign key for table2 > and table3, because any update or delete on parent is going to take lock on > all child tables? > > 3)We were thinking of simple Btree indexes to be created on the columns > based on the search criteria of the queries. but the indexes doc i see in > postgres having INCLUDE keywords also available in them. So I'm struggling > to understand a bit, how it's adding value to the read query performance if > those additional columns are not added explicitly to the index but are part > of the INCLUDE clause? Will it give some benefit in regards to the index > storage space? or should we always keep all the columns in the index > definition itself other than some exception scenario? Struggling to > understand the real benefit of the INCLUDE clause. > > Below is a sample DDL of what it will look like. 
> > Create table syntax:- > > CREATE TABLE TABLE1 > ( > TXN_ID varchar(36) NOT NULL , > txn_timestamp date NOT NULL , > CONSTRAINT TABLE1_PK PRIMARY KEY (TXN_ID,txn_timestamp) > ) partition by range (txn_timestamp); > > > CREATE TABLE TABLE2 > ( > table2_id varchar(36) not null, > TXN_ID varchar(36) NOT NULL , > txn_timestamp date NOT NULL , > CONSTRAINT TABLE2_PK PRIMARY KEY (table2_id,txn_timestamp) > ) partition by range (txn_timestamp); > > alter table table2 add constraint fk1 (TXN_ID,txn_timestamp) references > table1(TXN_ID,txn_timestamp); > Create Index idx1 on TABLE2(TXN_ID,txn_timestamp); -- Index for foreign Key > > CREATE TABLE TABLE3 > ( > table3_id varchar(36) not null, > TXN_ID varchar(36) NOT NULL , > txn_timestamp date NOT NULL , > CONSTRAINT TABLE3_PK PRIMARY KEY (table3_id,txn_timestamp) > ) partition by range (txn_timestamp); > > alter table table2 add constraint fk2 (TXN_ID,txn_timestamp) references > table1(TXN_ID,txn_timestamp); > Create Index idx2 on TABLE3(TXN_ID,txn_timestamp); -- Index for foreign > key. > > Thanks And Regards > Yudhi >
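The FK statements in the quoted DDL appear to have a couple of typos (the FOREIGN KEY keyword is missing, and the second constraint looks intended for table3 rather than table2). A hedged corrected sketch, keeping the names from the post:

ALTER TABLE table2 ADD CONSTRAINT fk1
    FOREIGN KEY (TXN_ID, txn_timestamp) REFERENCES table1 (TXN_ID, txn_timestamp);
CREATE INDEX idx1 ON table2 (TXN_ID, txn_timestamp);   -- index supporting the FK

ALTER TABLE table3 ADD CONSTRAINT fk2
    FOREIGN KEY (TXN_ID, txn_timestamp) REFERENCES table1 (TXN_ID, txn_timestamp);
CREATE INDEX idx2 ON table3 (TXN_ID, txn_timestamp);   -- index supporting the FK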
Re: Creating table and indexes for new application
On Sun, 25 Feb, 2024, 1:05 am yudhi s, wrote: > > On Fri, Feb 23, 2024 at 5:26 PM sud wrote: > >> >> >> On Fri, 23 Feb, 2024, 1:28 pm yudhi s, >> wrote: >> >>> >>> >>> On Fri, 23 Feb, 2024, 1:20 pm sud, wrote: >>> On Fri, 23 Feb, 2024, 12:41 pm Laurenz Albe, wrote: > On Fri, 2024-02-23 at 02:05 +0530, yudhi s > > > 2)Should we be creating composite indexes on each foreign key for > table2 and table3, because > > any update or delete on parent is going to take lock on all child > tables? > > Every foreign key needs its own index. A composite index is only > appropriate if the foreign > key spans multiple columns. > > From the DDL which OP posted it's using composite foreign key thus a composite index would be needed. However, if someone doesn't delete or update the parent table PK , is it still advisable to have all the FK indexed? Like in general I think transaction id should not get updated in a normal scenario unless some special case. >>> Thank you. I can double check if we have confirmed use case of deleting >>> the parent table or updating PK in the parent table. But anyway it can >>> happen for data fix for sure in some scenario. >>> >>> But yes, we are certainly going to drop/purge partition from all the >>> parent and child table after specific days. So isn't that need the FK to be >>> indexed or else it will scan whole parent table partition? >>> >> >> >> I am not sure if drop partition of parent table, will have a lock or will >> do a full scan on the child table while doing the partition maintenance or >> dropping the partitions, in absence of foreign key index. Others may >> comment here. >> >>> > Can you please help me understand, If it's true that all the statements > like Delete, Update and Drop partition of parent table will take lock on > the child table and Full scan the child table , and thus foreign key index > on all the child table is necessary irrespective of the performance > overhead it has on all the INSERT queries into the child tables? > Not sure of the lock but I think you should be able to see a full scan on child table while doing delete or update of parent table PK. Explain Analyze should show that I believe. Not sure if explain analyze will work for drop partition too. >
Re: Is partition pruning impacted by data type
On Tue, Mar 5, 2024 at 1:09 AM sud wrote:
>
> However the question we have is ,
> 1)If there is any downside of having the partition key with "timestamp with timezone" type? Will it impact the partition pruning of the queries anyway by appending any run time "time zone" conversion function during the query planning/execution phase?
> 2) As it will take the default server times , so during daylight saving the server time will change, so in that case, can it cause any unforeseen issue?
> 3)Will this cause the data to be spread unevenly across partitions and make the partitions unevenly sized? If will go for UTC/GMT as db time, the user's one day transaction might span across two daily partitions.

My 2 cents. We have cases that use a "timestamp with time zone" column as the partition key, and partition pruning happens for the read queries without any issue; we don't see any conversion function being applied to the predicate on the partition key. I think if the users go global, it's better to keep the database time in UTC. And it's to be expected that, with global users, the data will span multiple days, since the day boundaries will follow UTC rather than each user's time zone.
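A quick, hedged way to confirm pruning on such a table is to look at the plan and check that only the expected partitions are scanned (table/column names are assumptions):

EXPLAIN (COSTS OFF)
SELECT count(*)
FROM txn_tab
WHERE txn_ts >= timestamptz '2024-03-01 00:00:00+00'
  AND txn_ts <  timestamptz '2024-03-02 00:00:00+00';
-- Only the matching daily partition(s) should appear in the plan; with parameterized
-- predicates, EXPLAIN (ANALYZE) additionally reports "Subplans Removed" for runtime pruning.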
Re: Not able to purge partition
On Sun, Mar 24, 2024 at 12:38 AM veem v wrote: > On Sat, 23 Mar 2024 at 23:08, Laurenz Albe > wrote: > >> On Sat, 2024-03-23 at 22:41 +0530, veem v wrote: >> > 1)As we see having foreign key defined is making the detach partition >> run >> > for minutes(in our case 5-10minutes for 60 million rows partition), so >> > how to make the parent table partition detach and drop work fast in >> such >> > a scenario while maintaining the foreign key intact? >> >> I told you: don't do it. >> Instead, use foreign keys between the partitions. >> > > I am struggling to understand how to maintain those partitions then? As > because we were planning to use pg_partman for creating and dropping > partitions automatically without much hassle. So do you mean to say do the > partition maintenance(create/drop) by creating our own jobs and not to use > the pg_partman extension for this. > > Say for example in our case the parent table has 3-4 child table and all > are partitioned on same keys/columns, so how we can identify the child > partitions and then create all foreign keys to the respective parent table > partitions and attach those partitions to parent table and also make this > process automated? Appreciate any guidance on this. > > Actually, using pg_partman was taking care of everything starting from > creating partitions with different names and creating respective indexes, > constraints also with different names for each partitions without us being > worrying anything about those. > This appears to be a major issue , if it's taking minutes for dropping the parent table partitions and not allowing read operation during that time on the child table by taking locks on them. We have many databases in Oracle with such referential key constraints existing on partitioned tables and we were planning to move those to postgres. I think in Oracle, they were by default created partition to partition without need to check the whole table or all the child table partitions while dropping the parent partitions .
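As far as I understand the "foreign keys between the partitions" suggestion, it means declaring the FK on each child partition against the matching parent partition, instead of one table-level FK, so that detaching/dropping a day only has to touch the two matching partitions. A hedged sketch only (names and bounds are assumptions, and this would need to be wired into whatever creates the partitions):

-- Instead of a table-level child -> parent FK, declare it per partition pair:
ALTER TABLE child_p2024_03_01
    ADD CONSTRAINT child_p2024_03_01_fk
    FOREIGN KEY (txn_id, txn_timestamp)
    REFERENCES parent_p2024_03_01 (txn_id, txn_timestamp);

-- Dropping a day then only involves that pair:
ALTER TABLE child  DETACH PARTITION child_p2024_03_01;
ALTER TABLE parent DETACH PARTITION parent_p2024_03_01;
DROP TABLE child_p2024_03_01, parent_p2024_03_01;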
Cron not running
Hello All,
In RDS Postgres (version 15.4), we have scheduled partition maintenance through pg_partman, scheduled via pg_cron as below. The same script has been executed in the dev and test environments, and the cron job appears to be scheduled in both environments because we see one record in the cron.job table. But we are not seeing any entries in cron.job_run_details for the test database, and partitions are not getting created/dropped there as expected. So it seems it is not running in one of the databases. Wondering why that is, and whether we are missing anything? How can we debug this, find the cause if it's failing internally for some reason, and fix it?

select partman.create_parent(
    p_parent_table    := 'TAB1',
    p_control         := 'part_col',
    p_type            := 'native',
    p_interval        := '1 day',
    p_premake         := 5,
    p_start_partition => '2024-02-01 00:00:00'
);

update partman.part_config
set infinite_time_partitions = 'true',
    retention = '1 months',
    retention_keep_table = 'true',
    retention_keep_index = 'true'
where parent_table = 'TAB1';

SELECT cron.schedule('@daily',partman.run_maintenance());

Regards
Lok
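Two hedged things may be worth checking here. First, pg_cron runs each job in the database recorded on the cron.job row (typically the database where pg_cron is installed, unless cron.schedule_in_database() was used), so the job may be running, just not in the database where TAB1 and partman live. Second, if the script really was run exactly as shown, the second argument to cron.schedule should be the SQL command text; writing partman.run_maintenance() there evaluates the function once at scheduling time rather than scheduling it. A sketch:

-- Which database/user is the job actually bound to?
SELECT jobid, jobname, schedule, command, database, username, active
FROM cron.job;

-- Recent executions and their outcome (cron.job_run_details needs pg_cron 1.4+).
SELECT jobid, status, return_message, start_time, end_time
FROM cron.job_run_details
ORDER BY start_time DESC
LIMIT 20;

-- Pass the maintenance call as a command string instead of evaluating it up front.
SELECT cron.schedule('@daily', $$SELECT partman.run_maintenance()$$);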
Issue with date/timezone conversion function
Hi All,
It's version 15.4 of the PostgreSQL database. Every "date/time" data type attribute gets stored in the database in the UTC timezone only. One of the support persons' local timezone is "Asia/Kolkata", and that support person needs to fetch the count of transactions from a table (transaction_tab) and share it with another person/customer who is in the EST timezone, so the transactions have to be shown/displayed in the EST timezone.

We are using the three queries below for displaying 15-minute, hourly and daily interval transaction counts from the table, based on the create_timestamp column (in which data is stored in UTC in the database but has to be displayed to the user in EST). These tables are INSERT-only tables, and the data in the create_timestamp column is populated using the now() function from the application, which means it will always be incremental, and the historical per-day transaction counts should stay the same. However, surprisingly, the counts change each day when the user fetches the result using the queries below.

So my question is: is there any issue with the way we are fetching the data that causes some date/time shift, which is why the transaction count appears to change even for past days' data? And also, why is the conversion "DATE_TRUNC('hour', create_timestamp AT TIME ZONE 'EST')" showing the time in CST and not EST?

**For fetching the 15-minute interval transaction count:

SELECT DATE_TRUNC('hour', create_timestamp AT TIME ZONE 'EST') +
       (EXTRACT(MINUTE FROM create_timestamp AT TIME ZONE 'EST') / 15 * 15) * INTERVAL '15 minute' AS sample_time_interval,
       COUNT(*) AS count_1
FROM transaction_tab
WHERE create_timestamp > CURRENT_TIMESTAMP - INTERVAL '10 day'
GROUP BY DATE_TRUNC('hour', create_timestamp AT TIME ZONE 'EST') +
         (EXTRACT(MINUTE FROM create_timestamp AT TIME ZONE 'EST') / 15 * 15) * INTERVAL '15 minute'
ORDER BY sample_time_interval;

**For fetching the hourly interval transaction count:

SELECT DATE_TRUNC('hour', create_timestamp AT TIME ZONE 'EST') AS sample_time_interval,
       COUNT(*) AS count_1
FROM transaction_tab
WHERE create_timestamp > CURRENT_TIMESTAMP - INTERVAL '10 day'
GROUP BY DATE_TRUNC('hour', create_timestamp AT TIME ZONE 'EST')
ORDER BY sample_time_interval;

**For fetching the daily interval transaction count:

SELECT DATE_TRUNC('day', create_timestamp AT TIME ZONE 'EST') AS sample_time_interval,
       COUNT(*) AS count_1
FROM transaction_tab
WHERE create_timestamp > CURRENT_TIMESTAMP - INTERVAL '10 day'
GROUP BY DATE_TRUNC('day', create_timestamp AT TIME ZONE 'EST')
ORDER BY sample_time_interval;
Re: Issue with date/timezone conversion function
On Tue, Apr 9, 2024 at 9:26 PM Adrian Klaver wrote:
> On 4/9/24 08:43, Lok P wrote:
> > Hi All, It's version 15.4 of postgresql database. Every "date/time" data type attribute gets stored in the database in UTC timezone only. One of the support persons local timezone is "asia/kolkata" and that support person needs to fetch the count of transactions from a table- transaction_tab and share it with another person/customer who is in the EST timezone, so basically the transaction has to be shown or displayed the EST timezone.
>
> What is the datatype for the create_timestamp?
>
> What does SHOW timezone; return on the server?

Thank you for the quick response. The data type of the "create_timestamp" column is "timestamptz". "SHOW timezone" from the support user's client machine UI shows "Asia/Calcutta". I don't currently have access to run "SHOW timezone" on the server; I will try to get it. The output from pg_settings shows setting as "Asia/Calcutta", reset_val as "Asia/Calcutta", and boot_val as "GMT".
Re: Issue with date/timezone conversion function
On Tue, Apr 9, 2024 at 10:33 PM Tom Lane wrote:
> Lok P writes:
> > These tables are INSERT only tables and the data in the create_timestamp column is populated using the now() function from the application, which means it will always be incremental, and the historical day transaction count is going to be the same. However surprisingly the counts are changing each day when the user fetches the result using the below query. So my question was , if there is any issue with the way we are fetching the data and it's making some date/time shift which is why the transaction count looks to be changing even for the past days data?
>
> Well, your cutoff time "CURRENT_TIMESTAMP - INTERVAL '10 day'" is constantly moving, so that'd account for shifts in what's perceived to belong to the oldest day. Maybe you want "CURRENT_DATE - 10" instead?
>
> > And also somehow this conversion function "DATE_TRUNC('hour', create_timestamp AT TIME ZONE 'EST')" is showing time in CST but not EST, why so?
>
> 'EST' is going to rotate to UTC-5, but that's probably not what you want in the summer. I'd suggest AT TIME ZONE 'America/New_York' or the like. See
>
> https://www.postgresql.org/docs/current/datatype-datetime.html#DATATYPE-TIMEZONES
>
> regards, tom lane

Thank you so much. You are correct: AT TIME ZONE 'America/New_York' gives the correct EST/EDT conversion.

But regarding why the results appear to shift, i.e. the same time interval holds a different transaction count on different days while the base table is never updated/deleted for its historical create_timestamp values, I suspect the conversion shown below. The task is to count transactions per ~15-minute interval and publish them in order, something like the sample below, but the way the query is written seems wrong. It's an existing script: it first truncates to the hour and then adds the minute component back to build the ~15-minute bucket. Can it be written in some simpler way?

9-apr-2024 14:00 12340
9-apr-2024 14:15 12312
9-apr-2024 14:30 12323
9-apr-2024 14:45 12304

DATE_TRUNC('hour', create_timestamp AT TIME ZONE 'America/New_York') +
(EXTRACT(MINUTE FROM create_timestamp AT TIME ZONE 'America/New_York') / 15 * 15) * INTERVAL '15 minute'
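On the "simpler way" question: since this is PostgreSQL 15, date_bin() can replace the DATE_TRUNC + EXTRACT arithmetic for arbitrary bucket sizes. A hedged sketch, using the table/column names from the thread and the CURRENT_DATE cutoff suggested above:

SELECT date_bin('15 minutes',
                create_timestamp AT TIME ZONE 'America/New_York',
                TIMESTAMP '2000-01-01') AS sample_time_interval,
       count(*)                         AS count_1
FROM transaction_tab
WHERE create_timestamp >= CURRENT_DATE - 10
GROUP BY 1
ORDER BY 1;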
Logging statement having any threat?
Hello All,
It's Postgres version 15.4 on RDS. Our dev team gets the infrastructure code from a third-party team, which provides the base infrastructure code to build a Postgres database; in it we can change whatever DB parameter values are listed in the file, within the allowed values. But surprisingly we don't see log_statement there.

Our requirement is as follows. For debugging and evaluating performance we have pg_stat_statements, but it only contains aggregated information about query execution. In case we just want to debug point-in-time issues where a few selected queries perform badly (maybe because of a plan change), we were planning to add the auto_explain extension and set log_min_duration to ~5 seconds, so that all queries running longer than that threshold (5 seconds) get logged with detailed information about the exact bottleneck.

But we see the log_statement parameter has been removed from the base infrastructure/Terraform script given by the database team, which means we will get the default, which is "none", meaning no statements (SELECT/DML/DDL etc.) can be logged. When we reached out to the infrastructure team, they said these variables (pg_cluster_log_statement, pg_instance_log_statement) were removed due to a potential security threat.

So I want to understand from the experts here how this is really a security threat, and whether there is an option to get this logging enabled (which will help us debug performance issues) while at the same time addressing the threat?

Regards
Lok
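For what it's worth, auto_explain and slow-statement logging are governed by their own parameters rather than by log_statement, so the plan described above may not need log_statement at all. A hedged sketch of the relevant settings (values are illustrative; on RDS these would go in the DB parameter group, with auto_explain added to shared_preload_libraries):

-- Log plans for anything slower than 5 seconds.
SET auto_explain.log_min_duration = '5s';
SET auto_explain.log_analyze = on;       -- include actual row counts/timings
SET auto_explain.log_buffers = on;

-- Independently, log the statement text of slow queries only.
SET log_min_duration_statement = '5s';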
Re: Logging statement having any threat?
On Sat, Apr 20, 2024 at 10:02 PM Adrian Klaver wrote: > > Have you tried?: > > > https://www.postgresql.org/docs/current/runtime-config-logging.html#RUNTIME-CONFIG-LOGGING-WHAT > > " > log_statement (enum) > ><...> > > The default is none. Only superusers and users with the appropriate SET > privilege can change this setting. > " > > Or > > > https://www.postgresql.org/docs/current/functions-admin.html#FUNCTIONS-ADMIN-SET > > set_config ( setting_name text, new_value text, is_local boolean ) → text > > > > > > Now when we reach out to the infrastructure team , they are saying these > > variables(pg_cluster_log_statement,pg_instance_log_statement) were > > Where are those variables coming from? I can not find them in RDS or > Terraform docs. > > Thank You Adrian. Actually I was trying to understand if the auto_explain can only work and help us see the slow sql statements in the log, only after we set the "log_statement" parameter to non default values (like all, mod, ddl)? And what is the exact threat with the logging these queries , and i think ,I got the point as you mentioned , having access to database itself is making someone to see the object details, however do you agree that in case of RDS logs are available through different mediums like cloud watch, data dog agent etc , so that may pose additional threats as because , may be some person doesn't have access to database directly but still having permission to see the logs, so the appropriate access control need to put in place? And additionally I was trying to execute the "SELECT set_config('log_statement', 'all', true);" but it says "*permission denied to set parameter "log_statement*".".So might be it needs a higher privileged user to run it. To answer your question on the variable those we have on the terraform module, the terraform module is customized by the database infra team so that might be why we are seeing those there which may not be exactly the same as its showing in RDS docs for postgres. https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/USER_LogAccess.Concepts.PostgreSQL.html
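For reference, a minimal sketch of the auto_explain configuration under discussion, which logs the plans of slow statements on its own and does not depend on log_statement being enabled (parameter names are the stock auto_explain ones; on RDS they are normally set through the DB parameter group rather than ALTER SYSTEM):

-- quick session-level test
LOAD 'auto_explain';
SET auto_explain.log_min_duration = '5s';
SET auto_explain.log_analyze = on;

-- instance-wide on a self-managed server (restart needed for shared_preload_libraries)
ALTER SYSTEM SET shared_preload_libraries = 'auto_explain';
ALTER SYSTEM SET auto_explain.log_min_duration = '5s';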
How you make efficient design for CDC and book marking
Hello, My understanding is that replication tools normally rely on the database transaction logs to find the CDC/delta changes (Insert/Update/Delete) for tables and then move those delta changes to the target system/database, whatever the source database may be (open source postgres or AWS RDS). I have never done it myself, but I am assuming that manually scanning the DB logs is not easy, and access to the DB logs may not be granted for security reasons either. I hope my understanding is correct here. Thus, in the absence of such a replication tool (perhaps because of the additional cost associated with it), if someone wants to find the delta changes (insert/update/delete) in a database as efficiently as possible and move them to the target database in a continuous data streaming setup, the only option I can think of is maintaining audit columns like create_timestamp/update_timestamp in every source table so that they can be used to get the CDC for Insert and Update statements and also for bookmarking. But to find the delta for deletes, there is not much option other than row-level triggers on the base table which populate another audit table with the deleted rows, and this is going to crawl if we get a lot of deletes (in the millions) on the source tables. I want to know from the experts whether there exists any other way to make this manual CDC and bookmarking more efficient for such continuous delta data movement scenarios? Regards Lok
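On the delete side, one alternative to a row-level trigger is a statement-level AFTER trigger with a transition table (available since PostgreSQL 10), which fires once per DELETE statement rather than once per deleted row. A minimal sketch, with illustrative table and column names:

CREATE TABLE source_tab_delete_audit (
    pk_id      bigint,
    deleted_at timestamptz NOT NULL DEFAULT now()
);

CREATE OR REPLACE FUNCTION capture_deletes() RETURNS trigger AS $$
BEGIN
    -- deleted_rows is the transition table holding every row removed by the statement
    INSERT INTO source_tab_delete_audit (pk_id)
    SELECT pk_id FROM deleted_rows;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_capture_deletes
AFTER DELETE ON source_tab
REFERENCING OLD TABLE AS deleted_rows
FOR EACH STATEMENT EXECUTE FUNCTION capture_deletes();

For bulk deletes this writes the audit rows with one set-based insert per statement instead of one trigger call per row, which addresses the "crawl on millions of deletes" concern above.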
Re: Question on roles and privileges
For the initial installation the extensions may need superuser privileges. On Fri, May 10, 2024 at 10:04 AM yudhi s wrote: > Hello All, > We want to make sure to keep minimal privileges for the users based on > their roles and responsibility. We have one user group who will be working > on analyzing/debugging into performance issues in the databases. Basically > this group will be operating on extensions like apg_plan_management, > pg_hint_plan, auto_explain, plprofiler, pg_repack. So these extensions will > already be installed for the group, but they will just need to use those > appropriately. For example pg_hint_plan will not need any write privilege > because the user just has to put the hint in the query and run it to see > any performance variation. > > So like that , what kind of minimal privileges will each of these > extensions need to make them work for this performance group? Basically if > any of these will need write privilege or all works can be performed using > Readonly roles/privilege only? > > And I understand pg_monitor role wraps up most of the key read only > privileges within it to work on performance issues and also its a readonly > privilege only. So I wanted to know from experts here , if it's true and > pg_monitor role will suffice for all the above work? > > Regards > Yudhi >
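As a minimal sketch of the role-based shape being asked about (role and user names are placeholders, and this is not a definitive privilege list for every extension mentioned):

CREATE ROLE perf_debug NOLOGIN;
GRANT pg_monitor TO perf_debug;    -- read-only access to statistics and settings views
GRANT perf_debug TO analyst_user;  -- members inherit the monitoring privileges

Extensions such as auto_explain or pg_hint_plan are typically usable with read-only access once installed, while tools like pg_repack rewrite tables and therefore need ownership-level or equivalent write privileges on the target tables.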
Creating big indexes
Hello, We have a few tables having size ~5TB and are partitioned on a timestamp column. They have ~90 partitions in them and are storing 90 days of data. We want to create a couple of indexes on those tables. They are getting the incoming transactions(mainly inserts) 24/7 , which are mostly happening on the current day/live partition. Its RDS postgres version 15.4. So in this situation Should we go with below i.e one time create index command on the table.. CREATE INDEX CONCURRENTLY idx1 ON tab(column_name); Or create index on individual partitions from different sessions, say for example create indexes on 30 partitions each from three different sessions so as to finish all the 90 partitions faster? CREATE INDEX CONCURRENTLY idx1 ON tab_part1(column_name); CREATE INDEX CONCURRENTLY idx1 ON tab_part2(column_name); . . Basically I have three questions: 1)If we can do this index creation activity online without impacting the incoming transactions or do we have to take down time for this activity? 2)If we can't do it online then , what is the fastest method to do this index creation activity ? 3)Should we change the DB parameters in a certain way to make the process faster? We have currently set below parameters max_parallel_workers-16 max_parallel_maintenance_workers-2 maintenance_work_mem- 4GB Regards Lok
Re: How to create efficient index in this scenario?
On Sat, Jun 8, 2024 at 7:03 PM veem v wrote: > Hi , > It's postgres version 15.4. A table is daily range partitioned on a column > transaction_timestamp. It has a unique identifier which is the ideal for > primary key (say transaction_id) , however as there is a limitation in > which we have to include the partition key as part of the primary key, so > it has to be a composite index. Either it has to be > (transaction_id,transaction_timestamp) or ( transaction_timestamp, > transaction_id). But which one should we go for, if both of the columns get > used in all the queries? > > We will always be using transaction_timestamp as mostly a range predicate > filter/join in the query and the transaction_id will be mostly used as a > join condition/direct filter in the queries. So we were wondering, which > column should we be using as a leading column in this index? > > There is a blog below (which is for oracle), showing how the index should > be chosen and it states , "*Stick the columns you do range scans on last > in the index, filters that get equality predicates should come first.* ", > and in that case we should have the PK created as in the order > (transaction_id,transaction_timestamp). It's because making the range > predicate as a leading column won't help use that as an access predicate > but as an filter predicate thus will read more blocks and thus more IO. > Does this hold true in postgres too? > > https://ctandrewsayer.wordpress.com/2017/03/24/the-golden-rule-of-indexing/ > I believe the analogy holds true here in postgres too and the index in this case should be on (transaction_id, transaction_timestamp). > > > Additionally there is another scenario in which we have the requirement to > have another timestamp column (say create_timestamp) to be added as part of > the primary key along with transaction_id and we are going to query this > table frequently by the column create_timestamp as a range predicate. And > ofcourse we will also have the range predicate filter on partition key > "transaction_timestamp". But we may or may not have join/filter on column > transaction_id, so in this scenario we should go for > (create_timestamp,transaction_id,transaction_timestamp). because > "transaction_timestamp" is set as partition key , so putting it last > doesn't harm us. Will this be the correct order or any other index order is > appropriate? > > > In this case , the index should be on ( create_timestamp,transaction_id,transaction_timestamp), considering the fact that you will always have queries with "create_timestamp" as predicate and may not have transaction_id in the query predicate.
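For concreteness, a hedged sketch of the two composite keys discussed (table names follow the thread; on a partitioned table the primary key must include the partition key, which both orderings satisfy):

-- equality-predicate column leading, range/partition column last
ALTER TABLE tab  ADD CONSTRAINT tab_pk  PRIMARY KEY (transaction_id, transaction_timestamp);

-- second scenario: create_timestamp queried as a range, transaction_id optional in predicates
ALTER TABLE tab2 ADD CONSTRAINT tab2_pk PRIMARY KEY (create_timestamp, transaction_id, transaction_timestamp);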
Re: Creating big indexes
On Sun, Jun 9, 2024 at 10:36 AM sud wrote: > > You can first create the index on the table using the "ON ONLY" keyword, > something as below. > > CREATE INDEX idx ON ONLY tab(col1); > > Then create indexes on each partition "concurrently" from multiple > sessions in chunks. > > CREATE INDEX CONCURRENTLY idx_1 ON tab_part1(col1); > CREATE INDEX CONCURRENTLY idx_2 ON tab_part2(col1); > > After this step finishes, the table-level index which was created in the > first step will be in a valid state automatically. > > Thank you so much. Should we also tweak the parameters related to the parallelism and memory as I mentioned in the first post?
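One step worth spelling out as a sketch: with CREATE INDEX ... ON ONLY, each per-partition index normally has to be attached to the parent index, and the parent index is only marked valid once every partition has an attached index (names follow the example above):

CREATE INDEX idx ON ONLY tab (col1);

CREATE INDEX CONCURRENTLY idx_1 ON tab_part1 (col1);
ALTER INDEX idx ATTACH PARTITION idx_1;

CREATE INDEX CONCURRENTLY idx_2 ON tab_part2 (col1);
ALTER INDEX idx ATTACH PARTITION idx_2;
-- ...repeat for the remaining partitions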
Re: Creating big indexes
On Sun, Jun 9, 2024 at 10:39 AM Lok P wrote: > > > On Sun, Jun 9, 2024 at 10:36 AM sud wrote: > >> >> You can first create the index on the table using the "ON ONLY" keyword, >> something as below. >> >> CREATE INDEX idx ON ONLY tab(col1); >> >> Then create indexes on each partition "concurrently" from multiple >> sessions in chunks. >> >> CREATE INDEX CONCURRENTLY idx_1 ON tab_part1(col1); >> CREATE INDEX CONCURRENTLY idx_2 ON tab_part2(col1); >> >> After this step finishes, the table-level index which was created in the >> first step will be in a valid state automatically. >> >> > Thank you so much. > Should we also tweak the parameters related to the parallelism and memory > as I mentioned in the first post? > Additionally, is it also possible to drop indexes from the big partitioned table efficiently, so as to avoid "transaction id wraparound" or "table bloat" when the index drop runs for a longer duration?
Design strategy for table with many attributes
Hello, In one of the applications we are getting transactions in messages/events format and also in files and then they are getting parsed and stored into the relational database. The number of attributes/columns each transaction has is ~900+. Logically they are part of one single transaction and should be stored in one table as one single row. There will be ~500million such transactions each day coming into the system. And there will be approx ~10K peak write TPS and 5K read TPS in target state. This system has a postgres database as a "source of truth" or OLTP store. And then data moves to snowflakes for the olap store. Initially when the system was designed the number of attributes per transaction was <100 but slowly the business wants to keep/persist other attributes too in the current system and the number of columns keep growing. However, as worked with some database systems , we get few suggestions from DBA's to not have many columns in a single table. For example in oracle they say not to go beyond ~255 columns as then row chaining and row migration type of things are going to hunt us. Also we are afraid concurrent DMLS on the table may cause this as a contention point. So I wanted to understand , in such a situation what would be the best design approach we should use irrespective of databases? Or say, what is the maximum number of columns per table we should restrict? Should we break the single transaction into multiple tables like one main table and other addenda tables with the same primary key to join and fetch the results wherever necessary? Regards Lok
Re: Design strategy for table with many attributes
On Fri, Jul 5, 2024 at 1:26 AM David G. Johnston wrote: > On Thu, Jul 4, 2024 at 12:38 PM Lok P wrote: > >> >> Should we break the single transaction into multiple tables like one main >> table and other addenda tables with the same primary key to join and fetch >> the results wherever necessary? >> >> > I would say yes. Find a way to logically group sets of columns together > and place those groups into separate tables. I'd also be looking for cases > where multiple columns really should be multiple rows. This is not > uncommon. > > David J. > > Thank you David. As you said, to logically break this into multiple tables so i believe it means it should be such that there will be no need to query multiple tables and join them most of the time for fetching the results. It should just fetch the results from one table at any point in time. But do you also suggest keeping those table pieces related to each other through the same primary key ? Won't there be a problem when we load the data like say for example , in normal scenario the data load will be to one table but when we break it to multiple tables it will happen to all the individual pieces, won't that cause additional burden to the data load? Also I understand the technical limitation of the max number of columns per table is ~1600. But should you advise to restrict/stop us to some low number long before reaching that limit , such that we will not face any anomalies when we grow in future. And if we should maintain any specific order in the columns from start to end column in the specific table?
Re: Design strategy for table with many attributes
On Fri, Jul 5, 2024 at 10:45 AM Guyren Howe wrote: > On Jul 4, 2024, at 22:07, Lok P wrote: > > If you stick to the principle of grouping columns in a table when you use > those columns together, you should be good. > > Note that you might want to split up the “parent” table if that naturally > groups its columns together for certain uses. In that case, you could have > the same pk on all the 1:1 tables you then have. In that case, the pk for > each of those tables is also the fk. > Thank you. When you said below, *"Note that you might want to split up the “parent” table if that naturally groups its columns together for certain uses. In that case, you could have the same pk on all the 1:1 tables you then have. In that case, the pk for each of those tables is also the fk."* Do you mean having a real FK created through DDL and maintaining it or just assume it and no need to define it for all the pieces/tables. Only keep the same PK across all the pieces and as we know these are related to the same transaction and are logically related?
Re: Design strategy for table with many attributes
Some folks in the team suggested to have key business attributes or say frequently queried attributes in individual columns and others in a column in same table clubbed in JSON format. Is that advisable or any issues can occur with this approach? Also not sure how effectively postgres processes JSON (both read and write perspective) as compared to normal column in a oltp environment. Please advise. As David suggested it breaks if a row exceeds the 8k limit I. E a single page size , will that still holds true if we have a column with JSON in it? On Fri, 5 Jul, 2024, 12:04 pm Guyren Howe, wrote: > On Jul 4, 2024, at 23:28, Lok P wrote: > > > > *"Note that you might want to split up the “parent” table if that > naturally groups its columns together for certain uses. In that case, you > could have the same pk on all the 1:1 tables you then have. In that case, > the pk for each of those tables is also the fk."* > Do you mean having a real FK created through DDL and maintaining it or > just assume it and no need to define it for all the pieces/tables. Only > keep the same PK across all the pieces and as we know these are related to > the same transaction and are logically related? > > > A primary key *names something*. Often it’s a kind of platonic > representation of a real thing — say, a person. > > I might use a person’s login columns in some functions, and the person’s > name, birth date, etc in other functions. > > Rather than have one table, I should split this into two, but use the same > primary key (I would either name both id or both, say, person_id, > irrespective of the name of the table, so it’s clear you’re doing this). > > You can just do a join on the mutual primary keys as you’d expect. In > fact, if you name them the same, you can just use NATURAL JOIN. > > So you’d have person_details and person_login tables, and have a person_id > pk for both. >
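A minimal sketch of the shared-primary-key split described above, with illustrative column names (person_details / person_login as in the example):

CREATE TABLE person_details (
    person_id  bigint PRIMARY KEY,
    full_name  text,
    birth_date date
);

CREATE TABLE person_login (
    person_id     bigint PRIMARY KEY REFERENCES person_details (person_id),
    username      text UNIQUE,
    password_hash text
);

-- identical key names allow a NATURAL JOIN, as noted above
SELECT * FROM person_details NATURAL JOIN person_login;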
Re: Design strategy for table with many attributes
On Fri, 5 Jul, 2024, 1:44 pm David Rowley, wrote: > On Fri, 5 Jul 2024 at 19:53, Lok P wrote: > > As David suggested it breaks if a row exceeds the 8k limit, i.e. a single > page size; will that still hold true if we have a column with JSON in it? > > You wouldn't be at risk of the same tuple length problem if you > reduced the column count and stored the additional details in JSON. > Each varlena column is either stored in the tuple inline, or toasted > and stored out of line. Out of line values need an 18-byte pointer to > the toasted data. That pointer contributes to the tuple length. > > > David > Got it. Thank you very much. So there would be a performance overhead with JSON and we need to validate that carefully if we go in that direction. However, out of curiosity: if the toasted/compressed component, i.e. the JSON column itself, goes beyond 8k post compression, will it break then?
Partition boundary messed up
Hello All, We normally operate on UTC timezone so we normally create partitions in UTC timezone so that each day partition starts from today's midnight UTC to next day's midnight UTC. The script looks something like below. And also that way reference partition tables are also created in a similar way. Say for example the 29th august partition in parent partition reference to 29th august child partition as FK. This was working fine even for partition maintenance(mainly while dropping historical partitions) too without any issue. For one of the newly introduced partition tables, by mistake somebody created the initial partition definition without setting the timezone to UTC and all the partitions created are in different time zones. And as this table refers to another partition table (which is the parent and have partitions created in UTC timezone) the rows are spanning across two partitions and it's breaking the partition maintenance process while we try to drop the historical partition. Now the issue is that the newly introduced table already has billions of rows pumped into it spanning across 40 partitions. So is there an easy way to just alter the partition boundary to make it UTC midnight to midnight range? or The only way is to create a new table from scratch with the correct partition boundary in UTC timezone and then move the data and then create foreign key on that(which I believe is going to take a lot of time too)? Another thing we noticed, it shows initial partitions having boundaries in NON UTC (which we understand because of the missing timezone syntax) but then suddenly the subsequent partitions are getting created UTC too, not sure how it happened. And I believe it will create issues while rows come into the database which falls in between these ranges? Wondering if there is any easy way to correct this mess now? Note- This is postgres version 15+. Below partition creation script we use:- set timesozne='UTC'; SELECT partman.create_parent( p_parent_table := 'schema1.tab1', p_control := 'TRAN_DATE', p_type := 'native', p_interval := '1 day', p_premake := 90, p_start_partition => '2024-02-15 00:00:00' ); UPDATE partman.part_config SET infinite_time_partitions = 'true' , premake=20 WHERE parent_table = 'schema1.tab1'; CALL partman.run_maintenance_proc(); Below details i fetched from pg_class for the table which is messed up:- Partition_name Partition Expressions TAB1_p2024_08_29 FOR VALUES FROM ('2024-08-29 00:00:00+05:30') TO ('2024-08-30 00:00:00+05:30') TAB1_p2024_08_30 FOR VALUES FROM ('2024-08-30 00:00:00+05:30') TO ('2024-08-31 00:00:00+05:30') TAB1_p2024_08_31 FOR VALUES FROM ('2024-08-31 00:00:00+05:30') TO ('2024-09-01 00:00:00+05:30') TAB1_p2024_09_01 FOR VALUES FROM ('2024-09-01 00:00:00+05:30') TO ('2024-09-02 00:00:00+05:30') *TAB1_p2024_09_02 FOR VALUES FROM ('2024-09-02 00:00:00+05:30') TO ('2024-09-03 00:00:00+05:30')* *TAB1_p2024_09_03 FOR VALUES FROM ('2024-09-03 05:30:00+05:30') TO ('2024-09-04 05:30:00+05:30')*TAB1_p2024_09_04 FOR VALUES FROM ('2024-09-04 05:30:00+05:30') TO ('2024-09-05 05:30:00+05:30') TAB1_p2024_09_05 FOR VALUES FROM ('2024-09-05 05:30:00+05:30') TO ('2024-09-06 05:30:00+05:30') TAB1_p2024_09_06 FOR VALUES FROM ('2024-09-06 05:30:00+05:30') TO ('2024-09-07 05:30:00+05:30') Regards Lok
Re: Partition boundary messed up
Thank you. I understand that is going to take a lot of time as we already have billions of rows in the main table spread across 40+ partitions. Also this table is child to another parent partition table and so, it will take a lot of time to validate the FK constraint back for the new table. Is there a less disruptive way(zero to minimal downtime) possible to fix this mess? On Sat, Jul 27, 2024 at 2:08 PM yudhi s wrote: > If you were thinking of some ALTER command which will just alter the > boundaries of the incorrect partition and make it correct , I don't think > there exists any such. You may have to create a whole new table and run the > partman.create_parent block with the UTC time zone set so that all the > subsequent partitions will be created with correct boundaries and then move > the data into it from the existing table. > > >
Column type modification in big tables
Hello all, We have a postgres table which is range partitioned on a timestamp column, with a total size of ~3TB holding ~5 billion rows spanning ~150 daily partitions, and there are ~140+ columns in the table. Also, this table is a child to another partitioned table, and partition creation is handled through the pg_partman extension on this table. We have a requirement to modify the existing column types/lengths as below. Doing it directly through a single ALTER command will probably scan and rewrite the whole table, which may take hours of run time. So I am trying to understand from the experts what the best possible way is to tackle such changes in a postgres database, and whether there are any side effects we may see considering this table is a child of another one and also has a dependency on the pg_partman extension. two of the columns from varchar(20) to numeric(3) one of the columns from varchar(20) to varchar(2) one of the columns from Number(10,2) to Numeric(8,2) Regards Lok
Re: Column type modification in big tables
On Wed, Aug 7, 2024 at 4:51 PM sud wrote: > > > Others may correct but i think, If you don't have the FK defined on these > columns you can do below. > > > --Alter table add column which will be very fast within seconds as it will > just add it to the data dictionary. > > ALTER TABLE tab1 ADD COLUMN new_column1 NUMERIC(3), new_column2 > varchar2(3); > > > *-- Back populate the data partition wise and commit, if it's really > needed* > > UPDATE tab1_part1 SET new_column1 = CAST(old_column1 AS NUMERIC(3)), > new_column2 = CAST(old_column2 AS varchar2(3)) ; > commit; > UPDATE tab1_part2 SET new_column1 = CAST(old_column1 AS NUMERIC(3)), > new_column2 = CAST(old_column2 AS varchar2(3)) ; > commit; > UPDATE tab1_part3 SET new_column1 = CAST(old_column1 AS NUMERIC(3)), > new_column2 = CAST(old_column2 AS varchar2(3)) ; > commit; > . > > > *--Alter table drop old columns which will be very fast within seconds as > it will just drop it from the data dictionary.* > ALTER TABLE your_table DROP COLUMN old_column1, DROP COLUMN old_column2; > Thank you so much. I understand this will be the fastest possible way to achieve the column modification. But talking about the dropped column which will be sitting in the table and consuming storage space, Is it fine to leave as is or auto vacuum will remove the column values behind the scene and also anyway , once those partitions will be purged they will be by default purged. Is this understanding correct? And also will this have any impact on the partition maintenance which is currently done by pg_partman as because the template table is now different internally(not from outside though). Will it cause conflict because of those dropped columns from the main table?
Re: Column type modification in big tables
On Thu, Aug 8, 2024 at 1:06 AM sud wrote: > > > On Wed, Aug 7, 2024 at 5:00 PM Lok P wrote: > >> >> >> On Wed, Aug 7, 2024 at 4:51 PM sud wrote: >> >>> >>> >>> Others may correct but i think, If you don't have the FK defined on >>> these columns you can do below. >>> >>> >>> --Alter table add column which will be very fast within seconds as it >>> will just add it to the data dictionary. >>> >>> ALTER TABLE tab1 ADD COLUMN new_column1 NUMERIC(3), new_column2 >>> varchar2(3); >>> >>> >>> *-- Back populate the data partition wise and commit, if it's really >>> needed* >>> >>> UPDATE tab1_part1 SET new_column1 = CAST(old_column1 AS NUMERIC(3)), >>> new_column2 = CAST(old_column2 AS varchar2(3)) ; >>> commit; >>> UPDATE tab1_part2 SET new_column1 = CAST(old_column1 AS NUMERIC(3)), >>> new_column2 = CAST(old_column2 AS varchar2(3)) ; >>> commit; >>> UPDATE tab1_part3 SET new_column1 = CAST(old_column1 AS NUMERIC(3)), >>> new_column2 = CAST(old_column2 AS varchar2(3)) ; >>> commit; >>> . >>> >>> >>> *--Alter table drop old columns which will be very fast within seconds >>> as it will just drop it from the data dictionary.* >>> ALTER TABLE your_table DROP COLUMN old_column1, DROP COLUMN old_column2; >>> >> >> >> >> Thank you so much. >> >> I understand this will be the fastest possible way to achieve the column >> modification. >> >> But talking about the dropped column which will be sitting in the table >> and consuming storage space, Is it fine to leave as is or auto vacuum will >> remove the column values behind the scene and also anyway , once those >> partitions will be purged they will be by default purged. Is this >> understanding correct? >> >> And also will this have any impact on the partition maintenance which is >> currently done by pg_partman as because the template table is now different >> internally(not from outside though). Will it cause conflict because of >> those dropped columns from the main table? >> > > I think leaving the table as is after the dropping column will be fine for > you because your regular partition maintenance/drop will slowly purge the > historical partitions and eventually they will be removed. But if you > update those new columns with the old column values, then autovacuum should > also take care of removing the rows with older column values (which are > dead actually) . > > Not sure if pg_partman will cause any issue ,as because the table now has > the column data type/length changed. Others may confirm. > Thank you so much. Can anybody suggest any other possible way here. As, we also need to have the existing values be updated to the new column value here using update command (even if it will update one partition at a time). And as I see we have almost all the values in the column not null, which means it will update almost ~5billion rows across all the partitions. So my question is , is there any parameter(like work_mem,maintenance_work_mem etc) which we can set to make this update faster? or any other way to get this column altered apart from this method? >
Re: Column type modification in big tables
On Fri, Aug 9, 2024 at 2:06 AM Greg Sabino Mullane wrote: > On Thu, Aug 8, 2024 at 2:39 PM Lok P wrote: > >> Can anybody suggest any other possible way here. >> > > Sure - how about not changing the column type at all? > > > one of the columns from varchar(20) to varchar(2) > > ALTER TABLE foobar ADD CONSTRAINT twocharplease CHECK (length(mycol) <= 2) > NOT VALID; > > > one of the columns from Number(10,2) to Numeric(8,2) > > ALTER TABLE foobar ADD CONSTRAINT eightprecision CHECK (mycol <= 10^8) NOT > VALID; > > > two of the columns from varchar(20) to numeric(3) > > This one is trickier, as we don't know the contents, nor why it is going > to numeric(3) - not a terribly useful data type, but let's roll with it and > assume the stuff in the varchar is a number of some sort, and that we don't > allow nulls: > > ALTER TABLE foobar ADD CONSTRAINT onekorless CHECK (mycol::numeric(3) is > not null) NOT VALID; > > You probably want to check on the validity of the existing rows: see the > docs on VALIDATE CONSTRAINT here: > > https://www.postgresql.org/docs/current/sql-altertable.html > > > Thank you so much. Will definitely try to evaluate this approach. The Only concern I have is , as this data is moving downstream with exactly the same data type and length , so will it cause the downstream code to break while using this column in the join or filter criteria. Also I believe the optimizer won't be able to utilize this information while preparing the execution plan. Another thing , correct me if wrong, My understanding is , if we want to run the "validate constraint" command after running this "check constraint with not valid" command, this will do a full table scan across all the partitions , but it's still beneficial as compared to updating the columns values for each rows. Correct me if I'm wrong.
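For completeness, a sketch of the follow-up validation step mentioned above (constraint names follow Greg's examples):

ALTER TABLE foobar VALIDATE CONSTRAINT twocharplease;
ALTER TABLE foobar VALIDATE CONSTRAINT eightprecision;
ALTER TABLE foobar VALIDATE CONSTRAINT onekorless;
-- validation scans the table but holds only a SHARE UPDATE EXCLUSIVE lock,
-- so it does not block normal reads and writes the way a full rewrite would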
Re: Column type modification in big tables
On Fri, Aug 9, 2024 at 9:19 PM Greg Sabino Mullane wrote: > On Fri, Aug 9, 2024 at 6:39 AM Lok P wrote: > >> Thank you so much. Will definitely try to evaluate this approach. The >> Only concern I have is , as this data is moving downstream with exactly the >> same data type and length , so will it cause the downstream code to break >> while using this column in the join or filter criteria. Also I believe the >> optimizer won't be able to utilize this information while preparing the >> execution plan. >> > > Yes, this is not as ideal as rewriting the table, but you asked for > another approaches :) As to the impact of your downstream stuff, I think > you have to try and see. Not clear what you mean by the optimizer, it's not > going to really care about numeric(10) versus numeric(8) or varchar(20) vs > varchar(2). It's possible the varchar -> numeric could cause issues, but > without real-world queries and data we cannot say. > > >> Another thing , correct me if wrong, My understanding is , if we want >> to run the "validate constraint" command after running this "check >> constraint with not valid" command, this will do a full table scan across >> all the partitions , but it's still beneficial as compared to updating the >> columns values for each rows. Correct me if I'm wrong. >> > > Yes, it needs to scan the entire table, but it's a lightweight lock, won't > block concurrent access, will not need to detoast, and makes no table or > index updates. Versus an entire table rewrite which will do heavy locking, > take up tons of I/O, update all the indexes, and generate quite a lot of > WAL. > > Thank you so much Greg. Considering the option, if we are able to get large down time to get this activity done. Some teammates suggested altering the column with "USING" Clause. I am not really able to understand the difference, also when i tested on a simple table, it seems the "USING" clause takes more time as compared to normal ALTER. But again I don't see any way to see the progress and estimated completion time. Can you share your thoughts on this? ALTER TABLE foobar ALTER COLUMN mycol TYPE NUMERIC(3) USING mycol::NUMERIC(3); VS ALTER TABLE foobar ALTER COLUMN mycol TYPE NUMERIC(3) ; * Another thing also comes to my mind whether we should just create a new partition table(say new_part_table) from scratch from the DDL of the existing table(say old_part_table) and then load the data into it using command (insert into new_part_table.. select..from old_part_table). Then create indexes and constraints etc, something as below. Will this approach be faster/better as compared to the simple "alter table alter column approach" as above, considering we will have 4-6 hours of downtime for altering three different columns on this ~5TB table? *-- Steps* Create table exactly same as existing partition table but with the modified column types/lengths. drop indexes ; (Except PK and FK indexes may be..) drop constraints; insert into new_part_table (...) select (...) from old_part_table; create indexes concurrently ; create constraints; (But this table is also a child table to another partition table, so creating the foreign key may be resource consuming here too). drop the old_part_table; rename the new_part_table to old_part_table; rename all the partitions; VACUUM old_part_table ; ANALYZE old_part_table ;
Re: Column type modification in big tables
On Sat, Aug 10, 2024 at 5:47 PM sud wrote: > > > My 2cents. > If you have enough time then from a simplicity point of view, your single > line alter command may look good, but how are you going to see the amount > of progress it has made so far and how much time it's going to take to > finish. And you got ~6hrs of down time but if it fails at 5th hour then you > will be in a bad position. > > > Does it mean that , if we get enough downtime then , we should rather go with the option of recreating the table from scratch and populating the data from the existing table and then rename it back? It does look more complicated considering many steps like creating indexes, constraints back and renaming it and then running vacuum and analyze etc. Can someone through some light , in case we get 5-6hrs downtime for this change , then what method should we choose for this Alter operation?
Re: Column type modification in big tables
On Tue, Aug 13, 2024 at 7:39 PM Greg Sabino Mullane wrote: > On Sat, Aug 10, 2024 at 5:06 PM Lok P wrote: > > >> Can someone through some light , in case we get 5-6hrs downtime for this >> change , then what method should we choose for this Alter operation? >> > > We can't really answer that. Only you know what resources you have, what > risk/reward you are willing to handle, and how long things may take. For > that latter item, your best bet is to try this out on the same/similar > hardware and see how long it takes. Do a smaller table and extrapolate if > you need to. Or promote one of your replicas offline and modify that. I've > given you a low-risk / medium-reward option with check constraints, but for > the ALTER TABLE options you really need to try it and see (on non-prod). > > *"Do a smaller table and extrapolate if you need to. Or promote one of your replicas offline and modify that. I've given you a low-risk / medium-reward option with check constraints, but for the ALTER TABLE options you really need to try it and see (on non-prod)."* Is there any possible method(maybe by looking into the data dictionary tables/views etc) to see the progress of the Alter statement by which we can estimate the expected completion time of the "Alter" command? I understand pg_stat_activity doesn't show any completion percentage of a statement, but wondering if by any other possible way we can estimate the amount of time it will take in prod for the completion of the ALTER command.
Re: Column type modification in big tables
On Tue, Aug 13, 2024 at 7:39 PM Greg Sabino Mullane wrote: > On Sat, Aug 10, 2024 at 5:06 PM Lok P wrote: > > >> Can someone through some light , in case we get 5-6hrs downtime for this >> change , then what method should we choose for this Alter operation? >> > > We can't really answer that. Only you know what resources you have, what > risk/reward you are willing to handle, and how long things may take. For > that latter item, your best bet is to try this out on the same/similar > hardware and see how long it takes.* Do a smaller table and extrapolate > if you need to. * > Hello Greg, In terms of testing on sample data and extrapolating, as i picked the avg partition sizeof the table (which is ~20GB) and i created a non partitioned table with exactly same columns and populated with similar data and also created same set of indexes on it and the underlying hardware is exactly same as its on production. I am seeing it's taking ~5minutes to alter all the four columns on this table. So we have ~90 partitions in production with data in them and the other few are future partitions and are blank. (Note- I executed the alter with "work_mem=4GB, maintenance_work_mem=30gb, max_parallel_worker_per_gather=8, max_parallel_maintenance_worker =16" ) So considering the above figures , can i safely assume it will take ~90*5minutes= ~7.5hours in production and thus that many hours of downtime needed for this alter OR do we need to consider any other factors or activity here?
Re: Column type modification in big tables
On Thu, Aug 15, 2024 at 9:18 PM Alban Hertroys wrote: > > > On 15 Aug 2024, at 14:15, Lok P wrote: > > (…) > Are all those partitions critical, or only a relative few? > > If that’s the case, you could: > 1) detach the non-critical partitions > 2) take the system down for maintenance > 3) update the critical partitions > 4) take the system up again > 5) update the non-critical partitions > 6) re-attach the non-critical partitions > > That could shave a significant amount of time off your down-time. I would > script the detach and re-attach processes first, to save some extra. > > Thank you so much. The partition table which we are planning to apply the ALTER script is a child table to another big partition table. And we have foreign key defined on table level but not partition to partition. So will detaching the partitions and then altering column of each detached partition and then re-attaching will revalidate the foreign key again? If that is the case then the re-attaching partition step might consume a lot of time. Is my understanding correct here?
Re: Column type modification in big tables
On Fri, Aug 16, 2024 at 2:04 AM Lok P wrote: > > On Thu, Aug 15, 2024 at 9:18 PM Alban Hertroys wrote: > >> >> > On 15 Aug 2024, at 14:15, Lok P wrote: >> >> (…) >> Are all those partitions critical, or only a relative few? >> >> If that’s the case, you could: >> 1) detach the non-critical partitions >> 2) take the system down for maintenance >> 3) update the critical partitions >> 4) take the system up again >> 5) update the non-critical partitions >> 6) re-attach the non-critical partitions >> >> That could shave a significant amount of time off your down-time. I would >> script the detach and re-attach processes first, to save some extra. >> >> > Thank you so much. > > The partition table which we are planning to apply the ALTER script is a > child table to another big partition table. And we have foreign key > defined on table level but not partition to partition. So will detaching > the partitions and then altering column of each detached partition and then > re-attaching will revalidate the foreign key again? If that is the case > then the re-attaching partition step might consume a lot of time. Is my > understanding correct here? > Additionally , if we are okay with the 7.5hrs of down time , is my calculation/extrapolation of total time consumption based on a sample table, for direct alter, accurate? Because, in that case , I was thinking it's less complex and also less error prone to just do it in a single alter command rather than going for multiple steps of detach, alter, attach partition.
Re: Column type modification in big tables
On Thu, 15 Aug, 2024, 9:18 pm Alban Hertroys, wrote: > > > On 15 Aug 2024, at 14:15, Lok P wrote: > > (…) > > > Hello Greg, > > > > In terms of testing on sample data and extrapolating, as i picked the > avg partition sizeof the table (which is ~20GB) and i created a non > partitioned table with exactly same columns and populated with similar data > and also created same set of indexes on it and the underlying hardware is > exactly same as its on production. I am seeing it's taking ~5minutes to > alter all the four columns on this table. So we have ~90 partitions in > production with data in them and the other few are future partitions and > are blank. (Note- I executed the alter with "work_mem=4GB, > maintenance_work_mem=30gb, max_parallel_worker_per_gather=8, > max_parallel_maintenance_worker =16" ) > > > > So considering the above figures , can i safely assume it will take > ~90*5minutes= ~7.5hours in production and thus that many hours of downtime > needed for this alter OR do we need to consider any other factors or > activity here? > > Are all those partitions critical, or only a relative few? > > If that’s the case, you could: > 1) detach the non-critical partitions > 2) take the system down for maintenance > 3) update the critical partitions > 4) take the system up again > 5) update the non-critical partitions > 6) re-attach the non-critical partitions > > That could shave a significant amount of time off your down-time. I would > script the detach and re-attach processes first, to save some extra. > > Admittedly, I haven’t actually tried that procedure, but I see no reason > why it wouldn’t work. > > Apart perhaps, from inserts happening that should have gone to some of > those detached partitions. Maybe those could be sent to a ‘default’ > partition that gets detached at step 7, after which you can insert+select > those from the default into the appropriate partitions? > > But you were going to test that first anyway, obviously. > We were checking this strategy , but what we found is while attaching any of the historical partition back to the child table , if there runs any existing inserts on the other live partitions of the same child table that attach keeps on hang state. Also during this period the parent table (which is also partitioned) takes an exclusive lock on itself!! Even detaching any partition "concurrently" also waits for any inserts to finish, even those are on other partitions. Is this behavior expected?
Faster data load
Hi, We have a requirement to create approx 50 billion rows in a partitioned table (~1 billion rows per partition, 200+ GB daily partitions) for a performance test. We are currently using the 'insert into <target table> select .. from <source table or view>;' method. We dropped all indexes and constraints first and then did the load. Still it's taking 2-3 hours to populate one partition. Is there a faster way to achieve this? A few teammates are suggesting to use the COPY command with a file load instead, which will be faster. So I wanted to understand what it does differently behind the scenes as compared to the insert-as-select approach, given that the latter only goes through the SQL engine. Additionally, when we tried to create indexes post data load on one partition, it took 30+ minutes. Any possible way to make that faster? Is there any way to drive the above things in parallel by utilizing the full database resources? It's postgres 15.4. Regards Lok
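For reference, a hedged sketch of the COPY route being suggested (file paths and table names are placeholders; on RDS there is no access to the server filesystem, so the client-side \copy form, or an S3 import extension, is the usual choice):

-- server-side COPY, reads a file visible to the database server
COPY tab_part_2024_09_01 FROM '/data/loads/part_2024_09_01.csv' WITH (FORMAT csv);

-- client-side variant in psql, streams the file over the existing connection
\copy tab_part_2024_09_01 FROM 'part_2024_09_01.csv' WITH (FORMAT csv)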
Re: Faster data load
On Fri, 6 Sept, 2024, 9:20 am Muhammad Usman Khan, wrote: > Hi, > > You can use pg_partman. If your table is partitioned, you can manage > partitions in parallel by distributing the load across partitions > concurrently. Or you can use citus. It can be an excellent solution, > especially for handling large data volumes and parallelizing data operations > Thank you. The tables are partitioned. Also, during index creation we are trying to build multiple partitions at the same time from multiple sessions, but we are seeing an out of memory error by the 5th or 6th session. And even then, each session is taking ~30 minutes per partition for the index creation. Attaching the index partitions happens in seconds though.
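A hedged sketch of the session-level settings that usually govern index build speed and memory (values are illustrative; note that each session building an index can use up to maintenance_work_mem, so several parallel sessions multiply the memory footprint, which matches the out-of-memory symptom described):

SET maintenance_work_mem = '2GB';
SET max_parallel_maintenance_workers = 4;
CREATE INDEX CONCURRENTLY idx_1 ON tab_part1 (col1);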
Re: How batch processing works
On Mon, Sep 23, 2024 at 12:53 AM Peter J. Holzer wrote: > > > Thank you so much. > > I was expecting method-3 (batch insert) to be the fastest or at least, as > you said, > > perform with similar speed as method-2 (row by row insert with batch > commit) > > Oops, sorry! I wrote that the wrong way around. Method 3 is the fastest. > I guess I meant to write "method2 takes about twice as long as method3" > or something like that. > > In the case of batch insert, the statement below is the fastest one as it inserts multiple rows in one statement. Similarly, I understand a Delete can be batched as below. However, can you suggest how an Update can be batched in a simple/generic fashion in JDBC for an input data stream with multiple input values? Because for an update, if we write it as below, it will just do one row update at a time. Update <table_name> SET column1=?, column2=?, column3=? where <key_column>=? ; INSERT INTO <table_name> VALUES (1, 'a'), (2, 'a'), (3, 'a'); Delete from <table_name> where column_name in (<value1>, <value2>, ...);
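One common SQL-side way to batch the update is to join against a VALUES list, which can then be sent from the client as a single statement per batch; a hedged sketch with placeholder table and column names:

UPDATE target_tab AS t
SET    column1 = v.column1,
       column2 = v.column2
FROM  (VALUES (1, 'a1', 'b1'),
              (2, 'a2', 'b2'),
              (3, 'a3', 'b3')) AS v(id, column1, column2)
WHERE  t.id = v.id;

The other standard route is JDBC's PreparedStatement addBatch()/executeBatch() around the single-row UPDATE shown above, which keeps one statement but sends the whole batch of bind values in far fewer round trips.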
Re: How batch processing works
On Sat, Sep 21, 2024 at 9:51 AM Michał Kłeczek wrote: > Hi, > > > On 19 Sep 2024, at 07:30, Lok P wrote: > > > [snip] > > > > Method-4 > > > > INSERT INTO parent_table VALUES (1, 'a'), (2, 'a'); > > INSERT INTO child_table VALUES (1,1, 'a'), (1,2, 'a'); > > commit; > > I’ve done some batch processing of JSON messages from Kafka in Java. > By far the most performant way was to: > > 1. Use prepared statements > 2. Parse JSON messages in Postgres > 3. Process messages in batches > > All three can be achieved by using arrays to pass batches: > > WITH parsed AS ( > SELECT msg::json FROM unnest(?) > ), > parents AS ( > INSERT INTO parent SELECT … FROM parsed RETURNING ... > ) > INSERT INTO child SELECT … FROM parsed… > > Not the single parameter that you can bind to String[] > > Hope that helps. > > Got your point. But wondering why we don't see any difference in performance between method-2 and method-3 above. So does it mean that,I am testing this in a wrong way or it's the expected behaviour and thus there is no meaning in converting the row by row inserts into a bulk insert, but just changing the commit frequency will do the same job in a row by row insert approach?
Re: How batch processing works
On Sun, Sep 22, 2024 at 12:46 AM Adrian Klaver wrote: > On 9/21/24 07:36, Peter J. Holzer wrote: > > On 2024-09-21 16:44:08 +0530, Lok P wrote: > > > > --- > > #!/usr/bin/python3 > > > > import time > > import psycopg2 > > > > num_inserts = 10_000 > > batch_size = 50 > > > > db = psycopg2.connect() > > csr = db.cursor() > > > > csr.execute("drop table if exists parent_table") > > csr.execute("create table parent_table (id int primary key, t text)") > > db.commit() > > > > start_time = time.monotonic() > > for i in range(1, num_inserts+1): > > csr.execute("insert into parent_table values(%s, %s)", (i, 'a')) > > if i % batch_size == 0: > > db.commit() > > db.commit() > > end_time = time.monotonic() > > elapsed_time = end_time - start_time > > print(f"Method 2: Individual Inserts with Commit after {batch_size} > Rows: {elapsed_time:.3} seconds") > > > > # vim: tw=99 > > > --- > > FYI, this is less of problem with psycopg(3) and pipeline mode: > > import time > import psycopg > > num_inserts = 10_000 > batch_size = 50 > > db = psycopg.connect("dbname=test user=postgres host=104.237.158.68") > csr = db.cursor() > > csr.execute("drop table if exists parent_table") > csr.execute("create table parent_table (id int primary key, t text)") > db.commit() > > start_time = time.monotonic() > with db.pipeline(): > for i in range(1, num_inserts+1): > csr.execute("insert into parent_table values(%s, %s)", (i, 'a')) > if i % batch_size == 0: > db.commit() > db.commit() > end_time = time.monotonic() > elapsed_time = end_time - start_time > print(f"Method 2: Individual Inserts(psycopg3 pipeline mode) with Commit > after {batch_size} Rows: {elapsed_time:.3} seconds") > > > For remote to a database in another state that took the time from: > > Method 2: Individual Inserts with Commit after 50 Rows: 2.42e+02 seconds > > to: > > Method 2: Individual Inserts(psycopg3 pipeline mode) with Commit after > 50 Rows: 9.83 seconds > > > #!/usr/bin/python3 > > > > import itertools > > import time > > import psycopg2 > > > > num_inserts = 10_000 > > batch_size = 50 > > > > db = psycopg2.connect() > > csr = db.cursor() > > > > csr.execute("drop table if exists parent_table") > > csr.execute("create table parent_table (id int primary key, t text)") > > db.commit() > > > > start_time = time.monotonic() > > batch = [] > > for i in range(1, num_inserts+1): > > batch.append((i, 'a')) > > if i % batch_size == 0: > > q = "insert into parent_table values" + ",".join(["(%s, %s)"] * > len(batch)) > > params = list(itertools.chain.from_iterable(batch)) > > csr.execute(q, params) > > db.commit() > > batch = [] > > if batch: > > q = "insert into parent_table values" + ",".join(["(%s, %s)"] * > len(batch)) > > csr.execute(q, list(itertools.chain(batch))) > > db.commit() > > batch = [] > > > > end_time = time.monotonic() > > elapsed_time = end_time - start_time > > print(f"Method 3: Batch Inserts ({batch_size}) with Commit after each > batch: {elapsed_time:.3} seconds") > > > > # vim: tw=99 > > > --- > > The above can also be handled with execute_batch() and execute_values() > from: > > https://www.psycopg.org/docs/extras.html#fast-execution-helpers > > > > > On my laptop, method2 is about twice as fast as method3. But if I > > connect to a database on the other side of the city, method2 is now more > > than 16 times faster than method3 . Simply because the delay in > > communication is now large compared to the time it takes to insert those > > rows. > > > > > > Thank you. 
So if I understand correctly, if the client app (from which the data is getting streamed/inserted) is in the same data center/zone as the database (which is the case most of the time), then the batch insert does not appear to be much more beneficial. Which also means: people here were afraid of having triggers on such a high-DML table because they would effectively convert the "batch insert" into "row by row" processing behind the scenes, but considering the above results, it looks fine to go with a row-by-row approach (just having a batch commit in place of a row-by-row commit), and not to worry about implementing the true batch insert approach, as it is not making a big difference here in data load performance.
Re: How batch processing works
On Sat, Sep 21, 2024 at 8:07 PM Peter J. Holzer wrote: > On 2024-09-21 16:44:08 +0530, Lok P wrote: > > But wondering why we don't see any difference in performance between > method-2 > > and method-3 above. > > The code runs completely inside the database. So there isn't much > difference between a single statement which inserts 50 rows and 50 > statements which insert 1 row each. The work to be done is (almost) the > same. > > This changes once you consider an application which runs outside of the > database (maybe even on a different host). Such an application has to > wait for the result of each statement before it can send the next one. > Now it makes a difference whether you are waiting 50 times for a > statement which does very little or just once for a statement which does > more work. > > > So does it mean that,I am testing this in a wrong way or > > That depends on what you want to test. If you are interested in the > behaviour of stored procedures, the test is correct. If you want to know > about the performance of a database client (whether its written in Java, > Python, Go or whatever), this is the wrong test. You have to write the > test in your target language and run it on the client system to get > realistic results (for example, the round-trip times will be a lot > shorter if the client and database are on the same computer than when > one is in Europe and the other in America). > > For example, here are the three methods as Python scripts: > > > --- > #!/usr/bin/python3 > > import time > import psycopg2 > > num_inserts = 10_000 > > db = psycopg2.connect() > csr = db.cursor() > > csr.execute("drop table if exists parent_table") > csr.execute("create table parent_table (id int primary key, t text)") > > start_time = time.monotonic() > for i in range(1, num_inserts+1): > csr.execute("insert into parent_table values(%s, %s)", (i, 'a')) > db.commit() > end_time = time.monotonic() > elapsed_time = end_time - start_time > print(f"Method 1: Individual Inserts with Commit after every Row: > {elapsed_time:.3} seconds") > > # vim: tw=99 > > --- > #!/usr/bin/python3 > > import time > import psycopg2 > > num_inserts = 10_000 > batch_size = 50 > > db = psycopg2.connect() > csr = db.cursor() > > csr.execute("drop table if exists parent_table") > csr.execute("create table parent_table (id int primary key, t text)") > db.commit() > > start_time = time.monotonic() > for i in range(1, num_inserts+1): > csr.execute("insert into parent_table values(%s, %s)", (i, 'a')) > if i % batch_size == 0: > db.commit() > db.commit() > end_time = time.monotonic() > elapsed_time = end_time - start_time > print(f"Method 2: Individual Inserts with Commit after {batch_size} Rows: > {elapsed_time:.3} seconds") > > # vim: tw=99 > > --- > #!/usr/bin/python3 > > import itertools > import time > import psycopg2 > > num_inserts = 10_000 > batch_size = 50 > > db = psycopg2.connect() > csr = db.cursor() > > csr.execute("drop table if exists parent_table") > csr.execute("create table parent_table (id int primary key, t text)") > db.commit() > > start_time = time.monotonic() > batch = [] > for i in range(1, num_inserts+1): > batch.append((i, 'a')) > if i % batch_size == 0: > q = "insert into parent_table values" + ",".join(["(%s, %s)"] * > len(batch)) > params = list(itertools.chain.from_iterable(batch)) > csr.execute(q, params) > db.commit() > batch = [] > if batch: > q = "insert into parent_table values" + ",".join(["(%s, %s)"] * > len(batch)) > csr.execute(q, list(itertools.chain(batch))) > db.commit() > batch = [] 
> > end_time = time.monotonic() > elapsed_time = end_time - start_time > print(f"Method 3: Batch Inserts ({batch_size}) with Commit after each > batch: {elapsed_time:.3} seconds") > > # vim: tw=99 > > --- > > On my laptop, method2 is about twice as fast as method3. But if I > connect to a database on the other side of the city, method2 is now more
Re: Grants not working on partitions
On Sat, Sep 28, 2024 at 8:46 PM Adrian Klaver wrote: > On 9/28/24 04:02, Lok P wrote: > > Hi, > > While we are creating any new tables, we used to give SELECT privilege > > on the newly created tables using the below command. But we are seeing > > now , in case of partitioned tables even if we had given the privileges > > in the same fashion, the user is not able to query specific partitions > > but only the table. Commands like "select * from > > schema1. " are erroring out with the "insufficient > > privilege" error , even if the partition belongs to the same table. > > > > Grant SELECT ON to ; > > > > Grant was seen as a one time command which needed while creating the > > table and then subsequent partition creation for that table was handled > > by the pg_partman extension. But that extension is not creating or > > copying any grants on the table to the users. We were expecting , once > > the base table is given a grant , all the inherited partitions will be > > automatically applied to those grants. but it seems it's not working > > that way. So is there any other way to handle this situation? > > > The docs are there for a reason: > > > https://github.com/pgpartman/pg_partman/blob/master/doc/pg_partman.md#child-table-property-inheritance > > "Privileges & ownership are NOT inherited by default. If enabled by > pg_partman, note that this inheritance is only at child table creation > and isn't automatically retroactive when changed (see > reapply_privileges()). Unless you need direct access to the child > tables, this should not be needed. You can set the inherit_privileges > option if this is needed (see config table information below)." > > > And: > > "reapply_privileges( > p_parent_table text > ) > RETURNS void > > This function is used to reapply ownership & grants on all child > tables based on what the parent table has set. > Privileges that the parent table has will be granted to all child > tables and privileges that the parent does not have will be revoked > (with CASCADE). > Privileges that are checked for are SELECT, INSERT, UPDATE, DELETE, > TRUNCATE, REFERENCES, & TRIGGER. > Be aware that for large partition sets, this can be a very long > running operation and is why it was made into a separate function to run > independently. Only privileges that are different between the parent & > child are applied, but it still has to do system catalog lookups and > comparisons for every single child partition and all individual > privileges on each. > p_parent_table - parent table of the partition set. Must be schema > qualified and match a parent table name already configured in pg_partman. > " > > > Thank you. I was not aware about this function which copies the grants from parent to child ,so we can give a call to this function at the end of the pg_partman job call which is happening through the cron job. But I see , the only issue is that this function only has one parameter "p_parent_table" but nothing for "child_table" and that means it will try to apply grants on all the childs/partitions which have been created till today and may already be having the privileges already added in them. 
And we have only ~60 partitions in most of the tables, so hopefully that will not take long. But considering we create/purge one partition daily for each partitioned table using pg_partman, every time we call this function it will try to apply/copy the grants on all the partitions, including the current day's live partition. Will that cause the existing running queries on the live partition to be re-planned (a "hard parse", in Oracle terms)? Or will it cause any locking when it applies the grant on the current/live partition, which is being inserted into, updated, deleted from, or queried by the users at the same time?
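As an illustration (not from the thread), here is a minimal sketch of what such a nightly cron step could look like from Python. The schema name "partman", the parent table "schema1.tab1" and the role setup are placeholders; reapply_privileges() is the function quoted from the docs above, and run_maintenance_proc() is assumed here to be the usual pg_partman maintenance entry point in recent versions.

#!/usr/bin/python3
# Sketch only: placeholder names; adjust to the real partman schema and parents.
import psycopg2

PARENT_TABLES = ["schema1.tab1"]   # hypothetical list of partitioned parents we manage

db = psycopg2.connect()
db.autocommit = True               # run_maintenance_proc manages its own transactions
csr = db.cursor()

# create/drop the daily partitions (the step the cron job already performs)
csr.execute("CALL partman.run_maintenance_proc()")

# then push the parent's current grants down to every child partition
for parent in PARENT_TABLES:
    csr.execute("SELECT partman.reapply_privileges(%s)", (parent,))

csr.close()
db.close()

Alternatively, as the quoted docs note, the inherit_privileges option in partman's config table can be enabled so that grants are copied at child-creation time instead of being reapplied afterwards.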
Grants not working on partitions
Hi, Whenever we create a new table, we give the SELECT privilege on it using the command below. But we are now seeing that, in the case of partitioned tables, even though we granted the privileges in the same fashion, the user is not able to query specific partitions, only the table itself. Commands like "select * from schema1. " error out with an "insufficient privilege" error, even though the partition belongs to that same table.

Grant SELECT ON to ;

The grant was treated as a one-time command needed while creating the table, and subsequent partition creation for that table was handled by the pg_partman extension. But that extension is not creating or copying any grants on the partitions for the users. We were expecting that, once the base table is granted, all the inherited partitions would automatically get those grants, but it seems it does not work that way. So is there any other way to handle this situation?

In other databases (say Oracle) we used to create standard roles (Read_role, Write_role etc.) and then provide grants to the users through those roles, and the objects were granted directly to those roles. Similarly, here in Postgres we were granting "read" or "write" privileges on objects to the roles and letting the users log in to the database using those roles, thus getting all the read/write privileges assigned to those roles. Are we doing anything wrong? Regards Lok
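One note and a sketch, not from the thread: a grant on the partitioned parent is enough for queries that go through the parent; the "insufficient privilege" error only appears when a partition is queried directly, because each partition carries its own ACL. The role-based setup described above still works in Postgres; what is missing is a way for future partitions to pick up the grant. A minimal sketch, assuming hypothetical names (read_role, app_user, schema1, and partman_owner as the role that actually creates the child tables):

#!/usr/bin/python3
# Sketch only: placeholder role/schema names.
import psycopg2

db = psycopg2.connect()
db.autocommit = True
csr = db.cursor()

# role-based grants, as in the Oracle-style setup described above
csr.execute("CREATE ROLE read_role")
csr.execute("GRANT USAGE ON SCHEMA schema1 TO read_role")
csr.execute("GRANT SELECT ON ALL TABLES IN SCHEMA schema1 TO read_role")  # existing tables and partitions
csr.execute("GRANT read_role TO app_user")

# future partitions: default privileges apply to tables created later by partman_owner,
# the role assumed here to run the pg_partman maintenance that creates the children
csr.execute(
    "ALTER DEFAULT PRIVILEGES FOR ROLE partman_owner IN SCHEMA schema1 "
    "GRANT SELECT ON TABLES TO read_role"
)

csr.close()
db.close()

ALTER DEFAULT PRIVILEGES FOR ROLE only affects objects created by that role, so it has to target whichever role actually creates the child partitions.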
Re: How batch processing works
Below are the results for the posted methods. I tested them locally and saw no difference in timing between method-2 and method-3 (it failed to run in dbfiddle somehow). I was also initially worried that adding the trigger to our target table would worsen performance, because it would turn all the execution into "row by row" processing rather than a true batch insert (method-3 as posted), with more context switches. But it seems it will still be doing the batched commits (the way method-2 does), so as per that we won't lose any performance as such. Is this understanding correct?

Method-1 - 00:01:44.48
Method-2 - 00:00:02.67
Method-3 - 00:00:02.39

https://gist.github.com/databasetech0073/8e9106757d751358c0c0c65a2374dbc6

On Thu, Sep 19, 2024 at 6:42 PM Lok P wrote: > > > On Thu, Sep 19, 2024 at 5:40 PM Ron Johnson > wrote: > >> On Thu, Sep 19, 2024 at 5:24 AM Lok P wrote: >> >>> >>> >>> [snip] >> >>> DO $$ >>> DECLARE >>> num_inserts INTEGER := 10; >>> batch_size INTEGER := 50; >>> start_time TIMESTAMP; >>> end_time TIMESTAMP; >>> elapsed_time INTERVAL; >>> i INTEGER; >>> BEGIN >>> -- Method 1: Individual Inserts with Commit after every Row >>> start_time := clock_timestamp(); >>> >>> FOR i IN 1..num_inserts LOOP >>> INSERT INTO parent_table VALUES (i, 'a'); >>> COMMIT; >>> END LOOP; >>> >>> end_time := clock_timestamp(); >>> elapsed_time := end_time - start_time; >>> INSERT INTO debug_log (method1, start_time, end_time, elapsed_time) >>> VALUES ('Method 1: Individual Inserts with Commit after every Row', >>> start_time, end_time, elapsed_time); >>> >>> -- Method 2: Individual Inserts with Commit after 50 Rows >>> start_time := clock_timestamp(); >>> >>> FOR i IN 1..num_inserts LOOP >>> INSERT INTO parent_table2 VALUES (i, 'a'); >>> -- Commit after every 50 rows >>> IF i % batch_size = 0 THEN >>> COMMIT; >>> END IF; >>> END LOOP; >>> >>> -- Final commit if not already committed >>> COMMIT; >>> >>> end_time := clock_timestamp(); >>> elapsed_time := end_time - start_time; >>> INSERT INTO debug_log (method1, start_time, end_time, elapsed_time) >>> VALUES ('Method 2: Individual Inserts with Commit after 50 Rows', >>> start_time, end_time, elapsed_time); >>> >>> -- Method 3: Batch Inserts with Commit after all >>> start_time := clock_timestamp(); >>> >>> FOR i IN 1..(num_inserts / batch_size) LOOP >>> INSERT INTO parent_table3 VALUES >>> (1 + (i - 1) * batch_size, 'a'), >>> >> [snip] >> >>> (49 + (i - 1) * batch_size, 'a'), >>> (50 + (i - 1) * batch_size, 'a'); >>> COMMIT; >>> END LOOP; >>> >>> COMMIT; -- Final commit for all >>> end_time := clock_timestamp(); >>> elapsed_time := end_time - start_time; >>> INSERT INTO debug_log (method1, start_time, end_time, elapsed_time) >>> VALUES ('Method 3: Batch Inserts with Commit after All', start_time, >>> end_time, elapsed_time); >>> >>> END $$; >>> >> >> Reproduce what behavior? >> >> Anyway, plpgsql functions (including anonymous DO statements) are -- to >> PostgreSQL -- single statements. Thus, they'll be faster than >> individual calls. >> >> An untrusted language like plpython3u might speed things up even more, if >> you have to read a heterogeneous external file and insert all the records >> into the db. >> > > Here, if you see my script, method-1 is doing a commit after each row > insert, method-2 is doing a batch commit i.e. a commit after every 50 > rows, and method-3 is doing a true batch insert i.e. combining all the 50 > values in one insert statement, submitting it to the database in one shot > and then committing, so the context switching will be a lot less. So I was > expecting Method-3 to be the fastest way to insert the rows here, but the > response time shows the same response time for Method-2 and Method-3. > Method-1 is the slowest, though. >
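On the trigger question above, one way to check rather than guess is sketched here (not from the thread): attach a do-nothing FOR EACH ROW trigger to parent_table3 (the table used for method-3 in the script) and time the multi-row insert again. The trigger body fires once per row inside the statement, but the statement and the commit still happen once per batch. The trigger/function names are made up, and the EXECUTE FUNCTION syntax assumes PostgreSQL 11+.

#!/usr/bin/python3
# Sketch only: measure what a do-nothing row-level trigger adds to a 50-row insert.
import time
import psycopg2

db = psycopg2.connect()
csr = db.cursor()

csr.execute("""
    CREATE OR REPLACE FUNCTION trg_noop() RETURNS trigger AS $$
    BEGIN
        RETURN NEW;   -- runs once per inserted row, but does no real work
    END;
    $$ LANGUAGE plpgsql
""")
csr.execute("""
    CREATE TRIGGER parent_table3_noop
    AFTER INSERT ON parent_table3
    FOR EACH ROW EXECUTE FUNCTION trg_noop()
""")
db.commit()

# One multi-row statement: still a single statement and a single commit,
# even though the trigger body fires 50 times inside it.
placeholders = ",".join(["(%s, %s)"] * 50)
params = []
for i in range(1_000_001, 1_000_051):   # high ids, to avoid clashing with earlier test rows
    params.extend([i, 'a'])

start = time.monotonic()
csr.execute("INSERT INTO parent_table3 VALUES " + placeholders, params)
db.commit()
print(f"50-row insert with a row-level trigger: {time.monotonic() - start:.4f} seconds")

Comparing this against the same insert without the trigger shows how much of the cost is per-row trigger execution versus per-statement and per-commit overhead.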
Re: How batch processing works
On Thu, Sep 19, 2024 at 5:40 PM Ron Johnson wrote: > On Thu, Sep 19, 2024 at 5:24 AM Lok P wrote: > >> >> >> [snip] > >> DO $$ >> DECLARE >> num_inserts INTEGER := 10; >> batch_size INTEGER := 50; >> start_time TIMESTAMP; >> end_time TIMESTAMP; >> elapsed_time INTERVAL; >> i INTEGER; >> BEGIN >> -- Method 1: Individual Inserts with Commit after every Row >> start_time := clock_timestamp(); >> >> FOR i IN 1..num_inserts LOOP >> INSERT INTO parent_table VALUES (i, 'a'); >> COMMIT; >> END LOOP; >> >> end_time := clock_timestamp(); >> elapsed_time := end_time - start_time; >> INSERT INTO debug_log (method1, start_time, end_time, elapsed_time) >> VALUES ('Method 1: Individual Inserts with Commit after every Row', >> start_time, end_time, elapsed_time); >> >> -- Method 2: Individual Inserts with Commit after 50 Rows >> start_time := clock_timestamp(); >> >> FOR i IN 1..num_inserts LOOP >> INSERT INTO parent_table2 VALUES (i, 'a'); >> -- Commit after every 50 rows >> IF i % batch_size = 0 THEN >> COMMIT; >> END IF; >> END LOOP; >> >> -- Final commit if not already committed >> COMMIT; >> >> end_time := clock_timestamp(); >> elapsed_time := end_time - start_time; >> INSERT INTO debug_log (method1, start_time, end_time, elapsed_time) >> VALUES ('Method 2: Individual Inserts with Commit after 50 Rows', >> start_time, end_time, elapsed_time); >> >> -- Method 3: Batch Inserts with Commit after all >> start_time := clock_timestamp(); >> >> FOR i IN 1..(num_inserts / batch_size) LOOP >> INSERT INTO parent_table3 VALUES >> (1 + (i - 1) * batch_size, 'a'), >> > [snip] > >> (49 + (i - 1) * batch_size, 'a'), >> (50 + (i - 1) * batch_size, 'a'); >> COMMIT; >> END LOOP; >> >> COMMIT; -- Final commit for all >> end_time := clock_timestamp(); >> elapsed_time := end_time - start_time; >> INSERT INTO debug_log (method1, start_time, end_time, elapsed_time) >> VALUES ('Method 3: Batch Inserts with Commit after All', start_time, >> end_time, elapsed_time); >> >> END $$; >> > > Reproduce what behavior? > > Anyway, plpgsql functions (including anonymous DO statements) are -- to > PostgreSQL -- single statements. Thus, they'll be faster than > individual calls. > > An untrusted language like plpython3u might speed things up even more, if > you have to read a heterogeneous external file and insert all the records > into the db. >

Here, if you see my script, method-1 is doing a commit after each row insert, method-2 is doing a batch commit i.e. a commit after every 50 rows, and method-3 is doing a true batch insert i.e. combining all the 50 values in one insert statement, submitting it to the database in one shot and then committing, so the context switching will be a lot less. So I was expecting Method-3 to be the fastest way to insert the rows here, but the response time shows the same response time for Method-2 and Method-3. Method-1 is the slowest, though.
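One more knob worth noting here, as an aside not raised in the thread: when the row-by-row pattern cannot be avoided, the per-statement parse/plan cost can be paid once by preparing the INSERT server-side and executing the prepared statement per row. Round trips stay the same, only the parsing goes away. A minimal sketch, assuming psycopg2 (which does not use server-side prepared statements on its own) and the parent_table(id int, t text) layout from the earlier scripts:

#!/usr/bin/python3
# Sketch only: explicit SQL PREPARE/EXECUTE for the row-by-row case.
import time
import psycopg2

num_inserts = 10_000

db = psycopg2.connect()
csr = db.cursor()

csr.execute("drop table if exists parent_table")
csr.execute("create table parent_table (id int primary key, t text)")
db.commit()

# parse/plan the INSERT once for this session
csr.execute("PREPARE ins (int, text) AS INSERT INTO parent_table VALUES ($1, $2)")

start_time = time.monotonic()
for i in range(1, num_inserts + 1):
    csr.execute("EXECUTE ins (%s, %s)", (i, 'a'))   # one round trip per row, no re-parse
db.commit()
end_time = time.monotonic()
print(f"Row-by-row with a prepared statement: {end_time - start_time:.3f} seconds")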
How batch processing works
Hello, I saw multiple threads around the same topic, so I want some clarification. As we know, row-by-row is "slow by slow" processing, so in heavy write systems (say the client app is in Java), people are asked to do DMLs in batches rather than in a row-by-row fashion, to minimize the chatter or context switches between database and client, which is resource intensive. What I understand is that true batch processing means the client has to collect all the input bind values, prepare the insert statement, submit it to the database in one shot and then commit.

What does that actually mean? If we divide the options as below, which method truly does batch processing, or does some other method exist for doing batch processing with Postgres as the backend database? I understand the first method below is truly row-by-row processing, in which context switches happen between client and database with each row, whereas the second method just batches the commits but is not true batch processing, as it does the same amount of context switching between the database and the client. But regarding the third and fourth methods, will both execute in a similar fashion in the database with the same number of context switches? And does any better method exist to do these inserts in batches? Appreciate your guidance.

CREATE TABLE parent_table (
    id SERIAL PRIMARY KEY,
    name TEXT
);

CREATE TABLE child_table (
    id SERIAL PRIMARY KEY,
    parent_id INT REFERENCES parent_table(id),
    value TEXT
);

Method-1

insert into parent_table values(1,'a');
commit;
insert into parent_table values(2,'a');
commit;
insert into child_table values(1,1,'a');
commit;
insert into child_table values(2,2,'a');
commit;

VS

Method-2

insert into parent_table values(1,'a');
insert into parent_table values(2,'a');
insert into child_table values(1,1,'a');
insert into child_table values(2,2,'a');
commit;

VS

Method-3

with a as (insert into parent_table values(1,'a'))
   , a1 as (insert into parent_table values(2,'a'))
   , b as (insert into child_table values(1,1,'a'))
   , b1 as (insert into child_table values(2,2,'a'))
select;
commit;

Method-4

INSERT INTO parent_table VALUES (1, 'a'), (2, 'a');
INSERT INTO child_table VALUES (1, 1, 'a'), (2, 2, 'a');
commit;

Regards Lok
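On the "does any better method exist" part of the question, one common answer for pure bulk loads is COPY driven from the client. A minimal sketch (not from the thread), assuming psycopg2 and the parent_table definition above, streaming the rows from an in-memory buffer rather than a file; this is essentially the "in-memory COPY" idea that comes up later in the thread:

#!/usr/bin/python3
# Sketch only: bulk-load parent_table with COPY through psycopg2.
import io
import psycopg2

db = psycopg2.connect()
csr = db.cursor()

rows = [(i, 'a') for i in range(1, 10_001)]

# build a tab-separated buffer in memory
buf = io.StringIO()
for rid, name in rows:
    buf.write(f"{rid}\t{name}\n")
buf.seek(0)

# one COPY statement loads everything in a single stream and a single transaction
csr.copy_expert("COPY parent_table (id, name) FROM STDIN WITH (FORMAT text)", buf)
db.commit()

For large, homogeneous loads COPY generally beats even multi-row INSERTs, because the rows are streamed without per-statement overhead; the multi-row INSERT methods remain useful when the data arrives incrementally or needs per-batch error handling.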
Re: How batch processing works
On Thu, Sep 19, 2024 at 11:31 AM Ron Johnson wrote:

> [snip]
>
>> Method-4
>>
>> INSERT INTO parent_table VALUES (1, 'a'), (2, 'a');
>> INSERT INTO child_table VALUES (1, 1, 'a'), (2, 2, 'a');
>> commit;
>
> If I knew that I had to load a structured input data file (even if it had parent and child records), this is how I'd do it (but probably first try and see if "in-memory COPY INTO" is such a thing).

I was trying to reproduce this behaviour using row-by-row commit vs just batch commit vs true batch insert as you mentioned, but I am not able to see any difference between the "batch commit" and "true batch insert" response times. Am I missing anything?

CREATE TABLE debug_log (
    method1 TEXT,
    start_time TIMESTAMP,
    end_time TIMESTAMP,
    elapsed_time INTERVAL
);

CREATE TABLE parent_table (
    id SERIAL PRIMARY KEY,
    name TEXT
);

CREATE TABLE parent_table2 (
    id SERIAL PRIMARY KEY,
    name TEXT
);

CREATE TABLE parent_table3 (
    id SERIAL PRIMARY KEY,
    name TEXT
);

DO $$
DECLARE
    num_inserts INTEGER := 10;
    batch_size INTEGER := 50;
    start_time TIMESTAMP;
    end_time TIMESTAMP;
    elapsed_time INTERVAL;
    i INTEGER;
BEGIN
    -- Method 1: Individual Inserts with Commit after every Row
    start_time := clock_timestamp();

    FOR i IN 1..num_inserts LOOP
        INSERT INTO parent_table VALUES (i, 'a');
        COMMIT;
    END LOOP;

    end_time := clock_timestamp();
    elapsed_time := end_time - start_time;
    INSERT INTO debug_log (method1, start_time, end_time, elapsed_time)
    VALUES ('Method 1: Individual Inserts with Commit after every Row',
            start_time, end_time, elapsed_time);

    -- Method 2: Individual Inserts with Commit after 50 Rows
    start_time := clock_timestamp();

    FOR i IN 1..num_inserts LOOP
        INSERT INTO parent_table2 VALUES (i, 'a');
        -- Commit after every 50 rows
        IF i % batch_size = 0 THEN
            COMMIT;
        END IF;
    END LOOP;

    -- Final commit if not already committed
    COMMIT;

    end_time := clock_timestamp();
    elapsed_time := end_time - start_time;
    INSERT INTO debug_log (method1, start_time, end_time, elapsed_time)
    VALUES ('Method 2: Individual Inserts with Commit after 50 Rows',
            start_time, end_time, elapsed_time);

    -- Method 3: Batch Inserts (50 rows per statement) with Commit after each batch
    start_time := clock_timestamp();

    FOR i IN 1..(num_inserts / batch_size) LOOP
        INSERT INTO parent_table3 VALUES
            (1 + (i - 1) * batch_size, 'a'),  (2 + (i - 1) * batch_size, 'a'),
            (3 + (i - 1) * batch_size, 'a'),  (4 + (i - 1) * batch_size, 'a'),
            (5 + (i - 1) * batch_size, 'a'),  (6 + (i - 1) * batch_size, 'a'),
            (7 + (i - 1) * batch_size, 'a'),  (8 + (i - 1) * batch_size, 'a'),
            (9 + (i - 1) * batch_size, 'a'),  (10 + (i - 1) * batch_size, 'a'),
            (11 + (i - 1) * batch_size, 'a'), (12 + (i - 1) * batch_size, 'a'),
            (13 + (i - 1) * batch_size, 'a'), (14 + (i - 1) * batch_size, 'a'),
            (15 + (i - 1) * batch_size, 'a'), (16 + (i - 1) * batch_size, 'a'),
            (17 + (i - 1) * batch_size, 'a'), (18 + (i - 1) * batch_size, 'a'),
            (19 + (i - 1) * batch_size, 'a'), (20 + (i - 1) * batch_size, 'a'),
            (21 + (i - 1) * batch_size, 'a'), (22 + (i - 1) * batch_size, 'a'),
            (23 + (i - 1) * batch_size, 'a'), (24 + (i - 1) * batch_size, 'a'),
            (25 + (i - 1) * batch_size, 'a'), (26 + (i - 1) * batch_size, 'a'),
            (27 + (i - 1) * batch_size, 'a'), (28 + (i - 1) * batch_size, 'a'),
            (29 + (i - 1) * batch_size, 'a'), (30 + (i - 1) * batch_size, 'a'),
            (31 + (i - 1) * batch_size, 'a'), (32 + (i - 1) * batch_size, 'a'),
            (33 + (i - 1) * batch_size, 'a'), (34 + (i - 1) * batch_size, 'a'),
            (35 + (i - 1) * batch_size, 'a'), (36 + (i - 1) * batch_size, 'a'),
            (37 + (i - 1) * batch_size, 'a'), (38 + (i - 1) * batch_size, 'a'),
            (39 + (i - 1) * batch_size, 'a'), (40 + (i - 1) * batch_size, 'a'),
            (41 + (i - 1) * batch_size, 'a'), (42 + (i - 1) * batch_size, 'a'),
            (43 + (i - 1) * batch_size, 'a'), (44 + (i - 1) * batch_size, 'a'),
            (45 + (i - 1) * batch_size, 'a'), (46 + (i - 1) * batch_size, 'a'),
            (47 + (i - 1) * batch_size, 'a'), (48 + (i - 1) * batch_size, 'a'),
            (49 + (i - 1) * batch_size, 'a'), (50 + (i - 1) * batch_size, 'a');
        COMMIT;
    END LOOP;

    COMMIT; -- Final commit for all
    end_time := clock_timestamp();
    elapsed_time := end_time - start_time;
    INSERT INTO debug_log (method1, start_time, end_time, elapsed_time)
    VALUES ('Method 3: Ba