Running concurrent txns and measuring the timings in Postgres
Hi,

Is there a standard procedure to execute two or more concurrent txns at the same time? I understand that if we want to run concurrent txns, we need to execute them from different psql sessions, but how do we make sure that they begin execution at almost the same time?

I'm also interested in measuring the time taken across all executing txns, i.e. the time from the start of the earliest txn till the end of the last txn.

Best,
-SB
Re: Running concurrent txns and measuring the timings in Postgres
On Wed, Jul 24, 2019 at 1:58 PM Adrian Klaver wrote:
> Well, different sessions, be they psql or some other client. The
> difficulty would be determining what is submitting the transactions.
>
> It would help to know what problem you are trying to solve?

Multiple txns are inserting tuples into a table concurrently. I wanted to measure the total time taken to complete the insertion process. Some txns overlap with others on the tuples they insert; duplicate tuples are not inserted.

-SB
Re: Running concurrent txns and measuring the timings in Postgres
On Wed, Jul 24, 2019 at 4:12 PM Adrian Klaver wrote:
> The duplicate elimination is being handled by ON CONFLICT or some custom
> process in the code generating the transactions?

Yes, we used ON CONFLICT for that. Thanks btw.

> If the transactions are being created from a single app/script could you
> not just use 'timing' to mark the beginning of the transactions and the
> end and record that somewhere (db table and/or file)?

So did you mean to say that I need to get the timestamps of the beginning/end of the txn, since \timing only produces elapsed time? Surely that would solve the problem, but I'm not sure how to get that done in Postgres. I wanted to check whether there are simpler ways to get this done in Postgres before trying out something similar to Rob's suggestion or yours.
Re: Running concurrent txns and measuring the timings in Postgres
On Wed, Jul 24, 2019 at 4:44 PM Adrian Klaver wrote:
> Well it depends on the part you have not filled in: what client(s) you
> are using and how the transactions are being generated?

Using a psql client, and the txns are generated manually at this point. Each txn is stored separately in a .sql file and fired from a different psql session, if that helps.
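[Editorial note: with that setup, one way to capture the span from the earliest start to the latest end is to have each transaction log wall-clock timestamps into a small bookkeeping table. This is only a sketch; the table and label names (txn_timing, 'txn1') are illustrative, not from the thread. clock_timestamp() is used deliberately, since now() would return the transaction start time for both marks.

-- Illustrative bookkeeping table (name is an assumption).
create table txn_timing (
    txn_label   text,
    started_at  timestamptz,
    finished_at timestamptz
);

-- Inside each insert_txnN.sql, wrap the real work like this:
begin;
insert into txn_timing (txn_label, started_at)
    values ('txn1', clock_timestamp());

-- ... the actual inserts for this txn go here ...

update txn_timing
    set finished_at = clock_timestamp()
    where txn_label = 'txn1';
commit;

-- After all sessions finish, the total elapsed time across all txns:
select max(finished_at) - min(started_at) as total_elapsed
from txn_timing;

The timing rows only become visible once each transaction commits, which is fine for post-hoc analysis.]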
Re: Running concurrent txns and measuring the timings in Postgres
On Wed, Jul 24, 2019 at 5:11 PM Adrian Klaver wrote:
> A quick demo:
>
> psql -d production -U postgres -c "\timing" -c "select line_id, category
> from avail_headers order by line_id;"
>
> Timing is on.
> Time: 0.710 ms

I got this thing running and hopefully it works as expected. The txns are stored in insert_txn1.sql, insert_txn2.sql, ... Please let me know if you find any issues with this. Script is attached.

concur_txn_timing.sql
Description: application/sql
Bulk Inserts
Hi,

I'm trying to measure the performance of the following: multiple txns inserting tuples into a table concurrently vs. a single txn doing the whole insertion.

*new table created as:*
create table tab2 (
  id serial,
  attr1 integer not null,
  attr2 integer not null,
  primary key(id)
);

*EXP 1: inserts with multiple txns:*
insert into tab2 (attr1, attr2) (select attr1, attr2 from tab1 where attr2 = 10);
insert into tab2 (attr1, attr2) (select attr1, attr2 from tab1 where attr2 = 20);

note: attr2 has only two values, 10 and 20

*EXP 2: inserts with a single txn:*
insert into tab2 (attr1, attr2) (select attr1, attr2 from tab1);

I also performed another experiment:

*EXP 3:*
select attr1, attr2 into tab2 from tab1;

The observation here is that EXP 3 is much faster than EXP 2, probably due to the bulk inserts used by Postgres. However, I could not find a way to insert id values into tab2 using EXP 3. Also, select .. into .. from .. throws an error if we create the table first and then try to populate the tuples using that command.

I have the following questions:
1. Is it possible to have an id column in tab2 and perform a bulk insert using select .. into .. from .. or some other means?
2. If a table is already created, is it possible to do bulk inserts like EXP 3 via multiple txns inserting into the same table?

Best,
-SB
Re: Bulk Inserts
Hi Adrian,

Thanks for the response.

On Fri, Aug 9, 2019 at 7:26 PM Adrian Klaver wrote:
> > Also select .. into .. from .. throws an error if we create a table
> > first and then populate the tuples using the command.
>
> Yes, as SELECT INTO is functionally the same as CREATE TABLE AS:
>
> https://www.postgresql.org/docs/11/sql-selectinto.html
>
> > 1. Is it possible to have an id column in tab2 and perform a bulk insert
> > using select .. into .. from .. or using some other means?
>
> Not using SELECT INTO, for the reasons given above.
> Though it is possible to SELECT INTO as you show in EXP 3 and then:
> alter table tab2 add column id serial primary key;
> EXP 2 shows the other means.
>
> > 2. If a table is already created, is it possible to do bulk inserts via
> > multiple txns inserting into the same table (EXP 3)?
>
> Yes, but you will need some code via client or function that batches the
> inserts for you.

Could you please elaborate a bit on how EXP 1 could be performed such that it uses bulk inserts?

Best,
-SB
Re: Bulk Inserts
On Sat, Aug 10, 2019 at 11:01 PM Rob Sargent wrote:
> Top-posting (i.e. putting your reply at the top) is discouraged here.
>
> Does this appeal to you:
>
> COPY (SELECT * FROM relation) TO ...
> (https://www.postgresql.org/docs/10/sql-copy.html)

Not sure if COPY can be used to transfer data between tables.
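[Editorial note: COPY itself only moves data between a table (or query) and a file, program, or STDIN/STDOUT, so it cannot write into another table directly; it can, however, round-trip through a file. A hedged sketch using the thread's table names; the file path is illustrative, and server-side COPY to a file requires appropriate privileges (superuser by default, or the pg_write_server_files role in PG 11).

-- Export the query result to a file, then load it into the existing table.
-- The serial id column of tab2 is filled by its default.
copy (select attr1, attr2 from tab1) to '/tmp/tab1_data.csv' with (format csv);
copy tab2 (attr1, attr2) from '/tmp/tab1_data.csv' with (format csv);
]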
Re: Bulk Inserts
On Sun, Aug 11, 2019 at 11:11 AM Adrian Klaver wrote:
> I guess it comes down to what you define as bulk inserts. From your OP:
>
> EXP 1: inserts with multiple txn:
> insert into tab2 (attr1, attr2) (select attr1, attr2 from tab1 where
> attr2 = 10);
> insert into tab2 (attr1, attr2) (select attr1, attr2 from tab1 where
> attr2 = 20);
>
> If the selects are returning more than one row then you are already
> doing bulk inserts. If they are returning single rows or you want to
> batch them then you need some sort of code to do that. Something
> like (pseudo Python-like code):
>
> attr2_vals = [(10, 20, 30, 40), (50, 60, 70, 80)]
>
> for val_batch in attr2_vals:
>     BEGIN
>     for id in val_batch:
>         insert into tab2 (attr1, attr2)
>             (select attr1, attr2 from tab1 where attr2 = id)
>     COMMIT

tab1 has ~6M rows, and there are only two values for the attribute attr2 in tab1, evenly distributed. So, yes, I guess I'm already doing batching here.

I also ran the following two statements to see if their performances are comparable. STMT 1 always runs faster on my machine, but their performances differ by a couple of seconds at most.

STMT 1: select attr1, attr2 into tab2 from tab1;
STMT 2: insert into tab2 (select attr1, attr2 from tab1);

However, adding the serial id column as an ALTER TABLE statement actually takes more time than inserting the tuples, so the combined total time is more than double the time taken to insert the tuples into tab2 without the serial id column.

Best,
-SB
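[Editorial note: as a runnable analogue of Adrian's pseudocode, a PostgreSQL 11 procedure can drive per-batch commits server-side. This is only a sketch under the thread's table names; the procedure name and the choice of one attr2 value per batch are assumptions, not from the thread.

-- Hypothetical helper: each attr2 value becomes one committed batch.
create procedure batch_insert(vals integer[])
language plpgsql
as $$
declare
    v integer;
begin
    foreach v in array vals loop
        insert into tab2 (attr1, attr2)
            select attr1, attr2 from tab1 where attr2 = v;
        commit;  -- PG 11 procedures may commit; each batch is its own txn
    end loop;
end $$;

call batch_insert(array[10, 20]);

Note that CALL must be issued outside an explicit transaction block for the in-procedure COMMIT to be allowed.]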
Re: Bulk Inserts
On Tue, Aug 13, 2019 at 11:08 AM Adrian Klaver wrote:
> All I have left is:
>
> select row_number() over (order by attr2, attr1) as id, attr1, attr2
> into tab2 from tab1;
>
> That will not create a serial type in the id column though. You can
> attach a sequence to that column. Something like:
>
> 1) create sequence tab2_id owned by tab2.id;
>
> 2) alter table tab2 alter column id set default nextval('tab2_id');

Thanks. This is a bit indirect but works fine. Performance-wise this turns out to be the best approach when inserting rows from one table into a new table with a serial id column.

Best,
-SB
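[Editorial note: one caveat with the sequence-attach recipe above: the new sequence starts at 1 while the row_number() values already occupy 1..N, so later default inserts would collide with existing ids unless the sequence is advanced. A sketch of the full recipe, using the names from the thread:

select row_number() over (order by attr2, attr1) as id, attr1, attr2
into tab2
from tab1;

create sequence tab2_id owned by tab2.id;
alter table tab2 alter column id set default nextval('tab2_id');

-- Advance the sequence past the ids generated by row_number(); without
-- this, the next default insert would try to reuse id = 1.
select setval('tab2_id', (select max(id) from tab2));
]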
Method to pass data between queries in a multi-statement transaction
Hello,

I'm trying to pass some values between queries in a multi-statement transaction. For example, consider the following representative multi-statement transaction:

begin;
select * from table1 t1, table2 t2 where t1.cid = t2.cid;
delete from table1 where cid in (values-to-be-populated-from-the-previous-query);
commit;

Now, assume that I'm able to get the cid values from table1 that satisfy the equi-join condition, and I want to pass those values into the IN condition of the subsequent delete query. Is there a straightforward way to achieve this by modifying the postgresql source code?

I tried doing this by creating a hash table (by following this example: https://wiki.postgresql.org/wiki/HashTable). The hash key in this case is the current transaction id (which I presume should remain unchanged for the entire duration of the transaction) and the hash value is dynamically allocated. Within the query I can perform a lookup and insert without any problem. However, when I try to do a lookup of the hash value from a different query that did not insert the value originally, I do not get any value. The hash table is able to tell me that the key exists (because the key is not dynamically allocated) but doesn't return any value. My guess is that after each query in the multi-statement txn block, the storage for that query is deallocated, which results in this behaviour.

The hash table approach (although it didn't work) has the drawback, IMO, that it needs to be locked, since other txns can try to access the hash table as well. The other approach here would be to serialize the values into a file and then read those values back from the subsequent query. However, this is not efficient.

Thus, I'm looking for a method of passing values between queries in a multi-statement block that avoids the disk and does not need locking. I was wondering if there is a way to define a hash table (or any data structure) which is visible only to the current transaction.

-SB
Re: Method to pass data between queries in a multi-statement transaction
On Wed, Apr 17, 2019 at 6:16 PM David G. Johnston wrote:
> There are few if any situations where you need to immediately and
> completely pass all values from one query to another in the same
> transaction where the queries cannot just be combined into a single
> statement. Your representative example is one that is easily combined
> into a single statement.

What if I need the result of the join to be stored into table3, as well as the tuples that participated in the query to be deleted from table1? The following can be done without the need to transfer values from the previous query into the next:

begin;
insert into table3 (id, attr1, attr2) (select t1.cid, t1.empname, t2.dept from table1 t1, table2 t2 where t1.cid = t2.cid);
delete from table1 where cid in (select c.cid from table1 t1, table2 t2 where t1.cid = t2.cid);
commit;

However, note that we have to perform the join twice, which is not efficient. Now, to make things worse, increase the number of tables to join while imposing the requirement that the tuple deletion apply to all, or to a subset, of the tables that participate in the join.

> Now, the stuff you are trying seems to indicate you are trying to do
> something in C, inside the engine itself, with all of this. If that is
> the case you may want to be more clear as to what you are attempting to
> do. But as far as server SQL goes the only persistence area are
> tables/relations - including temporary ones.

I'm trying to modify the engine here.

-SB
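[Editorial note: for the two-table version above, one way to avoid the second join is a data-modifying CTE that evaluates the join once and feeds both statements; in PostgreSQL 11 a CTE result is materialized, so the join runs a single time. This sketch is not from the thread, and reuses its table and column names:

begin;
with joined as (
    select t1.cid, t1.empname, t2.dept
    from table1 t1
    join table2 t2 on t1.cid = t2.cid
), ins as (
    -- data-modifying CTEs always execute, even if not referenced later
    insert into table3 (id, attr1, attr2)
    select cid, empname, dept from joined
)
delete from table1
where cid in (select cid from joined);
commit;
]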
Re: Method to pass data between queries in a multi-statement transaction
One can argue here that we can use the results from table3 to perform the delete operation instead of joining again, but table3 may be subject to updates, which makes the situation tricky.

-SB
Re: Method to pass data between queries in a multi-statement transaction
On Wed, Apr 17, 2019 at 10:47 PM Michel Pelletier wrote:
> You can use INSERT...RETURNING in a WITH query:
>
> postgres=# create table foo (a integer);
> CREATE TABLE
> postgres=# create table bar (b integer);
> CREATE TABLE
> postgres=# insert into bar values (42);
> INSERT 0 1
> postgres=# with i as (insert into foo values (42) returning a) delete from
> bar where b = (select a from i);
> DELETE 1

Thanks Michel. However, this only works if a is a unique attribute in the table, which would help us identify the tuples that participated in the join. Consider the following join:

insert into table3 (id, level, empname, salary)
(select t0.cid, t0.level, t1.empname, t2.salary from table0 t0, table1 t1, table2 t2 where t0.cid = t1.cid and t1.pid = t2.pid);

Now, if I want to delete those tuples from table2 that satisfied the join condition, I need to execute the join again with additional attributes. Also note that, depending on the query plan, i.e. whether table0 and table1 were joined first and then joined with table2, we may have to execute one additional join to get the tuples in table2 that satisfied the join condition (t1.pid = t2.pid). Getting that information while the query is executed may not be difficult.

There are other use cases in my application that require me to transfer data from one query to the next within a transaction. Thus, what I'm looking for here is a way to store the information and then pass it to the next query efficiently. For example, is it possible to define a struct of my choice, private to the current transaction, that would store the data and then pass it around to the next query in the transaction without having to materialize that struct (or deal with concurrency issues, as in the hash table approach mentioned earlier)?

-SB
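[Editorial note: the RETURNING/CTE trick can still be made to work for the three-table example if the join is lifted into its own CTE that carries t2.pid along; the join then runs once, and pid never needs to be stored in table3. A sketch under the schema assumed from the example above:

begin;
with j as (
    -- evaluated once; carries t2.pid even though table3 does not store it
    select t0.cid, t0.level, t1.empname, t2.salary, t2.pid
    from table0 t0
    join table1 t1 on t0.cid = t1.cid
    join table2 t2 on t1.pid = t2.pid
), ins as (
    insert into table3 (id, level, empname, salary)
    select cid, level, empname, salary from j
)
delete from table2
where pid in (select pid from j);
commit;
]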
Re: Method to pass data between queries in a multi-statement transaction
On Thu, Apr 18, 2019 at 10:50 AM Adrian Klaver wrote:
> Well the DELETE is not going to work as c.cid will error as undefined.

Yes, that's a typo. I hadn't tested it before typing; I just wanted to convey the general idea.

> You might want to take a look at CTE's:
>
> https://www.postgresql.org/docs/11/queries-with.html

-SB
Re: Method to pass data between queries in a multi-statement transaction
On Thu, Apr 18, 2019 at 5:34 PM David G. Johnston wrote:
> Or just add pid to table3...

That's an application requirement, so pid cannot be added at will to table3.

> There may be some that benefit to some degree but its likely that you
> can write the application and queries in such a way to avoid a hard
> requirement.
>
> How much development and maintenance effort are you willing to spend
> here to gain what is likely to amount to only a bit of efficiency? Many
> things are possible if you are going to modify the server code but why
> add grief?

How much development/maintenance effort do you anticipate for implementing this feature? This is something that my application would need, so I'm willing to dedicate some time to it.

By the way, I figured that the hash table deallocation issue could be resolved (although inefficiently) by serializing the data to a string and then copying that value into the hash table during insertion. However, the hash table is still visible to all transactions, I suppose, and as a result needs to be locked. Just wanted to let you know that I have initialized the hash table within the PostgresMain() method, and the hash table is declared as an extern variable, which I anticipate will be accessed by many methods.

How difficult is it to make a hash table (or any data structure) private to the current transaction so that I do not have to resort to locking?

-SB
Is it possible to store the output of EXPLAIN into a table
Hi,

I was just wondering if it is possible to store the output of EXPLAIN into a table.

create temp table mytab as
  (select * from (explain select * from table1 where attr = 5) t);

Unfortunately, the query above does not work.

Thanks,
-Souvik
Quick hack for fetching the contents of a temp table
Hi,

Is there a quick way to fetch the contents of a temp table from within the postgres server code? The temp table has a single column and there is a single value in that column; thus, I'm using the temp table as a variable to store a string.

begin;
create temp table mytemp(metastr char(1024)) on commit drop;

The metastr attribute in the temp table is populated by a function as shown below.

insert into mytemp(metastr) (select somefunction1());

I need to use the return value of the function somefunction1() to modify a query and execute it. While it is easy to fetch the desired result using plpgsql functions, modifying and rewriting the query in plpgsql does not seem straightforward. Therefore, I have already written the query modification function inside the postgres server, since I need to use the results from the pg_parse_query() method.

I stepped through the "select metastr from mytemp;" query only to find that the process of fetching a tuple and the corresponding attribute is a bit convoluted, and I was wondering if something along the lines of this code is possible: https://www.postgresql.org/docs/11/xfunc-c.html#id-1.8.3.13.10

From this code, it appears that the tuple has already been provided in t through the statement:

HeapTupleHeader t = PG_GETARG_HEAPTUPLEHEADER(0);

and before that through the query:

SELECT name, c_overpaid(emp, 1500) AS overpaid FROM emp WHERE name = 'Bill' OR name = 'Sam';

What I'm looking for here is a method to fetch the tuple from mytemp within the postgres server code and then extract the value of the metastr attribute into a char[]. Since all the statements above are within a transaction block, I think we do not need read locks while accessing the temp table.

Thanks,
-SB
Re: Is it possible to store the output of EXPLAIN into a table
On Mon, Apr 22, 2019 at 12:02 PM Tom Lane wrote:
> Souvik Bhattacherjee writes:
> > I was just wondering if it is possible to store the output of EXPLAIN
> > into a table.
>
> EXPLAIN won't do that directly, but you could make a plpgsql function
> along the lines of
>
> for t in execute explain ...
>    return next t;
>
> (too lazy to check the exact details here, but I believe you can find
> related examples in our regression tests) and then store the function's
> result into a table.
>
> regards, tom lane

Thanks, works fine!

-Souvik
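[Editorial note: filling in the details Tom sketched, a minimal plpgsql function of this shape could look like the following; the function name explain_to_rows is illustrative, not from the thread.

create or replace function explain_to_rows(query text)
returns setof text
language plpgsql
as $$
declare
    line text;
begin
    -- each row of EXPLAIN output is a single text line
    for line in execute 'explain ' || query loop
        return next line;
    end loop;
end $$;

-- Store the plan in a table, per the original question:
create temp table mytab as
    select * from explain_to_rows('select * from table1 where attr = 5') as plan_line;
]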
Re: Quick hack for fetching the contents of a temp table
> This question seems related to the other thread you started on April
> 18th?

This is related, but perhaps a little more complicated than the previous one, since this involves modifying the query that would be executed.

> Several of us suggested you use a CTE to materialize your intermediate
> values. Did you try that?

I'm working with a temp table here, used for storing the intermediate results, which is one of the alternative solutions suggested the other day. I'm not sure how different this would be from using a CTE.

Without going into the motivation behind the application, what I'm interested in here is modifying a query based on a prior query. A simple example follows:

Q1: select * from table1 t1, table2 t2 where t1.cid = t2.cid and attr1 = 5 and attr2 = 10;

Step 1: extract the filter conditions on table1 and table2 from Q1. So we perform the following:

explain select * from table1 t1, table2 t2 where t1.cid = t2.cid and attr1 = 5 and attr2 = 10;

and extract the filter conditions. Let's say (attr1 = 5) belongs to table1 and (attr2 = 10) belongs to table2. [We store the extracted filter conditions in a temp table.]

Q2: insert into table3(cid, attr1, attr3) (select * from table1 where attr3 = 7);

Step 2: our intent here is to take the extracted filter condition on table1 and substitute it into Q2. Thus, after substitution, Q2 would look like:

insert into table3(cid, attr1, attr3) (select * from table1 where attr1 = 5 and attr3 = 7);

We want to execute all of it within a transaction, so we have the following multi-statement transaction:

begin;
create temp table mytemp(metastr char(1024)) on commit drop;
insert into mytemp(metastr) (select somefunction1('explain select * from table1 t1, table2 t2 where t1.cid = t2.cid and attr1 = 5 and attr2 = 10'));
rewriteq table1 insert into table3(cid, attr1, attr3) (select * from table1 where attr3 = 7);
commit;

Note this line:

*rewriteq table1* insert into table3(cid, attr1, attr3) (select * from table1 where attr3 = 7);

We have a flag *rewriteq* that indicates this query needs to be rewritten, and an argument *table1* that indicates the filter condition on table1 needs to be updated. We also need the contents of mytemp, from which we fetch the extracted filter conditions, if any. While this is not exactly a rewrite, more like an update to the query, we still use the term rewrite here. Thus the actual query that will be processed by postgres is:

insert into table3(cid, attr1, attr3) (select * from table1 where attr1 = 5 and attr3 = 7);

Hope this makes the question clearer.

-SB

On Tue, Apr 23, 2019 at 8:38 PM Michel Pelletier wrote:
> The example you provided was not complete, and I asked for a
> reproducible example of what you were trying to achieve and you didn't
> provide one. I'm all for encouraging people to hack in the core, but it
> seems like you have the impression that your goal is not possible in SQL
> when it very likely is. Changing the core is not only going to be a high
> maintenance burden on you, but it also implies you have a problem that
> no one else in 30 years of postgres development has had. What's more
> likely is that it is possible to do what you want in SQL and we'd like
> to help you, but we need more information and you need to try the
> suggestions and answer the questions we ask back of you.
>
> I'm happy to be wrong about this, because that means postgres can
> improve in some way, but first we need more information about what your
> problem actually is.
>
> -Michel
Allocating shared memory in Postgres
Hi,

I need to allocate shared memory in Postgres 11.0 outside the initialization phase. In order to achieve this I have done the following:

- Increased the amount of shared memory by increasing the value of size in CreateSharedMemoryAndSemaphores(int port) in ipci.c. I have made sure that the amount of memory I need, say m, satisfies m << M, where M is the amount of additional shared memory that I have allocated.

- During a particular query where I need to allocate shared memory (whose size is a function of the sizes of the tables in the query), I invoke ShmemInitStruct() in shmem.c.

This seems to work in my case, although I haven't tested it extensively. My concern is that when I go through the NOTES in shmem.c, I find the following line:

    Fixed-size structures contain things like global variables for a
    module and should never be allocated after the shared memory
    initialization phase.

I'm allocating a shared array data structure through ShmemInitStruct() and I'm not sure if the lines above apply to my case, since I'm doing the allocation during a query.

Any help/clarifications in this regard would be appreciated.

Best,
-SB
Re: Allocating shared memory in Postgres
On Wed, Jul 3, 2019 at 4:17 PM Tom Lane wrote:
> This seems like a pretty horrid idea. For starters, what happens if two
> backends do this concurrently?

I'm assuming here that a single backend process handles all queries from a given client. In the case of parallel queries, the master process will be responsible for allocating the shared memory, not the workers. Please let me know if this is not what you implied by two backends.

> Even with only one backend, if you do a query that requires X space, and
> then you do another query that requires X+1 space, what's going to
> happen?

In my application, every query that allocates shared memory is wrapped in a separate txn, and the lifetime of the shared memory is the duration of the query. The only purpose of allocating shared memory here is to make the memory segment visible to the worker processes for that particular query. No other txn/query actually accesses this shared memory. Also, when I allocate shared memory, the txn id is used as a key into the ShmemIndex to differentiate between two concurrent shared memory allocation requests.

> Recent PG releases have a "DSM" mechanism for short-lived (query
> lifespan, typically) shared memory that's separate from the core shmem
> pool. That might suit your needs better. The system design is really
> not friendly to demanding more core shmem after postmaster start.

Yes, I understand that "DSM" mechanisms exist. But I wanted to know whether the approach I outlined will work even if it has certain drawbacks, such as 1. overestimating the initial shared memory size that needs to be allocated, and 2. not being able to free the shared memory after use.

Best,
-SB