Running concurrent txns and measuring the timings in Postgres
Hi,

Is there a standard procedure to execute two or more concurrent txns at the same time? I understand that if we want to run concurrent txns, we need to execute them from different psql sessions, but how do we make sure that they begin execution at almost the same time?

I'm also interested in measuring the time taken across all executing txns, i.e. the time from the start of the earliest txn till the end of the last txn.

Best,
-SB
Re: Running concurrent txns and measuring the timings in Postgres
On Wed, Jul 24, 2019 at 1:58 PM Adrian Klaver wrote:
> Well, different sessions, be they psql or some other client. The
> difficulty would be determining what is submitting the transactions.
>
> It would help to know what problem you are trying to solve?

Multiple txns are inserting tuples into a table concurrently. I wanted to measure the total time taken to complete the insertion process. Some txns overlap with others on the tuples they insert; duplicate tuples are not inserted.

-SB
Re: Running concurrent txns and measuring the timings in Postgres
On Wed, Jul 24, 2019 at 4:12 PM Adrian Klaver wrote:
> The duplicate elimination is being handled by ON CONFLICT or some custom
> process in the code generating the transactions?

Yes, we used ON CONFLICT for that. Thanks btw.

> If the transactions are being created from a single app/script could you
> not just use 'timing' to mark the beginning of the transactions and the
> end and record that somewhere (db table and/or file)?

So did you mean to say that I need to get the timestamps of the beginning/end of the txn, since \timing only produces elapsed time? Surely that would solve the problem, but I'm not sure how to get that done in Postgres. I wanted to check whether there are simpler ways to get this done in Postgres before trying out something similar to Rob's suggestion or yours.
Re: Running concurrent txns and measuring the timings in Postgres
On Wed, Jul 24, 2019 at 4:44 PM Adrian Klaver wrote:
> Well it depends on the part you have not filled in: what client(s) you
> are using and how the transactions are being generated?

Using a psql client, and the txns are generated manually at this point. Each txn is stored separately in a .sql file and fired from a different psql session, if that helps.
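[Editorial note: with that setup, one way to capture the span from the earliest start to the latest end is to have each transaction log wall-clock timestamps into a small bookkeeping table. This is only a sketch; the table and label names (txn_timing, 'txn1') are illustrative, not from the thread. clock_timestamp() is used deliberately, since now() would return the transaction start time for both marks.

-- Illustrative bookkeeping table (name is an assumption).
create table txn_timing (
    txn_label   text,
    started_at  timestamptz,
    finished_at timestamptz
);

-- Inside each insert_txnN.sql, wrap the real work like this:
begin;
insert into txn_timing (txn_label, started_at)
    values ('txn1', clock_timestamp());

-- ... the actual inserts for this txn go here ...

update txn_timing
    set finished_at = clock_timestamp()
    where txn_label = 'txn1';
commit;

-- After all sessions finish, the total elapsed time across all txns:
select max(finished_at) - min(started_at) as total_elapsed
from txn_timing;

The timing rows only become visible once each transaction commits, which is fine for post-hoc analysis.]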
Re: Running concurrent txns and measuring the timings in Postgres
On Wed, Jul 24, 2019 at 5:11 PM Adrian Klaver wrote:
> A quick demo:
>
> psql -d production -U postgres -c "\timing" -c "select line_id, category
> from avail_headers order by line_id;"
>
> Timing is on.
> Time: 0.710 ms

I got this thing running and hopefully it works as expected. The txns are stored in insert_txn1.sql, insert_txn2.sql, ... Please let me know if you find any issues with this. Script is attached.

concur_txn_timing.sql
Description: application/sql
Bulk Inserts
Hi,

I'm trying to measure the performance of the following: multiple txns inserting tuples into a table concurrently vs. a single txn doing the whole insertion.

*new table created as:*
create table tab2 (
  id serial,
  attr1 integer not null,
  attr2 integer not null,
  primary key(id)
);

*EXP 1: inserts with multiple txns:*
insert into tab2 (attr1, attr2) (select attr1, attr2 from tab1 where attr2 = 10);
insert into tab2 (attr1, attr2) (select attr1, attr2 from tab1 where attr2 = 20);

note: attr2 has only two values, 10 and 20

*EXP 2: inserts with a single txn:*
insert into tab2 (attr1, attr2) (select attr1, attr2 from tab1);

I also performed another experiment:

*EXP 3:*
select attr1, attr2 into tab2 from tab1;

The observation here is that EXP 3 is much faster than EXP 2, probably due to the bulk inserts used by Postgres. However, I could not find a way to insert id values into tab2 using EXP 3. Also, select .. into .. from .. throws an error if we create the table first and then try to populate the tuples using that command.

I have the following questions:
1. Is it possible to have an id column in tab2 and perform a bulk insert using select .. into .. from .. or some other means?
2. If a table is already created, is it possible to do bulk inserts like EXP 3 via multiple txns inserting into the same table?

Best,
-SB
Re: Bulk Inserts
Hi Adrian,

Thanks for the response.

On Fri, Aug 9, 2019 at 7:26 PM Adrian Klaver wrote:
> > Also select .. into .. from .. throws an error if we create a table
> > first and then populate the tuples using the command.
>
> Yes, as SELECT INTO is functionally the same as CREATE TABLE AS:
>
> https://www.postgresql.org/docs/11/sql-selectinto.html
>
> > 1. Is it possible to have an id column in tab2 and perform a bulk insert
> > using select .. into .. from .. or using some other means?
>
> Not using SELECT INTO, for the reasons given above.
> Though it is possible to SELECT INTO as you show in EXP 3 and then:
> alter table tab2 add column id serial primary key;
> EXP 2 shows the other means.
>
> > 2. If a table is already created, is it possible to do bulk inserts via
> > multiple txns inserting into the same table (EXP 3)?
>
> Yes, but you will need some code via client or function that batches the
> inserts for you.

Could you please elaborate a bit on how EXP 1 could be performed such that it uses bulk inserts?

Best,
-SB
Re: Bulk Inserts
On Sat, Aug 10, 2019 at 11:01 PM Rob Sargent wrote:
> Top-posting (i.e. putting your reply at the top) is discouraged here.
>
> Does this appeal to you:
>
> COPY (SELECT * FROM relation) TO ...
> (https://www.postgresql.org/docs/10/sql-copy.html)

Not sure if COPY can be used to transfer data between tables.
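[Editorial note: COPY itself only moves data between a table (or query) and a file, program, or STDIN/STDOUT, so it cannot write into another table directly; it can, however, round-trip through a file. A hedged sketch using the thread's table names; the file path is illustrative, and server-side COPY to a file requires appropriate privileges (superuser by default, or the pg_write_server_files role in PG 11).

-- Export the query result to a file, then load it into the existing table.
-- The serial id column of tab2 is filled by its default.
copy (select attr1, attr2 from tab1) to '/tmp/tab1_data.csv' with (format csv);
copy tab2 (attr1, attr2) from '/tmp/tab1_data.csv' with (format csv);
]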
Re: Bulk Inserts
On Sun, Aug 11, 2019 at 11:11 AM Adrian Klaver wrote:
> I guess it comes down to what you define as bulk inserts. From your OP:
>
> EXP 1: inserts with multiple txn:
> insert into tab2 (attr1, attr2) (select attr1, attr2 from tab1 where
> attr2 = 10);
> insert into tab2 (attr1, attr2) (select attr1, attr2 from tab1 where
> attr2 = 20);
>
> If the selects are returning more than one row then you are already
> doing bulk inserts. If they are returning single rows or you want to
> batch them then you need some sort of code to do that. Something
> like (pseudo Python-like code):
>
> attr2_vals = [(10, 20, 30, 40), (50, 60, 70, 80)]
>
> for val_batch in attr2_vals:
>     BEGIN
>     for id in val_batch:
>         insert into tab2 (attr1, attr2)
>             (select attr1, attr2 from tab1 where attr2 = id)
>     COMMIT

tab1 has ~6M rows, and there are only two values for the attribute attr2 in tab1, evenly distributed. So, yes, I guess I'm already doing batching here.

I also ran the following two statements to see if their performances are comparable. STMT 1 always runs faster on my machine, but their performances differ by a couple of seconds at most.

STMT 1: select attr1, attr2 into tab2 from tab1;
STMT 2: insert into tab2 (select attr1, attr2 from tab1);

However, adding the serial id column as an ALTER TABLE statement actually takes more time than inserting the tuples, so the combined total time is more than double the time taken to insert the tuples into tab2 without the serial id column.

Best,
-SB
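[Editorial note: as a runnable analogue of Adrian's pseudocode, a PostgreSQL 11 procedure can drive per-batch commits server-side. This is only a sketch under the thread's table names; the procedure name and the choice of one attr2 value per batch are assumptions, not from the thread.

-- Hypothetical helper: each attr2 value becomes one committed batch.
create procedure batch_insert(vals integer[])
language plpgsql
as $$
declare
    v integer;
begin
    foreach v in array vals loop
        insert into tab2 (attr1, attr2)
            select attr1, attr2 from tab1 where attr2 = v;
        commit;  -- PG 11 procedures may commit; each batch is its own txn
    end loop;
end $$;

call batch_insert(array[10, 20]);

Note that CALL must be issued outside an explicit transaction block for the in-procedure COMMIT to be allowed.]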
Re: Bulk Inserts
On Tue, Aug 13, 2019 at 11:08 AM Adrian Klaver wrote:
> All I have left is:
>
> select row_number() over (order by attr2, attr1) as id, attr1, attr2
> into tab2 from tab1;
>
> That will not create a serial type in the id column though. You can
> attach a sequence to that column. Something like:
>
> 1) create sequence tab2_id owned by tab2.id;
>
> 2) alter table tab2 alter column id set default nextval('tab2_id');

Thanks. This is a bit indirect but works fine. Performance-wise this turns out to be the best approach when inserting rows from one table into a new table with a serial id column.

Best,
-SB
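[Editorial note: one caveat with the sequence-attach recipe above: the new sequence starts at 1 while the row_number() values already occupy 1..N, so later default inserts would collide with existing ids unless the sequence is advanced. A sketch of the full recipe, using the names from the thread:

select row_number() over (order by attr2, attr1) as id, attr1, attr2
into tab2
from tab1;

create sequence tab2_id owned by tab2.id;
alter table tab2 alter column id set default nextval('tab2_id');

-- Advance the sequence past the ids generated by row_number(); without
-- this, the next default insert would try to reuse id = 1.
select setval('tab2_id', (select max(id) from tab2));
]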
Method to pass data between queries in a multi-statement transaction
Hello,

I'm trying to pass some values between queries in a multi-statement transaction. For example, consider the following representative multi-statement transaction:

begin;
select * from table1 t1, table2 t2 where t1.cid = t2.cid;
delete from table1 where cid in (values-to-be-populated-from-the-previous-query);
commit;

Now, assume that I'm able to get the cid values from table1 that satisfy the equi-join condition, and I want to pass those values into the IN condition of the subsequent delete query. Is there a straightforward way to achieve this by modifying the postgresql source code?

I tried doing this by creating a hash table (by following this example: https://wiki.postgresql.org/wiki/HashTable). The hash key in this case is the current transaction id (which I presume should remain unchanged for the entire duration of the transaction) and the hash value is dynamically allocated. Within the query I can perform a lookup and insert without any problem. However, when I try to do a lookup of the hash value from a different query that did not insert the value originally, I do not get any value. The hash table is able to tell me that the key exists (because the key is not dynamically allocated) but doesn't return any value. My guess is that after each query in the multi-statement txn block, the storage for that query is deallocated, which results in this behaviour.

The hash table approach (although it didn't work) has the drawback, IMO, that it needs to be locked, since other txns can try to access the hash table as well. The other approach here would be to serialize the values into a file and then read those values back from the subsequent query. However, this is not efficient.

Thus, I'm looking for a method of passing values between queries in a multi-statement block that avoids the disk and does not need locking. I was wondering if there is a way to define a hash table (or any data structure) which is visible only to the current transaction.

-SB
Re: Method to pass data between queries in a multi-statement transaction
On Wed, Apr 17, 2019 at 6:16 PM David G. Johnston wrote:
> There are few if any situations where you need to immediately and
> completely pass all values from one query to another in the same
> transaction where the queries cannot just be combined into a single
> statement. Your representative example is one that is easily combined
> into a single statement.

What if I need the result of the join to be stored into table3, as well as the tuples that participated in the query to be deleted from table1? The following can be done without the need to transfer values from the previous query into the next:

begin;
insert into table3 (id, attr1, attr2) (select t1.cid, t1.empname, t2.dept from table1 t1, table2 t2 where t1.cid = t2.cid);
delete from table1 where cid in (select c.cid from table1 t1, table2 t2 where t1.cid = t2.cid);
commit;

However, note that we have to perform the join twice, which is not efficient. Now, to make things worse, increase the number of tables to join while imposing the requirement that the tuple deletion apply to all, or to a subset, of the tables that participate in the join.

> Now, the stuff you are trying seems to indicate you are trying to do
> something in C, inside the engine itself, with all of this. If that is
> the case you may want to be more clear as to what you are attempting to
> do. But as far as server SQL goes the only persistence area are
> tables/relations - including temporary ones.

I'm trying to modify the engine here.

-SB
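[Editorial note: for the two-table version above, one way to avoid the second join is a data-modifying CTE that evaluates the join once and feeds both statements; in PostgreSQL 11 a CTE result is materialized, so the join runs a single time. This sketch is not from the thread, and reuses its table and column names:

begin;
with joined as (
    select t1.cid, t1.empname, t2.dept
    from table1 t1
    join table2 t2 on t1.cid = t2.cid
), ins as (
    -- data-modifying CTEs always execute, even if not referenced later
    insert into table3 (id, attr1, attr2)
    select cid, empname, dept from joined
)
delete from table1
where cid in (select cid from joined);
commit;
]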
Re: Method to pass data between queries in a multi-statement transaction
One can argue here that we can use the results from table3 to perform the delete operation instead of joining again, but table3 may be subject to updates, which makes the situation tricky.

-SB
Re: Method to pass data between queries in a multi-statement transaction
On Wed, Apr 17, 2019 at 10:47 PM Michel Pelletier wrote:
> You can use INSERT...RETURNING in a WITH query:
>
> postgres=# create table foo (a integer);
> CREATE TABLE
> postgres=# create table bar (b integer);
> CREATE TABLE
> postgres=# insert into bar values (42);
> INSERT 0 1
> postgres=# with i as (insert into foo values (42) returning a) delete from
> bar where b = (select a from i);
> DELETE 1

Thanks Michel. However, this only works if a is a unique attribute in the table, which would help us identify the tuples that participated in the join. Consider the following join:

insert into table3 (id, level, empname, salary)
(select t0.cid, t0.level, t1.empname, t2.salary from table0 t0, table1 t1, table2 t2 where t0.cid = t1.cid and t1.pid = t2.pid);

Now, if I want to delete those tuples from table2 that satisfied the join condition, I need to execute the join again with additional attributes. Also note that, depending on the query plan, i.e. whether table0 and table1 were joined first and then joined with table2, we may have to execute one additional join to get the tuples in table2 that satisfied the join condition (t1.pid = t2.pid). Getting that information while the query is executed may not be difficult.

There are other use cases in my application that require me to transfer data from one query to the next within a transaction. Thus, what I'm looking for here is a way to store the information and then pass it to the next query efficiently. For example, is it possible to define a struct of my choice, private to the current transaction, that would store the data and then pass it around to the next query in the transaction without having to materialize that struct (or deal with concurrency issues, as in the hash table approach mentioned earlier)?

-SB
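[Editorial note: the RETURNING/CTE trick can still be made to work for the three-table example if the join is lifted into its own CTE that carries t2.pid along; the join then runs once, and pid never needs to be stored in table3. A sketch under the schema assumed from the example above:

begin;
with j as (
    -- evaluated once; carries t2.pid even though table3 does not store it
    select t0.cid, t0.level, t1.empname, t2.salary, t2.pid
    from table0 t0
    join table1 t1 on t0.cid = t1.cid
    join table2 t2 on t1.pid = t2.pid
), ins as (
    insert into table3 (id, level, empname, salary)
    select cid, level, empname, salary from j
)
delete from table2
where pid in (select pid from j);
commit;
]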
Re: Method to pass data between queries in a multi-statement transaction
On Thu, Apr 18, 2019 at 10:50 AM Adrian Klaver wrote:
> Well the DELETE is not going to work as c.cid will error as undefined.

Yes, that's a typo. I hadn't tested it before typing; I just wanted to convey the general idea.

> You might want to take a look at CTE's:
>
> https://www.postgresql.org/docs/11/queries-with.html

-SB
Re: Method to pass data between queries in a multi-statement transaction
On Thu, Apr 18, 2019 at 5:34 PM David G. Johnston wrote:
> Or just add pid to table3...

That's an application requirement, so pid cannot be added at will to table3.

> There may be some that benefit to some degree but its likely that you
> can write the application and queries in such a way to avoid a hard
> requirement.
>
> How much development and maintenance effort are you willing to spend
> here to gain what is likely to amount to only a bit of efficiency? Many
> things are possible if you are going to modify the server code but why
> add grief?

How much development/maintenance effort do you anticipate for implementing this feature? This is something that my application would need, so I'm willing to dedicate some time to it.

By the way, I figured that the hash table deallocation issue could be resolved (although inefficiently) by serializing the data to a string and then copying that value into the hash table during insertion. However, the hash table is still visible to all transactions, I suppose, and as a result needs to be locked. Just wanted to let you know that I have initialized the hash table within the PostgresMain() method, and the hash table is declared as an extern variable, which I anticipate will be accessed by many methods.

How difficult is it to make a hash table (or any data structure) private to the current transaction so that I do not have to resort to locking?

-SB
Is it possible to store the output of EXPLAIN into a table
Hi,

I was just wondering if it is possible to store the output of EXPLAIN into a table.

create temp table mytab as
  (select * from (explain select * from table1 where attr = 5) t);

Unfortunately, the query above does not work.

Thanks,
-Souvik
Quick hack for fetching the contents of a temp table
Hi,

Is there a quick way to fetch the contents of a temp table from within the postgres server code? The temp table has a single column and there is a single value in that column; thus, I'm using the temp table as a variable to store a string.

begin;
create temp table mytemp(metastr char(1024)) on commit drop;

The metastr attribute in the temp table is populated by a function as shown below.

insert into mytemp(metastr) (select somefunction1());

I need to use the return value of the function somefunction1() to modify a query and execute it. While it is easy to fetch the desired result using plpgsql functions, modifying and rewriting the query in plpgsql does not seem straightforward. Therefore, I have already written the query modification function inside the postgres server, since I need to use the results from the pg_parse_query() method.

I stepped through the "select metastr from mytemp;" query only to find that the process of fetching a tuple and the corresponding attribute is a bit convoluted, and I was wondering if something along the lines of this code is possible: https://www.postgresql.org/docs/11/xfunc-c.html#id-1.8.3.13.10

From this code, it appears that the tuple has already been provided in t through the statement:

HeapTupleHeader t = PG_GETARG_HEAPTUPLEHEADER(0);

and before that through the query:

SELECT name, c_overpaid(emp, 1500) AS overpaid FROM emp WHERE name = 'Bill' OR name = 'Sam';

What I'm looking for here is a method to fetch the tuple from mytemp within the postgres server code and then extract the value of the metastr attribute into a char[]. Since all the statements above are within a transaction block, I think we do not need read locks while accessing the temp table.

Thanks,
-SB
Re: Is it possible to store the output of EXPLAIN into a table
On Mon, Apr 22, 2019 at 12:02 PM Tom Lane wrote:
> Souvik Bhattacherjee writes:
> > I was just wondering if it is possible to store the output of EXPLAIN
> > into a table.
>
> EXPLAIN won't do that directly, but you could make a plpgsql function
> along the lines of
>
> for t in execute explain ...
>    return next t;
>
> (too lazy to check the exact details here, but I believe you can find
> related examples in our regression tests) and then store the function's
> result into a table.
>
> regards, tom lane

Thanks, works fine!

-Souvik
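[Editorial note: filling in the details Tom sketched, a minimal plpgsql function of this shape could look like the following; the function name explain_to_rows is illustrative, not from the thread.

create or replace function explain_to_rows(query text)
returns setof text
language plpgsql
as $$
declare
    line text;
begin
    -- each row of EXPLAIN output is a single text line
    for line in execute 'explain ' || query loop
        return next line;
    end loop;
end $$;

-- Store the plan in a table, per the original question:
create temp table mytab as
    select * from explain_to_rows('select * from table1 where attr = 5') as plan_line;
]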
Re: Quick hack for fetching the contents of a temp table
> This question seems related to the other thread you started on April
> 18th?

This is related, but perhaps a little more complicated than the previous one, since this involves modifying the query that would be executed.

> Several of us suggested you use a CTE to materialize your intermediate
> values. Did you try that?

I'm working with a temp table here, used for storing the intermediate results, which is one of the alternative solutions suggested the other day. I'm not sure how different this would be from using a CTE.

Without going into the motivation behind the application, what I'm interested in here is modifying a query based on a prior query. A simple example follows:

Q1: select * from table1 t1, table2 t2 where t1.cid = t2.cid and attr1 = 5 and attr2 = 10;

Step 1: extract the filter conditions on table1 and table2 from Q1. So we perform the following:

explain select * from table1 t1, table2 t2 where t1.cid = t2.cid and attr1 = 5 and attr2 = 10;

and extract the filter conditions. Let's say (attr1 = 5) belongs to table1 and (attr2 = 10) belongs to table2. [We store the extracted filter conditions in a temp table.]

Q2: insert into table3(cid, attr1, attr3) (select * from table1 where attr3 = 7);

Step 2: our intent here is to take the extracted filter condition on table1 and substitute it into Q2. Thus, after substitution, Q2 would look like:

insert into table3(cid, attr1, attr3) (select * from table1 where attr1 = 5 and attr3 = 7);

We want to execute all of it within a transaction, so we have the following multi-statement transaction:

begin;
create temp table mytemp(metastr char(1024)) on commit drop;
insert into mytemp(metastr) (select somefunction1('explain select * from table1 t1, table2 t2 where t1.cid = t2.cid and attr1 = 5 and attr2 = 10'));
rewriteq table1 insert into table3(cid, attr1, attr3) (select * from table1 where attr3 = 7);
commit;

Note this line:

*rewriteq table1* insert into table3(cid, attr1, attr3) (select * from table1 where attr3 = 7);

We have a flag *rewriteq* that indicates this query needs to be rewritten, and an argument *table1* that indicates the filter condition on table1 needs to be updated. We also need the contents of mytemp, from which we fetch the extracted filter conditions, if any. While this is not exactly a rewrite, more like an update to the query, we still use the term rewrite here. Thus the actual query that will be processed by postgres is:

insert into table3(cid, attr1, attr3) (select * from table1 where attr1 = 5 and attr3 = 7);

Hope this makes the question clearer.

-SB

On Tue, Apr 23, 2019 at 8:38 PM Michel Pelletier wrote:
> The example you provided was not complete, and I asked for a
> reproducible example of what you were trying to achieve and you didn't
> provide one. I'm all for encouraging people to hack in the core, but it
> seems like you have the impression that your goal is not possible in SQL
> when it very likely is. Changing the core is not only going to be a high
> maintenance burden on you, but it also implies you have a problem that
> no one else in 30 years of postgres development has had. What's more
> likely is that it is possible to do what you want in SQL and we'd like
> to help you, but we need more information and you need to try the
> suggestions and answer the questions we ask back of you.
>
> I'm happy to be wrong about this, because that means postgres can
> improve in some way, but first we need more information about what your
> problem actually is.
>
> -Michel
Allocating shared memory in Postgres
Hi,

I need to allocate shared memory in Postgres 11.0 outside the initialization phase. In order to achieve this I have done the following:

- Increased the amount of shared memory by increasing the value of size in CreateSharedMemoryAndSemaphores(int port) in ipci.c. I have made sure that the amount of memory I need, say m, satisfies m << M, where M is the amount of additional shared memory that I have allocated.

- During a particular query where I need to allocate shared memory (whose size is a function of the sizes of the tables in the query), I invoke ShmemInitStruct() in shmem.c.

This seems to work in my case, although I haven't tested it extensively. My concern is that when I go through the NOTES in shmem.c, I find the following line:

    Fixed-size structures contain things like global variables for a
    module and should never be allocated after the shared memory
    initialization phase.

I'm allocating a shared array data structure through ShmemInitStruct() and I'm not sure if the lines above apply to my case, since I'm doing the allocation during a query.

Any help/clarifications in this regard would be appreciated.

Best,
-SB
Re: Allocating shared memory in Postgres
On Wed, Jul 3, 2019 at 4:17 PM Tom Lane wrote:
> This seems like a pretty horrid idea. For starters, what happens if two
> backends do this concurrently?

I'm assuming here that a single backend process handles all queries from a given client. In the case of parallel queries, the master process will be responsible for allocating the shared memory, not the workers. Please let me know if this is not what you implied by two backends.

> Even with only one backend, if you do a query that requires X space, and
> then you do another query that requires X+1 space, what's going to
> happen?

In my application, every query that allocates shared memory is wrapped in a separate txn, and the lifetime of the shared memory is the duration of the query. The only purpose of allocating shared memory here is to make the memory segment visible to the worker processes for that particular query. No other txn/query actually accesses this shared memory. Also, when I allocate shared memory, the txn id is used as a key into the ShmemIndex to differentiate between two concurrent shared memory allocation requests.

> Recent PG releases have a "DSM" mechanism for short-lived (query
> lifespan, typically) shared memory that's separate from the core shmem
> pool. That might suit your needs better. The system design is really
> not friendly to demanding more core shmem after postmaster start.

Yes, I understand that "DSM" mechanisms exist. But I wanted to know whether the approach I outlined will work even if it has certain drawbacks, such as 1. overestimating the initial shared memory size that needs to be allocated, and 2. not being able to free the shared memory after use.

Best,
-SB