How to cleanup transaction after statement_timeout aborts a query?

2024-09-08 Thread Istvan Soos
I'm one of the developers of the Dart-language PostgreSQL client
package. I am working on adding a feature that sets the
statement_timeout value for a session or a query as the client
requests it. However, I'm stuck with the following error:

setup:
CREATE TABLE t (id INT PRIMARY KEY);
INSERT INTO t (id) VALUES (1);

client-1:
BEGIN;
SELECT * FROM t WHERE id=1 FOR UPDATE;


client-2:
BEGIN;
SET statement_timeout TO 1000;
SELECT * FROM t WHERE id=1 FOR UPDATE;


After that, any query I send through client-2 gets the following error:

Severity.error 25P02: current transaction is aborted, commands ignored
until end of transaction block

Not even ROLLBACK or COMMIT is working. It is the same for both the simple
and the extended query protocol. Does the client need to send a non-query
message to clean up the transaction state? Or is this connection now
gone for good?

Thanks,
  Istvan




Re: How to cleanup transaction after statement_timeout aborts a query?

2024-09-08 Thread Laurenz Albe
On Sun, 2024-09-08 at 12:56 +0200, Istvan Soos wrote:
> I'm one of the developers of the Dart-language PostgreSQL client
> package. I am working on adding a feature that sets the
> statement_timeout value for a session or a query as the client
> requests it. However, I'm stuck with the following error:
> 
> setup:
> CREATE TABLE t (id INT PRIMARY KEY);
> INSERT INTO t (id) values (1);
> 
> client-1:
> BEGIN;
> SELECT * FROM t WHERE id=1 FOR UPDATE;
> 
> 
> client-2:
> BEGIN;
> SET statement_timeout TO 1000;
> SELECT * FROM t WHERE id=1 FOR UPDATE;
> 
> 
> After that, any query I send through client-2 gets the following error:
> 
> Severity.error 25P02: current transaction is aborted, commands ignored
> until end of transaction block
> 
> Not even ROLLBACK or COMMIT is working. It is the same for both the simple
> and the extended query protocol. Does the client need to send a non-query
> message to clean up the transaction state? Or is this connection now
> gone for good?

ROLLBACK and COMMIT are working: they end the transaction.
This is the atomicity guarantee of database transactions: either all
statements succeed, or all of them fail.

I am aware that other databases have a "statement rollback" feature that allows
the transaction to proceed after an error, but PostgreSQL doesn't.

To handle the failure of a statement while allowing the transaction to proceed,
you can use savepoints.  But be warned: don't even think of setting a savepoint
before each statement.  That would affect statement performance severely.
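A minimal sketch of the savepoint pattern described above, using a
deliberately failing statement (the savepoint name is illustrative):

```sql
BEGIN;
SAVEPOINT before_risky;              -- mark a point to return to
SELECT 1 / 0;                        -- fails: division by zero
ROLLBACK TO SAVEPOINT before_risky;  -- discard only the failed statement
SELECT 42;                           -- the transaction proceeds normally
COMMIT;
```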

Yours,
Laurenz Albe




Re: How to cleanup transaction after statement_timeout aborts a query?

2024-09-08 Thread Istvan Soos
On Sun, Sep 8, 2024 at 1:19 PM Laurenz Albe  wrote:
> ROLLBACK and COMMIT are working: they end the transaction.

I have this reproduction test, and ROLLBACK does fail:
https://github.com/isoos/postgresql-dart/pull/363/files#diff-4547e49b04ec8280fb8f4f1ebf695b77f9a2d9a4ac9bcfd685bcd570a46baa80R122

I've checked that nothing else is sent on the protocol, yet the
ROLLBACK statement gets the 25P02 error.

> It is the atomicity guarantee of database transactions: either all statements
> succeed, or all fail.

Yeah, I thought so, that's why I'm struggling to see what's missing.

> To handle the failure of a statement while allowing the transaction to
> proceed, you can use savepoints.  But be warned: don't even think of
> setting a savepoint before each statement.  That would affect statement
> performance severely.

As the writer of a client library, I don't have the luxury of
telling users they need to change their ways, hence I'm looking for any
pointer at the protocol level.

Thanks,
  Istvan




Re: How to cleanup transaction after statement_timeout aborts a query?

2024-09-08 Thread Laurenz Albe
On Sun, 2024-09-08 at 15:01 +0200, Istvan Soos wrote:
> On Sun, Sep 8, 2024 at 1:19 PM Laurenz Albe  wrote:
> > ROLLBACK and COMMIT are working: they end the transaction.
> 
> I have this reproduction test, and ROLLBACK does fail:
> https://github.com/isoos/postgresql-dart/pull/363/files#diff-4547e49b04ec8280fb8f4f1ebf695b77f9a2d9a4ac9bcfd685bcd570a46baa80R122
> 
> I've checked that nothing else is sent on the protocol, yet the
> ROLLBACK statement gets the 25P02 error.

You must be doing something wrong then, because it works:

test=> START TRANSACTION;
START TRANSACTION
test=*> SELECT 1 / 0;
ERROR:  division by zero
test=!> SELECT 42;
ERROR:  current transaction is aborted, commands ignored until end of transaction block
test=!> SELECT 42;
ERROR:  current transaction is aborted, commands ignored until end of transaction block
test=!> ROLLBACK;
ROLLBACK
test=> 

> 
> > To handle the failure of a statement while allowing the transaction to
> > proceed, you can use savepoints.  But be warned: don't even think of
> > setting a savepoint before each statement.  That would affect statement
> > performance severely.
> 
> As the writer of a client library, I don't have the luxury of
> telling users they need to change their ways, hence I'm looking for any
> pointer at the protocol level.

I understand.  But there is no way to influence this behavior.

Perhaps the best option for your library is not to try to "mask" the way
PostgreSQL behaves in this case.  It is then up to the users of the library
to do the right thing.
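One related convenience PostgreSQL does offer: SET LOCAL scopes a setting to
the current transaction, so a per-transaction statement_timeout reverts
automatically at COMMIT or ROLLBACK. A sketch, reusing the table from the
original example:

```sql
BEGIN;
SET LOCAL statement_timeout = '1000ms';   -- only effective inside this transaction
SELECT * FROM t WHERE id = 1 FOR UPDATE;  -- may fail with 57014 (query_canceled)
ROLLBACK;                                 -- the timeout setting reverts automatically
```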

Yours,
Laurenz Albe




Re: How to cleanup transaction after statement_timeout aborts a query?

2024-09-08 Thread Istvan Soos
On Sun, Sep 8, 2024 at 3:41 PM Laurenz Albe  wrote:
> You must be doing something wrong then, because it works:

And you are right, I was doing something wrong:

The combination of the change and the library internals triggered a
SET statement_timeout TO 3000; before the ROLLBACK was sent, and the
error message I got was not clear on which statement triggered it.

I'm sorry for the noise.

Btw, is there any guideline on how to write client libraries, or
recommendations/tests that a client library should follow?

Thanks,
  Istvan




Re: How to cleanup transaction after statement_timeout aborts a query?

2024-09-08 Thread Laurenz Albe
On Sun, 2024-09-08 at 16:02 +0200, Istvan Soos wrote:
> On Sun, Sep 8, 2024 at 3:41 PM Laurenz Albe  wrote:
> > You must be doing something wrong then, because it works:
> 
> And you are right, I was doing something wrong:
> 
> [...]
> 
> I'm sorry for the noise.

No problem; I'm glad you could find the stumbling stone.

> Btw, is there any guideline on how to write client libraries, or
> recommendations/tests that a client library should follow?

I'm afraid this question is too generic for an answer...
I'd say it should be easy and intuitive to use, and should be as robust
as possible (return meaningful error messages rather than crash).

Yours,
Laurenz Albe




Re: Faster data load

2024-09-08 Thread Peter J. Holzer
On 2024-09-06 01:44:00 +0530, Lok P wrote:
> We have a requirement to create approx 50 billion rows in a partitioned
> table (~1 billion rows per partition, 200+ GB daily partitions) for a
> performance test. We are currently using an 'INSERT INTO <target partition>
> SELECT ... FROM <source table or partition>;' method.
> We have dropped all indexes and constraints first, and then we do the load.
> Still, it's taking 2-3 hours to populate one partition.

That seems quite slow. Is the table very wide or does it have a large
number of indexes?

> Is there a faster way to achieve this?
> 
> A few teammates are suggesting to use the COPY command with a file load
> instead, which would be faster.

I doubt that.

I benchmarked several strategies for populating tables 5 years ago and
(for my test data and on our hardware at the time - YMMV) a simple
INSERT ... SELECT was more than twice as fast as 8 parallel COPY
operations (and about 8 times as fast as a single COPY).

Details will have changed since then (I should rerun that benchmark on
a current system), but I'd be surprised if COPY became that much faster
relative to INSERT ... SELECT.

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"




Re: Faster data load

2024-09-08 Thread Adrian Klaver

On 9/8/24 10:45, Peter J. Holzer wrote:

> On 2024-09-06 01:44:00 +0530, Lok P wrote:
>> We have a requirement to create approx 50 billion rows in a partitioned
>> table (~1 billion rows per partition, 200+ GB daily partitions) for a
>> performance test. We are currently using an 'INSERT INTO <target partition>
>> SELECT ... FROM <source table or partition>;' method.
>> We have dropped all indexes and constraints first, and then we do the load.
>> Still, it's taking 2-3 hours to populate one partition.
> 
> That seems quite slow. Is the table very wide or does it have a large
> number of indexes?
> 
>> Is there a faster way to achieve this?
>> 
>> A few teammates are suggesting to use the COPY command with a file load
>> instead, which would be faster.
> 
> I doubt that.
> 
> I benchmarked several strategies for populating tables 5 years ago and
> (for my test data and on our hardware at the time - YMMV) a simple
> INSERT ... SELECT was more than twice as fast as 8 parallel COPY
> operations (and about 8 times as fast as a single COPY).
> Details will have changed since then (I should rerun that benchmark on
> a current system), but I'd be surprised if COPY became that much faster
> relative to INSERT ... SELECT.

Yeah, they seem to have changed a great deal. Though you are correct in
saying COPY is not faster than INSERT ... SELECT:



select version();
                                 version
--------------------------------------------------------------------------
 PostgreSQL 16.4 (Ubuntu 16.4-1.pgdg22.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit



select count(*) from nyc_taxi_duckdb ;
  count
---------
 2846722

ll -h nyc_taxi.csv
-rw-rw-r-- 1 aklaver aklaver 252M Sep  8 10:54 nyc_taxi.csv

insert into nyc_duplicate select * from nyc_taxi_duckdb ;
INSERT 0 2846722
Time: 7015.072 ms (00:07.015)

truncate  nyc_duplicate ;

\copy nyc_duplicate from 'nyc_taxi.csv' with csv header
COPY 2846722
Time: 8760.197 ms (00:08.760)

copy nyc_duplicate from '/tmp/nyc_taxi.csv' with csv header;
COPY 2846722
Time: 7904.279 ms (00:07.904)

Just to see what the coming attraction offers:

select version();
                                 version
--------------------------------------------------------------------------
 PostgreSQL 17rc1 (Ubuntu 17~rc1-1.pgdg22.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit



insert into nyc_duplicate select * from nyc_taxi_duckdb ;
INSERT 0 2846722
Time: 5315.878 ms (00:05.316)

\copy nyc_duplicate from 'nyc_taxi.csv' with csv header
COPY 2846722
Time: 10042.129 ms (00:10.042)

copy nyc_duplicate from '/tmp/nyc_taxi.csv' with csv header;
COPY 2846722
Time: 8422.503 ms (00:08.423)






--
Adrian Klaver
adrian.kla...@aklaver.com





postgresql FDW vs dblink for DDL

2024-09-08 Thread Achilleas Mantzios

Hi

For remote DDL execution (such as CREATE TABLE), is dblink my only option?

thanks






Re: postgresql FDW vs dblink for DDL

2024-09-08 Thread Adrian Klaver

On 9/8/24 13:04, Achilleas Mantzios wrote:

> Hi
> 
> For remote DDL execution (such as CREATE TABLE), is dblink my only option?


You will need to define in what context you are considering options.

For instance, you can do remote DDL operations by passing a command or
script via psql.
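For the dblink route, a minimal sketch (the connection string and table
name are placeholders, not taken from the thread):

```sql
CREATE EXTENSION IF NOT EXISTS dblink;

-- dblink_exec() sends an arbitrary command to the remote server
-- and returns its command tag.
SELECT dblink_exec(
    'host=remote.example.com dbname=mydb user=myuser',
    'CREATE TABLE remote_t (id int PRIMARY KEY)'
);
```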




thanks






--
Adrian Klaver
adrian.kla...@aklaver.com