Re: IO related waits

2024-09-18 Thread veem v
On Wed, 18 Sept 2024 at 05:07, Adrian Klaver 
wrote:

> On 9/17/24 12:34, veem v wrote:
> >
>
> It does if autocommit is set in the client, that is common to other
> databases also:
>
> https://dev.mysql.com/doc/refman/8.4/en/commit.html
>
>
> https://docs.oracle.com/en/database/oracle/developer-tools-for-vscode/getting-started/disabling-and-enabling-auto-commit.html
>
>
> https://learn.microsoft.com/en-us/sql/t-sql/statements/set-implicit-transactions-transact-sql?view=sql-server-ver16
>
> You probably need to take a closer look at the client/driver you are
> using and the code that is interacting with it.
>
> In fact I would say you need to review the entire data transfer process
> to see if there are performance gains that can be obtained without
> adding an entirely new async component.
>
>
>
You were spot on. When we turned off "auto commit" we started seeing
fewer commits, matching the number of batches.

However, we also started seeing deadlock issues. We have foreign key
relationships between the tables, and during each batch we insert into the
parent first and then into the child, but this does happen from multiple
sessions for different batches. So why do we see the error below, as we
ensure in each batch we first insert into the parent and then into the
child tables?

caused by: org.postgresql.util.PSQLException: ERROR: deadlock detected
  Detail: Process 10443 waits for ShareLock on transaction 220972157;
blocked by process 10454.
Process 10454 waits for ShareLock on transaction 220972155; blocked by
process 10443.
  Hint: See server log for query details.
  Where: while inserting index tuple (88736,28) in relation
"TAB1_p2024_08_29"


Re: IO related waits

2024-09-18 Thread veem v
On Thu, 19 Sept 2024 at 02:01, veem v  wrote:

>
> On Wed, 18 Sept 2024 at 05:07, Adrian Klaver 
> wrote:
>
>> On 9/17/24 12:34, veem v wrote:
>> >
>>
>> It does if autocommit is set in the client, that is common to other
>> databases also:
>>
>> https://dev.mysql.com/doc/refman/8.4/en/commit.html
>>
>>
>> https://docs.oracle.com/en/database/oracle/developer-tools-for-vscode/getting-started/disabling-and-enabling-auto-commit.html
>>
>>
>> https://learn.microsoft.com/en-us/sql/t-sql/statements/set-implicit-transactions-transact-sql?view=sql-server-ver16
>>
>> You probably need to take a closer look at the client/driver you are
>> using and the code that is interacting with it.
>>
>> In fact I would say you need to review the entire data transfer process
>> to see if there are performance gains that can be obtained without
>> adding an entirely new async component.
>>
>>
>>
> You were spot on. When we turned off "auto commit" we started seeing
> fewer commits, matching the number of batches.
>
> However, we also started seeing deadlock issues. We have foreign key
> relationships between the tables, and during each batch we insert into the
> parent first and then into the child, but this does happen from multiple
> sessions for different batches. So why do we see the error below, as we
> ensure in each batch we first insert into the parent and then into the
> child tables?
>
> caused by: org.postgresql.util.PSQLException: ERROR: deadlock detected
>   Detail: Process 10443 waits for ShareLock on transaction 220972157;
> blocked by process 10454.
> Process 10454 waits for ShareLock on transaction 220972155; blocked by
> process 10443.
>   Hint: See server log for query details.
>   Where: while inserting index tuple (88736,28) in relation
> "TAB1_p2024_08_29"
>

As we are able to get hold of one session, we see one "insert into
<partition table>" blocked by another "insert into <partition table>", and
the blocking insert was experiencing a "client read" wait event. We are
still unable to understand why this is happening and how to fix it.
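A common cause of this pattern (not confirmed by the log fragment alone) is two sessions touching overlapping parent keys in different relative orders, so each waits on a row/index lock the other already holds. The usual mitigation is to sort each batch by key before inserting, so every session acquires locks in the same global order. A minimal illustrative sketch, with made-up key values and a simplified lock model:

```python
# Sketch: why sorting batch keys avoids lock-order deadlocks.
# Simplified model: each session "locks" parent keys one at a time.
# If two sessions acquire common keys in opposite relative orders, a
# wait cycle (deadlock) is possible; a shared sort order prevents it.

def has_potential_deadlock(order_a, order_b):
    """True if some pair of keys common to both acquisition orders is
    acquired in opposite relative order (the precondition for a cycle)."""
    common = sorted(set(order_a) & set(order_b))
    pos_a = {k: i for i, k in enumerate(order_a)}
    pos_b = {k: i for i, k in enumerate(order_b)}
    for i, x in enumerate(common):
        for y in common[i + 1:]:
            if (pos_a[x] < pos_a[y]) != (pos_b[x] < pos_b[y]):
                return True
    return False

# Two batches touching the same parent keys in different orders: risky.
batch_1 = [101, 205]
batch_2 = [205, 101]
assert has_potential_deadlock(batch_1, batch_2)

# Sorting each batch before inserting removes the opposite-order pair.
assert not has_potential_deadlock(sorted(batch_1), sorted(batch_2))
```

This only models lock ordering; whether it applies here depends on whether the batches really share parent keys, which the server log's query details would show.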


Re: CREATE DATABASE command concurrency

2024-09-18 Thread Muhammad Usman Khan
Hi,
In PostgreSQL, it's safe to run CREATE DATABASE at the same time from
different places. If two commands try to create the same database, one will
succeed, and the other will safely fail without causing any problems or
incomplete database creation.

On Wed, 18 Sept 2024 at 19:08, Wizard Brony  wrote:

> What are the concurrency guarantees of the CREATE DATABASE command? For
> example, is the CREATE DATABASE command safe to be called concurrently such
> that one command succeeds and the other reliably fails without corruption?
>
>
>


Re: CREATE DATABASE command concurrency

2024-09-18 Thread Christophe Pettus



> On Sep 17, 2024, at 14:52, Wizard Brony  wrote:
> 
> What are the concurrency guarantees of the CREATE DATABASE command? For 
> example, is the CREATE DATABASE command safe to be called concurrently such 
> that one command succeeds and the other reliably fails without corruption?

The concern is that two different sessions issue a CREATE DATABASE command 
using the same name?  In that case, it can be relied upon that one will succeed 
(unless it fails for some other reason, like lacking permissions), and the 
other will receive an error that the database already exists.
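The guarantee can be pictured with a small sketch. This is purely illustrative of the observable behavior (exactly one creator wins, the loser gets an "already exists" style error); it is NOT how PostgreSQL implements it internally — as noted elsewhere in this thread, Postgres ultimately relies on the system catalogs' unique indexes.

```python
# Illustrative model of "concurrent CREATE DATABASE: one succeeds,
# the other reliably fails" using a lock-protected registry.
import threading

class Catalog:
    def __init__(self):
        self._dbs = set()
        self._lock = threading.Lock()

    def create_database(self, name):
        # Check-and-insert is atomic under the lock, so two concurrent
        # creators cannot both succeed.
        with self._lock:
            if name in self._dbs:
                raise ValueError(f'database "{name}" already exists')
            self._dbs.add(name)

catalog = Catalog()
results = []

def worker():
    try:
        catalog.create_database("mydb")
        results.append("ok")
    except ValueError:
        results.append("already exists")

threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads: t.start()
for t in threads: t.join()

# Exactly one creation succeeds regardless of thread scheduling.
assert sorted(results) == ["already exists", "ok"]
```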



Re: load fom csv

2024-09-18 Thread Rob Sargent
> On Sep 18, 2024, at 6:39 AM, Andy Hartman wrote:
>
> psql -h $pgServer -d $pgDatabase -U $pgUser -c $copyCommand
>
> I'm wondering if it's waiting on P/w ?
>
> Thanks.

Very likely.  Can you show the authentication mechanisms used (pg_hba)?

> On Tue, Sep 17, 2024 at 7:10 PM Andy Hartman wrote:
>
> I'll echo vars and see if something looks strange.
>
> THanks.
>
> On Tue, Sep 17, 2024 at 3:46 PM Rob Sargent wrote:

> On Sep 17, 2024, at 12:25 PM, Adrian Klaver  wrote:
> 
> On 9/17/24 09:21, Andy Hartman wrote:
>> The command work outside of powershell  yes
> 
> Then you are going to need to use whatever debugging tools PowerShell has available to step through the script to figure out where the problem is.
> 
> 

Visual Studio can run/debug .ps1 files. I am not at my desk but have done
.ps1 one-liner queries against MSSQL.

Suggest echoing ALL vars used in the psql command

> 
> 




Re: load fom csv

2024-09-18 Thread Adrian Klaver

On 9/18/24 06:29, Rob Sargent wrote:




On Sep 18, 2024, at 6:39 AM, Andy Hartman  wrote:


psql -h $pgServer -d $pgDatabase -U $pgUser -c $copyCommand

I'm wondering if it's waiting on P/w ?


In a previous post I suggested:

"
To work through this you need to try what I call the crawl/walk/run 
process. In this case that is:


1) Crawl. Connect using psql and run the \copy in it with hard coded values.

2) Walk. Use psql with the -c argument and supply the command again with
hard coded values

3) Run. Then use PowerShell and do the variable substitution.
"

Did you do this with the same command at each stage? If so, at either 1) 
or 2) were you asked for a password?


In a later post I asked:

"1) Are you logging connections/disconnection per?:


https://www.postgresql.org/docs/current/runtime-config-logging.html#RUNTIME-CONFIG-LOGGING-WHAT


If not do so as it will show you if a connection is being attempted.
"

Did you enable connection logging?

Did you look at the Postgres log?

If both the answers are yes you should see something like the below in 
case of password authentication:


2024-09-18 07:47:38.692 PDT [8090] [unknown]@[unknown] LOG:  connection 
received: host=127.0.0.1 port=44840
2024-09-18 07:47:42.410 PDT [8095] [unknown]@[unknown] LOG:  connection 
received: host=127.0.0.1 port=44848
2024-09-18 07:47:42.414 PDT [8095] aklaver@test LOG:  connection 
authenticated: identity="aklaver" method=md5 
(/etc/postgresql/16/main/pg_hba.conf:125)
2024-09-18 07:47:42.414 PDT [8095] aklaver@test LOG:  connection 
authorized: user=aklaver database=test application_name=psql SSL enabled 
(protocol=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384, bits=256)
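For the "Run" stage, one way to sidestep both the quoting pitfalls of variable substitution and a script silently hanging on a password prompt is to build the command as an argument list and supply the password non-interactively. A sketch (connection values and credentials here are made up, not from the thread):

```python
# Sketch: stage 3 (variable substitution) done from a script.
# Passing argv as a list avoids shell-quoting surprises; PGPASSWORD
# (or a ~/.pgpass file) plus psql's -w flag ("never prompt") keeps a
# missing password from hanging the script silently.
import os
import subprocess

pg_server = "db.example.com"   # hypothetical
pg_database = "mydb"           # hypothetical
pg_user = "loader"             # hypothetical
copy_command = r"\copy tgt FROM 'data.csv' WITH (FORMAT csv, HEADER)"

argv = ["psql", "-h", pg_server, "-d", pg_database,
        "-U", pg_user, "-w",   # -w: fail instead of prompting
        "-c", copy_command]

env = dict(os.environ, PGPASSWORD="secret")  # hypothetical credential

# subprocess.run(argv, env=env, check=True)  # uncomment to execute
assert argv[-1].startswith(r"\copy")
```

With connection logging enabled as described above, a run of this script should produce "connection received" / "connection authenticated" lines like the sample; if nothing appears, the command never reached the server.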





Thanks.


Very likely.  Can you show the authentication
mechanisms used (pg_hba)?


On Tue, Sep 17, 2024 at 7:10 PM Andy Hartman wrote:


I'll echo vars and see if something looks strange.

THanks.

On Tue, Sep 17, 2024 at 3:46 PM Rob Sargent wrote:



> On Sep 17, 2024, at 12:25 PM, Adrian Klaver wrote:
>
> On 9/17/24 09:21, Andy Hartman wrote:
>> The command work outside of powershell  yes
>
> Then you are going to need to use whatever debugging tools
PowerShell has available to step through the script to figure
out where the problem is.
>
>

Visual Studio can run/debug .ps1 files. I am not at my desk
but have done .ps1 one-liner queries against MSSQL.

Suggest echoing ALL vars used in the psql command

>
>



--
Adrian Klaver
adrian.kla...@aklaver.com





Re: CREATE DATABASE command concurrency

2024-09-18 Thread Tom Lane
Christophe Pettus  writes:
>> On Sep 17, 2024, at 14:52, Wizard Brony  wrote:
>> What are the concurrency guarantees of the CREATE DATABASE command? For 
>> example, is the CREATE DATABASE command safe to be called concurrently such 
>> that one command succeeds and the other reliably fails without corruption?

> The concern is that two different sessions issue a CREATE DATABASE command 
> using the same name?  In that case, it can be relied upon that one will 
> succeed (unless it fails for some other reason, like lacking permissions), 
> and the other will receive an error that the database already exists.

This is true, but it's possibly worth noting that the specific error
message you get could vary.  Normally it'd be something like

regression=# create database postgres;
ERROR:  database "postgres" already exists

but in a race condition it might look more like "duplicate key value
violates unique constraint".  In the end we rely on the system
catalogs' unique indexes to detect and prevent race conditions of
this sort.

regards, tom lane




Re: load fom csv

2024-09-18 Thread Andy Hartman
psql -h $pgServer -d $pgDatabase -U $pgUser -c $copyCommand

I'm wondering if it's waiting on P/w ?

Thanks.

On Tue, Sep 17, 2024 at 7:10 PM Andy Hartman 
wrote:

> I'll echo vars and see if something looks strange.
>
> THanks.
>
> On Tue, Sep 17, 2024 at 3:46 PM Rob Sargent  wrote:
>
>>
>>
>> > On Sep 17, 2024, at 12:25 PM, Adrian Klaver 
>> wrote:
>> >
>> > On 9/17/24 09:21, Andy Hartman wrote:
>> >> The command work outside of powershell  yes
>> >
>> > Then you are going to need to use whatever debugging tools PowerShell
>> has available to step through the script to figure out where the problem is.
>> >
>> >
>>
>> Visual Studio can run/debug .ps1 files. I am not at my desk but have done
>> .ps1 one-liner queries against MSSQL.
>>
>> Suggest echoing ALL vars used in psql command
>>
>> >
>> >
>>
>


How batch processing works

2024-09-18 Thread Lok P
Hello,
I have seen multiple threads around the same topic, so I want some
clarification. As we know, row-by-row is "slow-by-slow" processing, so in
heavy-write systems (say the client app is in Java), people are asked to do
DML in batches rather than in a row-by-row fashion, to minimize the chatter
or context switches between database and client, which are resource
intensive. What I understand is that true batch processing means the client
has to collect all the input bind values, prepare the insert statement,
submit it to the database in one shot, and then commit.

What does that actually mean? If we divide the options as below, which
method truly does batch processing, or does some other method exist for
doing batch processing with Postgres as the backend database?

I understand that the first method below is truly row-by-row processing, in
which context switches happen between client and database with each row,
whereas the second method just batches the commits but is not true batch
processing, as it does the same amount of context switching between the
database and client. But regarding the third and fourth methods, will both
execute in a similar fashion in the database, with the same number of
context switches? Does any better method exist to do these inserts in
batches? Appreciate your guidance.


CREATE TABLE parent_table (
id SERIAL PRIMARY KEY,
name TEXT
);

CREATE TABLE child_table (
id SERIAL PRIMARY KEY,
parent_id INT REFERENCES parent_table(id),
value TEXT
);


Method-1

insert into parent_table values(1,'a');
commit;
insert into parent_table values(2,'a');
commit;
insert into child_table values(1,1,'a');
commit;
insert into child_table values(2,2,'a');
commit;

VS

Method-2

insert into parent_table values(1,'a');
insert into parent_table values(2,'a');
insert into child_table values(1,1,'a');
insert into child_table values(2,2,'a');
commit;

VS

Method-3

with a as (insert into parent_table values(1,'a')),
     a1 as (insert into parent_table values(2,'a')),
     b as (insert into child_table values(1,1,'a')),
     b1 as (insert into child_table values(2,2,'a'))
select;
commit;

Method-4

INSERT INTO parent_table VALUES (1, 'a'), (2, 'a');
INSERT INTO child_table VALUES (1,1, 'a'), (2,2, 'a');
commit;
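A point worth making explicit about Method-4: the number of client-server round trips is one per statement, not one per row, which is what makes it "true" batching from a network standpoint. The helper below is only a sketch of how such a statement is assembled; a real client should use the driver's parameter binding (e.g. JDBC `addBatch`/`executeBatch`, or a driver-side multi-row bind) rather than string formatting.

```python
# Sketch: Method-4 sends one multi-row INSERT per table, so the whole
# batch costs two round trips regardless of row count.

def multi_row_insert(table, rows):
    """Render a single multi-row INSERT (integer/text literals only)."""
    def lit(v):
        # Quote text values, doubling embedded single quotes.
        return str(v) if isinstance(v, int) else "'" + str(v).replace("'", "''") + "'"
    values = ", ".join("(" + ", ".join(lit(v) for v in row) + ")" for row in rows)
    return f"INSERT INTO {table} VALUES {values};"

parent_rows = [(1, "a"), (2, "a")]
child_rows = [(1, 1, "a"), (2, 2, "a")]

stmts = [multi_row_insert("parent_table", parent_rows),
         multi_row_insert("child_table", child_rows)]

# Two statements, i.e. two round trips, for the entire batch.
assert len(stmts) == 2
assert stmts[0] == "INSERT INTO parent_table VALUES (1, 'a'), (2, 'a');"
```

Methods 1 and 2, by contrast, pay one round trip per row; Method-2 only saves the per-commit overhead.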

Regards
Lok


Re: How batch processing works

2024-09-18 Thread Ron Johnson
On Thu, Sep 19, 2024 at 1:31 AM Lok P  wrote:

> Hello,
> I have seen multiple threads around the same topic, so I want some
> clarification. As we know, row-by-row is "slow-by-slow" processing, so in
> heavy-write systems (say the client app is in Java), people are asked to do
> DML in batches rather than in a row-by-row fashion, to minimize the chatter
> or context switches between database and client, which are resource
> intensive. What I understand is that true batch processing means the client
> has to collect all the input bind values, prepare the insert statement,
> submit it to the database in one shot, and then commit.
>
> What does that actually mean? If we divide the options as below, which
> method truly does batch processing, or does some other method exist for
> doing batch processing with Postgres as the backend database?
>
> I understand that the first method below is truly row-by-row processing, in
> which context switches happen between client and database with each row,
> whereas the second method just batches the commits but is not true batch
> processing, as it does the same amount of context switching between the
> database and client. But regarding the third and fourth methods, will both
> execute in a similar fashion in the database, with the same number of
> context switches? Does any better method exist to do these inserts in
> batches? Appreciate your guidance.
>
>
> CREATE TABLE parent_table (
> id SERIAL PRIMARY KEY,
> name TEXT
> );
>
> CREATE TABLE child_table (
> id SERIAL PRIMARY KEY,
> parent_id INT REFERENCES parent_table(id),
> value TEXT
> );
>
>
> Method-1
>
> insert into parent_table values(1,'a');
> commit;
> insert into parent_table values(2,'a');
> commit;
> insert into child_table values(1,1,'a');
> commit;
> insert into child_table values(2,2,'a');
> commit;
>
> VS
>
> Method-2
>
> insert into parent_table values(1,'a');
> insert into parent_table values(2,'a');
> insert into child_table values(1,1,'a');
> insert into child_table values(2,2,'a');
> commit;
>

As a former "DP" programmer, from an application point of view, this is
absolutely batch programming.

My experience was with COBOL and C, though, which were low overhead.  From
what I've seen in PG log files, JDBC is astoundingly chatty.

[snip]

>
> Method-4
>
> INSERT INTO parent_table VALUES (1, 'a'), (2, 'a');
> INSERT INTO child_table VALUES (1,1, 'a'), (2,2, 'a');
> commit;
>

If I knew that I had to load a structured input data file (even if it had
parent and child records), this is how I'd do it (but probably first try
and see if "in-memory COPY INTO" is such a thing).
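The "in-memory COPY" idea is indeed a thing: build the rows in an in-memory buffer and stream them to `COPY ... FROM STDIN`, which is typically the fastest bulk-load path in Postgres. A sketch using psycopg2's `copy_expert` (the database call is left commented out since no server is assumed here; table and column names are made up):

```python
# Sketch: stream in-memory rows to COPY FROM STDIN via a StringIO buffer.
import csv
import io

rows = [(1, "a"), (2, "a")]

# Serialize the rows as CSV into an in-memory file-like object.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerows(rows)
buf.seek(0)

# import psycopg2
# conn = psycopg2.connect("dbname=mydb")  # hypothetical DSN
# with conn, conn.cursor() as cur:
#     cur.copy_expert(
#         "COPY parent_table (id, name) FROM STDIN WITH (FORMAT csv)",
#         buf)

assert buf.getvalue().splitlines() == ["1,a", "2,a"]
```

With parent and child records, you would load two buffers, parent table first, inside one transaction so the FK checks see the parent rows.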



CREATE DATABASE command concurrency

2024-09-18 Thread Wizard Brony
What are the concurrency guarantees of the CREATE DATABASE command? For 
example, is the CREATE DATABASE command safe to be called concurrently such 
that one command succeeds and the other reliably fails without corruption?




Re: IO related waits

2024-09-18 Thread Adrian Klaver




On 9/18/24 1:40 PM, veem v wrote:





You were spot on. When we turned off "auto commit" we started
seeing fewer commits, matching the number of batches.

However, we also started seeing deadlock issues. We have foreign key
relationships between the tables, and during each batch we insert
into the parent first and then into the child, but this does happen
from multiple sessions for different batches. So why do we see the
error below, as we ensure in each batch we first insert into the
parent and then into the child tables?

caused by: org.postgresql.util.PSQLException: ERROR: deadlock detected
   Detail: Process 10443 waits for ShareLock on transaction
220972157; blocked by process 10454.
Process 10454 waits for ShareLock on transaction 220972155; blocked
by process 10443.
   Hint: See server log for query details.
   Where: while inserting index tuple (88736,28) in relation
"TAB1_p2024_08_29"


As we are able to get hold of one session, we see one "insert into 
<partition table>" blocked by another "insert into <partition table>", 
and the blocking insert was experiencing a "client read" wait event. 
We are still unable to understand why this is happening and how to fix it.




This needs clarification.

1) To be clear, when you refer to parent and child, that is a FK 
relationship:

parent_tbl.fld <--FK--> child_tbl.fld_fk

not parent and child tables in a partitioning scheme?

2) What are the table schemas?

3) What is the code that is generating the error?


Overall it looks like this process needs a top to bottom audit to map 
out what is actually being done versus what needs to be done.





--
Adrian Klaver
adrian.kla...@aklaver.com