load fom csv

2024-09-16 Thread Andy Hartman
I'm trying to run this piece of code from Powershell and it just sits there
and never comes back. There are only 131 records in the csv.

$connectionString =
"Host=$pgServer;Database=$pgDatabase;Username=$pgUser;Password=$pgPassword"
$copyCommand = "\COPY $pgTable FROM '$csvPath' DELIMITER ',' CSV HEADER;"

psql -h $pgServer -d $pgDatabase -U $pgUser -c $copyCommand


how can I debug this?

Table layout

[image: image.png]


Re: load fom csv

2024-09-16 Thread Adrian Klaver

On 9/16/24 08:35, Andy Hartman wrote:
I'm trying to run this piece of code from Powershell and it just sits 
there and never comes back. There are only 131 records in the csv.


$connectionString = 
"Host=$pgServer;Database=$pgDatabase;Username=$pgUser;Password=$pgPassword"

$copyCommand = "\COPY $pgTable FROM '$csvPath' DELIMITER ',' CSV HEADER;"

psql -h $pgServer -d $pgDatabase -U $pgUser -c $copyCommand


how can I debug this?


1) Look at the Postgres log.

2) Run the psql command outside PowerShell with hard coded connection 
values and -c command.




Table layout

image.png


--
Adrian Klaver
adrian.kla...@aklaver.com





Re: load fom csv

2024-09-16 Thread Ron Johnson
On Mon, Sep 16, 2024 at 11:36 AM Andy Hartman 
wrote:

> I'm trying to run this piece of code from Powershell and it just sits
> there and never comes back. There are only 131 records in the csv.
>
> $connectionString =
> "Host=$pgServer;Database=$pgDatabase;Username=$pgUser;Password=$pgPassword"
> $copyCommand = "\COPY $pgTable FROM '$csvPath' DELIMITER ',' CSV HEADER;"
>
> psql -h $pgServer -d $pgDatabase -U $pgUser -c $copyCommand
>
> how can I debug this?
>

Maybe I'm missing something obvious, but where in the psql command are you
using $connectionString?

-- 
Death to , and butter sauce.
Don't boil me, I'm still alive.
 crustacean!


Re: load fom csv

2024-09-16 Thread Francisco Olarte
On Mon, 16 Sept 2024 at 17:36, Andy Hartman  wrote:

> I'm trying to run this piece of code from Powershell and it just sits
> there and never comes back. There are only 131 records in the csv.
> $connectionString =
> "Host=$pgServer;Database=$pgDatabase;Username=$pgUser;Password=$pgPassword"
> $copyCommand = "\COPY $pgTable FROM '$csvPath' DELIMITER ',' CSV HEADER;"
> psql -h $pgServer -d $pgDatabase -U $pgUser -c $copyCommand
> how can I debug this?
>

I would start by adding -a and -e after "psql".

IIRC Powershell is windows, and in windows shell do not pass command words
preparsed as in *ix to the executable, but a single command line with the
executable must parse. Given the amount of quoting, -a and -e will let you
see the commands are properly sent, and if it is trying to read something
what it is.

I will also try to substitute the -c with a pipe. If it heals, it is
probably a quoting issue.

Also, I just caught Ron's message, and psql might be waiting for a password.

Francisco Olarte.


Re: load fom csv

2024-09-16 Thread Andy Hartman
2024-09-16 12:06:00.968 EDT [4968] ERROR:  relation
"image_classification_master" does not exist
2024-09-16 12:06:00.968 EDT [4968] STATEMENT:  COPY
Image_Classification_Master FROM STDIN DELIMITER ',' CSV HEADER;

On Mon, Sep 16, 2024 at 11:52 AM Francisco Olarte 
wrote:

>
>
> On Mon, 16 Sept 2024 at 17:36, Andy Hartman 
> wrote:
>
>> I'm trying to run this piece of code from Powershell and it just sits
>> there and never comes back. There are only 131 records in the csv.
>> $connectionString =
>> "Host=$pgServer;Database=$pgDatabase;Username=$pgUser;Password=$pgPassword"
>> $copyCommand = "\COPY $pgTable FROM '$csvPath' DELIMITER ',' CSV HEADER;"
>> psql -h $pgServer -d $pgDatabase -U $pgUser -c $copyCommand
>> how can I debug this?
>>
>
> I would start by adding -a and -e after "psql".
>
> IIRC Powershell is windows, and in windows shell do not pass command words
> preparsed as in *ix to the executable, but a single command line with the
> executable must parse. Given the amount of quoting, -a and -e will let you
> see the commands are properly sent, and if it is trying to read something
> what it is.
>
> I will also try to substitute the -c with a pipe. If it heals, it is
> probably a quoting issue.
>
> Also, I just caught Ron's message, and psql might be waiting for a
> password.
>
> Francisco Olarte.
>
>


Re: load fom csv

2024-09-16 Thread Adrian Klaver

On 9/16/24 09:12, Andy Hartman wrote:
2024-09-16 12:06:00.968 EDT [4968] ERROR:  relation 
"image_classification_master" does not exist
2024-09-16 12:06:00.968 EDT [4968] STATEMENT:  COPY  
Image_Classification_Master FROM STDIN DELIMITER ',' CSV HEADER;


I'm assuming this is from the Postgres log.

Best guess is the table name in the database is mixed case and needs to 
be double quoted in the command to preserve the casing.


See:

https://www.postgresql.org/docs/current/sql-syntax-lexical.html#SQL-SYNTAX-IDENTIFIERS

for why.



On Mon, Sep 16, 2024 at 11:52 AM Francisco Olarte 
mailto:fola...@peoplecall.com>> wrote:




On Mon, 16 Sept 2024 at 17:36, Andy Hartman mailto:hartman60h...@gmail.com>> wrote:

I'm trying to run this piece of code from Powershell and it just
sits there and never comes back. There are only 131 records in
the csv.
$connectionString =

"Host=$pgServer;Database=$pgDatabase;Username=$pgUser;Password=$pgPassword"
$copyCommand = "\COPY $pgTable FROM '$csvPath' DELIMITER ',' CSV
HEADER;"
psql -h $pgServer -d $pgDatabase -U $pgUser -c $copyCommand
how can I debug this? 



I would start by adding -a and -e after "psql".

IIRC Powershell is windows, and in windows shell do not pass command
words preparsed as in *ix to the executable, but a single command
line with the executable must parse. Given the amount of quoting, -a
and -e will let you see the commands are properly sent, and if it is
trying to read something what it is.

I will also try to substitute the -c with a pipe. If it heals, it is
probably a quoting issue.

Also, I just caught Ron's message, and psql might be waiting for a
password.

Francisco Olarte.



--
Adrian Klaver
adrian.kla...@aklaver.com





Re: load fom csv

2024-09-16 Thread Andy Hartman
It Looks correct.

$pgTable = "image_classification_master"




On Mon, Sep 16, 2024 at 12:17 PM Adrian Klaver 
wrote:

> On 9/16/24 09:12, Andy Hartman wrote:
> > 2024-09-16 12:06:00.968 EDT [4968] ERROR:  relation
> > "image_classification_master" does not exist
> > 2024-09-16 12:06:00.968 EDT [4968] STATEMENT:  COPY
> > Image_Classification_Master FROM STDIN DELIMITER ',' CSV HEADER;
>
> I'm assuming this is from the Postgres log.
>
> Best guess is the table name in the database is mixed case and needs to
> be double quoted in the command to preserve the casing.
>
> See:
>
>
> https://www.postgresql.org/docs/current/sql-syntax-lexical.html#SQL-SYNTAX-IDENTIFIERS
>
> for why.
>
> >
> > On Mon, Sep 16, 2024 at 11:52 AM Francisco Olarte
> > mailto:fola...@peoplecall.com>> wrote:
> >
> >
> >
> > On Mon, 16 Sept 2024 at 17:36, Andy Hartman  > > wrote:
> >
> > I'm trying to run this piece of code from Powershell and it just
> > sits there and never comes back. There are only 131 records in
> > the csv.
> > $connectionString =
> >
>  "Host=$pgServer;Database=$pgDatabase;Username=$pgUser;Password=$pgPassword"
> > $copyCommand = "\COPY $pgTable FROM '$csvPath' DELIMITER ',' CSV
> > HEADER;"
> > psql -h $pgServer -d $pgDatabase -U $pgUser -c $copyCommand
> > how can I debug this?
> >
> >
> > I would start by adding -a and -e after "psql".
> >
> > IIRC Powershell is windows, and in windows shell do not pass command
> > words preparsed as in *ix to the executable, but a single command
> > line with the executable must parse. Given the amount of quoting, -a
> > and -e will let you see the commands are properly sent, and if it is
> > trying to read something what it is.
> >
> > I will also try to substitute the -c with a pipe. If it heals, it is
> > probably a quoting issue.
> >
> > Also, I just caught Ron's message, and psql might be waiting for a
> > password.
> >
> > Francisco Olarte.
> >
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>
>


Re: load fom csv

2024-09-16 Thread Adrian Klaver

On 9/16/24 09:46, Andy Hartman wrote:


It Looks correct.

$pgTable = "image_classification_master"


Connect to the database with psql and look at the table name. I'm 
betting it is not image_classification_master. Instead some mixed or all 
upper case version of the name.


I don't use PowerShell or Windows for that matter these days so I can't 
be of much use on the script. I do suspect you will need to some 
escaping to get the table name properly quoted in the script. To work 
through this you need to try what I call the crawl/walk/run process. In 
this case that is:


1) Crawl. Connect using psql and run the \copy in it with hard coded values.

2) Walk. Use psql with the -c argument and supply the command again with 
hard coded values


3) Run. Then use PowerShell and do the variable substitution.






On Mon, Sep 16, 2024 at 12:17 PM Adrian Klaver 
mailto:adrian.kla...@aklaver.com>> wrote:


On 9/16/24 09:12, Andy Hartman wrote:
 > 2024-09-16 12:06:00.968 EDT [4968] ERROR:  relation
 > "image_classification_master" does not exist
 > 2024-09-16 12:06:00.968 EDT [4968] STATEMENT:  COPY
 > Image_Classification_Master FROM STDIN DELIMITER ',' CSV HEADER;

I'm assuming this is from the Postgres log.

Best guess is the table name in the database is mixed case and needs to
be double quoted in the command to preserve the casing.

See:


https://www.postgresql.org/docs/current/sql-syntax-lexical.html#SQL-SYNTAX-IDENTIFIERS
 


for why.

 >
 > On Mon, Sep 16, 2024 at 11:52 AM Francisco Olarte
 > mailto:fola...@peoplecall.com>
>> wrote:
 >
 >
 >
 >     On Mon, 16 Sept 2024 at 17:36, Andy Hartman
mailto:hartman60h...@gmail.com>
 >     >> wrote:
 >
 >         I'm trying to run this piece of code from Powershell and
it just
 >         sits there and never comes back. There are only 131
records in
 >         the csv.
 >         $connectionString =
 >   
  "Host=$pgServer;Database=$pgDatabase;Username=$pgUser;Password=$pgPassword"

 >         $copyCommand = "\COPY $pgTable FROM '$csvPath' DELIMITER
',' CSV
 >         HEADER;"
 >         psql -h $pgServer -d $pgDatabase -U $pgUser -c $copyCommand
 >         how can I debug this?
 >
 >
 >     I would start by adding -a and -e after "psql".
 >
 >     IIRC Powershell is windows, and in windows shell do not pass
command
 >     words preparsed as in *ix to the executable, but a single command
 >     line with the executable must parse. Given the amount of
quoting, -a
 >     and -e will let you see the commands are properly sent, and
if it is
 >     trying to read something what it is.
 >
 >     I will also try to substitute the -c with a pipe. If it
heals, it is
 >     probably a quoting issue.
 >
 >     Also, I just caught Ron's message, and psql might be waiting
for a
 >     password.
 >
 >     Francisco Olarte.
 >

-- 
Adrian Klaver

adrian.kla...@aklaver.com 



--
Adrian Klaver
adrian.kla...@aklaver.com





Re: load fom csv

2024-09-16 Thread Andy Hartman
in LOG

2024-09-16 12:55:37.295 EDT [428] ERROR:  invalid byte sequence for
encoding "UTF8": 0x00
2024-09-16 12:55:37.295 EDT [428] CONTEXT:  COPY
image_classification_master, line 1, column spoolstarttime

On Mon, Sep 16, 2024 at 12:56 PM Adrian Klaver 
wrote:

> On 9/16/24 09:46, Andy Hartman wrote:
> >
> > It Looks correct.
> >
> > $pgTable = "image_classification_master"
>
> Connect to the database with psql and look at the table name. I'm
> betting it is not image_classification_master. Instead some mixed or all
> upper case version of the name.
>
> I don't use PowerShell or Windows for that matter these days so I can't
> be of much use on the script. I do suspect you will need to some
> escaping to get the table name properly quoted in the script. To work
> through this you need to try what I call the crawl/walk/run process. In
> this case that is:
>
> 1) Crawl. Connect using psql and run the \copy in it with hard coded
> values.
>
> 2) Walk. Use psql with the -c argument and supply the command again with
> hard coded values
>
> 3) Run. Then use PowerShell and do the variable substitution.
>
> >
> >
> >
> >
> > On Mon, Sep 16, 2024 at 12:17 PM Adrian Klaver
> > mailto:adrian.kla...@aklaver.com>> wrote:
> >
> > On 9/16/24 09:12, Andy Hartman wrote:
> >  > 2024-09-16 12:06:00.968 EDT [4968] ERROR:  relation
> >  > "image_classification_master" does not exist
> >  > 2024-09-16 12:06:00.968 EDT [4968] STATEMENT:  COPY
> >  > Image_Classification_Master FROM STDIN DELIMITER ',' CSV HEADER;
> >
> > I'm assuming this is from the Postgres log.
> >
> > Best guess is the table name in the database is mixed case and needs
> to
> > be double quoted in the command to preserve the casing.
> >
> > See:
> >
> >
> https://www.postgresql.org/docs/current/sql-syntax-lexical.html#SQL-SYNTAX-IDENTIFIERS
> <
> https://www.postgresql.org/docs/current/sql-syntax-lexical.html#SQL-SYNTAX-IDENTIFIERS
> >
> >
> > for why.
> >
> >  >
> >  > On Mon, Sep 16, 2024 at 11:52 AM Francisco Olarte
> >  > mailto:fola...@peoplecall.com>
> > >>
> wrote:
> >  >
> >  >
> >  >
> >  > On Mon, 16 Sept 2024 at 17:36, Andy Hartman
> > mailto:hartman60h...@gmail.com>
> >  >  > >> wrote:
> >  >
> >  > I'm trying to run this piece of code from Powershell and
> > it just
> >  > sits there and never comes back. There are only 131
> > records in
> >  > the csv.
> >  > $connectionString =
> >  >
> >
>  "Host=$pgServer;Database=$pgDatabase;Username=$pgUser;Password=$pgPassword"
> >  > $copyCommand = "\COPY $pgTable FROM '$csvPath' DELIMITER
> > ',' CSV
> >  > HEADER;"
> >  > psql -h $pgServer -d $pgDatabase -U $pgUser -c
> $copyCommand
> >  > how can I debug this?
> >  >
> >  >
> >  > I would start by adding -a and -e after "psql".
> >  >
> >  > IIRC Powershell is windows, and in windows shell do not pass
> > command
> >  > words preparsed as in *ix to the executable, but a single
> command
> >  > line with the executable must parse. Given the amount of
> > quoting, -a
> >  > and -e will let you see the commands are properly sent, and
> > if it is
> >  > trying to read something what it is.
> >  >
> >  > I will also try to substitute the -c with a pipe. If it
> > heals, it is
> >  > probably a quoting issue.
> >  >
> >  > Also, I just caught Ron's message, and psql might be waiting
> > for a
> >  > password.
> >  >
> >  > Francisco Olarte.
> >  >
> >
> > --
> > Adrian Klaver
> > adrian.kla...@aklaver.com 
> >
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>
>


Re: load fom csv

2024-09-16 Thread Adrian Klaver

On 9/16/24 10:00, Andy Hartman wrote:

in LOG

2024-09-16 12:55:37.295 EDT [428] ERROR:  invalid byte sequence for 
encoding "UTF8": 0x00
2024-09-16 12:55:37.295 EDT [428] CONTEXT:  COPY 
image_classification_master, line 1, column spoolstarttime




I'm heading out the door I can't step you through the process, I can 
point you at:


https://www.postgresql.org/docs/current/multibyte.html#MULTIBYTE-AUTOMATIC-CONVERSION

Others will be able to answer the specifics.


--
Adrian Klaver
adrian.kla...@aklaver.com





IO related waits

2024-09-16 Thread veem v
Hi,
One of our application using RDS postgres. In one of our streaming
applications(using flink) which processes 100's of millions of transactions
each day, we are using row by row transaction processing for inserting data
into the postgres database and commit is performed for each row. We are
seeing heavy IO:XactSynch wait events during the data load and also high
overall response time.

Architecture team is suggesting to enable asynch io if possible, so that
the streaming client will not wait for the commit confirmation from the
database. So I want to understand , how asynch io can be enabled and if any
downsides of doing this?

Regards
Veem


Re: IO related waits

2024-09-16 Thread Christophe Pettus



> On Sep 16, 2024, at 13:24, veem v  wrote:
> Architecture team is suggesting to enable asynch io if possible, so that the 
> streaming client will not wait for the commit confirmation from the database. 
> So I want to understand , how asynch io can be enabled and if any downsides 
> of doing this? 

"Async I/O" has a specific meaning that's not currently applicable to 
PostgreSQL.  What is available is "synchronous_commit".  This setting is by 
default on.  When it's on, each commit waits until the associated WAL 
information has been flushed to disk, and then returns.  If it is turned off, 
the commit returns more or less immediately, and the WAL flush happens 
asynchronously from the commit.

The upside is that the session can proceed without waiting for the WAL flush.  
The downside is that on a server crash, some transactions may not have been 
fully committed to the database, and will be missing when the database 
restarts.  The database won't be corrupted (as in, you try to use it and get 
errors), but it will be "earlier in time" than the application might expect.  
It's pretty common to turn it off for high-ingestion-rate situations, 
especially where the application can detect and replay missing transactions on 
a crash.



Re: IO related waits

2024-09-16 Thread Adrian Klaver

On 9/16/24 13:24, veem v wrote:

Hi,
One of our application using RDS postgres. In one of our streaming 
applications(using flink) which processes 100's of millions of 
transactions each day, we are using row by row transaction processing 
for inserting data into the postgres database and commit is performed 
for each row. We are seeing heavy IO:XactSynch wait events during the 
data load and also high overall response time.


Architecture team is suggesting to enable asynch io if possible, so that 
the streaming client will not wait for the commit confirmation from the 
database. So I want to understand , how asynch io can be enabled and if 
any downsides of doing this?


Are you referring to this?:

https://nightlies.apache.org/flink/flink-docs-release-1.20/docs/dev/datastream/operators/asyncio/

If not then you will need to be more specific.



Regards
Veem


--
Adrian Klaver
adrian.kla...@aklaver.com





Re: IO related waits

2024-09-16 Thread veem v
On Tue, 17 Sept 2024 at 03:41, Adrian Klaver 
wrote:

>
> Are you referring to this?:
>
>
> https://nightlies.apache.org/flink/flink-docs-release-1.20/docs/dev/datastream/operators/asyncio/
>
> If not then you will need to be more specific.
>
>
Yes, I was referring to this one. So what can be the caveats in this
approach, considering transactions meant to be ACID compliant as financial
transactions.Additionally I was not aware of the parameter
"synchronous_commit" in DB side which will mimic the synchronous commit.

Would both of these mimic the same asynchronous behaviour and achieves the
same, which means the client data load throughput will increase because the
DB will not wait for those data to be written to the WAL and give a
confirmation back to the client and also the client will not wait for the
DB to give a confirmation back on the data to be persisted in the DB or
not?. Also, as in the backend the flushing of the WAL to the disk has to
happen anyway(just that it will be delayed now), so can this method cause
contention in the database storage side if the speed in which the data gets
ingested from the client is not getting written to the disk , and if it can
someway impact the data consistency for the read queries?


Re: update faster way

2024-09-16 Thread Muhammad Usman Khan
Hi,

You can solve this problem using Citus in PostgreSQL, which is specifically
designed for parallelism

SELECT create_distributed_table('tab_part1', 'partition_key');
SELECT create_distributed_table('reference_tab', 'reference_key');

UPDATE tab_part1
SET column1 = reftab.code
FROM reference_tab reftab
WHERE tab_part1.column1 = reftab.column1;


On Sat, 14 Sept 2024 at 08:22, yudhi s  wrote:

> Hello,
> We have to update a column value(from numbers like '123' to codes like
> 'abc' by looking into a reference table data) in a partitioned table with
> billions of rows in it, with each partition having 100's millions rows. As
> we tested for ~30million rows it's taking ~20minutes to update. So if we go
> by this calculation, it's going to take days for updating all the values.
> So my question is
>
> 1) If there is any inbuilt way of running the update query in parallel
> (e.g. using parallel hints etc) to make it run faster?
> 2) should we run each individual partition in a separate session (e.g.
> five partitions will have the updates done at same time from 5 different
> sessions)? And will it have any locking effect or we can just start the
> sessions and let them run without impacting our live transactions?
>
> UPDATE tab_part1
> SET column1 = reftab.code
> FROM reference_tab reftab
> WHERE tab_part1.column1 = subquery.column1;
>
> Regards
> Yudhi
>


Re: load fom csv

2024-09-16 Thread Muhammad Usman Khan
Hi,
Try the following options:


   - Check if psql is working independently:
   psql -h $pgServer -d $pgDatabase -U $pgUser -c "SELECT 1;"
   - Check for permission issues on the CSV file
   - Run the command manually without variables
   psql -h your_host -d your_db -U your_user -c "\COPY your_table FROM
   'C:/path/to/your/file.csv' DELIMITER ',' CSV HEADER;"
   - set a timeout using the PGCONNECT_TIMEOUT environment variable:
   $env:PGCONNECT_TIMEOUT=30



On Mon, 16 Sept 2024 at 20:35, Andy Hartman  wrote:

> I'm trying to run this piece of code from Powershell and it just sits
> there and never comes back. There are only 131 records in the csv.
>
> $connectionString =
> "Host=$pgServer;Database=$pgDatabase;Username=$pgUser;Password=$pgPassword"
> $copyCommand = "\COPY $pgTable FROM '$csvPath' DELIMITER ',' CSV HEADER;"
>
> psql -h $pgServer -d $pgDatabase -U $pgUser -c $copyCommand
>
>
> how can I debug this?
>
> Table layout
>
> [image: image.png]
>