On 12/19/2012 08:13 AM, Kirk Wythers wrote:
I am using version 9.1 and have a large number of files to insert. I am trying 
to use a simple COPY FROM command but have a couple of questions.

1. There are a small number of instances where there are duplicate records that 
are being caught by the primary key (as it is supposed to do). However, the 
duplicate records are artifacts of a recording system and can be ignored. I 
would like to be able to simply skip the duplicate or UPDATE the table with the 
duplicate… Anything that allows the COPY FROM to proceed while adding only one 
of the duplicate records to the table.

2. Since I have several hundred files to perform a COPY FROM on, I'd like to 
automate the import in some way… sort of a grab-all-files-in-the-directory 
approach:

COPY newtable FROM '/directory_of_files/*' WITH CSV HEADER DELIMITER AS ',' 
NULL AS 'NA';


I suppose you could use a trigger to check each record before inserting, but that is likely to be inefficient for bulk loads. A quick bash loop is probably your best bet. Something along the lines of:

for inputfile in /infiledirectory/*.csv
do
    # feed each file to psql; \copy ... from stdin reads it from the pipe
    cat "$inputfile" | psql [connection-params] -c "\copy rawinput from stdin with csv header delimiter ',' null 'NA'"
done

This imports everything into a "staging" table (I called it rawinput). From there you can create your final table with SELECT DISTINCT...
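
Something along these lines should do it, assuming rawinput and newtable have the same column layout (adjust the column lists if they don't):

-- collapse the recording-system duplicates and load the result
-- into the final table in one pass
INSERT INTO newtable
SELECT DISTINCT * FROM rawinput;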

For speed, make sure that you create your staging table as "unlogged".
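
For example, assuming newtable already exists, something like this would give you a staging table with matching columns but no primary key, so the duplicate rows load without tripping the constraint:

-- UNLOGGED skips write-ahead logging, which speeds up bulk loads
-- (the trade-off is that the table is truncated after a crash);
-- LIKE copies the column definitions but not the primary key
CREATE UNLOGGED TABLE rawinput (LIKE newtable);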

Cheers,
Steve


