On 2013.01.13 5:58 PM, Robert James wrote:
I need to INSERT a large number of records.  For performance reasons,
I'd rather send them to Postgres in one giant INSERT.

However, if there's a problem in one record (eg one row doesn't meet a
constraint), I'd still like the others saved.  That is, I specifically
DO NOT want atomic behavior.  It's okay to silently drop bad data in
this case - I don't even need to know about it.

Is there any way to do this, or am I forced to put each record into
its own INSERT?

Here's the best way:

1. Create a temporary staging table and bulk-insert all your data into it. This table resembles the actual destination and has slots to hold all the data, but it has weaker constraints, e.g. no unique/PK or foreign keys, so that your raw data is guaranteed to be accepted.

2. Use all the nice data-analysis tools that SQL gives you and perform an INSERT...SELECT... into the actual destination from the staging table, applying whatever filters, tests, cleanups, or joins with other tables (such as the destination table) you need to preemptively take care of anything that would have caused a constraint failure. A sketch of both steps follows below.
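
For example, a minimal sketch of the two steps (the table, columns, and constraints are made up for illustration; substitute your own schema, and swap the multi-row INSERT for COPY when loading from a file):

  -- Hypothetical destination table with the constraints you need to survive.
  CREATE TABLE items (id int PRIMARY KEY, qty int CHECK (qty > 0));

  -- Step 1: staging table with the same shape but no constraints,
  -- so every raw row is guaranteed to load.
  CREATE TEMP TABLE items_staging (id int, qty int);

  -- Bulk-load the raw data; COPY ... FROM STDIN (or \copy in psql) is the
  -- usual route for big files, a multi-row INSERT keeps the example short.
  INSERT INTO items_staging VALUES (1, 5), (2, -3), (1, 7), (3, 4);

  -- Step 2: move only the rows that will pass the destination's constraints.
  INSERT INTO items (id, qty)
  SELECT DISTINCT ON (s.id) s.id, s.qty    -- drop duplicate ids within the batch
  FROM items_staging s
  WHERE s.qty > 0                          -- pre-check the CHECK constraint
    AND NOT EXISTS (SELECT 1 FROM items i  -- skip ids already in the destination
                    WHERE i.id = s.id);

The rejected rows simply stay behind in the staging table, so you can still inspect them later if you change your mind about not needing to know.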

Modify to taste.

In fact, this is my generally recommended method for doing any kind of bulk data import, because it's much easier to clean data using SQL than otherwise, and it's all very efficient resource-wise.

-- Darren Duncan


