On Thu, 8 Oct 2009, Rod Taylor wrote:
> 1) Having copy remember which specific line caused the error. So it can replay lines 1 through 487 in a subtransaction since it knows those are successful. Run 488 in its own subtransaction. Run 489 through ... in a new subtransaction.
This is the standard technique used in other bulk loaders I'm aware of.
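A minimal sketch of that targeted retry, assuming a hypothetical load_rows() callback that inserts a batch in one subtransaction and raises BadRow with the 0-based offset of the first failing row; the names here are purely illustrative, not anything from the actual COPY code:

    class BadRow(Exception):
        def __init__(self, index):
            super().__init__("row %d rejected" % index)
            self.index = index

    def load_with_retry(rows, load_rows, on_reject):
        start = 0
        while start < len(rows):
            batch = rows[start:]
            try:
                load_rows(batch)                  # whole remainder in one subtransaction
                break
            except BadRow as e:
                good = batch[:e.index]            # e.g. lines 1..487: known good
                if good:
                    load_rows(good)               # replay them; assumes only bad-data
                                                  # failures, not commit failures
                on_reject(rows[start + e.index])  # e.g. line 488 goes to a reject pile
                start += e.index + 1              # resume from line 489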
> 2) Increasing the number of records per subtransaction if the data is clean. It wouldn't take long before you were inserting millions of records per subtransaction for a large data set.
You can make it adaptive in both directions with some boundaries: double the batch size every time there's a clean commit, halve it every time there's an error, start batching at 1024, and bound the size to the range [1, 1048576]. That's close to optimal behavior here when combined with the targeted retry described in (1).
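To make that concrete, here's a rough sketch of such an adaptive sizer; the try_commit() callback, which would attempt the batch in a subtransaction and report success or failure, is invented for illustration and isn't anything in the patch being discussed:

    # Double on a clean commit, halve on an error, start at 1024, stay in [1, 1048576].
    MIN_BATCH, START_BATCH, MAX_BATCH = 1, 1024, 1 << 20

    class BatchSizer:
        def __init__(self):
            self.size = START_BATCH

        def on_success(self):
            self.size = min(self.size * 2, MAX_BATCH)

        def on_error(self):
            self.size = max(self.size // 2, MIN_BATCH)

    def load_in_batches(rows, try_commit):
        sizer = BatchSizer()
        pos = 0
        while pos < len(rows):
            batch = rows[pos:pos + sizer.size]
            if try_commit(batch):          # clean commit of the whole batch
                pos += len(batch)
                sizer.on_success()
            else:                          # subtransaction rolled back
                if len(batch) == 1:        # a lone bad row: hand it to the
                    pos += 1               # targeted retry / reject path from (1)
                sizer.on_error()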
The retry scheduling and batch sizing are the trivial, well-understood parts here. Actually getting all of this to play nicely with transactions and commit failures (rather than just bad-data failures) is what's difficult.
--
* Greg Smith  gsm...@gregsmith.com  http://www.gregsmith.com  Baltimore, MD