On Thu, 8 Oct 2009, Rod Taylor wrote:
> 1) Having copy remember which specific line caused the error. So it can replay lines 1 through 487 in a subtransaction since it knows those are successful. Run 488 in its own subtransaction. Run 489 through ... in a new subtransaction.
This is the standard technique used in other bulk loaders I'm aware of.
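A minimal sketch of that targeted retry, assuming a hypothetical load_rows() callback that inserts a batch in one subtransaction and raises BadRow with the 0-based offset of the first failing row; the names here are purely illustrative, not anything from the actual COPY code:

    class BadRow(Exception):
        def __init__(self, index):
            super().__init__("row %d rejected" % index)
            self.index = index

    def load_with_retry(rows, load_rows, on_reject):
        start = 0
        while start < len(rows):
            batch = rows[start:]
            try:
                load_rows(batch)                  # whole remainder in one subtransaction
                break
            except BadRow as e:
                good = batch[:e.index]            # e.g. lines 1..487: known good
                if good:
                    load_rows(good)               # replay them; assumes only bad-data
                                                  # failures, not commit failures
                on_reject(rows[start + e.index])  # e.g. line 488 goes to a reject pile
                start += e.index + 1              # resume from line 489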
> 2) Increasing the number of records per subtransaction if the data is clean. It wouldn't take long before you were inserting millions of records per subtransaction for a large data set.
You can make it adaptive in both directions with some boundaries: double the batch size every time there's a clean commit, halve it every time there's an error, start batching at 1024, and bound the size to the range [1, 1048576]. That's close to optimal behavior here when combined with the targeted retry described in (1).
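To make that concrete, here's a rough sketch of such an adaptive sizer; the try_commit() callback, which would attempt the batch in a subtransaction and report success or failure, is invented for illustration and isn't anything in the patch being discussed:

    # Double on a clean commit, halve on an error, start at 1024, stay in [1, 1048576].
    MIN_BATCH, START_BATCH, MAX_BATCH = 1, 1024, 1 << 20

    class BatchSizer:
        def __init__(self):
            self.size = START_BATCH

        def on_success(self):
            self.size = min(self.size * 2, MAX_BATCH)

        def on_error(self):
            self.size = max(self.size // 2, MIN_BATCH)

    def load_in_batches(rows, try_commit):
        sizer = BatchSizer()
        pos = 0
        while pos < len(rows):
            batch = rows[pos:pos + sizer.size]
            if try_commit(batch):          # clean commit of the whole batch
                pos += len(batch)
                sizer.on_success()
            else:                          # subtransaction rolled back
                if len(batch) == 1:        # a lone bad row: hand it to the
                    pos += 1               # targeted retry / reject path from (1)
                sizer.on_error()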
The retry scheduling and batch sizing are the trivial, well-understood parts here. Actually getting all of this to play nicely with transactions and commit failures (rather than just bad-data failures) is what's difficult.
--
* Greg Smith  gsm...@gregsmith.com  http://www.gregsmith.com  Baltimore, MD