On Thu, 2009-10-08 at 18:23 -0400, Bruce Momjian wrote:
> Dimitri Fontaine wrote:
> > Simon Riggs <si...@2ndquadrant.com> writes:
> > > It will be best to have the ability to have a specific rejection reason
> > > for each row rejected. That way we will be able to tell the difference
> > > between uniqueness violation errors, invalid date format on col7, value
> > > fails check constraint on col22 etc..
> >
> > In case that helps, what pgloader does is logging into two files, named
> > after the table name (not scalable to server-side solution):
> >   table.rej     --- lines it could not load, straight from source file
> >   table.rej.log --- errors as given by the server, plus pgloader comment
> >
> > The pgloader comment is necessary for associating each log line to the
> > source file line, as it's operating by dichotomy, the server always
> > report error on line 1.
> >
> > The idea of having two errors file could be kept though, the aim is to
> > be able to fix the setup then COPY again the table.rej file when it
> > happens the errors are not on the file content. Or for loading into
> > another table, with all columns as text or bytea, then clean data from a
> > procedure.
>
> What would be _cool_ would be to add the ability to have comments in the
> COPY files, like \#, and then the copy data lines and errors could be
> adjacent. (Because of the way we control COPY escaping, adding \# would
> not be a problem. We have \N for null, for example.)

That was my idea also until I heard Dimitri's two file approach. Having
a pristine data file and a matching error file means you can potentially
just resubmit the error file again. Often you need to do things like
trap RI errors and then resubmit them at a later time once the master
rows have entered the system.

-- 
 Simon Riggs           www.2ndQuadrant.com
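
For illustration only, here is a minimal Python sketch of the two file approach described above: rejected lines are copied untouched into one file, and the matching reasons go into a second file, so the reject file stays clean enough to resubmit later. The names load_with_rejects and parse_row are hypothetical, and parse_row is only a stand-in for whatever really loads a row; this is not pgloader's code and not a proposed COPY implementation.

```python
import io


def load_with_rejects(lines, load_row, rej, rej_log):
    """Try to load each source line.

    Lines that fail are written verbatim to `rej` (the pristine reject
    file); the reason is written to `rej_log`, keyed by both the source
    line number and the line number inside the reject file.
    Returns (loaded_count, rejected_count).
    """
    loaded = 0
    rejected = 0
    for lineno, line in enumerate(lines, start=1):
        try:
            load_row(line)
            loaded += 1
        except ValueError as exc:
            rejected += 1
            # Keep the data line exactly as it was, so it can be resubmitted.
            rej.write(line if line.endswith("\n") else line + "\n")
            rej_log.write(
                f"source line {lineno}, reject line {rejected}: {exc}\n"
            )
    return loaded, rejected


def parse_row(line):
    """Hypothetical loader: expects 'integer id <TAB> name'.

    A real loader would send the row to the server; here a ValueError
    stands in for any server-side rejection.
    """
    ident, name = line.rstrip("\n").split("\t")
    return int(ident), name


if __name__ == "__main__":
    source = ["1\talice\n", "x\tbob\n", "3\tcarol\n"]
    rej, rej_log = io.StringIO(), io.StringIO()
    counts = load_with_rejects(source, parse_row, rej, rej_log)
    print(counts)
    print(rej.getvalue(), end="")
    print(rej_log.getvalue(), end="")
```

Because the reject file contains nothing but data lines, it can be passed back through the same function once the cause of the errors has been dealt with, for example after the master rows for an RI failure have arrived.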