On 2023-12-15 05:48, Masahiko Sawada wrote:
Thanks for joining this discussion!
I've read this thread and the latest patch. IIUC with SAVE_ERROR
option, COPY FROM creates an error table for the target table and
writes error information there.
While I agree that the final shape of this feature would be something
like that design, I'm concerned some features are missing in order to
make this feature useful in practice. For instance, error logs are
inserted to error tables without bounds, meaning that users who want
to tolerate errors during COPY FROM will have to truncate or drop the
error tables periodically, or the database will grow with error logs
without limit. Ideally such maintenance work should be done by the
database. There might be some users who want to log such conversion
errors in server logs to avoid such maintenance work. I think we
should provide an option for where to write, at least. Also, since the
error tables are normal user tables internally, error logs are also
replicated to subscribers if there is a publication FOR ALL TABLES,
unlike system catalogs. I think some users would not like such
behavior.
Looking at SAVE_ERROR feature closely, I think it consists of two
separate features. That is, it enables COPY FROM to load data while
(1) tolerating errors and (2) logging errors to somewhere (i.e., an
error table). If we implement only (1), it would be like COPY FROM
tolerate errors infinitely and log errors to /dev/null. The user
cannot see the error details but I guess it could still help some
cases as Andres mentioned[1] (it might be a good idea to send the
number of rows successfully loaded in a NOTICE message if some rows
could not be loaded). Then with (2), COPY FROM can log error
information to somewhere such as tables and server logs and the user
can select it.
+1.
I may be biased since I wrote some ~v6 patches which just output the
soft errors and number of skipped rows to log, but I think just (1)
would be worth implementing as you pointed out and I like if users could
choose where to log output.
I think there would be situations where it is preferable to save errors
to server log even considering problems which were pointed out in [1],
i.e. manually loading data.
[1]
https://www.postgresql.org/message-id/739953.1699467519%40sss.pgh.pa.us
feature incrementally. The first step would be something like an
option to ignore all errors or an option to specify the maximum number
of errors to tolerate before raising an ERROR. The second step would
be to support logging destinations such as server logs and tables.
Regards,
[1]
https://www.postgresql.org/message-id/20231109002600.fuihn34bjqqgmbjm%40awork3.anarazel.de
--
Regards,
--
Atsushi Torikoshi
NTT DATA Group Corporation