On Tue, Jul 23, 2024 at 1:35 PM Fujii Masao <masao.fu...@oss.nttdata.com> wrote:

Thanks for your review.

On 2024/07/22 21:37, torikoshia wrote:
On Fri, Jul 19, 2024 at 11:48 PM Junwang Zhao <zhjw...@gmail.com> wrote:
Thanks for the comment.

In patch 0002, the ratio is calculated by the already skipped/processed rows, but what if a user wants to copy 1000 rows, and he/she can tolerate 10 error rows, so he/she might set *reject_limit 0.01*, but one bad row in the first 100 rows will fail the entire command, this might surprise the user.

Since the ratio is calculated after all data is processed, the case "one bad row in the first 100 rows will fail the entire command" doesn't happen:

Yes, but is this the desired behavior when using the ratio threshold?
I was thinking that COPY should fail once the error ratio (errors vs.
total rows in the input file) reaches the threshold.

Users might expect like you.
However, implementing it would need a row number counting feature as you pointed out, and it seems an overkill.

Describing the difference between ratio and number in the manual might help, but it might be better to make REJECT_LIMIT support only the number of errors and leave it to the user to calculate the number from the ratio.

I'd like to hear if anyone has an opinion on the need for supporting ratio.
I remember David prefers ratio[1].

+ This option must be used with <literal>ON_ERROR</literal> to be set to
+      other than <literal>stop</literal>.

Regarding the ON_ERROR option, now it only has two values.
Instead of saying "other than stop," we should simply use "ignore"
for clarity and intuition?

I'll Modify it.
The reason for the roundabout wording was the expectation that on_error would support values other than these in the future, but as you point out, there are currently only two.


+ When specified <literal>INFINITY</literal>, <command>COPY</command> ignores all + the errors. This is a synonym for <literal>ON_ERROR</literal> <literal>ignore</literal>.

For the INFINITY value, the description is a bit unclear.
As I understand it, INFINITY is the default for REJECT_LIMIT.
So, if ON_ERROR=ignore is set without specifying REJECT_LIMIT,
COPY will ignore all errors, similar to when REJECT_LIMIT=INFINITY is used.


In line with my previous suggestion, if we support only REJECT_LIMIT
without ON_ERROR, having INFINITY makes sense. However,
with ON_ERROR supported, REJECT_LIMIT=INFINITY seems redundant.
Users can just set ON_ERROR=ignore to ignore all errors,
making INFINITY unnecessary for REJECT_LIMIT. This is open for
discussion, but I believe it's better to remove INFINITY from
the REJECT_LIMIT options.

Agreed.
Unless there are opposing opinions, I'm going to remove 'INFINITY' in the next patch.

+               else if (strcmp(defel->defname, "reject_limit") == 0)
+               {
+                       if (reject_limit_specified)
+                               errorConflictingDefElem(defel, pstate);
+                       if (!opts_out->on_error)
+                               ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("REJECT_LIMIT requires ON_ERROR to be set to other than stop")));

Using "ignore" instead of "other than stop" in the error message
is clearer and more intuitive.

Agreed.


Checking if ON_ERROR and REJECT_LIMIT are specified should be
done after the foreach() processing of the options. Otherwise,
if REJECT_LIMIT is specified before ON_ERROR, unexpected errors can occur.

---------------
=# copy test from '...' WITH (REJECT_LIMIT 7, ON_ERROR 'ignore');
ERROR:  REJECT_LIMIT requires ON_ERROR to be set to other than stop
---------------

Ugh, I'll modify it.

+                               ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), + errmsg("exceeded the number specified by REJECT_LIMIT \"%lld\"", + (long long) cstate->opts.reject_limits.num_err)));

The error message isn't clear about what exceeded REJECT_LIMIT.
How about saying "skipped more than REJECT_LIMIT rows" or something instead?

Agreed.

+/*
+ * A struct to hold reject_limit options, in a parsed form.
+ * More values to be added in another patch.
+ */

The latter comment doesn't seem necessary or helpful.

Agreed.

[1] https://www.postgresql.org/message-id/CAKFQuwYP91_G6tktYFTZq_CmkZ_%3DzuWjkz1%2B25Nd8bpsrDkx5Q%40mail.gmail.com

--
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation


Reply via email to