Thanks for your review!
On Thu, Oct 3, 2024 at 4:27 PM jian he <jian.universal...@gmail.com>
wrote:
mentioning <replaceable class="parameter">maxerror</replaceable> is a
bigint type
or explicitly mentioning the maximum allowed value of "maxerror" would
be great.
Added a description that it allows positive bigint.
On Thu, Oct 3, 2024 at 11:43 PM Fujii Masao
<masao.fu...@oss.nttdata.com> wrote:
+ if (opts_out->reject_limit && !opts_out->on_error)
+ ereport(ERROR,
+
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /*- translator: first and second %s are the names of
COPY
+ * option, e.g. ON_ERROR, third is the value of the
COPY option,
+ * e.g. IGNORE */
+ errmsg("COPY %s requires %s to be set
to %s",
+ "REJECT_LIMIT",
"ON_ERROR", "IGNORE")));
This check might be better moved to other place, for example at
the bottom of ProcessCopyOptions(). I noticed a comment in the code:
"Check for incompatible options (must do these two before inserting
defaults)."
You added your condition after this comment and before inserting the
defaults,
but it's not related to that process. So, moving this check to other
place
seems more appropriate.
Agreed. Moved it to the bottom of ProcessCopyOptions().
While reviewing, I also noticed that the check for
"opts_out->binary && opts_out->on_error != COPY_ON_ERROR_STOP"
is similarly placed before setting the defaults, which might not
be correct. This check should probably be moved as well.
Additionally, the comment mentioning "must do these two" should be
updated to "must do these three." These changes should be handled
in a separate patch.
Agreed and attached 0002 patch.
+ if (cstate->opts.reject_limit &&
Wouldn't it be clearer to use cstate->opts.reject_limit > 0 for better
readability?
Agreed.
+ cstate->num_errors >
cstate->opts.reject_limit)
+ ereport(ERROR,
+
(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+ errmsg("skipped more
than REJECT_LIMIT rows: \"%lld\",",
+ (long
long) cstate->opts.reject_limit)));
To make the error message clearer, similar to the NOTICE about
skipped rows, how about rewording it to:
"skipped more than REJECT_LIMIT (%lld) rows due to data type
incompatibility"?
Agreed.
Also considering when REJECT_LIMIT is specified to 1, attached patch
uses errmsg_plural() instead of errmsg.
--
Regards,
--
Atsushi Torikoshi
NTT DATA Group Corporation
From 360e170b8aee378c0f498d8899d9f3b5d41d0310 Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi <torikos...@oss.nttdata.com>
Date: Mon, 7 Oct 2024 19:57:51 +0900
Subject: [PATCH v7 1/2] Subject: Add REJECT_LIMIT option to COPY command.
9e2d870 enabled the COPY FROM command to skip soft errors, but there
was no limitation about the number of errors and skipped all the soft
errors.
In some cases there would be a limitation how many errors are
tolerable. This patch introduces REJECT_LIMIT to specify how many
errors can be skipped.
---
doc/src/sgml/ref/copy.sgml | 19 +++++++++++++++++
src/backend/commands/copy.c | 33 +++++++++++++++++++++++++++++
src/backend/commands/copyfrom.c | 9 ++++++++
src/include/commands/copy.h | 1 +
src/test/regress/expected/copy2.out | 10 +++++++++
src/test/regress/sql/copy2.sql | 21 ++++++++++++++++++
6 files changed, 93 insertions(+)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index b9413d4892..59a17d7793 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -44,6 +44,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
FORCE_NOT_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
FORCE_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
ON_ERROR <replaceable class="parameter">error_action</replaceable>
+ REJECT_LIMIT <replaceable class="parameter">maxerror</replaceable>
ENCODING '<replaceable class="parameter">encoding_name</replaceable>'
LOG_VERBOSITY <replaceable class="parameter">verbosity</replaceable>
</synopsis>
@@ -413,6 +414,24 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><literal>REJECT_LIMIT</literal></term>
+ <listitem>
+ <para>
+ Specifies the maximum number of errors tolerated while converting a
+ column's input value to its data type, when <literal>ON_ERROR</literal> is
+ set to <literal>ignore</literal>.
+ If the input causes more errors than the specified value, the <command>COPY</command>
+ command fails, even with <literal>ON_ERROR</literal> set to <literal>ignore</literal>.
+ This clause must be used with <literal>ON_ERROR</literal>=<literal>ignore</literal>
+ and <replaceable class="parameter">maxerror</replaceable> must be positive <type>bigint</type>.
+ If not specified, <literal>ON_ERROR</literal>=<literal>ignore</literal>
+ allows an unlimited number of errors, meaning <command>COPY</command> will
+ skip all erroneous data.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><literal>ENCODING</literal></term>
<listitem>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 03eb7a4eba..befab92074 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -418,6 +418,23 @@ defGetCopyOnErrorChoice(DefElem *def, ParseState *pstate, bool is_from)
return COPY_ON_ERROR_STOP; /* keep compiler quiet */
}
+/*
+ * Extract REJECT_LIMIT value from a DefElem.
+ */
+static int64
+defGetCopyRejectLimitOption(DefElem *def)
+{
+ int64 reject_limit = defGetInt64(def);
+
+ if (reject_limit <= 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("REJECT_LIMIT (%lld) must be greater than zero",
+ (long long) reject_limit)));
+
+ return reject_limit;
+}
+
/*
* Extract a CopyLogVerbosityChoice value from a DefElem.
*/
@@ -472,6 +489,7 @@ ProcessCopyOptions(ParseState *pstate,
bool header_specified = false;
bool on_error_specified = false;
bool log_verbosity_specified = false;
+ bool reject_limit_specified = false;
ListCell *option;
/* Support external use for option sanity checking */
@@ -638,6 +656,13 @@ ProcessCopyOptions(ParseState *pstate,
log_verbosity_specified = true;
opts_out->log_verbosity = defGetCopyLogVerbosityChoice(defel, pstate);
}
+ else if (strcmp(defel->defname, "reject_limit") == 0)
+ {
+ if (reject_limit_specified)
+ errorConflictingDefElem(defel, pstate);
+ reject_limit_specified = true;
+ opts_out->reject_limit = defGetCopyRejectLimitOption(defel);
+ }
else
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
@@ -874,6 +899,14 @@ ProcessCopyOptions(ParseState *pstate,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("NULL specification and DEFAULT specification cannot be the same")));
}
+ /* Check on_error */
+ if (opts_out->reject_limit && !opts_out->on_error)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /*- translator: first and second %s are the names of COPY option, e.g.
+ * ON_ERROR, third is the value of the COPY option, e.g. IGNORE */
+ errmsg("COPY %s requires %s to be set to %s",
+ "REJECT_LIMIT", "ON_ERROR", "IGNORE")));
}
/*
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 9139a40785..b04e770cfa 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1018,6 +1018,15 @@ CopyFrom(CopyFromState cstate)
pgstat_progress_update_param(PROGRESS_COPY_TUPLES_SKIPPED,
cstate->num_errors);
+ if (cstate->opts.reject_limit > 0 && \
+ cstate->num_errors > cstate->opts.reject_limit)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+ errmsg_plural("skipped more than REJECT_LIMIT (%lld) row due to data type incompatibility",
+ "skipped more than REJECT_LIMIT (%lld) rows due to data type incompatibility",
+ (long long) cstate->opts.reject_limit,
+ (long long) cstate->opts.reject_limit)));
+
/* Repeat NextCopyFrom() until no soft error occurs */
continue;
}
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 6f64d97fdd..4002a7f538 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -85,6 +85,7 @@ typedef struct CopyFormatOptions
bool convert_selectively; /* do selective binary conversion? */
CopyOnErrorChoice on_error; /* what to do when error happened */
CopyLogVerbosityChoice log_verbosity; /* verbosity of logged messages */
+ int64 reject_limit; /* maximum tolerable number of errors */
List *convert_select; /* list of column names (can be NIL) */
} CopyFormatOptions;
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 4e752977b5..d7d975522f 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -116,6 +116,10 @@ COPY x to stdout (log_verbosity unsupported);
ERROR: COPY LOG_VERBOSITY "unsupported" not recognized
LINE 1: COPY x to stdout (log_verbosity unsupported);
^
+COPY x from stdin with (reject_limit 1);
+ERROR: COPY REJECT_LIMIT requires ON_ERROR to be set to IGNORE
+COPY x from stdin with (on_error ignore, reject_limit 0);
+ERROR: REJECT_LIMIT (0) must be greater than zero
-- too many columns in column list: should fail
COPY x (a, b, c, d, e, d, c) from stdin;
ERROR: column "d" specified more than once
@@ -791,6 +795,12 @@ CONTEXT: COPY check_ign_err, line 1: "1 {1}"
COPY check_ign_err FROM STDIN WITH (on_error ignore);
ERROR: extra data after last expected column
CONTEXT: COPY check_ign_err, line 1: "1 {1} 3 abc"
+-- tests for reject_limit option
+COPY check_ign_err FROM STDIN WITH (on_error ignore, reject_limit 3);
+ERROR: skipped more than REJECT_LIMIT (3) rows due to data type incompatibility
+CONTEXT: COPY check_ign_err, line 5, column n: ""
+COPY check_ign_err FROM STDIN WITH (on_error ignore, reject_limit 4);
+NOTICE: 4 rows were skipped due to data type incompatibility
-- clean up
DROP TABLE forcetest;
DROP TABLE vistest;
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index fa6aa17344..1aa0e41b68 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -82,6 +82,8 @@ COPY x to stdout (format TEXT, force_null(a));
COPY x to stdin (format CSV, force_null(a));
COPY x to stdin (format BINARY, on_error unsupported);
COPY x to stdout (log_verbosity unsupported);
+COPY x from stdin with (reject_limit 1);
+COPY x from stdin with (on_error ignore, reject_limit 0);
-- too many columns in column list: should fail
COPY x (a, b, c, d, e, d, c) from stdin;
@@ -561,6 +563,25 @@ COPY check_ign_err FROM STDIN WITH (on_error ignore);
1 {1} 3 abc
\.
+-- tests for reject_limit option
+COPY check_ign_err FROM STDIN WITH (on_error ignore, reject_limit 3);
+6 {6} 6
+a {7} 7
+8 {8} 8888888888
+9 {a, 9} 9
+
+10 {10} 10
+\.
+
+COPY check_ign_err FROM STDIN WITH (on_error ignore, reject_limit 4);
+6 {6} 6
+a {7} 7
+8 {8} 8888888888
+9 {a, 9} 9
+
+10 {10} 10
+\.
+
-- clean up
DROP TABLE forcetest;
DROP TABLE vistest;
--
2.39.2
From cf0ce6d2e9362d3c36755a306b913c9cc89b0007 Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi <torikos...@oss.nttdata.com>
Date: Mon, 7 Oct 2024 20:10:22 +0900
Subject: [PATCH v7 2/2] Move check for binary mode and on_error option to appropriate location
The current location for checking for binary mode and on_error option
is not appropriate, as this is where checks should be performed before
inserting default values.
Additionally, the comment mentions two checks, but there are now three.
This patch moves the check for incompatible options to the end of
ProcessCopyOptions() and updates the comment accordingly.
---
src/backend/commands/copy.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index befab92074..0b093dbb2a 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -672,7 +672,7 @@ ProcessCopyOptions(ParseState *pstate,
}
/*
- * Check for incompatible options (must do these two before inserting
+ * Check for incompatible options (must do these three before inserting
* defaults)
*/
if (opts_out->binary && opts_out->delim)
@@ -691,11 +691,6 @@ ProcessCopyOptions(ParseState *pstate,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
- if (opts_out->binary && opts_out->on_error != COPY_ON_ERROR_STOP)
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
-
/* Set defaults for omitted options */
if (!opts_out->delim)
opts_out->delim = opts_out->csv_mode ? "," : "\t";
@@ -900,6 +895,11 @@ ProcessCopyOptions(ParseState *pstate,
errmsg("NULL specification and DEFAULT specification cannot be the same")));
}
/* Check on_error */
+ if (opts_out->binary && opts_out->on_error != COPY_ON_ERROR_STOP)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
+
if (opts_out->reject_limit && !opts_out->on_error)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
--
2.39.2