On 2024-07-03 02:07, Fujii Masao wrote:
However, if we support REJECT_LIMIT, I'm not sure if the ON_ERROR
option is still necessary.
I remembered another reason for the necessity of ON_ERROR.
ON_ERROR defines how to behave when encountering an error and it just
accepts 'ignore' and 'stop' currently, but is expected to support other
options such as saving details of errors to a table[1].
ON_ERROR=stop is a synonym for REJECT_LIMIT=infinity, but I imagine
REJECT_LIMIT would not replace future options of ON_ERROR.
Considering this and the option we want to add this time is to specify
an upper limit on the number or ratio of errors, the name of this option
like "reject_limit" seems better than "ignore_errors".
On Fri, Jul 5, 2024 at 4:13 PM torikoshia <torikos...@oss.nttdata.com>
wrote:
On 2024-07-05 12:59, Fujii Masao wrote:
On 2024/07/04 12:05, torikoshia wrote:
I'm going to update it after discussing the option format as
described
below.
Updated the patch.
0001 sets limit by the absolute number of error rows and 0002 sets limit
by ratio of the error.
If we choose "all" as the keyword, renaming the option to
IGNORE_ERRORS
might be more intuitive and easier to understand than REJECT_LIMIT.
I feel that 'infinite' and 'unlimited' are unfamiliar values for
PostgreSQL parameters, so 'all' might be better and IGNORE_ERRORS would
be a better parameter name as your suggestion.
As described above, attached patch adopts REJECT_LIMIT, so it uses
"infinity".
This makes me think it might be better to treat REJECT_LIMIT as
an additional option for ON_ERROR=stop instead of ON_ERROR=ignore
if we adopt your patch. Since ON_ERROR=stop is the default,
users could set the maximum number of allowed errors by specifying
only REJECT_LIMIT. Otherwise, they would need to specify both
ON_ERROR=ignore and REJECT_LIMIT.
That makes sense.
On my second thought, whatever value ON_ERROR is specified(e.g. ignore,
stop, table), it seems fine to use REJECT_LIMIT.
I feel REJECT_LIMIT has both "ignore" and "stop" characteristics,
meaning it ignores errors until it reaches REJECT_LIMIT and stops when
it exceeds the REJECT_LIMIT.
And REJECT_LIMIT seems orthogonal to 'table', which specifies where to
save error details.
Attached patch allows using REJECT_LIMIT regardless of the ON_ERROR
option value.
[1]
https://www.postgresql.org/message-id/flat/CACJufxH_OJpVra=0c4ow8fbxhj7hemcvatnepa5vaursena...@mail.gmail.com
--
Regards,
--
Atsushi Torikoshi
NTT DATA Group Corporation
From d00e4a404fc9c6ccc8aec3885210909a21c55005 Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi <torikos...@oss.nttdata.com>
Date: Wed, 17 Jul 2024 21:47:52 +0900
Subject: [PATCH v2] Add new COPY option REJECT_LIMIT number
---
doc/src/sgml/ref/copy.sgml | 20 +++++++++++++
src/backend/commands/copy.c | 46 +++++++++++++++++++++++++++++
src/backend/commands/copyfrom.c | 6 ++++
src/include/commands/copy.h | 10 +++++++
src/test/regress/expected/copy2.out | 14 +++++++++
src/test/regress/sql/copy2.sql | 30 +++++++++++++++++++
6 files changed, 126 insertions(+)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 1518af8a04..ee84d9661f 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -44,6 +44,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
FORCE_NOT_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
FORCE_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
ON_ERROR <replaceable class="parameter">error_action</replaceable>
+ REJECT_LIMIT { <replaceable class="parameter">integer</replaceable> | INFINITY }
ENCODING '<replaceable class="parameter">encoding_name</replaceable>'
LOG_VERBOSITY <replaceable class="parameter">verbosity</replaceable>
</synopsis>
@@ -411,6 +412,25 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><literal>REJECT_LIMIT</literal></term>
+ <listitem>
+ <para>
+ When a positive integer value is specified, <command>COPY</command> limits
+ the maximum tolerable number of errors while converting a column's input
+ value into its data type.
+ If input data caused more errors than the specified value, entire
+ <command>COPY</command> fails.
+ Otherwise, <command>COPY</command> discards the input row and continues
+ with the next one.
+ </para>
+ <para>
+ When specified <literal>INFINITY</literal>, <command>COPY</command> ignores all
+ the errors. This is a synonym for <literal>ON_ERROR</literal> <literal>ignore</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><literal>ENCODING</literal></term>
<listitem>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index df7a4a21c9..f298614043 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -415,6 +415,43 @@ defGetCopyOnErrorChoice(DefElem *def, ParseState *pstate, bool is_from)
return COPY_ON_ERROR_STOP; /* keep compiler quiet */
}
+/*
+ * Extract a CopyRejectLimits values from a DefElem.
+ */
+static CopyRejectLimits
+defGetCopyRejectLimitOptions(DefElem *def)
+{
+ CopyRejectLimits limits;
+ int64 num_err;
+
+ switch(nodeTag(def->arg))
+ {
+ case T_Integer:
+ num_err = defGetInt64(def);
+ if (num_err <= 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("number for REJECT_LIMIT must be greater than zero")));
+ break;
+ case T_String:
+ if (pg_strcasecmp(defGetString(def), "INFINITY") == 0)
+ /* when set to 0, it is treated as no limit */
+ num_err = 0;
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("string for REJECT_LIMIT must be 'INFINITY'")));
+ break;
+ default:
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("value for REJECT_LIMIT must be positive integer or 'INFINITY'")));
+ }
+ limits.num_err = num_err;
+
+ return limits;
+}
+
/*
* Extract a CopyLogVerbosityChoice value from a DefElem.
*/
@@ -466,6 +503,7 @@ ProcessCopyOptions(ParseState *pstate,
bool header_specified = false;
bool on_error_specified = false;
bool log_verbosity_specified = false;
+ bool reject_limit_specified = false;
ListCell *option;
/* Support external use for option sanity checking */
@@ -632,6 +670,14 @@ ProcessCopyOptions(ParseState *pstate,
log_verbosity_specified = true;
opts_out->log_verbosity = defGetCopyLogVerbosityChoice(defel, pstate);
}
+ else if (strcmp(defel->defname, "reject_limit") == 0)
+ {
+ if (reject_limit_specified)
+ errorConflictingDefElem(defel, pstate);
+ reject_limit_specified = true;
+ opts_out->on_error = COPY_ON_ERROR_IGNORE;
+ opts_out->reject_limits = defGetCopyRejectLimitOptions(defel);
+ }
else
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index ce4d62e707..65d950ef07 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1012,6 +1012,12 @@ CopyFrom(CopyFromState cstate)
pgstat_progress_update_param(PROGRESS_COPY_TUPLES_SKIPPED,
++skipped);
+ if (cstate->opts.reject_limits.num_err &&
+ skipped > cstate->opts.reject_limits.num_err)
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("exceeded the number specified by REJECT_LIMIT \"%lld\"",
+ (long long) cstate->opts.reject_limits.num_err)));
continue;
}
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 141fd48dc1..86ba1cbe16 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -49,6 +49,15 @@ typedef enum CopyLogVerbosityChoice
COPY_LOG_VERBOSITY_VERBOSE, /* logs additional messages */
} CopyLogVerbosityChoice;
+/*
+ * A struct to hold reject_limit options, in a parsed form.
+ * More values to be added in another patch.
+ */
+typedef struct CopyRejectLimits
+{
+ int64 num_err; /* maximum tolerable number of errors */
+} CopyRejectLimits;
+
/*
* A struct to hold COPY options, in a parsed form. All of these are related
* to formatting, except for 'freeze', which doesn't really belong here, but
@@ -83,6 +92,7 @@ typedef struct CopyFormatOptions
bool convert_selectively; /* do selective binary conversion? */
CopyOnErrorChoice on_error; /* what to do when error happened */
CopyLogVerbosityChoice log_verbosity; /* verbosity of logged messages */
+ CopyRejectLimits reject_limits; /* thresholds of reject_limit */
List *convert_select; /* list of column names (can be NIL) */
} CopyFormatOptions;
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 931542f268..be04bac6b1 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -85,6 +85,10 @@ COPY x from stdin (log_verbosity default, log_verbosity verbose);
ERROR: conflicting or redundant options
LINE 1: COPY x from stdin (log_verbosity default, log_verbosity verb...
^
+COPY x from stdin (reject_limit 'INFINITY', reject_limit 3);
+ERROR: conflicting or redundant options
+LINE 1: COPY x from stdin (reject_limit 'INFINITY', reject_limit 3);
+ ^
-- incorrect options
COPY x to stdin (format BINARY, delimiter ',');
ERROR: cannot specify DELIMITER in BINARY mode
@@ -116,6 +120,8 @@ COPY x to stdout (log_verbosity unsupported);
ERROR: COPY LOG_VERBOSITY "unsupported" not recognized
LINE 1: COPY x to stdout (log_verbosity unsupported);
^
+COPY x from stdin with (reject_limit 0);
+ERROR: number for REJECT_LIMIT must be greater than zero
-- too many columns in column list: should fail
COPY x (a, b, c, d, e, d, c) from stdin;
ERROR: column "d" specified more than once
@@ -789,6 +795,14 @@ CONTEXT: COPY check_ign_err, line 1: "1 {1}"
COPY check_ign_err FROM STDIN WITH (on_error ignore);
ERROR: extra data after last expected column
CONTEXT: COPY check_ign_err, line 1: "1 {1} 3 abc"
+-- tests for reject_limit option
+COPY check_ign_err FROM STDIN WITH (reject_limit 3);
+ERROR: exceeded the number specified by REJECT_LIMIT "3"
+CONTEXT: COPY check_ign_err, line 5, column n: ""
+COPY check_ign_err FROM STDIN WITH (reject_limit 4);
+NOTICE: 4 rows were skipped due to data type incompatibility
+COPY check_ign_err FROM STDIN WITH (reject_limit 'INFINITY');
+NOTICE: 4 rows were skipped due to data type incompatibility
-- clean up
DROP TABLE forcetest;
DROP TABLE vistest;
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 8b14962194..2ddc50ce8d 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -68,6 +68,7 @@ COPY x from stdin (convert_selectively (a), convert_selectively (b));
COPY x from stdin (encoding 'sql_ascii', encoding 'sql_ascii');
COPY x from stdin (on_error ignore, on_error ignore);
COPY x from stdin (log_verbosity default, log_verbosity verbose);
+COPY x from stdin (reject_limit 'INFINITY', reject_limit 3);
-- incorrect options
COPY x to stdin (format BINARY, delimiter ',');
@@ -82,6 +83,7 @@ COPY x to stdout (format TEXT, force_null(a));
COPY x to stdin (format CSV, force_null(a));
COPY x to stdin (format BINARY, on_error unsupported);
COPY x to stdout (log_verbosity unsupported);
+COPY x from stdin with (reject_limit 0);
-- too many columns in column list: should fail
COPY x (a, b, c, d, e, d, c) from stdin;
@@ -557,6 +559,34 @@ COPY check_ign_err FROM STDIN WITH (on_error ignore);
1 {1} 3 abc
\.
+-- tests for reject_limit option
+COPY check_ign_err FROM STDIN WITH (reject_limit 3);
+6 {6} 6
+a {7} 7
+8 {8} 8888888888
+9 {a, 9} 9
+
+10 {10} 10
+\.
+
+COPY check_ign_err FROM STDIN WITH (reject_limit 4);
+6 {6} 6
+a {7} 7
+8 {8} 8888888888
+9 {a, 9} 9
+
+10 {10} 10
+\.
+
+COPY check_ign_err FROM STDIN WITH (reject_limit 'INFINITY');
+6 {6} 6
+a {7} 7
+8 {8} 8888888888
+9 {a, 9} 9
+
+10 {10} 10
+\.
+
-- clean up
DROP TABLE forcetest;
DROP TABLE vistest;
base-commit: 86d33987e8b0364b468c9b40c5f2a0a1aed87ef1
--
2.39.2
From 3ba99a391d96cda49c1d45ed8c965bc65cb675c3 Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi <torikos...@oss.nttdata.com>
Date: Wed, 17 Jul 2024 21:55:27 +0900
Subject: [PATCH v2] Add new COPY option REJECT_LIMIT ratio
---
doc/src/sgml/ref/copy.sgml | 11 ++++++++++-
src/backend/commands/copy.c | 18 +++++++++++++-----
src/backend/commands/copyfrom.c | 10 ++++++++++
src/include/commands/copy.h | 1 +
src/test/regress/expected/copy2.out | 7 +++++++
src/test/regress/sql/copy2.sql | 19 +++++++++++++++++++
6 files changed, 60 insertions(+), 6 deletions(-)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index ee84d9661f..bdeed92e8e 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -44,7 +44,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
FORCE_NOT_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
FORCE_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
ON_ERROR <replaceable class="parameter">error_action</replaceable>
- REJECT_LIMIT { <replaceable class="parameter">integer</replaceable> | INFINITY }
+ REJECT_LIMIT { <replaceable class="parameter">integer</replaceable> | <replaceable class="parameter">floating point</replaceable> | INFINITY }
ENCODING '<replaceable class="parameter">encoding_name</replaceable>'
LOG_VERBOSITY <replaceable class="parameter">verbosity</replaceable>
</synopsis>
@@ -424,6 +424,15 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
Otherwise, <command>COPY</command> discards the input row and continues
with the next one.
</para>
+ <para>
+ When a positive floating point value is specified, <command>COPY</command>
+ limits the maximum ratio of errors while converting a column's input
+ value into its data type.
+ If input data caused an error ratio greater than the specified value,
+ entire <command>COPY</command> fails.
+ Otherwise, <command>COPY</command> discards only the input rows where
+ errors occured and copies all the other rows.
+ </para>
<para>
When specified <literal>INFINITY</literal>, <command>COPY</command> ignores all
the errors. This is a synonym for <literal>ON_ERROR</literal> <literal>ignore</literal>.
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index f298614043..0930826d70 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -422,7 +422,8 @@ static CopyRejectLimits
defGetCopyRejectLimitOptions(DefElem *def)
{
CopyRejectLimits limits;
- int64 num_err;
+ uint64 num_err = 0;
+ float ratio_err = 0;
switch(nodeTag(def->arg))
{
@@ -433,14 +434,20 @@ defGetCopyRejectLimitOptions(DefElem *def)
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("number for REJECT_LIMIT must be greater than zero")));
break;
+ case T_Float:
+ ratio_err = defGetNumeric(def);
+ if (ratio_err <= 0 || ratio_err >= 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("float for REJECT_LIMIT must be greater than zero and smaller than 1")));
+ break;
case T_String:
- if (pg_strcasecmp(defGetString(def), "INFINITY") == 0)
- /* when set to 0, it is treated as no limit */
- num_err = 0;
- else
+ if (pg_strcasecmp(defGetString(def), "INFINITY") != 0)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("string for REJECT_LIMIT must be 'INFINITY'")));
+
+ /* when set to 0, it is treated as no limit */
break;
default:
ereport(ERROR,
@@ -448,6 +455,7 @@ defGetCopyRejectLimitOptions(DefElem *def)
errmsg("value for REJECT_LIMIT must be positive integer or 'INFINITY'")));
}
limits.num_err = num_err;
+ limits.ratio_err = ratio_err;
return limits;
}
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 65d950ef07..69485a25a1 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -646,6 +646,7 @@ CopyFrom(CopyFromState cstate)
int64 processed = 0;
int64 excluded = 0;
int64 skipped = 0;
+ double ratio_err = 0;
bool has_before_insert_row_trig;
bool has_instead_insert_row_trig;
bool leafpart_use_multi_insert = false;
@@ -1310,6 +1311,15 @@ CopyFrom(CopyFromState cstate)
CopyMultiInsertInfoFlush(&multiInsertInfo, NULL, &processed);
}
+ ratio_err = (double) skipped / (processed + skipped);
+ if (cstate->opts.reject_limits.ratio_err &&
+ cstate->opts.reject_limits.ratio_err < ratio_err)
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("exceeded the ratio specified by REJECT_LIMIT \"%f\", the error ratio was: \"%f\"",
+ cstate->opts.reject_limits.ratio_err,
+ ratio_err)));
+
/* Done, clean up */
error_context_stack = errcallback.previous;
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 86ba1cbe16..ad2606763d 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -56,6 +56,7 @@ typedef enum CopyLogVerbosityChoice
typedef struct CopyRejectLimits
{
int64 num_err; /* maximum tolerable number of errors */
+ float ratio_err; /* maximum tolerable ratio of errors */
} CopyRejectLimits;
/*
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index be04bac6b1..584abac15c 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -122,6 +122,8 @@ LINE 1: COPY x to stdout (log_verbosity unsupported);
^
COPY x from stdin with (reject_limit 0);
ERROR: number for REJECT_LIMIT must be greater than zero
+COPY x from stdin with (reject_limit 1.1);
+ERROR: float for REJECT_LIMIT must be greater than zero and smaller than 1
-- too many columns in column list: should fail
COPY x (a, b, c, d, e, d, c) from stdin;
ERROR: column "d" specified more than once
@@ -803,6 +805,11 @@ COPY check_ign_err FROM STDIN WITH (reject_limit 4);
NOTICE: 4 rows were skipped due to data type incompatibility
COPY check_ign_err FROM STDIN WITH (reject_limit 'INFINITY');
NOTICE: 4 rows were skipped due to data type incompatibility
+COPY check_ign_err FROM STDIN WITH (reject_limit 0.6);
+ERROR: exceeded the ratio specified by REJECT_LIMIT "0.600000", the error ratio was: "0.666667"
+CONTEXT: COPY check_ign_err, line 7: ""
+COPY check_ign_err FROM STDIN WITH (reject_limit 0.7);
+NOTICE: 4 rows were skipped due to data type incompatibility
-- clean up
DROP TABLE forcetest;
DROP TABLE vistest;
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 2ddc50ce8d..39f44d941f 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -84,6 +84,7 @@ COPY x to stdin (format CSV, force_null(a));
COPY x to stdin (format BINARY, on_error unsupported);
COPY x to stdout (log_verbosity unsupported);
COPY x from stdin with (reject_limit 0);
+COPY x from stdin with (reject_limit 1.1);
-- too many columns in column list: should fail
COPY x (a, b, c, d, e, d, c) from stdin;
@@ -587,6 +588,24 @@ a {7} 7
10 {10} 10
\.
+COPY check_ign_err FROM STDIN WITH (reject_limit 0.6);
+6 {6} 6
+a {7} 7
+8 {8} 8888888888
+9 {a, 9} 9
+
+10 {10} 10
+\.
+
+COPY check_ign_err FROM STDIN WITH (reject_limit 0.7);
+6 {6} 6
+a {7} 7
+8 {8} 8888888888
+9 {a, 9} 9
+
+10 {10} 10
+\.
+
-- clean up
DROP TABLE forcetest;
DROP TABLE vistest;
base-commit: 86d33987e8b0364b468c9b40c5f2a0a1aed87ef1
--
2.39.2