On Wed, Jan 8, 2025 at 3:05 PM Kirill Reshke <reshkekir...@gmail.com> wrote: > > So, in this version you essentially removed support for REJECT_LIMIT + > SET_TO_NULL feature? Looks like a promising change. It is more likely > to see this committed. > So, +1 on that too. > > However, v9 lacks tests for REJECT_LIMIT vs erroneous rows tests. > In short, we need this message somewhere in a regression test. > ``` > ERROR: skipped more than REJECT_LIMIT (xxx) rows due to data type > incompatibility > ``` >
hi. you already answered this question. since we do not support REJECT_LIMIT+SET_TO_NULL, so these code path would not be reachable. > Also, please update commit msg with all authors and reviewers. This > will make committer job a little bit easier > commit message polished. here and there cosmetic changes. I think there are three remaining issues that may need more attention 1. Table 27.42. pg_stat_progress_copy View (<structname>pg_stat_progress_copy</structname>) column pg_stat_progress_copy.tuples_skipped now the description is "" When the ON_ERROR option is set to ignore, this value shows the number of tuples skipped due to malformed data. When the ON_ERROR option is set to set_to_null, this value shows the number of tuples where malformed data was converted to NULL. """ now the column name tuples_skipped would not be that suitable for (on_error set_to_null). since now it is not tuple skipped, it is in a tuple some value was set to null. Or we can skip progress reports for (on_error set_to_null) case. 2. The doc is not very great, I guess. 3. do we settled (on_error set_to_null) syntax.
From 2f77abbb058c952715838d31b1e04f678a079e30 Mon Sep 17 00:00:00 2001 From: jian he <jian.universal...@gmail.com> Date: Fri, 10 Jan 2025 14:33:55 +0800 Subject: [PATCH v10 1/1] extent "on_error action", introduce new option: on_error set_to_null. due to current grammar, we cannot use "on_error null", so i choose on_error set_to_null. Any data type conversion errors during the COPY FROM process will result in the affected column being set to NULL. This only applicable when using the non-binary format for COPY FROM. However, the not-null constraint will still be enforced. If a conversion error leads to a NULL value in a column that has a not-null constraint, a not-null constraint violation error will be triggered. A regression test for a domain with a not-null constraint has been added. Author: Jian He <jian.universal...@gmail.com>, Author: Kirill Reshke <reshkekir...@gmail.com> Reviewed-by: Fujii Masao <masao.fu...@oss.nttdata.com>, Jim Jones <jim.jo...@uni-muenster.de>, "David G. Johnston" <david.g.johns...@gmail.com>, Yugo NAGATA <nag...@sraoss.co.jp>, torikoshia <torikos...@oss.nttdata.com> discussion: https://postgr.es/m/CAKFQuwawy1e6YR4S=j+y7pXqg_Dw1WBVrgvf=bp3d1_asfe...@mail.gmail.com --- doc/src/sgml/monitoring.sgml | 8 ++-- doc/src/sgml/ref/copy.sgml | 27 +++++++---- src/backend/commands/copy.c | 6 ++- src/backend/commands/copyfrom.c | 33 +++++++++---- src/backend/commands/copyfromparse.c | 46 +++++++++++++++++- src/bin/psql/tab-complete.in.c | 2 +- src/include/commands/copy.h | 1 + src/include/commands/copyfrom_internal.h | 4 +- src/test/regress/expected/copy2.out | 60 ++++++++++++++++++++++++ src/test/regress/sql/copy2.sql | 46 ++++++++++++++++++ 10 files changed, 205 insertions(+), 28 deletions(-) diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml index d0d176cc54..6639561384 100644 --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgml/monitoring.sgml @@ -5975,10 +5975,10 @@ FROM pg_stat_get_backend_idset() AS backendid; <structfield>tuples_skipped</structfield> <type>bigint</type> </para> <para> - Number of tuples skipped because they contain malformed data. - This counter only advances when a value other than - <literal>stop</literal> is specified to the <literal>ON_ERROR</literal> - option. + When the <literal>ON_ERROR</literal> option is set to <literal>ignore</literal>, + this value shows the number of tuples skipped due to malformed data. + When the <literal>ON_ERROR</literal> option is set to <literal>set_to_null</literal>, + this value shows the number of tuples where malformed data was converted to NULL. </para></entry> </row> </tbody> diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml index 8394402f09..4346fb0756 100644 --- a/doc/src/sgml/ref/copy.sgml +++ b/doc/src/sgml/ref/copy.sgml @@ -394,23 +394,34 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable Specifies how to behave when encountering an error converting a column's input value into its data type. An <replaceable class="parameter">error_action</replaceable> value of - <literal>stop</literal> means fail the command, while - <literal>ignore</literal> means discard the input row and continue with the next one. + <literal>stop</literal> means fail the command, + <literal>ignore</literal> means discard the input row and continue with the next one, and + <literal>set_to_null</literal> means replace columns containing erroneous input values with <literal>null</literal> and move to the next field. The default is <literal>stop</literal>. </para> <para> - The <literal>ignore</literal> option is applicable only for <command>COPY FROM</command> + The <literal>ignore</literal> and <literal>set_to_null</literal> options are applicable only for <command>COPY FROM</command> when the <literal>FORMAT</literal> is <literal>text</literal> or <literal>csv</literal>. </para> <para> - A <literal>NOTICE</literal> message containing the ignored row count is + For <literal>ignore</literal> option, + a <literal>NOTICE</literal> message containing the ignored row count is emitted at the end of the <command>COPY FROM</command> if at least one - row was discarded. When <literal>LOG_VERBOSITY</literal> option is set to - <literal>verbose</literal>, a <literal>NOTICE</literal> message + row was discarded. + For <literal>set_to_null</literal> option, + a <literal>NOTICE</literal> message containing the row count that erroneous input values replaced by to null happened is + emitted at the end of the <command>COPY FROM</command> if at least one row was replaced. + </para> + <para> + When <literal>LOG_VERBOSITY</literal> option is set to + <literal>verbose</literal>, for <literal>ignore</literal> option, a <literal>NOTICE</literal> message containing the line of the input file and the column name whose input - conversion has failed is emitted for each discarded row. + conversion has failed is emitted for each discarded row; + for <literal>set_to_null</literal> option, + a <literal>NOTICE</literal> message specifies the line number of the input file and column name + where the input value was replaced with NULL due to input conversion failure. When it is set to <literal>silent</literal>, no message is emitted - regarding ignored rows. + regarding input conversion failed rows. </para> </listitem> </varlistentry> diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c index cfca9d9dc2..afe60758d4 100644 --- a/src/backend/commands/copy.c +++ b/src/backend/commands/copy.c @@ -403,12 +403,14 @@ defGetCopyOnErrorChoice(DefElem *def, ParseState *pstate, bool is_from) parser_errposition(pstate, def->location))); /* - * Allow "stop", or "ignore" values. + * Allow "stop", "ignore", "set_to_null" values. */ if (pg_strcasecmp(sval, "stop") == 0) return COPY_ON_ERROR_STOP; if (pg_strcasecmp(sval, "ignore") == 0) return COPY_ON_ERROR_IGNORE; + if (pg_strcasecmp(sval, "set_to_null") == 0) + return COPY_ON_ERROR_NULL; ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE), @@ -918,7 +920,7 @@ ProcessCopyOptions(ParseState *pstate, (errcode(ERRCODE_SYNTAX_ERROR), errmsg("only ON_ERROR STOP is allowed in BINARY mode"))); - if (opts_out->reject_limit && !opts_out->on_error) + if (opts_out->reject_limit && opts_out->on_error != COPY_ON_ERROR_IGNORE) ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE), /*- translator: first and second %s are the names of COPY option, e.g. diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c index 0cbd05f560..33bd67767e 100644 --- a/src/backend/commands/copyfrom.c +++ b/src/backend/commands/copyfrom.c @@ -1029,6 +1029,10 @@ CopyFrom(CopyFromState cstate) continue; } + /* Report that this tuple some value was replaced with NULL by the ON_ERROR clause */ + if (cstate->opts.on_error == COPY_ON_ERROR_NULL && cstate->num_errors > 0) + pgstat_progress_update_param(PROGRESS_COPY_TUPLES_SKIPPED, + cstate->num_errors); ExecStoreVirtualTuple(myslot); /* @@ -1321,14 +1325,22 @@ CopyFrom(CopyFromState cstate) /* Done, clean up */ error_context_stack = errcallback.previous; - if (cstate->opts.on_error != COPY_ON_ERROR_STOP && - cstate->num_errors > 0 && + if (cstate->num_errors > 0 && cstate->opts.log_verbosity >= COPY_LOG_VERBOSITY_DEFAULT) - ereport(NOTICE, - errmsg_plural("%llu row was skipped due to data type incompatibility", - "%llu rows were skipped due to data type incompatibility", - (unsigned long long) cstate->num_errors, - (unsigned long long) cstate->num_errors)); + { + if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE) + ereport(NOTICE, + errmsg_plural("%llu row was skipped due to data type incompatibility", + "%llu rows were skipped due to data type incompatibility", + (unsigned long long) cstate->num_errors, + (unsigned long long) cstate->num_errors)); + else if (cstate->opts.on_error == COPY_ON_ERROR_NULL) + ereport(NOTICE, + errmsg_plural("Erroneous values in %llu row was replaced with NULL", + "Erroneous values in %llu rows were replaced with NULL", + (unsigned long long) cstate->num_errors, + (unsigned long long) cstate->num_errors)); + } if (bistate != NULL) FreeBulkInsertState(bistate); @@ -1474,10 +1486,11 @@ BeginCopyFrom(ParseState *pstate, cstate->escontext->error_occurred = false; /* - * Currently we only support COPY_ON_ERROR_IGNORE. We'll add other - * options later + * Currently we only support COPY_ON_ERROR_IGNORE, COPY_ON_ERROR_NULL. + * We'll add other options later */ - if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE) + if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE || + cstate->opts.on_error == COPY_ON_ERROR_NULL) cstate->escontext->details_wanted = false; } else diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c index caccdc8563..8d5ab08491 100644 --- a/src/backend/commands/copyfromparse.c +++ b/src/backend/commands/copyfromparse.c @@ -871,6 +871,7 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext, int fldct; int fieldno; char *string; + bool current_row_erroneous = false; /* read raw fields in the next line */ if (!NextCopyFromRawFields(cstate, &field_strings, &fldct)) @@ -949,7 +950,8 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext, /* * If ON_ERROR is specified with IGNORE, skip rows with soft - * errors + * errors. If ON_ERROR is specified with set_to_null, try + * to replace with NULL. */ else if (!InputFunctionCallSafe(&in_functions[m], string, @@ -960,9 +962,47 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext, { Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP); + if (cstate->opts.on_error == COPY_ON_ERROR_NULL) + { + /* + * we use this count the number of rows (not fields) that + * successfully applied the on_error set_to_null + */ + if (!current_row_erroneous) + current_row_erroneous = true; + + /* + * we need another InputFunctionCallSafe so we can error out + * not-null violation for domain with not-null constraint. + */ + cstate->escontext->error_occurred = false; + if (InputFunctionCallSafe(&in_functions[m], + NULL, + typioparams[m], + att->atttypmod, + (Node *) cstate->escontext, + &values[m])) + { + nulls[m] = true; + values[m] = (Datum) 0; + if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE) + ereport(NOTICE, + errmsg("column \"%s\" was set to NULL due to data type incompatibility at line %llu", + cstate->cur_attname, + (unsigned long long) cstate->cur_lineno)); + continue; + } + else + ereport(ERROR, + errcode(ERRCODE_NOT_NULL_VIOLATION), + errmsg("domain %s does not allow null values", format_type_be(typioparams[m])), + errdatatype(typioparams[m])); + } + cstate->num_errors++; - if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE) + if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE && + cstate->opts.on_error == COPY_ON_ERROR_IGNORE) { /* * Since we emit line number and column info in the below @@ -1001,6 +1041,8 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext, cstate->cur_attval = NULL; } + if (current_row_erroneous) + cstate->num_errors++; Assert(fieldno == attr_count); } else diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c index 81cbf10aa2..04a155ad5f 100644 --- a/src/bin/psql/tab-complete.in.c +++ b/src/bin/psql/tab-complete.in.c @@ -3250,7 +3250,7 @@ match_previous_words(int pattern_id, COMPLETE_WITH("FORMAT", "FREEZE", "DELIMITER", "NULL", "HEADER", "QUOTE", "ESCAPE", "FORCE_QUOTE", "FORCE_NOT_NULL", "FORCE_NULL", "ENCODING", "DEFAULT", - "ON_ERROR", "LOG_VERBOSITY"); + "ON_ERROR", "SET_TO_NULL", "LOG_VERBOSITY"); /* Complete COPY <sth> FROM|TO filename WITH (FORMAT */ else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "FORMAT")) diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h index 06dfdfef72..7ebf4f7893 100644 --- a/src/include/commands/copy.h +++ b/src/include/commands/copy.h @@ -38,6 +38,7 @@ typedef enum CopyOnErrorChoice { COPY_ON_ERROR_STOP = 0, /* immediately throw errors, default */ COPY_ON_ERROR_IGNORE, /* ignore errors */ + COPY_ON_ERROR_NULL, /* set error field to null */ } CopyOnErrorChoice; /* diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h index 1d8ac8f62e..50759eaf1c 100644 --- a/src/include/commands/copyfrom_internal.h +++ b/src/include/commands/copyfrom_internal.h @@ -98,7 +98,9 @@ typedef struct CopyFromStateData ErrorSaveContext *escontext; /* soft error trapped during in_functions * execution */ uint64 num_errors; /* total number of rows which contained soft - * errors */ + * errors, for ON_ERROR set_to_null, it's the + * number of rows successfully converted to null + */ int *defmap; /* array of default att numbers related to * missing att */ ExprState **defexprs; /* array of default att expressions for all diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out index 64ea33aeae..377be5b99d 100644 --- a/src/test/regress/expected/copy2.out +++ b/src/test/regress/expected/copy2.out @@ -81,6 +81,10 @@ COPY x from stdin (on_error ignore, on_error ignore); ERROR: conflicting or redundant options LINE 1: COPY x from stdin (on_error ignore, on_error ignore); ^ +COPY x from stdin (on_error set_to_null, on_error ignore); +ERROR: conflicting or redundant options +LINE 1: COPY x from stdin (on_error set_to_null, on_error ignore); + ^ COPY x from stdin (log_verbosity default, log_verbosity verbose); ERROR: conflicting or redundant options LINE 1: COPY x from stdin (log_verbosity default, log_verbosity verb... @@ -92,6 +96,10 @@ COPY x from stdin (format BINARY, null 'x'); ERROR: cannot specify NULL in BINARY mode COPY x from stdin (format BINARY, on_error ignore); ERROR: only ON_ERROR STOP is allowed in BINARY mode +COPY x from stdin (format BINARY, on_error set_to_null); +ERROR: only ON_ERROR STOP is allowed in BINARY mode +COPY x FROM stdin (on_error set_to_null, reject_limit 2); +ERROR: COPY REJECT_LIMIT requires ON_ERROR to be set to IGNORE COPY x from stdin (on_error unsupported); ERROR: COPY ON_ERROR "unsupported" not recognized LINE 1: COPY x from stdin (on_error unsupported); @@ -124,6 +132,10 @@ COPY x to stdout (format BINARY, on_error unsupported); ERROR: COPY ON_ERROR cannot be used with COPY TO LINE 1: COPY x to stdout (format BINARY, on_error unsupported); ^ +COPY x to stdin (on_error set_to_null); +ERROR: COPY ON_ERROR cannot be used with COPY TO +LINE 1: COPY x to stdin (on_error set_to_null); + ^ COPY x from stdin (log_verbosity unsupported); ERROR: COPY LOG_VERBOSITY "unsupported" not recognized LINE 1: COPY x from stdin (log_verbosity unsupported); @@ -769,6 +781,51 @@ CONTEXT: COPY check_ign_err NOTICE: skipping row due to data type incompatibility at line 8 for column "k": "a" CONTEXT: COPY check_ign_err NOTICE: 6 rows were skipped due to data type incompatibility +CREATE DOMAIN d_int_not_null AS INT NOT NULL CHECK(value > 0); +CREATE DOMAIN d_int_positive_maybe_null AS INT CHECK(value > 0); +CREATE TABLE t_on_error_null (a d_int_not_null, b d_int_positive_maybe_null, c INT); +\pset null NULL +--fail, column a cannot set to NULL value +COPY t_on_error_null FROM STDIN WITH (on_error set_to_null); +ERROR: domain d_int_not_null does not allow null values +CONTEXT: COPY t_on_error_null, line 1, column a: null input +--fail, column a is domain with not-null constraint +COPY t_on_error_null FROM STDIN WITH (on_error set_to_null); +ERROR: domain d_int_not_null does not allow null values +CONTEXT: COPY t_on_error_null, line 1, column a: "a" +--fail, column a cannot set to NULL value +COPY t_on_error_null FROM STDIN WITH (on_error set_to_null); +ERROR: domain d_int_not_null does not allow null values +CONTEXT: COPY t_on_error_null, line 1, column a: "-1" +--fail. less data +COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_to_null); +ERROR: missing data for column "c" +CONTEXT: COPY t_on_error_null, line 1: "1,1" +--fail. extra data +COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_to_null); +ERROR: extra data after last expected column +CONTEXT: COPY t_on_error_null, line 1: "1,2,3,4" +--ok +COPY t_on_error_null FROM STDIN WITH (on_error set_to_null, log_verbosity verbose); +NOTICE: column "b" was set to NULL due to data type incompatibility at line 1 +CONTEXT: COPY t_on_error_null, line 1, column b: "a" +NOTICE: column "c" was set to NULL due to data type incompatibility at line 1 +CONTEXT: COPY t_on_error_null, line 1, column c: "d" +NOTICE: column "b" was set to NULL due to data type incompatibility at line 2 +CONTEXT: COPY t_on_error_null, line 2, column b: "b" +NOTICE: column "c" was set to NULL due to data type incompatibility at line 3 +CONTEXT: COPY t_on_error_null, line 3, column c: "e" +NOTICE: Erroneous values in 3 rows were replaced with NULL +-- check inserted content +select * from t_on_error_null; + a | b | c +----+------+------ + 10 | NULL | NULL + 11 | NULL | 12 + 13 | 14 | NULL +(3 rows) + +\pset null '' -- tests for on_error option with log_verbosity and null constraint via domain CREATE DOMAIN dcheck_ign_err2 varchar(15) NOT NULL; CREATE TABLE check_ign_err2 (n int, m int[], k int, l dcheck_ign_err2); @@ -828,6 +885,9 @@ DROP VIEW instead_of_insert_tbl_view; DROP VIEW instead_of_insert_tbl_view_2; DROP FUNCTION fun_instead_of_insert_tbl(); DROP TABLE check_ign_err; +DROP TABLE t_on_error_null; +DROP DOMAIN d_int_not_null; +DROP DOMAIN d_int_positive_maybe_null; DROP TABLE check_ign_err2; DROP DOMAIN dcheck_ign_err2; DROP TABLE hard_err; diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql index 45273557ce..6cd477af14 100644 --- a/src/test/regress/sql/copy2.sql +++ b/src/test/regress/sql/copy2.sql @@ -67,12 +67,15 @@ COPY x from stdin (force_null (a), force_null (b)); COPY x from stdin (convert_selectively (a), convert_selectively (b)); COPY x from stdin (encoding 'sql_ascii', encoding 'sql_ascii'); COPY x from stdin (on_error ignore, on_error ignore); +COPY x from stdin (on_error set_to_null, on_error ignore); COPY x from stdin (log_verbosity default, log_verbosity verbose); -- incorrect options COPY x from stdin (format BINARY, delimiter ','); COPY x from stdin (format BINARY, null 'x'); COPY x from stdin (format BINARY, on_error ignore); +COPY x from stdin (format BINARY, on_error set_to_null); +COPY x FROM stdin (on_error set_to_null, reject_limit 2); COPY x from stdin (on_error unsupported); COPY x from stdin (format TEXT, force_quote(a)); COPY x from stdin (format TEXT, force_quote *); @@ -87,6 +90,7 @@ COPY x from stdin (format TEXT, force_null *); COPY x to stdout (format CSV, force_null(a)); COPY x to stdout (format CSV, force_null *); COPY x to stdout (format BINARY, on_error unsupported); +COPY x to stdin (on_error set_to_null); COPY x from stdin (log_verbosity unsupported); COPY x from stdin with (reject_limit 1); COPY x from stdin with (on_error ignore, reject_limit 0); @@ -534,6 +538,45 @@ a {2} 2 8 {8} 8 \. +CREATE DOMAIN d_int_not_null AS INT NOT NULL CHECK(value > 0); +CREATE DOMAIN d_int_positive_maybe_null AS INT CHECK(value > 0); +CREATE TABLE t_on_error_null (a d_int_not_null, b d_int_positive_maybe_null, c INT); + +\pset null NULL +--fail, column a cannot set to NULL value +COPY t_on_error_null FROM STDIN WITH (on_error set_to_null); +\N 11 13 +\. + +--fail, column a is domain with not-null constraint +COPY t_on_error_null FROM STDIN WITH (on_error set_to_null); +a 11 14 +\. + +--fail, column a cannot set to NULL value +COPY t_on_error_null FROM STDIN WITH (on_error set_to_null); +-1 11 13 +\. + +--fail. less data +COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_to_null); +1,1 +\. +--fail. extra data +COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_to_null); +1,2,3,4 +\. + +--ok +COPY t_on_error_null FROM STDIN WITH (on_error set_to_null, log_verbosity verbose); +10 a d +11 b 12 +13 14 e +\. + +-- check inserted content +select * from t_on_error_null; +\pset null '' -- tests for on_error option with log_verbosity and null constraint via domain CREATE DOMAIN dcheck_ign_err2 varchar(15) NOT NULL; CREATE TABLE check_ign_err2 (n int, m int[], k int, l dcheck_ign_err2); @@ -603,6 +646,9 @@ DROP VIEW instead_of_insert_tbl_view; DROP VIEW instead_of_insert_tbl_view_2; DROP FUNCTION fun_instead_of_insert_tbl(); DROP TABLE check_ign_err; +DROP TABLE t_on_error_null; +DROP DOMAIN d_int_not_null; +DROP DOMAIN d_int_positive_maybe_null; DROP TABLE check_ign_err2; DROP DOMAIN dcheck_ign_err2; DROP TABLE hard_err; -- 2.34.1