On Wed, Jan 10, 2024 at 4:42 PM Masahiko Sawada <sawada.m...@gmail.com> wrote:

Yeah, I'm still thinking it's better to implement this feature
incrementally. Given we're closing to feature freeze, I think it's
unlikely to get the whole feature into PG17 since there are still many
design discussions we need in addition to what Torikoshi-san pointed
out. The feature like "ignore errors" or "logging errors" would have
higher possibilities. Even if we get only these parts of the whole
"error table" feature into PG17, it will make it much easier to
implement "error tables" feature.

+1.
I'm also going to make patch for "logging errors", since this functionality is isolated from v7 patch.

Seems promising. I'll look at the patch.
Thanks a lot!
Sorry to attach v2 if you already reviewed v1..

On 2024-01-11 12:13, jian he wrote:
On Tue, Jan 9, 2024 at 10:36 PM torikoshia <torikos...@oss.nttdata.com> wrote:

On Tue, Dec 19, 2023 at 10:14 AM Masahiko Sawada <sawada.m...@gmail.com>
wrote:
> If we want only such a feature we need to implement it together (the
> patch could be split, though). But if some parts of the feature are
> useful for users as well, I'd recommend implementing it incrementally.
> That way, the patches can get small and it would be easy for reviewers
> and committers to review/commit them.

Jian, how do you think this comment?

Looking back at the discussion so far, it seems that not everyone thinks saving table information is the best idea[1] and some people think just
skipping error data is useful.[2]

Since there are issues to be considered from the design such as
physical/logical replication treatment, putting error information to
table is likely to take time for consensus building and development.

Wouldn't it be better to follow the following advice and develop the
functionality incrementally?

On Fri, Dec 15, 2023 at 4:49 AM Masahiko Sawada
<sawada(dot)mshk(at)gmail(dot)com> wrote:
> So I'm thinking we may be able to implement this
> feature incrementally. The first step would be something like an
> option to ignore all errors or an option to specify the maximum number
> of errors to tolerate before raising an ERROR. The second step would
> be to support logging destinations such as server logs and tables.


Attached a patch for this "first step" with reference to v7 patch, which
logged errors and simpler than latest one.
- This patch adds new option SAVE_ERROR_TO, but currently only supports
'none', which means just skips error data. It is expected to support
'log' and 'table'.
- This patch Skips just soft errors and don't handle other errors such
as missing column data.

Hi.
I made the following change based on your patch
(v1-0001-Add-new-COPY-option-SAVE_ERROR_TO.patch)

* when specified SAVE_ERROR_TO, move the initialization of
ErrorSaveContext to the function BeginCopyFrom.
I think that's the right place to initialize struct CopyFromState field.
* I think your patch when there are N rows have malformed data, then it
will initialize N ErrorSaveContext.
In the struct CopyFromStateData, I changed it to ErrorSaveContext *escontext.
So if an error occurred, you can just set the escontext accordingly.
* doc: mention "If this option is omitted, <command>COPY</command>
stops operation at the first error."
* Since we only support 'none' for now, 'none' means we don't want
ErrorSaveContext metadata,
 so we should set cstate->escontext->details_wanted to false.

BTW I have question and comment about v15 patch:

> +           {
> +               /*
> +               *
> +               * InputFunctionCall is more faster than
> InputFunctionCallSafe.
> +               *
> +               */

Have you measured this?
When I tested it in an older patch, there were no big difference[3].
Thanks for pointing it out, I probably was over thinking.

> - SAVEPOINT SCALAR SCHEMA SCHEMAS SCROLL SEARCH SECOND_P SECURITY
SELECT
> + SAVEPOINT SAVE_ERROR SCALAR SCHEMA SCHEMAS SCROLL SEARCH SECOND_P
SECURITY SELECT

There was a comment that we shouldn't add new keyword for this[4].

Thanks for pointing it out.

Thanks for reviewing!

Updated the patch merging your suggestions except below points:

+   cstate->num_errors = 0;

Since cstate is already initialized in below lines, this may be redundant.

|     /* Allocate workspace and zero all fields */
|     cstate = (CopyFromStateData *) palloc0(sizeof(CopyFromStateData));


 +                   Assert(!cstate->escontext->details_wanted);

I'm not sure this is necessary, considering we're going to add other options like 'table' and 'log', which need details_wanted soon.


--
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation
From a3f14a0e7e9a7b5fb961ad6b6b7b163cf6534a26 Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi <torikos...@oss.nttdata.com>
Date: Fri, 12 Jan 2024 11:32:00 +0900
Subject: [PATCH v2] Add new COPY option SAVE_ERROR_TO

Currently when source data contains unexpected data regarding data type or
range, entire COPY fails. However, in some cases such data can be ignored and
just copying normal data is preferable.

This patch adds a new option SAVE_ERROR_TO, which specifies where to save the
error information. When this option is specified, COPY skips soft errors and
continues copying.

Currently SAVE_ERROR_TO only supports "none". This indicates error information
is not saved and COPY just skips the unexpected data and continues running.

Later works are expected to add more choices, such as 'log' and 'table'.

Author: Damir Belyalov, Atsushi Torikoshi, referenced with jian he's patch.
---
 doc/src/sgml/ref/copy.sgml               | 21 +++++++++++-
 src/backend/commands/copy.c              | 19 +++++++++++
 src/backend/commands/copyfrom.c          | 43 ++++++++++++++++++++++++
 src/backend/commands/copyfromparse.c     | 17 +++++++---
 src/bin/psql/tab-complete.c              |  7 +++-
 src/include/commands/copy.h              |  1 +
 src/include/commands/copyfrom_internal.h |  3 ++
 src/test/regress/expected/copy2.out      | 28 +++++++++++++++
 src/test/regress/sql/copy2.sql           | 27 +++++++++++++++
 9 files changed, 159 insertions(+), 7 deletions(-)

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 18ecc69c33..71941c4ee5 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -43,6 +43,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
     FORCE_QUOTE { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
     FORCE_NOT_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
     FORCE_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
+    SAVE_ERROR_TO '<replaceable class="parameter">location</replaceable>'
     ENCODING '<replaceable class="parameter">encoding_name</replaceable>'
 </synopsis>
  </refsynopsisdiv>
@@ -373,6 +374,23 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
     </listitem>
    </varlistentry>
 
+   <varlistentry>
+    <term><literal>SAVE_ERROR_TO</literal></term>
+    <listitem>
+     <para>
+      Specifies save error information to <replaceable class="parameter">
+      location</replaceable> when there are malformed data in the input.
+      If this option is specified, <command>COPY</command> skips malformed data
+      and continues copying data.
+      Currently only <literal>none</literal> is supported.
+      If this option is omitted, <command>COPY</command> stops operation at the
+      first error.
+      This option is allowed only in <command>COPY FROM</command>, and only when
+      not using <literal>binary</literal> format.
+     </para>
+    </listitem>
+   </varlistentry>
+
    <varlistentry>
     <term><literal>ENCODING</literal></term>
     <listitem>
@@ -556,7 +574,8 @@ COPY <replaceable class="parameter">count</replaceable>
    </para>
 
    <para>
-    <command>COPY</command> stops operation at the first error. This
+    <command>COPY</command> stops operation at the first error when
+    <literal>SAVE_ERROR_TO</literal> is not specified. This
     should not lead to problems in the event of a <command>COPY
     TO</command>, but the target table will already have received
     earlier rows in a <command>COPY FROM</command>. These rows will not
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index fe4cf957d7..5e5e8a5f34 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -571,6 +571,20 @@ ProcessCopyOptions(ParseState *pstate,
 								defel->defname),
 						 parser_errposition(pstate, defel->location)));
 		}
+		else if (strcmp(defel->defname, "save_error_to") == 0)
+		{
+			char	   *location = defGetString(defel);
+
+			if (opts_out->save_error_to)
+				errorConflictingDefElem(defel, pstate);
+			else if (strcmp(location, "none") == 0)
+				opts_out->save_error_to = location;
+			else
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+						 errmsg("COPY save_error_to \"%s\" not recognized", location),
+						 parser_errposition(pstate, defel->location)));
+		}
 		else
 			ereport(ERROR,
 					(errcode(ERRCODE_SYNTAX_ERROR),
@@ -598,6 +612,11 @@ ProcessCopyOptions(ParseState *pstate,
 				(errcode(ERRCODE_SYNTAX_ERROR),
 				 errmsg("cannot specify DEFAULT in BINARY mode")));
 
+	if (opts_out->binary && opts_out->save_error_to)
+		ereport(ERROR,
+				(errcode(ERRCODE_SYNTAX_ERROR),
+				 errmsg("cannot specify SAVE_ERROR_TO in BINARY mode")));
+
 	/* Set defaults for omitted options */
 	if (!opts_out->delim)
 		opts_out->delim = opts_out->csv_mode ? "," : "\t";
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 37836a769c..48484a2597 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -42,6 +42,7 @@
 #include "libpq/libpq.h"
 #include "libpq/pqformat.h"
 #include "miscadmin.h"
+#include "nodes/miscnodes.h"
 #include "optimizer/optimizer.h"
 #include "pgstat.h"
 #include "rewrite/rewriteHandler.h"
@@ -656,6 +657,9 @@ CopyFrom(CopyFromState cstate)
 	Assert(cstate->rel);
 	Assert(list_length(cstate->range_table) == 1);
 
+	if (cstate->opts.save_error_to)
+		Assert(cstate->escontext);
+
 	/*
 	 * The target must be a plain, foreign, or partitioned relation, or have
 	 * an INSTEAD OF INSERT row trigger.  (Currently, such triggers are only
@@ -992,6 +996,26 @@ CopyFrom(CopyFromState cstate)
 		if (!NextCopyFrom(cstate, econtext, myslot->tts_values, myslot->tts_isnull))
 			break;
 
+		if (cstate->opts.save_error_to && cstate->escontext->error_occurred)
+		{
+			/*
+			 * Soft error occured, skip this tuple and save error information
+			 * according to SAVE_ERROR_TO.
+			 */
+			if (strcmp(cstate->opts.save_error_to, "none") == 0)
+				/*
+				 * Just make ErrorSaveContext ready for the next NextCopyFrom.
+				 * Since we don't set details_wanted and error_data is not to be
+				 * filled, just resetting error_occurred is enough.
+				 */
+				cstate->escontext->error_occurred = false;
+			else
+				elog(ERROR, "unexpected SAVE_ERROR_TO location : %s",
+						cstate->opts.save_error_to);
+
+			continue;
+		}
+
 		ExecStoreVirtualTuple(myslot);
 
 		/*
@@ -1281,6 +1305,11 @@ CopyFrom(CopyFromState cstate)
 			CopyMultiInsertInfoFlush(&multiInsertInfo, NULL, &processed);
 	}
 
+	if (cstate->opts.save_error_to && cstate->num_errors > 0)
+		ereport(WARNING,
+				errmsg("%zd rows were skipped due to data type incompatibility",
+					   cstate->num_errors));
+
 	/* Done, clean up */
 	error_context_stack = errcallback.previous;
 
@@ -1419,6 +1448,20 @@ BeginCopyFrom(ParseState *pstate,
 		}
 	}
 
+	/* Set up soft error handler for SAVE_ERROR_TO */
+	if (cstate->opts.save_error_to)
+	{
+		cstate->escontext = makeNode(ErrorSaveContext);
+		cstate->escontext->type = T_ErrorSaveContext;
+		cstate->escontext->error_occurred = false;
+
+		 /* Currently we only support "none". We'll add other options later */
+		if (strcmp(cstate->opts.save_error_to, "none") == 0)
+			cstate->escontext->details_wanted = false;
+	}
+	else
+		cstate->escontext = NULL;
+
 	/* Convert FORCE_NULL name list to per-column flags, check validity */
 	cstate->opts.force_null_flags = (bool *) palloc0(num_phys_attrs * sizeof(bool));
 	if (cstate->opts.force_null_all)
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index af4c36f645..7041815dee 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -70,6 +70,7 @@
 #include "libpq/pqformat.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
+#include "nodes/miscnodes.h"
 #include "pgstat.h"
 #include "port/pg_bswap.h"
 #include "utils/builtins.h"
@@ -955,11 +956,17 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
 
 				values[m] = ExecEvalExpr(defexprs[m], econtext, &nulls[m]);
 			}
-			else
-				values[m] = InputFunctionCall(&in_functions[m],
-											  string,
-											  typioparams[m],
-											  att->atttypmod);
+			/* If SAVE_ERROR_TO is specified, skip rows with soft errors */
+			else if (!InputFunctionCallSafe(&in_functions[m],
+										   string,
+										   typioparams[m],
+										   att->atttypmod,
+										   (Node *) cstate->escontext,
+										   &values[m]))
+			{
+				cstate->num_errors++;
+				return true;
+			}
 
 			cstate->cur_attname = NULL;
 			cstate->cur_attval = NULL;
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 09914165e4..efe2b7cc10 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -2898,12 +2898,17 @@ psql_completion(const char *text, int start, int end)
 	else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "("))
 		COMPLETE_WITH("FORMAT", "FREEZE", "DELIMITER", "NULL",
 					  "HEADER", "QUOTE", "ESCAPE", "FORCE_QUOTE",
-					  "FORCE_NOT_NULL", "FORCE_NULL", "ENCODING", "DEFAULT");
+					  "FORCE_NOT_NULL", "FORCE_NULL", "ENCODING", "DEFAULT",
+					  "SAVE_ERROR_TO");
 
 	/* Complete COPY <sth> FROM|TO filename WITH (FORMAT */
 	else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "FORMAT"))
 		COMPLETE_WITH("binary", "csv", "text");
 
+	/* Complete COPY <sth> FROM filename WITH (SAVE_ERROR_TO */
+	else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "SAVE_ERROR_TO"))
+		COMPLETE_WITH("none");
+
 	/* Complete COPY <sth> FROM <sth> WITH (<options>) */
 	else if (Matches("COPY|\\copy", MatchAny, "FROM", MatchAny, "WITH", MatchAny))
 		COMPLETE_WITH("WHERE");
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index e6c1867a2f..f890b66f26 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -62,6 +62,7 @@ typedef struct CopyFormatOptions
 	bool		force_null_all; /* FORCE_NULL *? */
 	bool	   *force_null_flags;	/* per-column CSV FN flags */
 	bool		convert_selectively;	/* do selective binary conversion? */
+	char	   *save_error_to;	/* where to save error information */
 	List	   *convert_select; /* list of column names (can be NIL) */
 } CopyFormatOptions;
 
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index 715939a907..3744fac017 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -16,6 +16,7 @@
 
 #include "commands/copy.h"
 #include "commands/trigger.h"
+#include "nodes/miscnodes.h"
 
 /*
  * Represents the different source cases we need to worry about at
@@ -94,6 +95,8 @@ typedef struct CopyFromStateData
 								 * default value */
 	FmgrInfo   *in_functions;	/* array of input functions for each attrs */
 	Oid		   *typioparams;	/* array of element types for in_functions */
+	ErrorSaveContext *escontext;	/* soft error trapper during in_functions execution */
+	uint64		num_errors; 	/* total number of rows which contained soft errors */
 	int		   *defmap;			/* array of default att numbers related to
 								 * missing att */
 	ExprState **defexprs;		/* array of default att expressions for all
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index c4178b9c07..4a1777a4fa 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -82,6 +82,8 @@ COPY x to stdin (format BINARY, delimiter ',');
 ERROR:  cannot specify DELIMITER in BINARY mode
 COPY x to stdin (format BINARY, null 'x');
 ERROR:  cannot specify NULL in BINARY mode
+COPY x to stdin (format BINARY, save_error_to none);
+ERROR:  cannot specify SAVE_ERROR_TO in BINARY mode
 COPY x to stdin (format TEXT, force_quote(a));
 ERROR:  COPY FORCE_QUOTE requires CSV mode
 COPY x from stdin (format CSV, force_quote(a));
@@ -94,6 +96,10 @@ COPY x to stdout (format TEXT, force_null(a));
 ERROR:  COPY FORCE_NULL requires CSV mode
 COPY x to stdin (format CSV, force_null(a));
 ERROR:  COPY FORCE_NULL cannot be used with COPY TO
+COPY x to stdin (format BINARY, save_error_to unsupported);
+ERROR:  COPY save_error_to "unsupported" not recognized
+LINE 1: COPY x to stdin (format BINARY, save_error_to unsupported);
+                                        ^
 -- too many columns in column list: should fail
 COPY x (a, b, c, d, e, d, c) from stdin;
 ERROR:  column "d" specified more than once
@@ -710,6 +716,26 @@ SELECT * FROM instead_of_insert_tbl;
 (2 rows)
 
 COMMIT;
+-- tests for SAVE_ERROR_TO option
+CREATE TABLE check_ign_err (n int, m int[], k int);
+COPY check_ign_err FROM STDIN WITH (save_error_to none);
+WARNING:  4 rows were skipped due to data type incompatibility
+SELECT * FROM check_ign_err;
+ n |  m  | k 
+---+-----+---
+ 1 | {1} | 1
+ 5 | {5} | 5
+(2 rows)
+
+-- test datatype error that can't be handled as soft: should fail
+CREATE TABLE hard_err(foo widget);
+COPY hard_err FROM STDIN WITH (save_error_to none);
+ERROR:  invalid input syntax for type widget: "1"
+CONTEXT:  COPY hard_err, line 1, column foo: "1"
+-- test missing data: should fail
+COPY check_ign_err FROM STDIN WITH (save_error_to none);
+ERROR:  missing data for column "k"
+CONTEXT:  COPY check_ign_err, line 1: "1	{1}"
 -- clean up
 DROP TABLE forcetest;
 DROP TABLE vistest;
@@ -724,6 +750,8 @@ DROP TABLE instead_of_insert_tbl;
 DROP VIEW instead_of_insert_tbl_view;
 DROP VIEW instead_of_insert_tbl_view_2;
 DROP FUNCTION fun_instead_of_insert_tbl();
+DROP TABLE check_ign_err;
+DROP TABLE hard_err;
 --
 -- COPY FROM ... DEFAULT
 --
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index a5486f6086..17c5764b42 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -70,12 +70,14 @@ COPY x from stdin (encoding 'sql_ascii', encoding 'sql_ascii');
 -- incorrect options
 COPY x to stdin (format BINARY, delimiter ',');
 COPY x to stdin (format BINARY, null 'x');
+COPY x to stdin (format BINARY, save_error_to none);
 COPY x to stdin (format TEXT, force_quote(a));
 COPY x from stdin (format CSV, force_quote(a));
 COPY x to stdout (format TEXT, force_not_null(a));
 COPY x to stdin (format CSV, force_not_null(a));
 COPY x to stdout (format TEXT, force_null(a));
 COPY x to stdin (format CSV, force_null(a));
+COPY x to stdin (format BINARY, save_error_to unsupported);
 
 -- too many columns in column list: should fail
 COPY x (a, b, c, d, e, d, c) from stdin;
@@ -494,6 +496,29 @@ test1
 SELECT * FROM instead_of_insert_tbl;
 COMMIT;
 
+-- tests for SAVE_ERROR_TO option
+CREATE TABLE check_ign_err (n int, m int[], k int);
+COPY check_ign_err FROM STDIN WITH (save_error_to none);
+1	{1}	1
+a	{2}	2
+3	{3}	3333333333
+4	{a, 4}	4
+
+5	{5}	5
+\.
+SELECT * FROM check_ign_err;
+
+-- test datatype error that can't be handled as soft: should fail
+CREATE TABLE hard_err(foo widget);
+COPY hard_err FROM STDIN WITH (save_error_to none);
+1
+\.
+
+-- test missing data: should fail
+COPY check_ign_err FROM STDIN WITH (save_error_to none);
+1	{1}
+\.
+
 -- clean up
 DROP TABLE forcetest;
 DROP TABLE vistest;
@@ -508,6 +533,8 @@ DROP TABLE instead_of_insert_tbl;
 DROP VIEW instead_of_insert_tbl_view;
 DROP VIEW instead_of_insert_tbl_view_2;
 DROP FUNCTION fun_instead_of_insert_tbl();
+DROP TABLE check_ign_err;
+DROP TABLE hard_err;
 
 --
 -- COPY FROM ... DEFAULT

base-commit: 08c3ad27eb5348d0cbffa843a3edb11534f9904a
-- 
2.39.2

Reply via email to