On 2024/10/02 9:27, Masahiko Sawada wrote:
> Sorry for being late in joining the review of this patch. Both 0001 and 0003 look good to me. I have two comments on the 0002 patch:
Thanks for the review!
> I think that while scanning a file_fdw foreign table with log_verbosity='silent', the query is not interruptible.
You're right. I added CHECK_FOR_INTERRUPTS() in the retry loop.
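A minimal standalone sketch of why the interrupt check belongs inside the retry path. It assumes hypothetical stand-in types (`SimRow`, `SimScan`, `sim_next_valid_row()`) rather than the real CopyFromState/NextCopyFrom() API, and models the pending cancel as a plain flag. In PostgreSQL, CHECK_FOR_INTERRUPTS() can throw a query-cancel error instead of returning a status as this sketch does.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in: a row either parses cleanly or raises a soft error. */
typedef struct SimRow
{
	int			value;
	bool		soft_error;
} SimRow;

typedef struct SimScan
{
	const SimRow *rows;
	size_t		nrows;
	size_t		pos;
	unsigned long long num_errors;	/* plays the role of cstate->num_errors */
} SimScan;

typedef enum SimResult
{
	SIM_ROW,					/* a valid row was returned */
	SIM_EOF,					/* input exhausted */
	SIM_INTERRUPTED				/* a pending cancel was noticed */
} SimResult;

/* Would be set asynchronously (e.g. by a signal handler) in a real server. */
static volatile sig_atomic_t sim_interrupt_pending = 0;

/*
 * Return the next row that parsed without a soft error.  The interrupt
 * check is inside the retry path, so a file made only of bad rows cannot
 * keep the loop spinning without ever noticing a cancel request.
 */
static SimResult
sim_next_valid_row(SimScan *scan, int *out)
{
	assert(scan != NULL && out != NULL);
	for (;;)
	{
		if (scan->pos >= scan->nrows)
			return SIM_EOF;

		const SimRow *row = &scan->rows[scan->pos++];

		if (!row->soft_error)
		{
			*out = row->value;
			return SIM_ROW;
		}

		/* Soft error: count it, then check for interrupts before retrying. */
		scan->num_errors++;
		if (sim_interrupt_pending)
			return SIM_INTERRUPTED;
	}
}
```

Without the flag check, an input consisting only of malformed rows would keep the loop busy until EOF with no chance to cancel, which is the situation described above for log_verbosity='silent'.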
> Also, we don't switch to the per-tuple memory context when retrying due to a soft error. I'm not sure it's okay, as in CopyFrom(), a similar function for the COPY command, we switch to the per-tuple memory context every time before parsing an input line. Would it be problematic if we switched to another memory context while parsing an input line? In CopyFrom() we also call ResetPerTupleExprContext() and ExecClearTuple() for every input, so we might want to consider calling them for every input.
Yes, I've updated the patch based on your comment.
Could you please review the latest version?

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
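A minimal standalone sketch of what resetting a per-tuple context on every retry buys. It assumes a hypothetical bump allocator (`SimArena`, `sim_arena_alloc()`, `sim_arena_reset()`) standing in for the per-tuple memory context; it is not PostgreSQL's MemoryContext API. Scratch memory is released only by a reset, so the sketch compares the high-water mark with and without a reset before each line.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical bump allocator standing in for a per-tuple memory context:
 * individual allocations are never freed, only the whole arena is reset.
 */
typedef struct SimArena
{
	char		buf[256];
	size_t		used;
	size_t		peak;			/* high-water mark, for illustration */
} SimArena;

static void *
sim_arena_alloc(SimArena *a, size_t n)
{
	assert(a->used + n <= sizeof(a->buf));
	void	   *p = a->buf + a->used;

	a->used += n;
	if (a->used > a->peak)
		a->peak = a->used;
	return p;
}

static void
sim_arena_reset(SimArena *a)
{
	a->used = 0;
}

/*
 * "Parse" each input line into scratch memory.  Lines with is_bad[i] set
 * act as soft errors and are skipped.  With reset_each_line, the arena is
 * reset before every attempt, including the retry after a soft error,
 * which is the role ResetPerTupleExprContext() plays in the patch.
 * Returns the number of lines kept.
 */
static size_t
sim_scan(SimArena *a, const bool *is_bad, size_t nlines,
		 size_t bytes_per_line, bool reset_each_line)
{
	size_t		kept = 0;

	for (size_t i = 0; i < nlines; i++)
	{
		if (reset_each_line)
			sim_arena_reset(a);

		char	   *scratch = sim_arena_alloc(a, bytes_per_line);

		memset(scratch, 0, bytes_per_line);	/* pretend to parse the line */
		if (is_bad[i])
			continue;			/* soft error: skip and retry */
		kept++;
	}
	return kept;
}
```

With the reset, peak usage stays at one line's worth of scratch memory no matter how many bad lines are skipped in a row; without it, usage grows with every retry.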
From 2c455d62aad84267e987b07a1287fd979abc0995 Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi <torikos...@oss.nttdata.com>
Date: Wed, 25 Sep 2024 21:28:15 +0900
Subject: [PATCH v7 1/3] Add log_verbosity = 'silent' support to COPY command.

Previously, when the on_error option was set to ignore, the COPY command
would always log NOTICE messages for input rows discarded due to
data type incompatibility. Users had no way to suppress these messages.

This commit introduces a new log_verbosity setting, 'silent',
which prevents the COPY command from emitting NOTICE messages
when on_error = 'ignore' is used, even if rows are discarded.
This feature is particularly useful when processing malformed files
frequently, where a flood of NOTICE messages can be undesirable.

For example, when frequently loading malformed files via the COPY command
or querying foreign tables using file_fdw (with an upcoming patch to
add on_error support for file_fdw), users may prefer to suppress
these messages to reduce log noise and improve clarity.

Author: Atsushi Torikoshi
Reviewed-by: Masahiko Sawada, Fujii Masao
Discussion: https://postgr.es/m/ab59dad10490ea3734cf022b16c24...@oss.nttdata.com
---
 doc/src/sgml/ref/copy.sgml          | 10 +++++++---
 src/backend/commands/copy.c         |  4 +++-
 src/backend/commands/copyfrom.c     |  3 ++-
 src/bin/psql/tab-complete.c         |  2 +-
 src/include/commands/copy.h         |  4 +++-
 src/test/regress/expected/copy2.out |  4 +++-
 src/test/regress/sql/copy2.sql      |  4 ++++
 7 files changed, 23 insertions(+), 8 deletions(-)

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index fdbd20bc50..58a14bc427 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -407,6 +407,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
       <literal>verbose</literal>, a <literal>NOTICE</literal> message
       containing the line of the input file and the column name whose input
       conversion has failed is emitted for each discarded row.
+      When it is set to <literal>silent</literal>, no message is emitted
+      regarding ignored rows.
      </para>
     </listitem>
    </varlistentry>
@@ -428,9 +430,11 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
     <listitem>
      <para>
       Specify the amount of messages emitted by a <command>COPY</command>
-      command: <literal>default</literal> or <literal>verbose</literal>. If
-      <literal>verbose</literal> is specified, additional messages are emitted
-      during processing.
+      command: <literal>default</literal>, <literal>verbose</literal>, or
+      <literal>silent</literal>.
+      If <literal>verbose</literal> is specified, additional messages are
+      emitted during processing.
+      <literal>silent</literal> suppresses both verbose and default messages.
      </para>
      <para>
       This is currently used in <command>COPY FROM</command> command when
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 3bb579a3a4..03eb7a4eba 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -427,9 +427,11 @@ defGetCopyLogVerbosityChoice(DefElem *def, ParseState *pstate)
 	char	   *sval;
 
 	/*
-	 * Allow "default", or "verbose" values.
+	 * Allow "silent", "default", or "verbose" values.
 	 */
 	sval = defGetString(def);
+	if (pg_strcasecmp(sval, "silent") == 0)
+		return COPY_LOG_VERBOSITY_SILENT;
 	if (pg_strcasecmp(sval, "default") == 0)
 		return COPY_LOG_VERBOSITY_DEFAULT;
 	if (pg_strcasecmp(sval, "verbose") == 0)
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 2d3462913e..47879994f7 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1320,7 +1320,8 @@ CopyFrom(CopyFromState cstate)
 	error_context_stack = errcallback.previous;
 
 	if (cstate->opts.on_error != COPY_ON_ERROR_STOP &&
-		cstate->num_errors > 0)
+		cstate->num_errors > 0 &&
+		cstate->opts.log_verbosity >= COPY_LOG_VERBOSITY_DEFAULT)
 		ereport(NOTICE,
 				errmsg_plural("%llu row was skipped due to data type incompatibility",
 							  "%llu rows were skipped due to data type incompatibility",
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index a7ccde6d7d..6530b0f1ce 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -2916,7 +2916,7 @@ psql_completion(const char *text, int start, int end)
 	/* Complete COPY <sth> FROM filename WITH (LOG_VERBOSITY */
 	else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(",
 					 "LOG_VERBOSITY"))
-		COMPLETE_WITH("default", "verbose");
+		COMPLETE_WITH("silent", "default", "verbose");
 
 	/* Complete COPY <sth> FROM <sth> WITH (<options>) */
 	else if (Matches("COPY|\\copy", MatchAny, "FROM", MatchAny, "WITH", MatchAny))
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 141fd48dc1..6f64d97fdd 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -45,7 +45,9 @@ typedef enum CopyOnErrorChoice
  */
 typedef enum CopyLogVerbosityChoice
 {
-	COPY_LOG_VERBOSITY_DEFAULT = 0, /* logs no additional messages, default */
+	COPY_LOG_VERBOSITY_SILENT = -1, /* logs none */
+	COPY_LOG_VERBOSITY_DEFAULT = 0, /* logs no additional messages. As this is
+									 * the default, assign 0 */
 	COPY_LOG_VERBOSITY_VERBOSE, /* logs additional messages */
 } CopyLogVerbosityChoice;
 
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 61a19cdc4c..4e752977b5 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -760,6 +760,7 @@ COPY check_ign_err2 FROM STDIN WITH (on_error ignore, log_verbosity verbose);
 NOTICE:  skipping row due to data type incompatibility at line 2 for column "l": null input
 CONTEXT:  COPY check_ign_err2
 NOTICE:  1 row was skipped due to data type incompatibility
+COPY check_ign_err2 FROM STDIN WITH (on_error ignore, log_verbosity silent);
 -- reset context choice
 \set SHOW_CONTEXT errors
 SELECT * FROM check_ign_err;
@@ -774,7 +775,8 @@ SELECT * FROM check_ign_err2;
  n |  m  | k |   l
 ---+-----+---+-------
  1 | {1} | 1 | 'foo'
-(1 row)
+ 3 | {3} | 3 | 'bar'
+(2 rows)
 
 -- test datatype error that can't be handled as soft: should fail
 CREATE TABLE hard_err(foo widget);
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 8b14962194..fa6aa17344 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -533,6 +533,10 @@ COPY check_ign_err2 FROM STDIN WITH (on_error ignore, log_verbosity verbose);
 1	{1}	1	'foo'
 2	{2}	2	\N
 \.
+COPY check_ign_err2 FROM STDIN WITH (on_error ignore, log_verbosity silent);
+3	{3}	3	'bar'
+4	{4}	4	\N
+\.
 
 -- reset context choice
 \set SHOW_CONTEXT errors
-- 
2.45.2
From f4458a36698997da23f813f858e6680c3e1daefa Mon Sep 17 00:00:00 2001
From: Fujii Masao <fu...@postgresql.org>
Date: Mon, 30 Sep 2024 23:05:26 +0900
Subject: [PATCH v7 2/3] file_fdw: Add on_error and log_verbosity options to file_fdw.

In v17, the on_error and log_verbosity options were introduced for
the COPY command. This commit extends support for these options
to file_fdw.

Setting on_error = 'ignore' for a file_fdw foreign table allows users
to query it without errors, even when the input file contains
malformed rows, by skipping the problematic rows.

Both on_error and log_verbosity options apply to SELECT and ANALYZE
operations on file_fdw foreign tables.

Author: Atsushi Torikoshi
Reviewed-by: Masahiko Sawada, Fujii Masao
Discussion: https://postgr.es/m/ab59dad10490ea3734cf022b16c24...@oss.nttdata.com
---
 contrib/file_fdw/expected/file_fdw.out |  19 +++++
 contrib/file_fdw/file_fdw.c            | 107 +++++++++++++++++++++----
 contrib/file_fdw/sql/file_fdw.sql      |   7 ++
 doc/src/sgml/file-fdw.sgml             |  23 ++++++
 4 files changed, 140 insertions(+), 16 deletions(-)

diff --git a/contrib/file_fdw/expected/file_fdw.out b/contrib/file_fdw/expected/file_fdw.out
index 86c148a86b..593fdc782e 100644
--- a/contrib/file_fdw/expected/file_fdw.out
+++ b/contrib/file_fdw/expected/file_fdw.out
@@ -206,6 +206,25 @@ SELECT * FROM agg_csv c JOIN agg_text t ON (t.a = c.a) ORDER BY c.a;
 SELECT * FROM agg_bad; -- ERROR
 ERROR:  invalid input syntax for type real: "aaa"
 CONTEXT:  COPY agg_bad, line 3, column b: "aaa"
+-- on_error and log_verbosity tests
+ALTER FOREIGN TABLE agg_bad OPTIONS (ADD on_error 'ignore');
+SELECT * FROM agg_bad;
+NOTICE:  1 row was skipped due to data type incompatibility
+  a  |   b
+-----+--------
+ 100 | 99.097
+  42 | 324.78
+(2 rows)
+
+ALTER FOREIGN TABLE agg_bad OPTIONS (ADD log_verbosity 'silent');
+SELECT * FROM agg_bad;
+  a  |   b
+-----+--------
+ 100 | 99.097
+  42 | 324.78
+(2 rows)
+
+ANALYZE agg_bad;
 -- misc query tests
 \t on
 SELECT explain_filter('EXPLAIN (VERBOSE, COSTS FALSE) SELECT * FROM agg_csv');
diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c
index d16821f8e1..043204c3e7 100644
--- a/contrib/file_fdw/file_fdw.c
+++ b/contrib/file_fdw/file_fdw.c
@@ -22,6 +22,7 @@
 #include "catalog/pg_authid.h"
 #include "catalog/pg_foreign_table.h"
 #include "commands/copy.h"
+#include "commands/copyfrom_internal.h"
 #include "commands/defrem.h"
 #include "commands/explain.h"
 #include "commands/vacuum.h"
@@ -74,6 +75,8 @@ static const struct FileFdwOption valid_options[] = {
 	{"null", ForeignTableRelationId},
 	{"default", ForeignTableRelationId},
 	{"encoding", ForeignTableRelationId},
+	{"on_error", ForeignTableRelationId},
+	{"log_verbosity", ForeignTableRelationId},
 	{"force_not_null", AttributeRelationId},
 	{"force_null", AttributeRelationId},
 
@@ -723,38 +726,74 @@ fileIterateForeignScan(ForeignScanState *node)
 	FileFdwExecutionState *festate = (FileFdwExecutionState *) node->fdw_state;
 	EState	   *estate = CreateExecutorState();
 	ExprContext *econtext;
-	MemoryContext oldcontext;
+	MemoryContext oldcontext = CurrentMemoryContext;
 	TupleTableSlot *slot = node->ss.ss_ScanTupleSlot;
-	bool		found;
+	CopyFromState cstate = festate->cstate;
 	ErrorContextCallback errcallback;
 
 	/* Set up callback to identify error line number. */
 	errcallback.callback = CopyFromErrorCallback;
-	errcallback.arg = (void *) festate->cstate;
+	errcallback.arg = (void *) cstate;
 	errcallback.previous = error_context_stack;
 	error_context_stack = &errcallback;
 
 	/*
-	 * The protocol for loading a virtual tuple into a slot is first
-	 * ExecClearTuple, then fill the values/isnull arrays, then
-	 * ExecStoreVirtualTuple. If we don't find another row in the file, we
-	 * just skip the last step, leaving the slot empty as required.
-	 *
 	 * We pass ExprContext because there might be a use of the DEFAULT option
 	 * in COPY FROM, so we may need to evaluate default expressions.
 	 */
-	ExecClearTuple(slot);
 	econtext = GetPerTupleExprContext(estate);
 
+retry:
+
 	/*
 	 * DEFAULT expressions need to be evaluated in a per-tuple context, so
 	 * switch in case we are doing that.
 	 */
-	oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
-	found = NextCopyFrom(festate->cstate, econtext,
-						 slot->tts_values, slot->tts_isnull);
-	if (found)
+	MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
+
+	/*
+	 * The protocol for loading a virtual tuple into a slot is first
+	 * ExecClearTuple, then fill the values/isnull arrays, then
+	 * ExecStoreVirtualTuple. If we don't find another row in the file, we
+	 * just skip the last step, leaving the slot empty as required.
+	 *
+	 */
+	ExecClearTuple(slot);
+
+	if (NextCopyFrom(cstate, econtext, slot->tts_values, slot->tts_isnull))
+	{
+		if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE &&
+			cstate->escontext->error_occurred)
+		{
+			/*
+			 * Soft error occurred, skip this tuple and just make
+			 * ErrorSaveContext ready for the next NextCopyFrom. Since we
+			 * don't set details_wanted and error_data is not to be filled,
+			 * just resetting error_occurred is enough.
+			 */
+			cstate->escontext->error_occurred = false;
+
+			/* Switch back to original memory context */
+			MemoryContextSwitchTo(oldcontext);
+
+			/*
+			 * Make sure we are interruptible while repeatedly calling
+			 * NextCopyFrom() until no soft error occurs.
+			 */
+			CHECK_FOR_INTERRUPTS();
+
+			/*
+			 * Reset the per-tuple exprcontext, to clean-up after expression
+			 * evaluations etc.
+			 */
+			ResetPerTupleExprContext(estate);
+
+			/* Repeat NextCopyFrom() until no soft error occurs */
+			goto retry;
+		}
+
 		ExecStoreVirtualTuple(slot);
+	}
 
 	/* Switch back to original memory context */
 	MemoryContextSwitchTo(oldcontext);
@@ -796,8 +835,19 @@ fileEndForeignScan(ForeignScanState *node)
 	FileFdwExecutionState *festate = (FileFdwExecutionState *) node->fdw_state;
 
 	/* if festate is NULL, we are in EXPLAIN; nothing to do */
-	if (festate)
-		EndCopyFrom(festate->cstate);
+	if (!festate)
+		return;
+
+	if (festate->cstate->opts.on_error == COPY_ON_ERROR_IGNORE &&
+		festate->cstate->num_errors > 0 &&
+		festate->cstate->opts.log_verbosity >= COPY_LOG_VERBOSITY_DEFAULT)
+		ereport(NOTICE,
+				errmsg_plural("%llu row was skipped due to data type incompatibility",
+							  "%llu rows were skipped due to data type incompatibility",
+							  (unsigned long long) festate->cstate->num_errors,
+							  (unsigned long long) festate->cstate->num_errors));
+
+	EndCopyFrom(festate->cstate);
 }
 
 /*
@@ -1113,7 +1163,8 @@ estimate_costs(PlannerInfo *root, RelOptInfo *baserel,
  * which must have at least targrows entries.
  * The actual number of rows selected is returned as the function result.
  * We also count the total number of rows in the file and return it into
- * *totalrows. Note that *totaldeadrows is always set to 0.
+ * *totalrows. Rows skipped due to on_error = 'ignore' are not included
+ * in this count. Note that *totaldeadrows is always set to 0.
  *
  * Note that the returned list of rows is not always in order by physical
  * position in the file. Therefore, correlation estimates derived later
@@ -1191,6 +1242,21 @@ file_acquire_sample_rows(Relation onerel, int elevel,
 		if (!found)
 			break;
 
+		if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE &&
+			cstate->escontext->error_occurred)
+		{
+			/*
+			 * Soft error occurred, skip this tuple and just make
+			 * ErrorSaveContext ready for the next NextCopyFrom. Since we
+			 * don't set details_wanted and error_data is not to be filled,
+			 * just resetting error_occurred is enough.
+			 */
+			cstate->escontext->error_occurred = false;
+
+			/* Repeat NextCopyFrom() until no soft error occurs */
+			continue;
+		}
+
 		/*
 		 * The first targrows sample rows are simply copied into the
 		 * reservoir. Then we start replacing tuples in the sample until we
@@ -1236,6 +1302,15 @@ file_acquire_sample_rows(Relation onerel, int elevel,
 	/* Clean up. */
 	MemoryContextDelete(tupcontext);
 
+	if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE &&
+		cstate->num_errors > 0 &&
+		cstate->opts.log_verbosity >= COPY_LOG_VERBOSITY_DEFAULT)
+		ereport(NOTICE,
+				errmsg_plural("%llu row was skipped due to data type incompatibility",
+							  "%llu rows were skipped due to data type incompatibility",
+							  (unsigned long long) cstate->num_errors,
+							  (unsigned long long) cstate->num_errors));
+
 	EndCopyFrom(cstate);
 
 	pfree(values);
diff --git a/contrib/file_fdw/sql/file_fdw.sql b/contrib/file_fdw/sql/file_fdw.sql
index f0548e14e1..edd77c5cd2 100644
--- a/contrib/file_fdw/sql/file_fdw.sql
+++ b/contrib/file_fdw/sql/file_fdw.sql
@@ -150,6 +150,13 @@ SELECT * FROM agg_csv c JOIN agg_text t ON (t.a = c.a) ORDER BY c.a;
 -- error context report tests
 SELECT * FROM agg_bad; -- ERROR
 
+-- on_error and log_verbosity tests
+ALTER FOREIGN TABLE agg_bad OPTIONS (ADD on_error 'ignore');
+SELECT * FROM agg_bad;
+ALTER FOREIGN TABLE agg_bad OPTIONS (ADD log_verbosity 'silent');
+SELECT * FROM agg_bad;
+ANALYZE agg_bad;
+
 -- misc query tests
 \t on
 SELECT explain_filter('EXPLAIN (VERBOSE, COSTS FALSE) SELECT * FROM agg_csv');
diff --git a/doc/src/sgml/file-fdw.sgml b/doc/src/sgml/file-fdw.sgml
index f2f2af9a59..bb3579b077 100644
--- a/doc/src/sgml/file-fdw.sgml
+++ b/doc/src/sgml/file-fdw.sgml
@@ -126,6 +126,29 @@
    </listitem>
   </varlistentry>
 
+  <varlistentry>
+   <term><literal>on_error</literal></term>
+
+   <listitem>
+    <para>
+     Specifies how to behave when encountering an error converting a column's
+     input value into its data type,
+     the same as <command>COPY</command>'s <literal>ON_ERROR</literal> option.
+    </para>
+   </listitem>
+  </varlistentry>
+
+  <varlistentry>
+   <term><literal>log_verbosity</literal></term>
+
+   <listitem>
+    <para>
+     Specifies the amount of messages emitted by <literal>file_fdw</literal>,
+     the same as <command>COPY</command>'s <literal>LOG_VERBOSITY</literal> option.
+    </para>
+   </listitem>
+  </varlistentry>
+
  </variablelist>
 
  <para>
-- 
2.45.2
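A minimal standalone sketch of the ANALYZE behaviour that patch 2 documents: rows that hit a soft error are skipped before they reach the sample and are not counted in `*totalrows`. It assumes hypothetical names (`SimInputRow`, `sim_acquire_sample()`) and swaps in the simple reservoir "Algorithm R" with a small xorshift PRNG, rather than the reservoir helpers the real file_acquire_sample_rows() uses; the modulo step makes replacement only approximately uniform.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical input row: a parsed value, or a soft error. */
typedef struct SimInputRow
{
	int			value;
	bool		soft_error;
} SimInputRow;

/* Tiny deterministic PRNG (xorshift64) so runs are reproducible. */
static uint64_t
sim_rand(uint64_t *state)
{
	uint64_t	x = *state;

	x ^= x << 13;
	x ^= x >> 7;
	x ^= x << 17;
	return *state = x;
}

/*
 * Fill sample[0..targrows) by reservoir sampling (Algorithm R) over the
 * rows that parsed cleanly.  Soft-error rows are skipped before they reach
 * the reservoir and are not added to *totalrows.  Returns the number of
 * rows placed in the sample.
 */
static size_t
sim_acquire_sample(const SimInputRow *rows, size_t nrows,
				   int *sample, size_t targrows,
				   double *totalrows, uint64_t seed)
{
	uint64_t	state = seed ? seed : 1;	/* xorshift must not start at 0 */
	size_t		seen = 0;		/* valid rows processed so far */
	size_t		numrows = 0;	/* rows currently in the sample */

	assert(sample != NULL || targrows == 0);
	for (size_t i = 0; i < nrows; i++)
	{
		if (rows[i].soft_error)
			continue;			/* skipped: not sampled, not counted */

		if (numrows < targrows)
			sample[numrows++] = rows[i].value;	/* fill the reservoir */
		else
		{
			/* Replace a random slot with probability roughly targrows/(seen+1). */
			uint64_t	k = sim_rand(&state) % (seen + 1);

			if (k < targrows)
				sample[k] = rows[i].value;
		}
		seen++;
	}
	*totalrows = (double) seen;
	return numrows;
}
```

Because skipped rows never increment `seen`, the row estimate that ANALYZE reports reflects only the rows a SELECT with on_error = 'ignore' would actually return.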
From 6c132dc601d979ed8ffa4fadc0af9a8c73f88fce Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi <torikos...@oss.nttdata.com>
Date: Wed, 25 Sep 2024 21:30:26 +0900
Subject: [PATCH v7 3/3] Refactor CopyFrom() in copyfrom.c.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This commit simplifies CopyFrom() by removing the unnecessary local variable
'skipped', which tracked the number of rows skipped due to on_error = 'ignore'.
That count is already handled by cstate->num_errors, so the 'skipped' variable
was redundant.

Additionally, the condition on_error != COPY_ON_ERROR_STOP is removed.
Since on_error == COPY_ON_ERROR_IGNORE is already checked, and on_error
only has two values (ignore and stop), the additional check was redundant
and made the logic harder to read. Seemingly this was introduced
in preparation for a future patch, but the current checks don’t offer
clear value and have been removed to improve readability.

Author: Atsushi Torikoshi
Reviewed-by: Masahiko Sawada, Fujii Masao
Discussion: https://postgr.es/m/ab59dad10490ea3734cf022b16c24...@oss.nttdata.com
---
 src/backend/commands/copyfrom.c | 21 ++++++++-------------
 1 file changed, 8 insertions(+), 13 deletions(-)

diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 47879994f7..9139a40785 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -657,7 +657,6 @@ CopyFrom(CopyFromState cstate)
 	CopyMultiInsertInfo multiInsertInfo = {0};	/* pacify compiler */
 	int64		processed = 0;
 	int64		excluded = 0;
-	int64		skipped = 0;
 	bool		has_before_insert_row_trig;
 	bool		has_instead_insert_row_trig;
 	bool		leafpart_use_multi_insert = false;
@@ -1004,26 +1003,22 @@ CopyFrom(CopyFromState cstate)
 		if (!NextCopyFrom(cstate, econtext, myslot->tts_values, myslot->tts_isnull))
 			break;
 
-		if (cstate->opts.on_error != COPY_ON_ERROR_STOP &&
+		if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE &&
 			cstate->escontext->error_occurred)
 		{
 			/*
-			 * Soft error occurred, skip this tuple and deal with error
-			 * information according to ON_ERROR.
+			 * Soft error occurred, skip this tuple and just make
+			 * ErrorSaveContext ready for the next NextCopyFrom. Since we
+			 * don't set details_wanted and error_data is not to be filled,
+			 * just resetting error_occurred is enough.
 			 */
-			if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
-
-				/*
-				 * Just make ErrorSaveContext ready for the next NextCopyFrom.
-				 * Since we don't set details_wanted and error_data is not to
-				 * be filled, just resetting error_occurred is enough.
-				 */
-				cstate->escontext->error_occurred = false;
+			cstate->escontext->error_occurred = false;
 
 			/* Report that this tuple was skipped by the ON_ERROR clause */
 			pgstat_progress_update_param(PROGRESS_COPY_TUPLES_SKIPPED,
-										 ++skipped);
+										 cstate->num_errors);
 
+			/* Repeat NextCopyFrom() until no soft error occurs */
 			continue;
 		}
 
-- 
2.45.2