On 2024-11-13 23:17, Yugo NAGATA wrote:
On Wed, 13 Nov 2024 21:48:10 +0900
torikoshia <torikos...@oss.nttdata.com> wrote:

On 2024-11-12 14:51, Yugo Nagata wrote:

Thanks for your review!

> On Tue, 12 Nov 2024 10:16:50 +0900
> torikoshia <torikos...@oss.nttdata.com> wrote:
>
>> On 2024-11-12 01:49, Fujii Masao wrote:
>> > On 2024/11/11 21:45, torikoshia wrote:
>> >>> Thanks for adding the comment. It clearly states that REJECT_LIMIT
>> >>> can be
>> >>> a single-quoted string. However, it might also be helpful to mention
>> >>> that
>> >>> it can be provided as an int64 in the COPY command option. How about
>> >>> updating it like this?
>> >>>
>> >>> ------------------------------------
>> >>> REJECT_LIMIT can be specified in two ways: as an int64 for the COPY
>> >>> command
>> >>> option or as a single-quoted string for the foreign table option
>> >>> using
>> >>> file_fdw. Therefore this function needs to handle both formats.
>> >>> ------------------------------------
>> >>
>> >> Thanks! it seems better.
>> >>
>> >>
>> >> Attached v3 patch.
>> >
>> > Thanks for updating the patch! It looks like you forgot to attach it,
>> > though.
>>
>> Oops, thanks for pointing it out.
>> Here it is.
>
> I have one minor comment.
>
> +                          ereport(ERROR,
> +                                          
(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
> +                                           errmsg("skipped more than 
REJECT_LIMIT (%lld) rows due to data
> type incompatibility",
> +                                                          (long long) 
cstate->opts.reject_limit)));
>
> Do we have to keep this message consistent with ones of COPY command?
> With foreign data wrappers, it seems common that the option name is
> passed in
> lowercase, so is it better to use "skipped more than reject_limit ..."
> rather
> than using uppercase?

Agreed.
Attached v4 patch.


Thank you for updating the patch.

However, I noticed that file_fdw could output the following error
using uppercase
when an invalid value is set to the reject_limit option,

 ALTER FOREIGN TABLE agg_bad OPTIONS (ADD reject_limit '-1');
 ERROR:  REJECT_LIMIT (-1) must be greater than zero

as well as if reject_limit is set when on_error != 'ignore'.

 ALTER FOREIGN TABLE agg_bad OPTIONS (ADD reject_limit '1'); -- ERROR
 ERROR:  COPY REJECT_LIMIT requires ON_ERROR to be set to IGNORE

Also, in the regression tests, I found similar behaviours on existing options,
for example;

 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format
'xml');  -- ERROR
 ERROR:  COPY format "xml" not recognized
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format
'text', quote ':');          -- ERROR
 ERROR:  COPY QUOTE requires CSV mode
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format
'text', escape ':');         -- ERROR
 ...

These messages may be unexpected for some users because the documentation of fild_fdw does not explicitly describe that file_fdw uses COPY internally.
(I can find several wordings like "as COPY", though.)
However, since the current file_fdw already has such messages, I came to think making the messages on "reject_limit" consistent with COPY is reasonable.
I mean, the previous version of the message seems fine.

Agreed. Thanks for your investigation.

As another comment, do we need to add validator test for on_error and
reject_limit as similar to other options?

That might be better.
Added some queries which had wrong options to cause errors.

I was unsure whether to add it to "validator test" section or "on_error, log_verbosity and reject_limit tests" section, but I chose "validator test" as I believe it makes things clearer.

Added 0002 patch since some of the tests are not related to reject_limit but just on_error.


--
Regards,

--
Atsushi Torikoshi
Seconded from NTT DATA GROUP CORPORATION to SRA OSS K.K.
From 661937869d696bc640ff54c97631c27671502e60 Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi <torikos...@oss.nttdata.com>
Date: Tue, 19 Nov 2024 21:06:37 +0900
Subject: [PATCH v5 1/2] file_fdw: Add reject_limit option to file_fdw

Commit 4ac2a9bece introduced REJECT_LIMIT option for COPY. This patch
extends support for this option to file_fdw.

As well as REJECT_LIMIT option for COPY, this option limits the
maximum number of erroneous rows that can be skipped.
If more rows encounter data type conversion errors than allowed by
reject_limit, the access to the file_fdw foreign table will fail
with an error, even when on_error = 'ignore'.

Based on the synopsis of the CREATE/ALTER FOREIGN TABLE commands, the
value for the foreign table's option must be single-quoted.
To accommodate this, the patch modifies defGetCopyRejectLimitOption
to accept not only int64 but also strings.
---
 contrib/file_fdw/data/agg.bad          |  1 +
 contrib/file_fdw/expected/file_fdw.out | 18 ++++++++++++++++--
 contrib/file_fdw/file_fdw.c            |  8 ++++++++
 contrib/file_fdw/sql/file_fdw.sql      |  7 ++++++-
 doc/src/sgml/file-fdw.sgml             | 12 ++++++++++++
 src/backend/commands/copy.c            | 16 +++++++++++++++-
 6 files changed, 58 insertions(+), 4 deletions(-)

diff --git a/contrib/file_fdw/data/agg.bad b/contrib/file_fdw/data/agg.bad
index 3415b15007..04279ce55b 100644
--- a/contrib/file_fdw/data/agg.bad
+++ b/contrib/file_fdw/data/agg.bad
@@ -2,3 +2,4 @@
 100;@99.097@
 0;@aaa@
 42;@324.78@
+1;@bbb@
diff --git a/contrib/file_fdw/expected/file_fdw.out b/contrib/file_fdw/expected/file_fdw.out
index 593fdc782e..4f63c047ec 100644
--- a/contrib/file_fdw/expected/file_fdw.out
+++ b/contrib/file_fdw/expected/file_fdw.out
@@ -90,6 +90,8 @@ ERROR:  COPY delimiter cannot be newline or carriage return
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'csv', null '
 ');       -- ERROR
 ERROR:  COPY null representation cannot use newline or carriage return
+CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (reject_limit '1');       -- ERROR
+ERROR:  COPY REJECT_LIMIT requires ON_ERROR to be set to IGNORE
 CREATE FOREIGN TABLE tbl () SERVER file_server;  -- ERROR
 ERROR:  either filename or program is required for file_fdw foreign tables
 \set filename :abs_srcdir '/data/agg.data'
@@ -206,10 +208,10 @@ SELECT * FROM agg_csv c JOIN agg_text t ON (t.a = c.a) ORDER BY c.a;
 SELECT * FROM agg_bad;               -- ERROR
 ERROR:  invalid input syntax for type real: "aaa"
 CONTEXT:  COPY agg_bad, line 3, column b: "aaa"
--- on_error and log_verbosity tests
+-- on_error, log_verbosity and reject_limit tests
 ALTER FOREIGN TABLE agg_bad OPTIONS (ADD on_error 'ignore');
 SELECT * FROM agg_bad;
-NOTICE:  1 row was skipped due to data type incompatibility
+NOTICE:  2 rows were skipped due to data type incompatibility
   a  |   b    
 -----+--------
  100 | 99.097
@@ -224,6 +226,18 @@ SELECT * FROM agg_bad;
   42 | 324.78
 (2 rows)
 
+ALTER FOREIGN TABLE agg_bad OPTIONS (ADD reject_limit '1'); -- ERROR
+SELECT * FROM agg_bad;
+ERROR:  skipped more than REJECT_LIMIT (1) rows due to data type incompatibility
+CONTEXT:  COPY agg_bad, line 5, column b: "bbb"
+ALTER FOREIGN TABLE agg_bad OPTIONS (SET reject_limit '2');
+SELECT * FROM agg_bad;
+  a  |   b    
+-----+--------
+ 100 | 99.097
+  42 | 324.78
+(2 rows)
+
 ANALYZE agg_bad;
 -- misc query tests
 \t on
diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c
index 043204c3e7..9e2896f32a 100644
--- a/contrib/file_fdw/file_fdw.c
+++ b/contrib/file_fdw/file_fdw.c
@@ -77,6 +77,7 @@ static const struct FileFdwOption valid_options[] = {
 	{"encoding", ForeignTableRelationId},
 	{"on_error", ForeignTableRelationId},
 	{"log_verbosity", ForeignTableRelationId},
+	{"reject_limit", ForeignTableRelationId},
 	{"force_not_null", AttributeRelationId},
 	{"force_null", AttributeRelationId},
 
@@ -788,6 +789,13 @@ retry:
 			 */
 			ResetPerTupleExprContext(estate);
 
+			if (cstate->opts.reject_limit > 0 &&
+				cstate->num_errors > cstate->opts.reject_limit)
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+						 errmsg("skipped more than REJECT_LIMIT (%lld) rows due to data type incompatibility",
+								(long long) cstate->opts.reject_limit)));
+
 			/* Repeat NextCopyFrom() until no soft error occurs */
 			goto retry;
 		}
diff --git a/contrib/file_fdw/sql/file_fdw.sql b/contrib/file_fdw/sql/file_fdw.sql
index edd77c5cd2..4805ca8c04 100644
--- a/contrib/file_fdw/sql/file_fdw.sql
+++ b/contrib/file_fdw/sql/file_fdw.sql
@@ -77,6 +77,7 @@ CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'csv', delimiter
 ');       -- ERROR
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'csv', null '
 ');       -- ERROR
+CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (reject_limit '1');       -- ERROR
 CREATE FOREIGN TABLE tbl () SERVER file_server;  -- ERROR
 
 \set filename :abs_srcdir '/data/agg.data'
@@ -150,11 +151,15 @@ SELECT * FROM agg_csv c JOIN agg_text t ON (t.a = c.a) ORDER BY c.a;
 -- error context report tests
 SELECT * FROM agg_bad;               -- ERROR
 
--- on_error and log_verbosity tests
+-- on_error, log_verbosity and reject_limit tests
 ALTER FOREIGN TABLE agg_bad OPTIONS (ADD on_error 'ignore');
 SELECT * FROM agg_bad;
 ALTER FOREIGN TABLE agg_bad OPTIONS (ADD log_verbosity 'silent');
 SELECT * FROM agg_bad;
+ALTER FOREIGN TABLE agg_bad OPTIONS (ADD reject_limit '1'); -- ERROR
+SELECT * FROM agg_bad;
+ALTER FOREIGN TABLE agg_bad OPTIONS (SET reject_limit '2');
+SELECT * FROM agg_bad;
 ANALYZE agg_bad;
 
 -- misc query tests
diff --git a/doc/src/sgml/file-fdw.sgml b/doc/src/sgml/file-fdw.sgml
index bb3579b077..882d9a76d2 100644
--- a/doc/src/sgml/file-fdw.sgml
+++ b/doc/src/sgml/file-fdw.sgml
@@ -138,6 +138,18 @@
    </listitem>
   </varlistentry>
 
+  <varlistentry>
+   <term><literal>reject_limit</literal></term>
+
+   <listitem>
+    <para>
+     Specifies the maximum number of errors tolerated while converting a column's
+     input value to its data type, the same as <command>COPY</command>'s
+    <literal>REJECT_LIMIT</literal> option.
+    </para>
+   </listitem>
+  </varlistentry>
+
   <varlistentry>
    <term><literal>log_verbosity</literal></term>
 
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 3485ba8663..2d98ecf3f4 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -420,11 +420,25 @@ defGetCopyOnErrorChoice(DefElem *def, ParseState *pstate, bool is_from)
 
 /*
  * Extract REJECT_LIMIT value from a DefElem.
+ *
+ * REJECT_LIMIT can be specified in two ways: as an int64 for the COPY command
+ * option or as a single-quoted string for the foreign table option using
+ * file_fdw. Therefore this function needs to handle both formats.
  */
 static int64
 defGetCopyRejectLimitOption(DefElem *def)
 {
-	int64		reject_limit = defGetInt64(def);
+	int64		reject_limit;
+
+	if (def->arg == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_SYNTAX_ERROR),
+				 errmsg("%s requires a numeric value",
+						def->defname)));
+	else if (nodeTag(def->arg) == T_String)
+		reject_limit = pg_strtoint64(strVal(def->arg));
+	else
+		reject_limit = defGetInt64(def);
 
 	if (reject_limit <= 0)
 		ereport(ERROR,
-- 
2.39.2

From ab3c38571761507a9e1b3073ffa9c2fdbc50c7de Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi <torikos...@oss.nttdata.com>
Date: Tue, 19 Nov 2024 21:12:52 +0900
Subject: [PATCH v5 2/2] Add validator tests for on_error options

---
 contrib/file_fdw/expected/file_fdw.out | 4 ++++
 contrib/file_fdw/sql/file_fdw.sql      | 2 ++
 2 files changed, 6 insertions(+)

diff --git a/contrib/file_fdw/expected/file_fdw.out b/contrib/file_fdw/expected/file_fdw.out
index 4f63c047ec..f383aec6d8 100644
--- a/contrib/file_fdw/expected/file_fdw.out
+++ b/contrib/file_fdw/expected/file_fdw.out
@@ -90,6 +90,10 @@ ERROR:  COPY delimiter cannot be newline or carriage return
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'csv', null '
 ');       -- ERROR
 ERROR:  COPY null representation cannot use newline or carriage return
+CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (on_error 'aaa');       -- ERROR
+ERROR:  COPY ON_ERROR "aaa" not recognized
+CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'binary', on_error 'ignore');       -- ERROR
+ERROR:  only ON_ERROR STOP is allowed in BINARY mode
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (reject_limit '1');       -- ERROR
 ERROR:  COPY REJECT_LIMIT requires ON_ERROR to be set to IGNORE
 CREATE FOREIGN TABLE tbl () SERVER file_server;  -- ERROR
diff --git a/contrib/file_fdw/sql/file_fdw.sql b/contrib/file_fdw/sql/file_fdw.sql
index 4805ca8c04..1a76aebde6 100644
--- a/contrib/file_fdw/sql/file_fdw.sql
+++ b/contrib/file_fdw/sql/file_fdw.sql
@@ -77,6 +77,8 @@ CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'csv', delimiter
 ');       -- ERROR
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'csv', null '
 ');       -- ERROR
+CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (on_error 'aaa');       -- ERROR
+CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'binary', on_error 'ignore');       -- ERROR
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (reject_limit '1');       -- ERROR
 CREATE FOREIGN TABLE tbl () SERVER file_server;  -- ERROR
 
-- 
2.39.2

Reply via email to