On Mon, 29 Jan 2024 15:47:25 +0900
Yugo NAGATA <[email protected]> wrote:
> On Sun, 28 Jan 2024 19:14:58 -0700
> "David G. Johnston" <[email protected]> wrote:
>
> > > Also, I think "invalid input syntax" is a bit ambiguous. For example,
> > > COPY FROM raises an error when the number of input column does not match
> > > to the table schema, but this error is not ignored by ON_ERROR while
> > > this seems to fall into the category of "invalid input syntax".
> >
> >
> >
> > It is literally the error text that appears if one were not to ignore it.
> > It isn’t a category of errors. But I’m open to ideas here. But being
> > explicit with what on actually sees in the system seemed preferable to
> > inventing new classification terms not otherwise used.
>
> Thank you for explanation! I understood the words was from the error messages
> that users actually see. However, as Torikoshi-san said in [1], errors other
> than valid input syntax (e.g. range error) can be also ignored, therefore it
> would be better to describe to be ignored errors more specifically.
>
> [1]
> https://www.postgresql.org/message-id/7f1457497fa3bf9dfe486f162d1c8ec6%40oss.nttdata.com
>
> >
> > >
> > > So, keeping consistency with the existing description, we can say:
> > >
> > > "Specifies which how to behave when encountering an error due to
> > > column values unacceptable to the input function of each attribute's
> > > data type."
> >
> >
> > Yeah, I was considering something along those lines as an option as well.
> > But I’d rather add that wording to the glossary.
>
> Although I am still be not convinced if we have to introduce the words
> "soft error" to the documentation, I don't care it if there are no other
> opposite opinions.
Attached is a updated patch v3, which is a version that uses the above
wording instead of "soft error".
> >
> > > Currently, ON_ERROR doesn't support other soft errors, so it can explain
> > > it more simply without introducing the new concept, "soft error" to users.
> > >
> > >
> > Good point. Seems we should define what user-facing errors are ignored
> > anywhere in the system and if we aren’t consistently leveraging these in
> > all areas/commands make the necessary qualifications in those specific
> > places.
> >
>
> > > I think "left in a deleted state" is also unclear for users because this
> > > explains the internal state but not how looks from user's view.How about
> > > leaving the explanation "These rows will not be visible or accessible" in
> > > the existing statement?
> > >
> >
> > Just visible then, I don’t like an “or” there and as tuples at least they
> > are accessible to the system, in vacuum especially. But I expected the
> > user to understand “as if you deleted it” as their operational concept more
> > readily than visible. I think this will be read by people who haven’t read
> > MVCC to fully understand what visible means but know enough to run vacuum
> > to clean up updated and deleted data as a rule.
>
> Ok, I agree we can omit "or accessible". How do you like the followings?
> Still redundant?
>
> "If the command fails, these rows are left in a deleted state;
> these rows will not be visible, but they still occupy disk space. "
Also, the above statement is used in the patch.
Regards,
Yugo Nagata
> Regards,
> Yugo Nagata
>
> --
> Yugo NAGATA <[email protected]>
>
>
--
Yugo NAGATA <[email protected]>
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 21a5c4a052..10cfc3f0ad 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -90,6 +90,13 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
in the <structname>pg_stat_progress_copy</structname> view. See
<xref linkend="copy-progress-reporting"/> for details.
</para>
+
+ <para>
+ By default, <command>COPY</command> will fail if it encounters an error
+ during processing. For use cases where a best-effort attempt at loading
+ the entire file is desired, the <literal>ON_ERROR</literal> clause can
+ be used to specify some other behavior.
+ </para>
</refsect1>
<refsect1>
@@ -378,17 +385,20 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
<term><literal>ON_ERROR</literal></term>
<listitem>
<para>
- Specifies which <replaceable class="parameter">
- error_action</replaceable> to perform when there is malformed data in the input.
- Currently, only <literal>stop</literal> (default) and <literal>ignore</literal>
- values are supported.
- If the <literal>stop</literal> value is specified,
- <command>COPY</command> stops operation at the first error.
- If the <literal>ignore</literal> value is specified,
- <command>COPY</command> skips malformed data and continues copying data.
- The option is allowed only in <command>COPY FROM</command>.
- Only <literal>stop</literal> value is allowed when
- using <literal>binary</literal> format.
+ Specifies which how to behave when encountering an error due to column values
+ unacceptable to the input function of each attribute's data type.
+ An <replaceable class="parameter">error_action</replaceable> value of
+ <literal>stop</literal> means fail the command, while
+ <literal>ignore</literal> means discard the input row and continue with the next one.
+ The default is <literal>stop</literal>
+ </para>
+ <para>
+ The <literal>ignore</literal> option is applicable only for <command>COPY FROM</command>
+ when the <literal>FORMAT</literal> is <literal>text</literal> or <literal>csv</literal>.
+ </para>
+ <para>
+ An <literal>NOTICE</literal> level context message containing the ignored row count is
+ emitted at the end of the <command>COPY FROM</command> if at least one row was discarded.
</para>
</listitem>
</varlistentry>
@@ -576,15 +586,13 @@ COPY <replaceable class="parameter">count</replaceable>
</para>
<para>
- <command>COPY</command> stops operation at the first error when
- <literal>ON_ERROR</literal> is not specified. This
- should not lead to problems in the event of a <command>COPY
- TO</command>, but the target table will already have received
- earlier rows in a <command>COPY FROM</command>. These rows will not
- be visible or accessible, but they still occupy disk space. This might
- amount to a considerable amount of wasted disk space if the failure
- happened well into a large copy operation. You might wish to invoke
- <command>VACUUM</command> to recover the wasted space.
+ The <command>COPY FROM</command> command physically inserts input rows
+ into the table as it progresses. If the command fails, these rows are
+ left in a deleted state; these rows will not be visible, but still
+ occupying disk space. This might amount to a considerable amount of
+ wasted disk space if the failure happened well into a large copy
+ operation. <command>VACUUM</command> should be used to recover the
+ wasted space.
</para>
<para>