On Mon, 29 Jan 2024 15:47:25 +0900 Yugo NAGATA <nag...@sraoss.co.jp> wrote:
> On Sun, 28 Jan 2024 19:14:58 -0700 > "David G. Johnston" <david.g.johns...@gmail.com> wrote: > > > > Also, I think "invalid input syntax" is a bit ambiguous. For example, > > > COPY FROM raises an error when the number of input column does not match > > > to the table schema, but this error is not ignored by ON_ERROR while > > > this seems to fall into the category of "invalid input syntax". > > > > > > > > It is literally the error text that appears if one were not to ignore it. > > It isn’t a category of errors. But I’m open to ideas here. But being > > explicit with what on actually sees in the system seemed preferable to > > inventing new classification terms not otherwise used. > > Thank you for explanation! I understood the words was from the error messages > that users actually see. However, as Torikoshi-san said in [1], errors other > than valid input syntax (e.g. range error) can be also ignored, therefore it > would be better to describe to be ignored errors more specifically. > > [1] > https://www.postgresql.org/message-id/7f1457497fa3bf9dfe486f162d1c8ec6%40oss.nttdata.com > > > > > > > > > So, keeping consistency with the existing description, we can say: > > > > > > "Specifies which how to behave when encountering an error due to > > > column values unacceptable to the input function of each attribute's > > > data type." > > > > > > Yeah, I was considering something along those lines as an option as well. > > But I’d rather add that wording to the glossary. > > Although I am still be not convinced if we have to introduce the words > "soft error" to the documentation, I don't care it if there are no other > opposite opinions. Attached is a updated patch v3, which is a version that uses the above wording instead of "soft error". > > > > > Currently, ON_ERROR doesn't support other soft errors, so it can explain > > > it more simply without introducing the new concept, "soft error" to users. > > > > > > > > Good point. Seems we should define what user-facing errors are ignored > > anywhere in the system and if we aren’t consistently leveraging these in > > all areas/commands make the necessary qualifications in those specific > > places. > > > > > > I think "left in a deleted state" is also unclear for users because this > > > explains the internal state but not how looks from user's view.How about > > > leaving the explanation "These rows will not be visible or accessible" in > > > the existing statement? > > > > > > > Just visible then, I don’t like an “or” there and as tuples at least they > > are accessible to the system, in vacuum especially. But I expected the > > user to understand “as if you deleted it” as their operational concept more > > readily than visible. I think this will be read by people who haven’t read > > MVCC to fully understand what visible means but know enough to run vacuum > > to clean up updated and deleted data as a rule. > > Ok, I agree we can omit "or accessible". How do you like the followings? > Still redundant? > > "If the command fails, these rows are left in a deleted state; > these rows will not be visible, but they still occupy disk space. " Also, the above statement is used in the patch. Regards, Yugo Nagata > Regards, > Yugo Nagata > > -- > Yugo NAGATA <nag...@sraoss.co.jp> > > -- Yugo NAGATA <nag...@sraoss.co.jp>
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml index 21a5c4a052..10cfc3f0ad 100644 --- a/doc/src/sgml/ref/copy.sgml +++ b/doc/src/sgml/ref/copy.sgml @@ -90,6 +90,13 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable in the <structname>pg_stat_progress_copy</structname> view. See <xref linkend="copy-progress-reporting"/> for details. </para> + + <para> + By default, <command>COPY</command> will fail if it encounters an error + during processing. For use cases where a best-effort attempt at loading + the entire file is desired, the <literal>ON_ERROR</literal> clause can + be used to specify some other behavior. + </para> </refsect1> <refsect1> @@ -378,17 +385,20 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable <term><literal>ON_ERROR</literal></term> <listitem> <para> - Specifies which <replaceable class="parameter"> - error_action</replaceable> to perform when there is malformed data in the input. - Currently, only <literal>stop</literal> (default) and <literal>ignore</literal> - values are supported. - If the <literal>stop</literal> value is specified, - <command>COPY</command> stops operation at the first error. - If the <literal>ignore</literal> value is specified, - <command>COPY</command> skips malformed data and continues copying data. - The option is allowed only in <command>COPY FROM</command>. - Only <literal>stop</literal> value is allowed when - using <literal>binary</literal> format. + Specifies which how to behave when encountering an error due to column values + unacceptable to the input function of each attribute's data type. + An <replaceable class="parameter">error_action</replaceable> value of + <literal>stop</literal> means fail the command, while + <literal>ignore</literal> means discard the input row and continue with the next one. + The default is <literal>stop</literal> + </para> + <para> + The <literal>ignore</literal> option is applicable only for <command>COPY FROM</command> + when the <literal>FORMAT</literal> is <literal>text</literal> or <literal>csv</literal>. + </para> + <para> + An <literal>NOTICE</literal> level context message containing the ignored row count is + emitted at the end of the <command>COPY FROM</command> if at least one row was discarded. </para> </listitem> </varlistentry> @@ -576,15 +586,13 @@ COPY <replaceable class="parameter">count</replaceable> </para> <para> - <command>COPY</command> stops operation at the first error when - <literal>ON_ERROR</literal> is not specified. This - should not lead to problems in the event of a <command>COPY - TO</command>, but the target table will already have received - earlier rows in a <command>COPY FROM</command>. These rows will not - be visible or accessible, but they still occupy disk space. This might - amount to a considerable amount of wasted disk space if the failure - happened well into a large copy operation. You might wish to invoke - <command>VACUUM</command> to recover the wasted space. + The <command>COPY FROM</command> command physically inserts input rows + into the table as it progresses. If the command fails, these rows are + left in a deleted state; these rows will not be visible, but still + occupying disk space. This might amount to a considerable amount of + wasted disk space if the failure happened well into a large copy + operation. <command>VACUUM</command> should be used to recover the + wasted space. </para> <para>