On 2024-11-12 14:17, Yugo Nagata wrote:
On Tue, 12 Nov 2024 14:03:50 +0900
Yugo Nagata <nag...@sraoss.co.jp> wrote:

On Tue, 12 Nov 2024 01:27:53 +0500
Kirill Reshke <reshkekir...@gmail.com> wrote:

> On Mon, 11 Nov 2024 at 16:11, torikoshia <torikos...@oss.nttdata.com> wrote:
> >
> > On 2024-11-09 21:55, Kirill Reshke wrote:
> >
> > Thanks for working on this!
>
> Thanks for reviewing the v7 patch series!
>
> > > On Thu, 7 Nov 2024 at 23:00, Fujii Masao <masao.fu...@oss.nttdata.com>
> > > wrote:
> > >>
> > >>
> > >>
> > >> On 2024/10/26 6:03, Kirill Reshke wrote:
> > >> > when the REJECT LIMIT is set to some non-zero number and the number of
> > >> > row NULL replacements exceeds the limit, is it OK to fail. Because
> > >> > there WAS errors, and we should not tolerate more than $limit errors .
> > >> > I do find this behavior to be consistent.
> > >>
> > >> +1
> > >>
> > >>
> > >> > But what if we don't set a REJECT LIMIT, it is sane to do all
> > >> > replacements, as if REJECT LIMIT is inf.
> > >>
> > >> +1
> > >
> > > After thinking for a while, I'm now more opposed to this approach. I
> > > think we should count rows with erroneous data as errors only if
> > > null substitution for these rows failed, not the total number of rows
> > > which were modified.
> > > Then, to respect the REJECT LIMIT option, we compare this number with
> > > the limit. This is actually simpler approach IMHO. What do You think?
> >
> > IMHO I prefer the previous interpretation.
> > I'm not sure this is what people expect, but I assume that REJECT_LIMIT
> > is used to specify how many malformed rows are acceptable in the
> > "original" data source.

I also prefer the previous version.

> I do like the first version of interpretation, but I have a struggle
> with it. According to this interpretation, we will fail COPY command
> if the number
> of malformed data rows exceeds the limit, not the number of rejected
> rows (some percentage of malformed rows are accepted with null
> substitution)

I feel your concern is valid.
Currently a 'reject' can occur only when converting a column's input value to its data type, but if we introduce the set_to_null option, a 'reject' can also occur when inserting the null, i.e. on a NOT NULL constraint violation.

> So, a proper name for the limit will be MALFORMED_LIMIT, or something.
> However, we are unable to change the name since the REJECT_LIMIT
> option has already been committed.
> I guess I'll just have to put up with this contradiction. I will send
> an updated patch shortly...
I think we can rename the REJECT_LIMIT option because it is not yet released.

+1

The documentation says that REJECT_LIMIT "Specifies the maximum number of errors", and the word "reject" does not appear in the description, so I wonder whether it is unclear what "REJECT" means in REJECT_LIMIT. It may be proper to use ERROR_LIMIT
since it is supposed to be used with ON_ERROR.

Alternatively, if we want to emphasize that errors are handled in some way other than terminating the command, perhaps MALFORMED_LIMIT as proposed above or TOLERANCE_LIMIT may be
good, for example.

I might be misunderstanding the meaning of the name. If REJECT_LIMIT means "a limit on the number of rows with any malformed value allowed before the COPY command is
rejected", we would not have to rename it.

The meaning of REJECT_LIMIT is what you described, and I think Kirill is worried about cases where malformed rows are accepted (= not REJECTed) with null substitution. REJECT_LIMIT counts these cases as REJECTed.
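
To make the two interpretations discussed in this thread concrete, here is a rough sketch in Python. It is not PostgreSQL code: the function names, the (malformed, null_ok) row representation, and the count_substituted flag are all made up for illustration, and it assumes COPY fails once the counted errors exceed the limit.

```python
from typing import Iterable, Tuple

# Hypothetical row model: (malformed, null_ok)
#   malformed: the row has a value that fails data type conversion
#   null_ok:   substituting NULL for that value succeeds
#              (e.g. no NOT NULL constraint is violated)
Row = Tuple[bool, bool]


def count_errors(rows: Iterable[Row], count_substituted: bool) -> int:
    """Count the rows that would be charged against the limit.

    count_substituted=True  -> every malformed row counts, even if NULL
                               substitution rescued it (the "previous"
                               interpretation in the thread).
    count_substituted=False -> only rows whose NULL substitution failed
                               count (the alternative interpretation).
    """
    errors = 0
    for malformed, null_ok in rows:
        if not malformed:
            continue
        if count_substituted or not null_ok:
            errors += 1
    return errors


def copy_fails(rows: Iterable[Row], limit: int, count_substituted: bool) -> bool:
    """Assumed rule: the command fails when errors exceed the limit."""
    return count_errors(rows, count_substituted) > limit


# One clean row, two malformed rows rescued by NULL substitution,
# and one malformed row where NULL substitution also fails.
sample = [(False, True), (True, True), (True, True), (True, False)]
```

With a limit of 2, copy_fails(sample, 2, count_substituted=True) reports a failure because three rows were malformed in the original data, while copy_fails(sample, 2, count_substituted=False) does not, because only one row was actually rejected. That difference is the naming concern raised above.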

--
Regards,

--
Atsushi Torikoshi
Seconded from NTT DATA GROUP CORPORATION to SRA OSS K.K.

