On 2024-11-12 14:17, Yugo Nagata wrote:
On Tue, 12 Nov 2024 14:03:50 +0900
Yugo Nagata <nag...@sraoss.co.jp> wrote:
On Tue, 12 Nov 2024 01:27:53 +0500
Kirill Reshke <reshkekir...@gmail.com> wrote:
> On Mon, 11 Nov 2024 at 16:11, torikoshia <torikos...@oss.nttdata.com> wrote:
> >
> > On 2024-11-09 21:55, Kirill Reshke wrote:
> >
> > Thanks for working on this!
>
> Thanks for reviewing the v7 patch series!
>
> > > On Thu, 7 Nov 2024 at 23:00, Fujii Masao <masao.fu...@oss.nttdata.com> wrote:
> > >>
> > >>
> > >>
> > >> On 2024/10/26 6:03, Kirill Reshke wrote:
> > >> > when the REJECT LIMIT is set to some non-zero number and the number of
> > >> > row NULL replacements exceeds the limit, it is OK to fail, because
> > >> > there WERE errors, and we should not tolerate more than $limit errors.
> > >> > I do find this behavior to be consistent.
> > >>
> > >> +1
> > >>
> > >>
> > >> > But if we don't set a REJECT LIMIT, it is sane to do all
> > >> > replacements, as if REJECT LIMIT were inf.
> > >>
> > >> +1
> > >
> > > After thinking for a while, I'm now more opposed to this approach. I
> > > think we should count rows with erroneous data as errors only if
> > > null substitution for these rows failed, not the total number of rows
> > > which were modified.
> > > Then, to respect the REJECT LIMIT option, we compare this number with
> > > the limit. This is actually a simpler approach IMHO. What do you think?
> >
> > IMHO I prefer the previous interpretation.
> > I'm not sure this is what people expect, but I assume that REJECT_LIMIT
> > is used to specify how many malformed rows are acceptable in the
> > "original" data source.
I also prefer the previous version.
> I do like the first version of interpretation, but I have a struggle
> with it. According to this interpretation, we will fail the COPY command
> if the number of malformed data rows exceeds the limit, not the number
> of rejected rows (some percentage of malformed rows are accepted with
> null substitution).
I feel your concern is valid.
Currently 'reject' can occur only when converting a column's input value
to its data type, but if we introduce the set_to_null option, 'reject'
can also occur when inserting null, i.e. on a NOT NULL constraint violation.
> So, a proper name for the limit would be MALFORMED_LIMIT, or something.
> However, we are unable to change the name since the REJECT_LIMIT
> option has already been committed.
> I guess I'll just have to put up with this contradiction. I will send
> an updated patch shortly...
I think we can rename the REJECT_LIMIT option because it is not yet
released.
+1
The documentation says that REJECT_LIMIT "Specifies the maximum number
of errors", and there is no wording "reject" in the description, so I
suspect it is unclear what "REJECT" in REJECT_LIMIT means. It may be
proper to use ERROR_LIMIT since it is supposed to be used with ON_ERROR.
Alternatively, if we emphasize that errors are handled without
terminating the command, perhaps MALFORMED_LIMIT as proposed above or
TOLERANCE_LIMIT may be good, for example.
I might misunderstand the meaning of the name. If REJECT_LIMIT means "a
limit on the number of rows with any malformed value allowed before the
COPY command is rejected", we would not have to rename it.
The meaning of REJECT_LIMIT is what you described, and I think Kirill
worries about cases where malformed rows are accepted (= not REJECTed)
with null substitution. REJECT_LIMIT counts this case as REJECTed.
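To make the difference between the two counting interpretations concrete,
here is a minimal Python sketch. It is a hypothetical model, not
PostgreSQL's COPY implementation: the Row type and its malformed/null_ok
fields are made up for illustration, and null_ok stands in for
"substituting NULL would not violate a NOT NULL constraint".

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Row:
    """Hypothetical model of one input row (illustration only)."""
    malformed: bool       # some column failed input conversion to its type
    null_ok: bool = True  # NULL substitution would not violate NOT NULL


def count_malformed(rows: List[Row]) -> int:
    # REJECT_LIMIT as described above: every malformed row counts,
    # even when NULL substitution lets the row be loaded.
    return sum(1 for r in rows if r.malformed)


def count_failed_substitution(rows: List[Row]) -> int:
    # Kirill's alternative: count only rows for which NULL
    # substitution itself failed (e.g. a NOT NULL column).
    return sum(1 for r in rows if r.malformed and not r.null_ok)


def within_limit(rows: List[Row], reject_limit: int,
                 counter: Callable[[List[Row]], int]) -> bool:
    # COPY fails once the counted errors exceed the limit.
    return counter(rows) <= reject_limit


# Three malformed rows, one of which hits a NOT NULL column.
rows = [Row(True), Row(True), Row(True, null_ok=False), Row(False)]

print(count_malformed(rows))            # 3
print(count_failed_substitution(rows))  # 1
print(within_limit(rows, 2, count_malformed))            # False
print(within_limit(rows, 2, count_failed_substitution))  # True
```

With a limit of 2, the same input fails under the first interpretation but
loads under the second, which is exactly the ambiguity in the name.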
--
Regards,
--
Atsushi Torikoshi
Seconded from NTT DATA GROUP CORPORATION to SRA OSS K.K.