On 2024-11-13 22:02, Yugo NAGATA wrote:
On Tue, 12 Nov 2024 17:38:25 +0900
torikoshia <torikos...@oss.nttdata.com> wrote:
On 2024-11-12 14:17, Yugo Nagata wrote:
> On Tue, 12 Nov 2024 14:03:50 +0900
> Yugo Nagata <nag...@sraoss.co.jp> wrote:
>
>> On Tue, 12 Nov 2024 01:27:53 +0500
>> Kirill Reshke <reshkekir...@gmail.com> wrote:
>>
>> > On Mon, 11 Nov 2024 at 16:11, torikoshia <torikos...@oss.nttdata.com>
wrote:
>> > >
>> > > On 2024-11-09 21:55, Kirill Reshke wrote:
>> > >
>> > > Thanks for working on this!
>> >
>> > Thanks for reviewing the v7 patch series!
>> >
>> > > > On Thu, 7 Nov 2024 at 23:00, Fujii Masao <masao.fu...@oss.nttdata.com>
>> > > > wrote:
>> > > >>
>> > > >>
>> > > >>
>> > > >> On 2024/10/26 6:03, Kirill Reshke wrote:
>> > > >> > when the REJECT LIMIT is set to some non-zero number and the number
of
>> > > >> > row NULL replacements exceeds the limit, is it OK to fail. Because
>> > > >> > there WAS errors, and we should not tolerate more than $limit
errors .
>> > > >> > I do find this behavior to be consistent.
>> > > >>
>> > > >> +1
>> > > >>
>> > > >>
>> > > >> > But what if we don't set a REJECT LIMIT, it is sane to do all
>> > > >> > replacements, as if REJECT LIMIT is inf.
>> > > >>
>> > > >> +1
>> > > >
>> > > > After thinking for a while, I'm now more opposed to this approach. I
>> > > > think we should count rows with erroneous data as errors only if
>> > > > null substitution for these rows failed, not the total number of rows
>> > > > which were modified.
>> > > > Then, to respect the REJECT LIMIT option, we compare this number with
>> > > > the limit. This is actually simpler approach IMHO. What do You think?
>> > >
>> > > IMHO I prefer the previous interpretation.
>> > > I'm not sure this is what people expect, but I assume that REJECT_LIMIT
>> > > is used to specify how many malformed rows are acceptable in the
>> > > "original" data source.
>>
>> I also prefer the previous version.
>>
>> > I do like the first version of interpretation, but I have a struggle
>> > with it. According to this interpretation, we will fail COPY command
>> > if the number
>> > of malformed data rows exceeds the limit, not the number of rejected
>> > rows (some percentage of malformed rows are accepted with nulnot-null
constraintl
>> > substitution)
I feel your concern is valid.
Currently 'reject' can occur only when converting a column's input
value
to its data type, but if we introduce set_to_null option 'reject' also
occurs when inserting null, i.e. not null constraint.
I can suppose "reject" means failure of COPY command here, that is, a
reject
of executing the command, not an error of data input. If so, we can
interpret
REJECT_LIMIT as "the number of malformed rows allowed before the COPY
command
is REJECTed" (not the number of rejected rows). In this case, I think
we don't
have to rename the option name.
Of course, if there is more proper name that makes it easy for users to
understand the behaviour of the option, renaming should be nice.
>> The documentation says that REJECT_LIMIT "Specifies the maximum number
>> of errors",
>> and there are no wording "reject" in the description, so I wonder it
>> is unclear
>> what means in "REJECT" in REJECT_LIMIT. It may be proper to use
>> ERROR_LIMIT
>> since it is supposed to be used with ON_ERROR.
>>
>> Alternatively, if we emphasize that errors are handled other than
>> terminating
>> the command,perhaps MALFORMED_LIMIT as proposed above or
>> TOLERANCE_LIMIT may be
>> good, for example.
>
> I might misunderstand the meaning of the name. If REJECT_LIMIT means "a
> limit on
> the number of rows with any malformed value allowed before the COPY
> command is
> rejected", we would not have to rename it.
The meaning of REJECT_LIMIT is what you described, and I think Kirill
worries about cases when malformed rows are accepted(=not REJECTed)
with
null substitution. REJECT_LIMIT counts this case as REJECTed.
I am a bit confused.
Me too:)
Let me explain my understanding.
I believe there are now two candidates that count as REJECT_LIMIT
number:
(1) error converting a column's input value into its data type(soft
error)
(2) NULL substitution failure(this comes from the proposed patch)
And I understood Kirill's idea to be the following:
1st idea: REJECT_LIMIT counts (1)
2nd idea: REJECT_LIMIT counts (2)
And I've agreed with the 1st one.
You mean "REJECT" is raising a soft error of data
input here instead of terminating COPY?
Yes.
--
Regards,
--
Atsushi Torikoshi
Seconded from NTT DATA GROUP CORPORATION to SRA OSS K.K.