Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

Masahiko Sawada Mon, 18 Dec 2023 17:14:36 -0800

On Mon, Dec 18, 2023 at 4:41 PM jian he <[email protected]> wrote:
>
> On Mon, Dec 18, 2023 at 1:09 PM torikoshia <[email protected]> wrote:
> >
> > Hi,
> >
> > > save the error metadata to  system catalogs would be more expensive,
> > > please see below explanation.
> > > I have no knowledge of publications.
> > > but i feel there is a feature request: publication FOR ALL TABLES
> > > exclude regex_pattern.
> > > Anyway, that would be another topic.
> >
> > I think saving error metadata to system catalog is not a good idea, too.
> > And I believe Sawada-san just pointed out missing features and did not
> > suggested that we use system catalog.
> >
> > > I don't think "specify the maximum number  of errors to tolerate
> > > before raising an ERROR." is very useful....
> >
> > That may be so.
> > I imagine it's useful in some use case since some loading tools have
> > such options.
> > Anyway I agree it's not necessary for initial patch as mentioned in [1].
> >
> > > I suppose we can specify an ERRORFILE directory. similar
> > > implementation [2], demo in [3]
> > > it will generate 2 files, one file shows the malform line content as
> > > is, another file shows the error info.
> >
> > That may be a good option when considering "(2) logging errors to
> > somewhere".
> >
> > What do you think about the proposal to develop these features in
> > incrementally?
> >
>
> I am more with  tom's idea [1], that is when errors happen (data type
> conversion only), do not fail, AND we save the error to a table.  I
> guess we can implement this logic together, only with a new COPY
> option.


If we want only such a feature we need to implement it together (the
patch could be split, though). But if some parts of the feature are
useful for users as well, I'd recommend implementing it incrementally.
That way, the patches can get small and it would be easy for reviewers
and committers to review/commit them.

>
> imagine a case (it's not that contrived, imho), while conversion from
> text to table's int, postgres isspace is different from the source
> text file's isspace logic.
> then all the lines are malformed. If we just say on error continue and
> not save error meta info, the user is still confused which field has
> the wrong data, then the user will probably try to incrementally test
> which field contains malformed data.
>
> Since we need to save the error somewhere.
> Everyone has the privilege to INSERT can do COPY.
> I think we also need to handle the access privilege also.
> So like I mentioned above, one copy_error error table hub, then
> everyone can view/select their own copy failure record.

The error table hub idea is still unclear to me. I assume that there
are error tables at least on each database. And an error table can
have error data that happened during COPY FROM, including malformed
lines. Do the error tables grow without bounds and the users have to
delete rows at some point? If so, who can do that? How can we achieve
that the users can see only errored rows they generated? And the issue
with logical replication also needs to be resolved. Anyway, if we go
this direction, we need to discuss the overall design.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

Reply via email to