Hi Etienne,

ParquetColumnarRowInputFormat is not fully functional yet. It performs
well, but it is hard to make it support complex types such as array and
map. So I think a migrated version of ParquetInputFormat is still required.

Best,
Jingsong

On Wed, Feb 24, 2021 at 3:43 PM Etienne Chauchot <echauc...@apache.org>
wrote:

> Hi,
>
> Thanks for the comments, guys!
>
> I did not know it was legacy. I will give the new sources a try.
>
> Jingsong, when you say "migrate ParquetInputFormat to the new BulkFormat
> interface", do you mean that the new ParquetColumnarRowInputFormat is
> not fully functional yet?
>
> In the meantime, if you agree, I think I'm still going to submit a PR for
> https://issues.apache.org/jira/browse/FLINK-21393 because I need it for
> an urgent task I'm working on.
>
> Best
>
> Etienne
>
> On 24/02/2021 03:41, Peter Huang wrote:
> > Hi Jingsong,
> >
> > Thanks for pointing this out. Actually, I planned to work on changing
> > the interfaces of ParquetTableSource and ParquetInputFormat.
> > After refactoring the code, I may also help to fix the issue in
> > https://issues.apache.org/jira/browse/FLINK-21468.
> >
> > Best Regards
> > Peter Huang
> >
> > On Tue, Feb 23, 2021 at 6:35 PM Jingsong Li <jingsongl...@gmail.com>
> > wrote:
> >
> >> Hi Etienne,
> >>
> >> Thanks for reporting this.
> >>
> >> There are indeed many problems. There is no doubt that we need to
> >> improve our current format implementation.
> >>
> >> But ParquetTableSource and ParquetInputFormat are legacy implementations
> >> with legacy interfaces. We have introduced new interfaces for execution
> >> and SQL. You can see:
> >> - ParquetColumnarRowInputFormat with the BulkFormat interface. It is
> >> only for columnar row reading and does not support complex types; we
> >> need to migrate ParquetInputFormat to the new BulkFormat interface.
> >> - FileSystemTableSource with the DynamicTableSource interface. It is a
> >> generic FileSystem source for all formats, so we can use it for Parquet
> >> too.
> >>
> >> Since ParquetTableSource and ParquetInputFormat are legacy interfaces,
> >> I think we can finish the migration work first. What do you think?
> >>
> >> Best,
> >> Jingsong
> >>
> >> On Wed, Feb 24, 2021 at 12:46 AM Etienne Chauchot <echauc...@apache.org>
> >> wrote:
> >>
> >>> Hi all,
> >>>
> >>> I've been playing with Parquet with SQL and Avro lately. I've found
> >>> some bugs:
> >>>
> >>> 1. https://issues.apache.org/jira/browse/FLINK-21388 : I already
> >>> submitted a PR on this one (https://github.com/apache/flink/pull/14961)
> >>>
> >>> 2. https://issues.apache.org/jira/browse/FLINK-21389
> >>>
> >>> 3. https://issues.apache.org/jira/browse/FLINK-21468
> >>>
> >>> I've already started to work on this ticket:
> >>> https://issues.apache.org/jira/browse/FLINK-21393
> >>>
> >>>
> >>> I'd be happy to receive your comments on these tickets.
> >>>
> >>>
> >>> Best
> >>>
> >>> Etienne Chauchot
> >>>
> >>>
> >>>
> >> --
> >> Best, Jingsong Lee
> >>
>


-- 
Best, Jingsong Lee
