Hi Etienne,

Sorry for the late reply.

I just merged your bug fix.
I think you can submit a PR for release-1.12.

Best,
Jingsong

On Fri, Mar 12, 2021 at 12:22 AM Etienne Chauchot <echauc...@apache.org>
wrote:

> Hi,
>
> I forgot to mention that I submitted the new ParquetAvroInputFormat to
> master (1.13) but it is made to work for 1.12.x (last release) also and
> I'm using it with Flink 1.12.x.
>
> Maybe it could be a good candidate to be included in an upcoming 1.12.3
> release, WDYT ?
>
> Best
>
> Etienne
>
> On 11/03/2021 17:17, Etienne Chauchot wrote:
> >
> > Hi all,
> >
> > I just submitted another parquet PR that adds ParquetAvroInputFormat
> > (I'm using it in a benchmark I'm coding). If anyone is interested in
> > reviewing it, be my guest:
> >
> > https://github.com/apache/flink/pull/15156
> >
> > I also have an older parquet PR that fixes a format conversion bug
> > and is waiting to be merged, if anyone can review it too (it already
> > has 1 approval from a non-committer, thanks @HuangZhenQiu
> > <https://github.com/HuangZhenQiu>):
> >
> > https://github.com/apache/flink/pull/14961
> >
> > If I have time, I'll also tackle the other parquet tickets that I
> > opened lately
> >
> > Best
> >
> > Etienne
> >
> > On 25/02/2021 08:34, Jingsong Li wrote:
> >> Hi Etienne,
> >>
> >> ParquetColumnarRowInputFormat is not fully functional yet. It has good
> >> performance, but it is hard to support complex types, like array and
> >> map... So I think a migrated ParquetInputFormat version is required.
> >>
> >> Best,
> >> Jingsong
> >>
> >> On Wed, Feb 24, 2021 at 3:43 PM Etienne Chauchot <echauc...@apache.org>
> >> wrote:
> >>
> >>> Hi,
> >>>
> >>> Thanks guys for the comments !
> >>>
> >>> I did not know it was legacy. I will give the new sources a try.
> >>>
> >>> Jingsong, when you say "migrate ParquetInputFormat to the new BulkFormat
> >>> interface", do you mean that the new ParquetColumnarRowInputFormat is
> >>> not fully functional yet?
> >>>
> >>> In the meantime, if you agree, I think I'm still gonna submit a PR for
> >>> https://issues.apache.org/jira/browse/FLINK-21393 because I need it for
> >>> an urgent task I'm doing.
> >>>
> >>> Best
> >>>
> >>> Etienne
> >>>
> >>> On 24/02/2021 03:41, Peter Huang wrote:
> >>>> Hi Jingsong,
> >>>>
> >>>> Thanks for pointing this out. Actually, I planned to work on changing
> >>>> interfaces ParquetTableSource and ParquetInputFormat.
> >>>> After refactoring the code, I may also help to fix the issue in
> >>>> https://issues.apache.org/jira/browse/FLINK-21468.
> >>>>
> >>>> Best Regards
> >>>> Peter Huang
> >>>>
> >>>> On Tue, Feb 23, 2021 at 6:35 PM Jingsong Li <jingsongl...@gmail.com>
> >>>> wrote:
> >>>>> Hi Etienne,
> >>>>>
> >>>>> Thanks for reporting these.
> >>>>>
> >>>>> There are indeed many problems. There is no doubt that we need to
> >>>>> improve our current format implementation.
> >>>>>
> >>>>> But ParquetTableSource and ParquetInputFormat are legacy
> >>>>> implementations with legacy interfaces. We have introduced new
> >>>>> interfaces for execution and SQL. You can see:
> >>>>> - ParquetColumnarRowInputFormat with the BulkFormat interface. It is
> >>>>> only for columnar row reading and does not support complex types, so
> >>>>> we need to migrate ParquetInputFormat to the new BulkFormat interface.
> >>>>> - FileSystemTableSource with the DynamicTableSource interface. It is a
> >>>>> generic FileSystem source for all formats, so we can just use it for
> >>>>> parquet too.
> >>>>>
> >>>>> Considering that ParquetTableSource and ParquetInputFormat are legacy
> >>>>> interfaces, I think we can finish the migration work first. What do
> >>>>> you think?
> >>>>>
> >>>>> Best,
> >>>>> Jingsong
> >>>>>
> >>>>> On Wed, Feb 24, 2021 at 12:46 AM Etienne Chauchot <echauc...@apache.org>
> >>>>> wrote:
> >>>>>
> >>>>>> Hi all,
> >>>>>>
> >>>>>> I've been playing with Parquet with SQL and Avro lately. I've found
> >>>>>> some bugs:
> >>>>>>
> >>>>>> 1. https://issues.apache.org/jira/browse/FLINK-21388 : I already
> >>>>>> submitted a PR on this one (https://github.com/apache/flink/pull/14961)
> >>>>>>
> >>>>>> 2. https://issues.apache.org/jira/browse/FLINK-21389
> >>>>>>
> >>>>>> 3. https://issues.apache.org/jira/browse/FLINK-21468
> >>>>>>
> >>>>>> I've already started to work on this ticket:
> >>>>>> https://issues.apache.org/jira/browse/FLINK-21393
> >>>>>>
> >>>>>>
> >>>>>> I'd be happy to receive your comments on these tickets
> >>>>>>
> >>>>>>
> >>>>>> Best
> >>>>>>
> >>>>>> Etienne Chauchot
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>> --
> >>>>> Best, Jingsong Lee
> >>>>>
>


-- 
Best, Jingsong Lee
