Hi Etienne,

ParquetColumnarRowInputFormat is not fully functional yet. It performs well, but it is hard to make it support complex types such as array and map, so I think a migrated version of ParquetInputFormat is required.

Best,
Jingsong

On Wed, Feb 24, 2021 at 3:43 PM Etienne Chauchot <echauc...@apache.org> wrote:

> Hi,
>
> Thanks guys for the comments !
>
> I did not know it was legacy. I will give the new sources a try.
>
> Jingsong, when you say "migrate ParquetInputFormat to the new BulkFormat
> interface", do you mean that the new ParquetColumnarRowInputFormat is
> not fully functional yet?
>
> In the meantime, if you agree, I think I'm still gonna submit a PR for
> https://issues.apache.org/jira/browse/FLINK-21393 because I need it on
> an urgent task I'm doing.
>
> Best
>
> Etienne
>
> On 24/02/2021 03:41, Peter Huang wrote:
> > Hi Jingsong,
> >
> > Thanks for pointing this out. Actually, I planned to work on changing
> > interfaces ParquetTableSource and ParquetInputFormat.
> > After refactoring the code, I may also help to fix the issue in
> > https://issues.apache.org/jira/browse/FLINK-21468.
> >
> > Best Regards
> > Peter Huang
> >
> > On Tue, Feb 23, 2021 at 6:35 PM Jingsong Li <jingsongl...@gmail.com> wrote:
> >
> >> Hi Etienne,
> >>
> >> Thanks for your reporting.
> >>
> >> There are indeed many problems. There is no doubt that we need to improve
> >> our current format implementation.
> >>
> >> But ParquetTableSource and ParquetInputFormat are legacy implementations
> >> with legacy interfaces. We have introduced new interfaces for execution and
> >> SQL. You can see:
> >> - ParquetColumnarRowInputFormat with BulkFormat interface. It is just for
> >> columnar row reading, not support complex types, we need
> >> migrate ParquetInputFormat to the new BulkFormat interface.
> >> - FileSystemTableSource with DynamicTableSource interface, It is a generic
> >> FileSystem source for all formats, we can just use it for parquet too.
> >>
> >> Considering ParquetTableSource and ParquetInputFormat are legacy
> >> interfaces, I think we can finish migration work first, what do you think?
> >>
> >> Best,
> >> Jingsong
> >>
> >> On Wed, Feb 24, 2021 at 12:46 AM Etienne Chauchot <echauc...@apache.org>
> >> wrote:
> >>
> >>> Hi all,
> >>>
> >>> I've been playing with Parquet with SQL and Avro lately. I've found some
> >>> bugs:
> >>>
> >>> 1. https://issues.apache.org/jira/browse/FLINK-21388 : I already
> >>> submitted a PR on this one (https://github.com/apache/flink/pull/14961)
> >>>
> >>> 2. https://issues.apache.org/jira/browse/FLINK-21389
> >>>
> >>> 3. https://issues.apache.org/jira/browse/FLINK-21468
> >>>
> >>> I've already started to work on this ticket:
> >>> https://issues.apache.org/jira/browse/FLINK-21393
> >>>
> >>> I'd be happy to receive your comments on these tickets
> >>>
> >>> Best
> >>>
> >>> Etienne Chauchot
> >>>
> >> --
> >> Best, Jingsong Lee
> >>

--
Best, Jingsong Lee