Hi all,
The fix (https://issues.apache.org/jira/browse/FLINK-21388) is now also
available for Flink 1.12 (thanks Jingsong for merging the
cherry-pick PR).
But before the next release from the 1.12 branch, I'd like this other PR to be merged:
https://github.com/apache/flink/pull/15156 that introduces
ParquetAvroInputFormat.
In this PR I just added deep field copy (in case the source schema is
multi-level) and fixed serialization issues that I found while testing on
Flink 1.11. It should be ready for review.
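To illustrate what a deep field copy means for a multi-level schema, here is a minimal sketch in plain Java. It is not the code from the PR: nested records are modeled as Map and arrays as List rather than with Avro or Flink classes, and the names DeepCopySketch and deepCopy are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: nested records are modeled with plain
// Java maps and lists, not with Avro or Flink classes.
final class DeepCopySketch {

    private DeepCopySketch() {}

    // Recursively copies maps (records) and lists (arrays). Any other value
    // (strings, boxed numbers) is assumed immutable and is shared as-is.
    static Object deepCopy(Object value) {
        if (value instanceof Map) {
            Map<String, Object> copy = new LinkedHashMap<>();
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                copy.put(String.valueOf(e.getKey()), deepCopy(e.getValue()));
            }
            return copy;
        }
        if (value instanceof List) {
            List<Object> copy = new ArrayList<>();
            for (Object item : (List<?>) value) {
                copy.add(deepCopy(item));
            }
            return copy;
        }
        return value;
    }
}
```

With a shallow copy, the nested record would be shared between the source and the copy, so a reader that reuses its source object would silently overwrite fields of records already emitted. Copying each level avoids that.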
Once it is merged, I'll cherry-pick it to the 1.12 branch.
=> When is the next 1.12 release scheduled? Do we have enough time to
include this second parquet feature?
Best
Etienne Chauchot
On 12/03/2021 15:31, Etienne Chauchot wrote:
Hi Jingsong,
I just submitted a cherry-pick PR
(https://github.com/apache/flink/pull/15172) of [1] to the release-1.12 branch.
[1] https://github.com/apache/flink/pull/14961
Etienne
On 12/03/2021 14:55, Etienne Chauchot wrote:
Hi Jingsong,
No problem for the delay. Thanks for merging the first parquet PR.
I'll submit the 2 PRs to 1.12 when they're all merged to master. For
that, do I just have to submit a PR against this branch:
https://github.com/apache/flink/tree/release-1.12?
Best,
Etienne
On 12/03/2021 03:56, Jingsong Li wrote:
Hi Etienne,
Sorry for the late reply.
I just merged your bug fix.
I think you can submit a PR for release-1.12.
Best,
Jingsong
On Fri, Mar 12, 2021 at 12:22 AM Etienne Chauchot <echauc...@apache.org> wrote:
Hi,
I forgot to mention that I submitted the new ParquetAvroInputFormat to
master (1.13), but it is made to work with 1.12.x (the latest release) as
well, and I'm using it with Flink 1.12.x.
Maybe it could be a good candidate for inclusion in an upcoming 1.12.3
release, WDYT?
Best
Etienne
On 11/03/2021 17:17, Etienne Chauchot wrote:
Hi all,
I just submitted another parquet PR that adds ParquetAvroInputFormat
(I'm using it in a benchmark I'm coding). If anyone is interested in
reviewing it, be my guest:
https://github.com/apache/flink/pull/15156
I also have an older parquet PR that fixes a format conversion bug
and is waiting to be merged, if anyone can review it as well (it already
has 1 approval from a non-committer, thanks @HuangZhenQiu
<https://github.com/HuangZhenQiu>):
https://github.com/apache/flink/pull/14961
If I have time, I'll also tackle the other parquet tickets that I
opened lately.
Best
Etienne
On 25/02/2021 08:34, Jingsong Li wrote:
Hi Etienne,
ParquetColumnarRowInputFormat is not fully functional yet. It has good
performance, but it is hard for it to support complex types, like array
and map...
So I think a migrated ParquetInputFormat version is required.
Best,
Jingsong
On Wed, Feb 24, 2021 at 3:43 PM Etienne Chauchot <echauc...@apache.org> wrote:
Hi,
Thanks guys for the comments!
I did not know it was legacy. I will give the new sources a try.
Jingsong, when you say "migrate ParquetInputFormat to the new BulkFormat
interface", do you mean that the new ParquetColumnarRowInputFormat is
not fully functional yet?
In the meantime, if you agree, I think I'm still gonna submit a PR for
https://issues.apache.org/jira/browse/FLINK-21393 because I need it for
an urgent task I'm doing.
Best
Etienne
On 24/02/2021 03:41, Peter Huang wrote:
Hi Jingsong,
Thanks for pointing this out. Actually, I planned to work on changing
the interfaces of ParquetTableSource and ParquetInputFormat.
After refactoring the code, I may also help to fix the issue in
https://issues.apache.org/jira/browse/FLINK-21468.
Best Regards
Peter Huang
On Tue, Feb 23, 2021 at 6:35 PM Jingsong Li <jingsongl...@gmail.com> wrote:
Hi Etienne,
Thanks for reporting this.
There are indeed many problems. There is no doubt that we need to improve
our current format implementation.
But ParquetTableSource and ParquetInputFormat are legacy implementations
with legacy interfaces. We have introduced new interfaces for execution
and SQL. You can see:
- ParquetColumnarRowInputFormat with the BulkFormat interface. It is just
for columnar row reading and does not support complex types, so we need
to migrate ParquetInputFormat to the new BulkFormat interface.
- FileSystemTableSource with the DynamicTableSource interface. It is a
generic FileSystem source for all formats, so we can just use it for
parquet too.
Considering that ParquetTableSource and ParquetInputFormat are legacy
interfaces, I think we can finish the migration work first. What do you
think?
Best,
Jingsong
On Wed, Feb 24, 2021 at 12:46 AM Etienne Chauchot <echauc...@apache.org> wrote:
Hi all,
I've been playing with Parquet with SQL and Avro lately. I've found some
bugs:
1. https://issues.apache.org/jira/browse/FLINK-21388 : I already
submitted a PR on this one (https://github.com/apache/flink/pull/14961)
2. https://issues.apache.org/jira/browse/FLINK-21389
3. https://issues.apache.org/jira/browse/FLINK-21468
I've already started to work on this ticket:
https://issues.apache.org/jira/browse/FLINK-21393
I'd be happy to receive your comments on these tickets.
Best
Etienne Chauchot
--
Best, Jingsong Lee