Hi, Shun. 
Thanks for the contribution. I'll take a look first and then find some 
committers to help review and merge it.

Best regards,
Yuxia

----- Original Message -----
From: "sunshun18" <sunshu...@126.com>
To: "dev" <dev@flink.apache.org>
Sent: Monday, December 5, 2022, 11:54:38 AM
Subject: Patch to support Parquet schema evolution

Hi there,


I found a null-value issue when using Flink to read Parquet files written with 
multiple versions of a schema (V1->V2->V3->..->Vn).
Assume the Parquet schema has the two fields shown below, and field F2 
exists only in version 2.


Version1: F1
Version2: F1, F2


Currently the value of field F2 comes back empty (null) when reading data from 
a Parquet file using schema version 2.
I looked into the implementation and found that Flink uses a collection named 
`unknownFieldsIndices` to track the nonexistent fields, and that this collection 
is applied to all Parquet files under the given path.
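
To illustrate the behaviour described above, here is a simplified model in 
Python. It is only a sketch and not Flink's actual code: `ParquetFileStub`, 
`read_shared`, and `read_per_file` are hypothetical names, and it assumes the 
set of unknown field indices is filled in while reading a version 1 file and 
then reused for the files that follow.

```python
from typing import Dict, List, Set


class ParquetFileStub:
    """Hypothetical stand-in for one Parquet file: its schema plus its rows."""

    def __init__(self, schema: List[str], rows: List[Dict[str, object]]):
        self.schema = schema
        self.rows = rows


def read_shared(files: List[ParquetFileStub], requested: List[str]) -> List[list]:
    """Model of the issue: one set of unknown indices shared by all files."""
    unknown: Set[int] = set()  # assumed to live for the whole path, never reset
    out = []
    for f in files:
        for i, name in enumerate(requested):
            if name not in f.schema:
                unknown.add(i)  # once marked unknown, stays unknown
        for row in f.rows:
            out.append([None if i in unknown else row[name]
                        for i, name in enumerate(requested)])
    return out


def read_per_file(files: List[ParquetFileStub], requested: List[str]) -> List[list]:
    """Model of a fix: recompute the unknown indices for every file."""
    out = []
    for f in files:
        unknown = {i for i, name in enumerate(requested) if name not in f.schema}
        for row in f.rows:
            out.append([None if i in unknown else row[name]
                        for i, name in enumerate(requested)])
    return out


v1 = ParquetFileStub(["F1"], [{"F1": 1}])
v2 = ParquetFileStub(["F1", "F2"], [{"F1": 2, "F2": "x"}])

print(read_shared([v1, v2], ["F1", "F2"]))    # F2 is lost for the V2 file
print(read_per_file([v1, v2], ["F1", "F2"]))  # F2 is kept for the V2 file
```

In this model the shared set makes F2 null for the version 2 file as well, 
while tracking unknown fields per file keeps its value.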


I have drafted a patch, with a unit test, to fix this issue.


https://issues.apache.org/jira/browse/FLINK-29527
https://github.com/apache/flink/pull/21149


As this PR has been pending for a long time, I hope a committer can help review 
it and provide feedback if possible.


Thanks!
Shun
