[jira] [Commented] (FLINK-33611) Support Large Protobuf Schemas

Benchao Li (Jira) Mon, 25 Dec 2023 03:41:05 -0800


    [ https://issues.apache.org/jira/browse/FLINK-33611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17800303#comment-17800303 ]


Benchao Li commented on FLINK-33611:
------------------------------------

[~dsaisharath] Sure, I'll review the PR. I may respond slowly, since I 
participate in the community in my spare time.

> Support Large Protobuf Schemas
> ------------------------------
>
>                 Key: FLINK-33611
>                 URL: https://issues.apache.org/jira/browse/FLINK-33611
>             Project: Flink
>          Issue Type: Improvement
>          Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
>    Affects Versions: 1.18.0
>            Reporter: Sai Sharath Dandi
>            Assignee: Sai Sharath Dandi
>            Priority: Major
>              Labels: pull-request-available
>
> h3. Background
> Flink serializes and deserializes Protobuf-format data by calling the decode 
> or encode method in GeneratedProtoToRow_XXX.java, generated by codegen, to 
> parse byte[] data into Protobuf Java objects. FLINK-32650 introduced the 
> ability to split the generated code to improve performance for large 
> Protobuf schemas. However, this is still not sufficient for some larger 
> Protobuf schemas: the generated code exceeds the Java constant pool size 
> [limit|https://en.wikipedia.org/wiki/Java_class_file#The_constant_pool], and 
> compiling it fails with errors such as "Too many constants". 
> *Solution*
> Since the code-splitting functionality already exists, the main proposal is 
> to reuse variable names across different split method scopes, which will 
> greatly reduce the constant pool size. A second optimization is to split the 
> last code segment only when its size exceeds the split threshold. Currently, 
> the last segment of the generated code is always split, which can produce 
> too many split methods and thus exceed the constant pool size limit.
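For a sense of scale: the JVM class-file format stores constant_pool_count as an unsigned 16-bit value equal to the number of pool slots plus one, so a single class can hold at most 65,534 constant-pool slots, and long and double constants take two slots each. The small Java helper below only does that arithmetic. ConstantPoolBudget and its methods are hypothetical names, not part of Flink, and the helper says nothing about how many entries a particular generated class actually uses.

```java
/**
 * Hypothetical helper, illustrative arithmetic only. The class-file format
 * stores constant_pool_count as a u2 equal to (number of slots + 1), so a
 * class has at most 65535 - 1 = 65534 constant-pool slots. Long and double
 * constants each occupy two slots; every other entry occupies one.
 */
public final class ConstantPoolBudget {

    static final int MAX_SLOTS = 0xFFFF - 1; // 65534

    /** Slots consumed by the given mix of one-slot and two-slot entries. */
    static int slotsUsed(int singleSlotEntries, int longOrDoubleEntries) {
        return singleSlotEntries + 2 * longOrDoubleEntries;
    }

    /** True if the entries fit into one class's constant pool. */
    static boolean fits(int singleSlotEntries, int longOrDoubleEntries) {
        return slotsUsed(singleSlotEntries, longOrDoubleEntries) <= MAX_SLOTS;
    }

    public static void main(String[] args) {
        System.out.println(fits(60_000, 2_000)); // 64000 slots: true
        System.out.println(fits(60_000, 3_000)); // 66000 slots: false
    }
}
```

Because the pool is shared by the whole class, splitting code into more methods of the same class does not by itself raise this ceiling; it only helps if the total number of distinct entries stays under it.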
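A minimal sketch of the two proposed optimizations, assuming each generated per-field snippet declares a fixed number of temporaries named v0, v1, ... by a counter. SplitPlanSketch, buildPlan, distinctNames and the size-in-characters threshold are hypothetical illustrations, not Flink's actual protobuf code generator. The sketch counts distinct variable names only as a proxy for the constant-pool savings the issue describes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch, not Flink's protobuf codegen: models the two
 * FLINK-33611 optimizations on a list of generated per-field code snippets.
 */
public final class SplitPlanSketch {

    /** Segments moved into split methods, plus the tail kept in the caller. */
    static final class Plan {
        final List<List<String>> splitMethods = new ArrayList<>();
        final List<String> inlineTail = new ArrayList<>();
    }

    /**
     * Moves a segment into its own split method only once its accumulated size
     * exceeds splitThreshold. The final segment never exceeded the threshold,
     * so it stays inline instead of always becoming another method.
     */
    static Plan buildPlan(List<String> snippets, int splitThreshold) {
        Plan plan = new Plan();
        List<String> segment = new ArrayList<>();
        int size = 0;
        for (String snippet : snippets) {
            segment.add(snippet);
            size += snippet.length();
            if (size > splitThreshold) {
                plan.splitMethods.add(segment);
                segment = new ArrayList<>();
                size = 0;
            }
        }
        plan.inlineTail.addAll(segment);
        return plan;
    }

    /**
     * Counts distinct temporary names (v0, v1, ...) when each snippet declares
     * varsPerSnippet temporaries. With reusePerScope the counter restarts in
     * every method, so the same names are shared across method scopes.
     */
    static int distinctNames(Plan plan, int varsPerSnippet, boolean reusePerScope) {
        List<List<String>> scopes = new ArrayList<>(plan.splitMethods);
        scopes.add(plan.inlineTail);
        Set<String> names = new HashSet<>();
        int counter = 0;
        for (List<String> scope : scopes) {
            if (reusePerScope) {
                counter = 0; // each method scope starts naming from v0 again
            }
            for (int i = 0; i < scope.size() * varsPerSnippet; i++) {
                names.add("v" + counter++);
            }
        }
        return names.size();
    }

    public static void main(String[] args) {
        // Ten 40-character snippets with a split threshold of 100 characters.
        Plan plan = buildPlan(Collections.nCopies(10, "x".repeat(40)), 100);
        // prints: split methods: 3, inline snippets: 1
        System.out.println("split methods: " + plan.splitMethods.size()
                + ", inline snippets: " + plan.inlineTail.size());
        System.out.println(distinctNames(plan, 3, false)); // global counter: 30
        System.out.println(distinctNames(plan, 3, true));  // per-scope reuse: 9
    }
}
```

With ten 40-character snippets and a threshold of 100, three segments exceed the threshold and become split methods, the remaining 40-character tail stays inline, and the number of distinct temporary names drops from 30 with a global counter to 9 with per-method reuse.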



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
