[ 
https://issues.apache.org/jira/browse/FLINK-33611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sai Sharath Dandi updated FLINK-33611:
--------------------------------------
    Description: 
h3. Background

Flink serializes and deserializes Protobuf-format data by calling the decode or 
encode method in GeneratedProtoToRow_XXX.java, a class produced by codegen, to 
parse byte[] data into Protobuf Java objects. FLINK-32650 introduced the ability 
to split the generated code to improve performance for large Protobuf schemas. 
However, this is still not sufficient for some larger Protobuf schemas: the 
generated code exceeds the Java constant pool size 
[limit|https://en.wikipedia.org/wiki/Java_class_file#The_constant_pool], and 
compiling it fails with errors like "Too many constants".

h3. Solution

Since the code-splitting functionality is already in place, the main proposal 
here is to reuse variable names across different split method scopes. This will 
greatly reduce the constant pool size. One more optimization is to split the 
last code segment only when its size exceeds the split threshold. Currently, 
the last segment of the generated code is always split, which can lead to too 
many split methods and thus exceed the constant pool size limit.
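
The two ideas above can be illustrated with a small, self-contained sketch. It is not Flink's codegen: the class SplitCodegenSketch, its methods, the "elementVar" prefix, and the threshold of 4000 are all hypothetical. It only counts how many distinct variable names and how many split methods each strategy would produce. How much each distinct name actually costs in the constant pool depends on the compiler (for example, on whether local variable debug information is emitted), which is an assumption here.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; none of these names come from Flink.
class SplitCodegenSketch {

    /** One global counter: every variable in every split method gets a new name. */
    static Set<String> globallyUniqueNames(int splitMethods, int varsPerMethod) {
        Set<String> names = new HashSet<>();
        int uid = 0;
        for (int m = 0; m < splitMethods; m++) {
            for (int v = 0; v < varsPerMethod; v++) {
                names.add("elementVar" + uid++);
            }
        }
        return names;
    }

    /** The counter restarts in each split method, so the same few names are reused. */
    static Set<String> perMethodNames(int splitMethods, int varsPerMethod) {
        Set<String> names = new HashSet<>();
        for (int m = 0; m < splitMethods; m++) {
            int uid = 0; // new method scope: numbering starts over
            for (int v = 0; v < varsPerMethod; v++) {
                names.add("elementVar" + uid++);
            }
        }
        return names;
    }

    /** Counts how many trailing code segments would be moved into their own split method. */
    static int countSplitMethods(List<String> lastSegments, int threshold, boolean alwaysSplit) {
        int count = 0;
        for (String segment : lastSegments) {
            if (alwaysSplit || segment.length() > threshold) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(globallyUniqueNames(1000, 5).size()); // 5000 distinct names
        System.out.println(perMethodNames(1000, 5).size());      // 5 distinct names

        // Three short trailing segments, all below the assumed threshold of 4000 characters.
        List<String> lastSegments = Arrays.asList("int a = 1;", "int b = 2;", "int c = 3;");
        System.out.println(countSplitMethods(lastSegments, 4000, true));  // 3
        System.out.println(countSplitMethods(lastSegments, 4000, false)); // 0
    }
}
```

Under these assumptions, scoping the counter to each split method keeps the number of distinct names constant as the schema grows, and checking the threshold before splitting the last segment avoids creating a method for every short trailing segment.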

  was:
h3. Background

Flink serializes and deserializes Protobuf-format data by calling the decode or 
encode method in GeneratedProtoToRow_XXX.java, a class produced by codegen, to 
parse byte[] data into Protobuf Java objects. FLINK-32650 introduced the ability 
to split the generated code to improve performance for large Protobuf schemas. 
However, this is still not sufficient for some larger Protobuf schemas: the 
generated code exceeds the Java constant pool size 
[limit|https://en.wikipedia.org/wiki/Java_class_file#The_constant_pool], and 
compiling it fails with errors like "Too many constants".

*Solution*

Since the code-splitting functionality is already in place, the main proposal 
here is to reuse variable names across different split method scopes. This will 
greatly reduce the constant pool size. One more optimization is to split the 
last code segment only when its size exceeds the split threshold. Currently, 
the last segment of the generated code is always split, which can lead to too 
many split methods.


> Add the ability to reuse variable names across different split method scopes
> ----------------------------------------------------------------------------
>
>                 Key: FLINK-33611
>                 URL: https://issues.apache.org/jira/browse/FLINK-33611
>             Project: Flink
>          Issue Type: Improvement
>          Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
>    Affects Versions: 1.18.0
>            Reporter: Sai Sharath Dandi
>            Priority: Major
>
> h3. Background
> Flink serializes and deserializes Protobuf-format data by calling the decode 
> or encode method in GeneratedProtoToRow_XXX.java, a class produced by codegen, 
> to parse byte[] data into Protobuf Java objects. FLINK-32650 introduced the 
> ability to split the generated code to improve performance for large Protobuf 
> schemas. However, this is still not sufficient for some larger Protobuf 
> schemas: the generated code exceeds the Java constant pool size 
> [limit|https://en.wikipedia.org/wiki/Java_class_file#The_constant_pool], and 
> compiling it fails with errors like "Too many constants".
> *Solution*
> Since the code-splitting functionality is already in place, the main proposal 
> here is to reuse variable names across different split method scopes. This 
> will greatly reduce the constant pool size. One more optimization is to split 
> the last code segment only when its size exceeds the split threshold. 
> Currently, the last segment of the generated code is always split, which can 
> lead to too many split methods and thus exceed the constant pool size limit.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
