李精卫 created FLINK-32650:
---------------------------

             Summary: Added the ability to split flink-protobuf codegen code
                 Key: FLINK-32650
                 URL: https://issues.apache.org/jira/browse/FLINK-32650
             Project: Flink
          Issue Type: Improvement
          Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
    Affects Versions: 1.17.0
            Reporter: 李精卫
             Fix For: 1.17.0


Flink serializes and deserializes protobuf-format data by calling the encode or 
decode method of a class generated by codegen, such as 
GeneratedProtoToRow_XXX.java, to parse byte[] data into protobuf Java objects. 
The size of the generated decode/encode method body grows with the number of 
fields defined in the protobuf message. When the number of fields is large 
enough that the compiled method body exceeds 8 KB (by default HotSpot does not 
compile methods larger than 8000 bytes of bytecode), the decode/encode method 
is not optimized by the JIT, which seriously degrades serialization and 
deserialization performance. Worse, if the compiled method body exceeds the 
JVM's 64 KB limit on method size, the task fails to start.
I therefore propose a codegen splitter for protobuf parsing that splits the 
encode/decode method to solve this problem.
The idea is as follows. In the current decode/encode method, the code for 
every field defined in the protobuf message is placed in a single method body. 
The fields do not actually share any parameters, so the code for several 
fields can be grouped and moved into a separate split method. During 
generation, if the size of the code in the current method body exceeds the 
threshold, a split method is generated, those fields are parsed in the split 
method, and the decode/encode method calls the split method. Repeating this 
for the remaining fields yields the final decode/encode method, which 
delegates to the split methods.
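
The grouping step described above could be sketched roughly as follows. This 
is only an illustration, not the flink-protobuf implementation: the class 
SplitCodegenSketch, the decodeSplitN method names, the (message, row) 
parameters and the character-based threshold maxChars are all hypothetical, 
and the sketch assumes each per-field snippet is independent of the others.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of splitting generated decode code; names are illustrative only. */
final class SplitCodegenSketch {

    /** Generated source: the body of decode() plus the split methods it calls. */
    static final class Generated {
        final String decodeBody;
        final List<String> splitMethods;

        Generated(String decodeBody, List<String> splitMethods) {
            this.decodeBody = decodeBody;
            this.splitMethods = splitMethods;
        }
    }

    /**
     * Groups per-field snippets into split methods. The pending group is flushed
     * into its own method before adding a snippet would push it past maxChars
     * (an assumed, character-based threshold).
     */
    static Generated split(List<String> fieldSnippets, int maxChars) {
        StringBuilder decodeBody = new StringBuilder();
        List<String> splitMethods = new ArrayList<>();
        StringBuilder pending = new StringBuilder();

        for (String snippet : fieldSnippets) {
            if (pending.length() > 0 && pending.length() + snippet.length() > maxChars) {
                flush(pending, decodeBody, splitMethods);
            }
            // A single oversized snippet still gets a method of its own.
            pending.append(snippet).append('\n');
        }
        if (pending.length() > 0) {
            flush(pending, decodeBody, splitMethods);
        }
        return new Generated(decodeBody.toString(), splitMethods);
    }

    /** Emits one split method for the pending snippets and a call to it in decode(). */
    private static void flush(
            StringBuilder pending, StringBuilder decodeBody, List<String> splitMethods) {
        String name = "decodeSplit" + splitMethods.size();
        splitMethods.add(
                "private void " + name + "(Object message, Object[] row) {\n" + pending + "}\n");
        decodeBody.append(name).append("(message, row);\n");
        pending.setLength(0);
    }
}
```

With this shape the decode body contains only one call per split method, so 
its size grows with the number of groups rather than the number of fields. A 
real threshold would more likely be tied to compiled bytecode size, which 
character count only approximates.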



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
