[ https://issues.apache.org/jira/browse/FLINK-33817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17817008#comment-17817008 ]
Nathan Taylor Armstrong Lewis commented on FLINK-33817: ------------------------------------------------------- I can confirm that this issue affects version 1.17.x as well. > Allow ReadDefaultValues = False for non primitive types on Proto3 > ----------------------------------------------------------------- > > Key: FLINK-33817 > URL: https://issues.apache.org/jira/browse/FLINK-33817 > Project: Flink > Issue Type: Improvement > Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile) > Affects Versions: 1.18.0 > Reporter: Sai Sharath Dandi > Priority: Major > Labels: pull-request-available > > *Background* > > The current Protobuf format > [implementation|https://github.com/apache/flink/blob/c3e2d163a637dca5f49522721109161bd7ebb723/flink-formats/flink-protobuf/src/main/java/org/apache/flink/formats/protobuf/deserialize/ProtoToRowConverter.java] > always sets ReadDefaultValues=False when using Proto3 version. This can > cause severe performance degradation for large Protobuf schemas with OneOf > fields as the entire generated code needs to be executed during > deserialization even when certain fields are not present in the data to be > deserialized and all the subsequent nested Fields can be skipped. Proto3 > supports hasXXX() methods for checking field presence for non primitive types > since Proto version > [3.15|https://github.com/protocolbuffers/protobuf/releases/tag/v3.15.0]. In > the internal performance benchmarks in our company, we've seen almost 10x > difference in performance for one of our real production usecase when > allowing to set ReadDefaultValues=False with proto3 version. The exact > difference in performance depends on the schema complexity and data payload > but we should allow user to set readDefaultValue=False in general. > > *Solution* > > Support using ReadDefaultValues=False when using Proto3 version. We need to > be careful to check for field presence only on non-primitive types if > ReadDefaultValues is false and version used is Proto3 -- This message was sent by Atlassian Jira (v8.20.10#820010)