Sai Sharath Dandi created FLINK-37528:
-----------------------------------------

             Summary: Protobuf Format (proto3): Handle default values for optional primitive types and primitive types in one of fields
                 Key: FLINK-37528
                 URL: https://issues.apache.org/jira/browse/FLINK-37528
             Project: Flink
          Issue Type: Improvement
    Affects Versions: 2.0-preview
            Reporter: Sai Sharath Dandi


The read default values setting is [forced|https://github.com/apache/flink/blob/master/flink-formats/flink-protobuf/src/main/java/org/apache/flink/formats/protobuf/deserialize/ProtoToRowConverter.java#L74] to true for primitive types in proto3. This can cause bugs for messages such as the following:

{code:java}
oneof test {
  string aa = 1;
  int32 bb = 2;
  bool cc = 3;
  Corpus dd = 4;
}
{code}

Even if only the first field of the oneof is set, reading default values makes all of its fields non-null after decoding. When such data is encoded back to protobuf, it produces a different protobuf message than the original, which causes data correctness issues.

Proposed solution:

{code:java}
if (PbFormatUtils.isSimpleType(subType)
        && !(elementFd.getContainingOneof() != null || elementFd.hasOptionalKeyword())) {
    readDefaultValues = formatContext.isReadDefaultValuesForPrimitiveTypes();
}
{code}

For primitive types in proto3, field presence can still be checked when the field is declared as optional or is part of a oneof.
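To illustrate the effect of the proposed condition, here is a minimal, self-contained sketch that does not depend on Flink or protobuf. The class `PresenceAwareDefaults`, its nested `Field` type, the `decode` helper, and the extra field `plain` are all hypothetical stand-ins invented for this example. Only the shape of the `if` condition is taken from the snippet above, and the sketch assumes that `readDefaultValues` starts out as the configured value (false here) rather than being forced to true.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative sketch only; none of these types exist in Flink or protobuf. */
final class PresenceAwareDefaults {

    /** Hypothetical stand-in for the few field properties the check needs. */
    static final class Field {
        final String name;
        final boolean simpleType;      // mirrors PbFormatUtils.isSimpleType(subType)
        final boolean inOneof;         // mirrors elementFd.getContainingOneof() != null
        final boolean optionalKeyword; // mirrors elementFd.hasOptionalKeyword()
        final Object defaultValue;

        Field(String name, boolean simpleType, boolean inOneof,
              boolean optionalKeyword, Object defaultValue) {
            this.name = name;
            this.simpleType = simpleType;
            this.inOneof = inOneof;
            this.optionalKeyword = optionalKeyword;
            this.defaultValue = defaultValue;
        }
    }

    /** Same shape as the proposed condition: only implicit-presence primitives take the primitive flag. */
    static boolean resolveReadDefaultValues(
            Field field, boolean readDefaultValues, boolean readDefaultValuesForPrimitiveTypes) {
        if (field.simpleType && !(field.inOneof || field.optionalKeyword)) {
            readDefaultValues = readDefaultValuesForPrimitiveTypes;
        }
        return readDefaultValues;
    }

    /** Builds a row: set fields keep their value, unset fields become a default or null. */
    static Map<String, Object> decode(
            List<Field> fields, Map<String, Object> setFields,
            boolean readDefaultValues, boolean readDefaultValuesForPrimitiveTypes) {
        Map<String, Object> row = new LinkedHashMap<>();
        for (Field field : fields) {
            if (setFields.containsKey(field.name)) {
                row.put(field.name, setFields.get(field.name));
            } else if (resolveReadDefaultValues(
                    field, readDefaultValues, readDefaultValuesForPrimitiveTypes)) {
                row.put(field.name, field.defaultValue);
            } else {
                row.put(field.name, null); // presence check: unset stays null
            }
        }
        return row;
    }

    /** Three oneof members from the issue plus a hypothetical plain int32 field. */
    static List<Field> exampleFields() {
        return Arrays.asList(
                new Field("aa", true, true, false, ""),
                new Field("bb", true, true, false, 0),
                new Field("cc", true, true, false, false),
                new Field("plain", true, false, false, 0));
    }

    public static void main(String[] args) {
        // Only "aa" is set in the oneof, as in the scenario described above.
        Map<String, Object> setFields =
                Collections.<String, Object>singletonMap("aa", "hello");
        System.out.println(decode(exampleFields(), setFields, false, true));
    }
}
```

Under these assumptions, the unset oneof members `bb` and `cc` stay null while the plain field still receives its default of 0, so re-encoding the row would not set extra oneof members. Passing `true` as the third argument approximates the current behavior, where every unset primitive comes back non-null.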