Copilot commented on code in PR #16670:
URL: https://github.com/apache/pinot/pull/16670#discussion_r2293651578
##########
pinot-common/src/main/java/org/apache/pinot/common/datatable/DataTableImplV4.java:
##########
@@ -381,7 +381,9 @@ protected byte[] serializeStringDictionary()
int numEntriesAdded = 0;
for (String entry : _stringDictionary) {
Tracing.ThreadAccountantOps.sampleAndCheckInterruptionPeriodically(numEntriesAdded);
- byte[] valueBytes = entry.getBytes(UTF_8);
+ // It is strange to find nulls in the dictionary, but it could happen
when
+ // string arrays contain nulls
+ byte[] valueBytes = entry == null ? new byte[0] : entry.getBytes(UTF_8);
dataOutputStream.writeInt(valueBytes.length);
dataOutputStream.write(valueBytes);
Review Comment:
Converting null strings to empty byte arrays during serialization will cause
data loss. When deserialized, empty strings will be read back instead of null
values. This breaks the round-trip integrity of null values in string arrays.
```suggestion
if (entry == null) {
dataOutputStream.writeBoolean(false); // null marker
} else {
dataOutputStream.writeBoolean(true); // non-null marker
byte[] valueBytes = entry.getBytes(UTF_8);
dataOutputStream.writeInt(valueBytes.length);
dataOutputStream.write(valueBytes);
}
```
##########
pinot-common/src/main/java/org/apache/pinot/common/datatable/DataTableImplV4.java:
##########
@@ -381,7 +381,9 @@ protected byte[] serializeStringDictionary()
int numEntriesAdded = 0;
for (String entry : _stringDictionary) {
Tracing.ThreadAccountantOps.sampleAndCheckInterruptionPeriodically(numEntriesAdded);
- byte[] valueBytes = entry.getBytes(UTF_8);
+ // It is strange to find nulls in the dictionary, but it could happen
when
+ // string arrays contain nulls
Review Comment:
[nitpick] The comment suggests uncertainty about whether nulls in string
arrays are a valid use case. Consider clarifying whether this is expected
behavior or documenting the design decision to convert nulls to empty strings.
```suggestion
// Nulls in the dictionary can occur when string arrays contain nulls.
// By design, nulls are serialized as empty byte arrays (i.e.,
converted to empty strings).
// This ensures consistent handling during serialization and
deserialization.
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]