yx-keith commented on PR #64152:
URL: https://github.com/apache/doris/pull/64152#issuecomment-4667434898
The fix is correct: write/read_column_to_pb now serialize/deserialize the
logical value via get_element()/serialize()/deserialize() + insert_value(),
which is consistent with the existing write_column_to_arrow,
write_column_to_orc, to_string paths in this file, and with the bitmap/hll
SerDes. The added set_id(PGenericType::QUANTILE_STATE) and the switch to
assert_cast are also correct. The same-class bug does not exist in the
bitmap/hll SerDes — they were already correct — so QuantileState was the only
outlier.
Four points to address before merge:
deserialize return value is ignored. QuantileState::deserialize returns
bool. Since this PR fixes a data-corruption path, malformed input should
surface as an error rather than be silently accepted:
QuantileState value;
if (!value.deserialize(Slice(arg.bytes_value(i)))) {
return Status::InternalError("Failed to deserialize QuantileState from
pb");
}
col.insert_value(std::move(value));
The test's equality check is partial. Comparing only
get_value_by_percentile(0.5/1.0) samples a few quantile points and can miss
corruption that preserves those points. A round-trip test can assert exact
equality by serializing both elements and comparing the byte strings
(optionally also get_serialized_size()). Keep the percentile checks as a
supplement if desired.
This is a wire-format change. The protobuf payload for QuantileState changes
from fixed-size raw memory to variable-length serialized bytes. In a rolling
upgrade, old and new BEs are incompatible on the RPC UDF / RPC aggregate /
fold-constant paths. Real-world risk is low because the old format was already
broken (it transmitted pointer values, not data) and never produced correct
cross-node results. This PR currently has no backport labels — if it is
backported, please call out the format change so the merge can ensure affected
nodes are upgraded together.
No SQL-level regression test. The unit test covers the SerDe round-trip but
not the actual RPC UDF / fold-constant path from the issue. If a reproducing
query exists, adding a regression test (fold-constant on a constant
QuantileState expression is the easiest to construct) would cover the
end-to-end path. Optional
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]