github-actions[bot] commented on code in PR #64205:
URL: https://github.com/apache/doris/pull/64205#discussion_r3425284777


##########
be/src/storage/segment/variant/variant_column_reader.cpp:
##########
@@ -882,8 +882,12 @@ Status VariantColumnReader::_build_read_plan(ReadPlan* 
plan, const TabletColumn&
     }
 
     // Check if path is prefix, example sparse columns path: a.b.c, a.b.e, 
access prefix: a.b.
-    // Or access root path
-    if (_has_prefix_path_unlocked(relative_path)) {
+    // Or access root path. If sparse stats reached the configured limit, an 
exact sparse path can
+    // still have unrecorded sparse children such as a.b.c.
+    const bool has_prefix_path = _has_prefix_path_unlocked(relative_path);
+    const bool sparse_stats_may_have_unrecorded_children =
+            exceeded_sparse_column_limit && existed_in_sparse_column;
+    if (has_prefix_path || sparse_stats_may_have_unrecorded_children) {

Review Comment:
   This only treats truncated stats as ambiguous when the exact target path is 
recorded in sparse stats, but the same ambiguity exists when the target path is 
a materialized leaf and an unrecorded sparse child exists. A concrete shape is 
`variant_max_subcolumns_count = 1`, sparse stats limit = 1, with `b` having the 
highest non-null count so it is kept as an extracted leaf, while overflow 
sparse paths include `a` and `b.c`; `serialize_sparse_columns()` iterates the 
remaining `std::map`, so the single recorded sparse stat can be `a` and `b.c` 
is physically present but absent from stats. For `v['b']`, `node` is the leaf 
for `b`, `_has_prefix_path_unlocked("b")` is false because there is no child 
node and no recorded `b.` sparse prefix, and `existed_in_sparse_column` is 
false, so this branch is skipped and `_try_build_leaf_plan()` returns only the 
`b` leaf. Rows whose `b` value only comes from sparse `b.c` are not merged. 
When sparse stats are truncated, a materialized exact node also n
 eeds a hierarchical read unless there is a persisted proof that no sparse 
child with this prefix exists.



##########
fe/fe-core/src/main/java/org/apache/doris/qe/SessionVariable.java:
##########
@@ -3597,6 +3597,7 @@ public void 
checkAnnIndexCandidateRowsPercentThreshold(String value) {
     @VarAttrDef.VarAttr(
             name = DEFAULT_VARIANT_MAX_SPARSE_COLUMN_STATISTICS_SIZE,
             needForward = true,
+            checker = "checkDefaultVariantMaxSparseColumnStatisticsSize",
             fuzzy = true

Review Comment:
   This checker rejects new `SET` statements, but it does not cover upgraded 
metadata that was valid before this PR. `SessionVariable.readFromJson()` loads 
persisted int fields directly, and `VariableMgr.replayGlobalVariableV2()` 
catches and ignores checker failures, so an image/edit log containing 
`default_variant_max_sparse_column_statistics_size = 0` can still leave new 
sessions with value 0 after upgrade. Existing table metadata can also already 
contain 0, as shown by this PR changing the nested schema-change expected 
output from 0 to 1. Those zero-valued tables/sessions still reach BE, where a 
writer with limit 0 records no sparse stats and the reader no longer treats 
empty stats plus a sparse reader as truncated. Please add a compatibility 
normalization/migration for persisted zero values, or keep a BE compatibility 
branch for `limit == 0` while still rejecting new user input.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to