github-actions[bot] commented on code in PR #63564:
URL: https://github.com/apache/doris/pull/63564#discussion_r3295777359


##########
be/src/core/column/column_string.h:
##########
@@ -298,7 +298,10 @@ class ColumnStr final : public COWHelper<IColumn, ColumnStr<T>> {
             running_offset += lengths[i];
             offsets_ptr[i] = static_cast<T>(running_offset);
         }
-        chars.resize(offsets[old_rows + num - 1]);
+        // OFFSET_ONLY columns carry valid offsets but no real string payload. Use non-zero
+        // placeholders so char-padding shrink logic cannot recompute these offsets as zero-length
+        // strings when this column is nested under a struct that also contains CHAR fields.
+        chars.resize_fill(offsets[old_rows + num - 1], 1);

Review Comment:
   With this change, every OFFSET_ONLY string read writes one synthetic byte
for each logical byte in the column. The previous `resize()` only advanced the
size of `chars` after allocating, whereas `resize_fill(..., 1)` touches the
whole appended range. For a query such as `select length(big_string_col) ...`
with nested pruning enabled, the BE still needs only the offsets, yet it now
performs O(total string bytes) memory writes per block, which can dominate the
scan for large values. The CHAR/struct shrink issue, by contrast, applies only
to the later `shrink_padding_chars()` path. Please keep the general OFFSET_ONLY
path sparse and fix the shrink path more narrowly, e.g. by preventing shrink
from recomputing offsets for offset-only string children, or by materializing
placeholders only when that specific shrink path is actually required.
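
   To illustrate the first suggestion (having shrink skip offset-only
children), here is a minimal, self-contained sketch. It is a toy model, not
the Doris implementation: `ToyStringColumn`, its `offset_only` flag, and
`shrink_padding` are hypothetical names invented for this example, and the
padding rule (trim trailing zero bytes) is an assumption about what the real
shrink path does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical toy model of a string column; NOT Doris's ColumnStr.
struct ToyStringColumn {
    std::vector<uint32_t> offsets; // end offset of each row within chars
    std::vector<char> chars;       // payload; left empty for offset-only reads
    bool offset_only = false;      // assumed flag: offsets valid, payload absent
};

// Hypothetical shrink: trims trailing '\0' padding from each row and
// recomputes offsets. Offset-only columns are skipped, so their offsets
// are never rebuilt from payload bytes that were never materialized.
void shrink_padding(ToyStringColumn& col) {
    if (col.offset_only) {
        return; // no payload to inspect; keep the offsets as they are
    }
    std::vector<char> packed;
    std::vector<uint32_t> new_offsets;
    new_offsets.reserve(col.offsets.size());
    uint32_t begin = 0;
    for (uint32_t end : col.offsets) {
        assert(end >= begin && end <= col.chars.size());
        uint32_t len = end - begin;
        // Drop trailing zero bytes (assumed CHAR padding).
        while (len > 0 && col.chars[begin + len - 1] == '\0') {
            --len;
        }
        packed.insert(packed.end(), col.chars.begin() + begin,
                      col.chars.begin() + begin + len);
        new_offsets.push_back(static_cast<uint32_t>(packed.size()));
        begin = end;
    }
    col.chars.swap(packed);
    col.offsets.swap(new_offsets);
}
```

   In this sketch the offset-only column keeps its offsets without a single
payload byte being written, while an ordinary padded column is still
compacted. Whether a comparable guard fits the real `shrink_padding_chars()`
depends on how the offset-only state is visible there, which this comment
does not assume.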



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
