github-actions[bot] commented on code in PR #64010:
URL: https://github.com/apache/doris/pull/64010#discussion_r3388021729
##########
be/src/storage/segment/segment_iterator.cpp:
##########
@@ -586,6 +470,48 @@ Status SegmentIterator::_init_impl(const StorageReadOptions& opts) {
return Status::OK();
}
+void SegmentIterator::_init_schema_block_id_map() {
+ _schema_block_id_map.assign(_schema->columns().size(), -1);
+ for (int i = 0; i < _schema->num_column_ids(); i++) {
+ auto cid = _schema->column_id(i);
+ _schema_block_id_map[cid] = i;
+ }
+}
+
+void SegmentIterator::_init_project_schema() {
+ _init_schema_block_id_map();
+ if (_opts.project_columns == nullptr || *_opts.project_columns == _schema->column_ids()) {
Review Comment:
There is now a concrete production path where this is null: direct
`Segment::new_iterator()` calls that do not go through `BetaRowsetReader`
never set `StorageReadOptions::project_columns`, so `SegmentIterator::init()`
enters `_init_project_schema()` and trips this `DORIS_CHECK`. For example,
`be/src/storage/task/index_builder.cpp:563-570` constructs
`StorageReadOptions read_options` directly and then calls
`seg_ptr->new_iterator(schema, read_options, &iter)`;
`be/src/storage/rowset/segcompaction.cpp:93-115` follows the same pattern.
These paths bypass the rowset reader's `origin_return_columns` assignment, so
one of two fixes is needed: either set `project_columns` on every path that
constructs `StorageReadOptions` directly, or let the direct SegmentIterator
path (which has no projection extension) safely derive the project schema from
`_schema->column_ids()`.
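
The second option (deriving the projection when none was supplied) can be
sketched as below. This is only an illustration under stated assumptions, not
the PR's code: `ReadOptionsSketch`, `resolve_project_columns`, and
`projection_is_identity` are hypothetical names, and the struct stands in for
the single `project_columns` field of `StorageReadOptions` discussed above.

```cpp
#include <cstdint>
#include <vector>

using ColumnId = uint32_t;

// Hypothetical stand-in for the one StorageReadOptions field discussed here.
struct ReadOptionsSketch {
    // Stays null when a caller builds the options by hand and never sets it.
    const std::vector<ColumnId>* project_columns = nullptr;
};

// Hypothetical helper: choose the projection list, falling back to the
// schema's own column ids when the caller did not provide one.
inline const std::vector<ColumnId>& resolve_project_columns(
        const ReadOptionsSketch& opts, const std::vector<ColumnId>& schema_column_ids) {
    if (opts.project_columns == nullptr) {
        return schema_column_ids; // no projection extension requested
    }
    return *opts.project_columns;
}

// True when the projection matches what the schema already reads, so no
// separate project schema would be needed.
inline bool projection_is_identity(const ReadOptionsSketch& opts,
                                   const std::vector<ColumnId>& schema_column_ids) {
    return resolve_project_columns(opts, schema_column_ids) == schema_column_ids;
}
```

With a fallback of this shape, hand-built options such as those in the index
builder and segcompaction paths would take the identity branch instead of
failing a null check.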
##########
be/src/storage/rowset/beta_rowset_reader.cpp:
##########
@@ -100,6 +100,9 @@ Status BetaRowsetReader::get_segment_iterators(RowsetReaderContext* read_context
_read_options.stats = _stats;
_read_options.push_down_agg_type_opt = _read_context->push_down_agg_type_opt;
_read_options.common_expr_ctxs_push_down = _read_context->common_expr_ctxs_push_down;
+ _read_options.project_columns = _read_context->origin_return_columns != nullptr
Review Comment:
There are also direct `RowsetReaderContext` paths that do not set this yet:
`be/src/storage/schema_change/schema_change.cpp:1057-1076` and
`be/src/cloud/cloud_schema_change_job.cpp:258-280` both build `reader_context`
by hand and then call `rs_reader->init(&reader_context)` directly. They
currently set only `return_columns`, not `origin_return_columns`. Once these
paths reach `BetaRowsetReader::get_segment_iterators()`, they trip the
`DORIS_CHECK(_read_context->origin_return_columns != nullptr)` here, so schema
change/rollup reads of historical rowsets will fail. Either add
`reader_context.origin_return_columns = &return_columns` at these direct
`RowsetReaderContext` call sites, or have the rowset reader layer derive the
value explicitly for paths without the extension.
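
A minimal sketch of the reader-layer derivation, assuming a context with only
the two fields named in this comment. `ReaderContextSketch` and
`derive_project_columns` are hypothetical names used for illustration; the
real `RowsetReaderContext` has many more members.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the two RowsetReaderContext fields discussed here.
struct ReaderContextSketch {
    const std::vector<uint32_t>* return_columns = nullptr;
    // Not set by the hand-built schema change / rollup contexts.
    const std::vector<uint32_t>* origin_return_columns = nullptr;
};

// Hypothetical helper: prefer origin_return_columns, and fall back to
// return_columns when the caller did not set an origin list.
inline const std::vector<uint32_t>* derive_project_columns(const ReaderContextSketch& ctx) {
    if (ctx.origin_return_columns != nullptr) {
        return ctx.origin_return_columns;
    }
    return ctx.return_columns;
}
```

Whether a silent fallback or an explicit assignment at each call site is
preferable is a design choice for the PR author; the fallback keeps existing
callers working, while the explicit assignment keeps the check strict.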
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]