liuchunhua commented on issue #41460: URL: https://github.com/apache/doris/issues/41460#issuecomment-2533482065
When reading Iceberg v2 tables, the BE frequently crashes in the `_sort_delete_rows` function in iceberg_reader.cpp. After adding logging to analyze it, I found that some data files are read twice, but with different results.

``` c++
void IcebergTableReader::_sort_delete_rows(std::vector<std::vector<int64_t>*>& delete_rows_array,
                                           int64_t num_delete_rows) {
    LOG(INFO) << "num_delete_rows: " << num_delete_rows;
    LOG(INFO) << "delete_rows_array size: " << delete_rows_array.size();
    if (delete_rows_array.empty()) {
        return;
    }
    if (delete_rows_array.size() == 1) {
        LOG(INFO) << "delete_rows_array[0] size: " << delete_rows_array.front()->size();
        _iceberg_delete_rows.resize(num_delete_rows);
        memcpy(&_iceberg_delete_rows[0], &((*delete_rows_array.front())[0]),
               sizeof(int64_t) * num_delete_rows);
        LOG(INFO) << "end";
        return;
    }
    if (delete_rows_array.size() == 2) {
        _iceberg_delete_rows.resize(num_delete_rows);
        std::merge(delete_rows_array.front()->begin(), delete_rows_array.front()->end(),
                   delete_rows_array.back()->begin(), delete_rows_array.back()->end(),
                   _iceberg_delete_rows.begin());
        LOG(INFO) << "end";
        return;
    }
    // ... remainder of the function omitted from this excerpt
}
```

``` log
~/doris$ grep -E "3833737|3833797" be/log/be.INFO
I20241210 22:53:31.775945 3833737 iceberg_reader.cpp:302] date file path: oss://xxxxxx/warehouse/ods/xxxxx/data/gmt_create_month=2024-12/00001-286-cc6330a4-2b6a-4e9d-b76d-b5dc52a96007-00001.parquet
I20241210 22:53:31.775998 3833737 iceberg_reader.cpp:307] delete file: s3://xxxxxx/warehouse/ods/xxxxxx/data/gmt_create_month=2024-12/00113-5464-e7204633-b48c-4df7-83a8-f6ad27c5697c-00002-deletes.parquet
I20241210 22:53:31.776429 3833797 iceberg_reader.cpp:302] date file path: oss://xxxxxx/warehouse/ods/xxxxxx/data/gmt_create_month=2024-12/00001-286-cc6330a4-2b6a-4e9d-b76d-b5dc52a96007-00001.parquet
I20241210 22:53:31.776443 3833797 iceberg_reader.cpp:307] delete file: s3://xxxxxx/warehouse/ods/xxxxxxx/data/gmt_create_month=2024-12/00113-5464-e7204633-b48c-4df7-83a8-f6ad27c5697c-00002-deletes.parquet
I20241210 22:53:31.857023 3833737 iceberg_reader.cpp:307] delete file: s3://xxxxxxx/warehouse/ods/xxxxxxx/data/gmt_create_month=2024-12/00113-5445-33eab2a7-294d-4086-bfcd-63eac5ae0f5e-00002-deletes.parquet
I20241210 22:53:31.857031 3833797 iceberg_reader.cpp:307] delete file: s3://xxxxxxx/warehouse/ods/xxxxxx/data/gmt_create_month=2024-12/00113-5445-33eab2a7-294d-4086-bfcd-63eac5ae0f5e-00002-deletes.parquet
I20241210 22:53:31.915529 3833737 iceberg_reader.cpp:403] num_delete_rows: 8
I20241210 22:53:31.915545 3833737 iceberg_reader.cpp:404] delete_rows_array size: 2
I20241210 22:53:31.915551 3833737 iceberg_reader.cpp:421] end
I20241210 22:53:31.915912 3833797 iceberg_reader.cpp:403] num_delete_rows: 3
I20241210 22:53:31.915922 3833797 iceberg_reader.cpp:404] delete_rows_array size: 1
I20241210 22:53:31.915927 3833797 iceberg_reader.cpp:409] delete_rows_array[0] size: 18446709446057983746
```

In the log above, threads 3833737 and 3833797 read the same data file and the same delete files, yet the first sees 8 delete rows across 2 arrays while the second sees 3 rows in 1 array whose reported size (18446709446057983746) is clearly invalid.

I suspect a multi-threaded concurrency problem. After commenting out the `erase_data.emplace_back` line in the code below, queries run normally.

``` c++
DeleteFile& delete_file_map = *((DeleteFile*)delete_file_cache);
auto get_value = [&](const auto& v) {
    DeleteRows* row_ids = v.second.get();
    if (row_ids->size() > 0) {
        delete_rows_array.emplace_back(row_ids);
        num_delete_rows += row_ids->size();
        // do not release the memory
        //erase_data.emplace_back(delete_file_cache);
    }
};
```
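
To make the suspected failure mode concrete, here is a minimal sketch, not the actual Doris code. It assumes the crash comes from one reader still holding raw `DeleteRows*` pointers while another path erases the cached entry. `DeleteRowsCache`, `DeleteRowsPtr` and `merge_delete_rows` are hypothetical names invented for this illustration. Holding the rows through a `std::shared_ptr` keeps them alive after the cache entry is erased, and the merged size is derived from the inputs rather than from a separately tracked `num_delete_rows`.

``` c++
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-ins; none of these names come from the Doris code base.
using DeleteRows = std::vector<int64_t>;
using DeleteRowsPtr = std::shared_ptr<const DeleteRows>;

// Toy cache keyed by delete-file path. Entries are shared_ptrs, so erasing an
// entry cannot free rows that a reader still holds.
class DeleteRowsCache {
public:
    void put(const std::string& path, DeleteRows rows) {
        std::lock_guard<std::mutex> guard(_mutex);
        _entries[path] = std::make_shared<const DeleteRows>(std::move(rows));
    }

    // Returns a shared owner of the rows, or nullptr if the path is not cached.
    DeleteRowsPtr get(const std::string& path) const {
        std::lock_guard<std::mutex> guard(_mutex);
        auto it = _entries.find(path);
        if (it == _entries.end()) {
            return nullptr;
        }
        return it->second;
    }

    void erase(const std::string& path) {
        std::lock_guard<std::mutex> guard(_mutex);
        _entries.erase(path);
    }

private:
    mutable std::mutex _mutex;
    std::unordered_map<std::string, DeleteRowsPtr> _entries;
};

// Merges already-sorted position lists into one sorted list. The output grows
// with the inputs, so a stale row count cannot cause an out-of-bounds copy.
std::vector<int64_t> merge_delete_rows(const std::vector<DeleteRowsPtr>& arrays) {
    std::vector<int64_t> merged;
    for (const auto& rows : arrays) {
        assert(rows != nullptr);
        std::vector<int64_t> next;
        next.reserve(merged.size() + rows->size());
        std::merge(merged.begin(), merged.end(), rows->begin(), rows->end(),
                   std::back_inserter(next));
        merged.swap(next);
    }
    return merged;
}
```

With this shape, a reader that fetched its rows before `erase` was called can still merge them safely. Whether this matches the real ownership model in the BE is something the maintainers would need to confirm.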