nsivarajan opened a new issue, #64124: URL: https://github.com/apache/doris/issues/64124
### Search before asking

- [x] I had searched in the [issues](https://github.com/apache/doris/issues?q=is%3Aissue) and found no similar issues.

### Version

4.1 Decoupled mode

### What's Wrong?

BE crashes with SIGSEGV when writing to an Iceberg table that has a sort order defined (ORDER BY), if the data volume is large enough to trigger multiple file flushes. The crash occurs in `update_batch_size()` (`SortingQueueBatch<MergeSortCursor>::update_batch_size()` in the stack trace, reached from `MergeSorterState::_merge_sort_read_impl()`) with a null pointer dereference at address 0x0.

### What You Expected?

INSERT OVERWRITE into a sorted Iceberg table should complete successfully regardless of data volume.

### How to Reproduce?

    -- Source: TPC-DS 10TB store_sales (~2.8B rows) in internal catalog
    -- Target: Iceberg table with ORDER BY
    CREATE TABLE store_sales_opt (...)
    ENGINE = ICEBERG_EXTERNAL_TABLE
    ORDER BY (`ss_sold_date_sk` ASC NULLS FIRST)
    LOCATION 'oss://...';

    -- Triggers crash when data volume causes multiple file flushes
    INSERT OVERWRITE TABLE iceberg_hms.tpcds10t.store_sales_opt
    SELECT * FROM internal.tpcds10t.store_sales;

Key conditions:

- Iceberg table must have a sort order (ORDER BY)
- Data volume must exceed the target file size threshold, causing at least two consecutive flushes within a single write task

### Anything Else?

**Root cause**: `MergeSorterState::reset()` was not clearing `_queue` between flush cycles. After the first file was flushed and the sorter reset, stale cursors remained in the queue. On the next flush, `update_batch_size()` dereferenced those stale cursor pointers → SIGSEGV.

**Stack**: `write()` → `_flush_to_file()` → `_write_sorted_data()` → `_merge_sort_read_impl()` → `update_batch_size()` @ sort_cursor.h:430

**Fix**: clear `_queue` and reset `_num_rows` in `MergeSorterState::reset()`. Also move `_update_spill_block_batch_row_count()` before `append_block()`, since `append_block()` clears the source block after copying.

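To illustrate the root cause, here is a minimal sketch of the failure pattern. It is not the Doris code: `Block`, `Cursor`, `SorterStateSketch`, `reset_buggy()`, `reset_fixed()` and `stale_cursors_after_flush()` are hypothetical stand-ins, and the real `MergeSorterState` is more involved. The sketch only counts stale cursors and never dereferences them, so it stays safe to run.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration only; these are NOT the Doris classes.
struct Block {
    std::vector<int> rows;
};

// A cursor keeps a non-owning pointer into a block owned by the sorter state.
struct Cursor {
    const Block* block = nullptr;
    std::size_t pos = 0;
};

class SorterStateSketch {
public:
    // Buffer one sorted block and account for its rows.
    void add_sorted_block(std::vector<int> rows) {
        auto block = std::make_unique<Block>();
        block->rows = std::move(rows);
        _num_rows += block->rows.size();
        _blocks.push_back(std::move(block));
    }

    // Create one cursor per buffered block before a merge read.
    void build_queue() {
        for (const auto& block : _blocks) {
            _queue.push_back(Cursor{block.get(), 0});
        }
    }

    // Buggy reset: frees the blocks but leaves the cursors (and row count) behind,
    // so every cursor still in _queue now points at freed memory.
    void reset_buggy() { _blocks.clear(); }

    // Fixed reset: drops the cursors and the row count together with the blocks.
    void reset_fixed() {
        _blocks.clear();
        _queue.clear();
        _num_rows = 0;
    }

    std::size_t queue_size() const { return _queue.size(); }
    std::size_t num_rows() const { return _num_rows; }

private:
    std::vector<std::unique_ptr<Block>> _blocks;
    std::vector<Cursor> _queue;
    std::size_t _num_rows = 0;
};

// Runs one flush cycle and reports how many cursors survive the reset,
// i.e. how many dangling cursors the next flush cycle would start with.
std::size_t stale_cursors_after_flush(bool use_fix) {
    SorterStateSketch state;
    state.add_sorted_block({1, 2, 3});
    state.build_queue();
    assert(state.queue_size() == 1);  // one cursor for the one buffered block
    if (use_fix) {
        state.reset_fixed();
    } else {
        state.reset_buggy();
    }
    return state.queue_size();
}
```

With this sketch, `stale_cursors_after_flush(false)` returns 1 (a cursor outlives its block) and `stale_cursors_after_flush(true)` returns 0. In the real sorter, the next flush would walk such a leftover cursor, which matches the crash described above.
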
```
2026-05-04 10:17:37 INFO [Thread-7] c.a.j.j.NativeLogger:34 - JdoAliyunMetaClient.cpp:236] Successfully get Secrets with AccessKeyId: STS.NZfRo9uGMPd4VqBKyrkcNDYkh from http://100.100.100.200/latest/meta-data/ram/security-credentials/dev-eval-decoupled-dev-role
2026-05-04 10:17:37 INFO [Thread-7] c.a.j.j.NativeLogger:34 - JdoAuthStsCredentialsProvider.cpp:136] Update auth retry 0: key updated = 1, secret updated = 1, token updated = 1
2026-05-04 10:17:37 INFO [Thread-7] c.a.j.j.NativeLogger:34 - JdoAuthStsCredentialsProvider.cpp:162] Auth updated, current time: 1777904257734, updated time: 1777904257734, force update: 1, accessKeyId: STS.NZfRo9uGMPd4VqBKyrkcNDYkh, time elapsed: 7.702743MS
2026-05-04 10:17:37 INFO [Thread-20] c.a.j.c.FsStats:18 - cmd=open, src=oss://doris-eval-coupled-dev/iceberg/warehouse/iceberg_tt_test.db/store_sales_tt/metadata/b0469cad-c8dd-4ff7-9601-49a82905457b-m3.avro, dst=null, size=0, parameter=hasGetFileLength:true,readProfile:columnar, time-in-ms=556, version=6.10.4-nextarch
2026-05-04 10:17:37 INFO [Thread-20] c.a.j.c.FsStats:18 - cmd=read, src=oss://doris-eval-coupled-dev/iceberg/warehouse/iceberg_tt_test.db/store_sales_tt/metadata/b0469cad-c8dd-4ff7-9601-49a82905457b-m3.avro, dst=null, size=93145, parameter=byteReaded:93145,byteNeeded:93145,readTimes:15,BackendRequestCountTotal:0,uuid:6583b276-d663-4cc7-ae9b-7eceb6effcda, time-in-ms=65, version=6.10.4-nextarch
2026-05-04 10:18:02 INFO [Thread-22] c.a.j.o.a.HadoopLoginUserInfo:49 - User: hadoop, authMethod: SIMPLE, ugi: hadoop (auth:SIMPLE)
2026-05-04 10:18:02 INFO [Thread-22] c.a.j.c.JindoHadoopSystem:257 - Initialized native file system: true, userName: hadoop, authMethod: SIMPLE
2026-05-04 10:18:02 INFO [Thread-22] c.a.j.c.FsStats:18 - cmd=open, src=oss://doris-eval-coupled-dev/iceberg/warehouse/iceberg_tt_test.db/store_sales_tt/metadata/b0469cad-c8dd-4ff7-9601-49a82905457b-m0.avro, dst=null, size=0, parameter=hasGetFileLength:true,readProfile:columnar, time-in-ms=7, version=6.10.4-nextarch
2026-05-04 10:18:02 INFO [Thread-22] c.a.j.c.FsStats:18 - cmd=read, src=oss://doris-eval-coupled-dev/iceberg/warehouse/iceberg_tt_test.db/store_sales_tt/metadata/b0469cad-c8dd-4ff7-9601-49a82905457b-m0.avro, dst=null, size=253833, parameter=byteReaded:253833,byteNeeded:253833,readTimes:40,BackendRequestCountTotal:0,uuid:c820f3c3-fbd2-4618-91ae-b60b6a0659f7, time-in-ms=10, version=6.10.4-nextarch
*** Query id: c3db8feece39405d-8ac88ac72a7487d5 ***
*** is nereids: 1 ***
*** tablet id: 0 ***
*** Aborted at 1778338941 (unix time) try "date -d @1778338941" if you are using GNU date ***
*** Current BE git commitID: 635a6e1c302 ***
*** SIGSEGV unknown detail explain (@0x0) received by PID 203408 (TID 463290 OR 0x6fe5e7d95700) from PID 0; stack trace: ***
 0# doris::signal::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*) at /home/zcp/repo_center/doris_tmp-branch-4.1.0-oss-native-sdk/doris/be/src/common/signal_handler.h:420
 1# PosixSignals::chained_handler(int, siginfo_t*, void*) [clone .part.10] in /usr/java/applejdk-17.0.9.9.1/lib/server/libjvm.so
 2# JVM_handle_linux_signal in /usr/java/applejdk-17.0.9.9.1/lib/server/libjvm.so
 3# 0x0000700258ADBCF0 in /lib64/libpthread.so.0
 4# doris::SortingQueueBatch<doris::MergeSortCursor>::update_batch_size() at /home/zcp/repo_center/doris_tmp-branch-4.1.0-oss-native-sdk/doris/be/src/exec/sort/sort_cursor.h:430
 5# doris::MergeSorterState::_merge_sort_read_impl(int, doris::Block*, bool*) at /home/zcp/repo_center/doris_tmp-branch-4.1.0-oss-native-sdk/doris/be/src/exec/sort/sorter.cpp:123
 6# doris::FullSorter::get_next(doris::RuntimeState*, doris::Block*, bool*) at /home/zcp/repo_center/doris_tmp-branch-4.1.0-oss-native-sdk/doris/be/src/exec/sort/sorter.cpp:270
 7# doris::VIcebergSortWriter::_write_sorted_data() at /home/zcp/repo_center/doris_tmp-branch-4.1.0-oss-native-sdk/doris/be/src/exec/sink/writer/iceberg/viceberg_sort_writer.cpp:184
 8# doris::VIcebergSortWriter::_flush_to_file() at /home/zcp/repo_center/doris_tmp-branch-4.1.0-oss-native-sdk/doris/be/src/exec/sink/writer/iceberg/viceberg_sort_writer.cpp:170
 9# doris::VIcebergSortWriter::write(doris::Block&) at /home/zcp/repo_center/doris_tmp-branch-4.1.0-oss-native-sdk/doris/be/src/exec/sink/writer/iceberg/viceberg_sort_writer.cpp:72
10# doris::VIcebergTableWriter::_write_prepared_block(doris::Block&) at /home/zcp/repo_center/doris_tmp-branch-4.1.0-oss-native-sdk/doris/be/src/exec/sink/writer/iceberg/viceberg_table_writer.cpp:329
11# doris::VIcebergTableWriter::write(doris::RuntimeState*, doris::Block&) at /home/zcp/repo_center/doris_tmp-branch-4.1.0-oss-native-sdk/doris/be/src/exec/sink/writer/iceberg/viceberg_table_writer.cpp:213
12# doris::AsyncResultWriter::process_block(doris::RuntimeState*, doris::RuntimeProfile*) in /ngs/app/doris/doris-current/be/lib/doris_be
13# std::_Function_handler<void (), doris::AsyncResultWriter::start_writer(doris::RuntimeState*, doris::RuntimeProfile*)::$_0>::_M_invoke(std::_Any_data const&) at /usr/local/ldb-toolchain-v0.26/bin/../lib/gcc/x86_64-pc-linux-gnu/15/include/g++-v15/bits/std_function.h:292
14# doris::ThreadPool::dispatch_thread() at /home/zcp/repo_center/doris_tmp-branch-4.1.0-oss-native-sdk/doris/be/src/util/threadpool.cpp:623
15# doris::Thread::supervise_thread(void*) at /home/zcp/repo_center/doris_tmp-branch-4.1.0-oss-native-sdk/doris/be/src/util/thread.cpp:461
16# start_thread in /lib64/libpthread.so.0
17# __clone in /lib64/libc.so.6
```

### Are you willing to submit PR?

- [x] Yes I am willing to submit a PR!

### Code of Conduct

- [x] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)

--
This is an automated message from the Apache Git Service.
