nsivarajan opened a new issue, #64124:
URL: https://github.com/apache/doris/issues/64124

   ### Search before asking
   
   - [x] I have searched the 
[issues](https://github.com/apache/doris/issues?q=is%3Aissue) and found no 
similar issues.
   
   
   ### Version
   
   4.1 Decoupled mode
   
   ### What's Wrong?
   
   The BE crashes with SIGSEGV when writing to an Iceberg table that has a sort 
order defined (ORDER BY), if the data volume is large enough to trigger 
multiple file flushes. The crash is a null pointer dereference at address 0x0 
in `update_batch_size()`, reached from 
`MergeSorterState::_merge_sort_read_impl()`.
   
   ### What You Expected?
   
   INSERT OVERWRITE into a sorted Iceberg table should complete successfully 
regardless of data volume.
   
   ### How to Reproduce?
   
   -- Source: TPC-DS 10TB store_sales (~2.8B rows) in internal catalog
     -- Target: Iceberg table with ORDER BY
   
     CREATE TABLE store_sales_opt (...)
     ENGINE = ICEBERG_EXTERNAL_TABLE
     ORDER BY (`ss_sold_date_sk` ASC NULLS FIRST)
     LOCATION 'oss://...';
   
     -- Triggers crash when data volume causes multiple file flushes
     INSERT OVERWRITE TABLE iceberg_hms.tpcds10t.store_sales_opt
     SELECT * FROM internal.tpcds10t.store_sales;
   
     Key conditions:
     - Iceberg table must have a sort order (ORDER BY)
     - Data volume must exceed the target file size threshold, causing at least 
two consecutive flushes within a single write task
   
   
   ### Anything Else?
   
   **Root cause**: `MergeSorterState::reset()` did not clear `_queue` between 
flush cycles. After the first file was flushed and the sorter was reset, stale 
cursors remained in the queue. On the next flush, `update_batch_size()` 
dereferenced those stale cursor pointers, causing the SIGSEGV.
   
   **Stack**: `write()` → `_flush_to_file()` → `_write_sorted_data()` → 
`FullSorter::get_next()` → `_merge_sort_read_impl()` → `update_batch_size()` 
@ sort_cursor.h:430
   
   **Fix**: clear `_queue` and reset `_num_rows` in `MergeSorterState::reset()`. 
Also call `_update_spill_block_batch_row_count()` before `append_block()`, 
because `append_block()` clears the source block after copying it.
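
To make the failure mode easier to picture, here is a minimal sketch of a merge state whose reset either keeps or clears its cursor queue. It is a toy model, not the Doris code: `ToyCursor`, `ToyMergeState`, `stale_after_reset()`, `second_cycle_with_fix()` and the `clear_queue` flag are all hypothetical names invented for illustration, and the model only counts stale cursors rather than dereferencing them.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Toy model only: hypothetical names, not the real Doris classes.
struct ToyCursor {
    std::size_t block;  // index into ToyMergeState::blocks_
    std::size_t pos;    // next row to emit from that block
};

class ToyMergeState {
public:
    // Buffer one pre-sorted run that will later be merged.
    void add_sorted_block(std::vector<int> rows) {
        if (rows.empty()) return;
        num_rows_ += rows.size();
        blocks_.push_back(std::move(rows));
    }

    // Create one cursor per buffered block.
    void build_merge_queue() {
        for (std::size_t i = 0; i < blocks_.size(); ++i) queue_.push_back({i, 0});
    }

    // Emit up to `limit` rows in sorted order; unfinished cursors stay queued.
    std::vector<int> read(std::size_t limit) {
        std::vector<int> out;
        while (out.size() < limit && !queue_.empty()) {
            auto it = std::min_element(
                queue_.begin(), queue_.end(),
                [this](const ToyCursor& a, const ToyCursor& b) {
                    return blocks_[a.block][a.pos] < blocks_[b.block][b.pos];
                });
            out.push_back(blocks_[it->block][it->pos]);
            if (++it->pos == blocks_[it->block].size()) queue_.erase(it);
        }
        return out;
    }

    // Reset between flush cycles. clear_queue = false mimics the reported bug:
    // the blocks go away but the cursors that point into them are kept.
    void reset(bool clear_queue) {
        blocks_.clear();
        if (clear_queue) {
            queue_.clear();
            num_rows_ = 0;
        }
    }

    // Count queued cursors that refer to blocks that no longer exist.
    std::size_t stale_cursors() const {
        return static_cast<std::size_t>(std::count_if(
            queue_.begin(), queue_.end(),
            [this](const ToyCursor& c) { return c.block >= blocks_.size(); }));
    }

    std::size_t num_rows() const { return num_rows_; }

private:
    std::vector<std::vector<int>> blocks_;
    std::vector<ToyCursor> queue_;
    std::size_t num_rows_ = 0;
};

// First flush stops early, then the state is reset; report surviving cursors.
std::size_t stale_after_reset(bool clear_queue) {
    ToyMergeState state;
    state.add_sorted_block({1, 4, 7});
    state.add_sorted_block({2, 5, 8});
    state.build_merge_queue();
    state.read(2);  // both cursors are still queued after this partial read
    state.reset(clear_queue);
    return state.stale_cursors();
}

// With the queue cleared on reset, a second cycle only sees its own rows.
std::vector<int> second_cycle_with_fix() {
    ToyMergeState state;
    state.add_sorted_block({1, 4, 7});
    state.add_sorted_block({2, 5, 8});
    state.build_merge_queue();
    state.read(2);
    state.reset(true);
    state.add_sorted_block({3, 9});
    state.add_sorted_block({6});
    state.build_merge_queue();
    return state.read(10);
}
```

In this model, a reset that keeps the queue leaves cursors whose block no longer exists, which is the analogue of the stale cursor pointers described above. Clearing the queue and the row counter together leaves nothing for the next cycle to trip over.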
   
   
   ```
   2026-05-04 10:17:37 INFO  [Thread-7] c.a.j.j.NativeLogger:34 - 
JdoAliyunMetaClient.cpp:236] Successfully get Secrets with AccessKeyId: 
STS.NZfRo9uGMPd4VqBKyrkcNDYkh from 
http://100.100.100.200/latest/meta-data/ram/security-credentials/dev-eval-decoupled-dev-role
   2026-05-04 10:17:37 INFO  [Thread-7] c.a.j.j.NativeLogger:34 - 
JdoAuthStsCredentialsProvider.cpp:136] Update auth retry 0: key updated = 1, 
secret updated = 1, token updated = 1
   2026-05-04 10:17:37 INFO  [Thread-7] c.a.j.j.NativeLogger:34 - 
JdoAuthStsCredentialsProvider.cpp:162] Auth updated, current time: 
1777904257734, updated time: 1777904257734, force update: 1, accessKeyId: 
STS.NZfRo9uGMPd4VqBKyrkcNDYkh, time elapsed: 7.702743MS
   2026-05-04 10:17:37 INFO  [Thread-20] c.a.j.c.FsStats:18 - cmd=open, 
src=oss://doris-eval-coupled-dev/iceberg/warehouse/iceberg_tt_test.db/store_sales_tt/metadata/b0469cad-c8dd-4ff7-9601-49a82905457b-m3.avro,
 dst=null, size=0, parameter=hasGetFileLength:true,readProfile:columnar, 
time-in-ms=556, version=6.10.4-nextarch
   2026-05-04 10:17:37 INFO  [Thread-20] c.a.j.c.FsStats:18 - cmd=read, 
src=oss://doris-eval-coupled-dev/iceberg/warehouse/iceberg_tt_test.db/store_sales_tt/metadata/b0469cad-c8dd-4ff7-9601-49a82905457b-m3.avro,
 dst=null, size=93145, 
parameter=byteReaded:93145,byteNeeded:93145,readTimes:15,BackendRequestCountTotal:0,uuid:6583b276-d663-4cc7-ae9b-7eceb6effcda,
 time-in-ms=65, version=6.10.4-nextarch
   2026-05-04 10:18:02 INFO  [Thread-22] c.a.j.o.a.HadoopLoginUserInfo:49 - 
User: hadoop, authMethod: SIMPLE, ugi: hadoop (auth:SIMPLE)
   2026-05-04 10:18:02 INFO  [Thread-22] c.a.j.c.JindoHadoopSystem:257 - 
Initialized native file system: true, userName: hadoop, authMethod: SIMPLE
   2026-05-04 10:18:02 INFO  [Thread-22] c.a.j.c.FsStats:18 - cmd=open, 
src=oss://doris-eval-coupled-dev/iceberg/warehouse/iceberg_tt_test.db/store_sales_tt/metadata/b0469cad-c8dd-4ff7-9601-49a82905457b-m0.avro,
 dst=null, size=0, parameter=hasGetFileLength:true,readProfile:columnar, 
time-in-ms=7, version=6.10.4-nextarch
   2026-05-04 10:18:02 INFO  [Thread-22] c.a.j.c.FsStats:18 - cmd=read, 
src=oss://doris-eval-coupled-dev/iceberg/warehouse/iceberg_tt_test.db/store_sales_tt/metadata/b0469cad-c8dd-4ff7-9601-49a82905457b-m0.avro,
 dst=null, size=253833, 
parameter=byteReaded:253833,byteNeeded:253833,readTimes:40,BackendRequestCountTotal:0,uuid:c820f3c3-fbd2-4618-91ae-b60b6a0659f7,
 time-in-ms=10, version=6.10.4-nextarch
   *** Query id: c3db8feece39405d-8ac88ac72a7487d5 ***
   *** is nereids: 1 ***
   *** tablet id: 0 ***
   *** Aborted at 1778338941 (unix time) try "date -d @1778338941" if you are 
using GNU date ***
   *** Current BE git commitID: 635a6e1c302 ***
   *** SIGSEGV unknown detail explain (@0x0) received by PID 203408 (TID 463290 
OR 0x6fe5e7d95700) from PID 0; stack trace: ***
    0# doris::signal::(anonymous namespace)::FailureSignalHandler(int, 
siginfo_t*, void*) at 
/home/zcp/repo_center/doris_tmp-branch-4.1.0-oss-native-sdk/doris/be/src/common/signal_handler.h:420
    1# PosixSignals::chained_handler(int, siginfo_t*, void*) [clone .part.10] 
in /usr/java/applejdk-17.0.9.9.1/lib/server/libjvm.so
    2# JVM_handle_linux_signal in 
/usr/java/applejdk-17.0.9.9.1/lib/server/libjvm.so
    3# 0x0000700258ADBCF0 in /lib64/libpthread.so.0
    4# doris::SortingQueueBatch<doris::MergeSortCursor>::update_batch_size() at 
/home/zcp/repo_center/doris_tmp-branch-4.1.0-oss-native-sdk/doris/be/src/exec/sort/sort_cursor.h:430
    5# doris::MergeSorterState::_merge_sort_read_impl(int, doris::Block*, 
bool*) at 
/home/zcp/repo_center/doris_tmp-branch-4.1.0-oss-native-sdk/doris/be/src/exec/sort/sorter.cpp:123
    6# doris::FullSorter::get_next(doris::RuntimeState*, doris::Block*, bool*) 
at 
/home/zcp/repo_center/doris_tmp-branch-4.1.0-oss-native-sdk/doris/be/src/exec/sort/sorter.cpp:270
    7# doris::VIcebergSortWriter::_write_sorted_data() at 
/home/zcp/repo_center/doris_tmp-branch-4.1.0-oss-native-sdk/doris/be/src/exec/sink/writer/iceberg/viceberg_sort_writer.cpp:184
    8# doris::VIcebergSortWriter::_flush_to_file() at 
/home/zcp/repo_center/doris_tmp-branch-4.1.0-oss-native-sdk/doris/be/src/exec/sink/writer/iceberg/viceberg_sort_writer.cpp:170
    9# doris::VIcebergSortWriter::write(doris::Block&) at 
/home/zcp/repo_center/doris_tmp-branch-4.1.0-oss-native-sdk/doris/be/src/exec/sink/writer/iceberg/viceberg_sort_writer.cpp:72
   10# doris::VIcebergTableWriter::_write_prepared_block(doris::Block&) at 
/home/zcp/repo_center/doris_tmp-branch-4.1.0-oss-native-sdk/doris/be/src/exec/sink/writer/iceberg/viceberg_table_writer.cpp:329
   11# doris::VIcebergTableWriter::write(doris::RuntimeState*, doris::Block&) 
at 
/home/zcp/repo_center/doris_tmp-branch-4.1.0-oss-native-sdk/doris/be/src/exec/sink/writer/iceberg/viceberg_table_writer.cpp:213
   12# doris::AsyncResultWriter::process_block(doris::RuntimeState*, 
doris::RuntimeProfile*) in /ngs/app/doris/doris-current/be/lib/doris_be
   13# std::_Function_handler<void (), 
doris::AsyncResultWriter::start_writer(doris::RuntimeState*, 
doris::RuntimeProfile*)::$_0>::_M_invoke(std::_Any_data const&) at 
/usr/local/ldb-toolchain-v0.26/bin/../lib/gcc/x86_64-pc-linux-gnu/15/include/g++-v15/bits/std_function.h:292
   14# doris::ThreadPool::dispatch_thread() at 
/home/zcp/repo_center/doris_tmp-branch-4.1.0-oss-native-sdk/doris/be/src/util/threadpool.cpp:623
   15# doris::Thread::supervise_thread(void*) at 
/home/zcp/repo_center/doris_tmp-branch-4.1.0-oss-native-sdk/doris/be/src/util/thread.cpp:461
   16# start_thread in /lib64/libpthread.so.0
   17# __clone in /lib64/libc.so.6
   ```
   
   ### Are you willing to submit PR?
   
   - [x] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of 
Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
