github-actions[bot] commented on code in PR #64779:
URL: https://github.com/apache/doris/pull/64779#discussion_r3465629486


##########
be/src/io/fs/s3_file_writer.cpp:
##########
@@ -152,13 +152,35 @@ Status S3FileWriter::close(bool non_block) {
         _async_close_pack = std::make_unique<AsyncCloseStatusPack>();
         _async_close_pack->future = _async_close_pack->promise.get_future();
         s3_file_writer_async_close_queuing << 1;
-        return 
ExecEnv::GetInstance()->non_block_close_thread_pool()->submit_func([&]() {
+        Status submit_status = Status::OK();
+        DBUG_EXECUTE_IF("S3FileWriter.close.submit_async_close.inject_error", {
+            submit_status = 
Status::IOError("S3FileWriter.close.submit_async_close.inject_error");
+        });
+        if (submit_status.ok()) {
+            submit_status =
+                    
ExecEnv::GetInstance()->non_block_close_thread_pool()->submit_func([&]() {
+                        s3_file_writer_async_close_queuing << -1;
+                        s3_file_writer_async_close_processing << 1;
+                        _st = _close_impl();
+                        _async_close_pack->promise.set_value(_st);
+                        s3_file_writer_async_close_processing << -1;
+                    });
+        }
+        if (!submit_status.ok()) {

Review Comment:
   Blocking: this fallback assumes that a non-OK result from `submit_func()` 
means the close lambda was not retained. That is not guaranteed by 
`ThreadPool::do_submit`: it moves the runnable into `token->_entries`, may 
enqueue the token, and increments `_total_queued_tasks` before calling 
`create_thread()`. If that lazy thread creation fails while the pool has no 
workers, `do_submit` returns the error without removing the queued task. This 
path is reachable because thread-pool initialization deliberately ignores 
min-thread creation failures.
   
   Here the fallback clears `_async_close_pack`, restores the state, and runs 
`_close_impl()` synchronously. If a later submit creates a worker, the 
previously queued lambda can still run, capture this writer, and dereference 
the now-null `_async_close_pack` or race with/destruct after the synchronous 
close. The new `_submit_upload_buffer()` helper has the same assumption when it 
calls `_complete_part_task_callback(st)` after a failed submit even though a 
retained `UploadFileBuffer` can still later execute callbacks that capture this 
writer.
   
   Please either make the submit path used here guarantee that non-OK means the 
runnable was not retained, for example by rolling back the queued task on 
post-enqueue worker-creation failure, or use a different API/contract before 
manually completing the task and falling back to synchronous close.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to