github-actions[bot] commented on code in PR #63295:
URL: https://github.com/apache/doris/pull/63295#discussion_r3315965235


##########
cloud/src/recycler/recycler.cpp:
##########
@@ -5152,18 +5208,123 @@ int InstanceRecycler::recycle_rowsets() {
                 LOG(WARNING) << "failed to delete recycle rowset kv, 
instance_id=" << instance_id_;
                 return;
             }
+
             num_recycled.fetch_add(rowset_keys_to_delete.size(), 
std::memory_order_relaxed);
         });
+    };
+
+    auto submit_mark_abort_rowset_job = [&](std::vector<std::string> 
rowset_keys_to_mark,
+                                            std::vector<std::string> 
rowset_keys_to_abort) {
+        if (rowset_keys_to_mark.empty() && rowset_keys_to_abort.empty()) {
+            return;
+        }
+        mark_abort_worker_pool->submit([&, rowset_keys_to_mark = 
std::move(rowset_keys_to_mark),
+                                        rowset_keys_to_abort =
+                                                
std::move(rowset_keys_to_abort)]() mutable {
+            auto start = steady_clock::now();
+            DORIS_CLOUD_DEFER {
+                auto cost = duration_cast<milliseconds>(steady_clock::now() - 
start).count();
+                LOG(INFO) << "finish mark and abort rowset job, instance_id=" 
<< instance_id_ << ' '
+                          << "cost_ms=" << cost << ' '
+                          << "rowset_keys_to_mark.size()=" << 
rowset_keys_to_mark.size() << ' '
+                          << "rowset_keys_to_abort.size()=" << 
rowset_keys_to_abort.size();
+            };
+            if (!rowset_keys_to_mark.empty() &&
+                batch_mark_rowsets_as_recycled<RecycleRowsetPB>(txn_kv_.get(), 
instance_id_,
+                                                                
rowset_keys_to_mark) != 0) {
+                LOG(WARNING) << "failed to batch mark rowsets as recycled, 
instance_id="
+                             << instance_id_ << ' '
+                             << "rowset_keys_to_mark.size()=" << 
rowset_keys_to_mark.size();
+                return;
+            }
+            if (!rowset_keys_to_abort.empty() &&
+                
batch_abort_txn_or_job_for_recycle<RecycleRowsetPB>(rowset_keys_to_abort, true) 
!=
+                        0) {
+                LOG(WARNING) << "failed to batch abort txn or job for related 
rowset, "
+                                "instance_id="
+                             << instance_id_ << ' '
+                             << "rowset_keys_to_abort.size()=" << 
rowset_keys_to_abort.size();
+                return;
+            }
+        });
+    };
+
+    bool scan_finished = false;
+    auto loop_done = [&]() -> int {
+        std::vector<std::string> mark_keys_to_process;
+        std::vector<std::string> abort_keys_to_process;
+        mark_keys_to_process.swap(rowset_keys_to_mark_recycled);
+        abort_keys_to_process.swap(rowset_keys_to_abort);
+        submit_mark_abort_rowset_job(std::move(mark_keys_to_process),
+                                     std::move(abort_keys_to_process));

Review Comment:
   This changes the ordering that protected PREPARE recycle rowsets. 
`handle_rowset_kv` still queues keys in `rowset_keys_to_abort`, but for 
`RecycleRowsetPB::PREPARE` it now calls `delete_rowset_data_by_prefix(...)` 
immediately while the abort is only submitted here to `mark_abort_worker_pool`; 
for non-PREPARE rows the delete jobs below can also run in `worker_pool` before 
this separate pool finishes. The previous code ran mark, abort, prepare-task 
collection, and deletion sequentially inside one worker job, so load rowsets 
aborted their txn and compaction/schema-change rowsets aborted their job before 
any physical files were removed. With this race, a failed abort can leave 
metadata referring to rowset data that has already been deleted. Please keep 
abort-before-delete in the same execution chain (or otherwise wait for the 
abort/mark batch to finish successfully before submitting any delete job for 
those keys).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to