Re: [PR] [fix](cloud) Use bthread-aware shared mutex for tablet header lock [doris]

via GitHub Wed, 17 Jun 2026 23:19:08 -0700


github-actions[bot] commented on code in PR #64574:
URL: https://github.com/apache/doris/pull/64574#discussion_r3433520217



##########
be/src/storage/tablet/base_tablet.h:
##########
@@ -372,7 +373,7 @@ class BaseTablet : public 
std::enable_shared_from_this<BaseTablet> {
 
     Result<CaptureRowsetResult> _remote_capture_rowsets(const Version& 
version_range) const;
 
-    mutable std::shared_mutex _meta_lock;
+    mutable BthreadSharedMutex _meta_lock;
     TimestampedVersionTracker _timestamped_version_tracker;

Review Comment:
   Switching `_meta_lock` to this writer-preferring lock leaves existing 
recursive read-lock paths that can now self-deadlock. For example, 
`Tablet::calc_compaction_score()` takes `_meta_lock` in shared mode and then 
calls `get_real_compaction_score()`, which takes `_meta_lock` again. If a 
writer (`add_rowset`/`modify_rowsets`/cloud sync) queues between those two 
acquisitions, `BthreadSharedMutex::lock()` sets `_write_entered` and waits for 
the outer reader, while the inner `lock_shared()` waits for `_write_entered` to 
clear. The same pattern still exists in time-series cumulative scoring 
(`suitable_for_compaction()` -> `_calc_cumulative_compaction_score()` -> 
`pick_candidate_rowsets_to_cumulative_compaction()`) and cloud read-source 
capture when prefer-cache or freshness tolerance dispatches into 
`CloudTablet::capture_versions_*()` under the outer `capture_read_source()` 
lock. Please audit and remove the remaining nested read acquisitions, e.g. by 
adding unlocked helpers or by avoid
 ing the outer lock around virtual paths, before replacing the tablet header 
lock.



##########
be/test/util/bthread_shared_mutex_test.cpp:
##########
@@ -0,0 +1,252 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include "util/bthread_shared_mutex.h"
+
+#include <bthread/bthread.h>
+#include <gtest/gtest.h>
+
+#include <atomic>
+#include <shared_mutex>
+#include <thread>
+#include <vector>
+
+namespace doris {
+
+// 1) Basic SharedMutex contract. try_lock/try_lock_shared are non-blocking, so
+//    this can be checked single-threaded.
+TEST(BthreadSharedMutexTest, BasicContract) {
+    BthreadSharedMutex m;
+
+    // Multiple shared holders can coexist.
+    ASSERT_TRUE(m.try_lock_shared());
+    ASSERT_TRUE(m.try_lock_shared());
+    // A writer cannot acquire while any reader is held.
+    ASSERT_FALSE(m.try_lock());
+    m.unlock_shared();
+    ASSERT_FALSE(m.try_lock()); // still one reader
+    m.unlock_shared();
+
+    // Exclusive ownership.
+    ASSERT_TRUE(m.try_lock());
+    ASSERT_FALSE(m.try_lock());        // exclusive: a second writer fails
+    ASSERT_FALSE(m.try_lock_shared()); // a reader is blocked while the writer 
holds
+    m.unlock();
+
+    // Free again.
+    ASSERT_TRUE(m.try_lock_shared());
+    m.unlock_shared();
+}
+
+// 2) Writer-pending behavior of the two-gate algorithm: once a writer has set
+//    the write-entered bit and is waiting for existing readers to drain, new
+//    readers must block until the writer acquires and releases the lock.
+namespace {
+struct WriterPendingCtx {
+    BthreadSharedMutex* m;
+    std::atomic<bool> acquired {false};
+    std::atomic<bool> released {false};
+};
+
+void* writer_pending_fn(void* arg) {
+    auto* c = static_cast<WriterPendingCtx*>(arg);
+    c->m->lock(); // sets write-entered, then blocks until the reader drains
+    c->acquired.store(true);
+    bthread_usleep(20 * 1000);
+    c->m->unlock();
+    c->released.store(true);
+    return nullptr;
+}
+} // namespace
+
+TEST(BthreadSharedMutexTest, WriterPendingBlocksNewReaders) {
+    bthread_setconcurrency(8);
+    BthreadSharedMutex m;
+    m.lock_shared(); // reader R1 holds a shared lock
+
+    WriterPendingCtx ctx;
+    ctx.m = &m;
+    bthread_t w;
+    ASSERT_EQ(0, bthread_start_background(&w, nullptr, writer_pending_fn, 
&ctx));
+
+    // Give the writer time to reach lock() and set the write-entered bit.
+    bthread_usleep(100 * 1000);
+    ASSERT_FALSE(ctx.acquired.load()); // still waiting on R1 to drain

Review Comment:
   This assertion depends on a fixed 100 ms sleep being enough for the writer 
bthread to run and set `_write_entered`. On a slow ASAN/CI worker, if the 
writer has not reached `lock()` yet, `try_lock_shared()` can still succeed and 
make the test fail even though the mutex is correct. Please replace the fixed 
sleep with a bounded synchronization/probing loop, for example repeatedly 
probing `try_lock_shared()` (unlocking successful probes) until it observes the 
pending writer state or times out.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] [fix](cloud) Use bthread-aware shared mutex for tablet header lock [doris]

Reply via email to