Re,
[email protected], on Tue, 03 Mar 2026 04:08:15 +0000, wrote:
> 3. rwlock timeout/wakeup races (H-3, H-4): When a timed rwlock
> operation times out simultaneously with being woken by an unlock,
> reader counts or writer ownership leak, permanently locking the
> rwlock.
Indeed!
> This could explain rumpnet hangs.
Errr... rumpnet doesn't use pthread_rwlock_*
> 3. **rwlock timeout/wakeup race** (H-3, H-4): When a timed rwlock operation
> times out simultaneously with being woken by an unlock, reader counts leak
> or writer ownership leaks, permanently locking the rwlock.
> #### H-3: rwlock timedrdlock: reader count leak on timeout/wakeup race (CRITICAL)
>
> **Files**: `sysdeps/htl/pt-rwlock-timedrdlock.c:86-109`,
> `sysdeps/htl/pt-rwlock-unlock.c:66-87`
>
> When a reader times out but was already dequeued and counted by the unlocking
> thread (which does `rwlock->__readers += n` for all queued readers), the
> timedrdlock returns `ETIMEDOUT` without decrementing `__readers`. The reader
> count is permanently inflated: the rwlock never appears fully unlocked again.
>
> **Scenario**: Thread A holds write lock. Thread B calls
> `pthread_rwlock_timedrdlock`
> and queues. B's timeout fires. Concurrently, A unlocks, dequeues B, increments
> `__readers`. B wakes, sees `prevp == NULL` and `err != 0`, drains the Mach
> message, returns ETIMEDOUT. But `__readers` is now 1 with no actual reader.
>
> **Impact**: Permanent rwlock deadlock for writers. Could explain ext2fs hangs.
>
> #### H-3 Proposed Fix: Treat dequeue-by-unlocker as successful acquisition
>
> The bug is that `prevp == NULL && err != 0` returns ETIMEDOUT even though
> the unlocker already counted this thread as a reader. When `prevp == NULL`,
> the unlocker dequeued us and incremented `__readers` — we own the read lock
> regardless of whether we also timed out. Replace lines 86-116 with:
>
> ```c
>   __pthread_spin_wait (&rwlock->__lock);
>   if (self->prevp == NULL)
>     {
>       /* Dequeued by the unlocker; we got the lock.  Drain any stale
>          wakeup.  */
>       __pthread_spin_unlock (&rwlock->__lock);
>       if (err)
>         __pthread_block (self);	/* drain */
>       assert (rwlock->__readers > 0);
>       return 0;
>     }
>   else
>     {
>       /* Still queued; we timed out.  Remove ourselves.  */
>       __pthread_dequeue (self);
>       __pthread_spin_unlock (&rwlock->__lock);
>       assert (err == ETIMEDOUT);
>       return err;
>     }
> ```
>
> #### H-4: rwlock timedwrlock: `__held` leak on timeout/wakeup race (CRITICAL)
>
> **File**: `sysdeps/htl/pt-rwlock-timedwrlock.c:72-99`
>
> Same pattern as H-3 but for writers. The unlocker transfers `__held` ownership
> to a queued writer by leaving `__held` locked. If the writer times out but was
> already dequeued, it returns ETIMEDOUT without releasing `__held`. The rwlock
> is permanently locked.
>
> **Impact**: Permanent deadlock. Any thread attempting to acquire the rwlock
> will block forever.
>
> #### H-4 Proposed Fix: Same pattern as H-3
>
> When `prevp == NULL`, the unlocker transferred `__held` ownership to us.
> Replace lines 72-99 with the same structure:
>
> ```c
>   __pthread_spin_wait (&rwlock->__lock);
>   if (self->prevp == NULL)
>     {
>       /* Dequeued by the unlocker; we own __held.  Drain any stale
>          wakeup.  */
>       __pthread_spin_unlock (&rwlock->__lock);
>       if (err)
>         __pthread_block (self);	/* drain */
>       assert (rwlock->__readers == 0);
>       return 0;
>     }
>   else
>     {
>       /* Still queued; we timed out.  Remove ourselves.  */
>       __pthread_dequeue (self);
>       __pthread_spin_unlock (&rwlock->__lock);
>       assert (err == ETIMEDOUT);
>       return err;
>     }
> ```
Yep, fixed essentially that way (but keeping the existing comments...)
> ### rumpnet hangs
>
> Most likely: **H-3/H-4** (rwlock timeout race causing permanent deadlock).
> Network stacks use rwlocks extensively. A single timeout race permanently
> locks the rwlock, hanging all subsequent operations.
ditto.
> 3. **Fix H-3/H-4**: In the rwlock timedrdlock/timedwrlock timeout path, when
> `prevp == NULL`, treat as successful acquisition regardless of timeout.
Samuel