On Fri, Apr 05, 2019 at 03:21:05PM -0400, Waiman Long wrote:
> Because of writer lock stealing, it is possible that a constant
> stream of incoming writers will cause a waiting writer or reader to
> wait indefinitely leading to lock starvation.
> 
> The mutex code has a lock handoff mechanism to prevent lock starvation.
> This patch implements a similar lock handoff mechanism to disable
> lock stealing and force lock handoff to the first waiter in the queue
> after at least a 5ms waiting period. The waiting period is used to
> avoid discouraging lock stealing too much to affect performance.

I would say the handoff is not at all similar to the mutex code. It is
in fact radically different.

> @@ -131,6 +138,15 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem,
>               adjustment = RWSEM_READER_BIAS;
>               oldcount = atomic_long_fetch_add(adjustment, &sem->count);
>               if (unlikely(oldcount & RWSEM_WRITER_MASK)) {
> +                     /*
> +                      * Initiate handoff to reader, if applicable.
> +                      */
> +                     if (!(oldcount & RWSEM_FLAG_HANDOFF) &&
> +                         time_after(jiffies, waiter->timeout)) {
> +                             adjustment -= RWSEM_FLAG_HANDOFF;
> +                             lockevent_inc(rwsem_rlock_handoff);
> +                     }
> +
>                       atomic_long_sub(adjustment, &sem->count);
>                       return;
>               }

That confuses the heck out of me...

The above seems to rely on __rwsem_mark_wake() to be fully serialized
(and it is, by ->wait_lock, but that isn't spelled out anywhere) such
that we don't get double increment of FLAG_HANDOFF.

So there is NO __rwsem_mark_wake() vs __rwsem_mark_wake() race like:

  CPU0                                  CPU1

  oldcount = atomic_long_fetch_add(adjustment, &sem->count)

                                        oldcount = atomic_long_fetch_add(adjustment, &sem->count)

  if (!(oldcount & HANDOFF))
    adjustment -= HANDOFF;

                                        if (!(oldcount & HANDOFF))
                                          adjustment -= HANDOFF;
  atomic_long_sub(adjustment)
                                        atomic_long_sub(adjustment)


*whoops* double negative decrement of HANDOFF (aka double increment).


However there is another site that fiddles with the HANDOFF bit, namely
__rwsem_down_write_failed_common(), and that does:

+                               atomic_long_or(RWSEM_FLAG_HANDOFF, &sem->count);

_OUTSIDE_ of ->wait_lock, which would yield:

  CPU0                                  CPU1

  oldcount = atomic_long_fetch_add(adjustment, &sem->count)

                                        atomic_long_or(HANDOFF)

  if (!(oldcount & HANDOFF))
    adjustment -= HANDOFF;

  atomic_long_sub(adjustment)

*whoops*, incremented HANDOFF on HANDOFF.
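
To make the arithmetic in both diagrams concrete, here is a minimal
userspace sketch that replays the two interleavings step by step in a
single thread using C11 atomics. The bit values (WRITER_LOCKED = bit 0,
WAITERS = bit 1, HANDOFF = bit 2, READER_BIAS = bit 8) and the function
names are assumptions made for illustration, not taken from the patch;
the time_after() check is assumed to have passed.

```c
#include <stdatomic.h>

/* Assumed bit layout, for illustration only. */
#define RWSEM_WRITER_LOCKED	(1L << 0)
#define RWSEM_FLAG_WAITERS	(1L << 1)
#define RWSEM_FLAG_HANDOFF	(1L << 2)
#define RWSEM_READER_BIAS	(1L << 8)
#define RWSEM_WRITER_MASK	RWSEM_WRITER_LOCKED

/* Second diagram: atomic_long_or(HANDOFF) lands between fetch_add and sub. */
long replay_or_vs_mark_wake(void)
{
	atomic_long count;
	long adjustment, oldcount;

	atomic_init(&count, RWSEM_WRITER_LOCKED | RWSEM_FLAG_WAITERS);

	/* CPU0: __rwsem_mark_wake() */
	adjustment = RWSEM_READER_BIAS;
	oldcount = atomic_fetch_add(&count, adjustment);

	/* CPU1: writer slow path sets HANDOFF, outside ->wait_lock */
	atomic_fetch_or(&count, RWSEM_FLAG_HANDOFF);

	/* CPU0: decides based on the stale oldcount */
	if (oldcount & RWSEM_WRITER_MASK) {
		if (!(oldcount & RWSEM_FLAG_HANDOFF))
			adjustment -= RWSEM_FLAG_HANDOFF;
		atomic_fetch_sub(&count, adjustment);
	}
	return atomic_load(&count);
}

/* First diagram: two unserialized __rwsem_mark_wake() callers. */
long replay_double_mark_wake(void)
{
	atomic_long count;
	long adj0 = RWSEM_READER_BIAS, adj1 = RWSEM_READER_BIAS;
	long old0, old1;

	atomic_init(&count, RWSEM_WRITER_LOCKED | RWSEM_FLAG_WAITERS);

	old0 = atomic_fetch_add(&count, adj0);		/* CPU0 */
	old1 = atomic_fetch_add(&count, adj1);		/* CPU1 */

	if (!(old0 & RWSEM_FLAG_HANDOFF))		/* CPU0 */
		adj0 -= RWSEM_FLAG_HANDOFF;
	if (!(old1 & RWSEM_FLAG_HANDOFF))		/* CPU1 */
		adj1 -= RWSEM_FLAG_HANDOFF;

	atomic_fetch_sub(&count, adj0);			/* CPU0 */
	atomic_fetch_sub(&count, adj1);			/* CPU1 */
	return atomic_load(&count);
}
```

Under these assumed values both functions return 0xb: HANDOFF was added
twice, so bit 2 ends up clear and the carry sets bit 3.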


And there's not a comment in sight that would elucidate if this is
possible or not.


Also:

+                               atomic_long_or(RWSEM_FLAG_HANDOFF, &sem->count);
+                               first++;
+
+                               /*
+                                * Make sure the handoff bit is seen by
+                                * others before proceeding.
+                                */
+                               smp_mb__after_atomic();

That comment is utter nonsense. smp_mb() doesn't (and cannot) 'make
visible'. There needs to be order between two memops on both sides.
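
As a userspace analogy (C11 fences and pthreads, not the kernel
primitives), the usual message-passing pattern shows what 'order between
two memops on both sides' looks like. The names payload, ready, and
run_message_passing() are made up for this sketch. Each fence sits
between two memory operations, and the one on the producer side only
means something because the consumer has a matching one.

```c
#include <pthread.h>
#include <stdatomic.h>

static int payload;		/* plain data, published through 'ready' */
static atomic_int ready;

static void *producer(void *arg)
{
	(void)arg;
	payload = 42;					/* memop A */
	atomic_thread_fence(memory_order_release);	/* A before B */
	atomic_store_explicit(&ready, 1, memory_order_relaxed); /* memop B */
	return NULL;
}

static int consumer(void)
{
	/* memop C: spin until the flag is observed */
	while (!atomic_load_explicit(&ready, memory_order_relaxed))
		;
	atomic_thread_fence(memory_order_acquire);	/* C before D */
	return payload;					/* memop D */
}

/* Returns the payload the consumer observed, or -1 on thread failure. */
int run_message_passing(void)
{
	pthread_t t;
	int seen;

	payload = 0;
	atomic_store(&ready, 0);
	if (pthread_create(&t, NULL, producer, NULL))
		return -1;
	seen = consumer();
	pthread_join(t, NULL);
	return seen;
}
```

Neither fence pushes the flag out to the other thread. The guarantee is
conditional: if the consumer observes B, it also observes A. Drop either
fence and that guarantee is gone, which is why a barrier comment should
name the two operations it orders and the barrier it pairs with.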
