On 03/26/2013 10:27 AM, Peter Zijlstra wrote:
On Tue, 2013-03-26 at 06:40 -0700, Michel Lespinasse wrote:
sem_nsems is user provided as the array size in some semget system
call. It's the size of an ipc semaphore array.
So we're basically adding a random (big) number to preempt_count
(obviously while preemption is disabled), seems rather costly and
undesirable.
complex semop operations take the array's lock plus every semaphore's
lock; simple semop operations (operating on a single semaphore) only
take that one semaphore's lock.
Right, standard global/local lock like stuff. Is there a way we can add
a r/o test to the 'local' lock operation and avoid doing the above?
That makes me wonder, how did mm_take_all_locks used to work before
we turned the anon_vma lock into a mutex?
The code used to use spin_lock_nest_lock, but still had the potential
to overflow the preempt counter. How did that ever work right?
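
To make the overflow concern concrete, here is a small userspace sketch
of the arithmetic. It assumes the preemption depth lives in the low 8
bits of the counter, with other fields packed above it; that width and
all of the sketch_* names are my own assumptions for illustration, not
kernel definitions.

```c
#include <stdio.h>

/* Assumed width of the preemption-depth field (hypothetical constant). */
#define SKETCH_PREEMPT_BITS 8u
#define SKETCH_PREEMPT_MASK ((1u << SKETCH_PREEMPT_BITS) - 1u)

/* Model n nested spin_lock() calls: each one adds 1 to the counter. */
static unsigned int sketch_take_locks(unsigned int count, unsigned int n)
{
	return count + n;
}

/* Depth as seen through the assumed 8-bit field. */
static unsigned int sketch_preempt_depth(unsigned int count)
{
	return count & SKETCH_PREEMPT_MASK;
}

/* Whatever carried out of the depth field into the bits above it. */
static unsigned int sketch_spilled(unsigned int count)
{
	return count >> SKETCH_PREEMPT_BITS;
}

/* Print what the counter looks like after locking nsems semaphores. */
static void sketch_report(unsigned int nsems)
{
	unsigned int count = sketch_take_locks(0, nsems);

	printf("nsems=%u depth=%u spilled=%u\n", nsems,
	       sketch_preempt_depth(count), sketch_spilled(count));
}
```

Under that assumption, 250 nested locks still fit in the field, while
256 leave the depth reading zero and carry into the bits above it.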
Maybe something like:
void sma_lock(struct sem_array *sma) /* global */
{
	int i;

	sma->global_locked = 1;
	smp_wmb(); /* can we merge with the LOCK ? */
	spin_lock(&sma->global_lock);

	/* wait for all local locks to go away */
	for (i = 0; i < sma->sem_nsems; i++)
		spin_unlock_wait(&sma->sem_base[i].lock);
}

void sma_lock_one(struct sem_array *sma, int nr) /* local */
{
	smp_rmb(); /* pairs with wmb in sma_lock() */
	if (unlikely(sma->global_locked)) { /* wait for global lock */
		while (sma->global_locked)
			spin_unlock_wait(&sma->global_lock);
	}
	spin_lock(&sma->sem_base[nr].lock);
}
That is essentially a read-only version of the global rwlock that
I originally proposed, where the global lock takes the lock for
write and the single version takes the global lock for read, and
then one of the semaphore spinlocks.
I could certainly implement and test the above, unless Linus
thinks it's too ugly to live :)
This still has the problem of a non-preemptible section of O(sem_nsems)
(with the avg wait-time on the local lock). Could we make the global
lock a sleeping lock?
Not without breaking your scheme above :)
I suppose making things into a sleeping lock should be possible,
but that is another major change in this code. I would rather do
things in smaller steps...
--
All rights reversed.