On Tue, Jan 16, 2001 at 11:04:45AM -0800, Theodore Y. Ts'o wrote:
> 
> HJ Lu recently pointed me at a potential locking problem
> try_to_free_inodes(), and when I started proding at it, I found what
> appears to be another set of SMP locking issues in the dcache code.
> (But if that were the case, why aren't we seeing huge numbers of
> complaints?  So I'm wondering if I'm missing something.)

Because the code is correct ;). It is infact a fix and it took some time to fix
such bug in mid 2.2.x.

> 
> Anyway, the first problem which HJ pointed out is in
> try_to_free_inodes() which attempts to implement a mutual exclusion with
> respect to itself as follows....
> 
>       if (block)
>       {
>               struct wait_queue __wait;
> 
>               __wait.task = current;
>               add_wait_queue(&wait_inode_freeing, &__wait);
>               for (;;)
>               {
>                       /* NOTE: we rely only on the inode_lock to be sure
>                          to not miss the unblock event */
>                       current->state = TASK_UNINTERRUPTIBLE;
>                       spin_unlock(&inode_lock);
                        ^^^^^^^^^^^^^^^^^^^^^^^^
>                       schedule();
>                       spin_lock(&inode_lock);
                        ^^^^^^^^^^^^^^^^^^^^^^^^
>                       if (!block)
>                               break;
>               }
>               remove_wait_queue(&wait_inode_freeing, &__wait);
>               current->state = TASK_RUNNING;
>       }
>       block = 1;
> 
> Of course, this is utterly unsafe on an SMP machines, since access to
> the "block" variable isn't protected at all.  So the first question is

Wrong, it's obviously protected by the inode_lock. And even if it wasn't
protected by the inode_lock in 2.2.x inode.c and dcache.c runs globally
serialized by the BKL (but it is obviously protected regardless of the BKL).

> why did whoever implemented this do it in this horribly complicated way
> above, instead of something simple, say like this:
> 
>       static struct semaphore block = MUTEX;
>       if (down_trylock(&block)) {
>               spin_unlock(&inode_lock);
>               down(&block);
>               spin_lock(&inode_lock);
>       }

The above is overkill (there's no need to use further atomic API, when we can
rely on the inode_lock for the locking. It's overcomplex and slower.

> (with the appropriate unlocking code at the end of the function).
> 
> 
> Next question.... why was this there in the first place?  After all,

To fix the "inode-max limit reached" faliures that you could reproduce on
earlier 2.2.x. (the freed inodes was re-used before the task that freed them
had a chance to allocate them for itself)

> most of the time try_to_free_inodes() is called with the inode_lock
> spinlock held, which would act as a automatic mutual exclusion anyway.

> The only time this doesn't happen is when we call prune_dcache(), where
> inode_lock is temporarily dropped.

Wrong:

                spin_unlock(&inode_lock);
                prune_dcache(0, goal);
                spin_lock(&inode_lock);
                sync_all_inodes();
                __free_inodes(&throw_away);

the above code obviously drops the spinlock for doing things like flushing the
dirty inodes to the buffer cache that can block in balance_dirty() etc...
(and that's the real problem because it sleeps so also the BKL gets released
while prune_dcache in practice could not race because of the BKL)

> So I took a look at prune_dcache(), and discovered that (a) it's called
> from multiple places, and (b) it and shrink_dcache_sb() both iterate over
> dentry_unused and among other things, tried to free dcache structures
> without any kind of locking to prevent two kernel threads to
> from mucking with the contents of dentry_unused at the same time, and
> possibly having prune_one_dentry() being called by two processes on the
> same dentry structure.  This should almost certainly cause problems.

we're running under the BKL all over the place in 2.2.x so they can't race.

> So the following patch I think is definitely necessary, assuming that

The patch is definitely not necessary.

Andrea
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Reply via email to