On Tue, Jan 05, 2016 at 12:23:55PM -0800, Davidlohr Bueso wrote: > +++ b/kernel/futex.c > @@ -520,7 +520,18 @@ again: > else > err = 0; > > - lock_page(page); > + /* > + * The treatment of mapping from this point on is critical. The page > + * lock protects many things but in this context the page lock > + * stabilises mapping, prevents inode freeing in the shared > + * file-backed region case and guards against movement to swap cache.
A little extra whitespace separating this into paragraphs might help readability. > + * Strictly speaking the page lock is not needed in all cases being > + * considered here and page lock forces unnecessarily serialisation. > + * From this point on, mapping will be reverified if necessary and > + * page lock will be acquired only if it is unavoiable. > + */ > + mapping = READ_ONCE(compound_head(page)->mapping); > + > /* > * If page->mapping is NULL, then it cannot be a PageAnon > * page; but it might be the ZERO_PAGE or in the gate area or > + if (unlikely(!mapping)) { > + int shmem_swizzled; > + > + /* > + * Page lock is required to identify which special case above > + * applies. If this is really a shmem page then the page lock > + * will prevent unexpected transitions. > + */ > + lock_page(page); > + shmem_swizzled = PageSwapCache(page); > unlock_page(page); > put_page(page); > + WARN_ON_ONCE(mapping); We've not re-loaded mapping, so how could this possibly be? > + /* > + * Take a reference unless it is about to be freed. Previously > + * this reference was taken by ihold under the page lock > + * pinning the inode in place so i_lock was unnecessary. The > + * only way for this check to fail is if the inode was > + * truncated in parallel so warn for now if this happens. > + * > + * TODO: VFS and/or filesystem people should review this check > + * and see if there is a safer or more reliable way to do this. > + */ > + if (WARN_ON(!atomic_inc_not_zero(&inode->i_count))) { > + rcu_read_unlock(); > + put_page(page); > + goto again; > + } > + > + /* > + * get_futex_key() must imply MB (B) and we are not going to > + * call into get_futex_key_refs() at this point. > + */ > + smp_mb__after_atomic(); I don't get this one, the above is a successful atomic op with return value, that _must_ imply a full barrier. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/