On Wed, 23 Jan 2008, Peter Zijlstra wrote: > > It would need some addition piece to not call msync_interval() for > MS_SYNC, and remove the balance_dirty_pages_ratelimited_nr() stuff. > > But yeah, this pte walker is much better.
Actually, I think this patch is much better. Anyway, it's better because: - it actually honors the range - it uses the same code for MS_ASYNC and MS_SYNC - it just avoids doing the "wait for" for MS_ASYNC. However, it's totally untested, of course. What did you expect? Clean code _and_ testing? [ Side note: it is quite possible that we should not do the SYNC_FILE_RANGE_WAIT_BEFORE on MS_ASYNC, and just skip over pages that are busily under writeback already. Whatever. There are probably other problems here too, so consider this a "Hey, wouldn't something like this work really well?" patch rather than something final. ] Just to get peoples creative juices going, here's my suggested patch. Linus --- mm/msync.c | 52 ++++++++++++++++++++++++++++++++++++++++++++++++---- 1 files changed, 48 insertions(+), 4 deletions(-) diff --git a/mm/msync.c b/mm/msync.c index 144a757..a7a2ea4 100644 --- a/mm/msync.c +++ b/mm/msync.c @@ -10,10 +10,37 @@ #include <linux/fs.h> #include <linux/mm.h> #include <linux/mman.h> +#include <linux/pagemap.h> #include <linux/file.h> #include <linux/syscalls.h> #include <linux/sched.h> +static int msync_range(struct file *file, loff_t start, unsigned long len, unsigned int sync) +{ + int ret; + struct address_space *mapping = file->f_mapping; + loff_t end = start + len - 1; + + ret = do_sync_mapping_range(mapping, start, end, + SYNC_FILE_RANGE_WRITE | SYNC_FILE_RANGE_WAIT_BEFORE); + + if (ret || !sync) + return ret; + + if (file->f_op && file->f_op->fsync) { + mutex_lock(&mapping->host->i_mutex); + ret = file->f_op->fsync(file, file->f_path.dentry, 0); + mutex_unlock(&mapping->host->i_mutex); + + if (ret < 0) + return ret; + } + + return wait_on_page_writeback_range(mapping, + start >> PAGE_CACHE_SHIFT, + end >> PAGE_CACHE_SHIFT); +} + /* * MS_SYNC syncs the entire file - including mappings. * @@ -77,18 +104,35 @@ asmlinkage long sys_msync(unsigned long start, size_t len, int flags) goto out_unlock; } file = vma->vm_file; - start = vma->vm_end; - if ((flags & MS_SYNC) && file && - (vma->vm_flags & VM_SHARED)) { + + if (file && (vma->vm_flags & VM_SHARED)) { + loff_t offset; + unsigned long len; + + /* + * We need to do all of this before we release the mmap_sem, + * since "vma" isn't available after that. + */ + offset = start - vma->vm_start; + offset += vma->vm_pgoff << PAGE_SHIFT; + len = end; + if (len > vma->vm_end) + len = vma->vm_end; + len -= start; + + /* Update start here, since vm_end will be gone too.. */ + start = vma->vm_end; get_file(file); up_read(&mm->mmap_sem); - error = do_fsync(file, 0); + + error = msync_range(file, offset, len, flags & MS_SYNC); fput(file); if (error || start >= end) goto out; down_read(&mm->mmap_sem); vma = find_vma(mm, start); } else { + start = vma->vm_end; if (start >= end) { error = 0; goto out_unlock; -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/