Hi, Robert L Mathews wrote:
> Unfortunately, I'm not able to reproduce the original problem on a > test system for some reason (and I tried quite hard). It only > happens on one of my production servers, but I can't experiment on > that one. Sorry about that. :-( Thanks for the update, and no problem. What filesystem options does your test system use (as shown in /proc/mounts or by dumpe2fs -h <bdev>)? How do you set up quotas on it? Looking closer, straightforward testing of that one patch would probably not reveal much anyway --- any subtle effects wouldn't necessarily be triggered often. Pointers that might help someone review the code: http://thread.gmane.org/gmane.comp.file-systems.ext4/19421/focus=19462 http://thread.gmane.org/gmane.comp.file-systems.ext4/19421/focus=19470 I suspect applying the patch to 2.6.32.y is safe even in edge cases. Summary of the discussion: Ted said | In the long-term I | think the quota file should become a special file much like the | journal, and then this makes a huge amount of sense. Jan addressed his fears by clarifying that the quota file is immutable when quotas are on: | Ted, that's what generic quota code actually does for you (unless | DQUOT_QUOTA_SYS_FILE flag is specified but that's not the case of | ext?) | - see vfs_load_quota_inode. The basic rules about quota have not changed fundamentally between 2.6.32 and 2.6.36, so this looks like a good and safe patch for squeeze. To avoid tempting fate, here is a list including some other fixes that could go with it. I haven't tried testing this at all yet. Comments (including test results) welcome. - v2.6.34-rc1~192^2~11 "quota: Properly invalidate caches even for filesystems with blocksize < pagesize", 2010-02-22 - v2.6.36-rc1~501^2~15 "ext4: Always journal quota file modifications", 2010-07-27 - v2.6.36-rc1~501^2~7 "ext4: force block allocation on quota_off", 2010-08-01, and v2.6.37-rc2~75^2~2 "ext4: do not try to grab the s_umount semaphore in ext4_quota_off", 2010-11-08 - v2.6.33-rc1~311^2~14 "quota: Fix WARN_ON in lookup_one_len", 2009-11-12 - 6b6dc836a "vfs: Provide function to get superblock and wait for it to thaw", 2012-02-10, and dcdbed853d9f "quota: Fix deadlock with suspend and quotas", 2012-02-10 fs/ext4/super.c | 32 +++++++++++++++++--------------- fs/quota/dquot.c | 14 +++++++++----- fs/quota/quota.c | 24 +++++++++++++++++++++--- fs/super.c | 22 ++++++++++++++++++++++ include/linux/fs.h | 1 + 5 files changed, 70 insertions(+), 23 deletions(-)
From: Jan Kara <j...@suse.cz> Date: Mon, 22 Feb 2010 21:07:17 +0100 Subject: quota: Properly invalidate caches even for filesystems with blocksize < pagesize commit ab94c39b6fa076d4f6d2903dcc54cda35d938776 upstream. Sometimes invalidate_bdev() can fail to invalidate a part of block device cache because of dirty data. If the filesystem has blocksize smaller than page size, this can happen even for pages containing quota files and thus kernel would operate on stale data. Fix the issue by syncing the filesystem before invalidating the cache. Reviewed-by: Christoph Hellwig <h...@lst.de> Signed-off-by: Jan Kara <j...@suse.cz> Signed-off-by: Jonathan Nieder <jrnie...@gmail.com> --- fs/quota/dquot.c | 12 +++++++----- 1 files changed, 7 insertions(+), 5 deletions(-) diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c index ce9a4f22c1dd..d06ee1f12c8b 100644 --- a/fs/quota/dquot.c +++ b/fs/quota/dquot.c @@ -2087,11 +2087,13 @@ static int vfs_load_quota_inode(struct inode *inode, int type, int format_id, } if (!(dqopt->flags & DQUOT_QUOTA_SYS_FILE)) { - /* As we bypass the pagecache we must now flush the inode so - * that we see all the changes from userspace... */ - write_inode_now(inode, 1); - /* And now flush the block cache so that kernel sees the - * changes */ + /* As we bypass the pagecache we must now flush all the + * dirty data and invalidate caches so that kernel sees + * changes from userspace. It is not enough to just flush + * the quota file since if blocksize < pagesize, invalidation + * of the cache could fail because of other unrelated dirty + * data */ + sync_filesystem(sb); invalidate_bdev(sb->s_bdev); } mutex_lock(&dqopt->dqonoff_mutex); -- 1.7.9
From: Jan Kara <j...@suse.cz> Date: Tue, 27 Jul 2010 11:56:07 -0400 Subject: ext4: Always journal quota file modifications commit 62d2b5f2dcd3707b070efb16bbfdf6947c38c194 upstream. When journaled quota options are not specified, we do writes to quota files just in data=ordered mode. This actually causes warnings from JBD2 about dirty journaled buffer because ext4_getblk unconditionally treats a block allocated by it as metadata. Since quota actually is filesystem metadata, the easiest way to get rid of the warning is to always treat quota writes as metadata... Signed-off-by: Jan Kara <j...@suse.cz> Signed-off-by: "Theodore Ts'o" <ty...@mit.edu> Signed-off-by: Jonathan Nieder <jrnie...@gmail.com> --- fs/ext4/super.c | 19 +++++-------------- 1 files changed, 5 insertions(+), 14 deletions(-) diff --git a/fs/ext4/super.c b/fs/ext4/super.c index f1e7077a06a3..c53a60742cca 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -3926,7 +3926,6 @@ static ssize_t ext4_quota_write(struct super_block *sb, int type, int err = 0; int offset = off & (sb->s_blocksize - 1); int tocopy; - int journal_quota = EXT4_SB(sb)->s_qf_names[type] != NULL; size_t towrite = len; struct buffer_head *bh; handle_t *handle = journal_current_handle(); @@ -3944,24 +3943,16 @@ static ssize_t ext4_quota_write(struct super_block *sb, int type, bh = ext4_bread(handle, inode, blk, 1, &err); if (!bh) goto out; - if (journal_quota) { - err = ext4_journal_get_write_access(handle, bh); - if (err) { - brelse(bh); - goto out; - } + err = ext4_journal_get_write_access(handle, bh); + if (err) { + brelse(bh); + goto out; } lock_buffer(bh); memcpy(bh->b_data+offset, data, tocopy); flush_dcache_page(bh->b_page); unlock_buffer(bh); - if (journal_quota) - err = ext4_handle_dirty_metadata(handle, NULL, bh); - else { - /* Always do at least ordered writes for quotas */ - err = ext4_jbd2_file_inode(handle, inode); - mark_buffer_dirty(bh); - } + err = ext4_handle_dirty_metadata(handle, NULL, bh); brelse(bh); if (err) goto out; -- 1.7.9
From: Dmitry Monakhov <dmonak...@openvz.org> Date: Sun, 1 Aug 2010 17:48:36 -0400 Subject: ext4: force block allocation on quota_off commit ca0e05e4b15193aeba72b995e90de990db7f8304 upstream. Perform full sync procedure so that any delayed allocation blocks are allocated so quota will be consistent. Signed-off-by: Dmitry Monakhov <dmonak...@openvz.org> Signed-off-by: "Theodore Ts'o" <ty...@mit.edu> Signed-off-by: Jonathan Nieder <jrnie...@gmail.com> --- fs/ext4/super.c | 15 ++++++++++++++- 1 files changed, 14 insertions(+), 1 deletions(-) diff --git a/fs/ext4/super.c b/fs/ext4/super.c index c53a60742cca..95ae5b658cbd 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -995,6 +995,7 @@ static int ext4_mark_dquot_dirty(struct dquot *dquot); static int ext4_write_info(struct super_block *sb, int type); static int ext4_quota_on(struct super_block *sb, int type, int format_id, char *path, int remount); +static int ext4_quota_off(struct super_block *sb, int type, int remount); static int ext4_quota_on_mount(struct super_block *sb, int type); static ssize_t ext4_quota_read(struct super_block *sb, int type, char *data, size_t len, loff_t off); @@ -1026,7 +1027,7 @@ static const struct dquot_operations ext4_quota_operations = { static const struct quotactl_ops ext4_qctl_operations = { .quota_on = ext4_quota_on, - .quota_off = vfs_quota_off, + .quota_off = ext4_quota_off, .quota_sync = vfs_quota_sync, .get_info = vfs_get_dqinfo, .set_info = vfs_set_dqinfo, @@ -3876,6 +3877,18 @@ static int ext4_quota_on(struct super_block *sb, int type, int format_id, return err; } +static int ext4_quota_off(struct super_block *sb, int type, int remount) +{ + /* Force all delayed allocation blocks to be allocated */ + if (test_opt(sb, DELALLOC)) { + down_read(&sb->s_umount); + sync_filesystem(sb); + up_read(&sb->s_umount); + } + + return vfs_quota_off(sb, type, remount); +} + /* Read data from quotafile - avoid pagecache and such because we cannot afford * acquiring the locks... As quota files are never truncated and quota code * itself serializes the operations (and noone else should touch the files) -- 1.7.9
From: Dmitry Monakhov <dmonak...@openvz.org> Date: Mon, 8 Nov 2010 13:47:33 -0500 Subject: ext4: do not try to grab the s_umount semaphore in ext4_quota_off commit 87009d86dc045d228e21242467a67a5f99347553 upstream. It's not needed to sync the filesystem, and it fixes a lock_dep complaint. [jrnie...@gmail.com: s_umount does need to be held in sync_filesystem. The reason not to grab it in ext4_quota_off is that quotactl has already acquired s_umount by calling get_super.] Signed-off-by: Dmitry Monakhov <dmonak...@gmail.com> Signed-off-by: "Theodore Ts'o" <ty...@mit.edu> Reviewed-by: Jan Kara <j...@suse.cz> Signed-off-by: Jonathan Nieder <jrnie...@gmail.com> --- fs/ext4/super.c | 8 +++----- 1 files changed, 3 insertions(+), 5 deletions(-) diff --git a/fs/ext4/super.c b/fs/ext4/super.c index 95ae5b658cbd..302304b82f7b 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -3879,12 +3879,10 @@ static int ext4_quota_on(struct super_block *sb, int type, int format_id, static int ext4_quota_off(struct super_block *sb, int type, int remount) { - /* Force all delayed allocation blocks to be allocated */ - if (test_opt(sb, DELALLOC)) { - down_read(&sb->s_umount); + /* Force all delayed allocation blocks to be allocated. + * Caller already holds s_umount sem */ + if (test_opt(sb, DELALLOC)) sync_filesystem(sb); - up_read(&sb->s_umount); - } return vfs_quota_off(sb, type, remount); } -- 1.7.9
From: Jan Kara <j...@suse.cz> Date: Thu, 12 Nov 2009 15:42:08 +0100 Subject: quota: Fix WARN_ON in lookup_one_len commit c56818d7dc976a7392be82e8e04fe26347d591f3 upstream. We should hold i_mutex when looking up quota files for journaled quotas, otherwise a WARN_ON in lookup_one_len triggers. The fact that we didn't hold i_mutex previously probably could not lead to a real bug since the filesystem is just being mounted / remounted read-write and thus the root directory cannot change anyway but it's definitely cleaner with i_mutex. Reported-by: Bastien ROUCARIES <roucaries.bast...@gmail.com> Signed-off-by: Jan Kara <j...@suse.cz> Signed-off-by: Jonathan Nieder <jrnie...@gmail.com> --- fs/quota/dquot.c | 2 ++ 1 files changed, 2 insertions(+), 0 deletions(-) diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c index d06ee1f12c8b..fc061714ead7 100644 --- a/fs/quota/dquot.c +++ b/fs/quota/dquot.c @@ -2287,7 +2287,9 @@ int vfs_quota_on_mount(struct super_block *sb, char *qf_name, struct dentry *dentry; int error; + mutex_lock(&sb->s_root->d_inode->i_mutex); dentry = lookup_one_len(qf_name, sb->s_root, strlen(qf_name)); + mutex_unlock(&sb->s_root->d_inode->i_mutex); if (IS_ERR(dentry)) return PTR_ERR(dentry); -- 1.7.9
From: Jan Kara <j...@suse.cz> Date: Fri, 10 Feb 2012 11:03:00 +0100 Subject: vfs: Provide function to get superblock and wait for it to thaw commit 6b6dc836a195e077e76977b6c020a73de411b46d upstream. In quota code we need to find a superblock corresponding to a device and wait for superblock to be unfrozen. However this waiting has to happen without s_umount semaphore because that is required for superblock to thaw. So provide a function in VFS for this to keep dances with s_umount where they belong. [AV: implementation switched to saner variant] Signed-off-by: Jan Kara <j...@suse.cz> Signed-off-by: Al Viro <v...@zeniv.linux.org.uk> Signed-off-by: Jonathan Nieder <jrnie...@gmail.com> --- fs/super.c | 22 ++++++++++++++++++++++ include/linux/fs.h | 1 + 2 files changed, 23 insertions(+), 0 deletions(-) diff --git a/fs/super.c b/fs/super.c index aff046b0fe78..b1ea409e5f4d 100644 --- a/fs/super.c +++ b/fs/super.c @@ -467,6 +467,28 @@ rescan: EXPORT_SYMBOL(get_super); /** + * get_super_thawed - get thawed superblock of a device + * @bdev: device to get the superblock for + * + * Scans the superblock list and finds the superblock of the file system + * mounted on the device. The superblock is returned once it is thawed + * (or immediately if it was not frozen). %NULL is returned if no match + * is found. + */ +struct super_block *get_super_thawed(struct block_device *bdev) +{ + while (1) { + struct super_block *s = get_super(bdev); + if (!s || s->s_frozen == SB_UNFROZEN) + return s; + up_read(&s->s_umount); + vfs_check_frozen(s, SB_FREEZE_WRITE); + put_super(s); + } +} +EXPORT_SYMBOL(get_super_thawed); + +/** * get_active_super - get an active reference to the superblock of a device * @bdev: device to get the superblock for * diff --git a/include/linux/fs.h b/include/linux/fs.h index 1b9a47abb014..bcbd8576cece 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -2348,6 +2348,7 @@ extern void get_filesystem(struct file_system_type *fs); extern void put_filesystem(struct file_system_type *fs); extern struct file_system_type *get_fs_type(const char *name); extern struct super_block *get_super(struct block_device *); +extern struct super_block *get_super_thawed(struct block_device *); extern struct super_block *get_active_super(struct block_device *bdev); extern struct super_block *user_get_super(dev_t); extern void drop_super(struct super_block *sb); -- 1.7.9
From: Jan Kara <j...@suse.cz> Date: Fri, 10 Feb 2012 11:03:01 +0100 Subject: quota: Fix deadlock with suspend and quotas commit dcdbed853d9fbb0547b781ba676049b87f54129a upstream. This script causes a kernel deadlock: set -e DEVICE=/dev/vg1/linear lvchange -ay $DEVICE mkfs.ext3 $DEVICE mount -t ext3 -o usrquota,grpquota $DEVICE /mnt/test quotacheck -gu /mnt/test umount /mnt/test mount -t ext3 -o usrquota,grpquota $DEVICE /mnt/test quotaon /mnt/test dmsetup suspend $DEVICE setquota -u root 1 2 3 4 /mnt/test & sleep 1 dmsetup resume $DEVICE setquota acquired semaphore s_umount for read and then tried to perform a transaction (and waits because the device is suspended). dmsetup resume tries to acquire s_umount for write before resuming the device (and waits for setquota). Fix the deadlock by grabbing a thawed superblock for quota commands which need it. Reported-by: Mikulas Patocka <mpato...@redhat.com> Signed-off-by: Jan Kara <j...@suse.cz> Signed-off-by: Al Viro <v...@zeniv.linux.org.uk> Signed-off-by: Jonathan Nieder <jrnie...@gmail.com> --- fs/quota/quota.c | 24 +++++++++++++++++++++--- 1 files changed, 21 insertions(+), 3 deletions(-) diff --git a/fs/quota/quota.c b/fs/quota/quota.c index 95c5b42384b2..ea5f543c3319 100644 --- a/fs/quota/quota.c +++ b/fs/quota/quota.c @@ -351,11 +351,26 @@ static int do_quotactl(struct super_block *sb, int type, int cmd, qid_t id, return 0; } +/* Return 1 if 'cmd' will block on frozen filesystem */ +static int quotactl_cmd_write(int cmd) +{ + switch (cmd) { + case Q_GETFMT: + case Q_GETINFO: + case Q_SYNC: + case Q_XGETQSTAT: + case Q_XGETQUOTA: + case Q_XQUOTASYNC: + return 0; + } + return 1; +} + /* * look up a superblock on which quota ops will be performed * - use the name of a block device to find the superblock thereon */ -static struct super_block *quotactl_block(const char __user *special) +static struct super_block *quotactl_block(const char __user *special, int cmd) { #ifdef CONFIG_BLOCK struct block_device *bdev; @@ -368,7 +383,10 @@ static struct super_block *quotactl_block(const char __user *special) putname(tmp); if (IS_ERR(bdev)) return ERR_CAST(bdev); - sb = get_super(bdev); + if (quotactl_cmd_write(cmd)) + sb = get_super_thawed(bdev); + else + sb = get_super(bdev); bdput(bdev); if (!sb) return ERR_PTR(-ENODEV); @@ -396,7 +414,7 @@ SYSCALL_DEFINE4(quotactl, unsigned int, cmd, const char __user *, special, type = cmd & SUBCMDMASK; if (cmds != Q_SYNC || special) { - sb = quotactl_block(special); + sb = quotactl_block(special, cmds); if (IS_ERR(sb)) return PTR_ERR(sb); } -- 1.7.9