sync: Per container sync and syncfs and fs.fsync-enable sysctl

Andrey Zhadchenko Thu, 07 Oct 2021 07:59:31 -0700

From: Konstantin Khorenko <khore...@virtuozzo.com>

"sync/fsync" called from inside a Container may behave differently, depending on the fs.fsync-enable sysctl.


Affects sys_sync, sys_fsync, sys_fdatasync, sys_sync_file_range
syscalls.
aio_fsync (sys_io_submit) is not affected.
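From userspace the setting is meant to be invisible: an ignored call still reports success. The probe below is a minimal sketch, not part of the patch. It exercises fsync(), fdatasync(), syncfs(), sync() and msync() (the last one is covered by the msync fix further down) on a throw-away file and records the return values. The helper names (`run_sync_probe`, `struct sync_probe`) and the /tmp scratch path are hypothetical, and the expectation that every call returns 0 even under FSYNC_NEVER is inferred from the diff, which returns 0 for ignored syncs.

```c
#define _GNU_SOURCE		/* for syncfs() */
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Return values of the sync calls governed by fs.fsync-enable. */
struct sync_probe {
	int fsync_ret;
	int fdatasync_ret;
	int syncfs_ret;
	int msync_ret;
};

/* Returns 0 when the probe ran, -1 if the scratch file could not be set up. */
static int run_sync_probe(struct sync_probe *p)
{
	char path[] = "/tmp/sync-probe-XXXXXX";	/* hypothetical scratch path */
	long pagesz = sysconf(_SC_PAGESIZE);
	int fd = mkstemp(path);
	void *map;

	if (fd < 0)
		return -1;
	unlink(path);			/* file goes away once fd is closed */
	if (ftruncate(fd, pagesz) != 0) {
		close(fd);
		return -1;
	}
	map = mmap(NULL, pagesz, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED) {
		close(fd);
		return -1;
	}
	((char *)map)[0] = 'x';		/* dirty one page of the mapping */

	p->msync_ret = msync(map, pagesz, MS_SYNC);
	p->fsync_ret = fsync(fd);
	p->fdatasync_ret = fdatasync(fd);
	p->syncfs_ret = syncfs(fd);
	sync();				/* returns void: nothing to record */

	munmap(map, pagesz);
	close(fd);
	return 0;
}
```

Run inside a container with fs.fsync-enable set to 0, the same probe should still see zeros; the difference is only that no data is actually flushed.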

Syncs cannot be disabled for ve0.
All values described below (even if set on ve0) affect veX behavior only.

Possible values for the Hardware Node:
======================================
0 (FSYNC_NEVER)         CT fsync and syncs are ignored
1 (FSYNC_ALWAYS)        CT fsync and syncs work as usual, all inodes
                        for all filesystems will be synced
2 (FSYNC_FILTERED)      CT fsync works as usual, syncs flush only CT data
                        (only filesystems mounted in the CT are flushed)

Possible values inside a Container:
======================================
0                       CT fsync and syncs are ignored
2                       Use the HN global value
any other value         Same as HN value 2 (FSYNC_FILTERED)

Default kernel value (for both HN and CT): 2 (FSYNC_FILTERED).
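The two tables combine as follows. This is a userspace model of `__ve_fsync_behavior()` and `ve_fsync_behavior()` from the diff, with the kernel state replaced by plain integers; `resolve_fsync_behavior()` and its parameters are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define FSYNC_NEVER    0	/* ve syncs are ignored    */
#define FSYNC_ALWAYS   1	/* ve syncs work as usual  */
#define FSYNC_FILTERED 2	/* ve syncs only its files */

/*
 * Effective behaviour for a task, given whether it runs in ve0, the
 * fs.fsync-enable value of its CT and the HN (ve0) value.
 */
static int resolve_fsync_behavior(bool is_ve0, int ct_value, int hn_value)
{
	if (is_ve0)
		return FSYNC_ALWAYS;	/* syncs cannot be disabled for ve0 */
	if (ct_value == FSYNC_FILTERED)
		return hn_value;	/* 2 inside a CT: use the HN value */
	if (ct_value != FSYNC_NEVER)
		return FSYNC_FILTERED;	/* any other non-zero value: filtered */
	return FSYNC_NEVER;		/* 0: CT fsync and syncs are ignored */
}
```

One consequence worth spelling out: a container cannot escalate itself to FSYNC_ALWAYS. Writing 1 inside a CT yields filtered behaviour; only the HN value, reached through the CT value 2, can make CT syncs global.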

=====================================================
ve/fs: Port fs.fsync-enable and fs.odirect_enable sysctls

This is a part of 74-diff-ve-mix-combined.

https://jira.sw.ru/browse/PSBM-17903
Signed-off-by: Kirill Tkhai <ktk...@parallels.com>

=====================================================
ve/fs: check container odirect and fsync settings in __dentry_open

sys_open for conventional filesystems doesn't call dentry_open,
it calls __dentry_open (in nameidata_to_filp), so we have to move
the checks for odirect and fsync behaviour to __dentry_open
to make them work on ploop containers.

https://jira.sw.ru/browse/PSBM-17157

Signed-off-by: Dmitry Guryanov <dgurya...@parallels.com>
Signed-off-by: Dmitry Monakhov <dmonak...@openvz.org>

================================================
ve: initialize fsync_enable also for non ve0 environment

Patchset description:

ve: fix initialization and remove sysctl_fsync_enable

v2:
- initialize only on ve cgroup creation, remove get_ve_features
- rename setup_iptables_mask into ve_setup_iptables_mask

https://jira.sw.ru/browse/PSBM-34286
https://jira.sw.ru/browse/PSBM-34285

Pavel Tikhomirov (4):
  ve: remove sysctl_fsync_enable and use ve_fsync_behavior instead
  ve: initialize fsync_enable also for non ve0 environment
  ve: iptables: fix mask initialization and changing
  ve: cgroup: initialize odirect_enable, features and _randomize_va_space

=====================================================================
Combined several vz7 patches into one:
 d35caf1 ("ve/fs/sync: per containter sync and syncfs")
 3016bac ("ve: remove sync_mutex")
 4cc281e ("ve: remove sysctl_fsync_enable and use ve_fsync_behavior instead")
 c3e4103 ("ve/fs: introduce "fs.fsync-enable" and "fs.odirect_enable" sysctls")
 fdbb570 ("fs: Restrict ve sync methods")

VZ 8 rebase part https://jira.sw.ru/browse/PSBM-127782
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalit...@virtuozzo.com>

khorenko@ changes:
 - "2" -> "FSYNC_FILTERED" in a couple of places
 - -               if (!sb_rdonly(sb) && sb->s_root && sb->s_bdi)
   +               if (!sb_rdonly(sb) && sb->s_root && (sb->s_flags & SB_BORN))

+++
ve/msync: fix wrong behaviour of fs.fsync-enable

When FSYNC_NEVER is set in a container (via the fs.fsync-enable sysctl),
syncs should be ignored instead of failing with ENOMEM as they do now.
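Why the position of the check matters: in msync(), `error` holds -ENOMEM across the length rounding and the overflow check and is only reset to 0 just before the `end == start` test, so an early exit taken before that reset reports ENOMEM. The sketch below is a simplified userspace model of the start of msync() with the fix applied; the flag checks and the VMA walk are omitted, and `msync_model()` and `MODEL_PAGE_SIZE` (fixed at 4096) are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define FSYNC_NEVER     0
#define MODEL_PAGE_SIZE 4096UL	/* hypothetical fixed page size */

static int msync_model(unsigned long start, size_t len, int fsync_behavior)
{
	unsigned long end;
	int error = -EINVAL;

	if (start & (MODEL_PAGE_SIZE - 1))
		goto out;		/* start must be page aligned */
	error = -ENOMEM;
	len = (len + MODEL_PAGE_SIZE - 1) & ~(MODEL_PAGE_SIZE - 1);
	end = start + len;
	if (end < start)
		goto out;		/* range wraps around: still -ENOMEM */
	error = 0;
	if (fsync_behavior == FSYNC_NEVER)
		goto out;		/* ignored sync reports success */
	if (end == start)
		goto out;
	/* the real syscall walks the VMAs here; the model just succeeds */
out:
	return error;
}
```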

https://jira.sw.ru/browse/PSBM-131652

Signed-off-by: Pavel Tikhomirov <ptikhomi...@virtuozzo.com>
Acked-by: Alexander Mikhalitsyn <alexander.mikhalit...@virtuozzo.com>

+++
ve/sync/mounts: skip cursor mounts when iterating over mnt_ns->list

After RHEL ported "proc/mounts: add cursor" we need to iterate over
mounts list in mntns more carefully:

 - Export mnt_list_next and move it out from CONFIG_PROC_FS;
 - Use mnt_list_next in sync_collect_filesystems to skip cursors.

Otherwise the kernel would crash dereferencing fields of an
uninitialized cursor mount.
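The skip-the-cursor walk can be sketched in userspace with a minimal circular list in the style of the kernel's list_head. The `is_cursor` flag stands in for the kernel's mnt_is_cursor() check, all names (`model_mount`, `model_mnt_list_next()` and so on) are hypothetical, and the locking (namespace_sem, the namespace list lock) is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct list_node {
	struct list_node *next, *prev;
};

#define model_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct model_mount {
	int sb_id;			/* stands in for mnt.mnt_sb */
	bool is_cursor;			/* stands in for mnt_is_cursor() */
	struct list_node mnt_list;	/* link in the namespace's mount list */
};

static void model_list_add_tail(struct list_node *head, struct list_node *n)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/*
 * Return the first non-cursor mount after position p in the list
 * anchored at head, or NULL at the end.  Passing head itself as p
 * starts the walk from the beginning.
 */
static struct model_mount *model_mnt_list_next(struct list_node *head,
					       struct list_node *p)
{
	for (p = p->next; p != head; p = p->next) {
		struct model_mount *m =
			model_container_of(p, struct model_mount, mnt_list);

		if (!m->is_cursor)
			return m;
	}
	return NULL;
}
```

Starting from the list head and feeding each returned mount's `mnt_list` back in mirrors the loop in sync_collect_filesystems(); a plain list_for_each_entry() would instead hand the cursor to the caller.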

https://jira.sw.ru/browse/PSBM-131158
Signed-off-by: Pavel Tikhomirov <ptikhomi...@virtuozzo.com>

+++
fs/sync: fix nullptr dereference ve->ve_ns->mnt_ns

ve_ns is not guaranteed to be non-NULL. Fix
is_sb_ve_accessible() and sync_collect_filesystems() accordingly.
Also add rcu_dereference(), since ve->ve_ns is RCU-protected.
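The NULL check fits into what sync_collect_filesystems() does with the mounts it finds: each distinct superblock is collected once and pinned by bumping s_count, and a missing namespace simply yields an empty list. The sketch below is a userspace model under simplifying assumptions: an array replaces the mount list, `cap` stands in for a kmalloc() failure (what was collected is still synced, as the comment in sync_filesystems_ve() says), and namespace_sem, RCU and sb_lock are not modelled. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct model_sb {
	int s_count;			/* stands in for sb->s_count */
};

struct model_ns {
	struct model_sb **mounts;	/* superblock of each mount, in list order */
	size_t nr_mounts;
};

/*
 * Collect each distinct superblock once into out[], taking a reference
 * on it.  A NULL namespace yields an empty list instead of a NULL
 * dereference.  Returns the number of superblocks collected.
 */
static size_t collect_sbs(const struct model_ns *ns, struct model_sb **out,
			  size_t cap)
{
	size_t n = 0;

	if (!ns)
		return 0;
	for (size_t i = 0; i < ns->nr_mounts; i++) {
		struct model_sb *sb = ns->mounts[i];
		size_t j;

		for (j = 0; j < n; j++)
			if (out[j] == sb)
				break;
		if (j < n)
			continue;	/* already collected via another mount */
		if (n == cap)
			break;		/* models the -ENOMEM early stop */
		sb->s_count++;		/* like s_count++ under sb_lock */
		out[n++] = sb;
	}
	return n;
}

/* Drop the references taken above, like sync_release_filesystems(). */
static void release_sbs(struct model_sb **list, size_t n)
{
	for (size_t i = 0; i < n; i++)
		list[i]->s_count--;
}
```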

Example shell commands that crash the kernel:

 # mkdir /sys/fs/cgroup/ve/10001
 # echo 10001 >  /sys/fs/cgroup/ve/10001/ve.veid
 # echo $$ > /sys/fs/cgroup/ve/10001/tasks
 # sync

[59390.889322] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
[59390.889395] PGD 0 P4D 0
[59390.889442] Oops: 0000 [#1] SMP PTI
[59390.889492] CPU: 1 PID: 8950 Comm: sync ve: 10001 Kdump: loaded Not tainted 4.18.0-240.1.1.vz8.5.47 #1 5.47
[59390.889554] Hardware name: Virtuozzo KVM, BIOS 1.10.2-3.1.vz7.3 04/01/2014
[59390.889622] RIP: 0010:sync_filesystems_ve+0x34/0x220
[59390.889673] Code: 55 41 54 55 53 48 83 ec 20 65 48 8b 04 25 28 00 00 00 48 89 44 24 18 31 c0 48 8b 87 98 01 00 00 48 8d 6c 24 08 48 89 6c 24 08 <4c> 8b 68 18 48 8b 44 24 08 48 89 6c 24 10 48 39 c5 0f 85 ce 01 00
[59390.889798] RSP: 0018:ffffb1b7810a7ec0 EFLAGS: 00010246
[59390.889849] RAX: 0000000000000000 RBX: ffff92309ab7c418 RCX: 0000000000000000
[59390.889903] RDX: ffff92308bbff180 RSI: 0000000000000000 RDI: ffff92309ab7c418
[59390.889958] RBP: ffffb1b7810a7ec8 R08: 0000000000000000 R09: 0000000000000000
[59390.890016] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[59390.890071] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[59390.890126] FS:  00007fd7880b6540(0000) GS:ffff9230bbb00000(0000) knlGS:0000000000000000
[59390.890184] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[59390.890235] CR2: 0000000000000018 CR3: 000000010b22e000 CR4: 00000000000006e0
[59390.890293] Call Trace:
[59390.890351]  ? __do_page_fault+0x23a/0x4f0
[59390.890407]  ksys_sync+0x10d/0x130
[59390.890456]  __ia32_sys_sync+0xa/0x10
[59390.890509]  do_syscall_64+0x5b/0x1a0
[59390.890562]  entry_SYSCALL_64_after_hwframe+0x65/0xca
[59390.890620] RIP: 0033:0x7fd787fe4ffb
[59390.890667] Code: c3 48 8b 0d a7 8e 0c 00 f7 d8 64 89 01 b8 ff ff ff ff eb c2 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 a2 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 75 8e 0c 00 f7 d8 64 89 01 48
[59390.890791] RSP: 002b:00007ffd853dd328 EFLAGS: 00000246 ORIG_RAX: 00000000000000a2
[59390.890848] RAX: ffffffffffffffda RBX: 00007ffd853dd468 RCX: 00007fd787fe4ffb
[59390.890903] RDX: 00007fd7880b2001 RSI: 0000000000000000 RDI: 00007fd788079b5e
[59390.890957] RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
[59390.891012] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[59390.891067] R13: 0000000000000000 R14: 0000000000000000 R15: 00007fd7880ae1b4
[59390.896038] CR2: 0000000000000018

https://jira.sw.ru/browse/PSBM-130894
Signed-off-by: Andrey Zhadchenko <andrey.zhadche...@virtuozzo.com>

v2: move the new sync_filesystems code under namespace_sem to ensure mnt_ns
won't disappear unexpectedly

(cherry picked from vz8 commit 5a96860dcd780c5caaaaf7c95cbefc764cd7f88a)
Signed-off-by: Andrey Zhadchenko <andrey.zhadche...@virtuozzo.com>
---
 fs/fcntl.c          |   2 +
 fs/mount.h          |   2 +
 fs/namespace.c      |   8 +-
 fs/open.c           |   3 +
 fs/sync.c           | 213 +++++++++++++++++++++++++++++++++++++++++++++++++++-
 include/linux/fs.h  |  12 +++
 include/linux/ve.h  |   2 +
 kernel/ve/ve.c      |   3 +
 kernel/ve/veowner.c |   8 ++
 mm/msync.c          |   2 +
 10 files changed, 249 insertions(+), 6 deletions(-)

diff --git a/fs/fcntl.c b/fs/fcntl.c
index 2e0c851..8af146e 100644
--- a/fs/fcntl.c
+++ b/fs/fcntl.c
@@ -68,6 +68,8 @@ static int setfl(int fd, struct file * filp, unsigned long arg)
        if (!may_use_odirect())
                arg &= ~O_DIRECT;
 
+       if (ve_fsync_behavior() == FSYNC_NEVER)
+               arg &= ~O_SYNC;
        /*
         * O_APPEND cannot be cleared if the file is marked as append-only
         * and the file is open for write.
diff --git a/fs/mount.h b/fs/mount.h
index e19f732..7c6b724 100644
--- a/fs/mount.h
+++ b/fs/mount.h
@@ -100,6 +100,8 @@ static inline int is_mounted(struct vfsmount *mnt)
        return !IS_ERR_OR_NULL(real_mount(mnt)->mnt_ns);
 }
 
+extern struct rw_semaphore namespace_sem;
+
 extern struct mount *__lookup_mnt(struct vfsmount *, struct dentry *);
 
 extern int __legitimize_mnt(struct vfsmount *, unsigned);
diff --git a/fs/namespace.c b/fs/namespace.c
index c106149..7af19eb 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -72,7 +72,7 @@ static int __init set_mphash_entries(char *str)
 static struct hlist_head *mount_hashtable __read_mostly;
 static struct hlist_head *mountpoint_hashtable __read_mostly;
 static struct kmem_cache *mnt_cache __read_mostly;
-static DECLARE_RWSEM(namespace_sem);
+DECLARE_RWSEM(namespace_sem);
 static HLIST_HEAD(unmounted);  /* protected by namespace_sem */
 static LIST_HEAD(ex_mountpoints); /* protected by namespace_sem */
 
@@ -1300,9 +1300,8 @@ struct vfsmount *mnt_clone_internal(const struct path *path)
        return &p->mnt;
 }
 
-#ifdef CONFIG_PROC_FS
-static struct mount *mnt_list_next(struct mnt_namespace *ns,
-                                  struct list_head *p)
+struct mount *mnt_list_next(struct mnt_namespace *ns,
+                           struct list_head *p)
 {
        struct mount *mnt, *ret = NULL;
 
@@ -1319,6 +1318,7 @@ static struct mount *mnt_list_next(struct mnt_namespace *ns,
        return ret;
 }
 
+#ifdef CONFIG_PROC_FS
 /* iterator; we want it to have access to namespace_sem, thus here... */
 static void *m_start(struct seq_file *m, loff_t *pos)
 {
diff --git a/fs/open.c b/fs/open.c
index 040df8b..65e60aa 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -785,6 +785,9 @@ static int do_dentry_open(struct file *f,
        if (!may_use_odirect())
                f->f_flags &= ~O_DIRECT;
 
+       if (ve_fsync_behavior() == FSYNC_NEVER)
+               f->f_flags &= ~O_SYNC;
+
        if (unlikely(f->f_flags & O_PATH)) {
                f->f_mode = FMODE_PATH | FMODE_OPENED;
                f->f_op = &empty_fops;
diff --git a/fs/sync.c b/fs/sync.c
index 1373a61..1c78756 100644
--- a/fs/sync.c
+++ b/fs/sync.c
@@ -8,6 +8,7 @@
 #include <linux/fs.h>
 #include <linux/slab.h>
 #include <linux/export.h>
+#include <linux/mount.h>
 #include <linux/namei.h>
 #include <linux/sched.h>
 #include <linux/writeback.h>
@@ -16,7 +17,9 @@
 #include <linux/pagemap.h>
 #include <linux/quotaops.h>
 #include <linux/backing-dev.h>
+#include <linux/ve.h>
 #include "internal.h"
+#include "mount.h"
 
 #define VALID_FLAGS (SYNC_FILE_RANGE_WAIT_BEFORE|SYNC_FILE_RANGE_WRITE| \
                        SYNC_FILE_RANGE_WAIT_AFTER)
@@ -96,6 +99,160 @@ static void fdatawait_one_bdev(struct block_device *bdev, void *arg)
        filemap_fdatawait_keep_errors(bdev->bd_inode->i_mapping);
 }
 
+struct sync_sb {
+       struct list_head list;
+       struct super_block *sb;
+};
+
+static void sync_release_filesystems(struct list_head *sync_list)
+{
+       struct sync_sb *ss, *tmp;
+
+       list_for_each_entry_safe(ss, tmp, sync_list, list) {
+               list_del(&ss->list);
+               put_super(ss->sb);
+               kfree(ss);
+       }
+}
+
+static int sync_filesystem_collected(struct list_head *sync_list, struct super_block *sb)
+{
+       struct sync_sb *ss;
+
+       list_for_each_entry(ss, sync_list, list)
+               if (ss->sb == sb)
+                       return 1;
+       return 0;
+}
+
+static int sync_collect_filesystems(struct ve_struct *ve, struct list_head *sync_list)
+{
+       struct mount *mnt;
+       struct mnt_namespace *mnt_ns;
+       struct nsproxy *ve_ns;
+       struct sync_sb *ss;
+       int ret = 0;
+
+       BUG_ON(!list_empty(sync_list));
+
+       down_read(&namespace_sem);
+
+       rcu_read_lock();
+       ve_ns = rcu_dereference(ve->ve_ns);
+       if (!ve_ns) {
+               rcu_read_unlock();
+               up_read(&namespace_sem);
+               return 0;
+       }
+       mnt_ns = ve_ns->mnt_ns;
+       rcu_read_unlock();
+
+       mnt = mnt_list_next(mnt_ns, &mnt_ns->list);
+       while (mnt) {
+               if (sync_filesystem_collected(sync_list, mnt->mnt.mnt_sb))
+                       goto next;
+
+               ss = kmalloc(sizeof(*ss), GFP_KERNEL);
+               if (ss == NULL) {
+                       ret = -ENOMEM;
+                       break;
+               }
+               ss->sb = mnt->mnt.mnt_sb;
+               /*
+                * We hold the mount point and thus can be sure that the
+                * superblock is alive, which means that we can safely
+                * increase its usage counter.
+                */
+               spin_lock(&sb_lock);
+               ss->sb->s_count++;
+               spin_unlock(&sb_lock);
+               list_add_tail(&ss->list, sync_list);
+next:
+               mnt = mnt_list_next(mnt_ns, &mnt->mnt_list);
+       }
+       up_read(&namespace_sem);
+       return ret;
+}
+
+static void sync_filesystems_ve(struct ve_struct *ve, int wait)
+{
+       struct super_block *sb;
+       LIST_HEAD(sync_list);
+       struct sync_sb *ss;
+
+       /*
+        * We don't need to care about allocation failure here: at least we
+        * don't need to skip the sync on such an error. Sync what we have
+        * already collected instead.
+        */
+       sync_collect_filesystems(ve, &sync_list);
+
+       list_for_each_entry(ss, &sync_list, list) {
+               sb = ss->sb;
+               down_read(&sb->s_umount);
+               if (!sb_rdonly(sb) && sb->s_root && (sb->s_flags & SB_BORN))
+                       __sync_filesystem(sb, wait);
+               up_read(&sb->s_umount);
+       }
+
+       sync_release_filesystems(&sync_list);
+}
+
+static int is_sb_ve_accessible(struct ve_struct *ve, struct super_block *sb)
+{
+       struct mount *mnt;
+       struct mnt_namespace *mnt_ns;
+       struct nsproxy *ve_ns;
+       int ret = 0;
+
+       down_read(&namespace_sem);
+
+       rcu_read_lock();
+       ve_ns = rcu_dereference(ve->ve_ns);
+       if (!ve_ns) {
+               rcu_read_unlock();
+               up_read(&namespace_sem);
+               return 0;
+       }
+       mnt_ns = ve_ns->mnt_ns;
+       rcu_read_unlock();
+
+       list_for_each_entry(mnt, &mnt_ns->list, mnt_list) {
+               if (mnt->mnt.mnt_sb == sb) {
+                       ret = 1;
+                       break;
+               }
+       }
+       up_read(&namespace_sem);
+       return ret;
+}
+
+static int __ve_fsync_behavior(struct ve_struct *ve)
+{
+       /*
+        * - __ve_fsync_behavior() is not called for ve0
+        * - FSYNC_FILTERED for veX does NOT mean "filtered" behavior
+        * - FSYNC_FILTERED for veX means "get value from ve0"
+        */
+       if (ve->fsync_enable == FSYNC_FILTERED)
+               return get_ve0()->fsync_enable;
+       else if (ve->fsync_enable)
+               return FSYNC_FILTERED; /* sync forced by ve is always filtered */
+       else
+               return 0;
+}
+
+int ve_fsync_behavior(void)
+{
+       struct ve_struct *ve;
+
+       ve = get_exec_env();
+       if (ve_is_super(ve))
+               return FSYNC_ALWAYS;
+       else
+               return __ve_fsync_behavior(ve);
+}
+
 /*
  * Sync everything. We start by waking flusher threads so that most of
  * writeback runs on all devices in parallel. Then we sync all inodes reliably
@@ -108,8 +265,32 @@ static void fdatawait_one_bdev(struct block_device *bdev, void *arg)
  */
 void ksys_sync(void)
 {
+       struct ve_struct *ve = get_exec_env();
        int nowait = 0, wait = 1;
 
+       if (!ve_is_super(ve)) {
+               int fsb;
+               /*
+                * init can't sync during VE stop. Rationale:
+                *  - NFS with -o hard will block forever as network is down
+                *  - no useful job is performed as VE0 will call umount/sync
+                *    on its own later
+                *  Den
+                */
+               if (is_child_reaper(task_pid(current)))
+                       return;
+
+               fsb = __ve_fsync_behavior(ve);
+               if (fsb == FSYNC_NEVER)
+                       return;
+
+               if (fsb == FSYNC_FILTERED) {
+                       sync_filesystems_ve(ve, nowait);
+                       sync_filesystems_ve(ve, wait);
+                       return;
+               }
+       }
+
        wakeup_flusher_threads(WB_REASON_SYNC);
        iterate_supers(sync_inodes_one_sb, NULL);
        iterate_supers(sync_fs_one_sb, &nowait);
@@ -162,18 +343,42 @@ void emergency_sync(void)
 {
        struct fd f = fdget(fd);
        struct super_block *sb;
-       int ret, ret2;
+       int ret = 0, ret2 = 0;
+       struct ve_struct *ve;
 
        if (!f.file)
                return -EBADF;
        sb = f.file->f_path.dentry->d_sb;
 
+       ve = get_exec_env();
+
+       if (!ve_is_super(ve)) {
+               int fsb;
+               /*
+                * init can't sync during VE stop. Rationale:
+                *  - NFS with -o hard will block forever as network is down
+                *  - no useful job is performed as VE0 will call umount/sync
+                *    on its own later
+                *  Den
+                */
+               if (is_child_reaper(task_pid(current)))
+                       goto fdput;
+
+               fsb = __ve_fsync_behavior(ve);
+               if (fsb == FSYNC_NEVER)
+                       goto fdput;
+
+               if ((fsb == FSYNC_FILTERED) && !is_sb_ve_accessible(ve, sb))
+                       goto fdput;
+       }
+
        down_read(&sb->s_umount);
        ret = sync_filesystem(sb);
        up_read(&sb->s_umount);
 
        ret2 = errseq_check_and_advance(&sb->s_wb_err, &f.file->f_sb_err);
 
+fdput:
        fdput(f);
        return ret ? ret : ret2;
 }
@@ -217,9 +422,13 @@ int vfs_fsync(struct file *file, int datasync)
 
 static int do_fsync(unsigned int fd, int datasync)
 {
-       struct fd f = fdget(fd);
+       struct fd f;
        int ret = -EBADF;
 
+       if (ve_fsync_behavior() == FSYNC_NEVER)
+               return 0;
+
+       f = fdget(fd);
        if (f.file) {
                ret = vfs_fsync(f.file, datasync);
                fdput(f);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 01419db..42021f0 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -71,6 +71,7 @@
 struct fs_context;
 struct fs_parameter_spec;
 struct fileattr;
+struct mnt_namespace;
 
 extern void __init inode_init(void);
 extern void __init inode_init_early(void);
@@ -2521,6 +2522,7 @@ extern struct dentry *mount_nodev(struct file_system_type *fs_type,
 void kill_litter_super(struct super_block *sb);
 void deactivate_super(struct super_block *sb);
 void deactivate_locked_super(struct super_block *sb);
+void put_super(struct super_block *sb);
 int set_anon_super(struct super_block *s, void *data);
 int set_anon_super_fc(struct super_block *s, struct fs_context *fc);
 int get_anon_bdev(dev_t *);
@@ -3146,6 +3148,13 @@ static inline void i_readcount_inc(struct inode *inode)
 
 extern char *file_path(struct file *, char *, int);
 
+int ve_fsync_behavior(void);
+
+#define FSYNC_NEVER    0       /* ve syncs are ignored    */
+#define FSYNC_ALWAYS   1       /* ve syncs work as usual  */
+#define FSYNC_FILTERED 2       /* ve syncs only its files */
+/* For non-ve0 FSYNC_FILTERED value means "get value from ve0". */
+
 #include <linux/err.h>
 
 /* needed for stackable file system support */
@@ -3495,6 +3504,9 @@ void setattr_copy(struct user_namespace *, struct inode *inode,
 
 extern int file_update_time(struct file *file);
 
+extern struct mount *mnt_list_next(struct mnt_namespace *ns,
+                                  struct list_head *p);
+
 static inline bool vma_is_dax(const struct vm_area_struct *vma)
 {
        return vma->vm_file && IS_DAX(vma->vm_file->f_mapping->host);
diff --git a/include/linux/ve.h b/include/linux/ve.h
index 3d5a1dc..ad1c4710 100644
--- a/include/linux/ve.h
+++ b/include/linux/ve.h
@@ -57,6 +57,8 @@ struct ve_struct {
        struct kstat_lat_pcpu_struct    sched_lat_ve;
        int                     odirect_enable;
 
+       int                     fsync_enable;
+
 #if IS_ENABLED(CONFIG_BINFMT_MISC)
        struct binfmt_misc      *binfmt_misc;
 #endif
diff --git a/kernel/ve/ve.c b/kernel/ve/ve.c
index e8616d9..fe4c4d9 100644
--- a/kernel/ve/ve.c
+++ b/kernel/ve/ve.c
@@ -56,6 +56,7 @@ struct ve_struct ve0 = {
        .sched_lat_ve.cur       = &ve0_lat_stats,
        .netns_avail_nr         = ATOMIC_INIT(INT_MAX),
        .netns_max_nr           = INT_MAX,
+       .fsync_enable           = FSYNC_FILTERED,
        ._randomize_va_space    =
 #ifdef CONFIG_COMPAT_BRK
                                        1,
@@ -678,6 +679,8 @@ static struct cgroup_subsys_state *ve_create(struct cgroup_subsys_state *parent_
        ve->meminfo_val = VE_MEMINFO_DEFAULT;
 
        ve->odirect_enable = 2;
+       /* for veX FSYNC_FILTERED means "get value from ve0" */
+       ve->fsync_enable = FSYNC_FILTERED;
 
        atomic_set(&ve->netns_avail_nr, NETNS_MAX_NR_DEFAULT);
        ve->netns_max_nr = NETNS_MAX_NR_DEFAULT;
diff --git a/kernel/ve/veowner.c b/kernel/ve/veowner.c
index b0aba35..e255fe5 100644
--- a/kernel/ve/veowner.c
+++ b/kernel/ve/veowner.c
@@ -7,6 +7,7 @@
  *
  */
 
+#include <linux/ve.h>
 #include <linux/init.h>
 #include <linux/module.h>
 #include <linux/proc_fs.h>
@@ -66,6 +67,13 @@ static void prepare_proc(void)
                .extra1         = &ve_mount_nr_min,
                .extra2         = &ve_mount_nr_max,
        },
+       {
+               .procname       = "fsync-enable",
+               .data           = &ve0.fsync_enable,
+               .maxlen         = sizeof(int),
+               .mode           = 0644 | S_ISVTX,
+               .proc_handler   = &proc_dointvec_virtual,
+       },
        { }
 };
 
diff --git a/mm/msync.c b/mm/msync.c
index 137d1c1..20737eb 100644
--- a/mm/msync.c
+++ b/mm/msync.c
@@ -51,6 +51,8 @@
        if (end < start)
                goto out;
        error = 0;
+       if (ve_fsync_behavior() == FSYNC_NEVER)
+               goto out;
        if (end == start)
                goto out;
        /*
-- 
1.8.3.1

_______________________________________________
Devel mailing list
Devel@openvz.org
https://lists.openvz.org/mailman/listinfo/devel

[Devel] [PATCH RH9 06/22] ve/fs/sync: Per container sync and syncfs and fs.fsync-enable sysctl