On 29.06.2021 12:33, Pavel Tikhomirov wrote:


On 25.06.2021 20:12, Andrey Zhadchenko wrote:
ve_ns is not guaranteed to be non-NULL. Fix
is_sb_ve_accessible() and sync_collect_filesystems()
Also add rcu_dereference since ve->ve_ns is rcu-protected

An example of shell commands to crash kernel:

mkdir /sys/fs/cgroup/ve/10001
cat /sys/fs/cgroup/ve/10001/tasks
echo 10001 >  /sys/fs/cgroup/ve/10001/ve.veid
echo $$ > /sys/fs/cgroup/ve/10001/tasks
sync

[59390.889322] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
[59390.889395] PGD 0 P4D 0
[59390.889442] Oops: 0000 [#1] SMP PTI
[59390.889492] CPU: 1 PID: 8950 Comm: sync ve: 10001 Kdump: loaded Not tainted 4.18.0-240.1.1.vz8.5.47 #1 5.47 [59390.889554] Hardware name: Virtuozzo KVM, BIOS 1.10.2-3.1.vz7.3 04/01/2014
[59390.889622] RIP: 0010:sync_filesystems_ve+0x34/0x220
[59390.889673] Code: 55 41 54 55 53 48 83 ec 20 65 48 8b 04 25 28 00 00 00 48 89 44 24 18 31 c0 48 8b 87 98 01 00 00 48 8d 6c 24 08 48 89 6c 24 08 <4c> 8b 68 18 48 8b 44 24 08 48 89 6c 24 10 48 39 c5 0f 85 ce 01 00
[59390.889798] RSP: 0018:ffffb1b7810a7ec0 EFLAGS: 00010246
[59390.889849] RAX: 0000000000000000 RBX: ffff92309ab7c418 RCX: 0000000000000000 [59390.889903] RDX: ffff92308bbff180 RSI: 0000000000000000 RDI: ffff92309ab7c418 [59390.889958] RBP: ffffb1b7810a7ec8 R08: 0000000000000000 R09: 0000000000000000 [59390.890016] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 [59390.890071] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 [59390.890126] FS:  00007fd7880b6540(0000) GS:ffff9230bbb00000(0000) knlGS:0000000000000000
[59390.890184] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[59390.890235] CR2: 0000000000000018 CR3: 000000010b22e000 CR4: 00000000000006e0
[59390.890293] Call Trace:
[59390.890351]  ? __do_page_fault+0x23a/0x4f0
[59390.890407]  ksys_sync+0x10d/0x130
[59390.890456]  __ia32_sys_sync+0xa/0x10
[59390.890509]  do_syscall_64+0x5b/0x1a0
[59390.890562]  entry_SYSCALL_64_after_hwframe+0x65/0xca
[59390.890620] RIP: 0033:0x7fd787fe4ffb
[59390.890667] Code: c3 48 8b 0d a7 8e 0c 00 f7 d8 64 89 01 b8 ff ff ff ff eb c2 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 a2 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 75 8e 0c 00 f7 d8 64 89 01 48 [59390.890791] RSP: 002b:00007ffd853dd328 EFLAGS: 00000246 ORIG_RAX: 00000000000000a2 [59390.890848] RAX: ffffffffffffffda RBX: 00007ffd853dd468 RCX: 00007fd787fe4ffb [59390.890903] RDX: 00007fd7880b2001 RSI: 0000000000000000 RDI: 00007fd788079b5e [59390.890957] RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000 [59390.891012] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 [59390.891067] R13: 0000000000000000 R14: 0000000000000000 R15: 00007fd7880ae1b4
[59390.896038] CR2: 0000000000000018

https://jira.sw.ru/browse/PSBM-130894
Signed-off-by: Andrey Zhadchenko <[email protected]>
---
  fs/sync.c | 24 ++++++++++++++++++++++--
  1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/fs/sync.c b/fs/sync.c
index ef4b1d1..2a7c96b 100644
--- a/fs/sync.c
+++ b/fs/sync.c
@@ -127,10 +127,20 @@ static int sync_filesystem_collected(struct list_head *sync_list, struct super_b   static int sync_collect_filesystems(struct ve_struct *ve, struct list_head *sync_list)
  {
      struct mount *mnt;
-    struct mnt_namespace *mnt_ns = ve->ve_ns->mnt_ns;
+    struct mnt_namespace *mnt_ns;
+    struct nsproxy *ve_ns;
      struct sync_sb *ss;
      int ret = 0;
+    rcu_read_lock();
+    ve_ns = rcu_dereference(ve->ve_ns);

Above you read ve->ve_ns to ve_ns variable but never use it. You probably designed it to use in the below code, because ve->ve_ns can change to NULL after the check below and we still get crash.

+    if (!ve->ve_ns) {
+        rcu_read_unlock();
+        return 0;
+    }
+    mnt_ns = ve->ve_ns->mnt_ns;

One more thing, should not we get some refference on mnt_ns? Can not it be released/freed by put_nsproxy() on ve_ns?

+    rcu_read_unlock();
+
      BUG_ON(!list_empty(sync_list));
      down_read(&namespace_sem);
@@ -185,9 +195,19 @@ static void sync_filesystems_ve(struct ve_struct *ve, int wait)   static int is_sb_ve_accessible(struct ve_struct *ve, struct super_block *sb)
  {
      struct mount *mnt;
-    struct mnt_namespace *mnt_ns = ve->ve_ns->mnt_ns;
+    struct mnt_namespace *mnt_ns;
+    struct nsproxy *ve_ns;
      int ret = 0;
+    rcu_read_lock();
+    ve_ns = rcu_dereference(ve->ve_ns);

Same thing here.

+    if (!ve->ve_ns) {
+        rcu_read_unlock();
+        return 0;
+    }
+    mnt_ns = ve->ve_ns->mnt_ns;
+    rcu_read_unlock();
+
      down_read(&namespace_sem);
      list_for_each_entry(mnt, &mnt_ns->list, mnt_list) {
          if (mnt->mnt.mnt_sb == sb) {



--
Best regards, Tikhomirov Pavel
Software Developer, Virtuozzo.
_______________________________________________
Devel mailing list
[email protected]
https://lists.openvz.org/mailman/listinfo/devel

Reply via email to