Andrew's question makes me think if this fix is superficial. Actually i have had same question. But when i saw a smilar patch in kernel-4.4 was already merged in 2012, i decided to submit this patch first.
Here is the call stack i got: [ 0.003450] [<ffffff8bef2a0190>] proc_mount+0x2c/0x98 [ 0.003459] [<ffffff8bef22e560>] mount_fs+0x164/0x190 [ 0.003465] [<ffffff8bef24c138>] vfs_kern_mount+0x74/0x168 [ 0.003469] [<ffffff8bef24c244>] kern_mount_data+0x18/0x30 [ 0.003474] [<ffffff8bef2a0258>] pid_ns_prepare_proc+0x24/0x40 [ 0.003484] [<ffffff8bef0cd5ec>] alloc_pid+0x498/0x4b4 [ 0.003492] [<ffffff8bef0a9b94>] copy_process.isra.73.part.74+0xed0/0x1708 [ 0.003496] [<ffffff8bef0aa560>] _do_fork+0xdc/0x3f8 [ 0.003501] [<ffffff8bef0aa8c8>] kernel_thread+0x34/0x3c [ 0.003511] [<ffffff8bf00cd498>] rest_init+0x20/0x80 [ 0.003522] [<ffffff8bf0c00c7c>] start_kernel+0x3e4/0x43c [ 0.003527] [<ffffff8bf0c001e8>] __primary_switched+0x64/0x90 I notice only proc filesystem has function "pid_ns_prepare_proc". there is no other "pid_ns_prepare_xxx" function in other filesystem. Take the position of proc filesystem of kernel into consideration, the answer of question "Other filesystems parse the options from fill_super(). Is proc special in some fashion" could be "Yes, it is. Because proc filesystem is special indeed. It's a filesystem kernel will mount when it's booting". But is it enough? Is anyone responsible for deinitialize sb->sroot? Well actually i'm not an expert of filesystem, and don't unserstand what does sb->s_root represent for. But i'm sure no one call "pid_ns_release_proc" in the runtime(by add some logs). And even it is called, it doesn't clean sb->s_root. Until now, i didn't see any deeper issue. Maybe it's true that we should handle proc filesystem specially. If anyone who is sure about the functionality of sb->s_root and think it should be handled in another way, feel free to correct me. 程洋 <chengy...@xiaomi.com> 于2018年11月30日周五 上午10:34写道: >> Here is an article illustrates the details. > https://medium.com/@topjohnwu/from-anime-game-to-android-system-security-vulnerability-9b955a182f20 > > And There is a similar fix on kernel-4.4: > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=99663be772c827b8f5f594fe87eb4807be1994e5 > > Q: Other filesystems parse the options from fill_super(). Is proc special in > some fashion? > A: According to my research, start_kernel will call proc_mount first, and > initialize sb->s_root before any userspace process runs. If others want to > mount it, all options will be ignored. > AOSP change here: > https://android-review.googlesource.com/c/platform/system/core/+/181345/4/init/init.cpp > At first I though we should mount it with MS_REMOUNT flag. But kernel > will crash if we did this. > > Q: Why is this considered to be security sensitive? I can guess, but I'd > like to know your reasoning. > A: See the article above. It's part of Android sanbox. > > > > [PATCH] Security: Handle hidepid option correctly > > Why is this considered to be security sensitive? I can guess, but I'd like > to know your reasoning. > > On Thu, 29 Nov 2018 19:08:21 +0800 mailto:d17103...@gmail.com wrote: > > > From: Cheng Yang <mailto:chengy...@xiaomi.com> > > > > The proc_parse_options() call from proc_mount() runs only once at boot > > time. So on any later mount attempt, any mount options are ignored > > because ->s_root is already initialized. > > As a consequence, "mount -o <options>" will ignore the options. The > > only way to change mount options is "mount -o remount,<options>". > > To fix this, parse the mount options unconditionally. > > > > --- a/fs/proc/inode.c > > +++ b/fs/proc/inode.c > > @@ -493,13 +493,9 @@ struct inode *proc_get_inode(struct super_block > > *sb, struct proc_dir_entry *de) > > > > int proc_fill_super(struct super_block *s, void *data, int silent) { > > -struct pid_namespace *ns = get_pid_ns(s->s_fs_info); > > struct inode *root_inode; > > int ret; > > > > -if (!proc_parse_options(data, ns)) > > -return -EINVAL; > > - > > /* User space would break if executables or devices appear on proc */ > > s->s_iflags |= SB_I_USERNS_VISIBLE | SB_I_NOEXEC | SB_I_NODEV; > > s->s_flags |= SB_NODIRATIME | SB_NOSUID | SB_NOEXEC; diff --git > > a/fs/proc/root.c b/fs/proc/root.c index f4b1a9d..f5f3bf3 100644 > > --- a/fs/proc/root.c > > +++ b/fs/proc/root.c > > @@ -98,6 +98,9 @@ static struct dentry *proc_mount(struct file_system_type > > *fs_type, > > ns = task_active_pid_ns(current); > > } > > > > +if (!proc_parse_options(data, ns)) > > +return ERR_PTR(-EINVAL); > > + > > return mount_ns(fs_type, flags, data, ns, ns->user_ns, > > proc_fill_super); } > > Other filesystems parse the options from fill_super(). Is proc special in > some fashion? > > #/******本邮件及其附件含有小米公司的保密信息,仅限于发送给上面地址中列出的个人或群组。禁止任何其他人以任何形式使用(包括但不限于全部或部分地泄露、复制、或散发)本邮件中的信息。如果您错收了本邮件,请您立即电话或邮件通知发件人并删除本邮件! > This e-mail and its attachments contain confidential information from > XIAOMI, which is intended only for the person or entity whose address is > listed above. Any use of the information contained herein in any way > (including, but not limited to, total or partial disclosure, reproduction, or > dissemination) by persons other than the intended recipient(s) is prohibited. > If you receive this e-mail in error, please notify the sender by phone or > email immediately and delete it!******/#