On Thu, May 05, 2016 at 01:18:43PM +0300, Kirill A. Shutemov wrote:
> On Thu, May 05, 2016 at 09:32:45AM +0800, kernel test robot wrote:
> > FYI, we noticed the following commit:
> > 
> > https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
> > commit 409ca714ac58768342cd39ca79c16f51e1825b3e ("mm, thp: avoid unnecessary swapin in khugepaged")
> > 
> > on test machine: vm-kbuild-1G: 2 threads qemu-system-x86_64 -enable-kvm -cpu Haswell,+smep,+smap with 1G memory
> > 
> > caused below changes:
> > 
> <trying to dedup strings, it's really annoying>
> 
> > 
> > [   21.116124] ======================================================
> > [   21.116124] [ INFO: possible circular locking dependency detected ]
> > [   21.116127] 4.6.0-rc5-00302-g409ca71 #1 Not tainted
> > [   21.116127] -------------------------------------------------------
> > [   21.116128] udevadm/221 is trying to acquire lock:
> > [   21.116138]  (&mm->mmap_sem){++++++}, at: [<ffffffff81262543>] __might_fault+0x83/0x150
> > [   21.116138] 
> > [   21.116138] but task is already holding lock:
> > [   21.116144]  (s_active#12){++++.+}, at: [<ffffffff813315ee>] kernfs_fop_write+0x8e/0x250
> > [   21.116144] 
> > [   21.116144] which lock already depends on the new lock.
> > [   21.116144] 
> > [   21.116145] the existing dependency chain (in reverse order) is:
> > [   21.116148] 
> > [   21.116148] -> #2 (s_active#12){++++.+}:
> > [   21.116152]        [<ffffffff8117da2c>] lock_acquire+0xac/0x180
> > [   21.116155]        [<ffffffff8132f50a>] __kernfs_remove+0x2da/0x410
> > [   21.116158]        [<ffffffff81330630>] kernfs_remove_by_name_ns+0x40/0x90
> > [   21.116160]        [<ffffffff813339fb>] sysfs_remove_file_ns+0x2b/0x70
> > [   21.116164]        [<ffffffff81ba8a16>] device_del+0x166/0x320
> > [   21.116166]        [<ffffffff81ba943c>] device_destroy+0x3c/0x50
> > [   21.116170]        [<ffffffff8105aa61>] cpuid_class_cpu_callback+0x51/0x70
> > [   21.116173]        [<ffffffff81131ce9>] notifier_call_chain+0x59/0x190
> > [   21.116177]        [<ffffffff81132749>] __raw_notifier_call_chain+0x9/0x10
> > [   21.116180]        [<ffffffff810fe6b0>] __cpu_notify+0x40/0x90
> > [   21.116182]        [<ffffffff810fe890>] cpu_notify_nofail+0x10/0x30
> > [   21.116185]        [<ffffffff810fe8d7>] notify_dead+0x27/0x1e0
> > [   21.116187]        [<ffffffff810fe273>] cpuhp_down_callbacks+0x93/0x190
> > [   21.116192]        [<ffffffff82096062>] _cpu_down+0xc2/0x1e0
> > [   21.116194]        [<ffffffff810ff727>] do_cpu_down+0x37/0x50
> > [   21.116197]        [<ffffffff8110003b>] cpu_down+0xb/0x10
> > [   21.116201]        [<ffffffff81038e4d>] _debug_hotplug_cpu+0x7d/0xd0
> > [   21.116205]        [<ffffffff8435d6bb>] debug_hotplug_cpu+0xd/0x11
> > [   21.116208]        [<ffffffff84352426>] do_one_initcall+0x138/0x1cf
> > [   21.116211]        [<ffffffff8435270a>] kernel_init_freeable+0x24d/0x2de
> > [   21.116214]        [<ffffffff8209533a>] kernel_init+0xa/0x120
> > [   21.116217]        [<ffffffff820a7972>] ret_from_fork+0x22/0x50
> > [   21.116221] 
> > [   21.116221] -> #1 (cpu_hotplug.lock#2){+.+.+.}:
> > [   21.116223]        [<ffffffff8117da2c>] lock_acquire+0xac/0x180
> > [   21.116226]        [<ffffffff820a20d1>] mutex_lock_nested+0x71/0x4c0
> > [   21.116228]        [<ffffffff810ff526>] get_online_cpus+0x66/0x80
> > [   21.116232]        [<ffffffff81246fb3>] sum_vm_event+0x23/0x1b0
> > [   21.116236]        [<ffffffff81293768>] collapse_huge_page+0x118/0x10b0
> > [   21.116238]        [<ffffffff81294c5d>] khugepaged+0x55d/0xe80
> > [   21.116240]        [<ffffffff81130304>] kthread+0x134/0x1a0
> > [   21.116242]        [<ffffffff820a7972>] ret_from_fork+0x22/0x50
> > [   21.116244] 
> > [   21.116244] -> #0 (&mm->mmap_sem){++++++}:
> > [   21.116246]        [<ffffffff8117bf61>] __lock_acquire+0x2861/0x31f0
> > [   21.116248]        [<ffffffff8117da2c>] lock_acquire+0xac/0x180
> > [   21.116251]        [<ffffffff8126257e>] __might_fault+0xbe/0x150
> > [   21.116253]        [<ffffffff8133160f>] kernfs_fop_write+0xaf/0x250
> > [   21.116256]        [<ffffffff812a8933>] __vfs_write+0x43/0x1a0
> > [   21.116258]        [<ffffffff812a8d3a>] vfs_write+0xda/0x240
> > [   21.116260]        [<ffffffff812a8f84>] SyS_write+0x44/0xa0
> > [   21.116263]        [<ffffffff820a773c>] entry_SYSCALL_64_fastpath+0x1f/0xbd
> > [   21.116264] 
> > [   21.116264] other info that might help us debug this:
> > [   21.116264] 
> > [   21.116268] Chain exists of:
> > [   21.116268]   &mm->mmap_sem --> cpu_hotplug.lock#2 --> s_active#12
> > [   21.116268] 
> > [   21.116268]  Possible unsafe locking scenario:
> > [   21.116268] 
> > [   21.116269]        CPU0                    CPU1
> > [   21.116269]        ----                    ----
> > [   21.116270]   lock(s_active#12);
> > [   21.116271]                                lock(cpu_hotplug.lock#2);
> > [   21.116272]                                lock(s_active#12);
> > [   21.116273]   lock(&mm->mmap_sem);
> > [   21.116274] 
> > [   21.116274]  *** DEADLOCK ***
> > [   21.116274] 
> > [   21.116274] 3 locks held by udevadm/221:
> > [   21.116278]  #0:  (sb_writers#3){.+.+.+}, at: [<ffffffff812ad64d>] __sb_start_write+0x6d/0x120
> > [   21.116280]  #1:  (&of->mutex){+.+.+.}, at: [<ffffffff813315e6>] kernfs_fop_write+0x86/0x250
> > [   21.116282]  #2:  (s_active#12){++++.+}, at: [<ffffffff813315ee>] kernfs_fop_write+0x8e/0x250
> > [   21.116283] 
> > [   21.116283] stack backtrace:
> > [   21.116284] CPU: 1 PID: 221 Comm: udevadm Not tainted 4.6.0-rc5-00302-g409ca71 #1
> > [   21.116287]  ffff88003f698000 ffff88003f077bf0 ffffffff81444ef3 0000000000000011
> > [   21.116288]  ffffffff84bdd8f0 ffffffff84bf2630 ffff88003f077c40 ffffffff81173e91
> > [   21.116290]  0000000000000000 ffffffff84fbdbc0 00ff88003f077c40 ffff88003f698bb8
> > [   21.116290] Call Trace:
> > [   21.116293]  [<ffffffff81444ef3>] dump_stack+0x86/0xd3
> > [   21.116294]  [<ffffffff81173e91>] print_circular_bug+0x221/0x360
> > [   21.116296]  [<ffffffff8117bf61>] __lock_acquire+0x2861/0x31f0
> > [   21.116297]  [<ffffffff8117da2c>] lock_acquire+0xac/0x180
> > [   21.116299]  [<ffffffff81262543>] ? __might_fault+0x83/0x150
> > [   21.116300]  [<ffffffff8126257e>] __might_fault+0xbe/0x150
> > [   21.116302]  [<ffffffff81262543>] ? __might_fault+0x83/0x150
> > [   21.116303]  [<ffffffff8133160f>] kernfs_fop_write+0xaf/0x250
> > [   21.116304]  [<ffffffff812a8933>] __vfs_write+0x43/0x1a0
> > [   21.116306]  [<ffffffff8116fe0d>] ? update_fast_ctr+0x1d/0x80
> > [   21.116308]  [<ffffffff8116ffe7>] ? percpu_down_read+0x57/0xa0
> > [   21.116310]  [<ffffffff812ad64d>] ? __sb_start_write+0x6d/0x120
> > [   21.116311]  [<ffffffff812ad64d>] ? __sb_start_write+0x6d/0x120
> > [   21.116312]  [<ffffffff812a8d3a>] vfs_write+0xda/0x240
> > [   21.116314]  [<ffffffff812a8f84>] SyS_write+0x44/0xa0
> > [   21.116315]  [<ffffffff820a773c>] entry_SYSCALL_64_fastpath+0x1f/0xbd
> 
> If I read this correctly (I'm not sure about this), we shouldn't call
> sum_vm_event() under mmap_sem.
> 
> BTW, we do need mmap_sem for swapin, but there's no need for an exclusive
> one. It can be too expensive to do I/O under down_write(mmap_sem).
> 
> Ebru, could you look at how to move sum_vm_event() outside mmap_sem and
> probably take down_read(mmap_sem) during swapin? I don't have time for
> this right now.

I started working on the down_read(mmap_sem) change a few days ago, but
I'll take care of this lockdep issue first and come back to down_read
afterwards.
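
As a rough illustration of the reordering (a single-threaded user-space
model, not kernel code; every model_* name, the flags, and the counter
values are made up for this sketch), the counters are summed before
mmap_sem is taken, so the mmap_sem --> cpu_hotplug.lock edge from the
report is never recorded:

```c
#include <stdbool.h>

/* Hypothetical model state: true while the modelled mmap_sem is held. */
static bool mmap_sem_held;

/*
 * Set if the modelled cpu_hotplug.lock is ever taken while mmap_sem is
 * held, i.e. the "mmap_sem --> cpu_hotplug.lock" edge from the report.
 */
static bool bad_edge_seen;

/* Made-up per-CPU event counters for two CPUs. */
static const unsigned long per_cpu_events[2] = { 4, 2 };

/*
 * Stand-in for sum_vm_event(): in the trace it calls get_online_cpus(),
 * which takes cpu_hotplug.lock, before summing the per-CPU counters.
 */
static unsigned long model_sum_vm_event(void)
{
	unsigned long sum = 0;
	int cpu;

	/* models get_online_cpus(): record the edge if mmap_sem is held */
	if (mmap_sem_held)
		bad_edge_seen = true;

	for (cpu = 0; cpu < 2; cpu++)
		sum += per_cpu_events[cpu];

	/* models put_online_cpus(): nothing to track in this sketch */
	return sum;
}

/* Stand-in for the collapse path with the snapshot taken first. */
static unsigned long model_collapse(void)
{
	/* Snapshot the counter while no mmap_sem is held. */
	unsigned long snapshot = model_sum_vm_event();

	mmap_sem_held = true;	/* models down_read(&mm->mmap_sem) */
	/* the swap-in decision would use 'snapshot' here */
	mmap_sem_held = false;	/* models up_read(&mm->mmap_sem) */

	return snapshot;
}
```

Calling model_collapse() returns the summed counters and leaves
bad_edge_seen false; calling model_sum_vm_event() between the two
mmap_sem_held assignments instead would set it, which is the ordering
lockdep complained about.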

Kind regards.
