Hello:

We are running a Ruby/Passenger app called Canvas (by Instructure).  It runs 
on 3 application servers that use shared storage (iSCSI, EqualLogic volume, 
LVM2, OCFS2) for user-uploaded files.  We have been seeing strange behavior 
where a ruby process and a jbd2 process randomly enter the 'D' 
(uninterruptible I/O wait) state during certain I/O operations on the shared 
storage, most commonly during a file upload.  I enabled some kernel lock 
debugging options on one of the application servers and captured the 
'possible circular locking dependency' report below during one of these 
events.  Would anyone be able to offer insight into how to troubleshoot this 
further?
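In case it helps anyone reproduce what we're seeing, this is roughly how we spot the hung tasks (illustrative commands only; PID 3710 is the ruby process from the trace below):

```shell
# List tasks currently in uninterruptible sleep ('D' state) --
# during the hangs, a ruby and a jbd2 process show up here.
ps -eo pid,stat,comm | awk '$2 ~ /^D/'

# For a stuck PID, dump its in-kernel stack (requires root):
#   cat /proc/3710/stack

# Or ask the kernel to log backtraces of ALL blocked tasks to dmesg:
#   echo w > /proc/sysrq-trigger
```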

This particular app server runs Gentoo Linux with hardened kernel sources 
v3.11.2; GrSecurity and PaX are not enabled.  The other app servers run 
kernel 3.2.30.  We were testing the newer kernel to see if it helped, which 
it does not appear to have.

Thanks for any help.

[  858.570825] ======================================================
[  858.570878] [ INFO: possible circular locking dependency detected ]
[  858.570932] 3.11.2-hardenedcanvas-2013-12-16 #4 Not tainted
[  858.570980] -------------------------------------------------------
[  858.571033] ruby/3710 is trying to acquire lock:
[  858.571073]  (&sc->sc_send_lock){+.+...}, at: [<ffffffff81355567>] 
o2net_send_message_vec+0x543/0x8dd
[  858.571165]
but task is already holding lock:
[  858.571213]  (jbd2_handle){+.+...}, at: [<ffffffff8123986e>] 
start_this_handle+0x4e8/0x57d
[  858.571294]
which lock already depends on the new lock.

[  858.571361]
the existing dependency chain (in reverse order) is:
[  858.571423]
-> #4 (jbd2_handle){+.+...}:
[  858.571464]        [<ffffffff810c761c>] lock_acquire+0xbf/0xfe
[  858.571539]        [<ffffffff812398e0>] start_this_handle+0x55a/0x57d
[  858.571615]        [<ffffffff81239b0c>] jbd2__journal_start+0x9f/0x15c
[  858.571691]        [<ffffffff8121feab>] __ext4_journal_start_sb+0x89/0xa6
[  858.571771]        [<ffffffff812005c1>] ext4_dirty_inode+0x25/0x54
[  858.571844]        [<ffffffff8117a338>] __mark_inode_dirty+0x54/0x210
[  858.571920]        [<ffffffff8116d0c7>] update_time+0x9c/0xa7
[  858.571989]        [<ffffffff8116d16e>] file_update_time+0x9c/0xbd
[  858.572060]        [<ffffffff8120077f>] ext4_page_mkwrite+0x50/0x2a1
[  858.572133]        [<ffffffff81130997>] __do_fault+0x125/0x2f5
[  858.572203]        [<ffffffff81132d9a>] handle_pte_fault+0x1d4/0x4f2
[  858.572277]        [<ffffffff81133459>] handle_mm_fault+0x1af/0x1c9
[  858.572349]        [<ffffffff81029a3c>] __do_page_fault+0x317/0x35e
[  858.572423]        [<ffffffff81029aab>] do_page_fault+0x9/0xb
[  858.572491]        [<ffffffff816e4e45>] page_fault+0x25/0x30
[  858.572560]
-> #3 (sb_pagefaults){.+.+..}:
[  858.572643]        [<ffffffff810c761c>] lock_acquire+0xbf/0xfe
[  858.572712]        [<ffffffff81159900>] __sb_start_write+0x111/0x161
[  858.572786]        [<ffffffff81200773>] ext4_page_mkwrite+0x44/0x2a1
[  858.572860]        [<ffffffff81130997>] __do_fault+0x125/0x2f5
[  858.572928]        [<ffffffff81132d9a>] handle_pte_fault+0x1d4/0x4f2
[  858.573002]        [<ffffffff81133459>] handle_mm_fault+0x1af/0x1c9
[  858.573075]        [<ffffffff81029a3c>] __do_page_fault+0x317/0x35e
[  858.573148]        [<ffffffff81029aab>] do_page_fault+0x9/0xb
[  858.573216]        [<ffffffff816e4e45>] page_fault+0x25/0x30
[  858.573284]
-> #2 (&mm->mmap_sem){++++++}:
[  858.573361]        [<ffffffff810c761c>] lock_acquire+0xbf/0xfe
[  858.573430]        [<ffffffff81130bc9>] might_fault+0x62/0x85
[  858.573498]        [<ffffffff8163e41b>] skb_do_copy_data_nocache+0x155/0x168
[  858.573578]        [<ffffffff8163f72a>] tcp_sendmsg+0x398/0x7e1
[  858.573648]        [<ffffffff8165ee8b>] inet_sendmsg+0x5e/0x67
[  858.573718]        [<ffffffff815f76d3>] sock_aio_write+0x101/0x111
[  858.573790]        [<ffffffff81157544>] do_sync_write+0x5d/0x89
[  858.573860]        [<ffffffff81158035>] vfs_write+0x105/0x14c
[  858.573929]        [<ffffffff8115832d>] SyS_write+0x57/0x83
[  858.573996]        [<ffffffff816e5424>] system_call_fastpath+0x1c/0x21
[  858.574072]
-> #1 (sk_lock-AF_INET){+.+.+.}:
[  858.574150]        [<ffffffff810c761c>] lock_acquire+0xbf/0xfe
[  858.574219]        [<ffffffff815fc22e>] lock_sock_nested+0x69/0x79
[  858.574291]        [<ffffffff8163d8bc>] lock_sock+0xb/0xd
[  858.574356]        [<ffffffff8163efe4>] tcp_sendpage+0x55/0x3da
[  858.574426]        [<ffffffff8165ef7a>] inet_sendpage+0x82/0x97
[  858.574496]        [<ffffffff81354b02>] o2net_sendpage+0xae/0x21b
[  858.574568]        [<ffffffff81356d71>] o2net_accept_many+0x383/0x40a
[  858.574642]        [<ffffffff81099dc6>] process_one_work+0x1e9/0x358
[  858.574716]        [<ffffffff8109a31f>] worker_thread+0x159/0x1ee
[  858.574787]        [<ffffffff810a02b1>] kthread+0xac/0xb4
[  858.574853]        [<ffffffff816e5372>] ret_from_fork+0x72/0xa0
[  858.574924]
-> #0 (&sc->sc_send_lock){+.+...}:
[  858.575003]        [<ffffffff810c6e3c>] __lock_acquire+0xb65/0xe65
[  858.575077]        [<ffffffff810c761c>] lock_acquire+0xbf/0xfe
[  858.575146]        [<ffffffff816e1732>] mutex_lock_nested+0x3f/0x353
[  858.575219]        [<ffffffff81355567>] o2net_send_message_vec+0x543/0x8dd
[  858.575296]        [<ffffffff81355925>] o2net_send_message+0x24/0x26
[  858.575369]        [<ffffffff81369390>] dlm_do_master_request+0xca/0x630
[  858.575446]        [<ffffffff8136c727>] dlm_get_lock_resource+0xa58/0xe1d
[  858.575523]        [<ffffffff813747d7>] dlmlock+0x865/0x11d3
[  858.575593]        [<ffffffff81349ad8>] o2cb_dlm_lock+0x59/0x69
[  858.575665]        [<ffffffff81349029>] ocfs2_dlm_lock+0x28/0x2a
[  858.575736]        [<ffffffff812ecef9>] ocfs2_lock_create+0xfa/0x270
[  858.575811]        [<ffffffff812ed68d>] ocfs2_create_new_lock+0x7a/0x89
[  858.575887]        [<ffffffff812ed81d>] 
ocfs2_create_new_inode_locks+0x181/0x2ae
[  858.575988]        [<ffffffff813068fc>] 
__ocfs2_mknod_locked.isra.21+0x3b1/0x4d4
[  858.576088]        [<ffffffff81306b39>] 
ocfs2_mknod_locked.isra.22+0x11a/0x12d
[  858.576186]        [<ffffffff813072d2>] ocfs2_mknod+0x786/0xe09
[  858.576257]        [<ffffffff813079d2>] ocfs2_mkdir+0x7d/0x11f
[  858.576326]        [<ffffffff81162ba7>] vfs_mkdir+0x5d/0x9e
[  858.576393]        [<ffffffff811650f9>] SyS_mkdirat+0x8f/0xe2
[  858.576462]        [<ffffffff81165160>] SyS_mkdir+0x14/0x16
[  858.576528]        [<ffffffff816e5424>] system_call_fastpath+0x1c/0x21
[  858.576604]
other info that might help us debug this:

[  858.576721] Chain exists of:
  &sc->sc_send_lock --> sb_pagefaults --> jbd2_handle

[  858.576847]  Possible unsafe locking scenario:

[  858.576931]        CPU0                    CPU1
[  858.576986]        ----                    ----
[  858.577042]   lock(jbd2_handle);
[  858.577090]                                lock(sb_pagefaults);
[  858.577161]                                lock(jbd2_handle);
[  858.577229]   lock(&sc->sc_send_lock);
[  858.577282]
*** DEADLOCK ***

[  858.577383] 11 locks held by ruby/3710:
[  858.577432]  #0:  (sb_writers#11){.+.+.+}, at: [<ffffffff811714f9>] 
mnt_want_write+0x1f/0x46
[  858.577551]  #1:  (&type->i_mutex_dir_key#3/1){+.+.+.}, at: 
[<ffffffff8116279e>] kern_path_create+0x7f/0x14a
[  858.577681]  #2:  (Meta){++++++}, at: [<ffffffff81306c68>] 
ocfs2_mknod+0x11c/0xe09
[  858.577789]  #3:  
(&ocfs2_sysfile_lock_key[args->fi_sysfile_type]#3){+.+.+.}, at: 
[<ffffffff8131e853>] ocfs2_reserve_suballoc_bits+0xc3/0x12ce
[  858.577942]  #4:  (inode_alloc:%04d){+.+.+.}, at: [<ffffffff8131e869>] 
ocfs2_reserve_suballoc_bits+0xd9/0x12ce
[  858.578071]  #5:  
(&ocfs2_sysfile_lock_key[args->fi_sysfile_type]#4){+.+.+.}, at: 
[<ffffffff8131e853>] ocfs2_reserve_suballoc_bits+0xc3/0x12ce
[  858.578225]  #6:  (extent_alloc:%04d){+.+.+.}, at: [<ffffffff8131e869>] 
ocfs2_reserve_suballoc_bits+0xd9/0x12ce
[  858.578354]  #7:  
(&ocfs2_sysfile_lock_key[args->fi_sysfile_type]#5){+.+...}, at: 
[<ffffffff8130162c>] ocfs2_reserve_local_alloc_bits+0xad/0xc93
[  858.578509]  #8:  (sb_internal#2){.+.+..}, at: [<ffffffff812fcb30>] 
ocfs2_start_trans+0xa7/0x1be
[  858.578632]  #9:  (&journal->j_trans_barrier){.+.+..}, at: 
[<ffffffff812fcb40>] ocfs2_start_trans+0xb7/0x1be
[  858.578760]  #10:  (jbd2_handle){+.+...}, at: [<ffffffff8123986e>] 
start_this_handle+0x4e8/0x57d
[  858.578880]
stack backtrace:
[  858.578952] CPU: 3 PID: 3710 Comm: ruby Not tainted 
3.11.2-hardenedcanvas-2013-12-16 #4
[  858.579054] Hardware name: VMware, Inc. VMware Virtual Platform/440BX 
Desktop Reference Platform, BIOS 6.00 07/09/2012
[  858.579178]  ffffffff85305fa0 ffff880516e27348 ffffffff816dd690 
0000000000000006
[  858.579283]  ffffffff8531ace0 ffff880516e27398 ffffffff816d9328 
0000000000000002
[  858.579387]  ffffffff856d9850 ffff8804ffa8a190 ffff8804ffa89600 
ffff8804ffa8a190
[  858.579492] Call Trace:
[  858.580924]  [<ffffffff816dd690>] dump_stack+0x46/0x58
[  858.580994]  [<ffffffff816d9328>] print_circular_bug+0x1f8/0x209
[  858.581071]  [<ffffffff810c6e3c>] __lock_acquire+0xb65/0xe65
[  858.581145]  [<ffffffff810c60e4>] ? mark_lock+0x2d/0x220
[  858.581215]  [<ffffffff81355567>] ? o2net_send_message_vec+0x543/0x8dd
[  858.581296]  [<ffffffff810c761c>] lock_acquire+0xbf/0xfe
[  858.581366]  [<ffffffff81355567>] ? o2net_send_message_vec+0x543/0x8dd
[  858.581447]  [<ffffffff810c7c21>] ? debug_check_no_locks_freed+0x132/0x147
[  858.581531]  [<ffffffff81355567>] ? o2net_send_message_vec+0x543/0x8dd
[  858.581612]  [<ffffffff816e1732>] mutex_lock_nested+0x3f/0x353
[  858.581687]  [<ffffffff81355567>] ? o2net_send_message_vec+0x543/0x8dd
[  858.581770]  [<ffffffff810c45bd>] ? lockdep_init_map+0xc2/0x147
[  858.581848]  [<ffffffff81009715>] ? paravirt_read_tsc+0x9/0xd
[  858.581922]  [<ffffffff810097bc>] ? read_tsc+0x9/0x19
[  858.581991]  [<ffffffff810bac88>] ? timekeeping_get_ns.constprop.8+0x13/0x3a
[  858.582077]  [<ffffffff81355567>] o2net_send_message_vec+0x543/0x8dd
[  858.582163]  [<ffffffff810c60e4>] ? mark_lock+0x2d/0x220
[  858.582233]  [<ffffffff810c792d>] ? mark_held_locks+0x6d/0x95
[  858.582309]  [<ffffffff810f2470>] ? rcu_irq_exit+0x7c/0x90
[  858.582381]  [<ffffffff81355925>] o2net_send_message+0x24/0x26
[  858.582457]  [<ffffffff81369390>] dlm_do_master_request+0xca/0x630
[  858.582536]  [<ffffffff81009a93>] ? sched_clock+0x9/0xd
[  858.582606]  [<ffffffff8136c727>] dlm_get_lock_resource+0xa58/0xe1d
[  858.582684]  [<ffffffff810c7539>] ? lock_release+0x1ad/0x1d1
[  858.582758]  [<ffffffff813d0718>] ? do_raw_spin_unlock+0x8f/0x98
[  858.582835]  [<ffffffff81373ee0>] ? dlm_new_lock+0xbc/0x14e
[  858.582908]  [<ffffffff813495fc>] ? ocfs2_cluster_connect_agnostic+0x3b/0x3b
[  858.582993]  [<ffffffff813747d7>] dlmlock+0x865/0x11d3
[  858.583062]  [<ffffffff8134960d>] ? o2dlm_lock_ast_wrapper+0x11/0x11
[  858.583142]  [<ffffffff810c60e4>] ? mark_lock+0x2d/0x220
[  858.583211]  [<ffffffff81349ad8>] o2cb_dlm_lock+0x59/0x69
[  858.583282]  [<ffffffff813495fc>] ? ocfs2_cluster_connect_agnostic+0x3b/0x3b
[  858.583369]  [<ffffffff8134960d>] ? o2dlm_lock_ast_wrapper+0x11/0x11
[  858.583448]  [<ffffffff810c7aa9>] ? trace_hardirqs_on_caller+0x154/0x18b
[  858.583531]  [<ffffffff81349029>] ocfs2_dlm_lock+0x28/0x2a
[  858.583603]  [<ffffffff812ecef9>] ocfs2_lock_create+0xfa/0x270
[  858.583679]  [<ffffffff816e47be>] ? _raw_spin_unlock_irqrestore+0x40/0x4d
[  858.583763]  [<ffffffff810c7aa9>] ? trace_hardirqs_on_caller+0x154/0x18b
[  858.583846]  [<ffffffff812ed68d>] ocfs2_create_new_lock+0x7a/0x89
[  858.583924]  [<ffffffff812ed81d>] ocfs2_create_new_inode_locks+0x181/0x2ae
[  858.584010]  [<ffffffff813068fc>] __ocfs2_mknod_locked.isra.21+0x3b1/0x4d4
[  858.584095]  [<ffffffff81306b39>] ocfs2_mknod_locked.isra.22+0x11a/0x12d
[  858.584178]  [<ffffffff813072d2>] ocfs2_mknod+0x786/0xe09
[  858.584249]  [<ffffffff813079d2>] ocfs2_mkdir+0x7d/0x11f
[  858.584319]  [<ffffffff81162ba7>] vfs_mkdir+0x5d/0x9e
[  858.584386]  [<ffffffff811650f9>] SyS_mkdirat+0x8f/0xe2
[  858.584456]  [<ffffffff81165160>] SyS_mkdir+0x14/0x16
[  858.584524]  [<ffffffff816e5424>] system_call_fastpath+0x1c/0x21
_______________________________________________
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-users