Hi Jim,

From the log you provided, it seems that one node died.
If I remember correctly, you are using kernel-4.9, which has a bug that
causes the cluster to hang if a node dies.

You can refer to the fix in the mainline kernel:

commit 1c01967116a678fed8e2c68a6ab82abc8effeddc
Author: Changwei Ge <ge.chang...@h3c.com>
Date:   Wed Nov 15 17:31:33 2017 -0800

     ocfs2: fix cluster hang after a node dies

     When a node dies, other live nodes have to choose a new master for an
     existed lock resource mastered by the dead node.

     As for ocfs2/dlm implementation, this is done by function -
     dlm_move_lockres_to_recovery_list which marks those lock rsources as
     DLM_LOCK_RES_RECOVERING and manages them via a list from which DLM
     changes lock resource's master later.

     So without invoking dlm_move_lockres_to_recovery_list, no master will be
     choosed after dlm recovery accomplishment since no lock resource can be
     found through ::resource list.

     What's worse is that if DLM_LOCK_RES_RECOVERING is not marked for lock
     resources mastered a dead node, it will break up synchronization among
     nodes.

     So invoke dlm_move_lockres_to_recovery_list again.

     Fixs: 'commit ee8f7fcbe638 ("ocfs2/dlm: continue to purge recovery lockres when recovery master goes down")'
     Link: http://lkml.kernel.org/r/63ADC13FD55D6546B7DECE290D39E373CED6E0F9@H3CMLB14-EX.srv.huawei-3com.com
     Signed-off-by: Changwei Ge <ge.chang...@h3c.com>
     Reported-by: Vitaly Mayatskih <v.mayats...@gmail.com>
     Tested-by: Vitaly Mayatskikh <v.mayats...@gmail.com>
     Cc: Mark Fasheh <mfas...@versity.com>
     Cc: Joel Becker <jl...@evilplan.org>
     Cc: Junxiao Bi <junxiao...@oracle.com>
     Cc: Joseph Qi <jiangqi...@gmail.com>
     Cc: <sta...@vger.kernel.org>
     Signed-off-by: Andrew Morton <a...@linux-foundation.org>
     Signed-off-by: Linus Torvalds <torva...@linux-foundation.org>

diff --git a/fs/ocfs2/dlm/dlmrecovery.c b/fs/ocfs2/dlm/dlmrecovery.c
index 74407c6..ec8f758 100644
--- a/fs/ocfs2/dlm/dlmrecovery.c
+++ b/fs/ocfs2/dlm/dlmrecovery.c
@@ -2419,6 +2419,7 @@ static void dlm_do_local_recovery_cleanup(struct dlm_ctxt *dlm, u8 dead_node)
                                         dlm_lockres_put(res);
                                         continue;
                                 }
+                               dlm_move_lockres_to_recovery_list(dlm, res);
                         } else if (res->owner == dlm->node_num) {
                                 dlm_free_dead_locks(dlm, res, dead_node);
                                 __dlm_lockres_calc_usage(dlm, res);
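
To make the commit message more concrete, here is a toy Python model of the
recovery step. It is only a sketch: LockRes, local_recovery_cleanup,
finish_recovery and simulate are names made up for illustration and are not
the ocfs2/dlm code. Only the idea (flag the resource as recovering and queue
it so that it can be remastered later) comes from the commit message.

```python
# Toy model of the recovery step described in the commit message.
# All names here are illustrative; this is not the ocfs2/dlm code.
from dataclasses import dataclass

RECOVERING = 0x1  # stands in for DLM_LOCK_RES_RECOVERING


@dataclass
class LockRes:
    name: str
    owner: int
    state: int = 0


def local_recovery_cleanup(resources, dead_node, recovery_list,
                           move_to_recovery=True):
    """Handle a node death on one surviving node."""
    for res in resources:
        if res.owner == dead_node and move_to_recovery:
            # Same idea as dlm_move_lockres_to_recovery_list():
            # flag the resource and queue it for remastering.
            res.state |= RECOVERING
            recovery_list.append(res)


def finish_recovery(recovery_list, new_master):
    """Remaster only the resources that were queued for recovery."""
    for res in recovery_list:
        res.owner = new_master
        res.state &= ~RECOVERING
    recovery_list.clear()


def simulate(move_to_recovery):
    """Node 1 dies; node 2 takes over whatever was queued for recovery."""
    resources = [LockRes("A", owner=1), LockRes("B", owner=2)]
    recovery_list = []
    local_recovery_cleanup(resources, dead_node=1,
                           recovery_list=recovery_list,
                           move_to_recovery=move_to_recovery)
    finish_recovery(recovery_list, new_master=2)
    return {res.name: res.owner for res in resources}
```

With the move in place, simulate(True) hands resource A to node 2. Without it,
simulate(False) leaves A owned by the dead node 1, so anyone waiting on A
waits forever, which is the hang in miniature.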



On 2018/1/6 6:31, Jim Okken wrote:
> hi again list,
> 
> we saw a very similar issue again today with access to the ocfs2 cluster. 
> please share any insight you might have on what might have happened
> (the cluster is 13 nodes large; cluster.conf is at the end of my email).
> 
> This time I found this in /var/log/messages on node-103, the only node that 
> was heavily accessing the cluster overnight, it is from 4:40. I don't know 
> how to read these traces. Is it related to ocfs2? I see it mentioned in the 
> CPU 12 trace...
> 
> 2018-01-05T04:40:53.555125+00:00 node-103 kernel: [632449.967312] Modules 
> linked in: nf_conntrack_netlink xt_set ip_set_hash_net ip_set nfnetlink 
> vhost_net vhost macvtap macvlan veth ip6table_raw xt_mac xt_tcpudp xt_physdev 
> br_netfilter ebtable_filter ebtables openvswitch ocfs2 quota_tree ocfs2_dlmfs 
> ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs 
> ip6table_filter ip6_tables xt_multiport xt_conntrack iptable_filter 
> xt_comment xt_CT iptable_raw ip_tables x_tables xfs bridge 8021q garp mrp stp 
> llc intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp 
> crct10dif_pclmul kvm_intel ipmi_ssif crc32_pclmul kvm ghash_clmulni_intel 
> aesni_intel aes_x86_64 joydev hpilo input_leds lrw gf128mul irqbypass 
> glue_helper ablk_helper cryptd ioatdma 8250_fintek sb_edac shpchp serio_raw 
> ipmi_si edac_core acpi_power_meter ipmi_msghandler lpc_ich dca mac_hid 
> ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp 
> libiscsi_tcp libiscsi scsi_transport_iscsi 
> nf_conntrack_proto_gre nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack_ipv4 
> nf_defrag_ipv4 nf_conntrack autofs4 btrfs raid10 raid456 async_raid6_recov 
> async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 
> multipath linear dm_round_robin ses enclosure scsi_transport_sas uas 
> usb_storage hid_generic usbhid hid psmouse lpfc be2net vxlan ip6_udp_tunnel 
> scsi_transport_fc udp_tunnel wmi fjes scsi_dh_emc scsi_dh_rdac scsi_dh_alua 
> dm_multipath
> 2018-01-05T04:40:53.555140+00:00 node-103 kernel: [632449.969786] CPU: 4 PID: 
> 28 Comm: migration/4 Not tainted 4.4.0-98-generic #121-Ubuntu
> 2018-01-05T04:40:53.555143+00:00 node-103 kernel: [632449.969916] Hardware 
> name: HP ProLiant BL460c Gen9, BIOS I36 02/17/2017
> 2018-01-05T04:40:53.555145+00:00 node-103 kernel: [632449.970049] task: 
> ffff881038ab7000 ti: ffff881038b2c000 task.ti: ffff881038b2c000
> 2018-01-05T04:40:53.555146+00:00 node-103 kernel: [632449.970050] RIP: 
> 0010:[<ffffffff8112161c>]  [<ffffffff8112161c>] multi_cpu_stop+0x4c/0xe0
> 2018-01-05T04:40:53.555147+00:00 node-103 kernel: [632449.970320] RSP: 
> 0018:ffff881038b2fd98  EFLAGS: 00000246
> 2018-01-05T04:40:53.555149+00:00 node-103 kernel: [632449.970321] RAX: 
> ffffffff81a12200 RBX: 0000000000000001 RCX: 0000000000000000
> 2018-01-05T04:40:53.555171+00:00 node-103 kernel: [632449.970323] RDX: 
> 0000000000000001 RSI: 0000000000000286 RDI: ffff882036b2b6b0
> 2018-01-05T04:40:53.555175+00:00 node-103 kernel: [632449.970324] RBP: 
> ffff881038b2fdc0 R08: ffff881038b2c000 R09: 0000000000000000
> 2018-01-05T04:40:53.555177+00:00 node-103 kernel: [632449.970325] R10: 
> 0000000000000008 R11: ffff88102d2a1c00 R12: ffff882036b2b6b0
> 2018-01-05T04:40:53.555178+00:00 node-103 kernel: [632449.970327] R13: 
> 0000000000000286 R14: ffff882036b2b6d4 R15: ffff882036b2b600
> 2018-01-05T04:40:53.555180+00:00 node-103 kernel: [632449.970465] FS:  
> 0000000000000000(0000) GS:ffff88103f900000(0000) knlGS:0000000000000000
> 2018-01-05T04:40:53.555181+00:00 node-103 kernel: [632449.970467] CS:  0010 
> DS: 0000 ES: 0000 CR0: 0000000080050033
> 2018-01-05T04:40:53.555183+00:00 node-103 kernel: [632449.970604] CR2: 
> 00007f4d6a61c4f0 CR3: 0000000001e0a000 CR4: 00000000001426e0
> 2018-01-05T04:40:53.555185+00:00 node-103 kernel: [632449.970605] Stack:
> 2018-01-05T04:40:53.555187+00:00 node-103 kernel: [632449.970736]  
> ffff88103f90f368 ffff88103f90f360 ffffffff811215d0 ffff882036b2b6b0
> 2018-01-05T04:40:53.555189+00:00 node-103 kernel: [632449.970738]  
> ffff882036b2b6d8 ffff881038b2fe88 ffffffff81121900 ffff88103f90f370
> 2018-01-05T04:40:53.555191+00:00 node-103 kernel: [632449.970876]  
> ffff881038ab7000 ffff88103f916e00 ffff881038b2fe20 ffffffff810a9d6e
> 2018-01-05T04:40:53.555192+00:00 node-103 kernel: [632449.970878] Call Trace:
> 2018-01-05T04:40:53.555194+00:00 node-103 kernel: [632449.970881]  
> [<ffffffff811215d0>] ? cpu_stop_queue_work+0x80/0x80
> 2018-01-05T04:40:53.555196+00:00 node-103 kernel: [632449.970883]  
> [<ffffffff81121900>] cpu_stopper_thread+0xb0/0x140
> 2018-01-05T04:40:53.555198+00:00 node-103 kernel: [632449.970886]  
> [<ffffffff810a9d6e>] ? finish_task_switch+0x17e/0x220
> 2018-01-05T04:40:53.555200+00:00 node-103 kernel: [632449.971019]  
> [<ffffffff8183fed6>] ? __schedule+0x3b6/0xa30
> 2018-01-05T04:40:53.555202+00:00 node-103 kernel: [632449.971023]  
> [<ffffffff810a3f20>] ? sort_range+0x30/0x30
> 2018-01-05T04:40:53.555203+00:00 node-103 kernel: [632449.971156]  
> [<ffffffff810a4025>] smpboot_thread_fn+0x105/0x160
> 2018-01-05T04:40:53.555206+00:00 node-103 kernel: [632449.971158]  
> [<ffffffff810a0c75>] kthread+0xe5/0x100
> 2018-01-05T04:40:53.555208+00:00 node-103 kernel: [632449.971159]  
> [<ffffffff810a0b90>] ? kthread_create_on_node+0x1e0/0x1e0
> 2018-01-05T04:40:53.555209+00:00 node-103 kernel: [632449.971162]  
> [<ffffffff81844a4f>] ret_from_fork+0x3f/0x70
> 2018-01-05T04:40:53.555211+00:00 node-103 kernel: [632449.971295]  
> [<ffffffff810a0b90>] ? kthread_create_on_node+0x1e0/0x1e0
> 2018-01-05T04:40:53.555212+00:00 node-103 kernel: [632449.971296] Code: 00 00 
> 49 89 c5 48 8b 47 18 48 85 c0 0f 84 86 00 00 00 89 db 48 0f a3 18 19 db 85 db 
> 41 0f 95 c7 4d 8d 74 24 24 31 c9 31 d2 f3 90 <41> 8b 5c 24 20 39 da 74 1a 83 
> fb 02 74 49 83 fb 03 75 05 45 84
> 2018-01-05T04:40:53.658730+00:00 node-103 kernel: [632450.074720] Modules 
> linked in: nf_conntrack_netlink xt_set ip_set_hash_net ip_set nfnetlink 
> vhost_net vhost macvtap macvlan veth ip6table_raw xt_mac xt_tcpudp xt_physdev 
> br_netfilter ebtable_filter ebtables openvswitch ocfs2 quota_tree ocfs2_dlmfs 
> ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs 
> ip6table_filter ip6_tables xt_multiport xt_conntrack iptable_filter 
> xt_comment xt_CT iptable_raw ip_tables x_tables xfs bridge 8021q garp mrp stp 
> llc intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp 
> crct10dif_pclmul kvm_intel ipmi_ssif crc32_pclmul kvm ghash_clmulni_intel 
> aesni_intel aes_x86_64 joydev hpilo input_leds lrw gf128mul irqbypass 
> glue_helper ablk_helper cryptd ioatdma 8250_fintek sb_edac shpchp serio_raw 
> ipmi_si edac_core acpi_power_meter ipmi_msghandler lpc_ich dca mac_hid 
> ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp 
> libiscsi_tcp libiscsi scsi_transport_iscsi 
> nf_conntrack_proto_gre nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack_ipv4 
> nf_defrag_ipv4 nf_conntrack autofs4 btrfs raid10 raid456 async_raid6_recov 
> async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 
> multipath linear dm_round_robin ses enclosure scsi_transport_sas uas 
> usb_storage hid_generic usbhid hid psmouse lpfc be2net vxlan ip6_udp_tunnel 
> scsi_transport_fc udp_tunnel wmi fjes scsi_dh_emc scsi_dh_rdac scsi_dh_alua 
> dm_multipath
> 2018-01-05T04:40:53.658731+00:00 node-103 kernel: [632450.074776] CPU: 12 
> PID: 25399 Comm: qemu-system-x86 Tainted: G             L  4.4.0-98-generic 
> #121-Ubuntu
> 2018-01-05T04:40:53.658732+00:00 node-103 kernel: [632450.074777] Hardware 
> name: HP ProLiant BL460c Gen9, BIOS I36 02/17/2017
> 2018-01-05T04:40:53.658733+00:00 node-103 kernel: [632450.074778] task: 
> ffff8820376d8000 ti: ffff880073f40000 task.ti: ffff880073f40000
> 2018-01-05T04:40:53.658748+00:00 node-103 kernel: [632450.074779] RIP: 
> 0010:[<ffffffff810cb27c>]  [<ffffffff810cb27c>] 
> native_queued_spin_lock_slowpath+0x15c/0x170
> 2018-01-05T04:40:53.658750+00:00 node-103 kernel: [632450.074785] RSP: 
> 0018:ffff88203f083c30  EFLAGS: 00000202
> 2018-01-05T04:40:53.658750+00:00 node-103 kernel: [632450.074786] RAX: 
> 0000000000000101 RBX: ffff88201566ba30 RCX: 0000000000000001
> 2018-01-05T04:40:53.658763+00:00 node-103 kernel: [632450.074787] RDX: 
> 0000000000000101 RSI: 0000000000000001 RDI: ffff88201566ba2c
> 2018-01-05T04:40:53.658764+00:00 node-103 kernel: [632450.074788] RBP: 
> ffff88203f083c30 R08: 0000000000000101 R09: ffffffff811924a7
> 2018-01-05T04:40:53.658765+00:00 node-103 kernel: [632450.074788] R10: 
> ffffea0080cff900 R11: 0000000000005600 R12: ffff88201566ba2c
> 2018-01-05T04:40:53.658765+00:00 node-103 kernel: [632450.074789] R13: 
> 0000000000005600 R14: 0000000000a34000 R15: 0000000000005600
> 2018-01-05T04:40:53.658766+00:00 node-103 kernel: [632450.074791] FS:  
> 00007fa12aa41c00(0000) GS:ffff88203f080000(0000) knlGS:0000000000000000
> 2018-01-05T04:40:53.658766+00:00 node-103 kernel: [632450.074792] CS:  0010 
> DS: 0000 ES: 0000 CR0: 0000000080050033
> 2018-01-05T04:40:53.658767+00:00 node-103 kernel: [632450.074792] CR2: 
> 00007f5bc811f000 CR3: 000000203449b000 CR4: 00000000001426e0
> 2018-01-05T04:40:53.658768+00:00 node-103 kernel: [632450.074793] Stack:
> 2018-01-05T04:40:53.658768+00:00 node-103 kernel: [632450.074794]  
> ffff88203f083c40 ffffffff81844421 ffff88203f083c60 ffffffff81842535
> 2018-01-05T04:40:53.658769+00:00 node-103 kernel: [632450.074796]  
> ffff880fea63a000 ffff88201566baf0 ffff88203f083c70 ffffffff8184257b
> 2018-01-05T04:40:53.658770+00:00 node-103 kernel: [632450.074797]  
> ffff88203f083ca0 ffffffffc08a258d ffff881f48984100 0000000000005600
> 2018-01-05T04:40:53.658770+00:00 node-103 kernel: [632450.074799] Call Trace:
> 2018-01-05T04:40:53.658771+00:00 node-103 kernel: [632450.074800]  <IRQ>
> 2018-01-05T04:40:53.658771+00:00 node-103 kernel: [632450.074806]  
> [<ffffffff81844421>] _raw_spin_lock+0x21/0x30
> 2018-01-05T04:40:53.658772+00:00 node-103 kernel: [632450.074808]  
> [<ffffffff81842535>] __mutex_unlock_slowpath+0x25/0x50
> 2018-01-05T04:40:53.658773+00:00 node-103 kernel: [632450.074810]  
> [<ffffffff8184257b>] mutex_unlock+0x1b/0x20
> 2018-01-05T04:40:53.658773+00:00 node-103 kernel: [632450.074845]  
> [<ffffffffc08a258d>] ocfs2_dio_end_io+0x6d/0x80 [ocfs2]
> 2018-01-05T04:40:53.658774+00:00 node-103 kernel: [632450.074849]  
> [<ffffffff8124e57c>] dio_complete+0x11c/0x1c0
> 2018-01-05T04:40:53.658774+00:00 node-103 kernel: [632450.074850]  
> [<ffffffff8124e693>] dio_bio_end_aio+0x73/0x100
> 2018-01-05T04:40:53.658775+00:00 node-103 kernel: [632450.074853]  
> [<ffffffff813c3edf>] bio_endio+0x3f/0x60
> 2018-01-05T04:40:53.658776+00:00 node-103 kernel: [632450.074856]  
> [<ffffffff813cb897>] blk_update_request+0x87/0x310
> 2018-01-05T04:40:53.658776+00:00 node-103 kernel: [632450.074859]  
> [<ffffffff816bbd66>] end_clone_bio+0x46/0x70
> 2018-01-05T04:40:53.658777+00:00 node-103 kernel: [632450.074861]  
> [<ffffffff813c3edf>] bio_endio+0x3f/0x60
> 2018-01-05T04:40:53.658778+00:00 node-103 kernel: [632450.074862]  
> [<ffffffff813cb897>] blk_update_request+0x87/0x310
> 2018-01-05T04:40:53.658780+00:00 node-103 kernel: [632450.074866]  
> [<ffffffff815c52f3>] scsi_end_request+0x33/0x1d0
> 2018-01-05T04:40:53.658782+00:00 node-103 kernel: [632450.074869]  
> [<ffffffff815c8a26>] scsi_io_completion+0x1b6/0x690
> 2018-01-05T04:40:53.658782+00:00 node-103 kernel: [632450.074873]  
> [<ffffffff810beb46>] ? rebalance_domains+0x166/0x2d0
> 2018-01-05T04:40:53.658783+00:00 node-103 kernel: [632450.074875]  
> [<ffffffff815bf64f>] scsi_finish_command+0xcf/0x120
> 2018-01-05T04:40:53.658783+00:00 node-103 kernel: [632450.074877]  
> [<ffffffff815c81b4>] scsi_softirq_done+0x124/0x150
> 2018-01-05T04:40:53.658791+00:00 node-103 kernel: [632450.074880]  
> [<ffffffff813d3787>] blk_done_softirq+0x87/0xb0
> 2018-01-05T04:40:53.658802+00:00 node-103 kernel: [632450.074885]  
> [<ffffffff81085dc1>] __do_softirq+0x101/0x290
> 2018-01-05T04:40:53.658804+00:00 node-103 kernel: [632450.074886]  
> [<ffffffff810860c3>] irq_exit+0xa3/0xb0
> 2018-01-05T04:40:53.658804+00:00 node-103 kernel: [632450.074890]  
> [<ffffffff81050e93>] smp_call_function_single_interrupt+0x33/0x40
> 2018-01-05T04:40:53.658805+00:00 node-103 kernel: [632450.074892]  
> [<ffffffff81845ae2>] call_function_single_interrupt+0x82/0x90
> 2018-01-05T04:40:53.658806+00:00 node-103 kernel: [632450.074893]  <EOI>
> 2018-01-05T04:40:53.658806+00:00 node-103 kernel: [632450.074895]  
> [<ffffffff8184245a>] ? __mutex_lock_slowpath+0xaa/0x130
> 2018-01-05T04:40:53.658808+00:00 node-103 kernel: [632450.074908]  
> [<ffffffffc08b9099>] ? ocfs2_inode_unlock+0x119/0x120 [ocfs2]
> 2018-01-05T04:40:53.658809+00:00 node-103 kernel: [632450.074910]  
> [<ffffffff818424ff>] mutex_lock+0x1f/0x30
> 2018-01-05T04:40:53.658810+00:00 node-103 kernel: [632450.074922]  
> [<ffffffffc08c277a>] ocfs2_file_write_iter+0x95a/0xdf0 [ocfs2]
> 2018-01-05T04:40:53.658811+00:00 node-103 kernel: [632450.074926]  
> [<ffffffff812252c0>] ? poll_select_copy_remaining+0x140/0x140
> 2018-01-05T04:40:53.658812+00:00 node-103 kernel: [632450.074937]  
> [<ffffffffc08c1e20>] ? ocfs2_check_range_for_refcount+0x150/0x150 [ocfs2]
> 2018-01-05T04:40:53.658814+00:00 node-103 kernel: [632450.074941]  
> [<ffffffff812613ea>] aio_run_iocb+0x26a/0x2d0
> 2018-01-05T04:40:53.658815+00:00 node-103 kernel: [632450.074944]  
> [<ffffffff8122e8e5>] ? __fget_light+0x25/0x60
> 2018-01-05T04:40:53.658816+00:00 node-103 kernel: [632450.074945]  
> [<ffffffff8122e933>] ? __fdget+0x13/0x20
> 2018-01-05T04:40:53.658817+00:00 node-103 kernel: [632450.074947]  
> [<ffffffff812622cf>] do_io_submit+0x25f/0x500
> 2018-01-05T04:40:53.658817+00:00 node-103 kernel: [632450.074949]  
> [<ffffffff81262580>] SyS_io_submit+0x10/0x20
> 2018-01-05T04:40:53.658818+00:00 node-103 kernel: [632450.074951]  
> [<ffffffff818446b2>] entry_SYSCALL_64_fastpath+0x16/0x71
> 2018-01-05T04:40:53.658819+00:00 node-103 kernel: [632450.074952] Code: 01 48 
> 8b 02 48 85 c0 75 0a f3 90 48 8b 02 48 85 c0 74 f6 c7 40 08 01 00 00 00 e9 63 
> ff ff ff 83 fa 01 75 07 e9 c4 fe ff ff f3 90 <8b> 07 84 c0 75 f8 b8 01 00 00 
> 00 66 89 07 5d c3 0f 1f 40 00 0f

These traces seem strange to me. They may need more investigation.
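
As a starting point for that investigation, a small Python helper can show
which modules actually appear in a pasted trace. This is a rough sketch under
assumptions: summarize_frames and the FRAME pattern are hypothetical names,
and the pattern only targets the frame format seen in the logs above
("[<address>] function+0xoff/0xlen [module]"). Frames marked with '?' are
skipped because the kernel prints '?' for entries it does not consider
reliable.

```python
import re
from collections import Counter

# Matches one stack frame, for example:
#   [<ffffffffc0769145>] ? foo.isra.34+0x415/0x750 [ocfs2]
FRAME = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?P<guess>\?\s+)?(?P<func>[\w.]+)"
    r"\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def summarize_frames(log_text):
    """Count reliable stack frames per module ('kernel' if none is shown)."""
    # Mail clients wrap long log lines and prefix quotes with '>'; undo both.
    flat = " ".join(line.lstrip("> ").strip()
                    for line in log_text.splitlines())
    counts = Counter()
    for match in FRAME.finditer(flat):
        if match.group("guess"):
            continue  # '?' marks a frame the kernel is unsure about
        counts[match.group("module") or "kernel"] += 1
    return counts
```

Running summarize_frames() on the text of one trace at a time gives a quick
view of whether ocfs2 functions are really on the stack or only show up as
'?' entries.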


> 
> 
> 
> Then later on as more nodes started to access the cluster, which is at 
> 6:00ish, I see messages like these on all the nodes in the cluster.
> 
> 
> 2018-01-05T6:04:35.720570+00:00 node-115 kernel: [248734.731852] nova-compute 
>    D ffff882036c77888     0  4986      1 0x00000000
> 2018-01-05T6:04:35.720572+00:00 node-115 kernel: [248734.731856]  
> ffff882036c77888 ffff88203f056e00 ffff882038ede200 ffff88102aca7000
> 2018-01-05T6:04:35.720576+00:00 node-115 kernel: [248734.731858]  
> ffff882036c78000 ffff882036c77a30 ffff882036c77a28 ffff88102aca7000
> 2018-01-05T6:04:35.720579+00:00 node-115 kernel: [248734.731860]  
> 0000000000000000 ffff882036c778a0 ffffffff81840585 7fffffffffffffff
> 2018-01-05T6:04:35.720581+00:00 node-115 kernel: [248734.731862] Call Trace:
> 2018-01-05T6:04:35.720583+00:00 node-115 kernel: [248734.731870]  
> [<ffffffff81840585>] schedule+0x35/0x80
> 2018-01-05T6:04:35.720584+00:00 node-115 kernel: [248734.731874]  
> [<ffffffff818436d5>] schedule_timeout+0x1b5/0x270
> 2018-01-05T6:04:35.720586+00:00 node-115 kernel: [248734.731878]  
> [<ffffffff810a9d6e>] ? finish_task_switch+0x17e/0x220
> 2018-01-05T6:04:35.720589+00:00 node-115 kernel: [248734.731880]  
> [<ffffffff8183fed6>] ? __schedule+0x3b6/0xa30
> 2018-01-05T6:04:35.720591+00:00 node-115 kernel: [248734.731882]  
> [<ffffffff81840fe3>] wait_for_completion+0xb3/0x140
> 2018-01-05T6:04:35.720594+00:00 node-115 kernel: [248734.731885]  
> [<ffffffff810ac630>] ? wake_up_q+0x70/0x70
> 2018-01-05T6:04:35.720595+00:00 node-115 kernel: [248734.731932]  
> [<ffffffffc0769145>] __ocfs2_cluster_lock.isra.34+0x415/0x750 [ocfs2]
> 2018-01-05T6:04:35.720597+00:00 node-115 kernel: [248734.731945]  
> [<ffffffffc07692fa>] ? __ocfs2_cluster_lock.isra.34+0x5ca/0x750 [ocfs2]
> 2018-01-05T6:04:35.720613+00:00 node-115 kernel: [248734.731956]  
> [<ffffffffc076a20a>] ocfs2_inode_lock_full_nested+0x16a/0x920 [ocfs2]
> 2018-01-05T6:04:35.720617+00:00 node-115 kernel: [248734.731969]  
> [<ffffffffc0784644>] ocfs2_lookup_lock_orphan_dir.constprop.28+0x74/0x160 
> [ocfs2]
> 2018-01-05T6:04:35.720619+00:00 node-115 kernel: [248734.731981]  
> [<ffffffffc0784782>] ocfs2_prepare_orphan_dir+0x52/0x270 [ocfs2]
> 2018-01-05T6:04:35.720621+00:00 node-115 kernel: [248734.731992]  
> [<ffffffffc07864a7>] ocfs2_rename+0x1027/0x1a30 [ocfs2]
> 2018-01-05T6:04:35.720622+00:00 node-115 kernel: [248734.732003]  
> [<ffffffffc07692fa>] ? __ocfs2_cluster_lock.isra.34+0x5ca/0x750 [ocfs2]
> 2018-01-05T6:04:35.720624+00:00 node-115 kernel: [248734.732027]  
> [<ffffffffc076a3b0>] ? ocfs2_inode_lock_full_nested+0x310/0x920 [ocfs2]
> 2018-01-05T6:04:35.720626+00:00 node-115 kernel: [248734.732050]  
> [<ffffffffc077bdff>] ? ocfs2_wait_for_recovery+0x2f/0xa0 [ocfs2]
> 2018-01-05T6:04:35.720629+00:00 node-115 kernel: [248734.732054]  
> [<ffffffff8121afd4>] ? inode_permission+0x14/0x50
> 2018-01-05T6:04:35.720632+00:00 node-115 kernel: [248734.732056]  
> [<ffffffff8121e451>] vfs_rename+0x991/0x9d0
> 2018-01-05T6:04:35.720634+00:00 node-115 kernel: [248734.732058]  
> [<ffffffff81222fbf>] SyS_rename+0x39f/0x3c0
> 2018-01-05T6:04:35.720667+00:00 node-115 kernel: [248734.732060]  
> [<ffffffff818446b2>] entry_SYSCALL_64_fastpath+0x16/0x71
> 2018-01-05T6:04:35.720678+00:00 node-115 kernel: [248734.732097] 
> kworker/u80:0   D ffff881f2c337b68     0  6190      2 0x00000000
> 2018-01-05T6:04:35.720679+00:00 node-115 kernel: [248734.732111] Workqueue: 
> ocfs2_wq ocfs2_orphan_scan_work [ocfs2]
> 2018-01-05T6:04:35.720681+00:00 node-115 kernel: [248734.732112]  
> ffff881f2c337b68 ffff881f2c337b30 ffff882038ede200 ffff881f13488000
> 2018-01-05T6:04:35.720682+00:00 node-115 kernel: [248734.732114]  
> ffff881f2c338000 ffff881f2c337d10 ffff881f2c337d08 ffff881f13488000
> 2018-01-05T6:04:35.720686+00:00 node-115 kernel: [248734.732115]  
> 0000000000000000 ffff881f2c337b80 ffffffff81840585 7fffffffffffffff
> 2018-01-05T6:04:35.720688+00:00 node-115 kernel: [248734.732116] Call Trace:
> 2018-01-05T6:04:35.720691+00:00 node-115 kernel: [248734.732118]  
> [<ffffffff81840585>] schedule+0x35/0x80
> 2018-01-05T6:04:35.720693+00:00 node-115 kernel: [248734.732119]  
> [<ffffffff818436d5>] schedule_timeout+0x1b5/0x270
> 2018-01-05T6:04:35.720694+00:00 node-115 kernel: [248734.732121]  
> [<ffffffff818441ee>] ? _raw_spin_unlock_bh+0x1e/0x20
> 2018-01-05T6:04:35.720696+00:00 node-115 kernel: [248734.732124]  
> [<ffffffff8171fd11>] ? release_sock+0x111/0x160
> 2018-01-05T6:04:35.720699+00:00 node-115 kernel: [248734.732125]  
> [<ffffffff81840fe3>] wait_for_completion+0xb3/0x140
> 2018-01-05T6:04:35.720701+00:00 node-115 kernel: [248734.732127]  
> [<ffffffff810ac630>] ? wake_up_q+0x70/0x70
> 2018-01-05T6:04:35.720703+00:00 node-115 kernel: [248734.732138]  
> [<ffffffffc0769145>] __ocfs2_cluster_lock.isra.34+0x415/0x750 [ocfs2]
> 2018-01-05T6:04:35.720705+00:00 node-115 kernel: [248734.732140]  
> [<ffffffff810b5403>] ? update_curr+0xe3/0x160
> 2018-01-05T6:04:35.720706+00:00 node-115 kernel: [248734.732141]  
> [<ffffffff8171b5cd>] ? sock_recvmsg+0x3d/0x50
> 2018-01-05T6:04:35.720708+00:00 node-115 kernel: [248734.732151]  
> [<ffffffffc07698a5>] ocfs2_orphan_scan_lock+0x75/0xe0 [ocfs2]
> 2018-01-05T6:04:35.720711+00:00 node-115 kernel: [248734.732161]  
> [<ffffffffc077a60f>] ocfs2_orphan_scan_work+0x6f/0x2e0 [ocfs2]
> 2018-01-05T6:04:35.720714+00:00 node-115 kernel: [248734.732164]  
> [<ffffffff8109a635>] process_one_work+0x165/0x480
> 2018-01-05T6:04:35.720716+00:00 node-115 kernel: [248734.732165]  
> [<ffffffff8109a99b>] worker_thread+0x4b/0x4c0
> 2018-01-05T6:04:35.720717+00:00 node-115 kernel: [248734.732166]  
> [<ffffffff8109a950>] ? process_one_work+0x480/0x480
> 2018-01-05T6:04:35.720719+00:00 node-115 kernel: [248734.732168]  
> [<ffffffff810a0c75>] kthread+0xe5/0x100
> 2018-01-05T6:04:35.720720+00:00 node-115 kernel: [248734.732169]  
> [<ffffffff810a0b90>] ? kthread_create_on_node+0x1e0/0x1e0
> 2018-01-05T6:04:35.720724+00:00 node-115 kernel: [248734.732171]  
> [<ffffffff81844a4f>] ret_from_fork+0x3f/0x70
> 2018-01-05T6:04:35.720728+00:00 node-115 kernel: [248734.732172]  
> [<ffffffff810a0b90>] ? kthread_create_on_node+0x1e0/0x1e0
> 2018-01-05T6:10:35.720707+00:00 node-115 kernel: [249094.694942] 
> qemu-system-x86 D ffff881024e8b9d8     0  6663      1 0x00000000
> 2018-01-05T6:10:35.720709+00:00 node-115 kernel: [249094.694944]  
> ffff881024e8b9d8 0000000000000202 ffff882038f38000 ffff881022028000
> 2018-01-05T6:10:35.720711+00:00 node-115 kernel: [249094.694946]  
> ffff881024e8c000 ffff881024e8bb80 ffff881024e8bb78 ffff881022028000
> 2018-01-05T6:10:35.720712+00:00 node-115 kernel: [249094.694948]  
> 0000000000000000 ffff881024e8b9f0 ffffffff81840585 7fffffffffffffff
> 2018-01-05T6:10:35.720714+00:00 node-115 kernel: [249094.694949] Call Trace:
> 2018-01-05T6:10:35.720717+00:00 node-115 kernel: [249094.694951]  
> [<ffffffff81840585>] schedule+0x35/0x80
> 2018-01-05T6:10:35.720719+00:00 node-115 kernel: [249094.694953]  
> [<ffffffff818436d5>] schedule_timeout+0x1b5/0x270
> 2018-01-05T6:10:35.720721+00:00 node-115 kernel: [249094.694955]  
> [<ffffffff81840fe3>] wait_for_completion+0xb3/0x140
> 2018-01-05T6:10:35.720722+00:00 node-115 kernel: [249094.694957]  
> [<ffffffff810ac630>] ? wake_up_q+0x70/0x70
> 2018-01-05T6:10:35.720724+00:00 node-115 kernel: [249094.694985]  
> [<ffffffffc0769145>] __ocfs2_cluster_lock.isra.34+0x415/0x750 [ocfs2]
> 2018-01-05T6:10:35.720726+00:00 node-115 kernel: [249094.694986]  
> [<ffffffff810a9d6e>] ? finish_task_switch+0x17e/0x220
> 2018-01-05T6:10:35.720728+00:00 node-115 kernel: [249094.694998]  
> [<ffffffffc076a20a>] ocfs2_inode_lock_full_nested+0x16a/0x920 [ocfs2]
> 2018-01-05T6:10:35.720731+00:00 node-115 kernel: [249094.695003]  
> [<ffffffff813986d2>] ? aa_file_perm+0x142/0x3c0
> 2018-01-05T6:10:35.720732+00:00 node-115 kernel: [249094.695015]  
> [<ffffffffc076eef0>] ? ocfs2_dir_open+0x20/0x20 [ocfs2]
> 2018-01-05T6:10:35.720733+00:00 node-115 kernel: [249094.695026]  
> [<ffffffffc076aa7a>] ocfs2_inode_lock_atime+0x3a/0x190 [ocfs2]
> 2018-01-05T6:10:35.720735+00:00 node-115 kernel: [249094.695037]  
> [<ffffffffc0769521>] ? ocfs2_rw_lock+0xa1/0x170 [ocfs2]
> 2018-01-05T6:10:35.720737+00:00 node-115 kernel: [249094.695048]  
> [<ffffffffc076ef5c>] ocfs2_file_read_iter+0x6c/0x330 [ocfs2]
> 2018-01-05T6:10:35.720740+00:00 node-115 kernel: [249094.695059]  
> [<ffffffffc076eef0>] ? ocfs2_dir_open+0x20/0x20 [ocfs2]
> 2018-01-05T6:10:35.720742+00:00 node-115 kernel: [249094.695070]  
> [<ffffffffc076eef0>] ? ocfs2_dir_open+0x20/0x20 [ocfs2]
> 2018-01-05T6:10:35.720744+00:00 node-115 kernel: [249094.695073]  
> [<ffffffff812612b0>] aio_run_iocb+0x130/0x2d0
> 2018-01-05T6:10:35.720748+00:00 node-115 kernel: [249094.695077]  
> [<ffffffff8122e933>] ? __fdget+0x13/0x20
> 2018-01-05T6:10:35.720750+00:00 node-115 kernel: [249094.695079]  
> [<ffffffff812622cf>] do_io_submit+0x25f/0x500
> 2018-01-05T6:10:35.720781+00:00 node-115 kernel: [249094.695080]  
> [<ffffffff81262580>] SyS_io_submit+0x10/0x20
> 2018-01-05T6:10:35.720784+00:00 node-115 kernel: [249094.695082]  
> [<ffffffff818446b2>] entry_SYSCALL_64_fastpath+0x16/0x71
> rebooted node 103 (from above) at 6:37
> 2018-01-05T6:37:37.525550+00:00 node-115 kernel: [250716.332150] o2net: 
> Connection to node node-103 (num 1) at 10.20.243.43:7777 
>  has been idle for 30.62 secs.
> 2018-01-05T6:38:07.604427+00:00 node-115 kernel: [250746.409068] o2net: 
> Connection to node node-103 (num 1) at 10.20.243.43:7777 
>  has been idle for 30.80 secs.
> 2018-01-05T6:38:10.088603+00:00 node-115 kernel: [250748.893160] o2net: No 
> longer connected to node node-103 (num 1) at 10.20.243.43:7777 
> 2018-01-05T6:38:10.088616+00:00 node-115 kernel: [250748.893192] o2cb: o2dlm 
> has evicted node 1 from domain 83022C092E5E4625BD58E3C20E4E5D92
> 2018-01-05T6:38:10.561008+00:00 node-115 kernel: [250749.367653] o2cb: o2dlm 
> has evicted node 1 from domain 83022C092E5E4625BD58E3C20E4E5D92
> 2018-01-05T6:38:11.096451+00:00 node-115 kernel: [250749.900777] o2dlm: 
> Waiting on the recovery of node 1 in domain 83022C092E5E4625BD58E3C20E4E5D92
> 2018-01-05T6:38:14.881250+00:00 node-115 kernel: [250753.684410] o2dlm: Begin 
> recovery on domain 83022C092E5E4625BD58E3C20E4E5D92 for node 1
> 2018-01-05T6:38:14.881655+00:00 node-115 kernel: [250753.684414] o2dlm: Node 
> 2 (he) is the Recovery Master for the dead node 1 in domain 
> 83022C092E5E4625BD58E3C20E4E5D92
> 2018-01-05T6:38:14.881658+00:00 node-115 kernel: [250753.684415] o2dlm: End 
> recovery on domain 83022C092E5E4625BD58E3C20E4E5D92
> 2018-01-05T6:38:16.585255+00:00 node-115 kernel: [250755.391444] ocfs2: Begin 
> replay journal (node 1, slot 10) on device (252,0)
> 2018-01-05T6:38:19.460438+00:00 node-115 kernel: [250758.266976] ocfs2: End 
> replay journal (node 1, slot 10) on device (252,0)
> 2018-01-05T6:38:19.489132+00:00 node-115 kernel: [250758.295509] ocfs2: 
> Beginning quota recovery on device (252,0) for slot 10
> 
> 
> 
> cluster:
>          node_count = 13
>          name = MSA
> 
> node:
>          number = 1
>          cluster = MSA
>          ip_port = 7777
>          ip_address = 10.20.243.43
>          name = node-103
> 
> node:
>          number = 2
>          cluster = MSA
>          ip_port = 7777
>          ip_address = 10.20.243.71
>          name = node-104
> 
> node:
>          number = 3
>          cluster = MSA
>          ip_port = 7777
>          ip_address = 10.20.243.41
>          name = node-113
> 
> node:
>          number = 4
>          cluster = MSA
>          ip_port = 7777
>          ip_address = 10.20.243.44
>          name = node-114
> 
> node:
>          number = 5
>          cluster = MSA
>          ip_port = 7777
>          ip_address = 10.20.243.45
>          name = node-115
> 
> node:
>          number = 6
>          cluster = MSA
>          ip_port = 7777
>          ip_address = 10.20.243.46
>          name = node-116
> 
> node:
>          number = 7
>          cluster = MSA
>          ip_port = 7777
>          ip_address = 10.20.243.73
>          name = node-120
> 
> node:
>          number = 8
>          cluster = MSA
>          ip_port = 7777
>          ip_address = 10.20.243.70
>          name = node-99
> 
> node:
>          number = 9
>          cluster = MSA
>          ip_port = 7777
>          ip_address = 10.20.243.66
>          name = node-122
> 
> node:
>          number = 10
>          cluster = MSA
>          ip_port = 7777
>          ip_address = 10.20.243.68
>          name = node-123
> 
> node:
>          number = 11
>          cluster = MSA
>          ip_port = 7777
>          ip_address = 10.20.243.69
>          name = node-124
> 
> node:
>          number = 12
>          cluster = MSA
>          ip_port = 7777
>          ip_address = 10.20.243.76
>          name = node-125
> 
> node:
>          number = 13
>          cluster = MSA
>          ip_port = 7777
>          ip_address = 10.20.243.67
>          name = node-126
> 
> 
> -- Jim
> 
> On Tue, Jan 2, 2018 at 4:57 PM, Jim Okken <j...@jokken.com> wrote:
> 
>     I just wanted to resend my last update to this thread in case it got lost 
> during the holiday weekend, Happy New Year everyone!
> 
>         thanks for your reply Changwei,
> 
>         no I can't say that any of the nodes lost power or rebooted. It isn't 
> impossible, but when I assessed the situation none of the nodes where down.
>         there is other stuck stacks as well yes.
> 
>         sorry for the long email but below I have pasted what I believe is 
> logs from the original "stuck stack" 3-4 days before the "ls" stuck stack 
> pasted in my original email.
>         This happened on node-103, the node that was at that point modifying 
> for the file(s) in the directory I was later ls-ing on. qemu is the 
> underlying KVM hypervior openstack is using.
> 
> 
>         My ocfs2 filesystem and OpenStack environment are back up after I 
> rebooted all the nodes and the storage device. Even the files in that 
> troubled directory are fine. (This isn't a production environment, only a 
> testing environment: still important, but not crucial.)
> 
>         Please let me know any observations or comments. Also, if this occurs 
> again, please let me know the easiest way to resolve it and stabilize ocfs2 
> (rebooting node-103 did not seem to fix anything).
> 
>         Also, I am new to the concept of fencing. Is ocfs2 fenced 
> sufficiently by default, or should I have set up some other mechanism?
> 
>         thanks
> 
>         2017-12-17T23:53:42.511398+00:00 node-103 kernel: [974474.883386] 
> qemu-system-x86 D ffff880ef621b9c8     0 26593      1 0x00000000
>         2017-12-17T23:53:42.511399+00:00 node-103 kernel: [974474.883390]  
> ffff880ef621b9c8 ffff880ef621b9b0 ffff882038edb800 ffff88102c102a00
>         2017-12-17T23:53:42.511408+00:00 node-103 kernel: [974474.883392]  
> ffff880ef621c000 ffff880ef621bb70 ffff880ef621bb68 ffff88102c102a00
>         2017-12-17T23:53:42.511410+00:00 node-103 kernel: [974474.883393]  
> 0000000000000004 ffff880ef621b9e0 ffffffff81840585 7fffffffffffffff
>         2017-12-17T23:53:42.511410+00:00 node-103 kernel: [974474.883395] 
> Call Trace:
>         2017-12-17T23:53:42.511411+00:00 node-103 kernel: [974474.883403]  
> [<ffffffff81840585>] schedule+0x35/0x80
>         2017-12-17T23:53:42.511412+00:00 node-103 kernel: [974474.883407]  
> [<ffffffff818436d5>] schedule_timeout+0x1b5/0x270
>         2017-12-17T23:53:42.511412+00:00 node-103 kernel: [974474.883411]  
> [<ffffffff810ac642>] ? default_wake_function+0x12/0x20
>         2017-12-17T23:53:42.511443+00:00 node-103 kernel: [974474.883416]  
> [<ffffffff810c4422>] ? autoremove_wake_function+0x12/0x40
>         2017-12-17T23:53:42.511444+00:00 node-103 kernel: [974474.883418]  
> [<ffffffff810c3d52>] ? __wake_up_common+0x52/0x90
>         2017-12-17T23:53:42.511445+00:00 node-103 kernel: [974474.883420]  
> [<ffffffff81840fe3>] wait_for_completion+0xb3/0x140
>         2017-12-17T23:53:42.511446+00:00 node-103 kernel: [974474.883421]  
> [<ffffffff810ac630>] ? wake_up_q+0x70/0x70
>         2017-12-17T23:53:42.511446+00:00 node-103 kernel: [974474.883466]  
> [<ffffffffc0896145>] __ocfs2_cluster_lock.isra.34+0x415/0x750 [ocfs2]
>         2017-12-17T23:53:42.511447+00:00 node-103 kernel: [974474.883469]  
> [<ffffffff810f634b>] ? ktime_get+0x3b/0xb0
>         2017-12-17T23:53:42.511453+00:00 node-103 kernel: [974474.883482]  
> [<ffffffffc089720a>] ocfs2_inode_lock_full_nested+0x16a/0x920 [ocfs2]
>         2017-12-17T23:53:42.511453+00:00 node-103 kernel: [974474.883494]  
> [<ffffffffc089fe20>] ? ocfs2_check_range_for_refcount+0x150/0x150 [ocfs2]
>         2017-12-17T23:53:42.511454+00:00 node-103 kernel: [974474.883505]  
> [<ffffffffc08a0045>] ocfs2_file_write_iter+0x225/0xdf0 [ocfs2]
>         2017-12-17T23:53:42.511455+00:00 node-103 kernel: [974474.883508]  
> [<ffffffff812252c0>] ? poll_select_copy_remaining+0x140/0x140
>         2017-12-17T23:53:42.511455+00:00 node-103 kernel: [974474.883511]  
> [<ffffffff81349a6d>] ? security_file_permission+0x3d/0xc0
>         2017-12-17T23:53:42.511456+00:00 node-103 kernel: [974474.883522]  
> [<ffffffffc089fe20>] ? ocfs2_check_range_for_refcount+0x150/0x150 [ocfs2]
>         2017-12-17T23:53:42.511462+00:00 node-103 kernel: [974474.883525]  
> [<ffffffff812613ea>] aio_run_iocb+0x26a/0x2d0
>         2017-12-17T23:53:42.511463+00:00 node-103 kernel: [974474.883528]  
> [<ffffffff8122e8e5>] ? __fget_light+0x25/0x60
>         2017-12-17T23:53:42.511464+00:00 node-103 kernel: [974474.883529]  
> [<ffffffff8122e933>] ? __fdget+0x13/0x20
>         2017-12-17T23:53:42.511464+00:00 node-103 kernel: [974474.883530]  
> [<ffffffff812622cf>] do_io_submit+0x25f/0x500
>         2017-12-17T23:53:42.511482+00:00 node-103 kernel: [974474.883532]  
> [<ffffffff81262580>] SyS_io_submit+0x10/0x20
>         2017-12-17T23:53:42.511490+00:00 node-103 kernel: [974474.883534]  
> [<ffffffff818446b2>] entry_SYSCALL_64_fastpath+0x16/0x71
>         2017-12-17T23:53:42.511495+00:00 node-103 kernel: [974474.883545] 
> qemu-img        D ffff880f19ec7948     0 40743   5019 0x00000000
>         2017-12-17T23:53:42.511495+00:00 node-103 kernel: [974474.883547]  
> ffff880f19ec7948 ffff882033fff060 ffff882038f3f000 ffff880b39739c00
>         2017-12-17T23:53:42.511502+00:00 node-103 kernel: [974474.883549]  
> ffff880f19ec8000 ffff880f19ec7af0 ffff880f19ec7ae8 ffff880b39739c00
>         2017-12-17T23:53:42.511503+00:00 node-103 kernel: [974474.883550]  
> 0000000000000004 ffff880f19ec7960 ffffffff81840585 7fffffffffffffff
>         2017-12-17T23:53:42.511503+00:00 node-103 kernel: [974474.883552] 
> Call Trace:
>         2017-12-17T23:53:42.511504+00:00 node-103 kernel: [974474.883554]  
> [<ffffffff81840585>] schedule+0x35/0x80
>         2017-12-17T23:53:42.511504+00:00 node-103 kernel: [974474.883555]  
> [<ffffffff818436d5>] schedule_timeout+0x1b5/0x270
>         2017-12-17T23:53:42.511505+00:00 node-103 kernel: [974474.883557]  
> [<ffffffff8183fed6>] ? __schedule+0x3b6/0xa30
>         2017-12-17T23:53:42.511511+00:00 node-103 kernel: [974474.883559]  
> [<ffffffff81840fe3>] wait_for_completion+0xb3/0x140
>         2017-12-17T23:53:42.511512+00:00 node-103 kernel: [974474.883560]  
> [<ffffffff810ac630>] ? wake_up_q+0x70/0x70
>         2017-12-17T23:53:42.511513+00:00 node-103 kernel: [974474.883573]  
> [<ffffffffc0896145>] __ocfs2_cluster_lock.isra.34+0x415/0x750 [ocfs2]
>         2017-12-17T23:53:42.511513+00:00 node-103 kernel: [974474.883595]  
> [<ffffffffc089720a>] ocfs2_inode_lock_full_nested+0x16a/0x920 [ocfs2]
>         2017-12-17T23:53:42.511514+00:00 node-103 kernel: [974474.883605]  
> [<ffffffffc0898d6e>] ? ocfs2_extent_map_trunc+0x10e/0x150 [ocfs2]
>         2017-12-17T23:53:42.511514+00:00 node-103 kernel: [974474.883620]  
> [<ffffffffc08f9b32>] ocfs2_iop_get_acl+0x52/0x100 [ocfs2]
>         2017-12-17T23:53:42.511520+00:00 node-103 kernel: [974474.883623]  
> [<ffffffff812730f1>] get_acl+0x41/0x60
>         2017-12-17T23:53:42.511521+00:00 node-103 kernel: [974474.883625]  
> [<ffffffff8121aeab>] generic_permission+0x13b/0x190
>         2017-12-17T23:53:42.511522+00:00 node-103 kernel: [974474.883636]  
> [<ffffffffc089aeea>] ocfs2_permission+0xca/0xe0 [ocfs2]
>         2017-12-17T23:53:42.511522+00:00 node-103 kernel: [974474.883638]  
> [<ffffffff8121af77>] __inode_permission+0x77/0xc0
>         2017-12-17T23:53:42.511523+00:00 node-103 kernel: [974474.883640]  
> [<ffffffff8121afd4>] inode_permission+0x14/0x50
>         2017-12-17T23:53:42.511524+00:00 node-103 kernel: [974474.883641]  
> [<ffffffff8121b0fb>] may_open+0x5b/0xf0
>         2017-12-17T23:53:42.511534+00:00 node-103 kernel: [974474.883642]  
> [<ffffffff8121efe8>] path_openat+0x188/0x1330
>         2017-12-17T23:53:42.511549+00:00 node-103 kernel: [974474.883644]  
> [<ffffffff81221381>] do_filp_open+0x91/0x100
>         2017-12-17T23:53:42.511551+00:00 node-103 kernel: [974474.883645]  
> [<ffffffff8122edb6>] ? __alloc_fd+0x46/0x190
>         2017-12-17T23:53:42.511556+00:00 node-103 kernel: [974474.883647]  
> [<ffffffff8120f738>] do_sys_open+0x138/0x2a0
>         2017-12-17T23:53:42.511556+00:00 node-103 kernel: [974474.883649]  
> [<ffffffff8106b594>] ? __do_page_fault+0x1b4/0x400
>         2017-12-17T23:53:42.511557+00:00 node-103 kernel: [974474.883651]  
> [<ffffffff8120f8be>] SyS_open+0x1e/0x20
>         2017-12-17T23:53:42.511558+00:00 node-103 kernel: [974474.883653]  
> [<ffffffff818446b2>] entry_SYSCALL_64_fastpath+0x16/0x71
>         2017-12-17T23:55:42.511102+00:00 node-103 kernel: [974594.892385] 
> qemu-system-x86 D ffff880ef621b9c8     0 26593      1 0x00000000
>         2017-12-17T23:55:42.511103+00:00 node-103 kernel: [974594.892388]  
> ffff880ef621b9c8 ffff880ef621b9b0 ffff882038edb800 ffff88102c102a00
>         2017-12-17T23:55:42.511121+00:00 node-103 kernel: [974594.892390]  
> ffff880ef621c000 ffff880ef621bb70 ffff880ef621bb68 ffff88102c102a00
>         2017-12-17T23:55:42.511123+00:00 node-103 kernel: [974594.892391]  
> 0000000000000004 ffff880ef621b9e0 ffffffff81840585 7fffffffffffffff
>         2017-12-17T23:55:42.511124+00:00 node-103 kernel: [974594.892393] 
> Call Trace:
>         2017-12-17T23:55:42.511125+00:00 node-103 kernel: [974594.892399]  
> [<ffffffff81840585>] schedule+0x35/0x80
>         2017-12-17T23:55:42.511125+00:00 node-103 kernel: [974594.892402]  
> [<ffffffff818436d5>] schedule_timeout+0x1b5/0x270
>         2017-12-17T23:55:42.511126+00:00 node-103 kernel: [974594.892406]  
> [<ffffffff810ac642>] ? default_wake_function+0x12/0x20
>         2017-12-17T23:55:42.511127+00:00 node-103 kernel: [974594.892409]  
> [<ffffffff810c4422>] ? autoremove_wake_function+0x12/0x40
>         2017-12-17T23:55:42.511128+00:00 node-103 kernel: [974594.892411]  
> [<ffffffff810c3d52>] ? __wake_up_common+0x52/0x90
>         2017-12-17T23:55:42.511129+00:00 node-103 kernel: [974594.892413]  
> [<ffffffff81840fe3>] wait_for_completion+0xb3/0x140
>         2017-12-17T23:55:42.511130+00:00 node-103 kernel: [974594.892414]  
> [<ffffffff810ac630>] ? wake_up_q+0x70/0x70
>         2017-12-17T23:55:42.511131+00:00 node-103 kernel: [974594.892448]  
> [<ffffffffc0896145>] __ocfs2_cluster_lock.isra.34+0x415/0x750 [ocfs2]
>         2017-12-17T23:55:42.511131+00:00 node-103 kernel: [974594.892451]  
> [<ffffffff810f634b>] ? ktime_get+0x3b/0xb0
>         2017-12-17T23:55:42.511133+00:00 node-103 kernel: [974594.892463]  
> [<ffffffffc089720a>] ocfs2_inode_lock_full_nested+0x16a/0x920 [ocfs2]
>         2017-12-17T23:55:42.511134+00:00 node-103 kernel: [974594.892475]  
> [<ffffffffc089fe20>] ? ocfs2_check_range_for_refcount+0x150/0x150 [ocfs2]
>         2017-12-17T23:55:42.511135+00:00 node-103 kernel: [974594.892486]  
> [<ffffffffc08a0045>] ocfs2_file_write_iter+0x225/0xdf0 [ocfs2]
>         2017-12-17T23:55:42.511136+00:00 node-103 kernel: [974594.892490]  
> [<ffffffff812252c0>] ? poll_select_copy_remaining+0x140/0x140
>         2017-12-17T23:55:42.511136+00:00 node-103 kernel: [974594.892493]  
> [<ffffffff81349a6d>] ? security_file_permission+0x3d/0xc0
>         2017-12-17T23:55:42.511137+00:00 node-103 kernel: [974594.892504]  
> [<ffffffffc089fe20>] ? ocfs2_check_range_for_refcount+0x150/0x150 [ocfs2]
>         2017-12-17T23:55:42.511139+00:00 node-103 kernel: [974594.892507]  
> [<ffffffff812613ea>] aio_run_iocb+0x26a/0x2d0
>         2017-12-17T23:55:42.511140+00:00 node-103 kernel: [974594.892510]  
> [<ffffffff8122e8e5>] ? __fget_light+0x25/0x60
>         2017-12-17T23:55:42.511141+00:00 node-103 kernel: [974594.892511]  
> [<ffffffff8122e933>] ? __fdget+0x13/0x20
>         2017-12-17T23:55:42.511142+00:00 node-103 kernel: [974594.892513]  
> [<ffffffff812622cf>] do_io_submit+0x25f/0x500
>         2017-12-17T23:55:42.511158+00:00 node-103 kernel: [974594.892515]  
> [<ffffffff81262580>] SyS_io_submit+0x10/0x20
>         2017-12-17T23:55:42.511160+00:00 node-103 kernel: [974594.892517]  
> [<ffffffff818446b2>] entry_SYSCALL_64_fastpath+0x16/0x71
>         2017-12-17T23:55:42.511163+00:00 node-103 kernel: [974594.892527] 
> qemu-img        D ffff880f19ec7948     0 40743   5019 0x00000000
>         2017-12-17T23:55:42.511163+00:00 node-103 kernel: [974594.892529]  
> ffff880f19ec7948 ffff882033fff060 ffff882038f3f000 ffff880b39739c00
>         2017-12-17T23:55:42.511165+00:00 node-103 kernel: [974594.892530]  
> ffff880f19ec8000 ffff880f19ec7af0 ffff880f19ec7ae8 ffff880b39739c00
>         2017-12-17T23:55:42.511166+00:00 node-103 kernel: [974594.892532]  
> 0000000000000004 ffff880f19ec7960 ffffffff81840585 7fffffffffffffff
>         2017-12-17T23:55:42.511167+00:00 node-103 kernel: [974594.892533] 
> Call Trace:
>         2017-12-17T23:55:42.511167+00:00 node-103 kernel: [974594.892535]  
> [<ffffffff81840585>] schedule+0x35/0x80
>         2017-12-17T23:55:42.511168+00:00 node-103 kernel: [974594.892537]  
> [<ffffffff818436d5>] schedule_timeout+0x1b5/0x270
>         2017-12-17T23:55:42.511168+00:00 node-103 kernel: [974594.892538]  
> [<ffffffff8183fed6>] ? __schedule+0x3b6/0xa30
>         2017-12-17T23:55:42.511170+00:00 node-103 kernel: [974594.892540]  
> [<ffffffff81840fe3>] wait_for_completion+0xb3/0x140
>         2017-12-17T23:55:42.511171+00:00 node-103 kernel: [974594.892542]  
> [<ffffffff810ac630>] ? wake_up_q+0x70/0x70
>         2017-12-17T23:55:42.511172+00:00 node-103 kernel: [974594.892553]  
> [<ffffffffc0896145>] __ocfs2_cluster_lock.isra.34+0x415/0x750 [ocfs2]
>         2017-12-17T23:55:42.511173+00:00 node-103 kernel: [974594.892565]  
> [<ffffffffc089720a>] ocfs2_inode_lock_full_nested+0x16a/0x920 [ocfs2]
>         2017-12-17T23:55:42.511174+00:00 node-103 kernel: [974594.892576]  
> [<ffffffffc0898d6e>] ? ocfs2_extent_map_trunc+0x10e/0x150 [ocfs2]
>         2017-12-17T23:55:42.511174+00:00 node-103 kernel: [974594.892592]  
> [<ffffffffc08f9b32>] ocfs2_iop_get_acl+0x52/0x100 [ocfs2]
>         2017-12-17T23:55:42.511176+00:00 node-103 kernel: [974594.892594]  
> [<ffffffff812730f1>] get_acl+0x41/0x60
>         2017-12-17T23:55:42.511177+00:00 node-103 kernel: [974594.892596]  
> [<ffffffff8121aeab>] generic_permission+0x13b/0x190
>         2017-12-17T23:55:42.511178+00:00 node-103 kernel: [974594.892608]  
> [<ffffffffc089aeea>] ocfs2_permission+0xca/0xe0 [ocfs2]
>         2017-12-17T23:55:42.511179+00:00 node-103 kernel: [974594.892610]  
> [<ffffffff8121af77>] __inode_permission+0x77/0xc0
>         2017-12-17T23:55:42.511179+00:00 node-103 kernel: [974594.892612]  
> [<ffffffff8121afd4>] inode_permission+0x14/0x50
>         2017-12-17T23:55:42.511180+00:00 node-103 kernel: [974594.892613]  
> [<ffffffff8121b0fb>] may_open+0x5b/0xf0
>         2017-12-17T23:55:42.511181+00:00 node-103 kernel: [974594.892615]  
> [<ffffffff8121efe8>] path_openat+0x188/0x1330
>         2017-12-17T23:55:42.511183+00:00 node-103 kernel: [974594.892616]  
> [<ffffffff81221381>] do_filp_open+0x91/0x100
>         2017-12-17T23:55:42.511184+00:00 node-103 kernel: [974594.892618]  
> [<ffffffff8122edb6>] ? __alloc_fd+0x46/0x190
>         2017-12-17T23:55:42.511187+00:00 node-103 kernel: [974594.892620]  
> [<ffffffff8120f738>] do_sys_open+0x138/0x2a0
>         2017-12-17T23:55:42.511188+00:00 node-103 kernel: [974594.892622]  
> [<ffffffff8106b594>] ? __do_page_fault+0x1b4/0x400
>         2017-12-17T23:55:42.511188+00:00 node-103 kernel: [974594.892624]  
> [<ffffffff8120f8be>] SyS_open+0x1e/0x20
>         2017-12-17T23:55:42.511197+00:00 node-103 kernel: [974594.892626]  
> [<ffffffff818446b2>] entry_SYSCALL_64_fastpath+0x16/0x71
>         2017-12-17T23:57:42.511168+00:00 node-103 kernel: [974714.901454] 
> qemu-system-x86 D ffff880ef621b9c8     0 26593      1 0x00000000
>         2017-12-17T23:57:42.511169+00:00 node-103 kernel: [974714.901457]  
> ffff880ef621b9c8 ffff880ef621b9b0 ffff882038edb800 ffff88102c102a00
>         2017-12-17T23:57:42.511170+00:00 node-103 kernel: [974714.901459]  
> ffff880ef621c000 ffff880ef621bb70 ffff880ef621bb68 ffff88102c102a00
>         2017-12-17T23:57:42.511183+00:00 node-103 kernel: [974714.901461]  
> 0000000000000004 ffff880ef621b9e0 ffffffff81840585 7fffffffffffffff
>         2017-12-17T23:57:42.511185+00:00 node-103 kernel: [974714.901463] 
> Call Trace:
>         2017-12-17T23:57:42.511185+00:00 node-103 kernel: [974714.901470]  
> [<ffffffff81840585>] schedule+0x35/0x80
>         2017-12-17T23:57:42.511186+00:00 node-103 kernel: [974714.901473]  
> [<ffffffff818436d5>] schedule_timeout+0x1b5/0x270
>         2017-12-17T23:57:42.511186+00:00 node-103 kernel: [974714.901477]  
> [<ffffffff810ac642>] ? default_wake_function+0x12/0x20
>         2017-12-17T23:57:42.511188+00:00 node-103 kernel: [974714.901481]  
> [<ffffffff810c4422>] ? autoremove_wake_function+0x12/0x40
>         2017-12-17T23:57:42.511189+00:00 node-103 kernel: [974714.901482]  
> [<ffffffff810c3d52>] ? __wake_up_common+0x52/0x90
>         2017-12-17T23:57:42.511190+00:00 node-103 kernel: [974714.901484]  
> [<ffffffff81840fe3>] wait_for_completion+0xb3/0x140
>         2017-12-17T23:57:42.511197+00:00 node-103 kernel: [974714.901486]  
> [<ffffffff810ac630>] ? wake_up_q+0x70/0x70
>         2017-12-17T23:57:42.511198+00:00 node-103 kernel: [974714.901527]  
> [<ffffffffc0896145>] __ocfs2_cluster_lock.isra.34+0x415/0x750 [ocfs2]
>         2017-12-17T23:57:42.511199+00:00 node-103 kernel: [974714.901530]  
> [<ffffffff810f634b>] ? ktime_get+0x3b/0xb0
>         2017-12-17T23:57:42.511201+00:00 node-103 kernel: [974714.901543]  
> [<ffffffffc089720a>] ocfs2_inode_lock_full_nested+0x16a/0x920 [ocfs2]
>         2017-12-17T23:57:42.511202+00:00 node-103 kernel: [974714.901555]  
> [<ffffffffc089fe20>] ? ocfs2_check_range_for_refcount+0x150/0x150 [ocfs2]
>         2017-12-17T23:57:42.511203+00:00 node-103 kernel: [974714.901566]  
> [<ffffffffc08a0045>] ocfs2_file_write_iter+0x225/0xdf0 [ocfs2]
>         2017-12-17T23:57:42.511204+00:00 node-103 kernel: [974714.901569]  
> [<ffffffff812252c0>] ? poll_select_copy_remaining+0x140/0x140
>         2017-12-17T23:57:42.511204+00:00 node-103 kernel: [974714.901572]  
> [<ffffffff81349a6d>] ? security_file_permission+0x3d/0xc0
>         2017-12-17T23:57:42.511205+00:00 node-103 kernel: [974714.901583]  
> [<ffffffffc089fe20>] ? ocfs2_check_range_for_refcount+0x150/0x150 [ocfs2]
>         2017-12-17T23:57:42.511207+00:00 node-103 kernel: [974714.901587]  
> [<ffffffff812613ea>] aio_run_iocb+0x26a/0x2d0
>         2017-12-17T23:57:42.511208+00:00 node-103 kernel: [974714.901590]  
> [<ffffffff8122e8e5>] ? __fget_light+0x25/0x60
>         2017-12-17T23:57:42.511209+00:00 node-103 kernel: [974714.901591]  
> [<ffffffff8122e933>] ? __fdget+0x13/0x20
>         2017-12-17T23:57:42.511210+00:00 node-103 kernel: [974714.901593]  
> [<ffffffff812622cf>] do_io_submit+0x25f/0x500
>         2017-12-17T23:57:42.511227+00:00 node-103 kernel: [974714.901595]  
> [<ffffffff81262580>] SyS_io_submit+0x10/0x20
>         2017-12-17T23:57:42.511229+00:00 node-103 kernel: [974714.901598]  
> [<ffffffff818446b2>] entry_SYSCALL_64_fastpath+0x16/0x71
>         2017-12-17T23:57:42.511233+00:00 node-103 kernel: [974714.901609] 
> qemu-img        D ffff880f19ec7948     0 40743   5019 0x00000000
>         2017-12-17T23:57:42.511233+00:00 node-103 kernel: [974714.901610]  
> ffff880f19ec7948 ffff882033fff060 ffff882038f3f000 ffff880b39739c00
>         2017-12-17T23:57:42.511235+00:00 node-103 kernel: [974714.901612]  
> ffff880f19ec8000 ffff880f19ec7af0 ffff880f19ec7ae8 ffff880b39739c00
>         2017-12-17T23:57:42.511236+00:00 node-103 kernel: [974714.901613]  
> 0000000000000004 ffff880f19ec7960 ffffffff81840585 7fffffffffffffff
>         2017-12-17T23:57:42.511237+00:00 node-103 kernel: [974714.901615] 
> Call Trace:
>         2017-12-17T23:57:42.511238+00:00 node-103 kernel: [974714.901617]  
> [<ffffffff81840585>] schedule+0x35/0x80
>         2017-12-17T23:57:42.511238+00:00 node-103 kernel: [974714.901618]  
> [<ffffffff818436d5>] schedule_timeout+0x1b5/0x270
>         2017-12-17T23:57:42.511239+00:00 node-103 kernel: [974714.901620]  
> [<ffffffff8183fed6>] ? __schedule+0x3b6/0xa30
>         2017-12-17T23:57:42.511240+00:00 node-103 kernel: [974714.901622]  
> [<ffffffff81840fe3>] wait_for_completion+0xb3/0x140
>         2017-12-17T23:57:42.511242+00:00 node-103 kernel: [974714.901623]  
> [<ffffffff810ac630>] ? wake_up_q+0x70/0x70
>         2017-12-17T23:57:42.511243+00:00 node-103 kernel: [974714.901636]  
> [<ffffffffc0896145>] __ocfs2_cluster_lock.isra.34+0x415/0x750 [ocfs2]
>         2017-12-17T23:57:42.511243+00:00 node-103 kernel: [974714.901648]  
> [<ffffffffc089720a>] ocfs2_inode_lock_full_nested+0x16a/0x920 [ocfs2]
>         2017-12-17T23:57:42.511244+00:00 node-103 kernel: [974714.901659]  
> [<ffffffffc0898d6e>] ? ocfs2_extent_map_trunc+0x10e/0x150 [ocfs2]
>         2017-12-17T23:57:42.511244+00:00 node-103 kernel: [974714.901685]  
> [<ffffffffc08f9b32>] ocfs2_iop_get_acl+0x52/0x100 [ocfs2]
>         2017-12-17T23:57:42.511246+00:00 node-103 kernel: [974714.901687]  
> [<ffffffff812730f1>] get_acl+0x41/0x60
>         2017-12-17T23:57:42.511247+00:00 node-103 kernel: [974714.901690]  
> [<ffffffff8121aeab>] generic_permission+0x13b/0x190
>         2017-12-17T23:57:42.511248+00:00 node-103 kernel: [974714.901701]  
> [<ffffffffc089aeea>] ocfs2_permission+0xca/0xe0 [ocfs2]
>         2017-12-17T23:57:42.511249+00:00 node-103 kernel: [974714.901703]  
> [<ffffffff8121af77>] __inode_permission+0x77/0xc0
>         2017-12-17T23:57:42.511249+00:00 node-103 kernel: [974714.901704]  
> [<ffffffff8121afd4>] inode_permission+0x14/0x50
>         2017-12-17T23:57:42.511250+00:00 node-103 kernel: [974714.901706]  
> [<ffffffff8121b0fb>] may_open+0x5b/0xf0
>         2017-12-17T23:57:42.511252+00:00 node-103 kernel: [974714.901707]  
> [<ffffffff8121efe8>] path_openat+0x188/0x1330
>         2017-12-17T23:57:42.511253+00:00 node-103 kernel: [974714.901708]  
> [<ffffffff81221381>] do_filp_open+0x91/0x100
>         2017-12-17T23:57:42.511254+00:00 node-103 kernel: [974714.901710]  
> [<ffffffff8122edb6>] ? __alloc_fd+0x46/0x190
>         2017-12-17T23:57:42.511257+00:00 node-103 kernel: [974714.901712]  
> [<ffffffff8120f738>] do_sys_open+0x138/0x2a0
>         2017-12-17T23:57:42.511257+00:00 node-103 kernel: [974714.901714]  
> [<ffffffff8106b594>] ? __do_page_fault+0x1b4/0x400
>         2017-12-17T23:57:42.511258+00:00 node-103 kernel: [974714.901715]  
> [<ffffffff8120f8be>] SyS_open+0x1e/0x20
>         2017-12-17T23:57:42.511260+00:00 node-103 kernel: [974714.901717]  
> [<ffffffff818446b2>] entry_SYSCALL_64_fastpath+0x16/0x71
>         2017-12-17T23:59:42.511080+00:00 node-103 kernel: [974834.910524] 
> qemu-system-x86 D ffff880ef621b9c8     0 26593      1 0x00000000
>         2017-12-17T23:59:42.511080+00:00 node-103 kernel: [974834.910528]  
> ffff880ef621b9c8 ffff880ef621b9b0 ffff882038edb800 ffff88102c102a00
>         2017-12-17T23:59:42.511081+00:00 node-103 kernel: [974834.910529]  
> ffff880ef621c000 ffff880ef621bb70 ffff880ef621bb68 ffff88102c102a00
>         2017-12-17T23:59:42.511083+00:00 node-103 kernel: [974834.910531]  
> 0000000000000004 ffff880ef621b9e0 ffffffff81840585 7fffffffffffffff
>         2017-12-17T23:59:42.511084+00:00 node-103 kernel: [974834.910533] 
> Call Trace:
>         2017-12-17T23:59:42.511085+00:00 node-103 kernel: [974834.910540]  
> [<ffffffff81840585>] schedule+0x35/0x80
>         2017-12-17T23:59:42.511086+00:00 node-103 kernel: [974834.910543]  
> [<ffffffff818436d5>] schedule_timeout+0x1b5/0x270
>         2017-12-17T23:59:42.511086+00:00 node-103 kernel: [974834.910547]  
> [<ffffffff810ac642>] ? default_wake_function+0x12/0x20
>         2017-12-17T23:59:42.511087+00:00 node-103 kernel: [974834.910551]  
> [<ffffffff810c4422>] ? autoremove_wake_function+0x12/0x40
>         2017-12-17T23:59:42.511089+00:00 node-103 kernel: [974834.910553]  
> [<ffffffff810c3d52>] ? __wake_up_common+0x52/0x90
>         2017-12-17T23:59:42.511090+00:00 node-103 kernel: [974834.910555]  
> [<ffffffff81840fe3>] wait_for_completion+0xb3/0x140
>         2017-12-17T23:59:42.511091+00:00 node-103 kernel: [974834.910557]  
> [<ffffffff810ac630>] ? wake_up_q+0x70/0x70
>         2017-12-17T23:59:42.511091+00:00 node-103 kernel: [974834.910594]  
> [<ffffffffc0896145>] __ocfs2_cluster_lock.isra.34+0x415/0x750 [ocfs2]
>         2017-12-17T23:59:42.511092+00:00 node-103 kernel: [974834.910596]  
> [<ffffffff810f634b>] ? ktime_get+0x3b/0xb0
>         2017-12-17T23:59:42.511093+00:00 node-103 kernel: [974834.910609]  
> [<ffffffffc089720a>] ocfs2_inode_lock_full_nested+0x16a/0x920 [ocfs2]
>         2017-12-17T23:59:42.511095+00:00 node-103 kernel: [974834.910633]  
> [<ffffffffc089fe20>] ? ocfs2_check_range_for_refcount+0x150/0x150 [ocfs2]
>         2017-12-17T23:59:42.511096+00:00 node-103 kernel: [974834.910644]  
> [<ffffffffc08a0045>] ocfs2_file_write_iter+0x225/0xdf0 [ocfs2]
>         2017-12-17T23:59:42.511096+00:00 node-103 kernel: [974834.910647]  
> [<ffffffff812252c0>] ? poll_select_copy_remaining+0x140/0x140
>         2017-12-17T23:59:42.511097+00:00 node-103 kernel: [974834.910649]  
> [<ffffffff81349a6d>] ? security_file_permission+0x3d/0xc0
>         2017-12-17T23:59:42.511098+00:00 node-103 kernel: [974834.910660]  
> [<ffffffffc089fe20>] ? ocfs2_check_range_for_refcount+0x150/0x150 [ocfs2]
>         2017-12-17T23:59:42.511129+00:00 node-103 kernel: [974834.910663]  
> [<ffffffff812613ea>] aio_run_iocb+0x26a/0x2d0
>         2017-12-17T23:59:42.511133+00:00 node-103 kernel: [974834.910665]  
> [<ffffffff8122e8e5>] ? __fget_light+0x25/0x60
>         2017-12-17T23:59:42.511135+00:00 node-103 kernel: [974834.910666]  
> [<ffffffff8122e933>] ? __fdget+0x13/0x20
>         2017-12-17T23:59:42.511137+00:00 node-103 kernel: [974834.910668]  
> [<ffffffff812622cf>] do_io_submit+0x25f/0x500
>         2017-12-17T23:59:42.511154+00:00 node-103 kernel: [974834.910670]  
> [<ffffffff81262580>] SyS_io_submit+0x10/0x20
>         2017-12-17T23:59:42.511156+00:00 node-103 kernel: [974834.910672]  
> [<ffffffff818446b2>] entry_SYSCALL_64_fastpath+0x16/0x71
>         2017-12-17T23:59:42.511161+00:00 node-103 kernel: [974834.910686] 
> qemu-img        D ffff880f19ec7948     0 40743   5019 0x00000000
>         2017-12-17T23:59:42.511162+00:00 node-103 kernel: [974834.910688]  
> ffff880f19ec7948 ffff882033fff060 ffff882038f3f000 ffff880b39739c00
>         2017-12-17T23:59:42.511163+00:00 node-103 kernel: [974834.910689]  
> ffff880f19ec8000 ffff880f19ec7af0 ffff880f19ec7ae8 ffff880b39739c00
>         2017-12-17T23:59:42.511164+00:00 node-103 kernel: [974834.910691]  
> 0000000000000004 ffff880f19ec7960 ffffffff81840585 7fffffffffffffff
>         2017-12-17T23:59:42.511165+00:00 node-103 kernel: [974834.910692] 
> Call Trace:
>         2017-12-17T23:59:42.511166+00:00 node-103 kernel: [974834.910694]  
> [<ffffffff81840585>] schedule+0x35/0x80
>         2017-12-17T23:59:42.511167+00:00 node-103 kernel: [974834.910696]  
> [<ffffffff818436d5>] schedule_timeout+0x1b5/0x270
>         2017-12-17T23:59:42.511167+00:00 node-103 kernel: [974834.910697]  
> [<ffffffff8183fed6>] ? __schedule+0x3b6/0xa30
>         2017-12-17T23:59:42.511168+00:00 node-103 kernel: [974834.910699]  
> [<ffffffff81840fe3>] wait_for_completion+0xb3/0x140
>         2017-12-17T23:59:42.511170+00:00 node-103 kernel: [974834.910700]  
> [<ffffffff810ac630>] ? wake_up_q+0x70/0x70
>         2017-12-17T23:59:42.511171+00:00 node-103 kernel: [974834.910712]  
> [<ffffffffc0896145>] __ocfs2_cluster_lock.isra.34+0x415/0x750 [ocfs2]
>         2017-12-17T23:59:42.511172+00:00 node-103 kernel: [974834.910722]  
> [<ffffffffc089720a>] ocfs2_inode_lock_full_nested+0x16a/0x920 [ocfs2]
>         2017-12-17T23:59:42.511172+00:00 node-103 kernel: [974834.910733]  
> [<ffffffffc0898d6e>] ? ocfs2_extent_map_trunc+0x10e/0x150 [ocfs2]
>         2017-12-17T23:59:42.511173+00:00 node-103 kernel: [974834.910748]  
> [<ffffffffc08f9b32>] ocfs2_iop_get_acl+0x52/0x100 [ocfs2]
>         2017-12-17T23:59:42.511174+00:00 node-103 kernel: [974834.910751]  
> [<ffffffff812730f1>] get_acl+0x41/0x60
>         2017-12-17T23:59:42.511176+00:00 node-103 kernel: [974834.910753]  
> [<ffffffff8121aeab>] generic_permission+0x13b/0x190
>         2017-12-17T23:59:42.511177+00:00 node-103 kernel: [974834.910777]  
> [<ffffffffc089aeea>] ocfs2_permission+0xca/0xe0 [ocfs2]
>         2017-12-17T23:59:42.511178+00:00 node-103 kernel: [974834.910778]  
> [<ffffffff8121af77>] __inode_permission+0x77/0xc0
>         2017-12-17T23:59:42.511179+00:00 node-103 kernel: [974834.910780]  
> [<ffffffff8121afd4>] inode_permission+0x14/0x50
>         2017-12-17T23:59:42.511179+00:00 node-103 kernel: [974834.910782]  
> [<ffffffff8121b0fb>] may_open+0x5b/0xf0
>         2017-12-17T23:59:42.511180+00:00 node-103 kernel: [974834.910783]  
> [<ffffffff8121efe8>] path_openat+0x188/0x1330
>         2017-12-17T23:59:42.511182+00:00 node-103 kernel: [974834.910785]  
> [<ffffffff81221381>] do_filp_open+0x91/0x100
>         2017-12-17T23:59:42.511183+00:00 node-103 kernel: [974834.910786]  
> [<ffffffff8122edb6>] ? __alloc_fd+0x46/0x190
>         2017-12-17T23:59:42.511185+00:00 node-103 kernel: [974834.910789]  
> [<ffffffff8120f738>] do_sys_open+0x138/0x2a0
>         2017-12-17T23:59:42.511186+00:00 node-103 kernel: [974834.910791]  
> [<ffffffff8106b594>] ? __do_page_fault+0x1b4/0x400
>         2017-12-17T23:59:42.511187+00:00 node-103 kernel: [974834.910793]  
> [<ffffffff8120f8be>] SyS_open+0x1e/0x20
>         2017-12-17T23:59:42.511188+00:00 node-103 kernel: [974834.910795]  
> [<ffffffff818446b2>] entry_SYSCALL_64_fastpath+0x16/0x71
>         2017-12-18T00:00:01.271777+00:00 node-103 kernel: [974853.675776] 
> Process accounting resumed
>         2017-12-18T00:01:42.511127+00:00 node-103 kernel: [974954.919618] 
> qemu-system-x86 D ffff880ef621b9c8     0 26593      1 0x00000000
>         2017-12-18T00:01:42.511128+00:00 node-103 kernel: [974954.919621]  
> ffff880ef621b9c8 ffff880ef621b9b0 ffff882038edb800 ffff88102c102a00
>         2017-12-18T00:01:42.511128+00:00 node-103 kernel: [974954.919623]  
> ffff880ef621c000 ffff880ef621bb70 ffff880ef621bb68 ffff88102c102a00
>         2017-12-18T00:01:42.511130+00:00 node-103 kernel: [974954.919625]  
> 0000000000000004 ffff880ef621b9e0 ffffffff81840585 7fffffffffffffff
>         2017-12-18T00:01:42.511131+00:00 node-103 kernel: [974954.919627] 
> Call Trace:
>         2017-12-18T00:01:42.511132+00:00 node-103 kernel: [974954.919634]  
> [<ffffffff81840585>] schedule+0x35/0x80
>         2017-12-18T00:01:42.511133+00:00 node-103 kernel: [974954.919638]  
> [<ffffffff818436d5>] schedule_timeout+0x1b5/0x270
>         2017-12-18T00:01:42.511134+00:00 node-103 kernel: [974954.919643]  
> [<ffffffff810ac642>] ? default_wake_function+0x12/0x20
>         2017-12-18T00:01:42.511134+00:00 node-103 kernel: [974954.919647]  
> [<ffffffff810c4422>] ? autoremove_wake_function+0x12/0x40
>         2017-12-18T00:01:42.511136+00:00 node-103 kernel: [974954.919649]  
> [<ffffffff810c3d52>] ? __wake_up_common+0x52/0x90
>         2017-12-18T00:01:42.511138+00:00 node-103 kernel: [974954.919651]  
> [<ffffffff81840fe3>] wait_for_completion+0xb3/0x140
>         2017-12-18T00:01:42.511138+00:00 node-103 kernel: [974954.919653]  
> [<ffffffff810ac630>] ? wake_up_q+0x70/0x70
>         2017-12-18T00:01:42.511139+00:00 node-103 kernel: [974954.919702]  
> [<ffffffffc0896145>] __ocfs2_cluster_lock.isra.34+0x415/0x750 [ocfs2]
>         2017-12-18T00:01:42.511139+00:00 node-103 kernel: [974954.919705]  
> [<ffffffff810f634b>] ? ktime_get+0x3b/0xb0
>         2017-12-18T00:01:42.511141+00:00 node-103 kernel: [974954.919719]  
> [<ffffffffc089720a>] ocfs2_inode_lock_full_nested+0x16a/0x920 [ocfs2]
>         2017-12-18T00:01:42.511142+00:00 node-103 kernel: [974954.919732]  
> [<ffffffffc089fe20>] ? ocfs2_check_range_for_refcount+0x150/0x150 [ocfs2]
>         2017-12-18T00:01:42.511143+00:00 node-103 kernel: [974954.919744]  
> [<ffffffffc08a0045>] ocfs2_file_write_iter+0x225/0xdf0 [ocfs2]
>         2017-12-18T00:01:42.511144+00:00 node-103 kernel: [974954.919746]  
> [<ffffffff812252c0>] ? poll_select_copy_remaining+0x140/0x140
>         2017-12-18T00:01:42.511145+00:00 node-103 kernel: [974954.919749]  
> [<ffffffff81349a6d>] ? security_file_permission+0x3d/0xc0
>         2017-12-18T00:01:42.511176+00:00 node-103 kernel: [974954.919761]  
> [<ffffffffc089fe20>] ? ocfs2_check_range_for_refcount+0x150/0x150 [ocfs2]
>         2017-12-18T00:01:42.511181+00:00 node-103 kernel: [974954.919764]  
> [<ffffffff812613ea>] aio_run_iocb+0x26a/0x2d0
>         2017-12-18T00:01:42.511182+00:00 node-103 kernel: [974954.919766]  
> [<ffffffff8122e8e5>] ? __fget_light+0x25/0x60
>         2017-12-18T00:01:42.511184+00:00 node-103 kernel: [974954.919767]  
> [<ffffffff8122e933>] ? __fdget+0x13/0x20
>         2017-12-18T00:01:42.511185+00:00 node-103 kernel: [974954.919769]  
> [<ffffffff812622cf>] do_io_submit+0x25f/0x500
>         2017-12-18T00:01:42.511203+00:00 node-103 kernel: [974954.919771]  
> [<ffffffff81262580>] SyS_io_submit+0x10/0x20
>         2017-12-18T00:01:42.511205+00:00 node-103 kernel: [974954.919773]  
> [<ffffffff818446b2>] entry_SYSCALL_64_fastpath+0x16/0x71
>         2017-12-18T00:01:42.511209+00:00 node-103 kernel: [974954.919786] 
> qemu-img        D ffff880f19ec7948     0 40743   5019 0x00000000
>         2017-12-18T00:01:42.511210+00:00 node-103 kernel: [974954.919788]  
> ffff880f19ec7948 ffff882033fff060 ffff882038f3f000 ffff880b39739c00
>         2017-12-18T00:01:42.511211+00:00 node-103 kernel: [974954.919789]  
> ffff880f19ec8000 ffff880f19ec7af0 ffff880f19ec7ae8 ffff880b39739c00
>         2017-12-18T00:01:42.511212+00:00 node-103 kernel: [974954.919791]  
> 0000000000000004 ffff880f19ec7960 ffffffff81840585 7fffffffffffffff
>         2017-12-18T00:01:42.511213+00:00 node-103 kernel: [974954.919792] 
> Call Trace:
>         2017-12-18T00:01:42.511215+00:00 node-103 kernel: [974954.919794]  
> [<ffffffff81840585>] schedule+0x35/0x80
>         2017-12-18T00:01:42.511215+00:00 node-103 kernel: [974954.919795]  
> [<ffffffff818436d5>] schedule_timeout+0x1b5/0x270
>         2017-12-18T00:01:42.511216+00:00 node-103 kernel: [974954.919797]  
> [<ffffffff8183fed6>] ? __schedule+0x3b6/0xa30
>         2017-12-18T00:01:42.511217+00:00 node-103 kernel: [974954.919799]  
> [<ffffffff81840fe3>] wait_for_completion+0xb3/0x140
>         2017-12-18T00:01:42.511218+00:00 node-103 kernel: [974954.919801]  
> [<ffffffff810ac630>] ? wake_up_q+0x70/0x70
>         2017-12-18T00:01:42.511220+00:00 node-103 kernel: [974954.919826]  
> [<ffffffffc0896145>] __ocfs2_cluster_lock.isra.34+0x415/0x750 [ocfs2]
>         2017-12-18T00:01:42.511220+00:00 node-103 kernel: [974954.919838]  
> [<ffffffffc089720a>] ocfs2_inode_lock_full_nested+0x16a/0x920 [ocfs2]
>         2017-12-18T00:01:42.511221+00:00 node-103 kernel: [974954.919850]  
> [<ffffffffc0898d6e>] ? ocfs2_extent_map_trunc+0x10e/0x150 [ocfs2]
>         2017-12-18T00:01:42.511222+00:00 node-103 kernel: [974954.919866]  
> [<ffffffffc08f9b32>] ocfs2_iop_get_acl+0x52/0x100 [ocfs2]
>         2017-12-18T00:01:42.511223+00:00 node-103 kernel: [974954.919869]  
> [<ffffffff812730f1>] get_acl+0x41/0x60
>         2017-12-18T00:01:42.511224+00:00 node-103 kernel: [974954.919872]  
> [<ffffffff8121aeab>] generic_permission+0x13b/0x190
>         2017-12-18T00:01:42.511226+00:00 node-103 kernel: [974954.919895]  
> [<ffffffffc089aeea>] ocfs2_permission+0xca/0xe0 [ocfs2]
>         2017-12-18T00:01:42.511226+00:00 node-103 kernel: [974954.919897]  
> [<ffffffff8121af77>] __inode_permission+0x77/0xc0
>         2017-12-18T00:01:42.511227+00:00 node-103 kernel: [974954.919898]  
> [<ffffffff8121afd4>] inode_permission+0x14/0x50
>         2017-12-18T00:01:42.511228+00:00 node-103 kernel: [974954.919900]  
> [<ffffffff8121b0fb>] may_open+0x5b/0xf0
>         2017-12-18T00:01:42.511229+00:00 node-103 kernel: [974954.919901]  
> [<ffffffff8121efe8>] path_openat+0x188/0x1330
>         2017-12-18T00:01:42.511231+00:00 node-103 kernel: [974954.919903]  
> [<ffffffff81221381>] do_filp_open+0x91/0x100
>         2017-12-18T00:01:42.511232+00:00 node-103 kernel: [974954.919904]  
> [<ffffffff8122edb6>] ? __alloc_fd+0x46/0x190
>         2017-12-18T00:01:42.511235+00:00 node-103 kernel: [974954.919907]  
> [<ffffffff8120f738>] do_sys_open+0x138/0x2a0
>         2017-12-18T00:01:42.511235+00:00 node-103 kernel: [974954.919909]  
> [<ffffffff8106b594>] ? __do_page_fault+0x1b4/0x400
>         2017-12-18T00:01:42.511236+00:00 node-103 kernel: [974954.919910]  
> [<ffffffff8120f8be>] SyS_open+0x1e/0x20
>         2017-12-18T00:01:42.511238+00:00 node-103 kernel: [974954.919912]  
> [<ffffffff818446b2>] entry_SYSCALL_64_fastpath+0x16/0x71
> 
> 
>         -- Jim
> 
>         On Wed, Dec 27, 2017 at 8:03 PM, Changwei Ge <ge.chang...@h3c.com> wrote:
> 
>             On 2017/12/28 3:02, Jim Okken wrote:
>              > Peter,
>              >
>              > I did not want to flood my first email with details and make
>              > it three pages long. I will gladly provide more details. First,
>              > I'd like to ask that you be less condescending. You have no idea
>              > of the journey I took toward using ocfs2 in this environment, or
>              > of the requirements I needed to meet.
>              > You were amazed and astonished by my question, and I was
>              > amazed and astonished by your answer.
>              >
>              > let's start over:
>              > If ocfs2 isn't the right solution for what I'm doing, I can
>              > admit that and move off of it.
>              > If OpenStack and perhaps newer kernels do not necessarily work
>              > with ocfs2, I can admit that too, and move off of it.
>              > I had high hopes it was the right solution, and at first it
>              > did the job.
>              >
>              > I have a healthy HP MSA 2040 storage appliance connected via
>              > fiber channel. It has a 7TB storage volume on a fiber channel
>              > LUN. From what I know, I need a shared-storage filesystem so that
>              > each of my client systems, also on the fiber channel network, can
>              > access this storage simultaneously without corrupting data (I
>              > need file locking). This HP MSA is healthy and stable. This isn't
>              > exactly local storage, I know, but each client system sees this
>              > MSA storage volume as a local drive, i.e. /dev/sdb
>              >
>              > what could cause a "lost" wakeup from the OCFS2 lock manager?
> 
>             Hi Jim,
>             Did a node crash or lose power before the stuck stack was found?
>             And is the stuck stack the only one you can find in your kernel log?
> 
>             Thanks,
>             Changwei
> 
>              >
>              > Ubuntu has ocfs2 packages in its repos, so I hope it has some
>              > level of support in its OSs and distributed kernels...
>              > I am not well versed in storage concepts, but I'll surprise
>              > you: today my employer (who signs my paycheck) asks me, and
>              > tasks me, with making this storage solution work better.
>              >
>              > Please let me know if I can provide more details, and please
>              > let me know any further comments.
>              >
>              > thanks!
>              >
>              > -- Jim
>              >
>              > On Wed, Dec 27, 2017 at 1:16 PM, Peter Grandi <p...@ocfs.list.sabi.co.uk> wrote:
>              >
>              >      > I have an ocfs2 filesystem set up as a shared filesystem
>              >      > between 12 openstack compute nodes which are Ubuntu 16.04.3.
>              >
>              >     I am amazed by how unconstrained are the imaginations of
>              >     some other people. That is a truly astonishing setup.
>              >
>              >      > I have a very big concern about stability. A month ago I
>              >      > lost a good deal of files. I don't know the real reason, but
>              >      > things seemed to point to the ocfs2 cluster.
>              >
>              >     That also seems to me unconstrained by concern about mere
>              >     details.
>              >
>              >      > Last week I found many of my compute nodes with the nova
>              >      > service down. The node which went down first has a "stuck"
>              >      > file/directory in the ocfs2 filesystem [ ... ]
>              >
>              >     The stack trace seems to point at a "lost" wakeup from the
>              >     OCFS2 lock manager.
>              >
>              >      > I have other openstack compute nodes that are identical
>              >      > except they use local storage and do not use ocfs2 and these
>              >      > have always been stable.
>              >
>              >     But OCFS2 is meant to work with local physical storage on a
>              >     local physical machine. What's your current setup?
>              >
>              >      > maybe ocfs2 just isn't stable on Ubuntu 16.04.3? I am
>              >      > using version 1.6.4-3.1
>              >
>              >     OCFS2 has been extremely stable for many years on very
>              >     high load shared-disk clusters for many users. OpenStack and
>              >     perhaps newer kernels not necessarily so.
>              >
>              >     Also OCFS2 requires a storage subsystem with specific
>              >     features and a high degree of reliable operation. It is
>              >     astonishing but fairly typical that this report contains no
>              >     mention of the setup or of the state of the storage subsystem.
>              >
>              >     _______________________________________________
>              >     Ocfs2-users mailing list
>              >     Ocfs2-users@oss.oracle.com
>              >     https://oss.oracle.com/mailman/listinfo/ocfs2-users
>              >
>              >
> 
> 
> 
> 


_______________________________________________
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-users
