Hi Jim,

From the log you provided, it seems that one node died. If I remember correctly, you are using kernel 4.9, which has a bug that causes the cluster to hang when a node dies.
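If you want to double-check which node the cluster considered dead, the o2net/o2cb/o2dlm lines in /var/log/messages can be pulled out with a few lines of Python. This is only a rough sketch: the function name `dead_node_events` and the regexes are my own, modelled on the messages quoted in this thread, and other kernel versions may word them differently.

```python
import re

# Patterns modelled on the o2net/o2cb/o2dlm messages quoted in this thread.
# These are assumptions, not an official list of message formats.
EVENT_PATTERNS = [
    ("disconnected",
     re.compile(r"o2net: No\s+longer connected to node \S+ \(num (?P<num>\d+)\)")),
    ("evicted",
     re.compile(r"o2cb: o2dlm\s+has evicted node (?P<num>\d+) from domain")),
    ("recovery_begin",
     re.compile(r"o2dlm: Begin\s+recovery on domain \S+ for node (?P<num>\d+)")),
]


def dead_node_events(log_text):
    """Map each node number to the kinds of 'node is gone' events seen for it."""
    found = {}
    for name, pattern in EVENT_PATTERNS:
        for match in pattern.finditer(log_text):
            found.setdefault(int(match.group("num")), []).append(name)
    return found


# Three lines taken from the quoted log, with the mail wrapping removed.
sample = """\
kernel: [250748.893160] o2net: No longer connected to node node-103 (num 1) at 10.20.243.43:7777
kernel: [250748.893192] o2cb: o2dlm has evicted node 1 from domain 83022C092E5E4625BD58E3C20E4E5D92
kernel: [250753.684410] o2dlm: Begin recovery on domain 83022C092E5E4625BD58E3C20E4E5D92 for node 1
"""

if __name__ == "__main__":
    print(dead_node_events(sample))
```

Running it over the whole messages file from each node (read the file into a string and pass it in) should show whether any node was evicted before the hang started, not only after the reboot.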
You can refer to a fix in kernel mainline.

commit 1c01967116a678fed8e2c68a6ab82abc8effeddc
Author: Changwei Ge <ge.chang...@h3c.com>
Date:   Wed Nov 15 17:31:33 2017 -0800

    ocfs2: fix cluster hang after a node dies

    When a node dies, other live nodes have to choose a new master for an
    existed lock resource mastered by the dead node.

    As for ocfs2/dlm implementation, this is done by function -
    dlm_move_lockres_to_recovery_list which marks those lock rsources as
    DLM_LOCK_RES_RECOVERING and manages them via a list from which DLM
    changes lock resource's master later.

    So without invoking dlm_move_lockres_to_recovery_list, no master will
    be choosed after dlm recovery accomplishment since no lock resource can
    be found through ::resource list.

    What's worse is that if DLM_LOCK_RES_RECOVERING is not marked for lock
    resources mastered a dead node, it will break up synchronization among
    nodes.

    So invoke dlm_move_lockres_to_recovery_list again.

    Fixs: 'commit ee8f7fcbe638 ("ocfs2/dlm: continue to purge recovery lockres when recovery master goes down")'
    Link: http://lkml.kernel.org/r/63ADC13FD55D6546B7DECE290D39E373CED6E0F9@H3CMLB14-EX.srv.huawei-3com.com
    Signed-off-by: Changwei Ge <ge.chang...@h3c.com>
    Reported-by: Vitaly Mayatskih <v.mayats...@gmail.com>
    Tested-by: Vitaly Mayatskikh <v.mayats...@gmail.com>
    Cc: Mark Fasheh <mfas...@versity.com>
    Cc: Joel Becker <jl...@evilplan.org>
    Cc: Junxiao Bi <junxiao...@oracle.com>
    Cc: Joseph Qi <jiangqi...@gmail.com>
    Cc: <sta...@vger.kernel.org>
    Signed-off-by: Andrew Morton <a...@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torva...@linux-foundation.org>

diff --git a/fs/ocfs2/dlm/dlmrecovery.c b/fs/ocfs2/dlm/dlmrecovery.c
index 74407c6..ec8f758 100644
--- a/fs/ocfs2/dlm/dlmrecovery.c
+++ b/fs/ocfs2/dlm/dlmrecovery.c
@@ -2419,6 +2419,7 @@ static void dlm_do_local_recovery_cleanup(struct dlm_ctxt *dlm, u8 dead_node)
 					dlm_lockres_put(res);
 					continue;
 				}
+				dlm_move_lockres_to_recovery_list(dlm, res);
 			} else if (res->owner == dlm->node_num) {
 				dlm_free_dead_locks(dlm, res, dead_node);
 				__dlm_lockres_calc_usage(dlm, res);

On 2018/1/6 6:31, Jim Okken wrote:
> hi again list,
>
> we saw a very similar issue again today with access to the ocfs2 cluster.
> please share any insight you might have with me on what might of happened
> (the cluster is 13 nodes large, cluster.conf is at the end of my email.)
>
> This time I found this in /var/log/messages on node-103, the only node that
> was heavily accessing the cluster overnight, it is from 4:40. I don't know
> how to read these traces. Is it related to ocfs2? I see it mentioned in the
> CPU 12 trace...
>
> 2018-01-05T04:40:53.555125+00:00 node-103 kernel: [632449.967312] Modules
> linked in: nf_conntrack_netlink xt_set ip_set_hash_net ip_set nfnetlink
> vhost_net vhost macvtap macvlan veth ip6table_raw xt_mac xt_tcpudp xt_physdev
> br_netfilter ebtable_filter ebtables openvswitch ocfs2 quota_tree ocfs2_dlmfs
> ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs
> ip6table_filter ip6_tables xt_multiport xt_conntrack iptable_filter
> xt_comment xt_CT iptable_raw ip_tables x_tables xfs bridge 8021q garp mrp stp
> llc intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp
> crct10dif_pclmul kvm_intel ipmi_ssif crc32_pclmul kvm ghash_clmulni_intel
> aesni_intel aes_x86_64 joydev hpilo input_leds lrw gf128mul irqbypass
> glue_helper ablk_helper cryptd ioatdma 8250_fintek sb_edac shpchp serio_raw
> ipmi_si edac_core acpi_power_meter ipmi_msghandler lpc_ich dca mac_hid
> ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp
> libiscsi_tcp libiscsi scsi_transport_iscsi
> nf_conntrack_proto_gre nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack_ipv4
> nf_defrag_ipv4 nf_conntrack autofs4 btrfs 
raid10 raid456 async_raid6_recov > async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 > multipath linear dm_round_robin ses enclosure scsi_transport_sas uas > usb_storage hid_generic usbhid hid psmouse lpfc be2net vxlan ip6_udp_tunnel > scsi_transport_fc udp_tunnel wmi fjes scsi_dh_emc scsi_dh_rdac scsi_dh_alua > dm_multipath > 2018-01-05T04:40:53.555140+00:00 node-103 kernel: [632449.969786] CPU: 4 PID: > 28 Comm: migration/4 Not tainted 4.4.0-98-generic #121-Ubuntu > 2018-01-05T04:40:53.555143+00:00 node-103 kernel: [632449.969916] Hardware > name: HP ProLiant BL460c Gen9, BIOS I36 02/17/2017 > 2018-01-05T04:40:53.555145+00:00 node-103 kernel: [632449.970049] task: > ffff881038ab7000 ti: ffff881038b2c000 task.ti: ffff881038b2c000 > 2018-01-05T04:40:53.555146+00:00 node-103 kernel: [632449.970050] RIP: > 0010:[<ffffffff8112161c>] [<ffffffff8112161c>] multi_cpu_stop+0x4c/0xe0 > 2018-01-05T04:40:53.555147+00:00 node-103 kernel: [632449.970320] RSP: > 0018:ffff881038b2fd98 EFLAGS: 00000246 > 2018-01-05T04:40:53.555149+00:00 node-103 kernel: [632449.970321] RAX: > ffffffff81a12200 RBX: 0000000000000001 RCX: 0000000000000000 > 2018-01-05T04:40:53.555171+00:00 node-103 kernel: [632449.970323] RDX: > 0000000000000001 RSI: 0000000000000286 RDI: ffff882036b2b6b0 > 2018-01-05T04:40:53.555175+00:00 node-103 kernel: [632449.970324] RBP: > ffff881038b2fdc0 R08: ffff881038b2c000 R09: 0000000000000000 > 2018-01-05T04:40:53.555177+00:00 node-103 kernel: [632449.970325] R10: > 0000000000000008 R11: ffff88102d2a1c00 R12: ffff882036b2b6b0 > 2018-01-05T04:40:53.555178+00:00 node-103 kernel: [632449.970327] R13: > 0000000000000286 R14: ffff882036b2b6d4 R15: ffff882036b2b600 > 2018-01-05T04:40:53.555180+00:00 node-103 kernel: [632449.970465] FS: > 0000000000000000(0000) GS:ffff88103f900000(0000) knlGS:0000000000000000 > 2018-01-05T04:40:53.555181+00:00 node-103 kernel: [632449.970467] CS: 0010 > DS: 0000 ES: 0000 CR0: 0000000080050033 > 
2018-01-05T04:40:53.555183+00:00 node-103 kernel: [632449.970604] CR2: > 00007f4d6a61c4f0 CR3: 0000000001e0a000 CR4: 00000000001426e0 > 2018-01-05T04:40:53.555185+00:00 node-103 kernel: [632449.970605] Stack: > 2018-01-05T04:40:53.555187+00:00 node-103 kernel: [632449.970736] > ffff88103f90f368 ffff88103f90f360 ffffffff811215d0 ffff882036b2b6b0 > 2018-01-05T04:40:53.555189+00:00 node-103 kernel: [632449.970738] > ffff882036b2b6d8 ffff881038b2fe88 ffffffff81121900 ffff88103f90f370 > 2018-01-05T04:40:53.555191+00:00 node-103 kernel: [632449.970876] > ffff881038ab7000 ffff88103f916e00 ffff881038b2fe20 ffffffff810a9d6e > 2018-01-05T04:40:53.555192+00:00 node-103 kernel: [632449.970878] Call Trace: > 2018-01-05T04:40:53.555194+00:00 node-103 kernel: [632449.970881] > [<ffffffff811215d0>] ? cpu_stop_queue_work+0x80/0x80 > 2018-01-05T04:40:53.555196+00:00 node-103 kernel: [632449.970883] > [<ffffffff81121900>] cpu_stopper_thread+0xb0/0x140 > 2018-01-05T04:40:53.555198+00:00 node-103 kernel: [632449.970886] > [<ffffffff810a9d6e>] ? finish_task_switch+0x17e/0x220 > 2018-01-05T04:40:53.555200+00:00 node-103 kernel: [632449.971019] > [<ffffffff8183fed6>] ? __schedule+0x3b6/0xa30 > 2018-01-05T04:40:53.555202+00:00 node-103 kernel: [632449.971023] > [<ffffffff810a3f20>] ? sort_range+0x30/0x30 > 2018-01-05T04:40:53.555203+00:00 node-103 kernel: [632449.971156] > [<ffffffff810a4025>] smpboot_thread_fn+0x105/0x160 > 2018-01-05T04:40:53.555206+00:00 node-103 kernel: [632449.971158] > [<ffffffff810a0c75>] kthread+0xe5/0x100 > 2018-01-05T04:40:53.555208+00:00 node-103 kernel: [632449.971159] > [<ffffffff810a0b90>] ? kthread_create_on_node+0x1e0/0x1e0 > 2018-01-05T04:40:53.555209+00:00 node-103 kernel: [632449.971162] > [<ffffffff81844a4f>] ret_from_fork+0x3f/0x70 > 2018-01-05T04:40:53.555211+00:00 node-103 kernel: [632449.971295] > [<ffffffff810a0b90>] ? 
kthread_create_on_node+0x1e0/0x1e0 > 2018-01-05T04:40:53.555212+00:00 node-103 kernel: [632449.971296] Code: 00 00 > 49 89 c5 48 8b 47 18 48 85 c0 0f 84 86 00 00 00 89 db 48 0f a3 18 19 db 85 db > 41 0f 95 c7 4d 8d 74 24 24 31 c9 31 d2 f3 90 <41> 8b 5c 24 20 39 da 74 1a 83 > fb 02 74 49 83 fb 03 75 05 45 84 > 2018-01-05T04:40:53.658730+00:00 node-103 kernel: [632450.074720] Modules > linked in: nf_conntrack_netlink xt_set ip_set_hash_net ip_set nfnetlink > vhost_net vhost macvtap macvlan veth ip6table_raw xt_mac xt_tcpudp xt_physdev > br_netfilter ebtable_filter ebtables openvswitch ocfs2 quota_tree ocfs2_dlmfs > ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs > ip6table_filter ip6_tables xt_multiport xt_conntrack iptable_filter > xt_comment xt_CT iptable_raw ip_tables x_tables xfs bridge 8021q garp mrp stp > llc intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp > crct10dif_pclmul kvm_intel ipmi_ssif crc32_pclmul kvm ghash_clmulni_intel > aesni_intel aes_x86_64 joydev hpilo input_leds lrw gf128mul irqbypass > glue_helper ablk_helper cryptd ioatdma 8250_fintek sb_edac shpchp serio_raw > ipmi_si edac_core acpi_power_meter ipmi_msghandler lpc_ich dca mac_hid > ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp > libiscsi_tcp libiscsi scsi_transport_iscsi > nf_conntrack_proto_gre nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack_ipv4 > nf_defrag_ipv4 nf_conntrack autofs4 btrfs raid10 raid456 async_raid6_recov > async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 > multipath linear dm_round_robin ses enclosure scsi_transport_sas uas > usb_storage hid_generic usbhid hid psmouse lpfc be2net vxlan ip6_udp_tunnel > scsi_transport_fc udp_tunnel wmi fjes scsi_dh_emc scsi_dh_rdac scsi_dh_alua > dm_multipath > 2018-01-05T04:40:53.658731+00:00 node-103 kernel: [632450.074776] CPU: 12 > PID: 25399 Comm: qemu-system-x86 Tainted: G L 4.4.0-98-generic > #121-Ubuntu > 2018-01-05T04:40:53.658732+00:00 node-103 kernel: 
[632450.074777] Hardware > name: HP ProLiant BL460c Gen9, BIOS I36 02/17/2017 > 2018-01-05T04:40:53.658733+00:00 node-103 kernel: [632450.074778] task: > ffff8820376d8000 ti: ffff880073f40000 task.ti: ffff880073f40000 > 2018-01-05T04:40:53.658748+00:00 node-103 kernel: [632450.074779] RIP: > 0010:[<ffffffff810cb27c>] [<ffffffff810cb27c>] > native_queued_spin_lock_slowpath+0x15c/0x170 > 2018-01-05T04:40:53.658750+00:00 node-103 kernel: [632450.074785] RSP: > 0018:ffff88203f083c30 EFLAGS: 00000202 > 2018-01-05T04:40:53.658750+00:00 node-103 kernel: [632450.074786] RAX: > 0000000000000101 RBX: ffff88201566ba30 RCX: 0000000000000001 > 2018-01-05T04:40:53.658763+00:00 node-103 kernel: [632450.074787] RDX: > 0000000000000101 RSI: 0000000000000001 RDI: ffff88201566ba2c > 2018-01-05T04:40:53.658764+00:00 node-103 kernel: [632450.074788] RBP: > ffff88203f083c30 R08: 0000000000000101 R09: ffffffff811924a7 > 2018-01-05T04:40:53.658765+00:00 node-103 kernel: [632450.074788] R10: > ffffea0080cff900 R11: 0000000000005600 R12: ffff88201566ba2c > 2018-01-05T04:40:53.658765+00:00 node-103 kernel: [632450.074789] R13: > 0000000000005600 R14: 0000000000a34000 R15: 0000000000005600 > 2018-01-05T04:40:53.658766+00:00 node-103 kernel: [632450.074791] FS: > 00007fa12aa41c00(0000) GS:ffff88203f080000(0000) knlGS:0000000000000000 > 2018-01-05T04:40:53.658766+00:00 node-103 kernel: [632450.074792] CS: 0010 > DS: 0000 ES: 0000 CR0: 0000000080050033 > 2018-01-05T04:40:53.658767+00:00 node-103 kernel: [632450.074792] CR2: > 00007f5bc811f000 CR3: 000000203449b000 CR4: 00000000001426e0 > 2018-01-05T04:40:53.658768+00:00 node-103 kernel: [632450.074793] Stack: > 2018-01-05T04:40:53.658768+00:00 node-103 kernel: [632450.074794] > ffff88203f083c40 ffffffff81844421 ffff88203f083c60 ffffffff81842535 > 2018-01-05T04:40:53.658769+00:00 node-103 kernel: [632450.074796] > ffff880fea63a000 ffff88201566baf0 ffff88203f083c70 ffffffff8184257b > 2018-01-05T04:40:53.658770+00:00 node-103 kernel: 
[632450.074797] > ffff88203f083ca0 ffffffffc08a258d ffff881f48984100 0000000000005600 > 2018-01-05T04:40:53.658770+00:00 node-103 kernel: [632450.074799] Call Trace: > 2018-01-05T04:40:53.658771+00:00 node-103 kernel: [632450.074800] <IRQ> > 2018-01-05T04:40:53.658771+00:00 node-103 kernel: [632450.074806] > [<ffffffff81844421>] _raw_spin_lock+0x21/0x30 > 2018-01-05T04:40:53.658772+00:00 node-103 kernel: [632450.074808] > [<ffffffff81842535>] __mutex_unlock_slowpath+0x25/0x50 > 2018-01-05T04:40:53.658773+00:00 node-103 kernel: [632450.074810] > [<ffffffff8184257b>] mutex_unlock+0x1b/0x20 > 2018-01-05T04:40:53.658773+00:00 node-103 kernel: [632450.074845] > [<ffffffffc08a258d>] ocfs2_dio_end_io+0x6d/0x80 [ocfs2] > 2018-01-05T04:40:53.658774+00:00 node-103 kernel: [632450.074849] > [<ffffffff8124e57c>] dio_complete+0x11c/0x1c0 > 2018-01-05T04:40:53.658774+00:00 node-103 kernel: [632450.074850] > [<ffffffff8124e693>] dio_bio_end_aio+0x73/0x100 > 2018-01-05T04:40:53.658775+00:00 node-103 kernel: [632450.074853] > [<ffffffff813c3edf>] bio_endio+0x3f/0x60 > 2018-01-05T04:40:53.658776+00:00 node-103 kernel: [632450.074856] > [<ffffffff813cb897>] blk_update_request+0x87/0x310 > 2018-01-05T04:40:53.658776+00:00 node-103 kernel: [632450.074859] > [<ffffffff816bbd66>] end_clone_bio+0x46/0x70 > 2018-01-05T04:40:53.658777+00:00 node-103 kernel: [632450.074861] > [<ffffffff813c3edf>] bio_endio+0x3f/0x60 > 2018-01-05T04:40:53.658778+00:00 node-103 kernel: [632450.074862] > [<ffffffff813cb897>] blk_update_request+0x87/0x310 > 2018-01-05T04:40:53.658780+00:00 node-103 kernel: [632450.074866] > [<ffffffff815c52f3>] scsi_end_request+0x33/0x1d0 > 2018-01-05T04:40:53.658782+00:00 node-103 kernel: [632450.074869] > [<ffffffff815c8a26>] scsi_io_completion+0x1b6/0x690 > 2018-01-05T04:40:53.658782+00:00 node-103 kernel: [632450.074873] > [<ffffffff810beb46>] ? 
rebalance_domains+0x166/0x2d0 > 2018-01-05T04:40:53.658783+00:00 node-103 kernel: [632450.074875] > [<ffffffff815bf64f>] scsi_finish_command+0xcf/0x120 > 2018-01-05T04:40:53.658783+00:00 node-103 kernel: [632450.074877] > [<ffffffff815c81b4>] scsi_softirq_done+0x124/0x150 > 2018-01-05T04:40:53.658791+00:00 node-103 kernel: [632450.074880] > [<ffffffff813d3787>] blk_done_softirq+0x87/0xb0 > 2018-01-05T04:40:53.658802+00:00 node-103 kernel: [632450.074885] > [<ffffffff81085dc1>] __do_softirq+0x101/0x290 > 2018-01-05T04:40:53.658804+00:00 node-103 kernel: [632450.074886] > [<ffffffff810860c3>] irq_exit+0xa3/0xb0 > 2018-01-05T04:40:53.658804+00:00 node-103 kernel: [632450.074890] > [<ffffffff81050e93>] smp_call_function_single_interrupt+0x33/0x40 > 2018-01-05T04:40:53.658805+00:00 node-103 kernel: [632450.074892] > [<ffffffff81845ae2>] call_function_single_interrupt+0x82/0x90 > 2018-01-05T04:40:53.658806+00:00 node-103 kernel: [632450.074893] <EOI> > 2018-01-05T04:40:53.658806+00:00 node-103 kernel: [632450.074895] > [<ffffffff8184245a>] ? __mutex_lock_slowpath+0xaa/0x130 > 2018-01-05T04:40:53.658808+00:00 node-103 kernel: [632450.074908] > [<ffffffffc08b9099>] ? ocfs2_inode_unlock+0x119/0x120 [ocfs2] > 2018-01-05T04:40:53.658809+00:00 node-103 kernel: [632450.074910] > [<ffffffff818424ff>] mutex_lock+0x1f/0x30 > 2018-01-05T04:40:53.658810+00:00 node-103 kernel: [632450.074922] > [<ffffffffc08c277a>] ocfs2_file_write_iter+0x95a/0xdf0 [ocfs2] > 2018-01-05T04:40:53.658811+00:00 node-103 kernel: [632450.074926] > [<ffffffff812252c0>] ? poll_select_copy_remaining+0x140/0x140 > 2018-01-05T04:40:53.658812+00:00 node-103 kernel: [632450.074937] > [<ffffffffc08c1e20>] ? ocfs2_check_range_for_refcount+0x150/0x150 [ocfs2] > 2018-01-05T04:40:53.658814+00:00 node-103 kernel: [632450.074941] > [<ffffffff812613ea>] aio_run_iocb+0x26a/0x2d0 > 2018-01-05T04:40:53.658815+00:00 node-103 kernel: [632450.074944] > [<ffffffff8122e8e5>] ? 
__fget_light+0x25/0x60
> 2018-01-05T04:40:53.658816+00:00 node-103 kernel: [632450.074945]
> [<ffffffff8122e933>] ? __fdget+0x13/0x20
> 2018-01-05T04:40:53.658817+00:00 node-103 kernel: [632450.074947]
> [<ffffffff812622cf>] do_io_submit+0x25f/0x500
> 2018-01-05T04:40:53.658817+00:00 node-103 kernel: [632450.074949]
> [<ffffffff81262580>] SyS_io_submit+0x10/0x20
> 2018-01-05T04:40:53.658818+00:00 node-103 kernel: [632450.074951]
> [<ffffffff818446b2>] entry_SYSCALL_64_fastpath+0x16/0x71
> 2018-01-05T04:40:53.658819+00:00 node-103 kernel: [632450.074952] Code: 01 48
> 8b 02 48 85 c0 75 0a f3 90 48 8b 02 48 85 c0 74 f6 c7 40 08 01 00 00 00 e9 63
> ff ff ff 83 fa 01 75 07 e9 c4 fe ff ff f3 90 <8b> 07 84 c0 75 f8 b8 01 00 00
> 00 66 89 07 5d c3 0f 1f 40 00 0f

This trace seems strange to me. It may need more investigation.

>
>
>
> Then later on as more nodes started to access the cluster, which is at
> 6:00ish, I see messages like these on all the nodes in the cluster.
>
>
> 2018-01-05T6:04:35.720570+00:00 node-115 kernel: [248734.731852] nova-compute
> D ffff882036c77888 0 4986 1 0x00000000
> 2018-01-05T6:04:35.720572+00:00 node-115 kernel: [248734.731856]
> ffff882036c77888 ffff88203f056e00 ffff882038ede200 ffff88102aca7000
> 2018-01-05T6:04:35.720576+00:00 node-115 kernel: [248734.731858]
> ffff882036c78000 ffff882036c77a30 ffff882036c77a28 ffff88102aca7000
> 2018-01-05T6:04:35.720579+00:00 node-115 kernel: [248734.731860]
> 0000000000000000 ffff882036c778a0 ffffffff81840585 7fffffffffffffff
> 2018-01-05T6:04:35.720581+00:00 node-115 kernel: [248734.731862] Call Trace:
> 2018-01-05T6:04:35.720583+00:00 node-115 kernel: [248734.731870]
> [<ffffffff81840585>] schedule+0x35/0x80
> 2018-01-05T6:04:35.720584+00:00 node-115 kernel: [248734.731874]
> [<ffffffff818436d5>] schedule_timeout+0x1b5/0x270
> 2018-01-05T6:04:35.720586+00:00 node-115 kernel: [248734.731878]
> [<ffffffff810a9d6e>] ? 
finish_task_switch+0x17e/0x220 > 2018-01-05T6:04:35.720589+00:00 node-115 kernel: [248734.731880] > [<ffffffff8183fed6>] ? __schedule+0x3b6/0xa30 > 2018-01-05T6:04:35.720591+00:00 node-115 kernel: [248734.731882] > [<ffffffff81840fe3>] wait_for_completion+0xb3/0x140 > 2018-01-05T6:04:35.720594+00:00 node-115 kernel: [248734.731885] > [<ffffffff810ac630>] ? wake_up_q+0x70/0x70 > 2018-01-05T6:04:35.720595+00:00 node-115 kernel: [248734.731932] > [<ffffffffc0769145>] __ocfs2_cluster_lock.isra.34+0x415/0x750 [ocfs2] > 2018-01-05T6:04:35.720597+00:00 node-115 kernel: [248734.731945] > [<ffffffffc07692fa>] ? __ocfs2_cluster_lock.isra.34+0x5ca/0x750 [ocfs2] > 2018-01-05T6:04:35.720613+00:00 node-115 kernel: [248734.731956] > [<ffffffffc076a20a>] ocfs2_inode_lock_full_nested+0x16a/0x920 [ocfs2] > 2018-01-05T6:04:35.720617+00:00 node-115 kernel: [248734.731969] > [<ffffffffc0784644>] ocfs2_lookup_lock_orphan_dir.constprop.28+0x74/0x160 > [ocfs2] > 2018-01-05T6:04:35.720619+00:00 node-115 kernel: [248734.731981] > [<ffffffffc0784782>] ocfs2_prepare_orphan_dir+0x52/0x270 [ocfs2] > 2018-01-05T6:04:35.720621+00:00 node-115 kernel: [248734.731992] > [<ffffffffc07864a7>] ocfs2_rename+0x1027/0x1a30 [ocfs2] > 2018-01-05T6:04:35.720622+00:00 node-115 kernel: [248734.732003] > [<ffffffffc07692fa>] ? __ocfs2_cluster_lock.isra.34+0x5ca/0x750 [ocfs2] > 2018-01-05T6:04:35.720624+00:00 node-115 kernel: [248734.732027] > [<ffffffffc076a3b0>] ? ocfs2_inode_lock_full_nested+0x310/0x920 [ocfs2] > 2018-01-05T6:04:35.720626+00:00 node-115 kernel: [248734.732050] > [<ffffffffc077bdff>] ? ocfs2_wait_for_recovery+0x2f/0xa0 [ocfs2] > 2018-01-05T6:04:35.720629+00:00 node-115 kernel: [248734.732054] > [<ffffffff8121afd4>] ? 
inode_permission+0x14/0x50 > 2018-01-05T6:04:35.720632+00:00 node-115 kernel: [248734.732056] > [<ffffffff8121e451>] vfs_rename+0x991/0x9d0 > 2018-01-05T6:04:35.720634+00:00 node-115 kernel: [248734.732058] > [<ffffffff81222fbf>] SyS_rename+0x39f/0x3c0 > 2018-01-05T6:04:35.720667+00:00 node-115 kernel: [248734.732060] > [<ffffffff818446b2>] entry_SYSCALL_64_fastpath+0x16/0x71 > 2018-01-05T6:04:35.720678+00:00 node-115 kernel: [248734.732097] > kworker/u80:0 D ffff881f2c337b68 0 6190 2 0x00000000 > 2018-01-05T6:04:35.720679+00:00 node-115 kernel: [248734.732111] Workqueue: > ocfs2_wq ocfs2_orphan_scan_work [ocfs2] > 2018-01-05T6:04:35.720681+00:00 node-115 kernel: [248734.732112] > ffff881f2c337b68 ffff881f2c337b30 ffff882038ede200 ffff881f13488000 > 2018-01-05T6:04:35.720682+00:00 node-115 kernel: [248734.732114] > ffff881f2c338000 ffff881f2c337d10 ffff881f2c337d08 ffff881f13488000 > 2018-01-05T6:04:35.720686+00:00 node-115 kernel: [248734.732115] > 0000000000000000 ffff881f2c337b80 ffffffff81840585 7fffffffffffffff > 2018-01-05T6:04:35.720688+00:00 node-115 kernel: [248734.732116] Call Trace: > 2018-01-05T6:04:35.720691+00:00 node-115 kernel: [248734.732118] > [<ffffffff81840585>] schedule+0x35/0x80 > 2018-01-05T6:04:35.720693+00:00 node-115 kernel: [248734.732119] > [<ffffffff818436d5>] schedule_timeout+0x1b5/0x270 > 2018-01-05T6:04:35.720694+00:00 node-115 kernel: [248734.732121] > [<ffffffff818441ee>] ? _raw_spin_unlock_bh+0x1e/0x20 > 2018-01-05T6:04:35.720696+00:00 node-115 kernel: [248734.732124] > [<ffffffff8171fd11>] ? release_sock+0x111/0x160 > 2018-01-05T6:04:35.720699+00:00 node-115 kernel: [248734.732125] > [<ffffffff81840fe3>] wait_for_completion+0xb3/0x140 > 2018-01-05T6:04:35.720701+00:00 node-115 kernel: [248734.732127] > [<ffffffff810ac630>] ? 
wake_up_q+0x70/0x70 > 2018-01-05T6:04:35.720703+00:00 node-115 kernel: [248734.732138] > [<ffffffffc0769145>] __ocfs2_cluster_lock.isra.34+0x415/0x750 [ocfs2] > 2018-01-05T6:04:35.720705+00:00 node-115 kernel: [248734.732140] > [<ffffffff810b5403>] ? update_curr+0xe3/0x160 > 2018-01-05T6:04:35.720706+00:00 node-115 kernel: [248734.732141] > [<ffffffff8171b5cd>] ? sock_recvmsg+0x3d/0x50 > 2018-01-05T6:04:35.720708+00:00 node-115 kernel: [248734.732151] > [<ffffffffc07698a5>] ocfs2_orphan_scan_lock+0x75/0xe0 [ocfs2] > 2018-01-05T6:04:35.720711+00:00 node-115 kernel: [248734.732161] > [<ffffffffc077a60f>] ocfs2_orphan_scan_work+0x6f/0x2e0 [ocfs2] > 2018-01-05T6:04:35.720714+00:00 node-115 kernel: [248734.732164] > [<ffffffff8109a635>] process_one_work+0x165/0x480 > 2018-01-05T6:04:35.720716+00:00 node-115 kernel: [248734.732165] > [<ffffffff8109a99b>] worker_thread+0x4b/0x4c0 > 2018-01-05T6:04:35.720717+00:00 node-115 kernel: [248734.732166] > [<ffffffff8109a950>] ? process_one_work+0x480/0x480 > 2018-01-05T6:04:35.720719+00:00 node-115 kernel: [248734.732168] > [<ffffffff810a0c75>] kthread+0xe5/0x100 > 2018-01-05T6:04:35.720720+00:00 node-115 kernel: [248734.732169] > [<ffffffff810a0b90>] ? kthread_create_on_node+0x1e0/0x1e0 > 2018-01-05T6:04:35.720724+00:00 node-115 kernel: [248734.732171] > [<ffffffff81844a4f>] ret_from_fork+0x3f/0x70 > 2018-01-05T6:04:35.720728+00:00 node-115 kernel: [248734.732172] > [<ffffffff810a0b90>] ? 
kthread_create_on_node+0x1e0/0x1e0 > 2018-01-05T6:10:35.720707+00:00 node-115 kernel: [249094.694942] > qemu-system-x86 D ffff881024e8b9d8 0 6663 1 0x00000000 > 2018-01-05T6:10:35.720709+00:00 node-115 kernel: [249094.694944] > ffff881024e8b9d8 0000000000000202 ffff882038f38000 ffff881022028000 > 2018-01-05T6:10:35.720711+00:00 node-115 kernel: [249094.694946] > ffff881024e8c000 ffff881024e8bb80 ffff881024e8bb78 ffff881022028000 > 2018-01-05T6:10:35.720712+00:00 node-115 kernel: [249094.694948] > 0000000000000000 ffff881024e8b9f0 ffffffff81840585 7fffffffffffffff > 2018-01-05T6:10:35.720714+00:00 node-115 kernel: [249094.694949] Call Trace: > 2018-01-05T6:10:35.720717+00:00 node-115 kernel: [249094.694951] > [<ffffffff81840585>] schedule+0x35/0x80 > 2018-01-05T6:10:35.720719+00:00 node-115 kernel: [249094.694953] > [<ffffffff818436d5>] schedule_timeout+0x1b5/0x270 > 2018-01-05T6:10:35.720721+00:00 node-115 kernel: [249094.694955] > [<ffffffff81840fe3>] wait_for_completion+0xb3/0x140 > 2018-01-05T6:10:35.720722+00:00 node-115 kernel: [249094.694957] > [<ffffffff810ac630>] ? wake_up_q+0x70/0x70 > 2018-01-05T6:10:35.720724+00:00 node-115 kernel: [249094.694985] > [<ffffffffc0769145>] __ocfs2_cluster_lock.isra.34+0x415/0x750 [ocfs2] > 2018-01-05T6:10:35.720726+00:00 node-115 kernel: [249094.694986] > [<ffffffff810a9d6e>] ? finish_task_switch+0x17e/0x220 > 2018-01-05T6:10:35.720728+00:00 node-115 kernel: [249094.694998] > [<ffffffffc076a20a>] ocfs2_inode_lock_full_nested+0x16a/0x920 [ocfs2] > 2018-01-05T6:10:35.720731+00:00 node-115 kernel: [249094.695003] > [<ffffffff813986d2>] ? aa_file_perm+0x142/0x3c0 > 2018-01-05T6:10:35.720732+00:00 node-115 kernel: [249094.695015] > [<ffffffffc076eef0>] ? ocfs2_dir_open+0x20/0x20 [ocfs2] > 2018-01-05T6:10:35.720733+00:00 node-115 kernel: [249094.695026] > [<ffffffffc076aa7a>] ocfs2_inode_lock_atime+0x3a/0x190 [ocfs2] > 2018-01-05T6:10:35.720735+00:00 node-115 kernel: [249094.695037] > [<ffffffffc0769521>] ? 
ocfs2_rw_lock+0xa1/0x170 [ocfs2]
> 2018-01-05T6:10:35.720737+00:00 node-115 kernel: [249094.695048]
> [<ffffffffc076ef5c>] ocfs2_file_read_iter+0x6c/0x330 [ocfs2]
> 2018-01-05T6:10:35.720740+00:00 node-115 kernel: [249094.695059]
> [<ffffffffc076eef0>] ? ocfs2_dir_open+0x20/0x20 [ocfs2]
> 2018-01-05T6:10:35.720742+00:00 node-115 kernel: [249094.695070]
> [<ffffffffc076eef0>] ? ocfs2_dir_open+0x20/0x20 [ocfs2]
> 2018-01-05T6:10:35.720744+00:00 node-115 kernel: [249094.695073]
> [<ffffffff812612b0>] aio_run_iocb+0x130/0x2d0
> 2018-01-05T6:10:35.720748+00:00 node-115 kernel: [249094.695077]
> [<ffffffff8122e933>] ? __fdget+0x13/0x20
> 2018-01-05T6:10:35.720750+00:00 node-115 kernel: [249094.695079]
> [<ffffffff812622cf>] do_io_submit+0x25f/0x500
> 2018-01-05T6:10:35.720781+00:00 node-115 kernel: [249094.695080]
> [<ffffffff81262580>] SyS_io_submit+0x10/0x20
> 2018-01-05T6:10:35.720784+00:00 node-115 kernel: [249094.695082]
> [<ffffffff818446b2>] entry_SYSCALL_64_fastpath+0x16/0x71
> rebooted node 103 (from above) at 6:37
> 2018-01-05T6:37:37.525550+00:00 node-115 kernel: [250716.332150] o2net:
> Connection to node node-103 (num 1) at 10.20.243.43:7777
> has been idle for 30.62 secs.
> 2018-01-05T6:38:07.604427+00:00 node-115 kernel: [250746.409068] o2net:
> Connection to node node-103 (num 1) at 10.20.243.43:7777
> has been idle for 30.80 secs.
> 2018-01-05T6:38:10.088603+00:00 node-115 kernel: [250748.893160] o2net: No
> longer connected to node node-103 (num 1) at 10.20.243.43:7777
> 2018-01-05T6:38:10.088616+00:00 node-115 kernel: [250748.893192] o2cb: o2dlm
> has evicted node 1 from domain 83022C092E5E4625BD58E3C20E4E5D92
> 2018-01-05T6:38:10.561008+00:00 node-115 kernel: [250749.367653] o2cb: o2dlm
> has evicted node 1 from domain 83022C092E5E4625BD58E3C20E4E5D92
> 2018-01-05T6:38:11.096451+00:00 node-115 kernel: [250749.900777] o2dlm:
> Waiting on the recovery of node 1 in domain 83022C092E5E4625BD58E3C20E4E5D92
> 2018-01-05T6:38:14.881250+00:00 node-115 kernel: [250753.684410] o2dlm: Begin
> recovery on domain 83022C092E5E4625BD58E3C20E4E5D92 for node 1
> 2018-01-05T6:38:14.881655+00:00 node-115 kernel: [250753.684414] o2dlm: Node
> 2 (he) is the Recovery Master for the dead node 1 in domain
> 83022C092E5E4625BD58E3C20E4E5D92
> 2018-01-05T6:38:14.881658+00:00 node-115 kernel: [250753.684415] o2dlm: End
> recovery on domain 83022C092E5E4625BD58E3C20E4E5D92
> 2018-01-05T6:38:16.585255+00:00 node-115 kernel: [250755.391444] ocfs2: Begin
> replay journal (node 1, slot 10) on device (252,0)
> 2018-01-05T6:38:19.460438+00:00 node-115 kernel: [250758.266976] ocfs2: End
> replay journal (node 1, slot 10) on device (252,0)
> 2018-01-05T6:38:19.489132+00:00 node-115 kernel: [250758.295509] ocfs2:
> Beginning quota recovery on device (252,0) for slot 10
>
>
>
> cluster:
>     node_count = 13
>     name = MSA
>
> node:
>     number = 1
>     cluster = MSA
>     ip_port = 7777
>     ip_address = 10.20.243.43
>     name = node-103
>
> node:
>     number = 2
>     cluster = MSA
>     ip_port = 7777
>     ip_address = 10.20.243.71
>     name = node-104
>
> node:
>     number = 3
>     cluster = MSA
> 
ip_port = 7777 > ip_address = 10.20.243.41 > name = node-113 > > node: > number = 4 > cluster = MSA > ip_port = 7777 > ip_address = 10.20.243.44 > name = node-114 > > node: > number = 5 > cluster = MSA > ip_port = 7777 > ip_address = 10.20.243.45 > name = node-115 > > node: > number = 6 > cluster = MSA > ip_port = 7777 > ip_address = 10.20.243.46 > name = node-116 > > node: > number = 7 > cluster = MSA > ip_port = 7777 > ip_address = 10.20.243.73 > name = node-120 > > node: > number = 8 > cluster = MSA > ip_port = 7777 > ip_address = 10.20.243.70 > name = node-99 > > node: > number = 9 > cluster = MSA > ip_port = 7777 > ip_address = 10.20.243.66 > name = node-122 > > node: > number = 10 > cluster = MSA > ip_port = 7777 > ip_address = 10.20.243.68 > name = node-123 > > node: > number = 11 > cluster = MSA > ip_port = 7777 > ip_address = 10.20.243.69 > name = node-124 > > node: > number = 12 > cluster = MSA > ip_port = 7777 > ip_address = 10.20.243.76 > name = node-125 > > node: > number = 13 > cluster = MSA > ip_port = 7777 > ip_address = 10.20.243.67 > name = node-126 > > > -- Jim > > On Tue, Jan 2, 2018 at 4:57 PM, Jim Okken <j...@jokken.com > <mailto:j...@jokken.com>> wrote: > > I just wanted to resend my last update to this thread in case it got lost > during the holiday weekend, Happy New Year everyone! > > thanks for your reply Changwei, > > no I can't say that any of the nodes lost power or rebooted. It isn't > impossible, but when I assessed the situation none of the nodes where down. > there is other stuck stacks as well yes. > > sorry for the long email but below I have pasted what I believe is > logs from the original "stuck stack" 3-4 days before the "ls" stuck stack > pasted in my original email. > This happened on node-103, the node that was at that point modifying > for the file(s) in the directory I was later ls-ing on. qemu is the > underlying KVM hypervior openstack is using. 
> > > My ocfs2 filesystem and openstack environment is back up after I > rebooted all the nodes and the storage device. Even the files in that > troubled directory are fine. (this isn't a production environment, only a > testing environment, still important but not crucial, crucial. > > Please let me know any observations or comments. Also please let me > know if this occurs again how to easiest resolve and stabilize the ocfs2 > (rebooting node-103 did not seem to fix anything). > > Also, I am new the the concept of fencing, is ocfs2 fenced > sufficiently by default, or should I have set up some other mechanism....? > > thanks > > 2017-12-17T23:53:42.511398+00:00 node-103 kernel: [974474.883386] > qemu-system-x86 D ffff880ef621b9c8 0 26593 1 0x00000000 > 2017-12-17T23:53:42.511399+00:00 node-103 kernel: [974474.883390] > ffff880ef621b9c8 ffff880ef621b9b0 ffff882038edb800 ffff88102c102a00 > 2017-12-17T23:53:42.511408+00:00 node-103 kernel: [974474.883392] > ffff880ef621c000 ffff880ef621bb70 ffff880ef621bb68 ffff88102c102a00 > 2017-12-17T23:53:42.511410+00:00 node-103 kernel: [974474.883393] > 0000000000000004 ffff880ef621b9e0 ffffffff81840585 7fffffffffffffff > 2017-12-17T23:53:42.511410+00:00 node-103 kernel: [974474.883395] > Call Trace: > 2017-12-17T23:53:42.511411+00:00 node-103 kernel: [974474.883403] > [<ffffffff81840585>] schedule+0x35/0x80 > 2017-12-17T23:53:42.511412+00:00 node-103 kernel: [974474.883407] > [<ffffffff818436d5>] schedule_timeout+0x1b5/0x270 > 2017-12-17T23:53:42.511412+00:00 node-103 kernel: [974474.883411] > [<ffffffff810ac642>] ? default_wake_function+0x12/0x20 > 2017-12-17T23:53:42.511443+00:00 node-103 kernel: [974474.883416] > [<ffffffff810c4422>] ? autoremove_wake_function+0x12/0x40 > 2017-12-17T23:53:42.511444+00:00 node-103 kernel: [974474.883418] > [<ffffffff810c3d52>] ? 
__wake_up_common+0x52/0x90 > 2017-12-17T23:53:42.511445+00:00 node-103 kernel: [974474.883420] > [<ffffffff81840fe3>] wait_for_completion+0xb3/0x140 > 2017-12-17T23:53:42.511446+00:00 node-103 kernel: [974474.883421] > [<ffffffff810ac630>] ? wake_up_q+0x70/0x70 > 2017-12-17T23:53:42.511446+00:00 node-103 kernel: [974474.883466] > [<ffffffffc0896145>] __ocfs2_cluster_lock.isra.34+0x415/0x750 [ocfs2] > 2017-12-17T23:53:42.511447+00:00 node-103 kernel: [974474.883469] > [<ffffffff810f634b>] ? ktime_get+0x3b/0xb0 > 2017-12-17T23:53:42.511453+00:00 node-103 kernel: [974474.883482] > [<ffffffffc089720a>] ocfs2_inode_lock_full_nested+0x16a/0x920 [ocfs2] > 2017-12-17T23:53:42.511453+00:00 node-103 kernel: [974474.883494] > [<ffffffffc089fe20>] ? ocfs2_check_range_for_refcount+0x150/0x150 [ocfs2] > 2017-12-17T23:53:42.511454+00:00 node-103 kernel: [974474.883505] > [<ffffffffc08a0045>] ocfs2_file_write_iter+0x225/0xdf0 [ocfs2] > 2017-12-17T23:53:42.511455+00:00 node-103 kernel: [974474.883508] > [<ffffffff812252c0>] ? poll_select_copy_remaining+0x140/0x140 > 2017-12-17T23:53:42.511455+00:00 node-103 kernel: [974474.883511] > [<ffffffff81349a6d>] ? security_file_permission+0x3d/0xc0 > 2017-12-17T23:53:42.511456+00:00 node-103 kernel: [974474.883522] > [<ffffffffc089fe20>] ? ocfs2_check_range_for_refcount+0x150/0x150 [ocfs2] > 2017-12-17T23:53:42.511462+00:00 node-103 kernel: [974474.883525] > [<ffffffff812613ea>] aio_run_iocb+0x26a/0x2d0 > 2017-12-17T23:53:42.511463+00:00 node-103 kernel: [974474.883528] > [<ffffffff8122e8e5>] ? __fget_light+0x25/0x60 > 2017-12-17T23:53:42.511464+00:00 node-103 kernel: [974474.883529] > [<ffffffff8122e933>] ? 
__fdget+0x13/0x20 > 2017-12-17T23:53:42.511464+00:00 node-103 kernel: [974474.883530] > [<ffffffff812622cf>] do_io_submit+0x25f/0x500 > 2017-12-17T23:53:42.511482+00:00 node-103 kernel: [974474.883532] > [<ffffffff81262580>] SyS_io_submit+0x10/0x20 > 2017-12-17T23:53:42.511490+00:00 node-103 kernel: [974474.883534] > [<ffffffff818446b2>] entry_SYSCALL_64_fastpath+0x16/0x71 > 2017-12-17T23:53:42.511495+00:00 node-103 kernel: [974474.883545] > qemu-img D ffff880f19ec7948 0 40743 5019 0x00000000 > 2017-12-17T23:53:42.511495+00:00 node-103 kernel: [974474.883547] > ffff880f19ec7948 ffff882033fff060 ffff882038f3f000 ffff880b39739c00 > 2017-12-17T23:53:42.511502+00:00 node-103 kernel: [974474.883549] > ffff880f19ec8000 ffff880f19ec7af0 ffff880f19ec7ae8 ffff880b39739c00 > 2017-12-17T23:53:42.511503+00:00 node-103 kernel: [974474.883550] > 0000000000000004 ffff880f19ec7960 ffffffff81840585 7fffffffffffffff > 2017-12-17T23:53:42.511503+00:00 node-103 kernel: [974474.883552] > Call Trace: > 2017-12-17T23:53:42.511504+00:00 node-103 kernel: [974474.883554] > [<ffffffff81840585>] schedule+0x35/0x80 > 2017-12-17T23:53:42.511504+00:00 node-103 kernel: [974474.883555] > [<ffffffff818436d5>] schedule_timeout+0x1b5/0x270 > 2017-12-17T23:53:42.511505+00:00 node-103 kernel: [974474.883557] > [<ffffffff8183fed6>] ? __schedule+0x3b6/0xa30 > 2017-12-17T23:53:42.511511+00:00 node-103 kernel: [974474.883559] > [<ffffffff81840fe3>] wait_for_completion+0xb3/0x140 > 2017-12-17T23:53:42.511512+00:00 node-103 kernel: [974474.883560] > [<ffffffff810ac630>] ? wake_up_q+0x70/0x70 > 2017-12-17T23:53:42.511513+00:00 node-103 kernel: [974474.883573] > [<ffffffffc0896145>] __ocfs2_cluster_lock.isra.34+0x415/0x750 [ocfs2] > 2017-12-17T23:53:42.511513+00:00 node-103 kernel: [974474.883595] > [<ffffffffc089720a>] ocfs2_inode_lock_full_nested+0x16a/0x920 [ocfs2] > 2017-12-17T23:53:42.511514+00:00 node-103 kernel: [974474.883605] > [<ffffffffc0898d6e>] ? 
ocfs2_extent_map_trunc+0x10e/0x150 [ocfs2] > 2017-12-17T23:53:42.511514+00:00 node-103 kernel: [974474.883620] > [<ffffffffc08f9b32>] ocfs2_iop_get_acl+0x52/0x100 [ocfs2] > 2017-12-17T23:53:42.511520+00:00 node-103 kernel: [974474.883623] > [<ffffffff812730f1>] get_acl+0x41/0x60 > 2017-12-17T23:53:42.511521+00:00 node-103 kernel: [974474.883625] > [<ffffffff8121aeab>] generic_permission+0x13b/0x190 > 2017-12-17T23:53:42.511522+00:00 node-103 kernel: [974474.883636] > [<ffffffffc089aeea>] ocfs2_permission+0xca/0xe0 [ocfs2] > 2017-12-17T23:53:42.511522+00:00 node-103 kernel: [974474.883638] > [<ffffffff8121af77>] __inode_permission+0x77/0xc0 > 2017-12-17T23:53:42.511523+00:00 node-103 kernel: [974474.883640] > [<ffffffff8121afd4>] inode_permission+0x14/0x50 > 2017-12-17T23:53:42.511524+00:00 node-103 kernel: [974474.883641] > [<ffffffff8121b0fb>] may_open+0x5b/0xf0 > 2017-12-17T23:53:42.511534+00:00 node-103 kernel: [974474.883642] > [<ffffffff8121efe8>] path_openat+0x188/0x1330 > 2017-12-17T23:53:42.511549+00:00 node-103 kernel: [974474.883644] > [<ffffffff81221381>] do_filp_open+0x91/0x100 > 2017-12-17T23:53:42.511551+00:00 node-103 kernel: [974474.883645] > [<ffffffff8122edb6>] ? __alloc_fd+0x46/0x190 > 2017-12-17T23:53:42.511556+00:00 node-103 kernel: [974474.883647] > [<ffffffff8120f738>] do_sys_open+0x138/0x2a0 > 2017-12-17T23:53:42.511556+00:00 node-103 kernel: [974474.883649] > [<ffffffff8106b594>] ? 
__do_page_fault+0x1b4/0x400 > 2017-12-17T23:53:42.511557+00:00 node-103 kernel: [974474.883651] > [<ffffffff8120f8be>] SyS_open+0x1e/0x20 > 2017-12-17T23:53:42.511558+00:00 node-103 kernel: [974474.883653] > [<ffffffff818446b2>] entry_SYSCALL_64_fastpath+0x16/0x71 > 2017-12-17T23:55:42.511102+00:00 node-103 kernel: [974594.892385] > qemu-system-x86 D ffff880ef621b9c8 0 26593 1 0x00000000 > 2017-12-17T23:55:42.511103+00:00 node-103 kernel: [974594.892388] > ffff880ef621b9c8 ffff880ef621b9b0 ffff882038edb800 ffff88102c102a00 > 2017-12-17T23:55:42.511121+00:00 node-103 kernel: [974594.892390] > ffff880ef621c000 ffff880ef621bb70 ffff880ef621bb68 ffff88102c102a00 > 2017-12-17T23:55:42.511123+00:00 node-103 kernel: [974594.892391] > 0000000000000004 ffff880ef621b9e0 ffffffff81840585 7fffffffffffffff > 2017-12-17T23:55:42.511124+00:00 node-103 kernel: [974594.892393] > Call Trace: > 2017-12-17T23:55:42.511125+00:00 node-103 kernel: [974594.892399] > [<ffffffff81840585>] schedule+0x35/0x80 > 2017-12-17T23:55:42.511125+00:00 node-103 kernel: [974594.892402] > [<ffffffff818436d5>] schedule_timeout+0x1b5/0x270 > 2017-12-17T23:55:42.511126+00:00 node-103 kernel: [974594.892406] > [<ffffffff810ac642>] ? default_wake_function+0x12/0x20 > 2017-12-17T23:55:42.511127+00:00 node-103 kernel: [974594.892409] > [<ffffffff810c4422>] ? autoremove_wake_function+0x12/0x40 > 2017-12-17T23:55:42.511128+00:00 node-103 kernel: [974594.892411] > [<ffffffff810c3d52>] ? __wake_up_common+0x52/0x90 > 2017-12-17T23:55:42.511129+00:00 node-103 kernel: [974594.892413] > [<ffffffff81840fe3>] wait_for_completion+0xb3/0x140 > 2017-12-17T23:55:42.511130+00:00 node-103 kernel: [974594.892414] > [<ffffffff810ac630>] ? wake_up_q+0x70/0x70 > 2017-12-17T23:55:42.511131+00:00 node-103 kernel: [974594.892448] > [<ffffffffc0896145>] __ocfs2_cluster_lock.isra.34+0x415/0x750 [ocfs2] > 2017-12-17T23:55:42.511131+00:00 node-103 kernel: [974594.892451] > [<ffffffff810f634b>] ? 
ktime_get+0x3b/0xb0 > 2017-12-17T23:55:42.511133+00:00 node-103 kernel: [974594.892463] > [<ffffffffc089720a>] ocfs2_inode_lock_full_nested+0x16a/0x920 [ocfs2] > 2017-12-17T23:55:42.511134+00:00 node-103 kernel: [974594.892475] > [<ffffffffc089fe20>] ? ocfs2_check_range_for_refcount+0x150/0x150 [ocfs2] > 2017-12-17T23:55:42.511135+00:00 node-103 kernel: [974594.892486] > [<ffffffffc08a0045>] ocfs2_file_write_iter+0x225/0xdf0 [ocfs2] > 2017-12-17T23:55:42.511136+00:00 node-103 kernel: [974594.892490] > [<ffffffff812252c0>] ? poll_select_copy_remaining+0x140/0x140 > 2017-12-17T23:55:42.511136+00:00 node-103 kernel: [974594.892493] > [<ffffffff81349a6d>] ? security_file_permission+0x3d/0xc0 > 2017-12-17T23:55:42.511137+00:00 node-103 kernel: [974594.892504] > [<ffffffffc089fe20>] ? ocfs2_check_range_for_refcount+0x150/0x150 [ocfs2] > 2017-12-17T23:55:42.511139+00:00 node-103 kernel: [974594.892507] > [<ffffffff812613ea>] aio_run_iocb+0x26a/0x2d0 > 2017-12-17T23:55:42.511140+00:00 node-103 kernel: [974594.892510] > [<ffffffff8122e8e5>] ? __fget_light+0x25/0x60 > 2017-12-17T23:55:42.511141+00:00 node-103 kernel: [974594.892511] > [<ffffffff8122e933>] ? 
__fdget+0x13/0x20 > 2017-12-17T23:55:42.511142+00:00 node-103 kernel: [974594.892513] > [<ffffffff812622cf>] do_io_submit+0x25f/0x500 > 2017-12-17T23:55:42.511158+00:00 node-103 kernel: [974594.892515] > [<ffffffff81262580>] SyS_io_submit+0x10/0x20 > 2017-12-17T23:55:42.511160+00:00 node-103 kernel: [974594.892517] > [<ffffffff818446b2>] entry_SYSCALL_64_fastpath+0x16/0x71 > 2017-12-17T23:55:42.511163+00:00 node-103 kernel: [974594.892527] > qemu-img D ffff880f19ec7948 0 40743 5019 0x00000000 > 2017-12-17T23:55:42.511163+00:00 node-103 kernel: [974594.892529] > ffff880f19ec7948 ffff882033fff060 ffff882038f3f000 ffff880b39739c00 > 2017-12-17T23:55:42.511165+00:00 node-103 kernel: [974594.892530] > ffff880f19ec8000 ffff880f19ec7af0 ffff880f19ec7ae8 ffff880b39739c00 > 2017-12-17T23:55:42.511166+00:00 node-103 kernel: [974594.892532] > 0000000000000004 ffff880f19ec7960 ffffffff81840585 7fffffffffffffff > 2017-12-17T23:55:42.511167+00:00 node-103 kernel: [974594.892533] > Call Trace: > 2017-12-17T23:55:42.511167+00:00 node-103 kernel: [974594.892535] > [<ffffffff81840585>] schedule+0x35/0x80 > 2017-12-17T23:55:42.511168+00:00 node-103 kernel: [974594.892537] > [<ffffffff818436d5>] schedule_timeout+0x1b5/0x270 > 2017-12-17T23:55:42.511168+00:00 node-103 kernel: [974594.892538] > [<ffffffff8183fed6>] ? __schedule+0x3b6/0xa30 > 2017-12-17T23:55:42.511170+00:00 node-103 kernel: [974594.892540] > [<ffffffff81840fe3>] wait_for_completion+0xb3/0x140 > 2017-12-17T23:55:42.511171+00:00 node-103 kernel: [974594.892542] > [<ffffffff810ac630>] ? wake_up_q+0x70/0x70 > 2017-12-17T23:55:42.511172+00:00 node-103 kernel: [974594.892553] > [<ffffffffc0896145>] __ocfs2_cluster_lock.isra.34+0x415/0x750 [ocfs2] > 2017-12-17T23:55:42.511173+00:00 node-103 kernel: [974594.892565] > [<ffffffffc089720a>] ocfs2_inode_lock_full_nested+0x16a/0x920 [ocfs2] > 2017-12-17T23:55:42.511174+00:00 node-103 kernel: [974594.892576] > [<ffffffffc0898d6e>] ? 
ocfs2_extent_map_trunc+0x10e/0x150 [ocfs2] > 2017-12-17T23:55:42.511174+00:00 node-103 kernel: [974594.892592] > [<ffffffffc08f9b32>] ocfs2_iop_get_acl+0x52/0x100 [ocfs2] > 2017-12-17T23:55:42.511176+00:00 node-103 kernel: [974594.892594] > [<ffffffff812730f1>] get_acl+0x41/0x60 > 2017-12-17T23:55:42.511177+00:00 node-103 kernel: [974594.892596] > [<ffffffff8121aeab>] generic_permission+0x13b/0x190 > 2017-12-17T23:55:42.511178+00:00 node-103 kernel: [974594.892608] > [<ffffffffc089aeea>] ocfs2_permission+0xca/0xe0 [ocfs2] > 2017-12-17T23:55:42.511179+00:00 node-103 kernel: [974594.892610] > [<ffffffff8121af77>] __inode_permission+0x77/0xc0 > 2017-12-17T23:55:42.511179+00:00 node-103 kernel: [974594.892612] > [<ffffffff8121afd4>] inode_permission+0x14/0x50 > 2017-12-17T23:55:42.511180+00:00 node-103 kernel: [974594.892613] > [<ffffffff8121b0fb>] may_open+0x5b/0xf0 > 2017-12-17T23:55:42.511181+00:00 node-103 kernel: [974594.892615] > [<ffffffff8121efe8>] path_openat+0x188/0x1330 > 2017-12-17T23:55:42.511183+00:00 node-103 kernel: [974594.892616] > [<ffffffff81221381>] do_filp_open+0x91/0x100 > 2017-12-17T23:55:42.511184+00:00 node-103 kernel: [974594.892618] > [<ffffffff8122edb6>] ? __alloc_fd+0x46/0x190 > 2017-12-17T23:55:42.511187+00:00 node-103 kernel: [974594.892620] > [<ffffffff8120f738>] do_sys_open+0x138/0x2a0 > 2017-12-17T23:55:42.511188+00:00 node-103 kernel: [974594.892622] > [<ffffffff8106b594>] ? 
__do_page_fault+0x1b4/0x400 > 2017-12-17T23:55:42.511188+00:00 node-103 kernel: [974594.892624] > [<ffffffff8120f8be>] SyS_open+0x1e/0x20 > 2017-12-17T23:55:42.511197+00:00 node-103 kernel: [974594.892626] > [<ffffffff818446b2>] entry_SYSCALL_64_fastpath+0x16/0x71 > 2017-12-17T23:57:42.511168+00:00 node-103 kernel: [974714.901454] > qemu-system-x86 D ffff880ef621b9c8 0 26593 1 0x00000000 > 2017-12-17T23:57:42.511169+00:00 node-103 kernel: [974714.901457] > ffff880ef621b9c8 ffff880ef621b9b0 ffff882038edb800 ffff88102c102a00 > 2017-12-17T23:57:42.511170+00:00 node-103 kernel: [974714.901459] > ffff880ef621c000 ffff880ef621bb70 ffff880ef621bb68 ffff88102c102a00 > 2017-12-17T23:57:42.511183+00:00 node-103 kernel: [974714.901461] > 0000000000000004 ffff880ef621b9e0 ffffffff81840585 7fffffffffffffff > 2017-12-17T23:57:42.511185+00:00 node-103 kernel: [974714.901463] > Call Trace: > 2017-12-17T23:57:42.511185+00:00 node-103 kernel: [974714.901470] > [<ffffffff81840585>] schedule+0x35/0x80 > 2017-12-17T23:57:42.511186+00:00 node-103 kernel: [974714.901473] > [<ffffffff818436d5>] schedule_timeout+0x1b5/0x270 > 2017-12-17T23:57:42.511186+00:00 node-103 kernel: [974714.901477] > [<ffffffff810ac642>] ? default_wake_function+0x12/0x20 > 2017-12-17T23:57:42.511188+00:00 node-103 kernel: [974714.901481] > [<ffffffff810c4422>] ? autoremove_wake_function+0x12/0x40 > 2017-12-17T23:57:42.511189+00:00 node-103 kernel: [974714.901482] > [<ffffffff810c3d52>] ? __wake_up_common+0x52/0x90 > 2017-12-17T23:57:42.511190+00:00 node-103 kernel: [974714.901484] > [<ffffffff81840fe3>] wait_for_completion+0xb3/0x140 > 2017-12-17T23:57:42.511197+00:00 node-103 kernel: [974714.901486] > [<ffffffff810ac630>] ? wake_up_q+0x70/0x70 > 2017-12-17T23:57:42.511198+00:00 node-103 kernel: [974714.901527] > [<ffffffffc0896145>] __ocfs2_cluster_lock.isra.34+0x415/0x750 [ocfs2] > 2017-12-17T23:57:42.511199+00:00 node-103 kernel: [974714.901530] > [<ffffffff810f634b>] ? 
ktime_get+0x3b/0xb0 > 2017-12-17T23:57:42.511201+00:00 node-103 kernel: [974714.901543] > [<ffffffffc089720a>] ocfs2_inode_lock_full_nested+0x16a/0x920 [ocfs2] > 2017-12-17T23:57:42.511202+00:00 node-103 kernel: [974714.901555] > [<ffffffffc089fe20>] ? ocfs2_check_range_for_refcount+0x150/0x150 [ocfs2] > 2017-12-17T23:57:42.511203+00:00 node-103 kernel: [974714.901566] > [<ffffffffc08a0045>] ocfs2_file_write_iter+0x225/0xdf0 [ocfs2] > 2017-12-17T23:57:42.511204+00:00 node-103 kernel: [974714.901569] > [<ffffffff812252c0>] ? poll_select_copy_remaining+0x140/0x140 > 2017-12-17T23:57:42.511204+00:00 node-103 kernel: [974714.901572] > [<ffffffff81349a6d>] ? security_file_permission+0x3d/0xc0 > 2017-12-17T23:57:42.511205+00:00 node-103 kernel: [974714.901583] > [<ffffffffc089fe20>] ? ocfs2_check_range_for_refcount+0x150/0x150 [ocfs2] > 2017-12-17T23:57:42.511207+00:00 node-103 kernel: [974714.901587] > [<ffffffff812613ea>] aio_run_iocb+0x26a/0x2d0 > 2017-12-17T23:57:42.511208+00:00 node-103 kernel: [974714.901590] > [<ffffffff8122e8e5>] ? __fget_light+0x25/0x60 > 2017-12-17T23:57:42.511209+00:00 node-103 kernel: [974714.901591] > [<ffffffff8122e933>] ? 
__fdget+0x13/0x20 > 2017-12-17T23:57:42.511210+00:00 node-103 kernel: [974714.901593] > [<ffffffff812622cf>] do_io_submit+0x25f/0x500 > 2017-12-17T23:57:42.511227+00:00 node-103 kernel: [974714.901595] > [<ffffffff81262580>] SyS_io_submit+0x10/0x20 > 2017-12-17T23:57:42.511229+00:00 node-103 kernel: [974714.901598] > [<ffffffff818446b2>] entry_SYSCALL_64_fastpath+0x16/0x71 > 2017-12-17T23:57:42.511233+00:00 node-103 kernel: [974714.901609] > qemu-img D ffff880f19ec7948 0 40743 5019 0x00000000 > 2017-12-17T23:57:42.511233+00:00 node-103 kernel: [974714.901610] > ffff880f19ec7948 ffff882033fff060 ffff882038f3f000 ffff880b39739c00 > 2017-12-17T23:57:42.511235+00:00 node-103 kernel: [974714.901612] > ffff880f19ec8000 ffff880f19ec7af0 ffff880f19ec7ae8 ffff880b39739c00 > 2017-12-17T23:57:42.511236+00:00 node-103 kernel: [974714.901613] > 0000000000000004 ffff880f19ec7960 ffffffff81840585 7fffffffffffffff > 2017-12-17T23:57:42.511237+00:00 node-103 kernel: [974714.901615] > Call Trace: > 2017-12-17T23:57:42.511238+00:00 node-103 kernel: [974714.901617] > [<ffffffff81840585>] schedule+0x35/0x80 > 2017-12-17T23:57:42.511238+00:00 node-103 kernel: [974714.901618] > [<ffffffff818436d5>] schedule_timeout+0x1b5/0x270 > 2017-12-17T23:57:42.511239+00:00 node-103 kernel: [974714.901620] > [<ffffffff8183fed6>] ? __schedule+0x3b6/0xa30 > 2017-12-17T23:57:42.511240+00:00 node-103 kernel: [974714.901622] > [<ffffffff81840fe3>] wait_for_completion+0xb3/0x140 > 2017-12-17T23:57:42.511242+00:00 node-103 kernel: [974714.901623] > [<ffffffff810ac630>] ? wake_up_q+0x70/0x70 > 2017-12-17T23:57:42.511243+00:00 node-103 kernel: [974714.901636] > [<ffffffffc0896145>] __ocfs2_cluster_lock.isra.34+0x415/0x750 [ocfs2] > 2017-12-17T23:57:42.511243+00:00 node-103 kernel: [974714.901648] > [<ffffffffc089720a>] ocfs2_inode_lock_full_nested+0x16a/0x920 [ocfs2] > 2017-12-17T23:57:42.511244+00:00 node-103 kernel: [974714.901659] > [<ffffffffc0898d6e>] ? 
ocfs2_extent_map_trunc+0x10e/0x150 [ocfs2] > 2017-12-17T23:57:42.511244+00:00 node-103 kernel: [974714.901685] > [<ffffffffc08f9b32>] ocfs2_iop_get_acl+0x52/0x100 [ocfs2] > 2017-12-17T23:57:42.511246+00:00 node-103 kernel: [974714.901687] > [<ffffffff812730f1>] get_acl+0x41/0x60 > 2017-12-17T23:57:42.511247+00:00 node-103 kernel: [974714.901690] > [<ffffffff8121aeab>] generic_permission+0x13b/0x190 > 2017-12-17T23:57:42.511248+00:00 node-103 kernel: [974714.901701] > [<ffffffffc089aeea>] ocfs2_permission+0xca/0xe0 [ocfs2] > 2017-12-17T23:57:42.511249+00:00 node-103 kernel: [974714.901703] > [<ffffffff8121af77>] __inode_permission+0x77/0xc0 > 2017-12-17T23:57:42.511249+00:00 node-103 kernel: [974714.901704] > [<ffffffff8121afd4>] inode_permission+0x14/0x50 > 2017-12-17T23:57:42.511250+00:00 node-103 kernel: [974714.901706] > [<ffffffff8121b0fb>] may_open+0x5b/0xf0 > 2017-12-17T23:57:42.511252+00:00 node-103 kernel: [974714.901707] > [<ffffffff8121efe8>] path_openat+0x188/0x1330 > 2017-12-17T23:57:42.511253+00:00 node-103 kernel: [974714.901708] > [<ffffffff81221381>] do_filp_open+0x91/0x100 > 2017-12-17T23:57:42.511254+00:00 node-103 kernel: [974714.901710] > [<ffffffff8122edb6>] ? __alloc_fd+0x46/0x190 > 2017-12-17T23:57:42.511257+00:00 node-103 kernel: [974714.901712] > [<ffffffff8120f738>] do_sys_open+0x138/0x2a0 > 2017-12-17T23:57:42.511257+00:00 node-103 kernel: [974714.901714] > [<ffffffff8106b594>] ? 
__do_page_fault+0x1b4/0x400 > 2017-12-17T23:57:42.511258+00:00 node-103 kernel: [974714.901715] > [<ffffffff8120f8be>] SyS_open+0x1e/0x20 > 2017-12-17T23:57:42.511260+00:00 node-103 kernel: [974714.901717] > [<ffffffff818446b2>] entry_SYSCALL_64_fastpath+0x16/0x71 > 2017-12-17T23:59:42.511080+00:00 node-103 kernel: [974834.910524] > qemu-system-x86 D ffff880ef621b9c8 0 26593 1 0x00000000 > 2017-12-17T23:59:42.511080+00:00 node-103 kernel: [974834.910528] > ffff880ef621b9c8 ffff880ef621b9b0 ffff882038edb800 ffff88102c102a00 > 2017-12-17T23:59:42.511081+00:00 node-103 kernel: [974834.910529] > ffff880ef621c000 ffff880ef621bb70 ffff880ef621bb68 ffff88102c102a00 > 2017-12-17T23:59:42.511083+00:00 node-103 kernel: [974834.910531] > 0000000000000004 ffff880ef621b9e0 ffffffff81840585 7fffffffffffffff > 2017-12-17T23:59:42.511084+00:00 node-103 kernel: [974834.910533] > Call Trace: > 2017-12-17T23:59:42.511085+00:00 node-103 kernel: [974834.910540] > [<ffffffff81840585>] schedule+0x35/0x80 > 2017-12-17T23:59:42.511086+00:00 node-103 kernel: [974834.910543] > [<ffffffff818436d5>] schedule_timeout+0x1b5/0x270 > 2017-12-17T23:59:42.511086+00:00 node-103 kernel: [974834.910547] > [<ffffffff810ac642>] ? default_wake_function+0x12/0x20 > 2017-12-17T23:59:42.511087+00:00 node-103 kernel: [974834.910551] > [<ffffffff810c4422>] ? autoremove_wake_function+0x12/0x40 > 2017-12-17T23:59:42.511089+00:00 node-103 kernel: [974834.910553] > [<ffffffff810c3d52>] ? __wake_up_common+0x52/0x90 > 2017-12-17T23:59:42.511090+00:00 node-103 kernel: [974834.910555] > [<ffffffff81840fe3>] wait_for_completion+0xb3/0x140 > 2017-12-17T23:59:42.511091+00:00 node-103 kernel: [974834.910557] > [<ffffffff810ac630>] ? wake_up_q+0x70/0x70 > 2017-12-17T23:59:42.511091+00:00 node-103 kernel: [974834.910594] > [<ffffffffc0896145>] __ocfs2_cluster_lock.isra.34+0x415/0x750 [ocfs2] > 2017-12-17T23:59:42.511092+00:00 node-103 kernel: [974834.910596] > [<ffffffff810f634b>] ? 
ktime_get+0x3b/0xb0 > 2017-12-17T23:59:42.511093+00:00 node-103 kernel: [974834.910609] > [<ffffffffc089720a>] ocfs2_inode_lock_full_nested+0x16a/0x920 [ocfs2] > 2017-12-17T23:59:42.511095+00:00 node-103 kernel: [974834.910633] > [<ffffffffc089fe20>] ? ocfs2_check_range_for_refcount+0x150/0x150 [ocfs2] > 2017-12-17T23:59:42.511096+00:00 node-103 kernel: [974834.910644] > [<ffffffffc08a0045>] ocfs2_file_write_iter+0x225/0xdf0 [ocfs2] > 2017-12-17T23:59:42.511096+00:00 node-103 kernel: [974834.910647] > [<ffffffff812252c0>] ? poll_select_copy_remaining+0x140/0x140 > 2017-12-17T23:59:42.511097+00:00 node-103 kernel: [974834.910649] > [<ffffffff81349a6d>] ? security_file_permission+0x3d/0xc0 > 2017-12-17T23:59:42.511098+00:00 node-103 kernel: [974834.910660] > [<ffffffffc089fe20>] ? ocfs2_check_range_for_refcount+0x150/0x150 [ocfs2] > 2017-12-17T23:59:42.511129+00:00 node-103 kernel: [974834.910663] > [<ffffffff812613ea>] aio_run_iocb+0x26a/0x2d0 > 2017-12-17T23:59:42.511133+00:00 node-103 kernel: [974834.910665] > [<ffffffff8122e8e5>] ? __fget_light+0x25/0x60 > 2017-12-17T23:59:42.511135+00:00 node-103 kernel: [974834.910666] > [<ffffffff8122e933>] ? 
__fdget+0x13/0x20 > 2017-12-17T23:59:42.511137+00:00 node-103 kernel: [974834.910668] > [<ffffffff812622cf>] do_io_submit+0x25f/0x500 > 2017-12-17T23:59:42.511154+00:00 node-103 kernel: [974834.910670] > [<ffffffff81262580>] SyS_io_submit+0x10/0x20 > 2017-12-17T23:59:42.511156+00:00 node-103 kernel: [974834.910672] > [<ffffffff818446b2>] entry_SYSCALL_64_fastpath+0x16/0x71 > 2017-12-17T23:59:42.511161+00:00 node-103 kernel: [974834.910686] > qemu-img D ffff880f19ec7948 0 40743 5019 0x00000000 > 2017-12-17T23:59:42.511162+00:00 node-103 kernel: [974834.910688] > ffff880f19ec7948 ffff882033fff060 ffff882038f3f000 ffff880b39739c00 > 2017-12-17T23:59:42.511163+00:00 node-103 kernel: [974834.910689] > ffff880f19ec8000 ffff880f19ec7af0 ffff880f19ec7ae8 ffff880b39739c00 > 2017-12-17T23:59:42.511164+00:00 node-103 kernel: [974834.910691] > 0000000000000004 ffff880f19ec7960 ffffffff81840585 7fffffffffffffff > 2017-12-17T23:59:42.511165+00:00 node-103 kernel: [974834.910692] > Call Trace: > 2017-12-17T23:59:42.511166+00:00 node-103 kernel: [974834.910694] > [<ffffffff81840585>] schedule+0x35/0x80 > 2017-12-17T23:59:42.511167+00:00 node-103 kernel: [974834.910696] > [<ffffffff818436d5>] schedule_timeout+0x1b5/0x270 > 2017-12-17T23:59:42.511167+00:00 node-103 kernel: [974834.910697] > [<ffffffff8183fed6>] ? __schedule+0x3b6/0xa30 > 2017-12-17T23:59:42.511168+00:00 node-103 kernel: [974834.910699] > [<ffffffff81840fe3>] wait_for_completion+0xb3/0x140 > 2017-12-17T23:59:42.511170+00:00 node-103 kernel: [974834.910700] > [<ffffffff810ac630>] ? wake_up_q+0x70/0x70 > 2017-12-17T23:59:42.511171+00:00 node-103 kernel: [974834.910712] > [<ffffffffc0896145>] __ocfs2_cluster_lock.isra.34+0x415/0x750 [ocfs2] > 2017-12-17T23:59:42.511172+00:00 node-103 kernel: [974834.910722] > [<ffffffffc089720a>] ocfs2_inode_lock_full_nested+0x16a/0x920 [ocfs2] > 2017-12-17T23:59:42.511172+00:00 node-103 kernel: [974834.910733] > [<ffffffffc0898d6e>] ? 
ocfs2_extent_map_trunc+0x10e/0x150 [ocfs2] > 2017-12-17T23:59:42.511173+00:00 node-103 kernel: [974834.910748] > [<ffffffffc08f9b32>] ocfs2_iop_get_acl+0x52/0x100 [ocfs2] > 2017-12-17T23:59:42.511174+00:00 node-103 kernel: [974834.910751] > [<ffffffff812730f1>] get_acl+0x41/0x60 > 2017-12-17T23:59:42.511176+00:00 node-103 kernel: [974834.910753] > [<ffffffff8121aeab>] generic_permission+0x13b/0x190 > 2017-12-17T23:59:42.511177+00:00 node-103 kernel: [974834.910777] > [<ffffffffc089aeea>] ocfs2_permission+0xca/0xe0 [ocfs2] > 2017-12-17T23:59:42.511178+00:00 node-103 kernel: [974834.910778] > [<ffffffff8121af77>] __inode_permission+0x77/0xc0 > 2017-12-17T23:59:42.511179+00:00 node-103 kernel: [974834.910780] > [<ffffffff8121afd4>] inode_permission+0x14/0x50 > 2017-12-17T23:59:42.511179+00:00 node-103 kernel: [974834.910782] > [<ffffffff8121b0fb>] may_open+0x5b/0xf0 > 2017-12-17T23:59:42.511180+00:00 node-103 kernel: [974834.910783] > [<ffffffff8121efe8>] path_openat+0x188/0x1330 > 2017-12-17T23:59:42.511182+00:00 node-103 kernel: [974834.910785] > [<ffffffff81221381>] do_filp_open+0x91/0x100 > 2017-12-17T23:59:42.511183+00:00 node-103 kernel: [974834.910786] > [<ffffffff8122edb6>] ? __alloc_fd+0x46/0x190 > 2017-12-17T23:59:42.511185+00:00 node-103 kernel: [974834.910789] > [<ffffffff8120f738>] do_sys_open+0x138/0x2a0 > 2017-12-17T23:59:42.511186+00:00 node-103 kernel: [974834.910791] > [<ffffffff8106b594>] ? 
__do_page_fault+0x1b4/0x400 > 2017-12-17T23:59:42.511187+00:00 node-103 kernel: [974834.910793] > [<ffffffff8120f8be>] SyS_open+0x1e/0x20 > 2017-12-17T23:59:42.511188+00:00 node-103 kernel: [974834.910795] > [<ffffffff818446b2>] entry_SYSCALL_64_fastpath+0x16/0x71 > 2017-12-18T00:00:01.271777+00:00 node-103 kernel: [974853.675776] > Process accounting resumed > 2017-12-18T00:01:42.511127+00:00 node-103 kernel: [974954.919618] > qemu-system-x86 D ffff880ef621b9c8 0 26593 1 0x00000000 > 2017-12-18T00:01:42.511128+00:00 node-103 kernel: [974954.919621] > ffff880ef621b9c8 ffff880ef621b9b0 ffff882038edb800 ffff88102c102a00 > 2017-12-18T00:01:42.511128+00:00 node-103 kernel: [974954.919623] > ffff880ef621c000 ffff880ef621bb70 ffff880ef621bb68 ffff88102c102a00 > 2017-12-18T00:01:42.511130+00:00 node-103 kernel: [974954.919625] > 0000000000000004 ffff880ef621b9e0 ffffffff81840585 7fffffffffffffff > 2017-12-18T00:01:42.511131+00:00 node-103 kernel: [974954.919627] > Call Trace: > 2017-12-18T00:01:42.511132+00:00 node-103 kernel: [974954.919634] > [<ffffffff81840585>] schedule+0x35/0x80 > 2017-12-18T00:01:42.511133+00:00 node-103 kernel: [974954.919638] > [<ffffffff818436d5>] schedule_timeout+0x1b5/0x270 > 2017-12-18T00:01:42.511134+00:00 node-103 kernel: [974954.919643] > [<ffffffff810ac642>] ? default_wake_function+0x12/0x20 > 2017-12-18T00:01:42.511134+00:00 node-103 kernel: [974954.919647] > [<ffffffff810c4422>] ? autoremove_wake_function+0x12/0x40 > 2017-12-18T00:01:42.511136+00:00 node-103 kernel: [974954.919649] > [<ffffffff810c3d52>] ? __wake_up_common+0x52/0x90 > 2017-12-18T00:01:42.511138+00:00 node-103 kernel: [974954.919651] > [<ffffffff81840fe3>] wait_for_completion+0xb3/0x140 > 2017-12-18T00:01:42.511138+00:00 node-103 kernel: [974954.919653] > [<ffffffff810ac630>] ? 
wake_up_q+0x70/0x70 > 2017-12-18T00:01:42.511139+00:00 node-103 kernel: [974954.919702] > [<ffffffffc0896145>] __ocfs2_cluster_lock.isra.34+0x415/0x750 [ocfs2] > 2017-12-18T00:01:42.511139+00:00 node-103 kernel: [974954.919705] > [<ffffffff810f634b>] ? ktime_get+0x3b/0xb0 > 2017-12-18T00:01:42.511141+00:00 node-103 kernel: [974954.919719] > [<ffffffffc089720a>] ocfs2_inode_lock_full_nested+0x16a/0x920 [ocfs2] > 2017-12-18T00:01:42.511142+00:00 node-103 kernel: [974954.919732] > [<ffffffffc089fe20>] ? ocfs2_check_range_for_refcount+0x150/0x150 [ocfs2] > 2017-12-18T00:01:42.511143+00:00 node-103 kernel: [974954.919744] > [<ffffffffc08a0045>] ocfs2_file_write_iter+0x225/0xdf0 [ocfs2] > 2017-12-18T00:01:42.511144+00:00 node-103 kernel: [974954.919746] > [<ffffffff812252c0>] ? poll_select_copy_remaining+0x140/0x140 > 2017-12-18T00:01:42.511145+00:00 node-103 kernel: [974954.919749] > [<ffffffff81349a6d>] ? security_file_permission+0x3d/0xc0 > 2017-12-18T00:01:42.511176+00:00 node-103 kernel: [974954.919761] > [<ffffffffc089fe20>] ? ocfs2_check_range_for_refcount+0x150/0x150 [ocfs2] > 2017-12-18T00:01:42.511181+00:00 node-103 kernel: [974954.919764] > [<ffffffff812613ea>] aio_run_iocb+0x26a/0x2d0 > 2017-12-18T00:01:42.511182+00:00 node-103 kernel: [974954.919766] > [<ffffffff8122e8e5>] ? __fget_light+0x25/0x60 > 2017-12-18T00:01:42.511184+00:00 node-103 kernel: [974954.919767] > [<ffffffff8122e933>] ? 
__fdget+0x13/0x20 > 2017-12-18T00:01:42.511185+00:00 node-103 kernel: [974954.919769] > [<ffffffff812622cf>] do_io_submit+0x25f/0x500 > 2017-12-18T00:01:42.511203+00:00 node-103 kernel: [974954.919771] > [<ffffffff81262580>] SyS_io_submit+0x10/0x20 > 2017-12-18T00:01:42.511205+00:00 node-103 kernel: [974954.919773] > [<ffffffff818446b2>] entry_SYSCALL_64_fastpath+0x16/0x71 > 2017-12-18T00:01:42.511209+00:00 node-103 kernel: [974954.919786] > qemu-img D ffff880f19ec7948 0 40743 5019 0x00000000 > 2017-12-18T00:01:42.511210+00:00 node-103 kernel: [974954.919788] > ffff880f19ec7948 ffff882033fff060 ffff882038f3f000 ffff880b39739c00 > 2017-12-18T00:01:42.511211+00:00 node-103 kernel: [974954.919789] > ffff880f19ec8000 ffff880f19ec7af0 ffff880f19ec7ae8 ffff880b39739c00 > 2017-12-18T00:01:42.511212+00:00 node-103 kernel: [974954.919791] > 0000000000000004 ffff880f19ec7960 ffffffff81840585 7fffffffffffffff > 2017-12-18T00:01:42.511213+00:00 node-103 kernel: [974954.919792] > Call Trace: > 2017-12-18T00:01:42.511215+00:00 node-103 kernel: [974954.919794] > [<ffffffff81840585>] schedule+0x35/0x80 > 2017-12-18T00:01:42.511215+00:00 node-103 kernel: [974954.919795] > [<ffffffff818436d5>] schedule_timeout+0x1b5/0x270 > 2017-12-18T00:01:42.511216+00:00 node-103 kernel: [974954.919797] > [<ffffffff8183fed6>] ? __schedule+0x3b6/0xa30 > 2017-12-18T00:01:42.511217+00:00 node-103 kernel: [974954.919799] > [<ffffffff81840fe3>] wait_for_completion+0xb3/0x140 > 2017-12-18T00:01:42.511218+00:00 node-103 kernel: [974954.919801] > [<ffffffff810ac630>] ? wake_up_q+0x70/0x70 > 2017-12-18T00:01:42.511220+00:00 node-103 kernel: [974954.919826] > [<ffffffffc0896145>] __ocfs2_cluster_lock.isra.34+0x415/0x750 [ocfs2] > 2017-12-18T00:01:42.511220+00:00 node-103 kernel: [974954.919838] > [<ffffffffc089720a>] ocfs2_inode_lock_full_nested+0x16a/0x920 [ocfs2] > 2017-12-18T00:01:42.511221+00:00 node-103 kernel: [974954.919850] > [<ffffffffc0898d6e>] ? 
ocfs2_extent_map_trunc+0x10e/0x150 [ocfs2] > 2017-12-18T00:01:42.511222+00:00 node-103 kernel: [974954.919866] > [<ffffffffc08f9b32>] ocfs2_iop_get_acl+0x52/0x100 [ocfs2] > 2017-12-18T00:01:42.511223+00:00 node-103 kernel: [974954.919869] > [<ffffffff812730f1>] get_acl+0x41/0x60 > 2017-12-18T00:01:42.511224+00:00 node-103 kernel: [974954.919872] > [<ffffffff8121aeab>] generic_permission+0x13b/0x190 > 2017-12-18T00:01:42.511226+00:00 node-103 kernel: [974954.919895] > [<ffffffffc089aeea>] ocfs2_permission+0xca/0xe0 [ocfs2] > 2017-12-18T00:01:42.511226+00:00 node-103 kernel: [974954.919897] > [<ffffffff8121af77>] __inode_permission+0x77/0xc0 > 2017-12-18T00:01:42.511227+00:00 node-103 kernel: [974954.919898] > [<ffffffff8121afd4>] inode_permission+0x14/0x50 > 2017-12-18T00:01:42.511228+00:00 node-103 kernel: [974954.919900] > [<ffffffff8121b0fb>] may_open+0x5b/0xf0 > 2017-12-18T00:01:42.511229+00:00 node-103 kernel: [974954.919901] > [<ffffffff8121efe8>] path_openat+0x188/0x1330 > 2017-12-18T00:01:42.511231+00:00 node-103 kernel: [974954.919903] > [<ffffffff81221381>] do_filp_open+0x91/0x100 > 2017-12-18T00:01:42.511232+00:00 node-103 kernel: [974954.919904] > [<ffffffff8122edb6>] ? __alloc_fd+0x46/0x190 > 2017-12-18T00:01:42.511235+00:00 node-103 kernel: [974954.919907] > [<ffffffff8120f738>] do_sys_open+0x138/0x2a0 > 2017-12-18T00:01:42.511235+00:00 node-103 kernel: [974954.919909] > [<ffffffff8106b594>] ? __do_page_fault+0x1b4/0x400 > 2017-12-18T00:01:42.511236+00:00 node-103 kernel: [974954.919910] > [<ffffffff8120f8be>] SyS_open+0x1e/0x20 > 2017-12-18T00:01:42.511238+00:00 node-103 kernel: [974954.919912] > [<ffffffff818446b2>] entry_SYSCALL_64_fastpath+0x16/0x71 > > > -- Jim > > On Wed, Dec 27, 2017 at 8:03 PM, Changwei Ge <ge.chang...@h3c.com > <mailto:ge.chang...@h3c.com>> wrote: > > On 2017/12/28 3:02, Jim Okken wrote: > > Peter, > > > > I did not want to flood my first email with details and make > it 3 pages long. i gladly will provide more details. 
First, I'd like to ask > that you be less condescending. You have no idea the journey I took toward > using ocfs2 in this environment, and also the requirements I needed to meet. > > You were amazed and astonished by my question, and I was > amazed and astonished by your answer. > > > > let's start over: > > if ocfs2 isn't the right solution for what I'm doing I can > admit that, and move off of it. > > if OpenStack and perhaps newer kernels do not necessarily work > with ocfs2 I can admit that too, and move off of it. > > I had high hopes it was the right solution, and at first it > did the job. > > > > I have a healthy HP MSA 2040 storage appliance connected > via fiber channel. It has a 7TB storage volume on a fiber channel LUN. From > what I know I need a shared storage filesystem so each of my client systems, > also on the fiber channel network, can access this storage simultaneously > without corrupting data (I need file locking). This HP MSA is healthy and > stable. This isn't exactly local storage, I know, but each client system sees > this MSA storage volume as a local drive, i.e. /dev/sdb > > > > What could cause a "lost" wakeup from the OCFS2 lock manager? > > Hi Jim, > Did a node crash or lose power before the stuck stack was > found? > And is the stuck stack the only one you can find in your kernel > log? > > Thanks, > Changwei > > > > > Ubuntu has ocfs2 packages in its repos. So I hope it has some > level of support in its OSs and distributed kernels... > > I am not well versed in storage concepts, but I'll surprise > you: today my employer (who signs my paycheck) asks me, and tasks me, > with making this storage solution work better. > > > > Please let me know if I can provide more details. Please let > me know any further comments. > > > > thanks! 
> >
> > -- Jim
> >
> > On Wed, Dec 27, 2017 at 1:16 PM, Peter Grandi <p...@ocfs.list.sabi.co.uk> wrote:
> >
> > > I have an ocfs2 filesystem set up as a shared filesystem between
> > > 12 openstack compute nodes which are Ubuntu 16.04.3.
> >
> > I am amazed by how unconstrained are the imaginations of some
> > other people. That is a truly astonishing setup.
> >
> > > I have a very big concern of stability. A month ago I lost a
> > > good deal of files, I don't know the real reason, but things
> > > seemed to point to the ocfs2 cluster.
> >
> > That also seems to me unconstrained by concern about mere
> > details.
> >
> > > Last week I found many of my compute nodes with the nova
> > > service down. The node which went down first has a "stuck"
> > > file/directory in the ocfs2 filesystem [ ... ]
> >
> > The stack trace seems to point at a "lost" wakeup from the OCFS2
> > lock manager.
> >
> > > I have other openstack compute nodes that are identical except
> > > they use local storage and do not use ocfs2 and these have
> > > always been stable.
> >
> > But OCFS2 is meant to work with local physical storage on a
> > local physical machine. What's your current setup?
> >
> > > maybe ocfs2 just isn't stable on Ubuntu 16.04.3? I am using
> > > version 1.6.4-3.1
> >
> > OCFS2 has been extremely stable for many years on very high load
> > shared-disk clusters for many users. OpenStack and perhaps newer
> > kernels not necessarily so.
> >
> > Also OCFS2 requires a storage subsystem with specific features
> > and a high degree of reliable operation. It is astonishing but
> > fairly typical that this report contains no mention of the
> > setup or of the state of the storage subsystem.
_______________________________________________
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-users