I just got the following during a bonnie++ test, while trying to do an
ls -l on the cephfs mount. There is also a kworker process constantly at
40% CPU while the bonnie++ test is running.
[35281.101763] INFO: task bash:1169 blocked for more than 120 seconds.
[35281.102064] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[35281.102175] bash D ffffa03fbfc9acc0 0 1169 1167 0x00000004
[35281.102181] Call Trace:
[35281.102275] [<ffffffff84b86d4f>] ? __schedule+0x3af/0x860
[35281.102285] [<ffffffff84b87229>] schedule+0x29/0x70
[35281.102296] [<ffffffff84b84d11>] schedule_timeout+0x221/0x2d0
[35281.102332] [<ffffffff844c6966>] ? finish_wait+0x56/0x70
[35281.102342] [<ffffffff84b85482>] ? mutex_lock+0x12/0x2f
[35281.102381] [<ffffffff846e7ed8>] ? autofs4_wait+0x428/0x920
[35281.102386] [<ffffffff84b875dd>] wait_for_completion+0xfd/0x140
[35281.102407] [<ffffffff844daf40>] ? wake_up_state+0x20/0x20
[35281.102422] [<ffffffff846e902b>] autofs4_expire_wait+0xab/0x160
[35281.102425] [<ffffffff846e6060>] do_expire_wait+0x1e0/0x210
[35281.102429] [<ffffffff846e62b3>] autofs4_d_manage+0x73/0x1c0
[35281.102455] [<ffffffff84658e8a>] follow_managed+0xba/0x310
[35281.102459] [<ffffffff84659e5d>] lookup_fast+0x12d/0x230
[35281.102464] [<ffffffff8465c90d>] path_lookupat+0x16d/0x8d0
[35281.102467] [<ffffffff8465deed>] ? do_last+0x66d/0x1340
[35281.102488] [<ffffffff8464a73a>] ? __check_object_size+0x1ca/0x250
[35281.102499] [<ffffffff84628675>] ? kmem_cache_alloc+0x35/0x1f0
[35281.102503] [<ffffffff8465fc0f>] ? getname_flags+0x4f/0x1a0
[35281.102507] [<ffffffff8465d09b>] filename_lookup+0x2b/0xc0
[35281.102510] [<ffffffff84660da7>] user_path_at_empty+0x67/0xc0
[35281.102513] [<ffffffff84660e11>] user_path_at+0x11/0x20
[35281.102516] [<ffffffff84653603>] vfs_fstatat+0x63/0xc0
[35281.102519] [<ffffffff846539be>] SYSC_newstat+0x2e/0x60
[35281.102529] [<ffffffff84b94ed5>] ? system_call_after_swapgs+0xa2/0x13a
[35281.102533] [<ffffffff84b94ec9>] ? system_call_after_swapgs+0x96/0x13a
[35281.102536] [<ffffffff84b94ed5>] ? system_call_after_swapgs+0xa2/0x13a
[35281.102539] [<ffffffff84b94ec9>] ? system_call_after_swapgs+0x96/0x13a
[35281.102543] [<ffffffff84b94ed5>] ? system_call_after_swapgs+0xa2/0x13a
[35281.102546] [<ffffffff84b94ec9>] ? system_call_after_swapgs+0x96/0x13a
[35281.102549] [<ffffffff84b94ed5>] ? system_call_after_swapgs+0xa2/0x13a
[35281.102552] [<ffffffff84b94ec9>] ? system_call_after_swapgs+0x96/0x13a
[35281.102555] [<ffffffff84b94ed5>] ? system_call_after_swapgs+0xa2/0x13a
[35281.102558] [<ffffffff84b94ec9>] ? system_call_after_swapgs+0x96/0x13a
[35281.102561] [<ffffffff84b94ed5>] ? system_call_after_swapgs+0xa2/0x13a
[35281.102565] [<ffffffff84653e7e>] SyS_newstat+0xe/0x10
[35281.102568] [<ffffffff84b94f92>] system_call_fastpath+0x25/0x2a
[35281.102572] [<ffffffff84b94ed5>] ? system_call_after_swapgs+0xa2/0x13a
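
To make the trace easier to read, here is a small Python sketch that keeps only the frames not marked with "?" (the kernel uses "?" for addresses it found on the stack but could not confirm as part of the call chain). The helper name reliable_frames and the SAMPLE excerpt are only for illustration; the excerpt is copied from the trace above. Going by the remaining frames, the stat() call appears to be waiting in autofs4_expire_wait rather than in a ceph function, although that alone does not say what autofs itself is waiting for.

```python
import re

# Frame lines look like:
#   [35281.102422] [<ffffffff846e902b>] autofs4_expire_wait+0xab/0x160
# An optional "?" before the symbol marks an unreliable frame. \s+ also
# matches a newline, so frames wrapped by the mail client still parse.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(\?\s+)?([A-Za-z0-9_.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def reliable_frames(trace_text):
    """Return the symbols of frames not marked with '?', in trace order."""
    frames = []
    for match in FRAME_RE.finditer(trace_text):
        unreliable, symbol = match.groups()
        if not unreliable:
            frames.append(symbol)
    return frames


# Excerpt of the hung-task trace quoted above.
SAMPLE = """\
[35281.102275] [<ffffffff84b86d4f>] ? __schedule+0x3af/0x860
[35281.102285] [<ffffffff84b87229>] schedule+0x29/0x70
[35281.102296] [<ffffffff84b84d11>] schedule_timeout+0x221/0x2d0
[35281.102381] [<ffffffff846e7ed8>] ? autofs4_wait+0x428/0x920
[35281.102386] [<ffffffff84b875dd>] wait_for_completion+0xfd/0x140
[35281.102422] [<ffffffff846e902b>] autofs4_expire_wait+0xab/0x160
[35281.102425] [<ffffffff846e6060>] do_expire_wait+0x1e0/0x210
[35281.102429] [<ffffffff846e62b3>] autofs4_d_manage+0x73/0x1c0
[35281.102455] [<ffffffff84658e8a>] follow_managed+0xba/0x310
"""

frames = reliable_frames(SAMPLE)
autofs_frames = [name for name in frames if name.startswith("autofs4_")]
print("reliable frames:", frames)
print("autofs frames:  ", autofs_frames)
```

Feeding it the full dmesg output instead of SAMPLE should work the same way, since the pattern ignores the timestamp prefix.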
-----Original Message-----
To: ceph-users
Subject: [ceph-users] kvm vm cephfs mount hangs on osd node (something
like umount -l available?) (help wanted going to production)
I have a VM on an OSD node (it can reach the host and other nodes via the
macvtap interface, which is used by both host and guest). I just did a
simple bonnie++ test and everything seemed to be fine. Yesterday, however,
the dovecot process apparently caused problems (cephfs is only used for an
archive namespace; the inbox is on rbd ssd, fs meta also on ssd).
How can I recover from such a lock-up? In a similar situation with an
nfs-ganesha mount, I have the option of doing a umount -l, and clients
recover quickly without any issues.
Having to reset the VM is not really an option. What is the best way to
resolve this?
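
This does not answer the recovery question, but as a rough way to see which processes are stuck during such a lock-up, the following Python sketch lists tasks in uninterruptible sleep (state D, as shown for bash in the hung-task message). It only reads the standard /proc/<pid>/stat files; the function name uninterruptible_tasks and the proc_root parameter are my own, chosen for illustration.

```python
import os


def uninterruptible_tasks(proc_root="/proc"):
    """Return sorted (pid, comm) pairs for tasks whose state is 'D'."""
    blocked = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "sys"
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                data = handle.read()
        except OSError:
            continue  # the process exited while we were scanning
        # Format is "pid (comm) state ..."; comm may itself contain
        # spaces or parentheses, so split on the last ")".
        start = data.find("(")
        end = data.rfind(")")
        if start == -1 or end == -1:
            continue
        comm = data[start + 1:end]
        fields = data[end + 1:].split()
        if fields and fields[0] == "D":
            blocked.append((int(entry), comm))
    return sorted(blocked)


if __name__ == "__main__" and os.path.isdir("/proc"):
    for pid, comm in uninterruptible_tasks():
        print(pid, comm)
```

Run inside the VM while the mount hangs, it should show at least the blocked shell and possibly the kworker, which helps tell a single stuck lookup apart from a whole group of blocked clients.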
Ceph cluster: 14.2.11 (the vm has 14.2.16)
I have nothing special in my ceph.conf, only these two settings in the mds section:
mds bal fragment size max = 120000
# maybe for nfs-ganesha problems?
# http://docs.ceph.com/docs/master/cephfs/eviction/
#mds_session_blacklist_on_timeout = false
#mds_session_blacklist_on_evict = false
mds_cache_memory_limit = 17179860387
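
For readability, here is a small Python sketch that reads the mds lines above, skips the commented-out ones, and prints the cache limit in GiB. Ceph accepts option names written with spaces, underscores or dashes, so the sketch normalizes them to underscores. parse_options is a hypothetical helper written for this mail, not part of any Ceph tooling, and it does not handle trailing inline comments.

```python
def parse_options(lines):
    """Parse 'name = value' lines, ignoring blanks and '#'/';' comments."""
    options = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        name, sep, value = line.partition("=")
        if not sep:
            continue  # not a 'name = value' line
        # Normalize "mds bal fragment size max" / "mds-bal-..." to underscores.
        key = "_".join(name.strip().replace("-", "_").split())
        options[key] = value.strip()
    return options


# The mds section lines exactly as quoted above.
MDS_SECTION = """\
mds bal fragment size max = 120000
# maybe for nfs-ganesha problems?
# http://docs.ceph.com/docs/master/cephfs/eviction/
#mds_session_blacklist_on_timeout = false
#mds_session_blacklist_on_evict = false
mds_cache_memory_limit = 17179860387
""".splitlines()

opts = parse_options(MDS_SECTION)
limit_gib = int(opts["mds_cache_memory_limit"]) / 2**30
print(opts)
print(f"mds_cache_memory_limit is about {limit_gib:.2f} GiB")
```

So the only active settings are the fragment size maximum and a cache memory limit of just under 16 GiB; the two blacklist options are commented out.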
All running:
CentOS Linux release 7.9.2009 (Core)
Linux mail04 3.10.0-1160.6.1.el7.x86_64 #1 SMP Tue Nov 17 13:59:11 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io