Hi,

This time:

osdc:
REQUESTS 0 homeless 0
LINGER REQUESTS
monc:
have monmap 2 want 3+
have osdmap 4545 want 4546
have fsmap.user 0
have mdsmap 446 want 447+
fs_cluster_id -1

mdsc:
649065  mds0  setattr  #100002e7e5a

Anything useful?

Yan, Zheng <uker...@gmail.com> wrote on Sat, Aug 25, 2018 at 7:53 AM:
> Are there hung requests in /sys/kernel/debug/ceph/xxxx/osdc?
>
> On Fri, Aug 24, 2018 at 9:32 PM Zhenshi Zhou <deader...@gmail.com> wrote:
> >
> > I'm afraid that the client hangs again... the log shows:
> >
> > 2018-08-24 21:27:54.714334 [WRN] slow request 62.607608 seconds old, received at 2018-08-24 21:26:52.106633: client_request(client.213528:241811 getattr pAsLsXsFs #0x100002e7e5a 2018-08-24 21:26:52.106425 caller_uid=0, caller_gid=0{}) currently failed to rdlock, waiting
> > 2018-08-24 21:27:54.714320 [WRN] 3 slow requests, 1 included below; oldest blocked for > 843.556758 secs
> > 2018-08-24 21:27:24.713740 [WRN] slow request 32.606979 seconds old, received at 2018-08-24 21:26:52.106633: client_request(client.213528:241811 getattr pAsLsXsFs #0x100002e7e5a 2018-08-24 21:26:52.106425 caller_uid=0, caller_gid=0{}) currently failed to rdlock, waiting
> > 2018-08-24 21:27:24.713729 [WRN] 3 slow requests, 1 included below; oldest blocked for > 813.556129 secs
> > 2018-08-24 21:25:49.711778 [WRN] slow request 483.807963 seconds old, received at 2018-08-24 21:17:45.903726: client_request(client.213528:241810 getattr pAsLsXsFs #0x100002e7e5a 2018-08-24 21:17:45.903049 caller_uid=0, caller_gid=0{}) currently failed to rdlock, waiting
> > 2018-08-24 21:25:49.711766 [WRN] 2 slow requests, 1 included below; oldest blocked for > 718.554206 secs
> > 2018-08-24 21:21:54.707536 [WRN] client.213528 isn't responding to mclientcaps(revoke), ino 0x100002e7e5a pending pAsLsXsFr issued pAsLsXsFscr, sent 483.548912 seconds ago
> > 2018-08-24 21:21:54.706930 [WRN] slow request 483.549363 seconds old, received at 2018-08-24 21:13:51.157483: client_request(client.267792:649065 setattr size=0 mtime=2018-08-24 21:13:51.163236 #0x100002e7e5a 2018-08-24 21:13:51.163236 caller_uid=0, caller_gid=0{}) currently failed to xlock, waiting
> > 2018-08-24 21:21:54.706920 [WRN] 2 slow requests, 1 included below; oldest blocked for > 483.549363 secs
> > 2018-08-24 21:21:49.706838 [WRN] slow request 243.803027 seconds old, received at 2018-08-24 21:17:45.903726: client_request(client.213528:241810 getattr pAsLsXsFs #0x100002e7e5a 2018-08-24 21:17:45.903049 caller_uid=0, caller_gid=0{}) currently failed to rdlock, waiting
> > 2018-08-24 21:21:49.706828 [WRN] 2 slow requests, 1 included below; oldest blocked for > 478.549269 secs
> > 2018-08-24 21:19:49.704294 [WRN] slow request 123.800486 seconds old, received at 2018-08-24 21:17:45.903726: client_request(client.213528:241810 getattr pAsLsXsFs #0x100002e7e5a 2018-08-24 21:17:45.903049 caller_uid=0, caller_gid=0{}) currently failed to rdlock, waiting
> > 2018-08-24 21:19:49.704284 [WRN] 2 slow requests, 1 included below; oldest blocked for > 358.546729 secs
> > 2018-08-24 21:18:49.703073 [WRN] slow request 63.799269 seconds old, received at 2018-08-24 21:17:45.903726: client_request(client.213528:241810 getattr pAsLsXsFs #0x100002e7e5a 2018-08-24 21:17:45.903049 caller_uid=0, caller_gid=0{}) currently failed to rdlock, waiting
> > 2018-08-24 21:18:49.703062 [WRN] 2 slow requests, 1 included below; oldest blocked for > 298.545511 secs
> > 2018-08-24 21:18:19.702465 [WRN] slow request 33.798637 seconds old, received at 2018-08-24 21:17:45.903726: client_request(client.213528:241810 getattr pAsLsXsFs #0x100002e7e5a 2018-08-24 21:17:45.903049 caller_uid=0, caller_gid=0{}) currently failed to rdlock, waiting
> > 2018-08-24 21:18:19.702456 [WRN] 2 slow requests, 1 included below; oldest blocked for > 268.544880 secs
> > 2018-08-24 21:17:54.702517 [WRN] client.213528 isn't responding to mclientcaps(revoke), ino 0x100002e7e5a pending pAsLsXsFr issued pAsLsXsFscr, sent 243.543893 seconds ago
> > 2018-08-24 21:17:54.701904 [WRN] slow request 243.544331 seconds old, received at 2018-08-24 21:13:51.157483: client_request(client.267792:649065 setattr size=0 mtime=2018-08-24 21:13:51.163236 #0x100002e7e5a 2018-08-24 21:13:51.163236 caller_uid=0, caller_gid=0{}) currently failed to xlock, waiting
> > 2018-08-24 21:17:54.701894 [WRN] 1 slow requests, 1 included below; oldest blocked for > 243.544331 secs
> > 2018-08-24 21:15:54.700034 [WRN] client.213528 isn't responding to mclientcaps(revoke), ino 0x100002e7e5a pending pAsLsXsFr issued pAsLsXsFscr, sent 123.541410 seconds ago
> > 2018-08-24 21:15:54.699385 [WRN] slow request 123.541822 seconds old, received at 2018-08-24 21:13:51.157483: client_request(client.267792:649065 setattr size=0 mtime=2018-08-24 21:13:51.163236 #0x100002e7e5a 2018-08-24 21:13:51.163236 caller_uid=0, caller_gid=0{}) currently failed to xlock, waiting
> > 2018-08-24 21:15:54.699375 [WRN] 1 slow requests, 1 included below; oldest blocked for > 123.541822 secs
> > 2018-08-24 21:14:57.055183 [WRN] Health check failed: 1 clients failing to respond to capability release (MDS_CLIENT_LATE_RELEASE)
> > 2018-08-24 21:14:56.167868 [WRN] MDS health message (mds.0): Client docker39 failing to respond to capability release
> > 2018-08-24 21:14:54.698753 [WRN] client.213528 isn't responding to mclientcaps(revoke), ino 0x100002e7e5a pending pAsLsXsFr issued pAsLsXsFscr, sent 63.540127 seconds ago
> > 2018-08-24 21:14:54.698104 [WRN] slow request 63.540533 seconds old, received at 2018-08-24 21:13:51.157483: client_request(client.267792:649065 setattr size=0 mtime=2018-08-24 21:13:51.163236 #0x100002e7e5a 2018-08-24 21:13:51.163236 caller_uid=0, caller_gid=0{}) currently failed to xlock, waiting
> > 2018-08-24 21:14:54.698086 [WRN] 1 slow requests, 1 included below; oldest blocked for > 63.540533 secs
> > 2018-08-24 21:14:28.217536 [WRN] Health check failed: 1 MDSs report slow requests (MDS_SLOW_REQUEST)
> > 2018-08-24 21:14:28.167096 [WRN] MDS health message (mds.0): 1 slow requests are blocked > 30 sec
> >
> > Yan, Zheng <uker...@gmail.com> wrote on Tue, Aug 14, 2018 at 3:13 PM:
> >>
> >> On Mon, Aug 13, 2018 at 9:55 PM Zhenshi Zhou <deader...@gmail.com> wrote:
> >> >
> >> > Hi Burkhard,
> >> > I'm sure the user has permission to read and write. Besides, we're not using EC data pools.
> >> > Now the situation is that any operation on one specific file hangs.
> >> > Operations on any other files won't hang.
> >> >
> >>
> >> Can the ceph-fuse client read the specific file?
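
For anyone hitting something similar, the client-side and MDS-side state discussed above can be collected roughly like this. This is only a sketch: the debugfs directory name, the MDS daemon name, and access to the MDS admin socket are assumptions about this particular setup, and client.213528 / docker39 are simply the IDs named in the warnings above.

    # On the suspect client host (kernel client debugfs; one directory per mount, named <fsid>.client<id>)
    cd /sys/kernel/debug/ceph/*.client*
    cat mdsc    # in-flight MDS requests (empty means nothing pending on the MDS)
    cat osdc    # in-flight OSD requests
    cat monc    # monitor session and which map epochs the client still wants

    # On the host running the active MDS (daemon name is a placeholder)
    ceph daemon mds.<name> session ls           # find the session belonging to client.213528 / docker39
    ceph daemon mds.<name> dump_ops_in_flight   # the blocked getattr/setattr requests as the MDS sees them
    ceph health detail                          # expands MDS_CLIENT_LATE_RELEASE / MDS_SLOW_REQUEST

The point is just to confirm, from both sides, which session is sitting on the caps for ino 0x100002e7e5a.
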
> >> > Burkhard Linke <burkhard.li...@computational.bio.uni-giessen.de> wrote on Mon, Aug 13, 2018 at 9:42 PM:
> >> >>
> >> >> Hi,
> >> >>
> >> >> On 08/13/2018 03:22 PM, Zhenshi Zhou wrote:
> >> >> > Hi,
> >> >> > Finally, I got a running server with the files under /sys/kernel/debug/ceph/xxx/
> >> >> >
> >> >> > [root@docker27 525c4413-7a08-40ca-9a98-0a6df009025b.client213522]# cat mdsc
> >> >> > [root@docker27 525c4413-7a08-40ca-9a98-0a6df009025b.client213522]# cat monc
> >> >> > have monmap 2 want 3+
> >> >> > have osdmap 4545 want 4546
> >> >> > have fsmap.user 0
> >> >> > have mdsmap 335 want 336+
> >> >> > fs_cluster_id -1
> >> >> > [root@docker27 525c4413-7a08-40ca-9a98-0a6df009025b.client213522]# cat osdc
> >> >> > REQUESTS 6 homeless 0
> >> >> > 82580  osd10  1.7f9ddac7  [10,13]/10  [10,13]/10  10000053a04.00000000  0x400024  1  write
> >> >> > 81019  osd11  1.184ed679  [11,7]/11   [11,7]/11   1000005397b.00000000  0x400024  1  write
> >> >> > 81012  osd12  1.cd98ed57  [12,9]/12   [12,9]/12   10000053971.00000000  0x400024  1  write,startsync
> >> >> > 82589  osd12  1.7cd5405a  [12,8]/12   [12,8]/12   10000053a13.00000000  0x400024  1  write,startsync
> >> >> > 80972  osd13  1.91886156  [13,4]/13   [13,4]/13   10000053939.00000000  0x400024  1  write
> >> >> > 81035  osd13  1.ac5ccb56  [13,4]/13   [13,4]/13   10000053997.00000000  0x400024  1  write
> >> >> >
> >> >> > The cluster reports nothing and still shows HEALTH_OK.
> >> >> > What I did was just vim a file stored on cephfs, and then it hung there.
> >> >> > And I got a process in 'D' state.
> >> >> > By the way, the rest of the mounted directory is still usable with no errors.
> >> >>
> >> >> So there are no pending mds requests, and the mon seems to be ok, too.
> >> >>
> >> >> But the osd requests seem to be stuck. Are you sure the ceph user used for the mount point is allowed to write to the cephfs data pools? Are you using additional EC data pools?
> >> >>
> >> >> Regards,
> >> >> Burkhard
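
As a follow-up to Burkhard's permission question above, a quick way to compare the mount user's cephx caps against the filesystem's data pools might look like the following; the client name and pool name are placeholders, and the exact cap syntax depends on how the key was created:

    # Which pools back the filesystem (metadata pool and data pool[s])
    ceph fs ls

    # What the mount user is actually authorised to do (client name is a placeholder)
    ceph auth get client.<mount-user>

    # A cephfs client key typically needs something along the lines of:
    #   mds 'allow rw'
    #   mon 'allow r'
    #   osd 'allow rw pool=<cephfs-data-pool>'

That is the scenario Burkhard is asking about: if the osd cap does not cover the data pool (or an additional EC data pool), the client's writes can end up stuck in osdc much like the six write/startsync requests shown above.
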
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com