On Tue, May 21, 2019 at 6:10 AM Ryan Leimenstoll <[email protected]> wrote:
>
> Hi all,
>
> We recently encountered an issue where our CephFS filesystem was
> unexpectedly set to read-only. Looking at the logs from the daemons, I can
> see the following:
>
> On the MDS:
> ...
> 2019-05-18 16:34:24.341 7fb3bd610700 -1 mds.0.89098 unhandled write error (90) Message too long, force readonly...
> 2019-05-18 16:34:24.341 7fb3bd610700  1 mds.0.cache force file system read-only
> 2019-05-18 16:34:24.341 7fb3bd610700  0 log_channel(cluster) log [WRN] : force file system read-only
> 2019-05-18 16:34:41.289 7fb3c0616700  1 heartbeat_map is_healthy 'MDSRank' had timed out after 15
> 2019-05-18 16:34:41.289 7fb3c0616700  0 mds.beacon.objmds00 Skipping beacon heartbeat to monitors (last acked 4.00101s ago); MDS internal heartbeat is not healthy!
> ...
>
> On one of the OSDs it was most likely targeting:
> ...
> 2019-05-18 16:34:24.140 7f8134e6c700 -1 osd.602 pg_epoch: 682796 pg[49.20b( v 682796'15706523 (682693'15703449,682796'15706523] local-lis/les=673041/673042 n=10524 ec=245563/245563 lis/c 673041/673041 les/c/f 673042/673042/0 673038/673041/668565) [602,530,558] r=0 lpr=673041 crt=682796'15706523 lcod 682796'15706522 mlcod 682796'15706522 active+clean] do_op msg data len 95146005 > osd_max_write_size 94371840 on osd_op(mds.0.89098:48609421 49.20b 49:d0630e4c:::mds0_sessionmap:head [omap-set-header,omap-set-vals] snapc 0=[] ondisk+write+known_if_redirected+full_force e682796) v8
> 2019-05-18 17:10:33.695 7f813466b700  0 log_channel(cluster) log [DBG] : 49.31c scrub starts
> 2019-05-18 17:10:34.980 7f813466b700  0 log_channel(cluster) log [DBG] : 49.31c scrub ok
> 2019-05-18 22:17:37.320 7f8134e6c700 -1 osd.602 pg_epoch: 683434 pg[49.20b( v 682861'15706526 (682693'15703449,682861'15706526] local-lis/les=673041/673042 n=10525 ec=245563/245563 lis/c 673041/673041 les/c/f 673042/673042/0 673038/673041/668565) [602,530,558] r=0 lpr=673041 crt=682861'15706526 lcod 682859'15706525 mlcod 682859'15706525 active+clean] do_op msg data len 95903764 > osd_max_write_size 94371840 on osd_op(mds.0.91565:357877 49.20b 49:d0630e4c:::mds0_sessionmap:head [omap-set-header,omap-set-vals,omap-rm-keys] snapc 0=[] ondisk+write+known_if_redirected+full_force e683434) v8
> ...
>
> During this time there were some health concerns with the cluster.
> Significantly, since the error above seems to be related to the SessionMap,
> we had a client with a few requests blocked for over 35948 secs (it's a
> member of a compute cluster, so we let the node drain/finish jobs before
> rebooting). We have also had some issues with certain OSDs running older
> hardware staying up/responding to heartbeats in a timely manner after
> upgrading to Nautilus, although that seems to be an iowait/load issue that
> we are actively working to mitigate separately.
>
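For reference, the rejection in the OSD log above is a plain size comparison: the OSD refuses any single write whose payload exceeds osd_max_write_size, which defaults to 90 MiB (the 94371840 bytes shown in the log). A minimal sketch of that check against the two logged payload sizes:

```shell
# Sketch of the OSD's do_op size check. 94371840 = 90 MiB is the
# default osd_max_write_size, matching the value in the log lines.
max=$((90 * 1024 * 1024))
for len in 95146005 95903764; do
  if [ "$len" -gt "$max" ]; then
    echo "msg data len $len > osd_max_write_size $max -> (90) Message too long"
  fi
done
```

Both SessionMap writes are a few hundred KiB over the limit, which is why the MDS gets EMSGSIZE (errno 90) back and forces the filesystem read-only.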
This prevents the MDS from trimming completed requests recorded in the session, which results in a very large session item. To recover, blacklist the client that has the blocked requests, then restart the MDS.

> We are running Nautilus 14.2.1 on RHEL7.6. There is only one MDS rank,
> with an active/standby setup between two MDS nodes. MDS clients are
> mounted using the RHEL7.6 kernel driver.
>
> My read here would be that the MDS is sending too large a message to the
> OSD; however, my understanding was that the MDS should be using
> osd_max_write_size to determine the size of that message [0]. Is this
> maybe a bug in how this is calculated on the MDS side?
>
> Thanks!
> Ryan Leimenstoll
> [email protected]
> University of Maryland Institute for Advanced Computer Studies
>
> [0] https://www.spinics.net/lists/ceph-devel/msg11951.html
> _______________________________________________
> ceph-users mailing list
> [email protected]
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
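For the archives, the recovery described above (blacklist the stuck client, then restart the MDS) can be sketched with standard Nautilus-era commands. The client address below is a placeholder; take the real addr/nonce from the `session ls` output for the session with the large backlog:

```shell
# 1. List MDS sessions and find the client whose requests are stuck
#    (look for a session with a large num_completed_requests backlog).
ceph tell mds.0 session ls

# 2. Blacklist that client so the MDS can drop its session.
#    (192.0.2.10:0/123456789 is a placeholder addr/nonce.)
ceph osd blacklist add 192.0.2.10:0/123456789

# 3. Fail the active MDS so the standby takes over; the replacement
#    no longer carries the oversized session entry.
ceph mds fail 0
```

This is a sketch, not a verified procedure; double-check the session output before blacklisting, since blacklisting evicts the client and discards its dirty state.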
