To add a little color here... we started an rsync last night to copy about 4 TB of files to CephFS. We paused it this morning because CephFS was unresponsive on that machine (e.g., we can't even cat a file from the filesystem).
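For reference, this is roughly how we have been watching the MDS while we wait. It assumes you are on the host running the active MDS and can reach its admin socket (our daemon name is mdsdb, taken from the health output below; substitute your own):

    # cluster-wide warnings, including the oversized-cache and slow/stuck request messages
    $ ceph health detail | grep -Ei 'mds|slow|stuck'

    # MDS cache memory use vs. mds_cache_memory_limit (admin socket, run on the MDS host)
    $ sudo ceph daemon mds.mdsdb cache status

    # inode/dentry counts currently held by the MDS (mds_mem section of the perf counters)
    $ sudo ceph daemon mds.mdsdb perf dump mds_mem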
Been waiting about 3 hours for the log jam to clear. Slow requests have steadily decreased, but we still can't cat a file. It seems like something should be throttling the rsync operation to keep the queues from backing up this far. Is this a configuration problem or a bug?

From reading the Ceph docs, this health message seems to be the most telling:

    mdsdb(mds.0): MDS cache is too large (23GB/8GB); 1018 inodes in use by clients, 1 stray files

[Ref: http://docs.ceph.com/docs/master/cephfs/cache-size-limits/]

"Be aware that the cache limit is not a hard limit. Potential bugs in the CephFS client or MDS or misbehaving applications might cause the MDS to exceed its cache size. The mds_health_cache_threshold configures the cluster health warning message so that operators can investigate why the MDS cannot shrink its cache."
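Here is roughly what we are thinking of trying next. Treat it as a sketch rather than something we have validated: the 16 GiB figure is only an example, the paths are placeholders, and mdsdb is our MDS daemon name.

    # temporarily raise the MDS cache limit at runtime so the MDS has headroom to recover
    # (16 GiB here; our configured limit is 8 GiB); injectargs changes last only until the MDS restarts
    $ ceph tell mds.mdsdb injectargs '--mds_cache_memory_limit=17179869184'

    # when resuming the copy, cap it on the client side so it cannot outrun the HDD pool
    # (rsync's --bwlimit is in KiB/s, so 102400 is roughly 100 MB/s)
    $ rsync -a --bwlimit=102400 /source/ /mnt/cephfs/dest/

Neither of these addresses why the MDS cannot shrink its cache in the first place, which is really what we are asking about.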
Any suggestions?

Thanks,
-- Dan

> On Dec 5, 2017, at 10:07, Reed Dier <reed.d...@focusvq.com> wrote:
>
> Been trying to do a fairly large rsync onto a 3x replicated, filestore, HDD-backed CephFS pool.
>
> Luminous 12.2.1 for all daemons, kernel CephFS driver, Ubuntu 16.04 running a mix of 4.8 and 4.10 kernels, 2x10GbE networking between all daemons and clients.
>
>> $ ceph versions
>> {
>>     "mon": {
>>         "ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)": 3
>>     },
>>     "mgr": {
>>         "ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)": 3
>>     },
>>     "osd": {
>>         "ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)": 74
>>     },
>>     "mds": {
>>         "ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)": 2
>>     },
>>     "overall": {
>>         "ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)": 82
>>     }
>> }
>
>> HEALTH_ERR 1 MDSs report oversized cache; 1 MDSs have many clients failing to respond to cache pressure; 1 MDSs behind on trimming; noout,nodeep-scrub flag(s) set; application not enabled on 1 pool(s); 242 slow requests are blocked > 32 sec; 769378 stuck requests are blocked > 4096 sec
>> MDS_CACHE_OVERSIZED 1 MDSs report oversized cache
>>     mdsdb(mds.0): MDS cache is too large (23GB/8GB); 1018 inodes in use by clients, 1 stray files
>> MDS_CLIENT_RECALL_MANY 1 MDSs have many clients failing to respond to cache pressure
>>     mdsdb(mds.0): Many clients (37) failing to respond to cache pressure (client_count: 37)
>> MDS_TRIM 1 MDSs behind on trimming
>>     mdsdb(mds.0): Behind on trimming (36252/30); max_segments: 30, num_segments: 36252
>> OSDMAP_FLAGS noout,nodeep-scrub flag(s) set
>> REQUEST_SLOW 242 slow requests are blocked > 32 sec
>>     236 ops are blocked > 2097.15 sec
>>     3 ops are blocked > 1048.58 sec
>>     2 ops are blocked > 524.288 sec
>>     1 ops are blocked > 32.768 sec
>> REQUEST_STUCK 769378 stuck requests are blocked > 4096 sec
>>     91 ops are blocked > 67108.9 sec
>>     121258 ops are blocked > 33554.4 sec
>>     308189 ops are blocked > 16777.2 sec
>>     251586 ops are blocked > 8388.61 sec
>>     88254 ops are blocked > 4194.3 sec
>>     osds 0,1,3,6,8,12,15,16,17,21,22,23 have stuck requests > 16777.2 sec
>>     osds 4,7,9,10,11,14,18,20 have stuck requests > 33554.4 sec
>>     osd.13 has stuck requests > 67108.9 sec
>
> This is across 8 nodes, holding 3x 8TB HDDs each, all backed by Intel P3600 NVMe drives for journaling.
> SSD OSDs removed for brevity.
>
>> $ ceph osd tree
>> ID  CLASS WEIGHT    TYPE NAME                  STATUS REWEIGHT PRI-AFF
>> -13        87.28799 root ssd
>>  -1       174.51500 root default
>> -10       174.51500     rack default.rack2
>> -55        43.62000         chassis node2425
>>  -2        21.81000             host node24
>>   0   hdd   7.26999                 osd.0          up  1.00000 1.00000
>>   8   hdd   7.26999                 osd.8          up  1.00000 1.00000
>>  16   hdd   7.26999                 osd.16         up  1.00000 1.00000
>>  -3        21.81000             host node25
>>   1   hdd   7.26999                 osd.1          up  1.00000 1.00000
>>   9   hdd   7.26999                 osd.9          up  1.00000 1.00000
>>  17   hdd   7.26999                 osd.17         up  1.00000 1.00000
>> -56        43.63499         chassis node2627
>>  -4        21.81999             host node26
>>   2   hdd   7.27499                 osd.2          up  1.00000 1.00000
>>  10   hdd   7.26999                 osd.10         up  1.00000 1.00000
>>  18   hdd   7.27499                 osd.18         up  1.00000 1.00000
>>  -5        21.81499             host node27
>>   3   hdd   7.26999                 osd.3          up  1.00000 1.00000
>>  11   hdd   7.26999                 osd.11         up  1.00000 1.00000
>>  19   hdd   7.27499                 osd.19         up  1.00000 1.00000
>> -57        43.62999         chassis node2829
>>  -6        21.81499             host node28
>>   4   hdd   7.26999                 osd.4          up  1.00000 1.00000
>>  12   hdd   7.26999                 osd.12         up  1.00000 1.00000
>>  20   hdd   7.27499                 osd.20         up  1.00000 1.00000
>>  -7        21.81499             host node29
>>   5   hdd   7.26999                 osd.5          up  1.00000 1.00000
>>  13   hdd   7.26999                 osd.13         up  1.00000 1.00000
>>  21   hdd   7.27499                 osd.21         up  1.00000 1.00000
>> -58        43.62999         chassis node3031
>>  -8        21.81499             host node30
>>   6   hdd   7.26999                 osd.6          up  1.00000 1.00000
>>  14   hdd   7.26999                 osd.14         up  1.00000 1.00000
>>  22   hdd   7.27499                 osd.22         up  1.00000 1.00000
>>  -9        21.81499             host node31
>>   7   hdd   7.26999                 osd.7          up  1.00000 1.00000
>>  15   hdd   7.26999                 osd.15         up  1.00000 1.00000
>>  23   hdd   7.27499                 osd.23         up  1.00000 1.00000
>
> I'm trying to figure out what in my configuration is off, because I am told that CephFS should be able to throttle the requests to match the underlying storage medium and not create such an extensive log jam.
>
>> [mds]
>> mds_cache_size = 0
>> mds_cache_memory_limit = 8589934592
>>
>> [osd]
>> osd_op_threads = 4
>> filestore max sync interval = 30
>> osd_max_backfills = 10
>> osd_recovery_max_active = 16
>> osd_op_thread_suicide_timeout = 600
>
> I originally had mds_cache_size set to 10000000 (carried over from Jewel), but read that it is better to zero that out and set the limit via mds_cache_memory_limit instead, so I set that to 8 GB to see if it helped.
>
> Because I haven't seen the kernel CephFS driver advertise Luminous capabilities on anything older than (I believe) the 4.13 kernel, all of the clients are using Jewel capabilities for CephFS.
>
>> $ ceph features
>> {
>>     "mon": {
>>         "group": {
>>             "features": "0x1ffddff8eea4fffb",
>>             "release": "luminous",
>>             "num": 3
>>         }
>>     },
>>     "mds": {
>>         "group": {
>>             "features": "0x1ffddff8eea4fffb",
>>             "release": "luminous",
>>             "num": 2
>>         }
>>     },
>>     "osd": {
>>         "group": {
>>             "features": "0x1ffddff8eea4fffb",
>>             "release": "luminous",
>>             "num": 74
>>         }
>>     },
>>     "client": {
>>         "group": {
>>             "features": "0x107b84a842aca",
>>             "release": "hammer",
>>             "num": 2
>>         },
>>         "group": {
>>             "features": "0x40107b86a842ada",
>>             "release": "jewel",
>>             "num": 39
>>         },
>>         "group": {
>>             "features": "0x7010fb86aa42ada",
>>             "release": "jewel",
>>             "num": 1
>>         },
>>         "group": {
>>             "features": "0x1ffddff8eea4fffb",
>>             "release": "luminous",
>>             "num": 189
>>         }
>>     }
>> }
>
> Any help is appreciated.
>
> Thanks,
>
> Reed
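One more thought on the config above: the "Behind on trimming (36252/30)" warning suggests the MDS journal segment ceiling is still at the default of 30. This is the sort of [mds] section we are considering; treat it as a sketch rather than a recommendation, since 128 is only an example value and we have not tested whether raising the trim ceiling actually helps here:

    [mds]
    mds_cache_memory_limit = 8589934592   # 8 GiB, as currently set
    mds_log_max_segments = 128            # default is 30; the health output above shows 36252 segments outstanding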
_______________________________________________ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com