To add a little color here... we started an rsync last night to copy about 4TB 
worth of files to CephFS.  We paused it this morning because CephFS was 
unresponsive on that machine (we couldn't even cat a file from the mount).

Been waiting about 3 hours for the log jam to clear.  Slow requests have 
steadily decreased, but we still can't cat a file.

Seems like there should be something throttling the rsync operation to prevent 
the queues from backing up so far.  Is this a configuration problem or a bug?
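
As a stopgap on our side I'm considering capping rsync's own transfer rate when 
we resume, roughly like the below (paths are placeholders, and the --bwlimit 
value, in KB/s on this rsync version, is just a starting guess to tune):

  $ rsync -a --partial --bwlimit=50000 /source/dir/ /mnt/cephfs/dest/

That feels like working around the symptom rather than the cause, though.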

From reading the Ceph docs, this seems to be the most telling:

mdsdb(mds.0): MDS cache is too large (23GB/8GB); 1018 inodes in use by clients, 
1 stray files

[Ref: http://docs.ceph.com/docs/master/cephfs/cache-size-limits/]

"Be aware that the cache limit is not a hard limit. Potential bugs in the 
CephFS client or MDS or misbehaving applications might cause the MDS to exceed 
its cache size. The  mds_health_cache_threshold configures the cluster health 
warning message so that operators can investigate why the MDS cannot shrink its 
cache."

Any suggestions?

Thanks,

-- Dan



> On Dec 5, 2017, at 10:07, Reed Dier <reed.d...@focusvq.com> wrote:
> 
> Been trying to do a fairly large rsync onto a 3x replicated, filestore, 
> HDD-backed CephFS pool.
> 
> Luminous 12.2.1 for all daemons, kernel CephFS driver, Ubuntu 16.04 running a 
> mix of 4.8 and 4.10 kernels, 2x10GbE networking between all daemons and 
> clients.
> 
>> $ ceph versions
>> {
>>     "mon": {
>>         "ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) 
>> luminous (stable)": 3
>>     },
>>     "mgr": {
>>         "ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) 
>> luminous (stable)": 3
>>     },
>>     "osd": {
>>         "ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) 
>> luminous (stable)": 74
>>     },
>>     "mds": {
>>         "ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) 
>> luminous (stable)": 2
>>     },
>>     "overall": {
>>         "ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) 
>> luminous (stable)": 82
>>     }
>> }
> 
>> HEALTH_ERR 1 MDSs report oversized cache; 1 MDSs have many clients failing 
>> to respond to cache pressure; 1 MDSs behind on trimming; noout,nodeep-scrub 
>> flag(s) set; application not enabled on 1 pool(s); 242 slow requests are 
>> blocked > 32 sec; 769378 stuck requests are blocked > 4096 sec
>> MDS_CACHE_OVERSIZED 1 MDSs report oversized cache
>>     mdsdb(mds.0): MDS cache is too large (23GB/8GB); 1018 inodes in use by 
>> clients, 1 stray files
>> MDS_CLIENT_RECALL_MANY 1 MDSs have many clients failing to respond to cache 
>> pressure
>>     mdsdb(mds.0): Many clients (37) failing to respond to cache pressure
>>     client_count: 37
>> MDS_TRIM 1 MDSs behind on trimming
>>     mdsdb(mds.0): Behind on trimming (36252/30)
>>     max_segments: 30, num_segments: 36252
>> OSDMAP_FLAGS noout,nodeep-scrub flag(s) set
>> REQUEST_SLOW 242 slow requests are blocked > 32 sec
>>     236 ops are blocked > 2097.15 sec
>>     3 ops are blocked > 1048.58 sec
>>     2 ops are blocked > 524.288 sec
>>     1 ops are blocked > 32.768 sec
>> REQUEST_STUCK 769378 stuck requests are blocked > 4096 sec
>>     91 ops are blocked > 67108.9 sec
>>     121258 ops are blocked > 33554.4 sec
>>     308189 ops are blocked > 16777.2 sec
>>     251586 ops are blocked > 8388.61 sec
>>     88254 ops are blocked > 4194.3 sec
>>     osds 0,1,3,6,8,12,15,16,17,21,22,23 have stuck requests > 16777.2 sec
>>     osds 4,7,9,10,11,14,18,20 have stuck requests > 33554.4 sec
>>     osd.13 has stuck requests > 67108.9 sec
> 
> This is across 8 nodes, each holding 3x 8TB HDDs, all backed by Intel P3600 
> NVMe drives for journaling.
> Removed the SSD OSDs from the tree below for brevity.
> 
>> $ ceph osd tree
>> ID  CLASS WEIGHT    TYPE NAME                         STATUS REWEIGHT PRI-AFF
>> -13        87.28799 root ssd
>>  -1       174.51500 root default
>> -10       174.51500     rack default.rack2
>> -55        43.62000         chassis node2425
>>  -2        21.81000             host node24
>>   0   hdd   7.26999                 osd.0                 up  1.00000 1.00000
>>   8   hdd   7.26999                 osd.8                 up  1.00000 1.00000
>>  16   hdd   7.26999                 osd.16                up  1.00000 1.00000
>>  -3        21.81000             host node25
>>   1   hdd   7.26999                 osd.1                 up  1.00000 1.00000
>>   9   hdd   7.26999                 osd.9                 up  1.00000 1.00000
>>  17   hdd   7.26999                 osd.17                up  1.00000 1.00000
>> -56        43.63499         chassis node2627
>>  -4        21.81999             host node26
>>   2   hdd   7.27499                 osd.2                 up  1.00000 1.00000
>>  10   hdd   7.26999                 osd.10                up  1.00000 1.00000
>>  18   hdd   7.27499                 osd.18                up  1.00000 1.00000
>>  -5        21.81499             host node27
>>   3   hdd   7.26999                 osd.3                 up  1.00000 1.00000
>>  11   hdd   7.26999                 osd.11                up  1.00000 1.00000
>>  19   hdd   7.27499                 osd.19                up  1.00000 1.00000
>> -57        43.62999         chassis node2829
>>  -6        21.81499             host node28
>>   4   hdd   7.26999                 osd.4                 up  1.00000 1.00000
>>  12   hdd   7.26999                 osd.12                up  1.00000 1.00000
>>  20   hdd   7.27499                 osd.20                up  1.00000 1.00000
>>  -7        21.81499             host node29
>>   5   hdd   7.26999                 osd.5                 up  1.00000 1.00000
>>  13   hdd   7.26999                 osd.13                up  1.00000 1.00000
>>  21   hdd   7.27499                 osd.21                up  1.00000 1.00000
>> -58        43.62999         chassis node3031
>>  -8        21.81499             host node30
>>   6   hdd   7.26999                 osd.6                 up  1.00000 1.00000
>>  14   hdd   7.26999                 osd.14                up  1.00000 1.00000
>>  22   hdd   7.27499                 osd.22                up  1.00000 1.00000
>>  -9        21.81499             host node31
>>   7   hdd   7.26999                 osd.7                 up  1.00000 1.00000
>>  15   hdd   7.26999                 osd.15                up  1.00000 1.00000
>>  23   hdd   7.27499                 osd.23                up  1.00000 1.00000
> 
> Trying to figure out what in my configuration is off, because I am told that 
> CephFS should be able to throttle the requests to match the underlying 
> storage medium and not create such an extensive log jam. 
> 
>> [mds]
>> mds_cache_size = 0
>> mds_cache_memory_limit = 8589934592
>> 
>> [osd]
>> osd_op_threads = 4
>> filestore max sync interval = 30
>> osd_max_backfills = 10
>> osd_recovery_max_active = 16
>> osd_op_thread_suicide_timeout = 600
> 
> I originally had mds_cache_size set to 10000000 (carried over from Jewel), but 
> read that it is better to zero that out and set the limit via 
> mds_cache_memory_limit now. So I set that to 8GB to see if that helped any.
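> 
> To double-check that the new limit actually took effect, I believe something 
> along these lines against the MDS admin socket (or via injectargs) should show 
> or adjust it at runtime, though I'm going from memory here:
> 
>> $ sudo ceph daemon mds.mdsdb config get mds_cache_memory_limit
>> $ ceph tell mds.mdsdb injectargs '--mds_cache_memory_limit 8589934592'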
> 
> Because I haven't seen anything older than (I believe) the 4.13 kernel support 
> the Luminous feature set in the CephFS kernel driver, everything here is using 
> Jewel capabilities for CephFS.
> 
>> $ ceph features
>> {
>>     "mon": {
>>         "group": {
>>             "features": "0x1ffddff8eea4fffb",
>>             "release": "luminous",
>>             "num": 3
>>         }
>>     },
>>     "mds": {
>>         "group": {
>>             "features": "0x1ffddff8eea4fffb",
>>             "release": "luminous",
>>             "num": 2
>>         }
>>     },
>>     "osd": {
>>         "group": {
>>             "features": "0x1ffddff8eea4fffb",
>>             "release": "luminous",
>>             "num": 74
>>         }
>>     },
>>     "client": {
>>         "group": {
>>             "features": "0x107b84a842aca",
>>             "release": "hammer",
>>             "num": 2
>>         },
>>         "group": {
>>             "features": "0x40107b86a842ada",
>>             "release": "jewel",
>>             "num": 39
>>         },
>>         "group": {
>>             "features": "0x7010fb86aa42ada",
>>             "release": "jewel",
>>             "num": 1
>>         },
>>         "group": {
>>             "features": "0x1ffddff8eea4fffb",
>>             "release": "luminous",
>>             "num": 189
>>         }
>>     }
>> }
> 
> 
> Any help is appreciated.
> 
> Thanks,
> 
> Reed
> 

