I’ve got a test cluster with ~500 OSDs, 5 MONs, and 1 MDS.  All the OSDs 
also mount CephFS at /ceph, and I’ve got Graphite pointing at a space under 
/ceph.  Over the weekend, I drove almost 2 million metrics; each metric 
creates a ~3MB file at a hierarchical path and receives one datapoint per 
minute.  CephFS seemed to handle the writes OK while I was driving load.  
Each metric’s file lives at a path like this:

/ceph/whisper/sandbox/cephtest-osd0013/2/3/4/5.wsp

Today, however, with the load generator still running, reading file metadata 
(e.g. directory entries and stat(2) info, presumably MDS-managed data) seems 
nearly impossible, especially deeper in the tree.  For example, in a shell, 
cd works but ls hangs, seemingly indefinitely.  After turning off the load 
generator and allowing a while for things to settle down, everything behaves 
better.
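To put a rough number on the hang, this is the kind of probe I can run at 
various depths (a sketch; meta_probe is just my name for it, and the path 
below is from my tree):

```shell
# meta_probe: time one ls + stat pass over a directory, printing elapsed
# seconds.  Errors are suppressed so it runs even on missing paths.
meta_probe() {
    dir="$1"
    start=$(date +%s)
    ls "$dir" > /dev/null 2>&1
    for f in "$dir"/*; do
        stat "$f" > /dev/null 2>&1
    done
    end=$(date +%s)
    echo "$((end - start))"
}

# Under load this climbs from ~0s near the root to effectively never
# returning deeper in the tree:
meta_probe /ceph/whisper/sandbox
```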

ceph status and ceph health both return good statuses the entire time.  During 
load generation, the ceph-mds process seems pegged between 100% and 150% CPU; 
with load generation turned off, it varies widely, from near-idle up to a 
similar 100-150%.

Hopefully, I’ve just missed something in the CephFS tuning.  Either way, I’m 
looking for direction on figuring out whether this is indeed a tuning problem 
or a symptom of the “not ready for production” banner in the documentation.
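For reference, the only knob I’ve looked at so far (and I’m not sure it’s the 
right one) is the MDS inode cache size in ceph.conf; the value below is just 
an illustration, not something I’ve validated:

```
[mds]
    ; default is 100000 inodes cached; with ~2M metric files the working
    ; set may simply not fit
    mds cache size = 1000000
```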
--
Kevin Sumner
ke...@sumner.io



_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com