On Apr 6, 2015, at 7:04 PM, Robert LeBlanc <rob...@leblancnet.us> wrote:
> I see that ceph has 'ceph osd perf' that gets the latency of the OSDs.
> Is there a similar command that would provide some performance data
> about RBDs in use? I'm concerned about our ability to determine which
> RBD(s) may be "abusing" our storage at any given time.
>
> What are others doing to locate performance issues in their Ceph clusters?
I graph aggregate stats from `ceph --admin-daemon
/var/run/ceph/ceph-osd.$osdid.asok perf dump`. If the max latency strays too
far from my mean latency, I know to go look for the troublemaker. My graphs
look something like this:
So on Thursday just before noon a drive dies. The blue min-latency line for all
disks spikes up because all disks are recovering the data that was on the lost
OSD. The min drops back down to normal pretty quickly, but then the red max
line spikes way up for the single new disk that replaced the dead drive. It
stays pretty high until the data has been moved back onto it, at which point it
returns to normal just before midnight.
I do this style of graphing because I have 30 OSDs per chassis, and a chart
with 30 individual lines would be pretty tough to read. On less dense nodes,
per-OSD lines would probably be the way to go.
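For anyone wanting to wire this up themselves, here is a rough sketch of the
collection side. It assumes jq is available and that `perf dump` exposes an
`osd.op_latency` counter with `avgcount` and `sum` fields; the counter layout
varies between Ceph releases, so verify the field names against your own
`perf dump` output first.

    #!/bin/bash
    # Poll every OSD admin socket on this host and print the average op
    # latency per OSD, for feeding into whatever graphing stack you use.
    # NOTE: the .osd.op_latency field names are assumptions based on my
    # cluster's version -- check them against your own `perf dump` output.
    # Also note sum/avgcount is a since-daemon-start average; for a recent
    # view you would diff successive samples.
    for sock in /var/run/ceph/ceph-osd.*.asok; do
        osdid=$(basename "$sock" .asok | sed 's/^ceph-osd\.//')
        ceph --admin-daemon "$sock" perf dump | \
            jq -r --arg id "$osdid" '
                .osd.op_latency
                | (if .avgcount > 0 then .sum / .avgcount else 0 end) as $avg
                | "osd.\($id) avg_op_latency_s=\($avg)"'
    done

From there it is just a matter of shipping those numbers to your graphing tool
and plotting the min/mean/max across all OSDs, which gives the aggregate view
described above.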