One thing I do right now for ceph performance testing is run a copy of
collectl during every test. This gives you a TON of information about
CPU usage, network stats, disk stats, etc. It's pretty easy to import
the output data into gnuplot. Mark Seger (the creator of collectl) also
has some tools to gather aggregate statistics across multiple nodes.
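If it helps, here's the sort of throwaway script I use to pull columns
out of collectl's plot-format (-P) output into something gnuplot can
read directly (the column names you pass in are just examples; check
the header of your own file):

    #!/usr/bin/env python
    # Rough sketch: pull selected columns out of a collectl plot-format (-P)
    # file so gnuplot can plot them.  The column names you pass in (e.g.
    # "[CPU]Totl%") are examples -- check the "#..." header of your own file.
    import sys

    def extract(path, wanted):
        header, idx = [], None
        with open(path) as f:
            for line in f:
                if line.startswith('#'):
                    header = line.lstrip('#').split()
                    idx = None          # header changed, recompute indexes
                    continue
                if idx is None:
                    idx = [header.index(col) for col in wanted]
                cols = line.split()
                print(' '.join(cols[i] for i in idx))

    if __name__ == '__main__':
        # e.g.  ./collectl2gnuplot.py node1-20140412.tab Date Time '[CPU]Totl%'
        extract(sys.argv[1], sys.argv[2:])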
Beyond collectl, you can get a ton of useful data out of the ceph admin
socket. I especially like dump_historic_ops, as it's sometimes enough
to avoid having to parse through debug 20 logs.
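For example, something like this (rough and untested; the socket path
is an example, and the JSON key names can vary a bit between releases)
will print the slowest recent ops on one OSD:

    #!/usr/bin/env python
    # Rough sketch: print the slowest recent ops from one OSD's admin socket.
    # The socket path is an example, and the JSON key names ("Ops",
    # "duration", "description") can vary a bit between releases.
    import json, subprocess

    SOCKET = '/var/run/ceph/ceph-osd.0.asok'   # adjust for your OSD

    raw = subprocess.check_output(
        ['ceph', '--admin-daemon', SOCKET, 'dump_historic_ops'])
    ops = json.loads(raw).get('Ops', [])
    for op in sorted(ops, key=lambda o: o.get('duration', 0), reverse=True)[:10]:
        print('%8.3fs  %s' % (op.get('duration', 0), op.get('description', '')))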
While the following tools have too much overhead to be really useful
for general system monitoring, they are very handy for specific
performance investigations:
1) perf with dwarf/unwind support (see the sketch after this list)
2) blktrace (optionally with seekwatcher)
3) valgrind (cachegrind, callgrind, massif)
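As an example for 1), this is roughly how I attach perf with DWARF
unwinding to a running OSD (the pidof lookup and the 30 second window
are just examples):

    #!/usr/bin/env python
    # Rough sketch: profile a running OSD with perf + DWARF unwinding for 30
    # seconds, then show the report.  The pidof lookup and the 30s window are
    # just examples; older perf versions spell the option "-g dwarf" instead
    # of "--call-graph dwarf".
    import subprocess

    pid = subprocess.check_output(['pidof', '-s', 'ceph-osd']).decode().strip()
    subprocess.check_call(['perf', 'record', '--call-graph', 'dwarf',
                           '-p', pid, 'sleep', '30'])
    subprocess.check_call(['perf', 'report', '--stdio'])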
Beyond that, there are some collectd plugins for Ceph, and last time I
checked DreamHost was using Graphite for a lot of its visualizations.
There's always Ganglia too. :)
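For Graphite, the plaintext protocol makes it easy to push numbers in
from any script; a minimal sketch (the host and the metric name are
made up):

    #!/usr/bin/env python
    # Rough sketch: push a single datapoint into Graphite over its plaintext
    # protocol ("metric value timestamp\n" on TCP 2003).  The host and the
    # metric name are made up -- point it at your own carbon-cache.
    import socket, time

    GRAPHITE = ('graphite.example.com', 2003)

    def send_metric(name, value, ts=None):
        line = '%s %s %d\n' % (name, value, ts or time.time())
        sock = socket.create_connection(GRAPHITE)
        sock.sendall(line.encode())
        sock.close()

    send_metric('ceph.cluster.pct_used', 42.7)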
Mark
On 04/12/2014 09:41 AM, Jason Villalta wrote:
I know ceph throws some warnings if there is high write latency, but I
would be most interested in the delay for IO requests, since that ties
directly to IOPS. If IOPS start to drop because the disks are
overwhelmed, then request latency would be increasing. That would tell
me that I need to add more OSDs/nodes. I am not sure there is a
specific metric in ceph for this, but it would be awesome if there was.
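A rough, untested sketch of one way to pull per-OSD latency, assuming
"ceph osd perf" is available on your release (the JSON key names may
differ between versions):

    #!/usr/bin/env python
    # Rough sketch: per-OSD commit/apply latency from "ceph osd perf".
    # Assumes that command exists on your release; the JSON key names
    # ("osd_perf_infos", "perf_stats", ...) may differ between versions.
    import json, subprocess

    raw = subprocess.check_output(['ceph', 'osd', 'perf', '-f', 'json'])
    for osd in json.loads(raw).get('osd_perf_infos', []):
        stats = osd.get('perf_stats', {})
        print('osd.%-4s commit %4s ms  apply %4s ms' % (
            osd.get('id'), stats.get('commit_latency_ms'),
            stats.get('apply_latency_ms')))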
On Sat, Apr 12, 2014 at 10:37 AM, Greg Poirier <greg.poir...@opower.com> wrote:
Curious as to how you define cluster latency.
On Sat, Apr 12, 2014 at 7:21 AM, Jason Villalta <ja...@rubixnet.com> wrote:
Hi, I have not done anything with metrics yet, but the only ones I
personally would be interested in are total capacity utilization and
cluster latency.
Just my 2 cents.
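A rough, untested sketch of pulling overall utilization out of
"ceph df" (the JSON key names vary between releases, so check your own
output first):

    #!/usr/bin/env python
    # Rough sketch: overall capacity utilization from "ceph df".  The JSON
    # key names (total_used/total_space vs total_used_bytes/total_bytes)
    # differ between releases, so check your own output first.
    import json, subprocess

    raw = json.loads(subprocess.check_output(['ceph', 'df', '-f', 'json']))
    stats = raw.get('stats', {})
    used = stats.get('total_used_bytes', stats.get('total_used'))
    total = stats.get('total_bytes', stats.get('total_space'))
    print('cluster is %.1f%% full' % (100.0 * used / total))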
On Sat, Apr 12, 2014 at 10:02 AM, Greg Poirier <greg.poir...@opower.com> wrote:
I'm in the process of building a dashboard for our Ceph
nodes. I was wondering if anyone out there had instrumented
their OSD / MON clusters and found particularly useful
visualizations.
At first, I was trying to do ridiculous things (like graphing % used
for every disk in every OSD host), but I quickly realized that that is
simply too many metrics and far too visually dense to be useful. I am
now attempting to put together a few simpler, more condensed
visualizations, like overall cluster utilization, aggregate CPU and
memory utilization per OSD host, etc.
Just looking for some suggestions. Thanks!
--
Jason Villalta
Co-founder
800.799.4407x1230 | www.RubixTechnology.com
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com