I found a way to get the stats you mentioned, mds_server.handle_client_request and objecter.op_active. I can see these values when I run:

    ceph daemon mds.<id> perf dump

I recently restarted the MDS server, so my stats have reset, but I still have something to share:

    "mds_server.handle_client_request": 4406055
    "objecter.op_active": 0

Should I assume that op_active is the number of read or write operations that are currently queued? I haven't been able to find anything describing what these stats actually mean, so if anyone knows where they are documented, please advise.
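In case it is useful to anyone else, something like the following should pull just those two counters out of the admin socket output on each poll. This is only a sketch: it assumes the jq utility is installed on the MDS host, and mds.cephmds01 is an example daemon name to substitute with your own mds.<id>.

    # Poll the two MDS perf counters every 5 seconds (example daemon name)
    while true; do
        sudo ceph daemon mds.cephmds01 perf dump | \
            jq '{handle_client_request: .mds_server.handle_client_request,
                 op_active: .objecter.op_active}'
        sleep 5
    done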
On Wed, Aug 5, 2015 at 4:59 PM, Bob Ababurko <b...@ababurko.net> wrote:

> I have installed diamond (built by ksingh, found at
> https://github.com/ksingh7/ceph-calamari-packages) on the MDS node and I
> am not seeing the mds_server.handle_client_request or objecter.op_active
> metrics being sent to graphite. Mind you, this is not the graphite that is
> part of the calamari install but our own internal graphite cluster.
> Perhaps that is the reason? I could not get calamari working correctly on
> hammer/CentOS 7.1, so I have put it on pause for now to concentrate on the
> cluster itself.
>
> Ultimately, I need to find a way to get hold of these metrics to
> determine the health of my MDS, so I can justify moving forward on an SSD
> based cephfs metadata pool.
>
> On Wed, Aug 5, 2015 at 4:05 PM, Bob Ababurko <b...@ababurko.net> wrote:
>
>> Hi John,
>>
>> You are correct in that my expectations may be incongruent with what is
>> possible with ceph(fs). I'm currently copying many small files (images)
>> from a NetApp to the cluster, ~35k sized files to be exact, and the
>> number of objects/files copied thus far is fairly significant (below in
>> bold):
>>
>> [bababurko@cephmon01 ceph]$ sudo rados df
>> pool name          KB           objects      clones  degraded  unfound  rd       rd KB       wr         wr KB
>> cephfs_data        3289284749   *163993660*  0       0         0        0        0           328097038  3369847354
>> cephfs_metadata    133364       524363       0       0         0        3600023  5264453980  95600004   1361554516
>> rbd                0            0            0       0         0        0        0           0          0
>>   total used       9297615196   164518023
>>   total avail      19990923044
>>   total space      29288538240
>>
>> Yes, that looks like ~164 million objects copied to the cluster. I would
>> assume this will potentially be a burden to the MDS, but I have yet to
>> confirm with ceph daemonperf mds.<id>. I cannot seem to run it on the
>> MDS host, as it doesn't seem to know about that command:
>>
>> [bababurko@cephmds01]$ sudo ceph daemonperf mds.cephmds01
>> no valid command found; 10 closest matches:
>> osd lost <int[0-]> {--yes-i-really-mean-it}
>> osd create {<uuid>}
>> osd primary-temp <pgid> <id>
>> osd primary-affinity <osdname (id|osd.id)> <float[0.0-1.0]>
>> osd reweight <int[0-]> <float[0.0-1.0]>
>> osd pg-temp <pgid> {<id> [<id>...]}
>> osd in <ids> [<ids>...]
>> osd rm <ids> [<ids>...]
>> osd down <ids> [<ids>...]
>> osd out <ids> [<ids>...]
>> Error EINVAL: invalid command
>>
>> This fails in a similar manner on all the hosts in the cluster. I'm very
>> green with ceph and I'm probably missing something obvious. Is there
>> something I need to install to get access to the 'ceph daemonperf'
>> command in hammer?
>>
>> thanks,
>> Bob
>>
>> On Wed, Aug 5, 2015 at 2:43 AM, John Spray <jsp...@redhat.com> wrote:
>>
>>> On Tue, Aug 4, 2015 at 10:36 PM, Bob Ababurko <b...@ababurko.net> wrote:
>>> > My writes are not going as I would expect wrt IOPS (50-1000 IOPS) and
>>> > write throughput (~25 MB/s max). I'm interested in understanding what
>>> > it takes to create an SSD pool that I can then migrate the current
>>> > cephfs_metadata pool to. I suspect that the spinning disk metadata
>>> > pool is a bottleneck, and I want to try to get the max performance
>>> > out of this cluster to prove that we should build out a larger
>>> > version. One caveat is that I have copied about 4 TB of data to the
>>> > cluster via cephfs and don't want to lose the data, so I obviously
>>> > need to keep the metadata intact.
>>>
>>> I'm a bit suspicious of this: your IOPS expectations sort of imply
>>> doing big files, but you're then suggesting that metadata is the
>>> bottleneck (i.e. a small-file workload).
>>>
>>> There are lots of statistics that come out of the MDS; you may be
>>> particularly interested in mds_server.handle_client_request and
>>> objecter.op_active, to work out if there really are lots of RADOS
>>> operations getting backed up on the MDS (which would be the symptom of
>>> a too-slow metadata pool). "ceph daemonperf mds.<id>" may be of some
>>> help if you don't already have graphite or similar set up.
>>>
>>> > If anyone has done this OR understands how this can be done, I would
>>> > appreciate the advice.
>>>
>>> You could potentially do this in a two-phase process where you
>>> initially set a crush rule that includes both SSDs and spinners, and
>>> then finally set a crush rule that just points to SSDs. Obviously
>>> that'll do lots of data movement, but your metadata is probably a fair
>>> bit smaller than your data, so that might be acceptable.
>>>
>>> John
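If I am reading John's two-phase suggestion correctly, the hammer-era commands would look roughly like the sketch below. The bucket names (mixed-root, ssd-root) are placeholders for CRUSH buckets that would already have to exist (e.g. created with 'ceph osd crush add-bucket' and 'ceph osd crush move'), and the rule IDs would need to be looked up with 'ceph osd crush rule dump' before switching the pool over:

    # Phase 1: a rule whose root contains both spinners and SSDs
    # ("mixed-root" is a placeholder bucket name)
    ceph osd crush rule create-simple metadata-mixed mixed-root host
    ceph osd pool set cephfs_metadata crush_ruleset <rule id of metadata-mixed>

    # Phase 2: once recovery settles, a rule that points only at the SSD hosts
    # ("ssd-root" is a placeholder bucket name)
    ceph osd crush rule create-simple metadata-ssd ssd-root host
    ceph osd pool set cephfs_metadata crush_ruleset <rule id of metadata-ssd>

Please correct me if that is not what was meant before I start moving any metadata around.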