@John,

Can you clarify which values would suggest that my metadata pool is too
slow?  I have added a link with values for "op_active" &
"handle_client_request", gathered in a crude fashion but hopefully enough
to paint a picture of what is happening.

http://pastebin.com/5zAG8VXT
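
For reference, the "crude fashion" was essentially a loop like this (the
daemon name is from my setup; the exact grep pattern may have differed
slightly):

    while true; do
        ceph daemon mds.cephmds02 perf dump | grep -E '"handle_client_request"|"op_active"'
        sleep 1
    done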

thanks in advance,
Bob

On Thu, Aug 6, 2015 at 1:24 AM, Bob Ababurko <b...@ababurko.net> wrote:

> I should have probably condensed my findings over the course of the day
> into one post, but I guess that's just not how I'm built...
>
> Another data point.  I ran `ceph daemon mds.cephmds02 perf dump` in a
> while loop with a 1 second sleep, grepping out the stats John mentioned,
> and at times (~every 10-15 seconds) I see some large objecter.op_active
> values.  After the high values hit, there are 5-10 seconds of zero values.
>
>     "handle_client_request": 5785438,
>         "op_active": 2375,
>         "handle_client_request": 5785438,
>         "op_active": 2444,
>         "handle_client_request": 5785438,
>         "op_active": 2239,
>         "handle_client_request": 5785438,
>         "op_active": 1648,
>         "handle_client_request": 5785438,
>         "op_active": 1121,
>         "handle_client_request": 5785438,
>         "op_active": 709,
>         "handle_client_request": 5785438,
>         "op_active": 235,
>         "handle_client_request": 5785572,
>         "op_active": 0,
>    ...............
>
> Should I be concerned about these "op_active" values?  I see that in my
> narrow slice of output, "handle_client_request" barely increments.  What
> is happening there?
>
> thanks,
> Bob
>
> On Wed, Aug 5, 2015 at 11:43 PM, Bob Ababurko <b...@ababurko.net> wrote:
>
>> I found a way to get the stats you mentioned
>> (mds_server.handle_client_request & objecter.op_active).  I can see these
>> values when I run:
>>
>> ceph daemon mds.<id> perf dump
>>
>> I recently restarted the mds server so my stats reset but I still have
>> something to share:
>>
>> "mds_server.handle_client_request": 4406055
>> "objecter.op_active": 0
>>
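>> For reference, the two counters can be pulled straight out of the perf
>> dump JSON with a one-liner along these lines (this assumes the dump keeps
>> its usual nested "mds_server" and "objecter" sections):
>>
>> ceph daemon mds.<id> perf dump | python -c 'import json,sys; d=json.load(sys.stdin); print(d["mds_server"]["handle_client_request"]); print(d["objecter"]["op_active"])'
>>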
>> Should I assume that op_active represents read or write operations that
>> are queued?  I haven't been able to find anything describing what these
>> stats actually mean, so if anyone knows where they are documented, please
>> advise.
>>
>> On Wed, Aug 5, 2015 at 4:59 PM, Bob Ababurko <b...@ababurko.net> wrote:
>>
>>> I have installed diamond (built by ksingh, found at
>>> https://github.com/ksingh7/ceph-calamari-packages) on the MDS node, and
>>> I am not seeing the mds_server.handle_client_request OR objecter.op_active
>>> metrics being sent to graphite.  Mind you, this is not the graphite that is
>>> part of the calamari install but our own internal graphite cluster.
>>> Perhaps that is the reason?  I could not get calamari working correctly on
>>> Hammer/CentOS 7.1, so I have put it on pause for now to concentrate on the
>>> cluster itself.
>>>
>>> Ultimately, I need to find a way to get hold of these metrics to
>>> determine the health of my MDS, so I can justify moving forward on an
>>> SSD-based cephfs metadata pool.
>>>
>>> On Wed, Aug 5, 2015 at 4:05 PM, Bob Ababurko <b...@ababurko.net> wrote:
>>>
>>>> Hi John,
>>>>
>>>> You are correct in that my expectations may be incongruent with what is
>>>> possible with ceph(fs).  I'm currently copying many small files (images)
>>>> from a NetApp to the cluster, ~35 KB files to be exact, and the number of
>>>> objects/files copied thus far is fairly significant (in bold below):
>>>>
>>>> [bababurko@cephmon01 ceph]$ sudo rados df
>>>> pool name                 KB      objects       clones     degraded      unfound           rd        rd KB           wr        wr KB
>>>> cephfs_data       3289284749  *163993660*            0            0            0            0            0    328097038   3369847354
>>>> cephfs_metadata       133364       524363            0            0            0      3600023   5264453980     95600004   1361554516
>>>> rbd                        0            0            0            0            0            0            0            0            0
>>>>   total used      9297615196    164518023
>>>>   total avail    19990923044
>>>>   total space    29288538240
>>>>
>>>> Yes, that looks like ~164 million objects copied to the cluster.  I
>>>> would assume this could be a burden on the MDS, but I have yet to confirm
>>>> with `ceph daemonperf mds.<id>`.  I cannot seem to run it on the MDS host,
>>>> as it doesn't seem to know about that command:
>>>>
>>>> [bababurko@cephmds01]$ sudo ceph daemonperf mds.cephmds01
>>>> no valid command found; 10 closest matches:
>>>> osd lost <int[0-]> {--yes-i-really-mean-it}
>>>> osd create {<uuid>}
>>>> osd primary-temp <pgid> <id>
>>>> osd primary-affinity <osdname (id|osd.id)> <float[0.0-1.0]>
>>>> osd reweight <int[0-]> <float[0.0-1.0]>
>>>> osd pg-temp <pgid> {<id> [<id>...]}
>>>> osd in <ids> [<ids>...]
>>>> osd rm <ids> [<ids>...]
>>>> osd down <ids> [<ids>...]
>>>> osd out <ids> [<ids>...]
>>>> Error EINVAL: invalid command
>>>>
>>>> This fails in a similar manner on all the hosts in the cluster.  I'm
>>>> very green with Ceph and I'm probably missing something obvious.  Is there
>>>> something I need to install to get access to the 'ceph daemonperf' command
>>>> in Hammer?
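>>>>
>>>> In the meantime, the raw counters can still be read through the MDS admin
>>>> socket with something along these lines (assuming the default socket
>>>> path):
>>>>
>>>> sudo ceph --admin-daemon /var/run/ceph/ceph-mds.cephmds01.asok perf dump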
>>>>
>>>> thanks,
>>>> Bob
>>>>
>>>> On Wed, Aug 5, 2015 at 2:43 AM, John Spray <jsp...@redhat.com> wrote:
>>>>
>>>>> On Tue, Aug 4, 2015 at 10:36 PM, Bob Ababurko <b...@ababurko.net>
>>>>> wrote:
>>>>> > My writes are not going as I would expect wrt IOPS (50-1000 IOPS) &
>>>>> > write throughput (~25 MB/s max).  I'm interested in understanding what
>>>>> > it takes to create an SSD pool that I can then migrate the current
>>>>> > cephfs_metadata pool to.  I suspect that the spinning disk metadata
>>>>> > pool is a bottleneck, and I want to try to get the max performance out
>>>>> > of this cluster to prove that we would build out a larger version.  One
>>>>> > caveat is that I have copied about 4 TB of data to the cluster via
>>>>> > cephfs and don't want to lose the data, so I obviously need to keep the
>>>>> > metadata intact.
>>>>>
>>>>> I'm a bit suspicious of this: your IOPS expectations sort of imply
>>>>> doing big files, but you're then suggesting that metadata is the
>>>>> bottleneck (i.e. small file workload).
>>>>>
>>>>> There are lots of statistics that come out of the MDS; you may be
>>>>> particularly interested in mds_server.handle_client_request and
>>>>> objecter.op_active, to work out if there really are lots of RADOS
>>>>> operations getting backed up on the MDS (which would be the symptom of
>>>>> a too-slow metadata pool).  "ceph daemonperf mds.<id>" may be of some
>>>>> help if you don't already have graphite or similar set up.
>>>>>
>>>>> > If anyone has done this OR understands how this can be done, I would
>>>>> > appreciate the advice.
>>>>>
>>>>> You could potentially do this in a two-phase process where you
>>>>> initially set a crush rule that includes both SSDs and spinners, and
>>>>> then finally set a crush rule that just points to SSDs.  Obviously
>>>>> that'll do lots of data movement, but your metadata is probably a fair
>>>>> bit smaller than your data so that might be acceptable.
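>>>>>
>>>>> Very roughly, the two steps might look something like this (rule names
>>>>> are just placeholders, and <mixed-root>/<ssd-root> stand in for whatever
>>>>> CRUSH buckets you end up with):
>>>>>
>>>>> # interim rule over a bucket that contains both the spinners and the SSDs
>>>>> ceph osd crush rule create-simple metadata-mixed <mixed-root> host
>>>>> ceph osd pool set cephfs_metadata crush_ruleset <mixed-rule-id>
>>>>> # final rule that only selects from the SSDs
>>>>> ceph osd crush rule create-simple metadata-ssd <ssd-root> host
>>>>> ceph osd pool set cephfs_metadata crush_ruleset <ssd-rule-id>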
>>>>>
>>>>> John
>>>>>
>>>>
>>>>
>>>
>>
>