On Thu, Mar 7, 2019 at 2:38 PM Patrick Donnelly <pdonn...@redhat.com> wrote:
> Is this with one active MDS and one standby-replay? The graph is odd
> to me because the session count shows sessions on fs-b and fs-d but
> not fs-c. Or maybe max_mds=2 and fs-d has no activity and fs-c is
> standby-replay?

The graphs were taken when we were running with 2 active MDS and 2
standby-replay. Currently we're running with 1 active and 1 standby-replay.

> Three OSDs are going to really struggle with the client load you're
> putting on them. It doesn't surprise me you are getting slow request
> warnings on the MDS for this reason. When you were running Luminous
> 12.2.9+ or Mimic 13.2.2+, were you seeing slow metadata I/O warnings?
> Even if you did not, it's possible that the MDS is delayed issuing caps
> to clients because it's waiting for another client to flush writes and
> release conflicting caps.

We didn't see any slow metadata I/O warnings, but this is the sort of
thing I've been suspecting is the underlying issue. One thing to note,
though: my current test only walks a directory and reads all the files
in it, and it's running on a dev cluster that only I'm using, so I'm not
sure which client would be generating the writes the MDS would be
waiting on.

> Generally we recommend that the metadata pool be located on OSDs with
> fast devices separate from the data pool. This avoids priority
> inversion of MDS metadata I/O with data I/O. See [1] to configure the
> metadata pool on a separate set of OSDs.
>
> Also, you're not going to saturate a 1.9TB NVMe SSD with one OSD. You
> must partition it and set up multiple OSDs. This ends up being positive
> for you so that you can put the metadata pool on its own set of OSDs.
>
> [1] https://ceph.com/community/new-luminous-crush-device-classes/

This is excellent info, especially the second point, since it lets me
split the metadata pool out onto separate OSDs without spinning up any
additional resources or using EBS volumes. I will look into partitioning
the NVMe drives, split out the metadata pool, and check how this impacts
performance. A rough sketch of the commands I'm planning to try is below.

Thanks,
Zack
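P.S. For reference, here is a minimal sketch of the steps I have in
mind, pieced together from [1] and the ceph-volume docs. The pool name
(cephfs_metadata), device class name, device path, OSD count, and OSD
IDs are placeholders for our setup, so treat this as a rough plan rather
than tested commands:

    # Carve each NVMe drive into several OSDs (run on each OSD host);
    # the device path and per-device OSD count are assumptions.
    ceph-volume lvm batch --osds-per-device 4 /dev/nvme0n1

    # Re-tag the OSDs intended for metadata with a dedicated device
    # class (the auto-assigned class has to be removed first).
    ceph osd crush rm-device-class osd.12 osd.13 osd.14
    ceph osd crush set-device-class nvme-meta osd.12 osd.13 osd.14

    # Create a replicated CRUSH rule limited to that device class.
    ceph osd crush rule create-replicated metadata-rule default host nvme-meta

    # Point the CephFS metadata pool at the new rule so its PGs move
    # onto the dedicated OSDs, away from the data pool's OSDs.
    ceph osd pool set cephfs_metadata crush_rule metadata-rule

The idea is the same as in [1]: the device class keeps the metadata PGs
off the OSDs serving the data pool. If I've got any of that wrong,
corrections welcome.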