On 08/11/2014 08:26 PM, Craig Lewis wrote:
Your MON nodes are separate hardware from the OSD nodes, right?
Two nodes are OSD + MON, plus a separate MON node.
If so, with replication=2, you should be able to shut down one of the two OSD nodes, and everything will continue working.
IIUC, if one of the OSD + MON nodes shuts down, the two remaining MONs (the separate MON node plus the surviving OSD + MON node) still form a quorum, is that right?
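Either way, I plan to verify quorum directly while one node is down. Assuming nothing beyond the stock CLI, something like this should show whether the two survivors still hold a majority:

    ceph quorum_status --format json-pretty
    ceph mon stat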
Replication=2 is a little worrisome, since we've already seen two disks fail simultaneously in the year the cluster has been running. That statistically unlikely event is probably the first and last time I'll see it, but they say lightning can strike twice....
Since it's for experimentation, I wouldn't deal with the extra hassle of replication=4 and custom CRUSH rules to make it work. If you have your heart set on that, it should be possible. I'm no CRUSH expert though, so I can't say for certain until I've actually done it.

I'm a bit confused why your performance is horrible though. I'm assuming your HDDs are 7200 RPM. With the SSD journals and replication=3, you won't have a ton of IO, but you shouldn't have any problem doing > 100 MB/s with 4 MB blocks. Unless your SSDs are very low quality, the HDDs should be your bottleneck.
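For reference, here's roughly the rule I had in mind for keeping two copies on each node with size=4. It's an untested sketch; the rule name, ruleset number, and root bucket name are just placeholders for whatever is in our map:

    rule replicated_2x2 {
            ruleset 1
            type replicated
            min_size 4
            max_size 4
            step take default
            step choose firstn 2 type host
            step chooseleaf firstn 2 type osd
            step emit
    }

If I go down that road, I'd expect the usual decompile/edit/re-inject cycle and then point the pool (name is a placeholder) at the new rule:

    ceph osd getcrushmap -o crush.bin
    crushtool -d crush.bin -o crush.txt
    # add the rule above to crush.txt, then:
    crushtool -c crush.txt -o crush.new
    ceph osd setcrushmap -i crush.new
    ceph osd pool set <pool> crush_ruleset 1
    ceph osd pool set <pool> size 4
    ceph osd pool set <pool> min_size 2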
The below setup is tomorrow's plan; today's reality is 3 OSDs on one node and 2 on another, crappy SSDs, 1Gb networks, PGs stuck unclean, and no monitoring to pinpoint bottlenecks. My work is cut out for me. :)
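For the stuck PGs and the missing monitoring, I'll at least start with the basics (stock CLI only; pool name is a placeholder):

    ceph health detail
    ceph pg dump_stuck unclean
    ceph osd tree
    rados -p <pool> bench 30 write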
Thanks for the helpful reply. I wish we could simply add a third OSD node and have these issues just go away, but it's not in the budget ATM.
John
On Fri, Aug 8, 2014 at 10:24 PM, John Morris <j...@zultron.com> wrote:

Our experimental Ceph cluster is performing terribly (with the operator to blame!), and while it's down to address some issues, I'm curious to hear advice about the following ideas.

The cluster:
- two disk nodes (6 * CPU, 16GB RAM each)
- 8 OSDs (4 each)
- 3 monitors
- 10Gb front + back networks
- 2TB Enterprise SATA drives
- HP RAID controller w/battery-backed cache
- one SSD journal drive for each two OSDs

First, I'd like to play with taking one machine down, but with the other node continuing to serve the cluster. To maintain redundancy in this scenario, I'm thinking of setting the pool size to 4 and the min_size to 2, with the idea that a proper CRUSH map should always keep two copies on each disk node. Again, *this is for experimentation* and probably raises red flags for production, but I'm just asking if it's *possible*: Could one node go down and the other node continue to serve r/w data? Any anecdotes of performance differences between size=4 and size=3 in other clusters?

Second, does it make any sense to divide the CRUSH map into an extra level for the SSD disks, which each hold journals for two OSDs? This might increase redundancy in case of a journal disk failure, but ISTR something about too few OSDs in a bucket causing problems with the CRUSH algorithm.

Thanks-

John