In the previous email you are forgetting that RAID1 has a write penalty of 2
since it is mirroring, and at that point we are talking about different types
of RAID and not really about Ceph. One of the main advantages of Ceph is that
it replicates the data itself, so you don't have to rely on RAID to that
degree. I am sure there is math to back this up, but a larger number of
smaller nodes gives better failover than a few large nodes. If you are
competing over CPU resources, you could use RAID0 with minimal write penalty
(never thought I'd suggest RAID0, haha). You may not max out the drive speed
because of CPU, but that is the cost of repurposing a machine for a storage
system it was not intended for. It would be good to know the limits of what a
machine like that can do with Ceph, so please do share if you run some tests.
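
Roughly the kind of math I mean, as a quick Python sketch (the node counts
and disk sizes below are made-up examples for illustration, not anything I
have tested):

    def recovery_impact(num_nodes, disks_per_node, tb_per_disk):
        # Capacity of one node and of the whole cluster
        node_tb = disks_per_node * tb_per_disk
        cluster_tb = num_nodes * node_tb
        # Fraction of the cluster Ceph must re-replicate if one node dies,
        # and how much of that lands on each surviving node
        fraction_lost = node_tb / cluster_tb
        tb_per_survivor = node_tb / (num_nodes - 1)
        return fraction_lost, tb_per_survivor

    # A few large nodes: 4 x 36 disks x 4 TB
    print(recovery_impact(4, 36, 4))   # -> (0.25, 48.0): 25% of the cluster, ~48 TB per survivor
    # Many small nodes: 24 x 6 disks x 4 TB (same raw capacity)
    print(recovery_impact(24, 6, 4))   # -> (~0.042, ~1.04): ~4% of the cluster, ~1 TB per survivor

Same raw capacity, but each failure on the small nodes touches far less data
and the recovery load is spread much wider.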

Overall, from my understanding, it is generally better to move to the ideal
node size for Ceph and slowly deprecate the larger nodes. Fundamentally,
since replication is done at a higher level than the individual spinners,
the case for doing RAID underneath keeps getting weaker.
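
To put a rough number on the RAID-under-Ceph question (same made-up 576 TB
of raw disk as above, just arithmetic):

    def usable_tb(raw_tb, raid_copies, ceph_size):
        # Total copies of each object = copies made by RAID * Ceph pool replica count
        total_copies = raid_copies * ceph_size
        return raw_tb / total_copies, total_copies

    print(usable_tb(576, 2, 2))  # RAID1 pairs under a 2x pool -> (144.0, 4): 4 copies, 144 TB usable
    print(usable_tb(576, 1, 3))  # plain OSDs with a 3x pool   -> (192.0, 3): 3 copies, 192 TB usable

Three copies spread across three hosts survive a whole-node failure; the two
extra RAID1 copies sit on the same box, which is why stacking the two buys
so little.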


On Mon, Aug 5, 2013 at 5:05 PM, James Harper
<james.har...@bendigoit.com.au> wrote:

> > I am looking at evaluating ceph for use with large storage nodes (24-36
> SATA
> > disks per node, 3 or 4TB per disk, HBAs, 10G ethernet).
> >
> > What would be the best practice for deploying this? I can see two main
> > options.
> >
> > (1) Run 24-36 osds per node. Configure ceph to replicate data to one or
> more
> > other nodes. This means that if a disk fails, there will have to be an
> > operational process to stop the osd, unmount and replace the disk, mkfs a
> > new filesystem, mount it, and restart the osd - which could be more
> > complicated and error-prone than a RAID swap would be.
> >
> > (2) Combine the disks using some sort of RAID (or ZFS raidz/raidz2), and
> run
> > one osd per node. In this case:
> > * if I use RAID0 or LVM, then a single disk failure will cause all the
> data on the
> > node to be lost and rebuilt
> > * if I use RAID5/6, then write performance is likely to be poor
> > * if I use RAID10, then capacity is reduced by half; with ceph
> replication each
> > piece of data will be replicated 4 times (twice on one node, twice on the
> > replica node)
> >
> > It seems to me that (1) is what ceph was designed to achieve, maybe with
> 2
> > or 3 replicas. Is this what's recommended?
> >
>
> There is a middle ground to consider - 12-18 OSDs each running on a pair
> of disks in a RAID1 configuration. This would reduce most disk failures to
> a simple disk swap (assuming an intelligent hardware RAID controller).
> Obviously you still have a 50% reduction in disk space, but you have the
> advantage that your filesystem never sees the bad disk and all the problems
> that can cause.
>
> James
>



-- 
Follow Me: @Scottix <http://www.twitter.com/scottix>
http://about.me/scottix
scot...@gmail.com
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
