> I am looking at evaluating ceph for use with large storage nodes (24-36 SATA
> disks per node, 3 or 4TB per disk, HBAs, 10G ethernet).
> 
> What would be the best practice for deploying this? I can see two main
> options.
> 
> (1) Run 24-36 osds per node. Configure ceph to replicate data to one or more
> other nodes. This means that if a disk fails, there will have to be an
> operational process to stop the osd, unmount and replace the disk, mkfs a
> new filesystem, mount it, and restart the osd - which could be more
> complicated and error-prone than a RAID swap would be.
> 
> (2) Combine the disks using some sort of RAID (or ZFS raidz/raidz2), and run
> one osd per node. In this case:
> * if I use RAID0 or LVM, then a single disk failure will cause all the data
> on the node to be lost and rebuilt
> * if I use RAID5/6, then write performance is likely to be poor
> * if I use RAID10, then capacity is reduced by half; with ceph replication
> each piece of data will be replicated 4 times (twice on one node, twice on
> the replica node)
> 
> It seems to me that (1) is what ceph was designed to achieve, maybe with 2
> or 3 replicas. Is this what's recommended?
> 

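On option (1): the per-disk swap is scriptable, so the operational process
need not be as error-prone as it sounds. A rough sketch of the sequence
follows, in Python only for readability - the OSD id, device and mount point
are placeholders, and the exact commands vary between Ceph releases and init
systems, so treat it as an outline rather than a recipe:

import subprocess

def run(cmd):
    """Print and execute one shell step of the swap."""
    print("+", cmd)
    subprocess.check_call(cmd, shell=True)

def replace_failed_osd(osd_id, new_device, mount_point):
    # Placeholder values, e.g. osd_id=12, new_device="/dev/sdm",
    # mount_point="/var/lib/ceph/osd/ceph-12".

    # 1. Stop the OSD daemon and mark it out so recovery can start.
    run(f"service ceph stop osd.{osd_id}")
    run(f"ceph osd out {osd_id}")

    # 2. Unmount the dead disk; physically swap it at this point.
    run(f"umount {mount_point}")

    # 3. mkfs the replacement disk and mount it in the same place.
    run(f"mkfs.xfs -f {new_device}")
    run(f"mount {new_device} {mount_point}")

    # 4. Rebuild the OSD's data directory (re-register its key with
    #    'ceph auth' if the keyring lived on the failed disk), then
    #    bring the OSD back in and restart it.
    run(f"ceph-osd -i {osd_id} --mkfs --mkkey")
    run(f"ceph osd in {osd_id}")
    run(f"service ceph start osd.{osd_id}")
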
There is a middle ground to consider - 12-18 OSDs, each running on a pair of
disks in a RAID1 configuration. This would reduce most disk failures to a
simple disk swap (assuming an intelligent hardware RAID controller). Obviously
you still have a 50% reduction in disk space, but you gain the advantage that
your filesystem never sees the bad disk, nor any of the problems a failing
disk can cause.
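
To put rough numbers on the capacity trade-off, here is a quick
back-of-the-envelope calculation (purely illustrative: 30 x 3 TB disks per
node and 2x ceph replication assumed):

# Usable capacity per node under local redundancy + ceph replication.
# The disk count, disk size and replica count are illustrative only.
disks, size_tb, ceph_replicas = 30, 3, 2
raw_tb = disks * size_tb

layouts = {
    "single-disk OSDs (option 1)": 1,   # no local redundancy
    "RAID10 / RAID1 pairs":        2,   # local mirroring halves capacity
}

for name, local_copies in layouts.items():
    total_copies = local_copies * ceph_replicas
    usable_tb = raw_tb / total_copies
    print(f"{name}: {total_copies} total copies, "
          f"{usable_tb:.1f} TB usable of {raw_tb} TB raw")

So the RAID1-pair layout costs the same 4x in raw capacity as RAID10 plus
replication; what you are buying with it is the simpler failure handling.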

James
