Hi John, please find my answers below. Just to clarify: you are talking about the storage for the instances, right?
On 3 Oct 2013, at 11:04, John Ashford <logica...@hotmail.com> wrote:

> 1 – Glusterfs V Ceph
> Im reading a lot of different opinions about which of these is the best
> storage backend. My need is for a fully stable product that has fault
> tolerance built in. It needs to support maybe 400 low traffic web sites and a
> few very high traffic. I saw a Redhat diag suggesting throughput on a Gbit
> nic with 2 storage servers (Glusterfs) would be around 200Mbps. I can put
> quad nics in the 2 or 3 storage machines to give extra breathing room.
> Gluster is of course a mature product and has Redhat pushing it forward but
> some people complain of speed issues. Any real life experiences of throughput
> using Ceph? I know Ceph is new but it seems there is considerable weight
> behind its development so while some say its not production ready I wonder if
> anyone has the experience to refute/concur?

Based on my experience, and that of other folks around me, GlusterFS doesn't seem to cope well as a shared backend for the instances. With my setup (around 9 compute nodes, all connected through Gbit, with SATA drives) I never got the 200 Mbps. What was their test protocol? Measuring access speed, latency and read time from the shared directory AND from inside the instance itself should give you a good idea of what to expect with your infrastructure.

Regarding CephFS, it's getting better, but last time I checked, every time I ran either iozone or vdbench against it the server kept crashing and I had to reboot it manually. On top of that, the MDS daemons are not active/active (not sure about the Cuttlefish release though), which means that only one of them is accessed when an object is requested.

Still, I always found Ceph somewhat *better* than GlusterFS: more reliable (based on my testing), even if it's more complicated to manage, and you get a real overview of what's going on within your cluster. Every time I had to debug GlusterFS it was a real pain, not to mention that I had data corrupted and even missing, sometimes more than once. Regarding load, the GlusterFS daemons constantly ate all the CPU replicating objects.

So I'd say: deploy both, test, and pick one (iozone, vdbench and dd are three good tools to benchmark your storage).
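If you want a quick first look before pulling out iozone or vdbench, a rough probe like the sketch below is enough to see the gap. It's plain Python (stdlib only); the path is a placeholder for your own mounts. Run it once on a compute node's local disk, once on the Gluster/Ceph mount, and once inside a guest, then compare the numbers.

#!/usr/bin/env python
# Rough write-throughput / latency probe. Just a sketch, not a replacement for
# iozone/vdbench/dd. TEST_FILE is a placeholder: point it at the local disk,
# then at the shared mount, then run the same thing inside a guest.
import os
import time

TEST_FILE = "/mnt/glusterfs/bench.tmp"   # placeholder path, adapt to your setup
BLOCK = b"\0" * (1024 * 1024)            # 1 MiB blocks
TOTAL_MB = 256                           # keep it small on a production cluster

start = time.time()
with open(TEST_FILE, "wb") as f:
    for _ in range(TOTAL_MB):
        f.write(BLOCK)
    f.flush()
    os.fsync(f.fileno())                 # make sure the data really hit the backend
print("sequential write: %.1f MB/s" % (TOTAL_MB / (time.time() - start)))

# Tiny synchronous writes expose the extra network round-trips that a shared
# filesystem adds on top of the raw disk latency.
start = time.time()
with open(TEST_FILE, "r+b") as f:
    for _ in range(100):
        f.write(b"x")
        f.flush()
        os.fsync(f.fileno())
print("avg fsync'd 1-byte write: %.2f ms" % ((time.time() - start) * 10.0))

os.remove(TEST_FILE)

Run it a few times and do the equivalent for reads; once you know roughly where you stand, iozone will give you the detailed picture.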
> 2 – vm instances on clustered storage
> Im reading how if you run your vm instances on gluster/ceph you benefit from
> live migration and faster access times since disk access is usually to the
> local disk. I just need to clarify this – and this may fully expose my
> ignorance - but surely the instance runs on the compute node, not storage
> node so I don’t get how people are claiming its faster running vm instance on
> the storage cluster unless they are actually running compute on the storage
> cluster in which case you don’t have proper separation of compute/storage.
> Also, you would have the networking overhead unless running a compute node on
> storage cluster? What am I missing?!

You mean on local storage? When you go with local storage you lose some benefits, but gain others:

Pros for local:
- More performance
- Easier to manage (no particular skill is needed to look after the disks)
- Easier to repair (mkfs_adm check, fsck, etc.)
- Doesn't rely on network availability

Cons for local:
- Live migration takes much, much longer, since you need to copy the disk over (see the P.S. below)
- Lose a node, lose everything: the instances running on it and their storage
- If you lose the disk and have no backup, you are screwed; with a backup, the restore will take a long time
- If you remove a file (let's say you "terminate" an instance instead of rebooting it), you are screwed

Pros for shared:
- Easier to balance the workload, in minutes
- More reliable, since each file is copied to multiple locations
- If you lose a node or a disk, you can still retrieve the file
- MooseFS-only feature: it has a trash, so if you delete a file (say, a terminated instance) you can browse the MooseFS trash (or "quarantine") and get it back. That's a sweet feature

Cons for shared:
- More costly in terms of configuration, network ports and knowledge
- Some corruption can happen if the algorithm doesn't cope well with split-brain situations
- Huge network load (induced by the replicas, checks, recycling, etc.) that requires good architecting (a dedicated network)

> Thanks
> John

Regards,
Razique
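P.S. To make the live-migration point in the "Cons for local" list concrete, here is a rough sketch of how the choice shows up on the API side. It assumes python-novaclient with the old v1_1 interface, and the credentials, tenant, endpoint, instance and host names are all placeholders, so treat it as an illustration rather than a recipe.

#!/usr/bin/env python
# Illustration only: local vs shared instance storage shows up as the
# block_migration flag. Assumes python-novaclient (v1_1 API); every name
# below is a placeholder.
from novaclient.v1_1 import client

nova = client.Client("admin", "secret", "mytenant",
                     "http://keystone.example.com:5000/v2.0/")

server = nova.servers.find(name="web-042")

# Shared storage backing /var/lib/nova/instances: only RAM and device state
# move to the target host, so the migration is quick.
nova.servers.live_migrate(server, "compute-02", False, False)

# Local storage: the disk itself has to be copied over the network to the
# target node (block migration), which is why it takes so much longer.
# nova.servers.live_migrate(server, "compute-02", True, False)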