We have been using the NFS/Pacemaker/RBD method for a while; Sebastien 
explains it a bit better here: http://www.sebastien-han.fr/blog/2012/07/06/nfs-over-rbd/
PS: Thanks, Sebastien.
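
For anyone who hasn't seen the post, the basic idea is roughly the below 
(just a rough sketch; the pool/image/export names are made up, and Pacemaker 
fails over the mapping, mount, export and floating IP with the usual 
Filesystem/exportfs/IPaddr2 style resource agents):

    rbd create nfs/vol1 --size 4194304   # 4TB image in an "nfs" pool
    rbd map nfs/vol1                     # appears as e.g. /dev/rbd0
    mkfs.xfs /dev/rbd0
    mount /dev/rbd0 /srv/nfs/vol1
    echo "/srv/nfs/vol1 10.0.0.0/24(rw,no_root_squash)" >> /etc/exports
    exportfs -a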

Our use case is VMware storage. As I mentioned, we've been running it for 
some time and we've had pretty mixed results.
Pros: when it works, it works really well!
Cons: when it doesn't - I've had a couple of instances where the XFS volumes 
needed an fsck (xfs_repair), and that took about 3 hours on a 4TB volume. 
(Lesson learnt: use smaller volumes.)
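
(For reference, an XFS "fsck" is really an offline xfs_repair, roughly like 
the below, assuming the image is mapped as /dev/rbd0 - the device name will 
differ of course:

    umount /srv/nfs/vol1
    xfs_repair /dev/rbd0   # only add -L as a last resort, it zeroes the log

and on a 4TB volume that run is what took us about 3 hours.)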
 

The ZFS RAID-Z option could be interesting but expensive if using, say, 3 
pools with 2x replicas, an RBD volume from each, and a RAID-Z on top of 
those; see the sketch below. (I assume you would use 3 separate pools here so 
we don't end up with data in the same PG which may be corrupted.)
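
Something along these lines, purely as a sketch (pool/image names invented, 
and the PG counts/sizes would need real thought):

    ceph osd pool create vmpool1 512
    ceph osd pool set vmpool1 size 2
    rbd create vmpool1/z1 --size 2097152   # 2TB image
    rbd map vmpool1/z1
    # ... same again for vmpool2/z2 and vmpool3/z3 ...
    zpool create vmtank raidz /dev/rbd0 /dev/rbd1 /dev/rbd2

That works out at roughly 3x raw per usable TB (2x replication times the 3/2 
RAID-Z overhead), which is where the expense comes in.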


Currently we also use FreeNAS VMs which are backed by RBD with 3 replicas 
and ZFS striped volumes, serving iSCSI/NFS out of those. While not really HA, 
it seems to mostly work, though the FreeNAS iSCSI target can get a bit cranky 
at times.

We are moving towards another KVM hypervisor such as Proxmox for the VMs 
which don't quite fit into our OpenStack environment, instead of having to 
use "RBD proxies".

Regards,
Quenten Grasso

-----Original Message-----
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Dan 
Van Der Ster
Sent: Wednesday, 10 September 2014 12:54 AM
To: Michal Kozanecki
Cc: ceph-users@lists.ceph.com; Blair Bethwaite
Subject: Re: [ceph-users] NAS on RBD


> On 09 Sep 2014, at 16:39, Michal Kozanecki <mkozane...@evertz.com> wrote:
> On 9 September 2014 08:47, Blair Bethwaite <blair.bethwa...@gmail.com> wrote:
>> On 9 September 2014 20:12, Dan Van Der Ster <daniel.vanders...@cern.ch> 
>> wrote:
>>> One thing I’m not comfortable with is the idea of ZFS checking the data in 
>>> addition to Ceph. Sure, ZFS will tell us if there is a checksum error, but 
>>> without any redundancy at the ZFS layer there will be no way to correct 
>>> that error. Of course, the hope is that RADOS will ensure 100% data 
>>> consistency, but what happens if not?...
>> 
>> The ZFS checksumming would tell us if there has been any corruption, which 
>> as you've pointed out shouldn't happen anyway on top of Ceph.
> 
> Just want to quickly address this, someone correct me if I'm wrong, but IIRC 
> even with a replica value of 3 or more, Ceph does not (currently) have any 
> intelligence when it detects a corrupted/"incorrect" PG: it will always 
> replace/repair the PG with whatever data is in the primary, meaning that if 
> the primary PG is the one that's corrupted/bit-rotted/"incorrect", it will 
> replace the good replicas with the bad.

According to the "scrub error on firefly" thread, repair "tends to choose 
the copy with the lowest osd number which is not obviously corrupted. Even 
with three replicas, it does not do any kind of voting at this time."
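
So in practice, when a deep scrub flags an inconsistency, the manual sequence 
is something like the below (the PG id is just an example), and it's worth 
looking at the objects yourself before letting repair copy from the "chosen" 
replica:

    ceph health detail       # lists the inconsistent PGs
    ceph pg deep-scrub 2.1f
    ceph pg repair 2.1f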

Cheers, Dan




_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com