Hi Blair,

> On 09 Sep 2014, at 09:05, Blair Bethwaite <blair.bethwa...@gmail.com> wrote:
> 
> Hi folks,
> 
> In lieu of a prod ready Cephfs I'm wondering what others in the user
> community are doing for file-serving out of Ceph clusters (if at all)?
> 
> We're just about to build a pretty large cluster - 2PB for file-based
> NAS and another 0.5PB rgw. For the rgw component we plan to dip our
> toes in and use an EC backing pool with a ~25TB (usable) 10K SAS + SSD
> cache tier.
> 
> For the file storage we're looking at mounting RBDs (out of a standard
> 3-replica pool for now) on a collection of presentation nodes, which
> will use ZFS to stripe together those RBD vdevs into a zpool which we
> can then carve datasets out of for access from NFS & CIFS clients.
> Those presentation servers will have some PCIe SSD in them for ZIL and
> L2ARC devices, and clients will be split across them depending on what
> ID domain they are coming from. Presentation server availability
> issues will be handled by mounting the relevant zpool on a spare
> server, so it won't be HA from a client perspective, but I can't see a
> way to get this with an RBD backend.
> 
> Wondering what the collective wisdom has to offer on such a setup…
> 

We do this for some small scale NAS use-cases, with ZFS running in a VM with 
rbd volumes. The performance is not great (especially since we throttle the 
IOPS of our RBD). We also tried a few kRBD / ZFS servers with an SSD ZIL; the 
SSD resolved every performance problem we had seen with ZFS on RBD.

I would say though that this setup is rather adventurous. ZoL is not rock solid 
— we’ve had a few lockups in testing, all of which have been fixed in the 
latest ZFS code in git (my colleague in CC could elaborate if you’re 
interested).  One thing I’m not comfortable with is the idea of ZFS checking 
the data in addition to Ceph. Sure, ZFS will tell us if there is a checksum 
error, but without any redundancy at the ZFS layer there will be no way to 
correct that error. Of course, the hope is that RADOS will ensure 100% data 
consistency, but what happens if it doesn't?
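To illustrate the detect-but-can't-repair problem: on a pool of plain striped vdevs (pool name hypothetical), a scrub will surface checksum errors but has nothing to heal them from:

```shell
# Re-verify every block in the pool against its stored checksum
zpool scrub tank

# With no ZFS-layer redundancy (no mirror/raidz, copies=1), any non-zero
# CKSUM count reported here is a permanent error: ZFS can list the
# affected files, but there is no second copy to reconstruct them from
zpool status -v tank
```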

Personally, I think you’re very brave to consider running 2PB of ZoL on RBD. If 
I were you I would seriously evaluate the CephFS option. It used to be on the 
roadmap for ICE 2.0 coming out this fall, though I noticed it's not there 
anymore (??!!!). Anyway, I would say that ZoL on kRBD is not necessarily a more 
stable solution than CephFS. Even Gluster striped on top of RBD would probably 
be more stable than ZoL on RBD.

Cheers, Dan


_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
