Re: [ceph-users] backing Hadoop with Ceph ??

Shane Gibson Wed, 15 Jul 2015 11:49:52 -0700

Somnath - thanks for the reply ...

:-)  Haven't tried anything yet - just starting to gather info/input/direction 
for this solution.


Looking at the S3 API info [2] - there is no mention of support for the "S3a" 
API extensions - namely "rename" support.  The problem with backing via the S3 
API - if you need to rename a large (say multi-GB) data object, you have to copy 
it to the new name and delete the original.  This is a very IO-expensive 
operation - and something we do a lot of.  That in and of itself might be a 
deal breaker ...  Any idea/input/intention of supporting the S3a extensions 
within the RadosGW S3 API implementation?
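
To make the rename cost concrete, here is a minimal Python sketch. It assumes a 
toy in-memory object store: the ToyObjectStore class, its methods, and the 
object keys are all hypothetical, and none of it is the RadosGW or Amazon S3 
client API. It only shows that a rename emulated as copy + delete rewrites 
every byte of the object.

```python
class ToyObjectStore:
    """Hypothetical in-memory stand-in for an S3-style bucket (not a real client)."""

    def __init__(self):
        self.objects = {}        # key -> object data (bytes)
        self.bytes_copied = 0    # running total of data rewritten by copies

    def put(self, key, data):
        self.objects[key] = data

    def copy(self, src, dst):
        # Even a server-side copy has to rewrite the full object under the new key.
        data = self.objects[src]
        self.objects[dst] = bytes(data)
        self.bytes_copied += len(data)

    def delete(self, key):
        del self.objects[key]

    def rename(self, src, dst):
        # No native rename here, so emulate it: copy to the new name, then delete.
        self.copy(src, dst)
        self.delete(src)


store = ToyObjectStore()
store.put("job/_temporary/part-0000", b"x" * 1024)   # stand-in for a multi-GB object
store.rename("job/_temporary/part-0000", "job/part-0000")
print(sorted(store.objects), store.bytes_copied)
```

In this model the data rewritten by a rename grows linearly with object size, 
which is the concern for jobs that rename large outputs frequently.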

Plus - it seems like it's considered a "bad idea" to back Hadoop via S3 (and 
indirectly Ceph via RGW) [3]; though I'm not sure whether the architectural 
differences between Amazon's S3 implementation and the far superior Ceph make 
it more palatable?

~~shane

[2] http://ceph.com/docs/master/radosgw/s3/
[3] https://wiki.apache.org/hadoop/AmazonS3



On 7/15/15, 9:50 AM, "Somnath Roy" 
<somnath....@sandisk.com> wrote:

Did you try to integrate Ceph + RGW + S3 with Hadoop?

Sent from my iPhone

On Jul 15, 2015, at 8:58 AM, Shane Gibson 
<shane_gib...@symantec.com> wrote:



We are in the (very) early stages of considering testing backing Hadoop via 
Ceph - as opposed to HDFS.  I've seen a few very vague references to doing 
that, but haven't found any concrete info (architecture, configuration 
recommendations, gotchas, lessons learned, etc...).   I did find the 
ceph.com/docs/ info [1] which discusses use of CephFS 
for backing Hadoop - but this would be foolish for production clusters given 
that CephFS isn't yet considered production quality/grade.

Does anyone in the ceph-users community have experience with this that they'd 
be willing to share?   Preferably ... via use of Ceph - not via CephFS...but I 
am interested in any CephFS related experiences too.

If we were to do this, and Ceph proved out as a backing store for Hadoop - there 
is the potential to create a fairly large multi-petabyte (100s ??) class 
Ceph backing store.  We do a very large amount of analytics on a lot of 
data sets for security trending correlations, etc...

Our current Ceph experience is limited to a few small (90 x 4TB OSD size) 
clusters - which we are working towards putting in production for Glance/Cinder 
backing and for block storage for various platforms with large storage needs 
(e.g. software and package repos/mirrors, etc...).

Thanks in advance for any input, thoughts, or pointers ...

~~shane

[1] http://ceph.com/docs/master/cephfs/hadoop/


_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

