Hi Zack,

Comments inline.

On 15-Apr-2014, at 9:24 pm, Zack <[email protected]> wrote:

> Issue being that ideally, I'd like to take advantage of the logic built into 
> Hadoop and Kafka to leverage JBOD for parallel I/O.  My worry is that zfs 
> drivers won't be as clever with application specific I/O optimizations.
> Z

The only I/O optimisation I can think of would be parallelisation of reads/writes
across the spindles.

Looking at how AWS EMR works, it uses S3 instead of HDFS for its storage
requirements. The ephemeral disk(s) are only used during a job run. I feel this
is a superior scaling design when compared to using local disks.

Advantages:

(1) Save storage space by eliminating the need for the HDFS 3x replication
factor. Rather than have 3 copies of each log, let the storage system handle
dedup and striping across the JBODs.
(2) Save on bandwidth between TOR switches.
(3) Application lifecycle management becomes simple as there is no stateful
data on compute nodes. I can blow away the data on the local disk without
worrying about data loss.
(4) Scale storage independently of compute nodes.
(5) Parallel object PUT/GET requests are possible with object storage.
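
To illustrate point (5), here is a minimal Python sketch of issuing object
PUT/GET requests in parallel. The InMemoryObjectStore class, the key names and
the worker count are all hypothetical stand-ins, not a real S3, Gluster or Riak
client; a real deployment would swap in whatever client library your object
store provides.

```python
from concurrent.futures import ThreadPoolExecutor
import threading


class InMemoryObjectStore:
    """Hypothetical stand-in for an S3-style object store (not a real client)."""

    def __init__(self):
        self._objects = {}
        self._lock = threading.Lock()

    def put(self, key, data):
        # Store a copy of the payload under the given key.
        with self._lock:
            self._objects[key] = bytes(data)

    def get(self, key):
        # Return the payload stored under the given key.
        with self._lock:
            return self._objects[key]


def parallel_put(store, items, workers=8):
    """PUT every (key, data) pair in `items` using a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() waits for all PUTs and re-raises any worker exception.
        list(pool.map(lambda kv: store.put(kv[0], kv[1]), items.items()))


def parallel_get(store, keys, workers=8):
    """GET all `keys` concurrently and return a {key: data} dict."""
    keys = list(keys)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so zip pairs each key with its data.
        return dict(zip(keys, pool.map(store.get, keys)))


if __name__ == "__main__":
    store = InMemoryObjectStore()
    logs = {"logs/part-%04d" % i: ("line %d\n" % i).encode() for i in range(16)}
    parallel_put(store, logs)
    fetched = parallel_get(store, logs)
    print(fetched == logs)
```

Each object is independent, so the requests can be spread over as many workers
as the storage backend and the network will tolerate.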

(3) and (4) are really the deal clinchers for me. One node going down should
not bring Hadoop to a standstill, and I should be able to manage storage
requirements without investing in additional compute nodes (hypervisors) or
vice versa.
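
As a rough illustration of point (4), the sketch below compares how many
compute nodes you would need when storage lives on the compute nodes (HDFS
style, 3x replication) versus when it sits in a separate object store. All the
figures (120 TB of data, 64 cores, 12 TB and 16 cores per node) are made-up
assumptions for the example, and the decoupled case still needs that capacity
somewhere in the storage tier; it just no longer drives the compute node count.

```python
import math


def nodes_needed_coupled(data_tb, cores, disk_tb_per_node, cores_per_node,
                         replication=3):
    """Storage on the compute nodes: the larger of the two demands wins."""
    for_storage = math.ceil(data_tb * replication / disk_tb_per_node)
    for_compute = math.ceil(cores / cores_per_node)
    return max(for_storage, for_compute)


def nodes_needed_decoupled(cores, cores_per_node):
    """Storage in a separate object store: only compute demand counts."""
    return math.ceil(cores / cores_per_node)


if __name__ == "__main__":
    # Hypothetical workload: 120 TB of logs, 64 cores' worth of jobs.
    print(nodes_needed_coupled(120, 64, 12, 16))   # 30
    print(nodes_needed_decoupled(64, 16))          # 4
    # Doubling the data doubles the coupled node count; compute is unchanged.
    print(nodes_needed_coupled(240, 64, 12, 16))   # 60
```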

You can consider building a distributed storage service using your JBOD. These
days, most distributed storage systems (Gluster, RiakCS) provide an S3
interface, and stock Hadoop (?) seems to support S3 natively now. Let
Gluster/Riak handle striping across the JBODs.

http://basho.com/riak-cloud-storage/
http://gluster.org/community/documentation/index.php/Hadoop
https://ceph.com/docs/master/cephfs/hadoop/

As for Kafka, it's been a while since I tracked the project. I am not certain
if it supports object storage. Please do check.

On the whole, I would opt for object storage as a service for your different
workloads rather than an app-specific cloud.

Also, given the adoption of EMR + S3 on AWS, object storage seems to be the way
forward.

YMMV.


--
@shankerbalan


