Hi Zack,

Comments inline.

On 15-Apr-2014, at 9:24 pm, Zack <[email protected]> wrote:

> Issue being that ideally, I'd like to take advantage of the logic built into
> Hadoop and Kafka to leverage JBOD for parallel I/O. My worry is that zfs
> drivers won't be as clever with application specific I/O optimizations.
> Z

The only I/O optimisation I can think of would be parallelisation of reads/writes across the spindles.

Looking at how AWS EMR works, it uses S3 instead of HDFS for its storage requirements. The ephemeral disk(s) are only used during a job run. I feel this is a superior scaling design compared to using local disks.

Advantages:

(1) Save storage space by eliminating the need for the HDFS 3x replication factor. Rather than keeping 3 copies of each log, let the storage system handle dedup and striping across the JBODs.
(2) Save on bandwidth between TOR switches.
(3) Application lifecycle management becomes simple, as there is no stateful data on the compute nodes. I can blow away the data on the local disk without worrying about data loss.
(4) Scale storage independently of the compute nodes.
(5) Parallel object PUT/GET requests are possible with object storage.

(3) and (4) are really the deal clinchers for me. One node going down should not bring Hadoop to a standstill, and I should be able to manage storage requirements without investing in additional compute nodes (hypervisors), or vice versa.

You can consider building a distributed storage service using your JBOD. These days, most distributed storage systems (Gluster, RiakCS) provide an S3 interface, and stock Hadoop (?) seems to support S3 natively now. Let Gluster/Riak handle striping across the JBODs.

http://basho.com/riak-cloud-storage/
http://gluster.org/community/documentation/index.php/Hadoop
https://ceph.com/docs/master/cephfs/hadoop/

As for Kafka, it's been a while since I tracked the project. I am not certain whether it supports object storage. Please do check.
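
To make point (5) a little more concrete, here is a rough sketch of issuing PUT/GET requests in parallel from a client. It is only an illustration under assumptions: `InMemoryObjectStore` is a hypothetical stand-in for an S3-style bucket (it is not a real S3, Gluster or RiakCS client), and the key names, payloads and worker count are made up. With a real store you would swap in that store's client calls for `put` and `get`.

```python
from concurrent.futures import ThreadPoolExecutor
import threading


class InMemoryObjectStore:
    """Hypothetical stand-in for an S3-style bucket (not a real client)."""

    def __init__(self):
        self._objects = {}
        self._lock = threading.Lock()

    def put(self, key, data):
        # A real client would send an HTTP PUT here.
        with self._lock:
            self._objects[key] = bytes(data)

    def get(self, key):
        # A real client would send an HTTP GET here.
        with self._lock:
            return self._objects[key]


def parallel_put(store, items, workers=8):
    """Upload every (key, data) pair concurrently; returns once all are done."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() drains the iterator so any worker exception is re-raised here.
        list(pool.map(lambda kv: store.put(kv[0], kv[1]), items.items()))


def parallel_get(store, keys, workers=8):
    """Fetch all keys concurrently; pool.map keeps results in key order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(zip(keys, pool.map(store.get, keys)))


# Made-up log chunks, one object per chunk.
logs = {f"logs/part-{i:04d}": f"record {i}\n".encode() for i in range(32)}

store = InMemoryObjectStore()
parallel_put(store, logs)
fetched = parallel_get(store, list(logs))
```

Since each object is an independent request, the client can fan out as widely as the storage backend allows, and it is the storage layer rather than the application that decides how those objects land on the JBOD spindles.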

On the whole, I would opt for object storage as a service for your different workloads rather than an app-specific cloud. Also, given the adoption of EMR + S3 on AWS, object storage seems to be the way forward. YMMV.

--
@shankerbalan
[email protected] | www.shapeblue.com | Twitter: @shapeblue
ShapeBlue Services India LLP, Bangalore
