I would say this is worth benchmarking before jumping to conclusions. The claim that the network is a bottleneck (or a source of latency) for EBS is, to my knowledge, supposition, and instances can be launched with dedicated connections to EBS (EBS-optimized instances) if this is a concern. The blog post below shows that even without SSDs, provisioned-IOPS volumes on EBS-optimized instances deliver fairly consistent latency numbers, although those latencies are higher than you would typically expect from locally attached storage.
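To make the "benchmark it" suggestion concrete, here is a minimal sketch (hypothetical, not from the thread) that times random reads against a file on the volume under test and reports latency percentiles. Note the caveat in the comments: a serious benchmark would use a tool like fio with direct I/O to bypass the page cache, so treat this only as a rough first look.

```python
import os
import random
import time

def read_latency_percentiles(path, block_size=4096, samples=1000):
    """Time random block-sized reads of `path` and return p50/p99 latency
    in milliseconds. Rough sketch only: reads may be served from the OS
    page cache, so a real benchmark should use fio with O_DIRECT."""
    size = os.path.getsize(path)
    latencies = []
    fd = os.open(path, os.O_RDONLY)
    try:
        for _ in range(samples):
            # Pick a random aligned-ish offset within the file.
            offset = random.randrange(0, max(1, size - block_size))
            start = time.perf_counter()
            os.pread(fd, block_size, offset)
            latencies.append((time.perf_counter() - start) * 1000.0)
    finally:
        os.close(fd)
    latencies.sort()
    return {
        "p50": latencies[len(latencies) // 2],
        "p99": latencies[int(len(latencies) * 0.99) - 1],
    }
```

Pointing this at a large file on an EBS volume versus an instance-store volume, and comparing the p99 numbers, is the kind of empirical check the paragraph above is arguing for.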
http://blog.parse.com/2012/09/17/parse-databases-upgraded-to-amazon-provisioned-iops/

Note, I'm not endorsing the use of EBS. Cassandra is designed to scale out with the number of nodes, not with the depth of individual nodes (as Ben mentions, saturating a single node's data capacity is pretty easy these days; CPUs rapidly become the bottleneck as you try to go deep). However, the argument that EBS cannot provide consistent performance seems overly pessimistic, and should probably be determined empirically for your use case.

On Thu, Jun 19, 2014 at 9:50 AM, Alain RODRIGUEZ <arodr...@gmail.com> wrote:

> Ok, looks fair enough.
>
> Thanks guys. It would be great to be able to add disks when the amount of
> data grows and add nodes when throughput increases... :)
>
>
> 2014-06-19 5:27 GMT+02:00 Ben Bromhead <b...@instaclustr.com>:
>
>> http://www.datastax.com/documentation/cassandra/1.2/cassandra/architecture/architecturePlanningEC2_c.html
>>
>> From the link:
>>
>> EBS volumes are not recommended for Cassandra data volumes for the
>> following reasons:
>>
>> • EBS volumes contend directly for network throughput with standard
>> packets. This means that EBS throughput is likely to fail if you saturate
>> a network link.
>> • EBS volumes have unreliable performance. I/O performance can be
>> exceptionally slow, causing the system to back load reads and writes
>> until the entire cluster becomes unresponsive.
>> • Adding capacity by increasing the number of EBS volumes per host does
>> not scale. You can easily surpass the ability of the system to keep
>> effective buffer caches and concurrently serve requests for all of the
>> data it is responsible for managing.
>>
>> Still applies, especially the network contention and latency issues.
>>
>> Ben Bromhead
>> Instaclustr | www.instaclustr.com | @instaclustr
>> <http://twitter.com/instaclustr> | +61 415 936 359
>>
>> On 18 Jun 2014, at 7:18 pm, Daniel Chia <danc...@coursera.org> wrote:
>>
>> While they guarantee IOPS, they don't really make any guarantees about
>> latency. Since EBS goes over the network, there are so many things in
>> the path of getting at your data that I would be concerned about random
>> latency spikes, unless proven otherwise.
>>
>> Thanks,
>> Daniel
>>
>>
>> On Wed, Jun 18, 2014 at 1:58 AM, Alain RODRIGUEZ <arodr...@gmail.com>
>> wrote:
>>
>>> In this document it is said:
>>>
>>> - Provisioned IOPS (SSD) - Volumes of this type are ideal for the most
>>> demanding I/O-intensive, transactional workloads and large relational
>>> or NoSQL databases. This volume type provides the most consistent
>>> performance and allows you to provision the exact level of performance
>>> you need, with the most predictable and consistent performance. With
>>> this type of volume you provision exactly what you need, and pay for
>>> what you provision. Once again, you can achieve up to 48,000 IOPS by
>>> connecting multiple volumes together using RAID.
>>>
>>>
>>> 2014-06-18 10:57 GMT+02:00 Alain RODRIGUEZ <arodr...@gmail.com>:
>>>
>>>> Hi,
>>>>
>>>> I just saw this:
>>>> http://aws.amazon.com/fr/blogs/aws/new-ssd-backed-elastic-block-storage/
>>>>
>>>> Since the problem with EBS was the network, there is no chance that
>>>> this hardware architecture might be useful alongside Cassandra, right?
>>>>
>>>> Alain
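The 48,000 IOPS figure quoted above comes from striping several Provisioned IOPS volumes together (RAID 0). As a back-of-the-envelope sketch of that arithmetic (the 4,000 IOPS per-volume cap used below is an assumption reflecting the 2014-era limit, and the helper name is hypothetical, assuming IOPS add linearly across the stripe):

```python
import math

# Assumed 2014-era limits; check current AWS documentation before relying
# on these numbers.
PER_VOLUME_IOPS_CAP = 4000    # assumed Provisioned IOPS cap per volume
INSTANCE_IOPS_CAP = 48000     # aggregate figure quoted in the thread above

def volumes_needed(target_iops, per_volume_cap=PER_VOLUME_IOPS_CAP):
    """How many PIOPS volumes must be striped (RAID 0) to reach a target
    IOPS figure, assuming IOPS add linearly across the stripe."""
    return math.ceil(target_iops / per_volume_cap)

print(volumes_needed(INSTANCE_IOPS_CAP))  # -> 12 volumes for 48,000 IOPS
```

The point being debated in the thread is that even when the aggregate IOPS target is reachable this way, per-request latency over the network is a separate question from throughput.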