Ephemeral storage on SSD will be very painful to maintain, especially with large datasets. We will pretty soon have data somewhere in the PB range.
I am thinking of leveraging something like the project below, but I am not sure how much of a performance gain we would get out of it.

https://github.com/stec-inc/EnhanceIO

On Sat, Dec 3, 2016 at 8:28 AM, vincent gromakowski <vincent.gromakow...@gmail.com> wrote:

> What about ephemeral storage on SSD? If performance is required, it's
> generally for production, so the cluster would never be stopped. A
> Spark job to back up/restore on S3 then makes it possible to shut the
> cluster down completely.
>
> On Dec 3, 2016 1:28 PM, "David Mitchell" <jdavidmitch...@gmail.com> wrote:
>
>> To get a node-local read from Spark to Cassandra, one has to use a read
>> consistency level of LOCAL_ONE. For some use cases, this is not an
>> option. For example, if you need to use a read consistency level
>> of LOCAL_QUORUM, as many use cases demand, then one is not going to get a
>> node-local read.
>>
>> Also, to ensure a node-local read, one has to set spark.locality.wait to
>> zero. Whether a partition will be streamed to another node or
>> computed locally depends on the spark.locality.wait parameters. This
>> parameter can be set to 0 to force all partitions to be computed only on
>> local nodes.
>>
>> If you do some testing, please post your performance numbers.
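
For anyone who wants to try the node-local read setup David describes, here is a minimal sketch of the two settings rendered as spark-submit flags. spark.locality.wait and LOCAL_ONE come from his message. The property name spark.cassandra.input.consistency.level is the one I believe the Spark Cassandra Connector uses for read consistency, and the names NODE_LOCAL_READ_CONF and to_submit_flags are made up for illustration. Please check all of this against the versions you run.

```python
# Settings taken from the quoted message. The Cassandra property name is an
# assumption based on the Spark Cassandra Connector; verify it for your version.
NODE_LOCAL_READ_CONF = {
    "spark.locality.wait": "0",                              # never wait for a non-local slot
    "spark.cassandra.input.consistency.level": "LOCAL_ONE",  # read from the local replica only
}


def to_submit_flags(conf):
    """Render a dict of Spark properties as spark-submit --conf arguments."""
    flags = []
    for key, value in conf.items():
        flags += ["--conf", f"{key}={value}"]
    return flags


if __name__ == "__main__":
    # Prints the flags on one line, ready to paste after `spark-submit`.
    print(" ".join(to_submit_flags(NODE_LOCAL_READ_CONF)))
```

As David notes, this only helps if LOCAL_ONE is acceptable for the use case. With LOCAL_QUORUM the read will not be node-local regardless of the locality setting.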