Re: Best way to bring up Spark with Cassandra (and Elasticsearch) in production.

2016-02-15 Thread Ted Yu
Sounds reasonable. Please consider posting questions about the Spark C* connector on its mailing list if you have any. On Sun, Feb 14, 2016 at 7:51 PM, Kevin Burton wrote: > Afternoon. > > About 6 months ago I tried (and failed) to get Spark and Cassandra working > together in production due to dependency hell.

Best way to bring up Spark with Cassandra (and Elasticsearch) in production.

2016-02-14 Thread Kevin Burton
Afternoon. About 6 months ago I tried (and failed) to get Spark and Cassandra working together in production due to dependency hell. I'm going to give it another try! Here's my general strategy. I'm going to create a maven module for my code... with spark dependencies. Then I'm going to get th
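A minimal sketch of the kind of smoke test such a Maven module might drive, assuming the DataStax spark-cassandra-connector is among its dependencies; the keyspace, table, and connection host below are hypothetical placeholders, not from the thread:

```scala
import com.datastax.spark.connector._
import org.apache.spark.{SparkConf, SparkContext}

object SparkCassandraSmokeTest {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("spark-cassandra-smoke-test")
      // Assumption: a Cassandra node reachable on localhost.
      .set("spark.cassandra.connection.host", "127.0.0.1")
    val sc = new SparkContext(conf)

    // cassandraTable comes from the connector's implicits and yields an
    // RDD[CassandraRow]; counting it exercises the whole classpath end to end.
    val rows = sc.cassandraTable("my_keyspace", "my_table")
    println(s"row count: ${rows.count()}")

    sc.stop()
  }
}
```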

Spark with cassandra

2015-05-22 Thread lucas
... at org.apache.spark.rdd.RDD.foreach(RDD.scala:797) Do you have any idea? To conclude, I would like to put my map onto a Cassandra table from my RDD of values, org.apache.spark.rdd.RDD[scala.collection.Map[String,Any]]. Best regards,
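For reference, the connector's saveToCassandra cannot write an RDD[Map[String,Any]] directly; one workable sketch is to project each map onto a tuple matching the target table's columns. The keyspace "ks", table "events", and column names here are hypothetical:

```scala
import com.datastax.spark.connector._
import org.apache.spark.rdd.RDD

def writeMaps(rddValues: RDD[Map[String, Any]]): Unit = {
  // Project each loosely-typed map onto a tuple matching the table schema.
  val rows = rddValues.map { m =>
    (m("id").toString, m("value").toString) // assumes both keys are present
  }
  rows.saveToCassandra("ks", "events", SomeColumns("id", "value"))
}
```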

Re: Spark with Cassandra - Shuffle opening too many files

2015-01-07 Thread Ankur Srivastava
Thank you Cody!! I am going to try the two settings you mentioned. We are currently running with the Spark standalone cluster manager. Thanks Ankur On Wed, Jan 7, 2015 at 1:20 PM, Cody Koeninger wrote: > General ideas regarding too many open files: > > Make sure ulimit is actually being

Re: Spark with Cassandra - Shuffle opening too many files

2015-01-07 Thread Cody Koeninger
General ideas regarding too many open files: Make sure ulimit is actually being set, especially if you're on Mesos (because of https://issues.apache.org/jira/browse/MESOS-123). Find the pid of the executor process and cat /proc/<pid>/limits. Set spark.shuffle.consolidateFiles = true; try spark.shuffl
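A sketch of those settings as Spark 1.x configuration; spark.shuffle.consolidateFiles is quoted from the thread, while the sort-based shuffle manager is an assumption about the truncated suggestion:

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("shuffle-tuning")
  .set("spark.shuffle.consolidateFiles", "true") // merge map-side shuffle outputs
  .set("spark.shuffle.manager", "sort")          // assumed: far fewer files than hash

// To verify the ulimit actually reached an executor, on the worker host:
//   cat /proc/<executor-pid>/limits    (check the "Max open files" row)
```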

Spark with Cassandra - Shuffle opening too many files

2015-01-07 Thread Ankur Srivastava
Hello, We are currently running our data pipeline on Spark, which uses Cassandra as the data source. We are facing an issue at the step where we create an RDD on data in a Cassandra table and then try to run "flatMapToPair" to transform the data, but we are running into "Too many open files"
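For context, a Scala analogue of that step (the thread uses the Java API's flatMapToPair); the keyspace, table, and columns are hypothetical:

```scala
import com.datastax.spark.connector._
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._ // pair-RDD implicits on Spark 1.2

def transform(sc: SparkContext): Unit = {
  // Emit zero or more (key, value) pairs per Cassandra row; every pair
  // feeds the shuffle, which is where "Too many open files" surfaces.
  val pairs = sc.cassandraTable("ks", "events").flatMap { row =>
    row.getString("tags").split(",").map(tag => (tag, 1))
  }
  pairs.reduceByKey(_ + _).collect().foreach(println)
}
```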