"can you share please where can I read about mesos integration for HA and StandAlone mode execution?" --> You can find all the info in the Spark documentation, read this: http://spark.apache.org/docs/latest/cluster-overview.html
Basically, you have 3 choices:

1) Stand-alone mode: get your hands dirty and have a good ops team to set up manual failure detection & fail-over handling
2) Apache Mesos
3) Hadoop YARN

If you want to stay away from the Hadoop stack, I'd recommend Mesos.

Side note: I've been told that DSE (the Cassandra Enterprise version) offers tight integration with Spark, in the sense that you don't even need Mesos. Datastax has a proprietary implementation so that Spark & Cassandra run side by side and fail-over is managed automatically (state is saved in Cassandra). I have personally never used it, so I cannot tell you more. If somebody has more input, please share it; I'd be interested too in knowing how it is handled.

I've appended a few minimal code sketches after the quoted thread below: the master URLs for stand-alone vs Mesos, reading/writing Cassandra directly with the connector, and persisting with DISK_ONLY.

On Wed, Sep 10, 2014 at 6:49 PM, Oleg Ruchovets <oruchov...@gmail.com> wrote:

> Thanks for the info.
> Can you please share where I can read about Mesos integration for HA
> and stand-alone mode execution?
>
> Thanks
> Oleg.
>
> On Thu, Sep 11, 2014 at 12:13 AM, DuyHai Doan <doanduy...@gmail.com>
> wrote:
>
>> Hello Oleg
>>
>> Question 2: yes. The official Spark Cassandra connector can be found
>> here: https://github.com/datastax/spark-cassandra-connector
>>
>> There are docs in the doc/ folder. You can read & write directly from/to
>> Cassandra without EVER using HDFS. You still need a resource manager like
>> Apache Mesos to have high availability of your Spark cluster, or run in
>> stand-alone mode and manage fail-over yourself; the choice is yours.
>>
>> Question 3: yes, you can save a massive amount of data into Cassandra.
>>
>> Question 4: I've played a little bit with it, and it's quite smart: data
>> locality is guaranteed by creating Spark RDD partitions that map directly to
>> the Cassandra nodes holding the primary partition range. I still haven't used
>> it in production though, so I can't say anything about stability.
>>
>> Maybe other guys on the list can give their thoughts about it?
>>
>> Regards
>>
>> Duy Hai DOAN
>>
>> On 10 Sep 2014 at 17:35, "Oleg Ruchovets" <oruchov...@gmail.com> wrote:
>>
>>> Hi,
>>> I am trying to evaluate different options for Spark + Cassandra and I have
>>> a couple of questions. My aim is to use Cassandra + Spark without Hadoop:
>>>
>>> 1) Is it possible to use only Cassandra as the input/output for PySpark?
>>> 2) In case I use Spark (Java, Scala), is it possible to use only
>>> Cassandra for input/output, without Hadoop?
>>> 3) I know there are a couple of strategies for the storage level. In case my
>>> data set is quite big and I don't have enough memory to process it, can I use
>>> the DISK_ONLY option without Hadoop (having only Cassandra)?
>>> 4) Please share your experience: how stable is the Cassandra + Spark integration?
>>>
>>> Thanks
>>> Oleg
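
P.S. As promised, a few sketches. First, the "choice of cluster manager" mostly boils down to the master URL you hand to SparkConf. This is only a minimal sketch; the host names, port and ZooKeeper quorum are placeholders, not anything from this thread:

import org.apache.spark.{SparkConf, SparkContext}

// Stand-alone mode: point the driver at the standalone master
// ("master-host:7077" is a placeholder for your own master).
val standaloneConf = new SparkConf()
  .setAppName("spark-cassandra-eval")
  .setMaster("spark://master-host:7077")

// Mesos mode: point at the Mesos master, or at a ZooKeeper quorum
// if you run several Mesos masters for HA (placeholder hosts).
val mesosConf = new SparkConf()
  .setAppName("spark-cassandra-eval")
  .setMaster("mesos://zk://zk1:2181,zk2:2181,zk3:2181/mesos")

val sc = new SparkContext(mesosConf)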
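
Second, a minimal sketch of reading and writing Cassandra directly with the spark-cassandra-connector, with no HDFS involved. The contact point, keyspace and table names are made up for illustration; see the connector's doc/ folder for the real API details:

import org.apache.spark.{SparkConf, SparkContext}
import com.datastax.spark.connector._   // adds cassandraTable / saveToCassandra

// Placeholder contact point, keyspace and table names.
val conf = new SparkConf()
  .setAppName("cassandra-only")
  .setMaster("mesos://zk://zk1:2181,zk2:2181,zk3:2181/mesos")
  .set("spark.cassandra.connection.host", "10.0.0.1")
val sc = new SparkContext(conf)

// Read a table straight from Cassandra.
val words = sc.cassandraTable("my_keyspace", "words")

// Transform and write the result back to another Cassandra table.
words
  .map(row => (row.getString("word"), row.getInt("count") * 2))
  .saveToCassandra("my_keyspace", "words_doubled", SomeColumns("word", "count"))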
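
Third, regarding question 3: DISK_ONLY is a Spark storage level, not a Hadoop feature, so cached partitions are spilled to each worker's local directories (spark.local.dir) and no HDFS is needed. A tiny sketch, again with placeholder names and reusing the SparkContext from the previous sketch:

import org.apache.spark.storage.StorageLevel
import com.datastax.spark.connector._

// Placeholder keyspace/table; "sc" is the context from the sketch above.
val events = sc.cassandraTable("my_keyspace", "events")

// DISK_ONLY keeps the cached partitions on the workers' local disks,
// not in memory and not in HDFS.
events.persist(StorageLevel.DISK_ONLY)

println(events.count())  // first action materialises the on-disk cache
println(events.count())  // second action reads the cached partitions back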