"can you share please where can I read about mesos integration for HA and StandAlone mode execution?" --> You can find all the info in the Spark documentation, read this: http://spark.apache.org/docs/latest/cluster-overview.html
Basically, you have 3 choices:

1) Stand-alone mode: get your hands dirty and have a good ops team to set up manual failure detection & fail-over handling
2) Apache Mesos
3) Hadoop YARN

If you want to stay away from the Hadoop stack, I'd recommend Mesos.

Side note: I've been told that DSE (the Cassandra Enterprise version) offers tight integration with Spark, in the sense that you don't even need Mesos. Datastax has a proprietary implementation so that Spark & Cassandra run side by side and fail-over is managed automatically (state is saved in Cassandra). I have personally never used it, so I cannot tell you more. If somebody has more input, please share it; I'd be interested too in knowing how it is handled.

I've appended a few minimal code sketches after the quoted thread below: the master URLs for stand-alone vs Mesos, reading/writing Cassandra directly with the connector, and persisting with DISK_ONLY.

On Wed, Sep 10, 2014 at 6:49 PM, Oleg Ruchovets <oruchov...@gmail.com> wrote:

> Thanks for the info.
> Can you please share where I can read about Mesos integration for HA
> and stand-alone mode execution?
>
> Thanks
> Oleg.
>
> On Thu, Sep 11, 2014 at 12:13 AM, DuyHai Doan <doanduy...@gmail.com>
> wrote:
>
>> Hello Oleg
>>
>> Question 2: yes. The official Spark Cassandra connector can be found
>> here: https://github.com/datastax/spark-cassandra-connector
>>
>> There are docs in the doc/ folder. You can read & write directly from/to
>> Cassandra without EVER using HDFS. You still need a resource manager like
>> Apache Mesos to have high availability of your Spark cluster, or run in
>> stand-alone mode and manage fail-over yourself; the choice is yours.
>>
>> Question 3: yes, you can save a massive amount of data into Cassandra.
>>
>> Question 4: I've played a little bit with it, and it's quite smart: data
>> locality is guaranteed by creating Spark RDD partitions that map directly to
>> the Cassandra nodes holding the primary partition range. I still haven't used
>> it in production though, so I can't say anything about stability.
>>
>> Maybe other guys on the list can give their thoughts about it?
>>
>> Regards
>>
>> Duy Hai DOAN
>>
>> On 10 Sep 2014 at 17:35, "Oleg Ruchovets" <oruchov...@gmail.com> wrote:
>>
>>> Hi,
>>> I am trying to evaluate different options for Spark + Cassandra and I have
>>> a couple of questions. My aim is to use Cassandra + Spark without Hadoop:
>>>
>>> 1) Is it possible to use only Cassandra as the input/output for PySpark?
>>> 2) In case I use Spark (Java, Scala), is it possible to use only
>>> Cassandra for input/output, without Hadoop?
>>> 3) I know there are a couple of strategies for the storage level. In case my
>>> data set is quite big and I don't have enough memory to process it, can I use
>>> the DISK_ONLY option without Hadoop (having only Cassandra)?
>>> 4) Please share your experience: how stable is the Cassandra + Spark integration?
>>>
>>> Thanks
>>> Oleg
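
P.S. As promised, a few sketches. First, the "choice of cluster manager" mostly boils down to the master URL you hand to SparkConf. This is only a minimal sketch; the host names, port and ZooKeeper quorum are placeholders, not anything from this thread:

import org.apache.spark.{SparkConf, SparkContext}

// Stand-alone mode: point the driver at the standalone master
// ("master-host:7077" is a placeholder for your own master).
val standaloneConf = new SparkConf()
  .setAppName("spark-cassandra-eval")
  .setMaster("spark://master-host:7077")

// Mesos mode: point at the Mesos master, or at a ZooKeeper quorum
// if you run several Mesos masters for HA (placeholder hosts).
val mesosConf = new SparkConf()
  .setAppName("spark-cassandra-eval")
  .setMaster("mesos://zk://zk1:2181,zk2:2181,zk3:2181/mesos")

val sc = new SparkContext(mesosConf)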
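
Second, a minimal sketch of reading and writing Cassandra directly with the spark-cassandra-connector, with no HDFS involved. The contact point, keyspace and table names are made up for illustration; see the connector's doc/ folder for the real API details:

import org.apache.spark.{SparkConf, SparkContext}
import com.datastax.spark.connector._   // adds cassandraTable / saveToCassandra

// Placeholder contact point, keyspace and table names.
val conf = new SparkConf()
  .setAppName("cassandra-only")
  .setMaster("mesos://zk://zk1:2181,zk2:2181,zk3:2181/mesos")
  .set("spark.cassandra.connection.host", "10.0.0.1")
val sc = new SparkContext(conf)

// Read a table straight from Cassandra.
val words = sc.cassandraTable("my_keyspace", "words")

// Transform and write the result back to another Cassandra table.
words
  .map(row => (row.getString("word"), row.getInt("count") * 2))
  .saveToCassandra("my_keyspace", "words_doubled", SomeColumns("word", "count"))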
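
Third, regarding question 3: DISK_ONLY is a Spark storage level, not a Hadoop feature, so cached partitions are spilled to each worker's local directories (spark.local.dir) and no HDFS is needed. A tiny sketch, again with placeholder names and reusing the SparkContext from the previous sketch:

import org.apache.spark.storage.StorageLevel
import com.datastax.spark.connector._

// Placeholder keyspace/table; "sc" is the context from the sketch above.
val events = sc.cassandraTable("my_keyspace", "events")

// DISK_ONLY keeps the cached partitions on the workers' local disks,
// not in memory and not in HDFS.
events.persist(StorageLevel.DISK_ONLY)

println(events.count())  // first action materialises the on-disk cache
println(events.count())  // second action reads the cached partitions back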