Re: cassandra + spark / pyspark

Oleg Ruchovets Wed, 10 Sep 2014 09:50:44 -0700

Thanks for the info.
Could you please point me to where I can read about Mesos integration for HA
and about running in standalone mode?


Thanks
Oleg.

On Thu, Sep 11, 2014 at 12:13 AM, DuyHai Doan <doanduy...@gmail.com> wrote:

> Hello Oleg
>
> Question 2: yes. The official spark cassandra connector can be found here:
> https://github.com/datastax/spark-cassandra-connector
>
> The docs are in the doc/ folder. You can read and write directly from/to
> Cassandra without EVER using HDFS. You still need a resource manager like
> Apache Mesos to get high availability for your Spark cluster, or you can
> run in standalone mode and manage failover yourself. The choice is yours.
>
> Question 3: yes, you can save a massive amount of data into Cassandra
>
> Question 4: I've played with it a little, and it's quite smart: data
> locality is guaranteed by mapping each Spark RDD partition directly to the
> Cassandra node that owns the primary partition range. I have not used it
> in production yet, though, so I can't say anything about stability.
>
> Maybe others on the list can share their thoughts?
>
> Regards
>
> Duy Hai DOAN
>
>
>
> On 10 Sept 2014 17:35, "Oleg Ruchovets" <oruchov...@gmail.com> wrote:
>
>> Hi,
>>   I am evaluating different options for Spark + Cassandra and I have a
>> couple of questions.
>>   My aim is to use Cassandra + Spark without Hadoop:
>>
>> 1) Is it possible to use only Cassandra as the input/output for PySpark?
>> 2) If I use Spark (Java, Scala), is it possible to use only Cassandra
>> for input/output, without Hadoop?
>> 3) I know there are a couple of storage-level strategies. If my data set
>> is quite big and I don't have enough memory to process it, can I use the
>> DISK_ONLY option without Hadoop (having only Cassandra)?
>> 4) Please share your experience: how stable is the Cassandra + Spark
>> integration?
>>
>> Thanks
>> Oleg
>>
>
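
For readers following the thread: the data-locality idea from the answer to
Question 4 (each Spark partition maps to a token range and is scheduled on the
Cassandra node that owns that range) can be illustrated with a small
pure-Python sketch. This is only a simplified model, not the connector's
actual implementation. The ring layout, node addresses, and function names
below are all hypothetical, and it assumes the usual Cassandra convention that
a node with token T owns the range (previous token, T].

```python
from bisect import bisect_left
from collections import defaultdict

# Hypothetical 4-node ring as (token, node address) pairs.
# Assumption: a node owns the range (previous token, its own token].
RING = [(-50, "10.0.0.1"), (0, "10.0.0.2"), (50, "10.0.0.3"), (100, "10.0.0.4")]


def token_ranges(ring):
    """Return (start, end, owner) tuples, one per node, in token order."""
    ring = sorted(ring)
    ranges = []
    for i, (token, node) in enumerate(ring):
        start = ring[i - 1][0]  # for i == 0 this wraps around to the last token
        ranges.append((start, token, node))
    return ranges


def owner_of(token, ring):
    """Return the primary owner of a token: first node token >= it, with wrap-around."""
    ring = sorted(ring)
    tokens = [t for t, _ in ring]
    idx = bisect_left(tokens, token) % len(ring)
    return ring[idx][1]


def partitions_by_node(ring):
    """Group token ranges by owning node; each range stands in for one
    partition whose preferred location is that node."""
    grouped = defaultdict(list)
    for start, end, node in token_ranges(ring):
        grouped[node].append((start, end))
    return dict(grouped)
```

A scheduler that prefers to run each partition on the node returned by
partitions_by_node() reads only local data, which is the behaviour the reply
describes. Real clusters add replication and virtual nodes, which this sketch
leaves out.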
