Re: cassandra + spark / pyspark

2014-09-12 Thread Francisco Madrid-Salvador
Hi Oleg, Connectors don't deal with HA; they rely on Spark for that, so neither the Datastax connector, Stratio Deep, nor Calliope has anything to do with Spark's HA. You should have previously configured Spark so that it meets your high-availability needs. Furthermore, as I mentioned in a pr…

Re: cassandra + spark / pyspark

2014-09-11 Thread Oleg Ruchovets
Thank you Rohit. I sent the email to you. Thanks, Oleg. On Thu, Sep 11, 2014 at 10:51 PM, Rohit Rai wrote: > Hi Oleg, > > I am the creator of Calliope. Calliope doesn't force any deployment > model... that means you can run it with Mesos or Hadoop or Standalone. To > be fair I don't think the…

Re: cassandra + spark / pyspark

2014-09-11 Thread Rohit Rai
Hi Oleg, I am the creator of Calliope. Calliope doesn't force any deployment model... that means you can run it with Mesos or Hadoop or Standalone. To be fair, I think the other libs mentioned here should work too. Spark cluster HA can be provided using ZooKeeper even in the standalone d…

Re: cassandra + spark / pyspark

2014-09-11 Thread Oleg Ruchovets
Ok. DataStax and Stratio require Mesos, Hadoop YARN, or some other third party to get Spark cluster HA. What about Calliope? Is it sufficient to have Cassandra + Calliope + Spark to be able to process aggregations? In my case we have quite a lot of data, so doing aggregation only in memory is impossi…

Re: cassandra + spark / pyspark

2014-09-11 Thread DuyHai Doan
2. "still uses thrift for minor stuff" --> I think that the only call using Thrift is "describe_ring", to get an estimate of the ratio of partition keys within the token range. 3. Stratio has a talk today at the SF Summit, presenting Stratio META. For the folks not attending the conference, the video should…

Re: cassandra + spark / pyspark

2014-09-11 Thread abhinav chowdary
Adding to the conversation... there are 3 great open source options available: 1. Calliope http://tuplejump.github.io/calliope/ This is the first library that was out, some time late last year (as far as I can recall), and I have been using it for a while; mostly very stable, uses Hadoop I/O in Cassandra…

Re: cassandra + spark / pyspark

2014-09-10 Thread Oleg Ruchovets
Typo. I am talking about Spark only. Thanks, Oleg. On Thursday, September 11, 2014, DuyHai Doan wrote: > Stupid question: do you really need both Storm & Spark ? Can't you > implement the Storm jobs in Spark ? It will be operationally simpler to > have less moving parts. I'm not saying that Stor…

Re: cassandra + spark / pyspark

2014-09-10 Thread Paco Madrid
Hi Oleg. Spark can be configured for high availability without the need for Mesos ( https://spark.apache.org/docs/latest/spark-standalone.html#high-availability), for instance using ZooKeeper and standby masters. If I'm not wrong, Storm doesn't need Mesos to work, so I imagine you use it to mak…
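For reference, the standby-master setup described at that link boils down to a few recovery properties in conf/spark-env.sh. A minimal sketch, assuming a three-node ZooKeeper ensemble; the host names and the /spark znode path are placeholders:

```shell
# conf/spark-env.sh -- enable ZooKeeper-based master recovery (standalone mode)
# zk1/zk2/zk3 are placeholder ZooKeeper hosts; adjust to your ensemble
SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER \
  -Dspark.deploy.zookeeper.url=zk1:2181,zk2:2181,zk3:2181 \
  -Dspark.deploy.zookeeper.dir=/spark"
```

With this in place you start multiple masters pointing at the same ensemble; ZooKeeper elects one leader and workers fail over to a standby if it dies.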

Re: cassandra + spark / pyspark

2014-09-10 Thread Paco Madrid
Good to know. Thanks, DuyHai! I'll take a look (but most probably tomorrow ;-)) Paco 2014-09-10 20:15 GMT+02:00 DuyHai Doan : > Source code check for the Java version: > https://github.com/datastax/spark-cassandra-connector/blob/master/spark-cassandra-connector-java/src/main/java/com/datastax/sp

Re: cassandra + spark / pyspark

2014-09-10 Thread DuyHai Doan
Stupid question: do you really need both Storm & Spark? Can't you implement the Storm jobs in Spark? It will be operationally simpler to have fewer moving parts. I'm not saying that Storm is not the right fit; it may be totally suitable for some usages. But if you want to avoid the SPOF thing an…

Re: cassandra + spark / pyspark

2014-09-10 Thread Oleg Ruchovets
Interesting things, actually: we have Hadoop in our ecosystem. It has a single point of failure, and I am not sure about inter-data-center replication. The plan is to use Cassandra: no single point of failure, and there is data center replication. For aggregation/transformation we would use Spark. BUT Storm r…

Re: cassandra + spark / pyspark

2014-09-10 Thread DuyHai Doan
Source code check for the Java version: https://github.com/datastax/spark-cassandra-connector/blob/master/spark-cassandra-connector-java/src/main/java/com/datastax/spark/connector/RDDJavaFunctions.java#L26 It's using the RDDFunctions from the Scala code, so yes, it's the Java driver again. On Wed, Sep 10…

Re: cassandra + spark / pyspark

2014-09-10 Thread DuyHai Doan
"As far as I know, the Datastax connector uses thrift to connect Spark with Cassandra although thrift is already deprecated, could someone confirm this point?" --> the Scala connector is using the latest Java driver, so no, there is no Thrift there. For the Java version I'm not sure, I have not lo…

cassandra + spark / pyspark

2014-09-10 Thread Francisco Madrid-Salvador
Hi Oleg, Stratio Deep is just a library you must include in your Spark deployment, so it doesn't guarantee any high availability at all. To achieve HA you must use Mesos or any other 3rd-party resource manager. Stratio doesn't currently support PySpark, just Scala and Java. Perhaps in the fut…

Re: cassandra + spark / pyspark

2014-09-10 Thread DuyHai Doan
"can you share please where can I read about mesos integration for HA and StandAlone mode execution?" --> You can find all the info in the Spark documentation; read this: http://spark.apache.org/docs/latest/cluster-overview.html Basically, you have 3 choices: 1) Stand-alone mode: get your hands…
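The three choices map to different --master URLs at submit time. A hedged sketch with placeholder hosts and Spark 1.x-era syntax (the app jar name is made up):

```shell
# Standalone mode: point at the (possibly ZooKeeper-backed) standalone master
spark-submit --master spark://master:7077 my-app.jar

# Mesos: master discovered through a ZooKeeper ensemble (placeholder hosts)
spark-submit --master mesos://zk://zk1:2181,zk2:2181,zk3:2181/mesos my-app.jar

# Hadoop YARN (Spark 1.x syntax; later versions use --master yarn --deploy-mode cluster)
spark-submit --master yarn-cluster my-app.jar
```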

Re: cassandra + spark / pyspark

2014-09-10 Thread Oleg Ruchovets
Thanks for the info. Can you please share where I can read about Mesos integration for HA and standalone mode execution? Thanks, Oleg. On Thu, Sep 11, 2014 at 12:13 AM, DuyHai Doan wrote: > Hello Oleg > > Question 2: yes. The official spark cassandra connector can be found here: > https://git

Re: cassandra + spark / pyspark

2014-09-10 Thread Oleg Ruchovets
Great stuff, Paco. Thanks for sharing. A couple of questions: Is additional installation, like Apache Mesos, required to be HA? Do you support PySpark? How stable / production-ready is it? Thanks, Oleg. On Thu, Sep 11, 2014 at 12:01 AM, Francisco Madrid-Salvador < pmad...@stratio.com> wrote: >

Re: cassandra + spark / pyspark

2014-09-10 Thread DuyHai Doan
Hello Oleg Question 2: yes. The official Spark Cassandra connector can be found here: https://github.com/datastax/spark-cassandra-connector There are docs in the doc/ folder. You can read & write directly from/to Cassandra without EVER using HDFS. You still need a resource manager like Apache Meso…
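As a sketch of the read/write path DuyHai describes, using the connector's 1.x-era Scala API (cassandraTable / saveToCassandra); the keyspace, table, and column names here are hypothetical, and a running Spark + Cassandra cluster is assumed:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import com.datastax.spark.connector._

// Point the connector at a Cassandra contact node (placeholder address)
val conf = new SparkConf()
  .setAppName("cassandra-aggregation")
  .set("spark.cassandra.connection.host", "127.0.0.1")
val sc = new SparkContext(conf)

// Read directly from Cassandra -- no HDFS involved
val events = sc.cassandraTable("my_keyspace", "events")

// Aggregate: count events per user (hypothetical schema)
val counts = events
  .map(row => (row.getString("user_id"), 1L))
  .reduceByKey(_ + _)

// Write the result straight back to another Cassandra table
counts.saveToCassandra("my_keyspace", "event_counts", SomeColumns("user_id", "count"))
```

Spark itself handles partitioning and retries; the only non-Spark dependency is the resource manager mentioned above.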

cassandra + spark / pyspark

2014-09-10 Thread Francisco Madrid-Salvador
Hi Oleg, If you want to use Cassandra + Spark without Hadoop, perhaps Stratio Deep is your best choice (https://github.com/Stratio/stratio-deep). It's an open-source Spark + Cassandra connector that doesn't make any use of Hadoop or Hadoop components. http://docs.openstratio.org/deep/0.3.3/abou

cassandra + spark / pyspark

2014-09-10 Thread Oleg Ruchovets
Hi, I am trying to evaluate different options for Spark + Cassandra and I have a couple of questions. My aim is to use Cassandra + Spark without Hadoop: 1) Is it possible to use only Cassandra as the input/output for PySpark? 2) In case I use Spark (Java, Scala), is it possible to use only Cass…