Re: map reduce for Cassandra

2014-07-22 Thread Marcelo Elias Del Valle
Hey, this is cool. I didn't know about this project. As far as I can see, it doesn't support Python, but it's quite interesting, as it seems to store intermediate results in Cassandra: - *(optional) write the computation results out to Cassandra*: we provide a way to efficiently save the result of yo

Re: map reduce for Cassandra

2014-07-21 Thread Gaspar Muñoz
Check out Stratio Deep. This integration between Spark and Cassandra is not based on Cassandra's Hadoop interface. 2014-07-22 3:53 GMT+02:00 Marcelo Elias Del Valle : > Hi, > > >> But if you are only relying on memtables to sort writes, that seems like >>

Re: map reduce for Cassandra

2014-07-21 Thread Marcelo Elias Del Valle
Hi, > But if you are only relying on memtables to sort writes, that seems like a > pretty heavyweight reason to use Cassandra? Actually, it's not a reason to use Cassandra. I already use Cassandra and I need to map reduce data from it. I am trying to see a reason to use the conventional M/R too

Re: map reduce for Cassandra

2014-07-21 Thread Robert Coli
On Mon, Jul 21, 2014 at 5:45 PM, Marcelo Elias Del Valle < marc...@s1mbi0se.com.br> wrote: > Although several sstables (disk fragments) may have the same row key, > inside a single sstable row keys and column keys are indexed, right? > Otherwise, doing a GET in Cassandra would take some time. > Fr

Re: map reduce for Cassandra

2014-07-21 Thread Marcelo Elias Del Valle
Hi Robert, First of all, thanks for answering. 2014-07-21 20:18 GMT-03:00 Robert Coli : > You're wrong, unless you're talking about insertion into a memtable, which > you probably aren't and which probably doesn't actually work that way > enough to be meaningful. > > On disk, Cassandra has immu
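The exchange above turns on the difference between inserting into an in-memory memtable and reading from immutable files on disk. As a rough illustration only, here is a toy Python sketch of that general log-structured pattern: writes land in a mutable in-memory table, a flush writes them out as a sorted immutable run, and a read checks the memtable first and then each run from newest to oldest. This is not Cassandra's actual implementation; `ToyTable`, `flush_threshold`, and the tuple-based "sstables" are hypothetical names made up for the example.

```python
import bisect


class ToyTable:
    """Toy log-structured store: one memtable plus immutable sorted runs."""

    def __init__(self, flush_threshold=3):
        # flush_threshold is a hypothetical knob for this sketch only.
        self.flush_threshold = flush_threshold
        self.memtable = {}   # mutable, in-memory writes
        self.sstables = []   # immutable (keys, values) tuples, oldest first

    def put(self, key, value):
        # A write only touches the memtable; no on-disk run is modified.
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Sorting happens once per flush, not once per insert.
        if not self.memtable:
            return
        items = sorted(self.memtable.items())
        keys = tuple(k for k, _ in items)
        values = tuple(v for _, v in items)
        self.sstables.append((keys, values))
        self.memtable = {}

    def get(self, key):
        # Newest data wins: memtable first, then runs from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for keys, values in reversed(self.sstables):
            i = bisect.bisect_left(keys, key)  # binary search inside one run
            if i < len(keys) and keys[i] == key:
                return values[i]
        return None
```

In this sketch the same key can appear in several runs, and the binary search happens on the read path inside each sorted run, while an insert is just a write into the in-memory table.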

Re: map reduce for Cassandra

2014-07-21 Thread Robert Coli
On Mon, Jul 21, 2014 at 10:54 AM, Marcelo Elias Del Valle < marc...@s1mbi0se.com.br> wrote: > My understanding (please someone correct me if I am wrong) is that when you > insert N items in a Cassandra CF, you are executing N binary searches to > insert the item already indexed by a key. When you rea

Re: map reduce for Cassandra

2014-07-21 Thread Marcelo Elias Del Valle
Jonathan, From what I have read in the docs, the Python API still has some limitations: it is not possible to use any Hadoop binary input format. The Python example for Cassandra is only in the master branch: https://github.com/apache/spark/blob/master/examples/src/main/python/cassandra_inputformat.py I

Re: map reduce for Cassandra

2014-07-21 Thread Jonathan Haddad
I haven't tried pyspark yet, but it's part of the distribution. My main language is Python too, so I intend on getting deep into it. On Mon, Jul 21, 2014 at 9:38 AM, Marcelo Elias Del Valle wrote: > Hi Jonathan, > > Do you know if this RDD can be used with Python? AFAIK, python + Cassandra > wil

Re: map reduce for Cassandra

2014-07-21 Thread Marcelo Elias Del Valle
Hi Jonathan, Do you know if this RDD can be used with Python? AFAIK, python + Cassandra will be supported only in the next version, but I would like to be wrong... Best regards, Marcelo Valle. 2014-07-21 13:06 GMT-03:00 Jonathan Haddad : > Hey Marcelo, > > You should check out spark. It inte

Re: map reduce for Cassandra

2014-07-21 Thread Jonathan Haddad
Hey Marcelo, You should check out spark. It intelligently deals with a lot of the issues you're mentioning. Al Tobey did a walkthrough of how to set up the OSS side of things here: http://tobert.github.io/post/2014-07-15-installing-cassandra-spark-stack.html It'll be less work than writing a M/