Hey, this is cool. I didn't know about this project. As far as I can see, it doesn't support Python, but it's quite interesting, as it seems to store intermediate results in Cassandra:
- *(optional) write the computation results out to Cassandra*: we provide a way to efficiently save the result of your computation to Cassandra. In order to do that you must have another configuration object where you specify the output keyspace/column family. We can create the output column family for you if needed.

Please, refer to the comprehensive Stratio Deep documentation at Stratio website <http://www.openstratio.org/examples/using-stratio-deep/>.

Thanks for the link, Gaspar.

Best regards,
Marcelo.

2014-07-22 3:53 GMT-03:00 Gaspar Muñoz <gmu...@stratio.com>:

> Check Stratio Deep <https://github.com/Stratio/stratio-deep>. This
> integration between Spark and Cassandra is not based on Cassandra's
> Hadoop interface.
>
> 2014-07-22 3:53 GMT+02:00 Marcelo Elias Del Valle <marc...@s1mbi0se.com.br>:
>
>> Hi,
>>
>>> But if you are only relying on memtables to sort writes, that seems like
>>> a pretty heavyweight reason to use Cassandra?
>>
>> Actually, it's not a reason to use Cassandra. I already use Cassandra and
>> I need to map reduce data from it. I am trying to see a reason to use the
>> conventional M/R tools or to build a tool "specific" to Cassandra.
>>
>>> but Cassandra, as a datastore with immutable data files, is not typically
>>> a good choice for short lived intermediate result sets...
>>
>> Indeed, but so far I am seeing it as the best option. If storing these
>> intermediate files in HDFS is better, then I agree there is no reason to
>> consider Cassandra to do it.
>>
>>> are you planning to use DSE?
>>
>> Our company will probably hire DSE support when it reaches some size, but
>> DSE as a product doesn't seem interesting to our case so far. The only tool
>> that would help me at this moment would be Hive, but honestly I didn't like
>> the way DSE supports Hive and I don't want to use a solution not available
>> to DSC (see
>> http://stackoverflow.com/questions/23959169/problems-using-hive-cassandra-community
>> for details).
>>
>> []s
>>
>> 2014-07-21 22:09 GMT-03:00 Robert Coli <rc...@eventbrite.com>:
>>
>>> On Mon, Jul 21, 2014 at 5:45 PM, Marcelo Elias Del Valle <
>>> marc...@s1mbi0se.com.br> wrote:
>>>
>>>> Although several sstables (disk fragments) may have the same row key,
>>>> inside a single sstable row keys and column keys are indexed, right?
>>>> Otherwise, doing a GET in Cassandra would take some time.
>>>> From the M/R perspective, I was referring to the memtable, as I am
>>>> trying to compare the time to insert in Cassandra against the time of
>>>> sorting in Hadoop.
>>>
>>> I was confused, because unless you are using the new "in-memory"
>>> columnfamilies, which I believe are only available in DSE, there is no way
>>> to ensure that any given row stays in a memtable. Very rarely is there a
>>> view of the function of a memtable that only cares about its properties and
>>> not the closely related properties of SSTables. However, yours is one of
>>> them. I see now why your question makes sense: you only care about the
>>> memtable for how quickly it sorts.
>>>
>>> But if you are only relying on memtables to sort writes, that seems like
>>> a pretty heavyweight reason to use Cassandra?
>>>
>>> I'm certainly not an expert in this area of Cassandra... but Cassandra,
>>> as a datastore with immutable data files, is not typically a good choice
>>> for short lived intermediate result sets... are you planning to use DSE?
>>>
>>> =Rob
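
As a rough illustration of the memtable and SSTable behaviour discussed in this thread, here is a minimal Python sketch. It is a toy model under stated assumptions, not Cassandra's implementation: `ToyStore`, `flush_threshold`, and all method names are hypothetical and exist only for this example. It shows writes being kept sorted in memory, flushed to immutable sorted files, and reads checking the newest data first because several flushed files may hold the same row key.

```python
import bisect


class ToyStore:
    """Toy model of a memtable plus immutable SSTables (hypothetical, not Cassandra code)."""

    def __init__(self, flush_threshold=3):
        self.flush_threshold = flush_threshold  # assumed flush trigger: number of distinct keys
        self.mem_keys = []     # row keys held in sorted order (the "memtable" index)
        self.mem_values = {}   # row key -> latest value written
        self.sstables = []     # oldest first; each is an immutable tuple of (key, value)

    def put(self, key, value):
        # Insert the key at its sorted position; an overwrite only updates the value.
        if key not in self.mem_values:
            bisect.insort(self.mem_keys, key)
        self.mem_values[key] = value
        if len(self.mem_keys) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Write the memtable out as an immutable, already-sorted "SSTable".
        if not self.mem_keys:
            return
        sstable = tuple((k, self.mem_values[k]) for k in self.mem_keys)
        self.sstables.append(sstable)
        self.mem_keys, self.mem_values = [], {}

    def get(self, key):
        # Newest data wins: check the memtable, then SSTables from newest to oldest.
        if key in self.mem_values:
            return self.mem_values[key]
        for sstable in reversed(self.sstables):
            keys = [k for k, _ in sstable]
            i = bisect.bisect_left(keys, key)  # binary search within one sorted file
            if i < len(sstable) and sstable[i][0] == key:
                return sstable[i][1]
        return None
```

In this sketch nothing keeps a given row in memory once the threshold is hit, which mirrors Rob's point that you cannot rely on a row staying in a memtable; the sorted order survives only in the flushed files.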