Check out Stratio Deep <https://github.com/Stratio/stratio-deep>. This integration between Spark and Cassandra is not based on Cassandra's Hadoop interface.
2014-07-22 3:53 GMT+02:00 Marcelo Elias Del Valle <marc...@s1mbi0se.com.br>:

> Hi,
>
>> But if you are only relying on memtables to sort writes, that seems like
>> a pretty heavyweight reason to use Cassandra?
>
> Actually, it's not a reason to use Cassandra. I already use Cassandra and
> I need to map reduce data from it. I am trying to see a reason to use the
> conventional M/R tools or to build a tool "specific" to Cassandra.
>
>> but Cassandra, as a datastore with immutable data files, is not typically
>> a good choice for short lived intermediate result sets...
>
> Indeed, but so far I am seeing it as the best option. If storing these
> intermediate files in HDFS is better, then I agree there is no reason to
> consider Cassandra to do it.
>
>> are you planning to use DSE?
>
> Our company will probably hire DSE support when it reaches some size, but
> DSE as a product doesn't seem interesting to our case so far. The only tool
> that would help me at this moment would be Hive, but honestly I didn't like
> the way DSE supports Hive and I don't want to use a solution not available
> to DSC (see
> http://stackoverflow.com/questions/23959169/problems-using-hive-cassandra-community
> for details).
>
> []s
>
> 2014-07-21 22:09 GMT-03:00 Robert Coli <rc...@eventbrite.com>:
>
>> On Mon, Jul 21, 2014 at 5:45 PM, Marcelo Elias Del Valle <
>> marc...@s1mbi0se.com.br> wrote:
>>
>>> Although several sstables (disk fragments) may have the same row key,
>>> inside a single sstable row keys and column keys are indexed, right?
>>> Otherwise, doing a GET in Cassandra would take some time.
>>> From the M/R perspective, I was referring to the memtable, as I am
>>> trying to compare the time to insert in Cassandra against the time of
>>> sorting in Hadoop.
>>
>> I was confused, because unless you are using new "in-memory"
>> columnfamilies, which I believe are only available in DSE, there is no way
>> to ensure that any given row stays in a memtable. Very rarely is there a
>> view of the function of a memtable that only cares about its properties and
>> not the closely related properties of SSTables. However, yours is one of
>> them. I see now why your question makes sense: you only care about the
>> memtable for how quickly it sorts.
>>
>> But if you are only relying on memtables to sort writes, that seems like
>> a pretty heavyweight reason to use Cassandra?
>>
>> I'm certainly not an expert in this area of Cassandra... but Cassandra,
>> as a datastore with immutable data files, is not typically a good choice
>> for short lived intermediate result sets... are you planning to use DSE?
>>
>> =Rob