Hi,

> But if you are only relying on memtables to sort writes, that seems like a
> pretty heavyweight reason to use Cassandra?


Actually, it's not a reason to use Cassandra. I already use Cassandra and I
need to map reduce data from it. I am trying to decide whether to use the
conventional M/R tools or to build a tool "specific" to Cassandra.

> but Cassandra, as a datastore with immutable data files, is not typically a
> good choice for short lived intermediate result sets...


Indeed, but so far I am seeing it as the best option. If storing these
intermediate files in HDFS is better, then I agree there is no reason to
consider Cassandra to do it.

> are you planning to use DSE?


Our company will probably hire DSE support when it reaches some size, but
DSE as a product doesn't seem interesting to our case so far. The only tool
that would help me at this moment would be Hive, but honestly I didn't like
the way DSE supports Hive and I don't want to use a solution not available
to DSC (see
http://stackoverflow.com/questions/23959169/problems-using-hive-cassandra-community
for details).

[]s



2014-07-21 22:09 GMT-03:00 Robert Coli <rc...@eventbrite.com>:

> On Mon, Jul 21, 2014 at 5:45 PM, Marcelo Elias Del Valle <
> marc...@s1mbi0se.com.br> wrote:
>
>> Although several sstables (disk fragments) may have the same row key,
>> inside a single sstable row keys and column keys are indexed, right?
>> Otherwise, doing a GET in Cassandra would take some time.
>> From the M/R perspective, I was referring to the memtable, as I am
>> trying to compare the time to insert in Cassandra against the time of
>> sorting in hadoop.
>>
>
> I was confused, because unless you are using new "in-memory"
> columnfamilies, which I believe are only available in DSE, there is no way
> to ensure that any given row stays in a memtable. Very rarely is there a
> view of the function of a memtable that only cares about its properties and
> not the closely related properties of SSTables. However yours is one of
> them, I see now why your question makes sense, you only care about the
> memtable for how quickly it sorts.
>
> But if you are only relying on memtables to sort writes, that seems like a
> pretty heavyweight reason to use Cassandra?
>
> I'm certainly not an expert in this area of Cassandra... but Cassandra, as
> a datastore with immutable data files, is not typically a good choice for
> short lived intermediate result sets... are you planning to use DSE?
>
> =Rob
>
>
