Re: map reduce for Cassandra

Marcelo Elias Del Valle Tue, 22 Jul 2014 06:03:34 -0700

Hey, this is cool.
I didn't know about this project.
As far as I can see, it doesn't support Python, but it's quite interesting, as
it seems to store intermediate results in Cassandra:



   - *(optional) write the computation results out to Cassandra*: we
   provide a way to efficiently save the result of your computation to
   Cassandra. In order to do that you must have another configuration object
   where you specify the output keyspace/column family. We can create the
   output column family for you if needed. Please, refer to the comprehensive
   Stratio Deep documentation at Stratio website
   <http://www.openstratio.org/examples/using-stratio-deep/>.


Thanks for the link, Gaspar.

Best regards,
Marcelo.


2014-07-22 3:53 GMT-03:00 Gaspar Muñoz <gmu...@stratio.com>:

> Check Stratio Deep <https://github.com/Stratio/stratio-deep>. This
> integration between Spark and Cassandra is not based on Cassandra's
> Hadoop interface.
>
>
> 2014-07-22 3:53 GMT+02:00 Marcelo Elias Del Valle <marc...@s1mbi0se.com.br
> >:
>
> Hi,
>>
>>
>>> But if you are only relying on memtables to sort writes, that seems like
>>> a pretty heavyweight reason to use Cassandra?
>>
>>
>> Actually, it's not a reason to use Cassandra. I already use Cassandra and
>> I need to map reduce data from it. I am trying to see a reason to use the
>> conventional M/R tools or to build a tool "specific" to Cassandra.
>>
>> but Cassandra, as a datastore with immutable data files, is not typically
>>> a good choice for short lived intermediate result sets...
>>
>>
>> Indeed, but so far I am seeing it as the best option. If storing these
>> intermediate files in HDFS is better, then I agree there is no reason to
>> consider Cassandra to do it.
>>
>> are you planning to use DSE?
>>
>>
>> Our company will probably hire DSE support when it reaches some size, but
>> DSE as a product doesn't seem interesting to our case so far. The only tool
>> that would help me at this moment would be Hive, but honestly I didn't like
>> the way DSE supports Hive and I don't want to use a solution not available
>> to DSC (see
>> http://stackoverflow.com/questions/23959169/problems-using-hive-cassandra-community
>> for details).
>>
>> []s
>>
>>
>>
>> 2014-07-21 22:09 GMT-03:00 Robert Coli <rc...@eventbrite.com>:
>>
>> On Mon, Jul 21, 2014 at 5:45 PM, Marcelo Elias Del Valle <
>>> marc...@s1mbi0se.com.br> wrote:
>>>
>>>> Although several sstables (disk fragments) may have the same row key,
>>>> inside a single sstable row keys and column keys are indexed, right?
>>>> Otherwise, doing a GET in Cassandra would take some time.
>>>> From the M/R perspective, I was referring to the memtable, as I am
>>>> trying to compare the time to insert in Cassandra against the time of
>>>> sorting in hadoop.
>>>>
>>>
>>>  I was confused, because unless you are using new "in-memory"
>>> columnfamilies, which I believe are only available in DSE, there is no way
>>> to ensure that any given row stays in a memtable. Very rarely is there a
>>> view of the function of a memtable that only cares about its properties and
>>> not the closely related properties of SSTables. However, yours is one of
>>> them. I see now why your question makes sense: you only care about the
>>> memtable for how quickly it sorts.
>>>
>>> But if you are only relying on memtables to sort writes, that seems like
>>> a pretty heavyweight reason to use Cassandra?
>>>
>>> I'm certainly not an expert in this area of Cassandra... but Cassandra,
>>> as a datastore with immutable data files, is not typically a good choice
>>> for short lived intermediate result sets... are you planning to use DSE?
>>>
>>> =Rob
>>>
>>>
>>
>
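
For readers following the memtable-versus-Hadoop-sort comparison in the
quoted thread, here is a minimal Python sketch of the two approaches being
contrasted: keeping row keys sorted as writes arrive (roughly what a memtable
does) versus collecting everything and sorting in one batch (roughly what an
M/R shuffle does). This is a toy model, not Cassandra's or Hadoop's actual
implementation; the names Memtable, put, flush, and sort_then_group are
hypothetical and chosen only for illustration.

```python
import bisect


class Memtable:
    """Toy in-memory write buffer that keeps row keys sorted (illustrative only)."""

    def __init__(self):
        self._keys = []   # row keys, kept in sorted order
        self._rows = {}   # row key -> {column: value}

    def put(self, key, column, value):
        # Insert the key at its sorted position the first time it is seen.
        if key not in self._rows:
            bisect.insort(self._keys, key)
            self._rows[key] = {}
        # Later writes to the same column overwrite earlier ones.
        self._rows[key][column] = value

    def flush(self):
        """Return an immutable, key-sorted snapshot (a stand-in for an SSTable)."""
        return tuple((k, dict(self._rows[k])) for k in self._keys)


def sort_then_group(records):
    """Batch alternative: sort all (key, column, value) records, then group by key."""
    grouped = {}
    # sorted() is stable, so records with the same key keep their write order.
    for key, column, value in sorted(records, key=lambda r: r[0]):
        grouped.setdefault(key, {})[column] = value
    return tuple(grouped.items())


records = [("b", "c1", 1), ("a", "c1", 2), ("b", "c2", 3), ("a", "c1", 4)]

memtable = Memtable()
for key, column, value in records:
    memtable.put(key, column, value)

flushed = memtable.flush()
batched = sort_then_group(records)
```

Both paths produce the same key-ordered result in this toy model; the
difference discussed in the thread is where the sorting cost is paid (at
write time versus in a separate batch step), which this sketch does not
measure.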
