Hey, this is cool.
I didn't know about this project.
As far as I can see, it doesn't support Python, but it's quite interesting,
as it seems to store intermediate results in Cassandra:

   - *(optional) write the computation results out to Cassandra*: we
   provide a way to efficiently save the result of your computation to
   Cassandra. In order to do that you must have another configuration object
   where you specify the output keyspace/column family. We can create the
   output column family for you if needed. Please refer to the comprehensive
   Stratio Deep documentation on the Stratio website.

Thanks for the link, Gaspar.

Best regards,

2014-07-22 3:53 GMT-03:00 Gaspar Muñoz <gmu...@stratio.com>:

> Check out Stratio Deep <https://github.com/Stratio/stratio-deep>. This
> integration between Spark and Cassandra is not based on Cassandra's
> Hadoop interface.
> 2014-07-22 3:53 GMT+02:00 Marcelo Elias Del Valle <marc...@s1mbi0se.com.br>:
>> Hi,
>>> But if you are only relying on memtables to sort writes, that seems like
>>> a pretty heavyweight reason to use Cassandra?
>> Actually, it's not a reason to use Cassandra. I already use Cassandra and
>> I need to run MapReduce jobs over data stored in it. I am trying to decide
>> whether to use the conventional M/R tools or to build a tool specific to
>> Cassandra.
>>> but Cassandra, as a datastore with immutable data files, is not typically
>>> a good choice for short-lived intermediate result sets...
>> Indeed, but so far it looks like the best option to me. If storing these
>> intermediate files in HDFS is better, then I agree there is no reason to
>> consider Cassandra for it.
>>> are you planning to use DSE?
>> Our company will probably buy DSE support once it reaches a certain size,
>> but DSE as a product doesn't seem useful for our case so far. The only tool
>> that would help me at this moment would be Hive, but honestly I didn't like
>> the way DSE supports Hive, and I don't want to use a solution that isn't
>> available in DSC (see
>> http://stackoverflow.com/questions/23959169/problems-using-hive-cassandra-community
>> for details).
>> Regards,
>> 2014-07-21 22:09 GMT-03:00 Robert Coli <rc...@eventbrite.com>:
>>> On Mon, Jul 21, 2014 at 5:45 PM, Marcelo Elias Del Valle <marc...@s1mbi0se.com.br> wrote:
>>>> Although several SSTables (disk fragments) may contain the same row key,
>>>> inside a single SSTable row keys and column keys are indexed, right?
>>>> Otherwise, doing a GET in Cassandra would be slow.
>>>> From the M/R perspective, I was referring to the memtable, as I am
>>>> trying to compare the time to insert into Cassandra against the time
>>>> spent sorting in Hadoop.
>>> I was confused, because unless you are using the new "in-memory"
>>> column families, which I believe are only available in DSE, there is no way
>>> to ensure that any given row stays in a memtable. It is very rare to see a
>>> view of the memtable's function that cares only about its own properties
>>> and not the closely related properties of SSTables. However, yours is one
>>> of them, and I see now why your question makes sense: you only care about
>>> the memtable for how quickly it sorts.
>>> But if you are only relying on memtables to sort writes, that seems like
>>> a pretty heavyweight reason to use Cassandra?
>>> I'm certainly not an expert in this area of Cassandra... but Cassandra,
>>> as a datastore with immutable data files, is not typically a good choice
>>> for short-lived intermediate result sets... are you planning to use DSE?
>>> =Rob
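
To make the memtable point in this thread concrete, here is a toy Python sketch of the idea being discussed: writes land in an in-memory structure kept sorted by key, and a flush turns it into an immutable sorted run that can be looked up by key with a binary search instead of a full scan. The class names ToyMemtable and ToySSTable are hypothetical illustrations, not Cassandra classes; Cassandra's real memtables and SSTables are considerably more involved (multiple SSTables per row key, compaction, on-disk indexes).

```python
from bisect import bisect_left, insort


class ToySSTable:
    """Hypothetical immutable, key-sorted run produced by a flush."""

    def __init__(self, keys, values):
        self.keys = tuple(keys)      # sorted, immutable
        self.values = tuple(values)  # values aligned with keys

    def get(self, key):
        # Binary search over the sorted keys: an "indexed" lookup, no full scan.
        i = bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return self.values[i]
        return None

    def items(self):
        # Iterating a flushed run yields rows already in key order.
        return list(zip(self.keys, self.values))


class ToyMemtable:
    """Hypothetical in-memory write buffer that keeps keys sorted on insert."""

    def __init__(self):
        self._keys = []
        self._data = {}

    def put(self, key, value):
        if key not in self._data:
            insort(self._keys, key)  # keep the key list sorted as writes arrive
        self._data[key] = value      # last write for a key wins

    def flush(self):
        # Freeze the current contents into an immutable sorted run and reset.
        table = ToySSTable(self._keys, (self._data[k] for k in self._keys))
        self._keys, self._data = [], {}
        return table


if __name__ == "__main__":
    mt = ToyMemtable()
    for k, v in [("c", 3), ("a", 1), ("b", 2), ("a", 10)]:
        mt.put(k, v)
    sst = mt.flush()
    print(sst.items())   # [('a', 10), ('b', 2), ('c', 3)]
    print(sst.get("b"))  # 2
```

Reading a flushed run back in key order is the "sort for free" being compared with Hadoop's sort phase; whether that actually beats Hadoop in practice depends on things this toy ignores, such as disk I/O, compaction and replication.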
