I meant in the first sentence "running the get_range_slices from a single
point"
On Fri, Apr 30, 2010 at 4:08 PM, Utku Can Topçu wrote:
Do you mean running the get_range_slices from a single? Yes, it would be
reasonable for a relatively small key range, but when it comes to analyzing a
really big range in a really big data collection (i.e., like the one we
currently populate), having a way to distribute the reads among the
cluster seems necessary.
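One simple way to distribute the reads, assuming the timestamp-prefixed key scheme discussed below and an order-preserving partitioner, is to split the time period into contiguous sub-ranges and hand each one to a separate worker for its own range scan. A minimal sketch (the function name is ours, not anything in Cassandra):

```python
from datetime import datetime

def split_time_range(start, end, parts):
    """Split [start, end) into `parts` contiguous sub-ranges so each
    worker can run its own range scan over one slice of the period.
    Returns a list of (sub_start, sub_end) datetime pairs."""
    step = (end - start) / parts
    bounds = [start + step * i for i in range(parts)] + [end]
    return list(zip(bounds[:-1], bounds[1:]))
```

Each pair then becomes the start/end key of one get_range_slices scan, so the cluster serves the sub-ranges in parallel instead of one client walking the whole range.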
Sounds like doing this w/o m/r with get_range_slices is a reasonable way to go.
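Iterating a large range without MapReduce means paging: get_range_slices returns at most `count` rows, so the client re-issues the call with the last key seen as the new start and drops the duplicate first row of each subsequent page. A sketch of that loop, with `fetch` standing in for the actual Thrift call (a hypothetical stand-in, not the real client API):

```python
def iterate_range(fetch, start_key, end_key, page_size=100):
    """Yield (key, columns) for every row in [start_key, end_key].

    `fetch(start, end, count)` stands in for get_range_slices and
    must return up to `count` rows in key order, bounds inclusive.
    Each page restarts at the last key already seen, which comes
    back as the first row of the next page and is skipped.
    """
    current = start_key
    skip_first = False
    while True:
        rows = fetch(current, end_key, page_size)
        got = len(rows)          # size before dropping the duplicate
        if skip_first:
            rows = rows[1:]
        if not rows:
            return
        for key, columns in rows:
            yield key, columns
        if got < page_size:
            return               # short page: range exhausted
        current = rows[-1][0]
        skip_first = True
```

Note the `got < page_size` check uses the page size before the duplicate row is dropped; that is what signals the server had no more rows in the range.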
On Thu, Apr 29, 2010 at 6:04 PM, Utku Can Topçu wrote:
I'm currently writing collected data continuously to Cassandra, with keys
starting with a timestamp plus a unique identifier (like
2009.01.01.00.00.00.RANDOM) so that I can query by time range.
I'm thinking of running periodic MapReduce jobs that will each go through a
designated time period.
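The key scheme above also makes the range bounds for a period easy to compute, because the timestamp prefix sorts lexicographically. A sketch of both halves, assuming string keys and an order-preserving partitioner (the helper names are ours):

```python
import random
import string
from datetime import datetime

def make_key(ts):
    """Build a key like 2009.01.01.00.00.00.RANDOM: a timestamp
    prefix plus a random suffix to keep keys unique per second."""
    suffix = "".join(random.choice(string.ascii_uppercase) for _ in range(6))
    return ts.strftime("%Y.%m.%d.%H.%M.%S") + "." + suffix

def range_bounds(start, end):
    """Start/end keys bracketing every key whose timestamp falls in
    [start, end]. '~' sorts after the uppercase suffix characters,
    so the upper bound covers any random suffix for that second."""
    return (start.strftime("%Y.%m.%d.%H.%M.%S") + ".",
            end.strftime("%Y.%m.%d.%H.%M.%S") + ".~")
```

A periodic job would call `range_bounds` with its designated time period and feed the resulting pair to a range scan as its start and end keys.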
It's technically possible but 0.6 does not support this, no.
What is the use case you are thinking of?
On Thu, Apr 29, 2010 at 11:14 AM, Utku Can Topçu wrote:
Hi,
I've been trying to use Cassandra for some kind of a supplementary input
source for Hadoop MapReduce jobs.
The default usage of the ColumnFamilyInputFormat does a full column family
scan to produce the map input for the MapReduce framework.
However, I believe it should be possible to give a key range so that only
part of the column family is used as map input.
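Until the input format accepts a key range, one workaround is to let the job scan everything and discard rows outside the period inside the map function. A sketch of the filter predicate, assuming the timestamp-plus-suffix key scheme from this thread (the function name is ours):

```python
def in_period(key, start_prefix, end_prefix):
    """True if a key like '2009.01.01.00.00.00.RANDOM' falls inside
    the period; relies on the timestamp prefix sorting
    lexicographically. Prefixes look like '2009.01.01.00.00.00'."""
    ts = key.rsplit(".", 1)[0]  # strip the random suffix
    return start_prefix <= ts <= end_prefix
```

This wastes I/O on the full scan, which is exactly why a real key-range restriction on the input format would be preferable.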