Re: ColumnFamilyInputFormat KeyRange scans on a CF

2010-04-30 Thread Utku Can Topçu
I meant in the first sentence "running the get_range_slices from a single point"

On Fri, Apr 30, 2010 at 4:08 PM, Utku Can Topçu wrote:
> Do you mean, running the get_range_slices from a single? Yes, it would be
> reasonable for a relatively small key range, when it comes to analyze a
> really b

Re: ColumnFamilyInputFormat KeyRange scans on a CF

2010-04-30 Thread Utku Can Topçu
Do you mean running get_range_slices from a single? Yes, that would be reasonable for a relatively small key range. When it comes to analyzing a really big range in a really big data collection (like the one we currently populate), having a way to distribute the reads among the cluster seems
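To illustrate the idea of distributing reads over a large time range, here is a minimal sketch that cuts one range into non-overlapping sub-ranges, each of which could be scanned by a separate worker. The function name split_time_range and the step parameter are hypothetical, not part of Cassandra or Hadoop, and whether the resulting scans actually land on different nodes depends on the partitioner and on how keys are placed in the cluster.

```python
from datetime import datetime, timedelta


def split_time_range(start, end, step):
    """Yield (sub_start, sub_end) pairs that tile [start, end) without overlap.

    Hypothetical helper: each pair is meant to be turned into one key-range
    scan and handed to its own worker.
    """
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    cursor = start
    while cursor < end:
        # The last piece is clipped so that it never runs past `end`.
        nxt = min(cursor + step, end)
        yield cursor, nxt
        cursor = nxt


if __name__ == "__main__":
    day_start = datetime(2009, 1, 1)
    day_end = datetime(2009, 1, 2)
    for sub_start, sub_end in split_time_range(day_start, day_end, timedelta(hours=6)):
        print(sub_start, "->", sub_end)
```

Choosing the step is a trade-off: smaller pieces spread the work more evenly but add per-scan overhead.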

Re: ColumnFamilyInputFormat KeyRange scans on a CF

2010-04-30 Thread Jonathan Ellis
Sounds like doing this w/o m/r with get_range_slices is a reasonable way to go.

On Thu, Apr 29, 2010 at 6:04 PM, Utku Can Topçu wrote:
> I'm currently writing collected data continuously to Cassandra, having keys
> starting with a timestamp and a unique identifier (like
> 2009.01.01.00.00.00.RAND
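A rough sketch of how a client might walk a key range page by page without MapReduce. It does not call Cassandra: fetch_page is a hypothetical callable standing in for whatever client call performs the range query, and the sketch assumes that the call returns rows in key order, that keys are unique, and that the start key is inclusive, so each later page begins with a row that was already seen.

```python
def scan_range(fetch_page, start_key, end_key, page_size=100):
    """Yield (key, row) pairs between start_key and end_key, page by page.

    fetch_page(start, end, count) is a hypothetical callable that returns up
    to `count` (key, row) pairs in key order, with `start` inclusive.
    """
    if page_size < 2:
        # With an inclusive start key, a page of one row could never advance.
        raise ValueError("page_size must be at least 2")
    start = start_key
    first = True
    while True:
        page = fetch_page(start, end_key, page_size)
        fetched = len(page)
        # Every page after the first starts with the last key already yielded.
        if not first and page and page[0][0] == start:
            page = page[1:]
        for key, row in page:
            yield key, row
        if fetched < page_size:
            return  # a short page means the range is exhausted
        start = page[-1][0]
        first = False


if __name__ == "__main__":
    # In-memory stand-in for a real cluster, for demonstration only.
    data = [("k%03d" % i, {"value": i}) for i in range(25)]

    def fake_fetch(start, end, count):
        return [(k, v) for k, v in data if start <= k <= end][:count]

    for key, row in scan_range(fake_fetch, "k005", "k020", page_size=4):
        print(key, row)
```

Restarting each page from the last key seen keeps the client stateless between calls, at the cost of re-reading one row per page.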

Re: ColumnFamilyInputFormat KeyRange scans on a CF

2010-04-29 Thread Utku Can Topçu
I'm currently writing collected data continuously to Cassandra, with keys that start with a timestamp followed by a unique identifier (like 2009.01.01.00.00.00.RANDOM), so that I can query by time range. I'm thinking of running periodic MapReduce jobs which will go through a designated time period.
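A small sketch of the key scheme described above, showing why zero-padded timestamp prefixes let a time window be expressed as a pair of key bounds. The helper names make_key and key_range are hypothetical, and the random suffix is just one way to get a unique identifier. Note that a key-range scan only follows this ordering if the cluster stores keys in sorted order (in 0.6, an order-preserving partitioner); that is an assumption about the setup, not something stated in the message.

```python
from datetime import datetime
import uuid

# Fixed-width, zero-padded fields, so string order equals time order.
KEY_TIME_FORMAT = "%Y.%m.%d.%H.%M.%S"  # e.g. 2009.01.01.00.00.00


def make_key(ts, suffix=None):
    """Build a row key: timestamp prefix, a dot, then a unique suffix."""
    if suffix is None:
        suffix = uuid.uuid4().hex  # illustrative choice of unique identifier
    return ts.strftime(KEY_TIME_FORMAT) + "." + suffix


def key_range(start, end):
    """Return (start_key, end_key) covering all keys with start <= ts < end.

    The bare timestamp prefix sorts before every key that extends it, so the
    end bound excludes keys stamped exactly at `end`.
    """
    return start.strftime(KEY_TIME_FORMAT), end.strftime(KEY_TIME_FORMAT)


if __name__ == "__main__":
    lo, hi = key_range(datetime(2009, 1, 1), datetime(2009, 1, 2))
    key = make_key(datetime(2009, 1, 1, 12, 30, 0))
    print(lo, hi, key, lo <= key <= hi)
```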

Re: ColumnFamilyInputFormat KeyRange scans on a CF

2010-04-29 Thread Jonathan Ellis
It's technically possible but 0.6 does not support this, no. What is the use case you are thinking of?

On Thu, Apr 29, 2010 at 11:14 AM, Utku Can Topçu wrote:
> Hi,
>
> I've been trying to use Cassandra for some kind of a supplementary input
> source for Hadoop MapReduce jobs.
>
> The default us

ColumnFamilyInputFormat KeyRange scans on a CF

2010-04-29 Thread Utku Can Topçu
Hi, I've been trying to use Cassandra as a supplementary input source for Hadoop MapReduce jobs. By default, ColumnFamilyInputFormat does a full column family scan and feeds the result to the MapReduce framework as map input. However, I believe that it should be possible to gi