Thanks! Very helpful.
On Mon, Dec 3, 2012 at 4:04 PM, aaron morton <aa...@thelastpickle.com> wrote:

> For background, you may find the wide row setting useful:
> http://www.datastax.com/docs/1.1/cluster_architecture/hadoop_integration
>
> AFAIK all the input row readers for Hadoop do range scans. And I think
> the support for setting the start and end token is used so that jobs
> only select data which is local to the node. It's not really possible
> to select individual rows by token.
>
> If you had a secondary index on the row, you could use the
> setInputRange overload that takes an index expression.
>
> Or it may be easier to use Hive.
>
> Hope that helps.
>
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 1/12/2012, at 3:04 PM, Jamie Rothfeder <jamie.rothfe...@gmail.com> wrote:
>
> > Hey All,
> >
> > I have a bunch of time-series data stored in a cluster using a
> > ByteOrderedPartitioner. My keys are time buckets representing events
> > that occurred in an hour. I've been trying to write a MapReduce job
> > that considers only events within a certain time range by specifying
> > an input range, but this doesn't seem to be working.
> >
> > I expect the following code to scan data for a single key (1353456000),
> > but it is scanning all keys:
> >
> >     int key = 1353456000;
> >     IPartitioner part = ConfigHelper.getInputPartitioner(job.getConfiguration());
> >     Token token = part.getToken(ByteBufferUtil.bytes(key));
> >     ConfigHelper.setInputRange(job.getConfiguration(),
> >             part.getTokenFactory().toString(token),
> >             part.getTokenFactory().toString(token));
> >
> > Any idea what I'm doing wrong?
> >
> > Thanks,
> > Jamie
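As a footnote to Aaron's suggestion: a minimal sketch of the setInputRange
overload that takes index expressions, assuming Cassandra 1.1's ConfigHelper
and a secondary index on a column here called "hour_bucket" (the column
name, and the index itself, are assumptions for illustration, not something
from the thread):

    import java.util.Arrays;
    import java.util.List;

    import org.apache.cassandra.hadoop.ConfigHelper;
    import org.apache.cassandra.thrift.IndexExpression;
    import org.apache.cassandra.thrift.IndexOperator;
    import org.apache.cassandra.utils.ByteBufferUtil;

    // ... inside job setup, with the same `job` as in Jamie's snippet:
    List<IndexExpression> filter = Arrays.asList(
            new IndexExpression(
                    ByteBufferUtil.bytes("hour_bucket"), // indexed column (assumed)
                    IndexOperator.EQ,
                    ByteBufferUtil.bytes(1353456000)));  // hour bucket to select
    ConfigHelper.setInputRange(job.getConfiguration(), filter);

AFAIK the record reader still does a per-node range scan, but passes these
expressions along as a row filter, so only rows whose indexed column matches
are handed to the mappers.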