Thanks! Very helpful.
On Mon, Dec 3, 2012 at 4:04 PM, aaron morton <aa...@thelastpickle.com> wrote:

> For background, you may find the wide row setting useful:
> http://www.datastax.com/docs/1.1/cluster_architecture/hadoop_integration
>
> AFAIK all the input row readers for Hadoop do range scans. And I think
> the support for setting the start and end token is used so that jobs
> only select data which is local to the node. It's not really possible
> to select individual rows by token.
>
> If you had a secondary index on the row, you could use the
> setInputRange overload that takes an index expression.
>
> Or it may be easier to use Hive.
>
> Hope that helps.
>
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 1/12/2012, at 3:04 PM, Jamie Rothfeder <jamie.rothfe...@gmail.com> wrote:
>
> > Hey All,
> >
> > I have a bunch of time-series data stored in a cluster using a
> > ByteOrderedPartitioner. My keys are time buckets representing events
> > that occurred in an hour. I've been trying to write a MapReduce job
> > that considers only events within a certain time range by specifying
> > an input range, but this doesn't seem to be working.
> >
> > I expect the following code to scan data for a single key (1353456000),
> > but it is scanning all keys:
> >
> >     int key = 1353456000;
> >     IPartitioner part = ConfigHelper.getInputPartitioner(job.getConfiguration());
> >     Token token = part.getToken(ByteBufferUtil.bytes(key));
> >     ConfigHelper.setInputRange(job.getConfiguration(),
> >             part.getTokenFactory().toString(token),
> >             part.getTokenFactory().toString(token));
> >
> > Any idea what I'm doing wrong?
> >
> > Thanks,
> > Jamie
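As a footnote to Aaron's suggestion: a minimal sketch of the setInputRange
overload that takes index expressions, assuming Cassandra 1.1's ConfigHelper
and a secondary index on a column here called "hour_bucket" (the column
name, and the index itself, are assumptions for illustration, not something
from the thread):

    import java.util.Arrays;
    import java.util.List;

    import org.apache.cassandra.hadoop.ConfigHelper;
    import org.apache.cassandra.thrift.IndexExpression;
    import org.apache.cassandra.thrift.IndexOperator;
    import org.apache.cassandra.utils.ByteBufferUtil;

    // ... inside job setup, with the same `job` as in Jamie's snippet:
    List<IndexExpression> filter = Arrays.asList(
            new IndexExpression(
                    ByteBufferUtil.bytes("hour_bucket"), // indexed column (assumed)
                    IndexOperator.EQ,
                    ByteBufferUtil.bytes(1353456000)));  // hour bucket to select
    ConfigHelper.setInputRange(job.getConfiguration(), filter);

AFAIK the record reader still does a per-node range scan, but passes these
expressions along as a row filter, so only rows whose indexed column matches
are handed to the mappers.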