Re: KeyRange in the ColumnFamilyInputFormat

Mick Semb Wever Mon, 05 Sep 2011 10:05:18 -0700

On Mon, 2011-09-05 at 18:18 +0300, Vitaly Vengrov wrote:
> See these lines in the ColumnFamilyInputFormat.getSplits method:
> 
> assert jobKeyRange.start_key == null : "only start_token supported";
> assert jobKeyRange.end_key == null : "only end_token supported";
> 
> So the question is: why aren't start_key and end_key supported?
> 
> What I actually need is the ability to specify an exact row key (UUID),
> not a key range. I believe I can do this with the same start and end
> keys, but not with tokens.


The background to this is CASSANDRA-1125 and specifically this comment
https://issues.apache.org/jira/browse/CASSANDRA-1125?focusedCommentId=13058858&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13058858

Tokens are used here to be consistent with the thrift API.

What you want is:


        ConfigHelper.setInputRange(
                jobConf,
                partitioner.getTokenFactory().toString(partitioner.getToken(myKey)),
                partitioner.getTokenFactory().toString(partitioner.getToken(myKey)));


In fact this would not be possible if you were using range.start_key and
range.end_key since that would exclude the one row you are trying to
include.
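
As a rough illustration of what the token string passed to
setInputRange looks like, here is a self-contained sketch. It assumes
the cluster uses RandomPartitioner, whose tokens are, as far as I know,
the absolute value of the key's MD5 digest read as a BigInteger. The
class TokenSketch and its helpers md5TokenString and uuidBytes are
hypothetical names made up for this example; in a real job you would
call partitioner.getToken(myKey) as shown above rather than
reimplementing the hash.

```java
import java.math.BigInteger;
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.UUID;

// Hypothetical helper, not part of Cassandra. It only mimics what
// RandomPartitioner is assumed to do when turning a row key into a token.
class TokenSketch {

    // Assumption: token = abs(MD5(key) interpreted as a signed BigInteger),
    // rendered as a decimal string.
    static String md5TokenString(byte[] key) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            return new BigInteger(md5.digest(key)).abs().toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required to be present on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // Serialise a UUID row key as 16 big-endian bytes.
    static byte[] uuidBytes(UUID uuid) {
        return ByteBuffer.allocate(16)
                .putLong(uuid.getMostSignificantBits())
                .putLong(uuid.getLeastSignificantBits())
                .array();
    }

    public static void main(String[] args) {
        // Example row key, made up for this sketch.
        UUID rowKey = UUID.fromString("123e4567-e89b-12d3-a456-426614174000");
        String token = md5TokenString(uuidBytes(rowKey));

        // The same string would be used for both the start and end token.
        System.out.println("start_token = end_token = " + token);
    }
}
```

With a different partitioner the token type and its string form will
differ, which is why going through partitioner.getTokenFactory() is the
safer route.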

Out of curiosity, why are you using Hadoop to process one row?
Won't this be processed solely by one split, and therefore only one task?

~mck

-- 
"The only thing I know, is that I know nothing." Socrates 

| http://semb.wever.org | http://sesat.no |
| http://tech.finn.no   | Java XSS Filter |
