I just came across this. I use tokens in range queries because it is an easy, straightforward way to divide the keyspace, operate on it using multiple threads, and throttle the processing. Maybe this is what Hadoop does; I don't know much about Hadoop.
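For illustration, here is a minimal sketch of that kind of split. It assumes RandomPartitioner, with tokens taken to be integers in the range 0 to 2**127 and written as decimal strings. The names `split_token_ring`, `process_range`, and `process_all` are made up for this example and are not part of Cassandra or Hector; the worker is a placeholder where a real client would issue a range query with the start and end tokens.

```python
from concurrent.futures import ThreadPoolExecutor

# Assumption: RandomPartitioner tokens are integers in [0, 2**127].
MAX_TOKEN = 2 ** 127


def split_token_ring(num_ranges):
    """Split [0, MAX_TOKEN] into num_ranges contiguous (start, end) pairs.

    Tokens are returned as decimal strings, the form assumed here for
    RandomPartitioner tokens.
    """
    if num_ranges < 1:
        raise ValueError("num_ranges must be at least 1")
    bounds = [MAX_TOKEN * i // num_ranges for i in range(num_ranges + 1)]
    return [(str(bounds[i]), str(bounds[i + 1])) for i in range(num_ranges)]


def process_range(token_range):
    """Hypothetical worker: a real one would issue a range query here."""
    start_token, end_token = token_range
    # Placeholder: return the width of the range instead of querying a cluster.
    return int(end_token) - int(start_token)


def process_all(num_ranges, max_workers):
    """Process every sub-range; max_workers caps concurrency (the throttle)."""
    ranges = split_token_ring(num_ranges)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(process_range, ranges))
```

Each range ends where the next one begins, so the pieces tile the ring without gaps, provided the range query treats the start token as exclusive and the end token as inclusive (an assumption worth checking against the API in use). Lowering `max_workers` is the throttle.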
So I don't really agree that I'm doing it wrong. Why is this?

On Wed, 2010-08-18 at 11:18 -0700, Ran Tavory wrote:
> On Wed, Aug 18, 2010 at 4:30 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
> > (a) if you're using token queries and you're not hadoop,
> > you're doing it wrong
>
> ah, didn't know that, so I guess I'll remove support for it from
> hector...
>
> > (b) they are expected to be of the form generated by
> > TokenFactory.toString and fromString. You should not be generating
> > them yourself.
> >
> > On Wed, Aug 18, 2010 at 7:56 AM, Ran Tavory <ran...@gmail.com> wrote:
> > > I'm a bit confused WRT KeyRange's tokens in 0.7.0
> > > When making a range query you can either use KeyRange.key or
> > > KeyRange.token.
> > > In 0.7.0 key was typed as byte[]. tokens remain strings.
> > > What does this string represent in case of a RP and in case of an
> > > OPP? Did this change in 0.7.0?
> > > AFAIK in 0.6.0 if the partitioner is OPP then the tokens are actual
> > > strings and they might just be actual subset of the keys. When
> > > using a RP tokens are BigIntegers (keys are still strings) and I'm
> > > not actually sure if you're allowed to shoot a range query using
> > > tokens...
> > > In 0.7.0 since keys are now bytes, when using an OPP, how do those
> > > bytes translate to strings? I'd assume it'd just be byte[] -> UTF8
> > > conversion, only that this may result in illegal UTF8 chars when
> > > keys are just random bytes, so I guess not... Perhaps md5 hashing?
> > > But then if using an OPP and keys are actual strings, I want to
> > > have the same 0.6.0 functionality in place, meaning tokens are
> > > strings like the keys. I actually tested this scenario and it looks
> > > working, so it seems like the String keys are translated to UTF8,
> > > but what happens when they are invalid UTF8?
> > > Another question is what's the story with RP in 0.7.0? Should range
> > > query even be supported with tokens? If so, then are the tokens
> > > expected to be string of integers? (e.g. "1234567890")
> > > Thanks.
> >
> > --
> > Jonathan Ellis
> > Project Chair, Apache Cassandra
> > co-founder of Riptano, the source for professional Cassandra support
> > http://riptano.com