Hey,

[junit] key24
[junit] Query w/ Range(key24,,10) result size: 10
[junit] key24
I think this is actually the expected result. start_key is inclusive, so whenever you page through get_range_slices with start_key/end_key, you must increment the last key you received and then use that as the next start_key.

I also tried using tokens because of exactly that behaviour and because the docs talk about inclusive/exclusive bounds. Tokens are what the partitioner uses to decide which nodes your data goes to. In the case of the RandomPartitioner, the token is the MD5 hash of your actual key as a 128-bit BigInteger (just run nodetool ring to see some tokens ;). get_range_slices with start_token/end_token is best used together with describe_ring/describe_splits, so you can talk to the nodes directly. The Hadoop/Pig integration, for example, uses tokens.

HTH,
/thomas

On Sat, Aug 7, 2010 at 12:06 AM, Adam Crain <adam.cr...@greenenergycorp.com> wrote:
> I ran against the 0.6 branch and I still see similarly odd results. My test
> cases prove that the set of keys has been successfully inserted, but usually
> I never see the first key again, or I reach the first key before having seen
> all of the keys.
>
> -Adam
>
>
> -----Original Message-----
> From: Jeremy Hanna [mailto:jeremy.hanna1...@gmail.com]
> Sent: Fri 8/6/2010 4:25 PM
> To: user@cassandra.apache.org
> Subject: Re: error using get_range_slice with random partitioner
>
> If you're willing to try it out, the easiest way to check whether it is
> resolved by the patch for CASSANDRA-1145 is to check out the 0.6 branch:
>
> svn checkout http://svn.apache.org/repos/asf/cassandra/branches/cassandra-0.6/ cassandra-0.6
>
> Then run `ant` to build the binaries.
>
> On Aug 6, 2010, at 2:57 PM, Adam Crain wrote:
>
>> Hi Jeremy,
>>
>> So, I fixed my client so it preserves the ordering, and I get results that
>> may be related to the bug.
>>
>> If I insert 30 keys into the random partitioner with names [key1, key2, ...
>> key30] and then start the iteration (with a batch size of 10), I get the
>> following debug output during the iteration:
>>
>> [junit] Query w/ Range(,,10) result size: 10
>> [junit] key18
>> [junit] key23
>> [junit] key26
>> [junit] key27
>> [junit] key12
>> [junit] key28
>> [junit] key4
>> [junit] key3
>> [junit] key1
>> [junit] key24
>> [junit] Query w/ Range(key24,,10) result size: 10
>> [junit] key24
>> [junit] key5
>> [junit] key17
>> [junit] key29
>> [junit] key19
>> [junit] key8
>> [junit] key15
>> [junit] key22
>> [junit] key6
>> [junit] key25
>> [junit] Query w/ Range(key25,,10) result size: 3
>> [junit] key25
>> [junit] key14
>> [junit] key2
>> [junit] Query w/ Range(key2,,10), result size: 1
>> [junit] key2
>>
>> I never make it back around to key18 as expected, and I never see all of
>> the keys.
>>
>> -Adam
>>
>> -----Original Message-----
>> From: Jeremy Hanna [mailto:jeremy.hanna1...@gmail.com]
>> Sent: Fri 8/6/2010 11:45 AM
>> To: user@cassandra.apache.org
>> Subject: Re: error using get_range_slice with random partitioner
>>
>> Sounds like what you're seeing is in the client, but there was another
>> duplicate bug with get_range_slice that was recently fixed on the
>> cassandra-0.6 branch. It's slated for 0.6.5, which will probably be out
>> sometime this month, based on previous minor releases.
>>
>> https://issues.apache.org/jira/browse/CASSANDRA-1145
>>
>> On Aug 6, 2010, at 10:29 AM, Adam Crain wrote:
>>
>>> Thanks Dave. I'm using 0.6.4 since I saw this issue in the JIRA, but I just
>>> discovered that the client I'm using mutates the order of keys after
>>> retrieving the result with the thrift API... pretty much making key
>>> iteration impossible. So time to fork and see if they'll fix it :(.
>>>
>>> I'll review yours as soon as I get the client fixed that I'm using.
>>>
>>> Adam
>>>
>>>
>>> -----Original Message-----
>>> From: davevi...@gmail.com on behalf of Dave Viner
>>> Sent: Fri 8/6/2010 11:28 AM
>>> To: user@cassandra.apache.org
>>> Subject: Re: error using get_range_slice with random partitioner
>>>
>>> Funny you should ask... I just went through the same exercise.
>>>
>>> You must use Cassandra 0.6.4. Otherwise you will get duplicate keys.
>>> However, here is a snippet of perl that you can use.
>>>
>>> use Data::Dumper;
>>> # Thrift runtime and generated Cassandra bindings (module names assume the
>>> # Perl code was generated from cassandra.thrift with "namespace perl Cassandra").
>>> use Thrift::Socket;
>>> use Thrift::BufferedTransport;
>>> use Thrift::BinaryProtocol;
>>> use Cassandra::Cassandra;
>>> use Cassandra::Types;
>>>
>>> # Assumed connection settings; 9160 is the default Thrift port in 0.6.
>>> our $CASSANDRA_HOST = 'localhost';
>>> our $CASSANDRA_PORT = 9160;
>>>
>>> our $WANTED_COLUMN_NAME = 'mycol';
>>> my %map;
>>> get_key_to_one_column_map('myKeySpace', 'myColFamily', 'mySuperCol',
>>>     Cassandra::ConsistencyLevel::QUORUM, \%map);
>>>
>>> sub get_key_to_one_column_map
>>> {
>>>     my ($keyspace, $column_family_name, $super_column_name,
>>>         $consistency_level, $returned_keys) = @_;
>>>
>>>     my ($socket, $transport, $protocol, $client, $result, $predicate,
>>>         $column_parent, $keyrange);
>>>
>>>     $column_parent = new Cassandra::ColumnParent();
>>>     $column_parent->{'column_family'} = $column_family_name;
>>>     $column_parent->{'super_column'} = $super_column_name;
>>>
>>>     $keyrange = new Cassandra::KeyRange({
>>>         'start_key' => '', 'end_key' => '', 'count' => 10
>>>     });
>>>
>>>     $predicate = new Cassandra::SlicePredicate();
>>>     $predicate->{'column_names'} = [$WANTED_COLUMN_NAME];
>>>
>>>     eval
>>>     {
>>>         $socket = new Thrift::Socket($CASSANDRA_HOST, $CASSANDRA_PORT);
>>>         $transport = new Thrift::BufferedTransport($socket, 1024, 1024);
>>>         $protocol = new Thrift::BinaryProtocol($transport);
>>>         $client = new Cassandra::CassandraClient($protocol);
>>>         $transport->open();
>>>
>>>         my ($next_start_key, $one_res, $iteration, $have_more, $value,
>>>             $local_count, $previous_start_key);
>>>
>>>         $iteration = 0;
>>>         $have_more = 1;
>>>         while ($have_more == 1)
>>>         {
>>>             $iteration++;
>>>             $result = undef;
>>>
>>>             $result = $client->get_range_slices($keyspace, $column_parent,
>>>                 $predicate, $keyrange, $consistency_level);
>>>
>>>             # on success, results is an array of objects.
>>>
>>>             if (scalar(@$result) == 1)
>>>             {
>>>                 # we only got 1 result... check to see if it's the
>>>                 # same key as the start key... if so, we're done.
>>>                 if ($result->[0]->{'key'} eq $keyrange->{'start_key'})
>>>                 {
>>>                     $have_more = 0;
>>>                     last;
>>>                 }
>>>             }
>>>
>>>             # check to see if we are starting with some value
>>>             # if so, we throw away the first result (start_key is
>>>             # inclusive, so it repeats the last key of the previous batch).
>>>             if ($keyrange->{'start_key'} ne '')
>>>             {
>>>                 shift(@$result);
>>>             }
>>>             if (scalar(@$result) == 0)
>>>             {
>>>                 $have_more = 0;
>>>                 last;
>>>             }
>>>
>>>             $previous_start_key = $keyrange->{'start_key'};
>>>             $local_count = 0;
>>>
>>>             for (my $r = 0; $r < scalar(@$result); $r++)
>>>             {
>>>                 $one_res = $result->[$r];
>>>                 $next_start_key = $one_res->{'key'};
>>>
>>>                 $keyrange->{'start_key'} = $next_start_key;
>>>
>>>                 if (!exists($returned_keys->{$next_start_key}))
>>>                 {
>>>                     $have_more = 1;
>>>                     $local_count++;
>>>                 }
>>>
>>>                 next if (scalar(@{ $one_res->{'columns'} }) == 0);
>>>
>>>                 $value = undef;
>>>
>>>                 for (my $i = 0; $i < scalar(@{ $one_res->{'columns'} }); $i++)
>>>                 {
>>>                     if ($one_res->{'columns'}->[$i]->{'column'}->{'name'} eq
>>>                         $WANTED_COLUMN_NAME)
>>>                     {
>>>                         $value = $one_res->{'columns'}->[$i]->{'column'}->{'value'};
>>>                         if (!exists($returned_keys->{$next_start_key}))
>>>                         {
>>>                             $returned_keys->{$next_start_key} = $value;
>>>                         }
>>>                         else
>>>                         {
>>>                             # NOTE: prior to Cassandra 0.6.4, get_range_slices
>>>                             # sometimes returns duplicates.
>>>                             #warn "Found second value for key [$next_start_key] was ["
>>>                             #    . $returned_keys->{$next_start_key} . "] now [$value]!";
>>>                         }
>>>                     }
>>>                 }
>>>                 $have_more = 1;
>>>             } # end results loop
>>>
>>>             if ($keyrange->{'start_key'} eq $previous_start_key)
>>>             {
>>>                 $have_more = 0;
>>>             }
>>>
>>>         } # end while() loop
>>>
>>>         $transport->close();
>>>     };
>>>     if ($@)
>>>     {
>>>         warn "Problem with Cassandra: " .
>>>             Dumper($@);
>>>     }
>>>
>>>     # cleanup
>>>     undef $client;
>>>     undef $protocol;
>>>     undef $transport;
>>>     undef $socket;
>>> }
>>>
>>>
>>> HTH
>>> Dave Viner
>>>
>>> On Fri, Aug 6, 2010 at 7:45 AM, Adam Crain <adam.cr...@greenenergycorp.com> wrote:
>>>
>>>> Thomas,
>>>>
>>>> That was indeed the source of the problem. I naively assumed that the token
>>>> range would help me avoid retrieving duplicate rows.
>>>>
>>>> If you iterate over the keys, how do you avoid retrieving duplicate keys? I
>>>> tried this morning and I seem to get odd results. Maybe this is just a
>>>> consequence of the random partitioner. I really don't care about the order
>>>> of the iteration; what matters is that I see each key only once and that I
>>>> see all keys.
>>>>
>>>> -Adam
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: th.hel...@gmail.com on behalf of Thomas Heller
>>>> Sent: Fri 8/6/2010 7:27 AM
>>>> To: user@cassandra.apache.org
>>>> Subject: Re: error using get_range_slice with random partitioner
>>>>
>>>> Wild guess here, but are you using start_token/end_token here when you
>>>> should be using start_key? Looks to me like you are trying end_token = ''.
>>>>
>>>> HTH,
>>>> /thomas
>>>>
>>>> On Thursday, August 5, 2010, Adam Crain <adam.cr...@greenenergycorp.com> wrote:
>>>>> Hi,
>>>>>
>>>>> I'm on 0.6.4. Previous tickets in the JIRA and searching the web indicated
>>>>> that iterating over the keys in a keyspace is possible, even with the random
>>>>> partitioner. This is mostly desirable in my case for testing purposes only.
>>>>>
>>>>> I get the following error:
>>>>>
>>>>> [junit] Internal error processing get_range_slices
>>>>> [junit] org.apache.thrift.TApplicationException: Internal error processing get_range_slices
>>>>>
>>>>> and the following server traceback:
>>>>>
>>>>> java.lang.NumberFormatException: Zero length BigInteger
>>>>>     at java.math.BigInteger.<init>(BigInteger.java:295)
>>>>>     at java.math.BigInteger.<init>(BigInteger.java:467)
>>>>>     at org.apache.cassandra.dht.RandomPartitioner$1.fromString(RandomPartitioner.java:100)
>>>>>     at org.apache.cassandra.thrift.CassandraServer.getRangeSlicesInternal(CassandraServer.java:575)
>>>>>
>>>>> I am using the Scala cascal client, but am sure that get_range_slice is
>>>>> being called with start and stop set to "".
>>>>>
>>>>> 1) Is batch iteration possible with the random partitioner?
>>>>>
>>>>> This isn't clear from the FAQ entry on the subject:
>>>>>
>>>>> http://wiki.apache.org/cassandra/FAQ#iter_world
>>>>>
>>>>> 2) The FAQ states that the start argument should be "". What should the end
>>>>> argument be?
>>>>>
>>>>> thanks!
>>>>> Adam
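Thomas's description of RandomPartitioner tokens, and the "Zero length BigInteger" traceback in the first message, can be illustrated with a short Perl sketch. It assumes the token is the absolute value of the key's 16-byte MD5 digest read as a signed (two's-complement) 128-bit integer, which is what Java's new BigInteger(digest).abs() would give. rp_token and parse_rp_token are hypothetical helpers, not part of any Cassandra client library. parse_rp_token rejects an empty string, mirroring what the server did when end_token was '': a token has to be a decimal integer like the ones nodetool ring prints.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use Math::BigInt;

# Hypothetical helper: RandomPartitioner-style token for a row key.
# Assumption: token = |signed 128-bit value of MD5(key)|, as with
# Java's new BigInteger(md5Bytes).abs().
sub rp_token {
    my ($key) = @_;
    my $hex = md5_hex($key);                  # 32 hex digits = 128 bits
    my $n   = Math::BigInt->new("0x$hex");    # unsigned reading of the digest
    if (hex(substr($hex, 0, 1)) >= 8) {       # top bit set: negative when signed
        # |n - 2^128| = 2^128 - n
        $n = Math::BigInt->new(2)->bpow(128)->bsub($n);
    }
    return $n;                                # always in [0, 2^127]
}

# Hypothetical helper: parse a start_token/end_token string.
# An empty string is rejected, like the server's
# "NumberFormatException: Zero length BigInteger".
sub parse_rp_token {
    my ($s) = @_;
    die "Zero length token\n" if !defined $s || $s eq '';
    die "Not a decimal token: $s\n" unless $s =~ /\A[0-9]+\z/;
    return Math::BigInt->new($s);
}

for my $key (qw(key1 key2 key3)) {
    printf "%-5s %s\n", $key, rp_token($key)->bstr;
}
```

The node tokens shown by nodetool ring on a RandomPartitioner cluster should fall in the same 0 to 2^127 range as these row tokens.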
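The paging rule discussed in this thread can be checked against an in-memory stand-in for get_range_slices. Because start_key is inclusive, each new batch begins with the last key of the previous one, and Dave's Perl code discards that row with shift. This is a sketch under stated assumptions. fake_get_range_slices and iterate_all_keys are hypothetical, and rows are ordered by their MD5 hex digest as a simplified substitute for real RandomPartitioner token order. The sketch does not talk to Cassandra and cannot reproduce a server-side bug such as the one discussed around CASSANDRA-1145. It only shows that, against a correctly behaving range scan, dropping the repeated first row visits every key exactly once.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Simulated ring: keys sorted by MD5 hex digest (a simplification of
# RandomPartitioner token order, not the exact Cassandra ordering).
my @RING;

sub load_keys {
    @RING = sort { md5_hex($a) cmp md5_hex($b) } @_;
}

# Hypothetical stand-in for get_range_slices(start_key, end_key = '', count):
# returns up to $count keys whose position is >= that of $start_key
# (inclusive); an empty $start_key starts at the beginning of the ring.
sub fake_get_range_slices {
    my ($start_key, $count) = @_;
    my $i = 0;
    if ($start_key ne '') {
        my $t = md5_hex($start_key);
        $i++ while $i < @RING && md5_hex($RING[$i]) lt $t;
    }
    my $end = $i + $count - 1;
    $end = $#RING if $end > $#RING;
    return [ @RING[$i .. $end] ];
}

# Page through every key, dropping the repeated first row of each later batch.
# $batch must be at least 2: with 1, every later batch would hold only the
# repeated key and the loop would stop after the first page.
sub iterate_all_keys {
    my ($batch) = @_;
    die "batch size must be >= 2\n" if $batch < 2;
    my @seen;
    my $start = '';
    while (1) {
        my $page = fake_get_range_slices($start, $batch);
        shift @$page if $start ne '' && @$page && $page->[0] eq $start;
        last unless @$page;
        push @seen, @$page;
        $start = $page->[-1];
    }
    return @seen;
}

load_keys(map { "key$_" } 1 .. 30);
my @all = iterate_all_keys(10);
printf "visited %d keys\n", scalar @all;   # prints "visited 30 keys"
```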