I ran against the 0.6 branch and I still see similarly odd results. My test cases prove that the full set of keys has been successfully inserted, but usually I either never see the first key again, or I reach the first key again before having seen all of the keys.
-Adam

-----Original Message-----
From: Jeremy Hanna [mailto:jeremy.hanna1...@gmail.com]
Sent: Fri 8/6/2010 4:25 PM
To: user@cassandra.apache.org
Subject: Re: error using get_range_slice with random partitioner

If you're willing to try it out, the easiest way to check whether it is resolved by the patch for CASSANDRA-1145 is to check out the 0.6 branch:

svn checkout http://svn.apache.org/repos/asf/cassandra/branches/cassandra-0.6/ cassandra-0.6

Then run `ant` to build the binaries.

On Aug 6, 2010, at 2:57 PM, Adam Crain wrote:

> Hi Jeremy,
>
> So, I fixed my client so it preserves the ordering, and I get results that may
> be related to the bug.
>
> If I insert 30 keys into the random partitioner with names [key1, key2, ...
> key30] and then start the iteration (with a batch size of 10), I get the
> following debug output during the iteration:
>
> [junit] Query w/ Range(,,10) result size: 10
> [junit] key18
> [junit] key23
> [junit] key26
> [junit] key27
> [junit] key12
> [junit] key28
> [junit] key4
> [junit] key3
> [junit] key1
> [junit] key24
> [junit] Query w/ Range(key24,,10) result size: 10
> [junit] key24
> [junit] key5
> [junit] key17
> [junit] key29
> [junit] key19
> [junit] key8
> [junit] key15
> [junit] key22
> [junit] key6
> [junit] key25
> [junit] Query w/ Range(key25,,10) result size: 3
> [junit] key25
> [junit] key14
> [junit] key2
> [junit] Query w/ Range(key2,,10), result size: 1
> [junit] key2
>
> I never make it back around to key 18 as expected, and I never see all of the
> keys.
>
> -Adam
>
> -----Original Message-----
> From: Jeremy Hanna [mailto:jeremy.hanna1...@gmail.com]
> Sent: Fri 8/6/2010 11:45 AM
> To: user@cassandra.apache.org
> Subject: Re: error using get_range_slice with random partitioner
>
> Sounds like what you're seeing is in the client, but there was another
> duplicate bug with get_range_slice that was recently fixed on the cassandra-0.6
> branch.
> It's slated for 0.6.5, which will probably be out sometime this
> month, based on previous minor releases.
>
> https://issues.apache.org/jira/browse/CASSANDRA-1145
>
> On Aug 6, 2010, at 10:29 AM, Adam Crain wrote:
>
>> Thanks Dave. I'm using 0.6.4 since I saw this issue in the JIRA, but I just
>> discovered that the client I'm using mutates the order of keys after
>> retrieving the result with the Thrift API... pretty much making key
>> iteration impossible. So time to fork and see if they'll fix it :(.
>>
>> I'll review yours as soon as I've fixed the client I'm using.
>>
>> Adam
>>
>>
>> -----Original Message-----
>> From: davevi...@gmail.com on behalf of Dave Viner
>> Sent: Fri 8/6/2010 11:28 AM
>> To: user@cassandra.apache.org
>> Subject: Re: error using get_range_slice with random partitioner
>>
>> Funny you should ask... I just went through the same exercise.
>>
>> You must use Cassandra 0.6.4. Otherwise you will get duplicate keys.
>> However, here is a snippet of Perl that you can use.
>>
>> use Data::Dumper;    # for Dumper() in the error handler
>> # Assumed module names for the Thrift-generated Perl bindings
>> # (namespace "Cassandra") and the Thrift Perl runtime.
>> use Thrift::Socket;
>> use Thrift::BufferedTransport;
>> use Thrift::BinaryProtocol;
>> use Cassandra::Cassandra;
>> use Cassandra::Types;
>>
>> # Assumed connection settings; 9160 is the default Thrift port.
>> our $CASSANDRA_HOST = 'localhost';
>> our $CASSANDRA_PORT = 9160;
>> our $WANTED_COLUMN_NAME = 'mycol';
>>
>> my %map;
>> get_key_to_one_column_map('myKeySpace', 'myColFamily', 'mySuperCol',
>>     Cassandra::ConsistencyLevel::QUORUM, \%map);
>>
>> sub get_key_to_one_column_map
>> {
>>     my ($keyspace, $column_family_name, $super_column_name,
>>         $consistency_level, $returned_keys) = @_;
>>
>>     my ($socket, $transport, $protocol, $client, $result, $predicate,
>>         $column_parent, $keyrange);
>>
>>     $column_parent = new Cassandra::ColumnParent();
>>     $column_parent->{'column_family'} = $column_family_name;
>>     $column_parent->{'super_column'} = $super_column_name;
>>
>>     # Empty start and end keys cover the whole ring; fetch 10 rows per batch.
>>     $keyrange = new Cassandra::KeyRange({
>>         'start_key' => '', 'end_key' => '', 'count' => 10
>>     });
>>
>>     $predicate = new Cassandra::SlicePredicate();
>>     $predicate->{'column_names'} = [$WANTED_COLUMN_NAME];
>>
>>     eval
>>     {
>>         $socket = new Thrift::Socket($CASSANDRA_HOST, $CASSANDRA_PORT);
>>         $transport = new Thrift::BufferedTransport($socket, 1024, 1024);
>>         $protocol = new Thrift::BinaryProtocol($transport);
>>         $client = new Cassandra::CassandraClient($protocol);
>>         $transport->open();
>>
>>         my ($next_start_key, $one_res, $iteration, $have_more, $value,
>>             $local_count, $previous_start_key);
>>
>>         $iteration = 0;
>>         $have_more = 1;
>>         while ($have_more == 1)
>>         {
>>             $iteration++;
>>             $result = undef;
>>
>>             $result = $client->get_range_slices($keyspace, $column_parent,
>>                 $predicate, $keyrange, $consistency_level);
>>
>>             # on success, $result is an array ref of KeySlice objects.
>>
>>             if (scalar(@$result) == 1)
>>             {
>>                 # we only got 1 result... check to see if it's the
>>                 # same key as the start key... if so, we're done.
>>                 if ($result->[0]->{'key'} eq $keyrange->{'start_key'})
>>                 {
>>                     $have_more = 0;
>>                     last;
>>                 }
>>             }
>>
>>             # check to see if we are starting with some value
>>             # if so, we throw away the first result.
>>             # (the range is inclusive, so the first row repeats the start key;
>>             # compare against '' so that a key such as "0" is not skipped)
>>             if ($keyrange->{'start_key'} ne '')
>>             {
>>                 shift(@$result);
>>             }
>>             if (scalar(@$result) == 0)
>>             {
>>                 $have_more = 0;
>>                 last;
>>             }
>>
>>             $previous_start_key = $keyrange->{'start_key'};
>>             $local_count = 0;
>>
>>             for (my $r = 0; $r < scalar(@$result); $r++)
>>             {
>>                 $one_res = $result->[$r];
>>                 $next_start_key = $one_res->{'key'};
>>
>>                 # the last key of this batch becomes the next start key
>>                 $keyrange->{'start_key'} = $next_start_key;
>>
>>                 if (!exists($returned_keys->{$next_start_key}))
>>                 {
>>                     $have_more = 1;
>>                     $local_count++;
>>                 }
>>
>>                 next if (scalar(@{ $one_res->{'columns'} }) == 0);
>>
>>                 $value = undef;
>>
>>                 for (my $i = 0; $i < scalar(@{ $one_res->{'columns'} }); $i++)
>>                 {
>>                     if ($one_res->{'columns'}->[$i]->{'column'}->{'name'} eq
>>                         $WANTED_COLUMN_NAME)
>>                     {
>>                         $value = $one_res->{'columns'}->[$i]->{'column'}->{'value'};
>>                         if (!exists($returned_keys->{$next_start_key}))
>>                         {
>>                             $returned_keys->{$next_start_key} = $value;
>>                         }
>>                         else
>>                         {
>>                             # NOTE: prior to Cassandra 0.6.4, get_range_slices
>>                             # sometimes returns duplicates.
>>                             #warn "Found second value for key [$next_start_key] was ["
>>                             #    . $returned_keys->{$next_start_key} . "] now [$value]!";
>>                         }
>>                     }
>>                 }
>>                 $have_more = 1;
>>             } # end results loop
>>
>>             if ($keyrange->{'start_key'} eq $previous_start_key)
>>             {
>>                 $have_more = 0;
>>             }
>>
>>         } # end while() loop
>>
>>         $transport->close();
>>     };
>>     if ($@)
>>     {
>>         warn "Problem with Cassandra: " . Dumper($@);
>>     }
>>
>>     # cleanup
>>     undef $client;
>>     undef $protocol;
>>     undef $transport;
>>     undef $socket;
>> }
>>
>>
>> HTH
>> Dave Viner
>>
>> On Fri, Aug 6, 2010 at 7:45 AM, Adam Crain
>> <adam.cr...@greenenergycorp.com> wrote:
>>
>>> Thomas,
>>>
>>> That was indeed the source of the problem. I naively assumed that the token
>>> range would help me avoid retrieving duplicate rows.
>>>
>>> If you iterate over the keys, how do you avoid retrieving duplicate keys? I
>>> tried this morning and I seem to get odd results. Maybe this is just a
>>> consequence of the random partitioner.
>>> I really don't care about the order of the iteration; all that matters is
>>> that I see every key, and each key only once.
>>>
>>> -Adam
>>>
>>>
>>> -----Original Message-----
>>> From: th.hel...@gmail.com on behalf of Thomas Heller
>>> Sent: Fri 8/6/2010 7:27 AM
>>> To: user@cassandra.apache.org
>>> Subject: Re: error using get_range_slice with random partitioner
>>>
>>> Wild guess here, but are you using start_token/end_token when you
>>> should be using start_key? Looks to me like you are trying end_token
>>> = ''.
>>>
>>> HTH,
>>> /thomas
>>>
>>> On Thursday, August 5, 2010, Adam Crain <adam.cr...@greenenergycorp.com>
>>> wrote:
>>>> Hi,
>>>>
>>>> I'm on 0.6.4. Previous tickets in the JIRA and searching the web indicated
>>>> that iterating over the keys in a keyspace is possible, even with the random
>>>> partitioner. In my case this is mostly desirable for testing purposes only.
>>>>
>>>> I get the following error:
>>>>
>>>> [junit] Internal error processing get_range_slices
>>>> [junit] org.apache.thrift.TApplicationException: Internal error processing get_range_slices
>>>>
>>>> and the following server traceback:
>>>>
>>>> java.lang.NumberFormatException: Zero length BigInteger
>>>>     at java.math.BigInteger.<init>(BigInteger.java:295)
>>>>     at java.math.BigInteger.<init>(BigInteger.java:467)
>>>>     at org.apache.cassandra.dht.RandomPartitioner$1.fromString(RandomPartitioner.java:100)
>>>>     at org.apache.cassandra.thrift.CassandraServer.getRangeSlicesInternal(CassandraServer.java:575)
>>>>
>>>> I am using the Scala Cascal client, but am sure that get_range_slice is
>>>> being called with start and stop set to "".
>>>>
>>>> 1) Is batch iteration possible with the random partitioner?
>>>>
>>>> This isn't clear from the FAQ entry on the subject:
>>>>
>>>> http://wiki.apache.org/cassandra/FAQ#iter_world
>>>>
>>>> 2) The FAQ states that the start argument should be "". What should the end
>>>> argument be?
>>>>
>>>> thanks!
>>>> Adam
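The paging pattern discussed in this thread can be sketched in Python. The pattern is: query with an empty start and end key, restart each batch from the last key returned, discard the repeated first row, and stop once a batch holds nothing past the start key. This is a minimal sketch under stated assumptions, not the Thrift API. `FakeRing`, `token` and `iterate_all_keys` are hypothetical names. The ring orders keys by a plain MD5 digest as a stand-in for RandomPartitioner tokens and does not wrap around. Against a real 0.6 cluster the query is `get_range_slices` with a `KeyRange` carrying `start_key`, `end_key` and `count`, as in Dave's Perl.

```python
import hashlib
from bisect import bisect_left
from typing import Iterator, List


def token(key: str) -> int:
    # Hypothetical stand-in for a RandomPartitioner token: the MD5 digest of
    # the key read as an unsigned integer. Real tokens may order differently.
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


class FakeRing:
    """Hypothetical in-memory ring that keeps rows in token order."""

    def __init__(self, keys: List[str]) -> None:
        self._keys = sorted(keys, key=token)
        self._tokens = [token(k) for k in self._keys]

    def get_range_slices(self, start_key: str, count: int) -> List[str]:
        # Mimics a key range with end_key='': up to `count` keys whose token is
        # >= token(start_key), start key included; '' means the ring start.
        # There is no wrap-around past the end of the ring.
        i = 0 if start_key == "" else bisect_left(self._tokens, token(start_key))
        return self._keys[i:i + count]


def iterate_all_keys(ring: FakeRing, page_size: int = 10) -> Iterator[str]:
    """Yield every key once, restarting each batch from the last key seen."""
    if page_size < 2:
        # A batch of 1 would only ever contain the repeated start key.
        raise ValueError("page_size must be at least 2")
    seen = set()
    start = ""
    while True:
        page = ring.get_range_slices(start, page_size)
        # The range is inclusive, so after the first batch the first row is
        # the key we restarted from; drop it.
        if start != "" and page and page[0] == start:
            page = page[1:]
        if not page:
            break  # nothing past the last key: done
        for key in page:
            if key not in seen:  # guard against duplicate rows
                seen.add(key)
                yield key
        start = page[-1]
```

Against the fake ring, the 30 keys `key1` to `key30` with a batch size of 10 (as in Adam's test) are each yielded exactly once. That only checks the client-side bookkeeping; it says nothing about whether a particular server version returns every row.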