Re: error using get_range_slice with random partitioner

Dave Viner Fri, 06 Aug 2010 08:26:52 -0700

Funny you should ask... I just went through the same exercise.

You must use Cassandra 0.6.4. Earlier versions sometimes return duplicate
keys from get_range_slices. Here is a snippet of Perl that you can use.


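use Data::Dumper;  # provides Dumper(), used in the error handler below

our $WANTED_COLUMN_NAME = 'mycol';
my %map;  # filled in as key => value of the wanted column
get_key_to_one_column_map('myKeySpace', 'myColFamily', 'mySuperCol', QUORUM,
    \%map);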

sub get_key_to_one_column_map
{
    my ($keyspace, $column_family_name, $super_column_name,
        $consistency_level, $returned_keys) = @_;

    my ($socket, $transport, $protocol, $client, $result, $predicate,
        $column_parent, $keyrange);

    $column_parent = new Cassandra::ColumnParent();
    $column_parent->{'column_family'} = $column_family_name;
    $column_parent->{'super_column'} = $super_column_name;

    $keyrange = new Cassandra::KeyRange({
            'start_key' => '', 'end_key' => '', 'count' => 10
    });


    $predicate = new Cassandra::SlicePredicate();
    $predicate->{'column_names'} = [$WANTED_COLUMN_NAME];

    eval
    {
        $socket = new Thrift::Socket($CASSANDRA_HOST, $CASSANDRA_PORT);
        $transport = new Thrift::BufferedTransport($socket, 1024, 1024);
        $protocol = new Thrift::BinaryProtocol($transport);
        $client = new Cassandra::CassandraClient($protocol);
        $transport->open();


        my ($next_start_key, $one_res, $iteration, $have_more, $value,
            $local_count, $previous_start_key);

        $iteration = 0;
        $have_more = 1;
        while ($have_more == 1)
        {
            $iteration++;
            $result = undef;

            $result = $client->get_range_slices($keyspace, $column_parent,
                $predicate, $keyrange, $consistency_level);

            # on success, $result is a reference to an array of objects.

            if (scalar(@$result) == 1)
            {
                # we only got 1 result... check to see if it's the
                # same key as the start key... if so, we're done.
                if ($result->[0]->{'key'} eq $keyrange->{'start_key'})
                {
                    $have_more = 0;
                    last;
                }
            }

            # check to see if we are starting with some value
            # if so, we throw away the first result.
            if ($keyrange->{'start_key'})
            {
                shift(@$result);
            }
            if (scalar(@$result) == 0)
            {
                $have_more = 0;
                last;
            }

            $previous_start_key = $keyrange->{'start_key'};
            $local_count = 0;

            for (my $r = 0; $r < scalar(@$result); $r++)
            {
                $one_res = $result->[$r];
                $next_start_key = $one_res->{'key'};

                $keyrange->{'start_key'} = $next_start_key;

                if (!exists($returned_keys->{$next_start_key}))
                {
                    $have_more = 1;
                    $local_count++;
                }


                next if (scalar(@{ $one_res->{'columns'} }) == 0);

                $value = undef;

                for (my $i = 0; $i < scalar(@{ $one_res->{'columns'} }); $i++)
                {
                    if ($one_res->{'columns'}->[$i]->{'column'}->{'name'}
                        eq $WANTED_COLUMN_NAME)
                    {
                        $value = $one_res->{'columns'}->[$i]->{'column'}->{'value'};
                        if (!exists($returned_keys->{$next_start_key}))
                        {
                            $returned_keys->{$next_start_key} = $value;
                        }
                        else
                        {
                            # NOTE: prior to Cassandra 0.6.4,
                            # get_range_slices sometimes returns duplicates.
                            #warn "Found second value for key [$next_start_key]: "
                            #    . "was [" . $returned_keys->{$next_start_key} . "], now [$value]!";
                        }
                    }
                }
                $have_more = 1;
            } # end results loop

            if ($keyrange->{'start_key'} eq $previous_start_key)
            {
                $have_more = 0;
            }

        } # end while() loop

        $transport->close();
    };
    if ($@)
    {
        warn "Problem with Cassandra: " . Dumper($@);
    }

    # cleanup
    undef $client;
    undef $protocol;
    undef $transport;
    undef $socket;
}
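
For anyone who only wants the paging idea without the Thrift plumbing, here
is a minimal sketch. It is not the code above, just the same approach under
some assumptions: fetch_page() is a hypothetical stand-in for
get_range_slices, @ROWS is made-up data standing in for the cluster, and
collect_all() is a name I picked for illustration. The point it shows is that
start_key is inclusive, so every page after the first begins with a row you
already have and that row has to be thrown away.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the cluster: rows in the arbitrary but stable
# order in which the server would hand them back.
my @ROWS = map { +{ key => $_, value => "v-$_" } } qw(k7 k2 k9 k1 k5 k3 k8);

# Hypothetical stand-in for get_range_slices: returns up to $count rows,
# starting at $start_key INCLUSIVE ('' means "from the beginning").
sub fetch_page {
    my ($start_key, $count) = @_;
    my $from = 0;
    if ($start_key ne '') {
        ($from) = grep { $ROWS[$_]{key} eq $start_key } 0 .. $#ROWS;
        return [] unless defined $from;
    }
    my $to = $from + $count - 1;
    $to = $#ROWS if $to > $#ROWS;
    return [ @ROWS[$from .. $to] ];
}

# Walk all rows page by page and return a hashref of key => value.
sub collect_all {
    my ($count) = @_;
    # With a page size of 1, every later page would hold only the row we
    # already have, so the walk could never advance.
    die "page size must be at least 2\n" if $count < 2;

    my %seen;
    my $start_key = '';
    while (1) {
        my $page = fetch_page($start_key, $count);

        # The first row of every page after the first is the previous
        # page's last row, so drop it.
        shift @$page if $start_key ne '';
        last unless @$page;

        for my $row (@$page) {
            # Guard against duplicates anyway, as the original snippet does.
            $seen{ $row->{key} } = $row->{value}
                unless exists $seen{ $row->{key} };
        }

        # The next page starts at the last key we just saw.
        $start_key = $page->[-1]{key};
    }
    return \%seen;
}

my $all = collect_all(3);
printf "%d keys\n", scalar keys %$all;
```

With the made-up data above this walks four pages and ends up with all seven
keys, each seen once. The iteration order is whatever the server returns,
which matches what you would expect from the random partitioner.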


HTH
Dave Viner

On Fri, Aug 6, 2010 at 7:45 AM, Adam Crain
<adam.cr...@greenenergycorp.com> wrote:

> Thomas,
>
> That was indeed the source of the problem. I naively assumed that the token
> range would help me avoid retrieving duplicate rows.
>
> If you iterate over the keys, how do you avoid retrieving duplicate keys? I
> tried this morning and I seem to get odd results. Maybe this is just a
> consequence of the random partitioner. I really don't care about the order
> of the iteration; what matters is that I see every key, and each key only
> once.
>
> -Adam
>
>
> -----Original Message-----
> From: th.hel...@gmail.com on behalf of Thomas Heller
> Sent: Fri 8/6/2010 7:27 AM
> To: user@cassandra.apache.org
> Subject: Re: error using get_range_slice with random partitioner
>
> Wild guess here, but are you using start_token/end_token here when you
> should be using start_key? Looks to me like you are trying end_token
> = ''.
>
> HTH,
> /thomas
>
> On Thursday, August 5, 2010, Adam Crain <adam.cr...@greenenergycorp.com>
> wrote:
> > Hi,
> >
> > I'm on 0.6.4. Previous tickets in JIRA and searching the web indicated
> > that iterating over the keys in a keyspace is possible, even with the
> > random partitioner. This is mostly desirable in my case for testing
> > purposes only.
> >
> > I get the following error:
> >
> > [junit] Internal error processing get_range_slices
> > [junit] org.apache.thrift.TApplicationException: Internal error processing get_range_slices
> >
> > and the following server traceback:
> >
> > java.lang.NumberFormatException: Zero length BigInteger
> >         at java.math.BigInteger.<init>(BigInteger.java:295)
> >         at java.math.BigInteger.<init>(BigInteger.java:467)
> >         at org.apache.cassandra.dht.RandomPartitioner$1.fromString(RandomPartitioner.java:100)
> >         at org.apache.cassandra.thrift.CassandraServer.getRangeSlicesInternal(CassandraServer.java:575)
> >
> > I am using the Scala Cascal client, but am sure that get_range_slice is
> > being called with start and stop set to "".
> >
> > 1) Is batch iteration possible with the random partitioner?
> >
> > This isn't clear from the FAQ entry on the subject:
> >
> > http://wiki.apache.org/cassandra/FAQ#iter_world
> >
> > 2) The FAQ states that the start argument should be "". What should the
> > end argument be?
> >
> > thanks!
> > Adam
