BTW, one other thing that I have not been able to debug today; maybe
someone can help me with it:

I am using a three-node Cassandra cluster with Vagrant.  The nodes in my
cluster are 192.168.200.11, 192.168.200.12, and 192.168.200.13.

If I use cqlsh to connect to 192.168.200.11, I see unique sets of tokens
when I run the following three commands:

select tokens from system.local;
select tokens from system.peers where peer = '192.168.200.12';
select tokens from system.peers where peer = '192.168.200.13';

This is what I expect.  However, I then wrote an application with the
Java driver that does the following:


   - Create a Session by connecting to 192.168.200.11
   - From that session, "select tokens from system.local"
   - From that session, "select tokens, peer from system.peers"

Now I get the exact same set of tokens from system.local and from the
system.peers row where peer = 192.168.200.13.
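For reference, the steps above look roughly like this as a self-contained sketch against the 2.0 Java driver (the class name and the overlap check are just for illustration of the symptom; error handling omitted):

```java
import java.util.Collections;
import java.util.Set;

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class TokenCheck {
    public static void main(String[] args) {
        // Connect to one node; the driver discovers the rest of the cluster.
        Cluster cluster = Cluster.builder()
                .addContactPoint("192.168.200.11")
                .build();
        Session session = cluster.connect();

        // Tokens owned by the node this session is connected to.
        Row local = session.execute("SELECT tokens FROM system.local").one();
        Set<String> localTokens = local.getSet("tokens", String.class);
        System.out.println("local: " + localTokens.size() + " tokens");

        // Tokens reported for each peer; none of these sets should
        // overlap with the local set, but for me one of them does.
        ResultSet peers = session.execute("SELECT peer, tokens FROM system.peers");
        for (Row row : peers) {
            Set<String> peerTokens = row.getSet("tokens", String.class);
            boolean overlap = !Collections.disjoint(peerTokens, localTokens);
            System.out.println(row.getInet("peer") + ": " + peerTokens.size()
                    + " tokens, overlaps local = " + overlap);
        }
        cluster.close();
    }
}
```

With three nodes and num_tokens = 256, I would expect each of the three lines to report 256 tokens and "overlaps local = false" for both peers.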

Anyone have any idea why this would happen?  I'm not sure how to debug
this.  I see the following log from the Java driver:

14/03/30 19:05:24 DEBUG com.datastax.driver.core.Cluster: Starting new
cluster with contact points [/192.168.200.11]
14/03/30 19:05:24 INFO com.datastax.driver.core.Cluster: New Cassandra host
/192.168.200.13 added
14/03/30 19:05:24 INFO com.datastax.driver.core.Cluster: New Cassandra host
/192.168.200.12 added

I'm running Cassandra 2.0.6 in the virtual machine and I built my
application with version 2.0.1 of the driver.

Best regards,
Clint

On Sun, Mar 30, 2014 at 4:51 PM, Clint Kelly <clint.ke...@gmail.com> wrote:

> Hi all,
>
>
> I am working on a Hadoop InputFormat implementation that uses only the
> native protocol Java driver and not the Thrift API.  I am currently trying
> to replicate some of the behavior of
> *Cassandra.client.describe_ring(myKeyspace)* from the Thrift API.  I
> would like to do the following:
>
>    - Get a list of all of the token ranges for a cluster
>    - For every token range, determine the replica nodes on which the data
>    in the token range resides
>    - Estimate the number of rows for every range of tokens
>    - Group ranges of tokens by common replica nodes so that we can
>    create a set of input splits for Hadoop whose total estimated row counts
>    are reasonably close to the requested split size
>
> Last week I received some much-appreciated help on this list that pointed
> me to using the system.peers table to get the list of token ranges for the
> cluster and the corresponding hosts.  Today I created a three-node C*
> cluster in Vagrant (https://github.com/dholbrook/vagrant-cassandra) and
> tried inspecting some of the system tables.  I have a couple of questions
> now:
>
> 1. *How many total unique tokens should I expect to see in my cluster?*
> If I have three nodes, and each node's cassandra.yaml has num_tokens =
> 256, should I expect a total of 256 * 3 = 768 distinct tokens (one per vnode)?
>
> 2. *How does the creation of vnodes and their assignment to nodes relate
> to the replication factor for a given keyspace?*  I never thought about
> this until today, and I tried to reread the documentation on virtual nodes,
> replication in Cassandra, etc., and now I am sadly still confused.  Here is
> what I think I understand.  :)
>
>    - Given a row with a partition key, any client request for an
>    operation on that row will go to a coordinator node in the cluster.
>    - The coordinator node will compute the token value for the row and
>    from that determine a set of replica nodes for that token.
>       - One of the replica nodes, I assume, is the node that "owns" the
>       vnode whose token range encompasses the token
>       - The identity of the "owner" of this virtual node is a
>       cross-keyspace property
>       - And the other replicas were originally chosen based on the
>       replica-placement strategy
>       - And therefore the other replicas will be different for each
>       keyspace (because replication factors and replica-placement strategy are
>       properties of a keyspace)
>
> 3. What do the values in the "tokens" column in system.peers and
> system.local refer to, then?
>
>    - Since these tables appear to be global, and not per-keyspace
>    properties, I assume that they don't have any information about replication
>    in them, is that correct?
>    - If I have three nodes in my cluster, 256 vnodes per node, and I'm
>    using the Murmur3 partitioner, should I then expect to see 768 distinct
>    values of "tokens" across system.peers and system.local, roughly evenly
>    distributed between -2^63 and 2^63 - 1?
>
> 4. Is there any other way, without using Thrift, to get as much information
> as possible about what nodes contain replicas of data for all of the token
> ranges in a given cluster?
>
> I really appreciate any help, thanks!
>
> Best regards,
> Clint
>
