On Sat, Jul 7, 2012 at 11:17 AM, prasenjit mukherjee <prasen....@gmail.com> wrote:
> Have 2 questions :
>
> 1. In RP on a given node, are the rows ordered by hash(key) or key ?
> If the rows on a node are ordered by hash(key) then essentially it has
> to be implemented by a full-scan on that node.
>
> 2. In RP, How does a cassandra node route a client's range-query
> request ? The range is distributed across the ring, so essentially
> either it has to send the request to all nodes in the ring or
> just do a local processing.
>
> On Sat, Jul 7, 2012 at 7:47 PM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
>> On Sat, Jul 7, 2012 at 9:26 AM, prasenjit mukherjee
>> <prasen....@gmail.com> wrote:
>>> Wondering how a rangequery request is handled if RP is used. Will the
>>> receiving node do a fan-out to all the nodes in the ring or it will
>>> just execute the rangequery on its own local partition ?
>>>
>>> -Prasenjit
>>
>> With RP the data is still ordered. It is ordered pseudo randomly. Like
>> all range scanning you can start with the null start row key for
>> your first range scan. Then for the next range scan use the last row
>> key from your results from the first scan.

1) http://svn.apache.org/viewvc/cassandra/trunk/src/java/org/apache/cassandra/dht/RandomPartitioner.java?view=markup
http://svn.apache.org/viewvc/cassandra/trunk/src/java/org/apache/cassandra/dht/AbstractPartitioner.java?revision=1208993&view=markup

2) A single range slice is not handled by all nodes in the cluster. The
request is routed to one or more of the natural endpoints for the range.
An exception would be a range slice that crosses a token boundary of a
node.

Random Partitioner is not actually random; the data is ordered by the hash
of the key. Thus data is in a predictable location and repeated range
scans return the same order. However, because md5 generates drastically
different hashes for similar keys, like data will not clump together.

To put it another way, if you have a 10 node cluster with RP and you wish
to range scan the entire dataset, 0 -> 2^127 (or whatever that big number
is), you will notice that the range scans first make three of the nodes
busy, then a fourth node starts taking requests as the first node starts
getting fewer requests, finally the first node gets no more requests, and
so on.

Another option is that row keys can now be composite and cassandra will
use the first part of the composite to locate the node and the second part
of the composite to order the data. Sweet!
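
To make the hash ordering and the "use the last row key as the next start
key" paging pattern concrete, here is a rough Python sketch. It is a toy,
not Cassandra code: ToyNode, range_slice and scan_all are made-up names,
the whole "cluster" is a single in-memory list, and rp_token only
approximates what RandomPartitioner does (MD5 of the key read as a signed
integer, then the absolute value). It also assumes the start key of a
range scan is inclusive, which is why every page after the first drops its
first row.

```python
import hashlib
from bisect import bisect_left


def rp_token(key: bytes) -> int:
    """Approximate RP token: MD5 digest as a signed big-endian int, absolute value."""
    digest = hashlib.md5(key).digest()
    return abs(int.from_bytes(digest, "big", signed=True))


class ToyNode:
    """Hypothetical in-memory stand-in for the ring; rows are kept in token order."""

    def __init__(self, keys):
        self.rows = sorted(keys, key=rp_token)
        self.tokens = [rp_token(k) for k in self.rows]

    def range_slice(self, start_key: bytes, count: int):
        """Return up to `count` keys in token order, starting at start_key (inclusive).

        An empty start key stands in for the null start key (beginning of the ring).
        """
        if start_key == b"":
            start = 0
        else:
            start = bisect_left(self.tokens, rp_token(start_key))
        return self.rows[start:start + count]


def scan_all(node, page_size=3):
    """Page through every row, reusing the last key of each page as the next start key."""
    if page_size < 2:
        raise ValueError("page_size must be at least 2 because the start key is repeated")
    seen, start = [], b""
    while True:
        page = node.range_slice(start, page_size)
        if start != b"":
            page = page[1:]  # start key is inclusive, so drop the repeated row
        if not page:
            return seen
        seen.extend(page)
        start = seen[-1]


if __name__ == "__main__":
    node = ToyNode([b"user%d" % i for i in range(10)])
    for key in scan_all(node):
        print(key, rp_token(key))
```

Running it prints the keys in token order rather than key order, and
running it again prints the same sequence, which is the "ordered pseudo
randomly" behaviour described above. A real cluster would split the token
range across nodes, so consecutive pages would land on different
endpoints as the scan moves around the ring.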