Thanks for the insightful replies. I understand Sylvain's argument that
ordering rows differently locally and globally would cause problems
when moving data between nodes.

Edward, for a given sstable on a node, why should it matter if
lexicographically close rows are clumped together? Isn't the lookup for
a key in an sstable based on sequential reads anyway (after the first
random I/O)?

-Thanks,
Prasenjit

On Sat, Jul 14, 2012 at 6:58 PM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
> There is a more subtle and profound aspect here as well. The md5
> transformation is a hash that is good at spraying data around the hash
> space evenly. For a given sstable there should be good entropy, whereas
> if the data were not transformed it could be "clumpy" and the sorted
> string structure and indexes would not be as effective.
>
> On Sat, Jul 14, 2012 at 5:50 AM, Sylvain Lebresne <sylv...@datastax.com> wrote:
>>> Any reason row keys are not stored by their raw keys on a given node
>>> for RP? I understand the partitioning across nodes should be
>>> randomized, but on a given node why are they sorted by the hash of
>>> their keys and not just by the raw keys?
>>
>> All the operations that change the topology of the cluster
>> (adding/removing/moving a node, but repair too) require that we are able to
>> transfer data corresponding to token ranges. If rows were ordered locally by
>> their raw key rather than their token, those operations would require a node
>> to read all its data even to transfer a small amount of it. This would be
>> highly inefficient.
>>
>> --
>> Sylvain
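
To make Sylvain's point concrete, here is a rough Python sketch (not
Cassandra code). It assumes a simplified token that is just the MD5 digest
read as an integer, a token range (start, end] with no wrap-around, and two
hypothetical helpers, range_from_token_order and range_from_key_order. With
rows sorted by token, a range is one contiguous slice; with rows sorted by
raw key, every row has to be hashed and checked.

```python
import bisect
import hashlib


def token(key: str) -> int:
    # Simplified stand-in for a RandomPartitioner-style token (assumption):
    # the MD5 digest of the key interpreted as a big-endian integer.
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


def range_from_token_order(rows_by_token, start, end):
    """rows_by_token: list of (token, key) sorted by token.

    Returns (keys with start < token <= end, number of rows examined).
    Two binary searches locate a contiguous slice.
    """
    tokens = [t for t, _ in rows_by_token]
    lo = bisect.bisect_right(tokens, start)
    hi = bisect.bisect_right(tokens, end)
    return [k for _, k in rows_by_token[lo:hi]], hi - lo


def range_from_key_order(keys_sorted_by_raw_key, start, end):
    """keys_sorted_by_raw_key: list of keys sorted lexicographically.

    Matching rows are scattered, so every key must be hashed and tested.
    """
    hits = [k for k in keys_sorted_by_raw_key if start < token(k) <= end]
    return hits, len(keys_sorted_by_raw_key)


if __name__ == "__main__":
    keys = [f"user{i}" for i in range(1000)]
    by_token = sorted((token(k), k) for k in keys)
    by_key = sorted(keys)
    start, end = by_token[100][0], by_token[110][0]
    print(range_from_token_order(by_token, start, end)[1], "rows examined (token order)")
    print(range_from_key_order(by_key, start, end)[1], "rows examined (raw-key order)")
```

Both functions return the same set of keys; only the number of rows examined
differs, which is the cost a node would pay when streaming a small token range.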
