There is a more subtle and profound aspect here as well. The MD5
transformation is a hash that is good at spreading keys evenly across the
hash space. Within a given SSTable there should therefore be good entropy,
whereas untransformed keys could be "clumpy" (e.g. long runs sharing a common
prefix), and the sorted string structure and indexes would not be as effective.
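To make the "clumpy keys" point concrete, here is a small Python sketch. It assumes a RandomPartitioner-style token is the MD5 digest of the key, read as a signed big-endian 128-bit integer and then taken as an absolute value. The `rp_token` helper and the sample keys are hypothetical and only for illustration. Keys that differ only in their last character should land far apart in token order.

```python
import hashlib


def rp_token(key: bytes) -> int:
    """Hypothetical RandomPartitioner-style token (assumption): the MD5 digest
    read as a signed big-endian 128-bit integer, then its absolute value,
    giving a value in [0, 2**127]."""
    digest = hashlib.md5(key).digest()
    return abs(int.from_bytes(digest, "big", signed=True))


# Hypothetical "clumpy" keys: they share a long prefix and are adjacent in raw order.
keys = [f"user{i:04d}".encode() for i in range(8)]

# Print keys in token order; neighbours in raw-key order get spread out.
for key in sorted(keys, key=rp_token):
    print(key.decode(), format(rp_token(key), "032x"))
```

Ordering by token rather than raw key means adjacent raw keys do not pile into one region of the ring or of a node's sorted data.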

On Sat, Jul 14, 2012 at 5:50 AM, Sylvain Lebresne <sylv...@datastax.com> wrote:
>> Any reason row keys are not stored by their raw keys on a given node
>> for RP? I understand the partitioning across nodes should be
>> randomized, but on a given node why are they sorted by the hash of their
>> keys and not just by the raw keys?
>
> All the operations that change the topology of the cluster
> (adding/removing/moving a node, but repair too) require that we are able to
> transfer data corresponding to token ranges. If rows were ordered locally by
> their raw key rather than their token, those operations would require a node to
> read all its data even to transfer a small amount of it. This would be highly
> inefficient.
>
> --
> Sylvain
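Sylvain's point about token-range transfers can be sketched in Python. Everything here is hypothetical: `rp_token` assumes an MD5-derived RandomPartitioner-style token, the rows are made up, and the range is assumed non-wrapping and half-open as (left, right]. With rows kept in token order, two binary searches locate the slice to stream. With rows kept in raw-key order, the matching tokens are scattered, so every row has to be read and hashed.

```python
import bisect
import hashlib


def rp_token(key: bytes) -> int:
    # Assumed RandomPartitioner-style token: |MD5 digest as a signed 128-bit int|.
    return abs(int.from_bytes(hashlib.md5(key).digest(), "big", signed=True))


# Hypothetical node-local data, stored sorted by token (as under RP).
rows = sorted((rp_token(k), k) for k in (f"row{i}".encode() for i in range(1000)))
tokens = [t for t, _ in rows]

# The same data as it would look if stored sorted by raw key instead.
rows_by_raw_key = sorted(k for _, k in rows)


def keys_in_range_by_token(left: int, right: int) -> list[bytes]:
    """Rows with left < token <= right: binary search, then read only the slice."""
    lo = bisect.bisect_right(tokens, left)
    hi = bisect.bisect_right(tokens, right)
    return [k for _, k in rows[lo:hi]]


def keys_in_range_by_raw_key(left: int, right: int) -> list[bytes]:
    """Same range, but raw-key order says nothing about tokens,
    so every stored row must be read and hashed."""
    return [k for k in rows_by_raw_key if left < rp_token(k) <= right]


# Stream roughly a tenth of the ring: both return the same rows,
# but only the token-ordered version avoids a full scan.
right = 2**127 // 10
print(len(keys_in_range_by_token(0, right)), len(keys_in_range_by_raw_key(0, right)))
```

The results are identical. The difference is cost: O(log n + k) rows touched for the token-ordered layout versus all n rows for the raw-key layout. That full scan is the inefficiency described above.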
