This is an issue with using the BOP. If you are just starting out, stick with the Random Partitioner.
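For reference, the RandomPartitioner places each row by the MD5 hash of its key, with tokens between 0 and 2^127. Evenly spaced initial tokens therefore give evenly loaded nodes, largely independent of how the keys themselves were generated. Here is a minimal sketch of the usual spacing, token_i = i × 2^127 / N, for a 3-node ring. The class name is hypothetical.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

public class RandomPartitionerTokens {
    // RandomPartitioner tokens fall between 0 and 2^127.
    private static final BigInteger RING_MAX = BigInteger.ONE.shiftLeft(127);

    // Evenly spaced initial tokens: token_i = i * 2^127 / nodeCount (integer division).
    static List<BigInteger> tokens(int nodeCount) {
        BigInteger n = BigInteger.valueOf(nodeCount);
        List<BigInteger> result = new ArrayList<>();
        for (int i = 0; i < nodeCount; i++) {
            result.add(RING_MAX.multiply(BigInteger.valueOf(i)).divide(n));
        }
        return result;
    }

    public static void main(String[] args) {
        // One initial_token per node for a 3-node ring.
        for (BigInteger token : tokens(3)) {
            System.out.println(token);
        }
    }
}
```

For N = 3 this prints 0, 56713727820156410577229101238628035242 and 113427455640312821154458202477256070485. These are the same digits that appear, zero-padded, in the Token(bytes[...]) column of the nodetool output in this thread.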
Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 5/10/2012, at 10:33 AM, Andrey Ilinykh <ailin...@gmail.com> wrote:

> It was my first thought.
> Then I MD5-hashed the UUID and used the digest as the key:
>
> MessageDigest md = MessageDigest.getInstance("MD5");
>
> // in the loop
> UUID uuid = UUID.randomUUID();
> byte[] bytes = md.digest(asByteArray(uuid));
>
> The result is exactly the same: the first node takes 66%, the second 33%, and
> the third one is empty. For some reason, rows that should be placed on the
> third node moved to the first one.
>
> Address    DC           Rack   Status  State   Load      Effective-Ownership  Token
>                                                                               Token(bytes[56713727820156410577229101238628035242])
> 127.0.0.1  datacenter1  rack1  Up      Normal  7.68 MB   33.33%               Token(bytes[00])
> 127.0.0.3  datacenter1  rack1  Up      Normal  79.17 KB  33.33%               Token(bytes[0113427455640312821154458202477256070485])
> 127.0.0.2  datacenter1  rack1  Up      Normal  3.81 MB   33.33%               Token(bytes[56713727820156410577229101238628035242])
>
> On Thu, Oct 4, 2012 at 12:33 AM, Tom <fivemile...@gmail.com> wrote:
>> Hi Andrey,
>>
>> While the data values you generated might follow a truly random
>> distribution, your row key, a UUID, does not (because it is created on the
>> same machines by the same software within a certain window of time).
>>
>> For example, if you were using the UUID class in Java, these would be
>> composed of several components (related to dimensions such as time and
>> version), so you cannot expect a random distribution over the whole space.
>>
>> Cheers
>> Tom
>>
>> On Wed, Oct 3, 2012 at 5:39 PM, Andrey Ilinykh <ailin...@gmail.com> wrote:
>>>
>>> Hello, everybody!
>>>
>>> I'm observing very strange behavior. I have a 3-node cluster with the
>>> ByteOrderedPartitioner (I run 1.1.5).
>>> I created a keyspace with a replication factor of 1.
>>> Then I created one column family and populated it with random data.
>>> I use a UUID as the row key and an Integer as the column name.
>>> Row keys were generated as
>>>
>>> UUID uuid = UUID.randomUUID();
>>>
>>> I populated about 100000 rows with 100 columns each.
>>>
>>> I would expect equal load on each node, but the result is totally
>>> different. This is what nodetool gives me:
>>>
>>> Address    DC           Rack   Status  State   Load      Effective-Ownership  Token
>>>                                                                               Token(bytes[56713727820156410577229101238628035242])
>>> 127.0.0.1  datacenter1  rack1  Up      Normal  27.61 MB  33.33%               Token(bytes[00])
>>> 127.0.0.3  datacenter1  rack1  Up      Normal  206.47 KB 33.33%               Token(bytes[0113427455640312821154458202477256070485])
>>> 127.0.0.2  datacenter1  rack1  Up      Normal  13.86 MB  33.33%               Token(bytes[56713727820156410577229101238628035242])
>>>
>>> One node (127.0.0.3) is almost empty.
>>> Any ideas what is wrong?
>>>
>>> Thank you,
>>> Andrey
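One plausible reading of the nodetool output, which the thread does not spell out, is that the three tokens are the evenly spaced RandomPartitioner values 0, 2^127 / 3 and 2 × 2^127 / 3. These were written as decimal digits and then read by the ByteOrderedPartitioner as hex bytes. Under BOP a row's ring position is its raw key bytes in unsigned lexicographic order. These tokens therefore cut the key space near leading bytes 0x00, 0x01 and 0x56 rather than into thirds.

The sketch below checks that reading. It replays both experiments from the thread (plain UUID.randomUUID() keys and their MD5 digests) against those tokens. It rests on a few assumptions:

- Each node owns (previous token, own token], with wrap-around.
- A UUID key is serialized as its 16 bytes, most significant half first.
- The class name and the body of asByteArray() are hypothetical.

```java
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.UUID;

public class ByteOrderedPlacement {
    // Token bytes as printed by nodetool in this thread, in ring (sorted) order.
    static final String[] TOKEN_HEX = {
        "00",                                       // 127.0.0.1
        "0113427455640312821154458202477256070485", // 127.0.0.3
        "56713727820156410577229101238628035242"    // 127.0.0.2
    };
    static final String[] NODES = {"127.0.0.1", "127.0.0.3", "127.0.0.2"};

    // Decodes an even-length hex string into raw bytes.
    static byte[] hex(String s) {
        byte[] out = new byte[s.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(s.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    // Hypothetical stand-in for the asByteArray() helper used in the thread:
    // the UUID's 16 bytes, most significant half first.
    static byte[] asByteArray(UUID uuid) {
        return ByteBuffer.allocate(16)
                .putLong(uuid.getMostSignificantBits())
                .putLong(uuid.getLeastSignificantBits())
                .array();
    }

    // Unsigned lexicographic byte comparison; a shorter prefix sorts first.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int c = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
            if (c != 0) return c;
        }
        return Integer.compare(a.length, b.length);
    }

    // A node owns (previous token, own token]; keys above the last token wrap to node 0.
    static int owner(byte[] key, byte[][] tokens) {
        for (int i = 0; i < tokens.length; i++) {
            if (compare(key, tokens[i]) <= 0) return i;
        }
        return 0;
    }

    // Share of keyCount random-UUID row keys (optionally MD5-hashed) owned by each node.
    static double[] simulate(int keyCount, boolean md5) {
        byte[][] tokens = new byte[TOKEN_HEX.length][];
        for (int i = 0; i < tokens.length; i++) tokens[i] = hex(TOKEN_HEX[i]);
        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // every Java platform must provide MD5
        }
        int[] counts = new int[tokens.length];
        for (int k = 0; k < keyCount; k++) {
            byte[] key = asByteArray(UUID.randomUUID());
            if (md5) key = digest.digest(key);
            counts[owner(key, tokens)]++;
        }
        double[] share = new double[counts.length];
        for (int i = 0; i < counts.length; i++) share[i] = counts[i] / (double) keyCount;
        return share;
    }

    public static void main(String[] args) {
        for (boolean md5 : new boolean[] {false, true}) {
            double[] share = simulate(100_000, md5);
            System.out.println(md5 ? "MD5(UUID) keys:" : "raw UUID keys:");
            for (int i = 0; i < NODES.length; i++) {
                System.out.printf("  %s %.1f%%%n", NODES[i], 100 * share[i]);
            }
        }
    }
}
```

Under these assumptions, both kinds of key split the same way:

- roughly two thirds on 127.0.0.1
- roughly one third on 127.0.0.2
- well under 1% on 127.0.0.3

This matches the pattern Andrey reports. It suggests the skew follows from where the tokens sit in byte order rather than from how the keys were generated, which fits Aaron's diagnosis.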