Re: Why data is not evenly distributed.

aaron morton Mon, 08 Oct 2012 12:45:53 -0700

This is an issue with using the ByteOrderedPartitioner (BOP).

If you are just starting out, stick with the RandomPartitioner.
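The BOP point can be made concrete with a little arithmetic on the tokens from the nodetool output. Below is a minimal Java sketch (not from this thread; the class and method names are hypothetical) that estimates the share of the key space each node owns under byte-ordered placement. It assumes the token strings are hex-encoded bytes compared in unsigned byte order, that each node owns the range from the previous token (exclusive) up to its own token (inclusive), that key bytes are uniformly random (as with UUID.randomUUID() or an MD5 digest), and that the first 8 bytes of a token give enough precision.

```java
public class BopOwnership {

    // Map a BOP token (hex string of bytes) to a position in [0, 1) by reading it
    // as the base-256 fraction 0.b0 b1 b2 ... (only the first 8 bytes are used).
    static double position(String hexToken) {
        double pos = 0.0;
        double scale = 1.0 / 256;
        for (int i = 0; i + 1 < hexToken.length() && i < 16; i += 2) {
            int b = Integer.parseInt(hexToken.substring(i, i + 2), 16);
            pos += b * scale;
            scale /= 256;
        }
        return pos;
    }

    // Share of a uniformly random key space owned by each node, tokens in ring order:
    // node i owns (token[i-1], token[i]]; the lowest token also takes the
    // wrap-around range above the highest token.
    static double[] ownership(String[] sortedHexTokens) {
        int n = sortedHexTokens.length;
        double[] share = new double[n];
        for (int i = 0; i < n; i++) {
            double hi = position(sortedHexTokens[i]);
            double lo = position(sortedHexTokens[(i + n - 1) % n]);
            share[i] = hi > lo ? hi - lo : 1.0 - lo + hi;
        }
        return share;
    }

    public static void main(String[] args) {
        // Nodes and tokens as listed by nodetool ring (already in byte order).
        String[] nodes = {"127.0.0.1", "127.0.0.3", "127.0.0.2"};
        String[] tokens = {
            "00",
            "0113427455640312821154458202477256070485",
            "56713727820156410577229101238628035242"
        };
        double[] share = ownership(tokens);
        for (int i = 0; i < nodes.length; i++) {
            System.out.printf("%s owns %.2f%% of the key space%n", nodes[i], 100 * share[i]);
        }
    }
}
```

Run against those tokens, it puts 127.0.0.1 at about 66%, 127.0.0.2 at about 33% and 127.0.0.3 at under 1%, which lines up with the Load column (27.61 MB, 13.86 MB, 206.47 KB) even though Effective-Ownership shows 33.33% for every node. Hashing the keys with MD5 first does not change this, because a digest is just as uniform over the byte space as a random UUID.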


Cheers


-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 5/10/2012, at 10:33 AM, Andrey Ilinykh <ailin...@gmail.com> wrote:

> That was my first thought.
> So I hashed each UUID with MD5 and used the digest as the key:
> 
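> import java.security.MessageDigest;
> import java.util.UUID;
> 
> MessageDigest md = MessageDigest.getInstance("MD5");
> 
> // in the loop; asByteArray() is a helper (not shown) that converts the UUID to a byte array
> UUID uuid = UUID.randomUUID();
> byte[] bytes = md.digest(asByteArray(uuid)); // 16-byte MD5 digest, used as the row key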
> 
> The result is exactly the same: the first node takes 66%, the second 33%,
> and the third one is empty. For some reason, rows that should be placed on
> the third node moved to the first one.
> 
> Address         DC          Rack        Status State   Load            Effective-Ownership Token
>                                                                                            Token(bytes[56713727820156410577229101238628035242])
> 127.0.0.1       datacenter1 rack1       Up     Normal  7.68 MB         33.33%              Token(bytes[00])
> 127.0.0.3       datacenter1 rack1       Up     Normal  79.17 KB        33.33%              Token(bytes[0113427455640312821154458202477256070485])
> 127.0.0.2       datacenter1 rack1       Up     Normal  3.81 MB         33.33%              Token(bytes[56713727820156410577229101238628035242])
> 
> 
> 
> On Thu, Oct 4, 2012 at 12:33 AM, Tom <fivemile...@gmail.com> wrote:
>> Hi Andrey,
>> 
>> While the data values you generated might follow a truly random
>> distribution, your row key (a UUID) does not, because it is created on the
>> same machines by the same software within a certain window of time.
>> 
>> For example, if you were using the UUID class in Java, these would be
>> composed from several components (related to dimensions such as time and
>> version), so you cannot expect a random distribution over the whole space.
>> 
>> 
>> Cheers
>> Tom
>> 
>> 
>> 
>> 
>> On Wed, Oct 3, 2012 at 5:39 PM, Andrey Ilinykh <ailin...@gmail.com> wrote:
>>> 
>>> Hello, everybody!
>>> 
>>> I'm observing very strange behavior. I have a 3-node cluster with
>>> ByteOrderedPartitioner (I'm running 1.1.5).
>>> I created a keyspace with a replication factor of 1.
>>> Then I created one column family and populated it with random data.
>>> I use a UUID as the row key and an Integer as the column name.
>>> Row keys were generated as
>>> 
>>> UUID uuid = UUID.randomUUID();
>>> 
>>> I populated about 100000 rows with 100 columns each.
>>> 
>>> I would expect equal load on each node, but the result is totally
>>> different. This is what nodetool gives me:
>>> 
>>> Address         DC          Rack        Status State   Load            Effective-Ownership Token
>>>                                                                                            Token(bytes[56713727820156410577229101238628035242])
>>> 127.0.0.1       datacenter1 rack1       Up     Normal  27.61 MB        33.33%              Token(bytes[00])
>>> 127.0.0.3       datacenter1 rack1       Up     Normal  206.47 KB       33.33%              Token(bytes[0113427455640312821154458202477256070485])
>>> 127.0.0.2       datacenter1 rack1       Up     Normal  13.86 MB        33.33%              Token(bytes[56713727820156410577229101238628035242])
>>> 
>>> 
>>> One node (127.0.0.3) is almost empty.
>>> Any ideas what is wrong?
>>> 
>>> 
>>> Thank you,
>>>  Andrey
>> 
>>
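The three token values also hint at where the skew comes from: they appear to be the evenly spaced RandomPartitioner tokens i * 2^127 / 3 written in decimal, which the ByteOrderedPartitioner instead treats as hex-encoded bytes. The Java sketch below (not from this thread; the class and method names are hypothetical) computes those RandomPartitioner tokens and the leading byte a BOP would see. It assumes an odd-length string gets a leading "0" to make whole bytes, as the third token does in the nodetool output.

```java
import java.math.BigInteger;

public class TokenMixup {

    // Evenly spaced initial tokens for RandomPartitioner, whose token space is 0 .. 2^127:
    // token_i = i * 2^127 / n (integer division).
    static BigInteger[] balancedTokens(int n) {
        BigInteger range = BigInteger.ONE.shiftLeft(127);
        BigInteger[] tokens = new BigInteger[n];
        for (int i = 0; i < n; i++) {
            tokens[i] = range.multiply(BigInteger.valueOf(i)).divide(BigInteger.valueOf(n));
        }
        return tokens;
    }

    // The decimal digits as a BOP token string would hold them; odd-length values get a
    // leading "0" so they form whole bytes (assumption, matching "0113..." in the output).
    static String asBopHex(BigInteger token) {
        String digits = token.toString();
        return digits.length() % 2 == 0 ? digits : "0" + digits;
    }

    public static void main(String[] args) {
        for (BigInteger token : balancedTokens(3)) {
            String hex = asBopHex(token);
            // BOP compares keys byte by byte, so the first byte decides most of the range.
            int firstByte = Integer.parseInt(hex.substring(0, 2), 16);
            System.out.printf("%s -> Token(bytes[%s]), first byte 0x%02x%n", token, hex, firstByte);
        }
    }
}
```

Read as bytes, the tokens start with 0x00, 0x01 and 0x56, so the range from 0x00 to 0x01 covers under 1% of the byte-ordered key space, 0x01 to 0x56 about a third, and the wrap-around from 0x56 back to 0x00 about two thirds, rather than three equal parts.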
