In our project many data records share the same keys, i.e. a given key may
exist in several of our caches (with different values, though). We suspect
this is the cause of the non-uniform data distribution across cluster nodes
that we are seeing, since all records sharing the same key end up on the
same nodes. One important note on this: our keys are plain strings, and we
have not used affinity keys so far to influence the assignment of keys to
partitions or nodes.
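To illustrate the collision we suspect, here is a small standalone sketch (plain Java with a simplified hash-based partition function, not Ignite's actual RendezvousAffinityFunction internals; the key `"customer-42"` is hypothetical): with any partition function that depends only on the key, the same string key maps to the same partition in every cache.

```java
public class SamePartitionDemo {

    /** Simplified default partition count (1024 in Ignite's RendezvousAffinityFunction). */
    public static final int PARTITIONS = 1024;

    /** Simplified stand-in for a purely key-based partition assignment. */
    public static int partition(Object key) {
        return Math.abs(key.hashCode() % PARTITIONS);
    }

    public static void main(String[] args) {
        String key = "customer-42";      // hypothetical key shared by several caches
        int pCacheA = partition(key);    // partition seen by "cache A"
        int pCacheB = partition(key);    // partition seen by "cache B"
        // Same key, same partition function => same partition in both caches.
        System.out.println(pCacheA == pCacheB); // always true
    }
}
```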

To achieve roughly uniform data distribution while keeping the refactoring
effort small, we thought of extending RendezvousAffinityFunction in the
following manner:

public class CustomizedRendezvousAffinityFunction extends RendezvousAffinityFunction {

    private static final long serialVersionUID = 1L;

    /** Name of the cache, used to make keys unique across caches. */
    private final String cacheName;

    /**
     * Constructor.
     *
     * @param cacheName Name of the cache.
     */
    public CustomizedRendezvousAffinityFunction(String cacheName) {
        this.cacheName = cacheName;
    }

    /** {@inheritDoc} */
    @Override
    public int partition(Object key) {
        // Prefix the key with the cache name so that identical keys in
        // different caches map to different partitions.
        key = cacheName + key.toString();
        return super.partition(key);
    }
}

followed by cc.setAffinity(new CustomizedRendezvousAffinityFunction(cacheName));
for each cache configuration. The concatenated key cacheName + key is then
unique across caches, and its assignment to a partition should be
(pseudo-)random.
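The intended effect can be sketched standalone (again with a simplified hash-based partition function rather than Ignite's actual rendezvous hashing; cache names and the key are hypothetical): once the cache name is prefixed, the same logical key typically lands in different partitions for different caches.

```java
public class PrefixedKeyDemo {

    /** Simplified default partition count (1024 in Ignite's RendezvousAffinityFunction). */
    public static final int PARTITIONS = 1024;

    /** Simplified stand-in for a purely key-based partition assignment. */
    public static int partition(Object key) {
        return Math.abs(key.hashCode() % PARTITIONS);
    }

    public static void main(String[] args) {
        String key = "customer-42"; // hypothetical key shared by several caches
        // Prefixing with the cache name makes the hashed key unique per cache,
        // so the two caches will usually get different partitions for it.
        int pA = partition("cacheA" + key);
        int pB = partition("cacheB" + key);
        System.out.println(pA + " vs " + pB);
    }
}
```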

Unfortunately, the result on a 10-node cluster is not what we expected:
there is considerable variance, e.g. between node1 and node7:

node1 :     ^-- Off-heap [used=23633MB, free=34.43%, comm=35920MB]
node2 :     ^-- Off-heap [used=29234MB, free=18.88%, comm=35920MB]
node3 :     ^-- Off-heap [used=25834MB, free=28.32%, comm=35920MB]
node4 :     ^-- Off-heap [used=27847MB, free=22.73%, comm=35920MB]
node5 :     ^-- Off-heap [used=28189MB, free=21.78%, comm=35920MB]
node6 :     ^-- Off-heap [used=26167MB, free=27.39%, comm=35920MB]
node7 :     ^-- Off-heap [used=29565MB, free=17.96%, comm=35920MB]
node8 :     ^-- Off-heap [used=25628MB, free=28.89%, comm=35920MB]
node9 :     ^-- Off-heap [used=26583MB, free=26.24%, comm=35920MB]
node10 :     ^-- Off-heap [used=24573MB, free=31.82%, comm=35920MB]

Is our approach flawed, or what might be the reason for the remaining
imbalance? We use Ignite 2.7.6.
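For comparison, here is a back-of-the-envelope simulation we could use as a baseline (our own sketch; it assumes equally sized records, uniform key hashing over 1024 partitions, which is the default partition count of RendezvousAffinityFunction, and an even partition-to-node mapping, none of which Ignite guarantees exactly). Under these assumptions the per-node spread stays in the low single-digit percent range, while the gap between node1 and node7 above is roughly 20%.

```java
import java.util.Random;

public class ImbalanceSimulation {

    /**
     * Relative spread (max - min) / mean of per-node record counts when
     * uniformly random keys are spread over {@code partitions} partitions
     * that are mapped evenly onto {@code nodes} nodes.
     */
    public static double simulateSpread(int partitions, int nodes, int keys, long seed) {
        long[] perNode = new long[nodes];
        Random rnd = new Random(seed);
        for (int i = 0; i < keys; i++) {
            int part = rnd.nextInt(partitions); // uniform key -> partition
            perNode[part % nodes]++;            // even partition -> node mapping
        }
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        for (long c : perNode) {
            min = Math.min(min, c);
            max = Math.max(max, c);
        }
        return (max - min) / (keys / (double) nodes);
    }

    public static void main(String[] args) {
        // 1024 partitions, 10 nodes, one million hypothetical equal-sized records.
        double spread = simulateSpread(1024, 10, 1_000_000, 42L);
        System.out.printf("relative spread: %.2f%%%n", spread * 100);
    }
}
```

If the simulated spread under these idealized assumptions is much smaller than what we observe, that would suggest the imbalance comes from unequal record sizes or key skew rather than from the partition function itself.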

Thanks in advance!




--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
