Thanks for the hint and the tool! By the way, what does the --shards parameter mean?

Thanks

Loic

On 07/10/2017 05:20 PM, Avi Kivity wrote:
> 32 tokens is too few for 33 nodes. I have a sharding simulator [1] and
> it shows
>
> $ ./shardsim --vnodes 32 --nodes 33 --shards 1
> 33 nodes, 32 vnodes, 1 shards
> maximum node overcommit: 1.42642
> maximum shard overcommit: 1.426417
>
> So 40% overcommit over the average. Since some nodes can be
> undercommitted, this easily explains the 2X difference (40% overcommit +
> 30% undercommit = 2X).
>
> Newer versions of Cassandra have better token selection and will suffer
> less from this.
>
> [1] https://github.com/avikivity/shardsim
>
> On 07/10/2017 04:02 PM, Loic Lambiel wrote:
>> Hi,
>>
>> One of our clusters is becoming somehow unbalanced, at least some of the
>> nodes:
>>
>> (output edited to remove unnecessary information)
>> --  Address        Load     Tokens  Owns (effective)  Rack
>> UN  192.168.1.22   2.99 TB  32      10.6%             RACK1
>> UN  192.168.1.23   3.35 TB  32      11.7%             RACK1
>> UN  192.168.1.20   3.22 TB  32      11.3%             RACK1
>> UN  192.168.1.21   3.21 TB  32      11.2%             RACK1
>> UN  192.168.1.18   2.87 TB  32      10.3%             RACK1
>> UN  192.168.1.19   3.49 TB  32      12.0%             RACK1
>> UN  192.168.1.16   5.32 TB  32      12.9%             RACK1
>> UN  192.168.1.17   3.77 TB  32      12.0%             RACK1
>> UN  192.168.1.26   4.46 TB  32      11.2%             RACK1
>> UN  192.168.1.24   3.24 TB  32      11.4%             RACK1
>> UN  192.168.1.25   3.31 TB  32      11.2%             RACK1
>> UN  192.168.1.134  2.75 TB  18      7.2%              RACK1
>> UN  192.168.1.135  2.52 TB  18      6.0%              RACK1
>> UN  192.168.1.132  1.85 TB  18      6.8%              RACK1
>> UN  192.168.1.133  2.41 TB  18      5.7%              RACK1
>> UN  192.168.1.130  2.95 TB  18      7.1%              RACK1
>> UN  192.168.1.131  2.82 TB  18      6.7%              RACK1
>> UN  192.168.1.128  3.04 TB  18      7.1%              RACK1
>> UN  192.168.1.129  2.47 TB  18      7.2%              RACK1
>> UN  192.168.1.14   5.63 TB  32      13.4%             RACK1
>> UN  192.168.1.15   2.95 TB  32      10.4%             RACK1
>> UN  192.168.1.12   3.83 TB  32      12.4%             RACK1
>> UN  192.168.1.13   2.71 TB  32      9.5%              RACK1
>> UN  192.168.1.10   3.51 TB  32      11.9%             RACK1
>> UN  192.168.1.11   2.96 TB  32      10.3%             RACK1
>> UN  192.168.1.126  2.48 TB  18      6.7%              RACK1
>> UN  192.168.1.127  2.23 TB  18      5.5%              RACK1
>> UN  192.168.1.124  2.05 TB  18      5.5%              RACK1
>> UN  192.168.1.125  2.33 TB  18      5.8%              RACK1
>> UN  192.168.1.122  1.99 TB  18      5.1%              RACK1
>> UN  192.168.1.123  2.44 TB  18      5.7%              RACK1
>> UN  192.168.1.120  3.58 TB  28      11.4%             RACK1
>> UN  192.168.1.121  2.33 TB  18      6.8%              RACK1
>>
>> Notice the node 192.168.1.14 owns 13.4% / 5.63TB while node
>> 192.168.1.13 owns only 9.5% / 2.71TB, which is almost twice the load.
>> They both have 32 tokens.
>>
>> The cluster is running:
>>
>> * Cassandra 2.1.16 (initially bootstrapped running 2.1.2, with vnodes
>> enabled)
>> * RF=3 with single DC and single rack. LCS as the compaction strategy,
>> JBOD storage
>> * Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
>> * Node cleanup performed on all nodes
>>
>> Almost all of the cluster load comes from a single CF:
>>
>> CREATE TABLE blobstore.block (
>>     inode uuid,
>>     version timeuuid,
>>     block bigint,
>>     offset bigint,
>>     chunksize int,
>>     payload blob,
>>     PRIMARY KEY ((inode, version, block), offset)
>> ) WITH CLUSTERING ORDER BY (offset ASC)
>>     AND bloom_filter_fp_chance = 0.01
>>     AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
>>     AND comment = ''
>>     AND compaction = {'tombstone_threshold': '0.1',
>> 'tombstone_compaction_interval': '60', 'unchecked_tombstone_compaction':
>> 'false', 'class':
>> 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'}
>>     AND compression = {'sstable_compression':
>> 'org.apache.cassandra.io.compress.LZ4Compressor'}
>>     AND dclocal_read_repair_chance = 0.1
>>     AND default_time_to_live = 0
>>     AND gc_grace_seconds = 172000
>>     AND max_index_interval = 2048
>>     AND memtable_flush_period_in_ms = 0
>>     AND min_index_interval = 128
>>     AND read_repair_chance = 0.0
>>     AND speculative_retry = '99.0PERCENTILE';
>>
>> The payload column is almost the same size in each record.
>>
>> I understand that an unbalanced cluster may be the result of a bad
>> Primary key, which I believe isn't the case here.
>>
>> Any clue on what could be the cause ? How can I re-balance it without
>> any decommission ?
>>
>> My understanding is that nodetool move may only be used when not using
>> the vnodes feature.
>>
>> Any help appreciated, thanks !
>>
>> ----
>> Loic Lambiel
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
>> For additional commands, e-mail: user-h...@cassandra.apache.org
>>
>

--
Loic Lambiel
Head of Operations
Tel : +41 78 649 53 93
loic.lamb...@exoscale.ch
https://www.exoscale.ch
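
For readers of the archive, the overcommit argument quoted above can be approximated with a small Monte Carlo sketch. This is not shardsim and does not reproduce its output: it assumes purely random token selection, counts only primary ownership (it ignores RF=3 and shards), and the function names `simulate_ownership` and `max_overcommit` are made up for illustration.

```python
import random

# Murmur3Partitioner tokens span 2**64 values; the ring is modelled here
# as the integers 0 .. 2**64 - 1 for simplicity (an assumption of this sketch).
RING_SIZE = 2**64


def simulate_ownership(nodes, vnodes, rng):
    """Return the fraction of the ring primarily owned by each node.

    Assumes every node picks `vnodes` tokens uniformly at random.
    Replication and shards are deliberately ignored.
    """
    tokens = []
    for node in range(nodes):
        for _ in range(vnodes):
            tokens.append((rng.randrange(RING_SIZE), node))
    tokens.sort()

    owned = [0] * nodes
    # The first token's range wraps around from the last token on the ring.
    prev = tokens[-1][0] - RING_SIZE
    for token, node in tokens:
        # A token owns the range (previous token, token].
        owned[node] += token - prev
        prev = token
    return [o / RING_SIZE for o in owned]


def max_overcommit(ownership):
    """Ratio of the most loaded node's share to the average share."""
    average = sum(ownership) / len(ownership)
    return max(ownership) / average


if __name__ == "__main__":
    rng = random.Random()
    own = simulate_ownership(nodes=33, vnodes=32, rng=rng)
    average = sum(own) / len(own)
    print("maximum node overcommit:", max_overcommit(own))
    print("minimum node share vs average:", min(own) / average)
    print("max/min ratio:", max(own) / min(own))
```

Each run draws different tokens, so the printed values vary. The "2X" in the quoted reply is the ratio between the two extremes: a node 40% above the average compared with one 30% below it gives 1.4 / 0.7 = 2.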