Do you use RandomPartitioner? What does nodetool getendpoints show for several random keys?
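For example (keyspace and column family names taken from your mail below; the row keys are just placeholders, substitute a few real ones from your data):

# nodetool -h localhost getendpoints A UserDetails somekey1
# nodetool -h localhost getendpoints A UserDetails somekey2
# nodetool -h localhost getendpoints A UserDetails somekey3

With RF=3 on a 4-node ring, every node should be a replica for roughly 75% of the key space, so if 192.168.81.2 rarely shows up for your real keys, the column family may simply contain too few distinct rows to touch every replica set.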
-----Original Message-----
From: Bart Swedrowski <b...@timedout.org>
To: user@cassandra.apache.org
Sent: Wed, 14 Dec 2011 12:56
Subject: Re: One ColumnFamily places data on only 3 out of 4 nodes

Anyone?

On 12 December 2011 15:25, Bart Swedrowski <b...@timedout.org> wrote:
> Hello everyone,
>
> I seem to have come across a rather weird (at least for me!) problem /
> behaviour with Cassandra.
>
> I am running a 4-node cluster on Cassandra 0.8.7. For the keyspace in
> question, I have RF=3 and SimpleStrategy, with multiple ColumnFamilies
> inside the KeySpace. One of the ColumnFamilies, however, seems to have
> data distributed across only 3 out of 4 nodes.
>
> The data on the cluster, aside from the problematic ColumnFamily, seems
> to be more or less equal and even.
>
> # nodetool -h localhost ring
> Address       DC          Rack   Status State   Load     Owns    Token
>                                                                  127605887595351923798765477786913079296
> 192.168.81.2  datacenter1 rack1  Up     Normal  7.27 GB  25.00%  0
> 192.168.81.3  datacenter1 rack1  Up     Normal  7.74 GB  25.00%  42535295865117307932921825928971026432
> 192.168.81.4  datacenter1 rack1  Up     Normal  7.38 GB  25.00%  85070591730234615865843651857942052864
> 192.168.81.5  datacenter1 rack1  Up     Normal  7.32 GB  25.00%  127605887595351923798765477786913079296
>
> Schema for the relevant bits of the keyspace is as follows:
>
> [default@A] show schema;
> create keyspace A
>   with placement_strategy = 'SimpleStrategy'
>   and strategy_options = [{replication_factor : 3}];
> [...]
> create column family UserDetails
>   with column_type = 'Standard'
>   and comparator = 'IntegerType'
>   and default_validation_class = 'BytesType'
>   and key_validation_class = 'BytesType'
>   and memtable_operations = 0.571875
>   and memtable_throughput = 122
>   and memtable_flush_after = 1440
>   and rows_cached = 0.0
>   and row_cache_save_period = 0
>   and keys_cached = 200000.0
>   and key_cache_save_period = 14400
>   and read_repair_chance = 1.0
>   and gc_grace = 864000
>   and min_compaction_threshold = 4
>   and max_compaction_threshold = 32
>   and replicate_on_write = true
>   and row_cache_provider = 'ConcurrentLinkedHashCacheProvider';
>
> And now the symptoms - the output of 'nodetool -h localhost cfstats' on
> each node. Please note the figures on node1.
>
> *node1*
>   Column Family: UserDetails
>   SSTable count: 0
>   Space used (live): 0
>   Space used (total): 0
>   Number of Keys (estimate): 0
>   Memtable Columns Count: 0
>   Memtable Data Size: 0
>   Memtable Switch Count: 0
>   Read Count: 0
>   Read Latency: NaN ms.
>   Write Count: 0
>   Write Latency: NaN ms.
>   Pending Tasks: 0
>   Key cache capacity: 200000
>   Key cache size: 0
>   Key cache hit rate: NaN
>   Row cache: disabled
>   Compacted row minimum size: 0
>   Compacted row maximum size: 0
>   Compacted row mean size: 0
>
> *node2*
>   Column Family: UserDetails
>   SSTable count: 3
>   Space used (live): 112952788
>   Space used (total): 164953743
>   Number of Keys (estimate): 384
>   Memtable Columns Count: 159419
>   Memtable Data Size: 74910890
>   Memtable Switch Count: 59
>   Read Count: 135307426
>   Read Latency: 25.900 ms.
>   Write Count: 3474673
>   Write Latency: 0.040 ms.
>   Pending Tasks: 0
>   Key cache capacity: 200000
>   Key cache size: 120
>   Key cache hit rate: 0.999971684189041
>   Row cache: disabled
>   Compacted row minimum size: 42511
>   Compacted row maximum size: 74975550
>   Compacted row mean size: 42364305
>
> *node3*
>   Column Family: UserDetails
>   SSTable count: 3
>   Space used (live): 112953137
>   Space used (total): 112953137
>   Number of Keys (estimate): 384
>   Memtable Columns Count: 159421
>   Memtable Data Size: 74693445
>   Memtable Switch Count: 56
>   Read Count: 135304486
>   Read Latency: 25.552 ms.
>   Write Count: 3474616
>   Write Latency: 0.036 ms.
>   Pending Tasks: 0
>   Key cache capacity: 200000
>   Key cache size: 109
>   Key cache hit rate: 0.9999716840888175
>   Row cache: disabled
>   Compacted row minimum size: 42511
>   Compacted row maximum size: 74975550
>   Compacted row mean size: 42364305
>
> *node4*
>   Column Family: UserDetails
>   SSTable count: 3
>   Space used (live): 117070926
>   Space used (total): 119479484
>   Number of Keys (estimate): 384
>   Memtable Columns Count: 159979
>   Memtable Data Size: 75029672
>   Memtable Switch Count: 60
>   Read Count: 135294878
>   Read Latency: 19.455 ms.
>   Write Count: 3474982
>   Write Latency: 0.028 ms.
>   Pending Tasks: 0
>   Key cache capacity: 200000
>   Key cache size: 119
>   Key cache hit rate: 0.9999752235777154
>   Row cache: disabled
>   Compacted row minimum size: 2346800
>   Compacted row maximum size: 62479625
>   Compacted row mean size: 42591803
>
> When I go to the 'data' directory on node1, there are no files at all for
> the UserDetails ColumnFamily.
>
> I tried performing a manual repair in the hope it would heal the
> situation, however without any luck.
>
> # nodetool -h localhost repair A UserDetails
>  INFO 15:19:54,611 Starting repair command #8, repairing 3 ranges.
>  INFO 15:19:54,647 Sending AEService tree for #<TreeRequest manual-repair-89c1acb0-184c-438f-bab8-7ceed27980ec, /192.168.81.2, (A,UserDetails), (85070591730234615865843651857942052864,127605887595351923798765477786913079296]>
>  INFO 15:19:54,742 Endpoints /192.168.81.2 and /192.168.81.3 are consistent for UserDetails on (85070591730234615865843651857942052864,127605887595351923798765477786913079296]
>  INFO 15:19:54,750 Endpoints /192.168.81.2 and /192.168.81.5 are consistent for UserDetails on (85070591730234615865843651857942052864,127605887595351923798765477786913079296]
>  INFO 15:19:54,751 Repair session manual-repair-89c1acb0-184c-438f-bab8-7ceed27980ec (on cfs [Ljava.lang.String;@3491507b, range (85070591730234615865843651857942052864,127605887595351923798765477786913079296]) completed successfully
>  INFO 15:19:54,816 Sending AEService tree for #<TreeRequest manual-repair-6d2438ca-a05c-4217-92c7-c2ad563a92dd, /192.168.81.2, (A,UserDetails), (42535295865117307932921825928971026432,85070591730234615865843651857942052864]>
>  INFO 15:19:54,865 Endpoints /192.168.81.2 and /192.168.81.4 are consistent for UserDetails on (42535295865117307932921825928971026432,85070591730234615865843651857942052864]
>  INFO 15:19:54,874 Endpoints /192.168.81.2 and /192.168.81.5 are consistent for UserDetails on (42535295865117307932921825928971026432,85070591730234615865843651857942052864]
>  INFO 15:19:54,874 Repair session manual-repair-6d2438ca-a05c-4217-92c7-c2ad563a92dd (on cfs [Ljava.lang.String;@7e541d08, range (42535295865117307932921825928971026432,85070591730234615865843651857942052864]) completed successfully
>  INFO 15:19:54,909 Sending AEService tree for #<TreeRequest manual-repair-98d1a21c-9d6e-41c8-8917-aea70f716243, /192.168.81.2, (A,UserDetails), (127605887595351923798765477786913079296,0]>
>  INFO 15:19:54,967 Endpoints /192.168.81.2 and /192.168.81.3 are consistent for UserDetails on (127605887595351923798765477786913079296,0]
>  INFO 15:19:54,974 Endpoints /192.168.81.2 and /192.168.81.4 are consistent for UserDetails on (127605887595351923798765477786913079296,0]
>  INFO 15:19:54,975 Repair session manual-repair-98d1a21c-9d6e-41c8-8917-aea70f716243 (on cfs [Ljava.lang.String;@48c651f2, range (127605887595351923798765477786913079296,0]) completed successfully
>  INFO 15:19:54,975 Repair command #8 completed successfully
>
> As I am using SimpleStrategy, I would expect the keys to be split more or
> less equally across the nodes, but this doesn't seem to be the case.
>
> Has anyone come across similar behaviour before? Does anyone have any
> suggestions as to what I could do to bring some data onto node1?
> Obviously, this kind of data split means node2, node3 and node4 have to
> do all the read work, which is not ideal.
>
> Any suggestions much appreciated.
>
> Kind regards,
> Bart
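If you want to double-check placement offline, you can also compute the RandomPartitioner token yourself and walk the ring. A rough Python sketch using the tokens from the ring output quoted above (the row keys here are hypothetical, substitute your own):

from hashlib import md5

# token -> node, copied from the nodetool ring output in the quoted mail
ring = {
    0: "192.168.81.2",
    42535295865117307932921825928971026432: "192.168.81.3",
    85070591730234615865843651857942052864: "192.168.81.4",
    127605887595351923798765477786913079296: "192.168.81.5",
}
RF = 3

def rp_token(key):
    # RandomPartitioner token: absolute value of the MD5 digest read as a signed integer
    return abs(int.from_bytes(md5(key).digest(), "big", signed=True))

def replicas(key):
    # SimpleStrategy: the node owning the key's token, plus the next RF-1 nodes clockwise
    tokens = sorted(ring)
    t = rp_token(key)
    start = next((i for i, tok in enumerate(tokens) if tok >= t), 0)
    return [ring[tokens[(start + i) % len(tokens)]] for i in range(RF)]

for key in (b"user:1", b"user:2", b"user:3"):   # hypothetical row keys
    print(key.decode(), "->", replicas(key))

Any key whose computed replica list does not include 192.168.81.2 will never store data on node1, which is the kind of thing getendpoints should confirm.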