getRangeToEndpointMap is very useful, thanks, I didn't know about it... However, I've reconfigured my cluster since (moved some nodes and tokens), so now the problem is gone. I guess I'll use getRangeToEndpointMap next time I see something like this...
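In case it helps anyone else who lands on this thread: below is a minimal sketch of calling that operation from a standalone Java JMX client. The host, the port (8080), the keyspace name "Keyspace1", and the exact operation signature are assumptions, not something from this thread - check your node's JMX settings and the StorageService MBean in jconsole first.

import java.util.Map;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class RangeToEndpointDump {
    public static void main(String[] args) throws Exception {
        // Assumed JMX endpoint - point this at any live node; the port may differ per install.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://192.168.252.99:8080/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            ObjectName ss = new ObjectName("org.apache.cassandra.service:type=StorageService");
            // Assumption: getRangeToEndpointMap takes the keyspace name; the argument
            // list and the exact return type may differ by Cassandra version.
            Map<?, ?> rangeToEndpoints = (Map<?, ?>) mbs.invoke(
                    ss,
                    "getRangeToEndpointMap",
                    new Object[] { "Keyspace1" },
                    new String[] { "java.lang.String" });
            // Print each range and the endpoints responsible for it.
            for (Map.Entry<?, ?> e : rangeToEndpoints.entrySet()) {
                System.out.println(e.getKey() + " -> " + e.getValue());
            }
        } finally {
            connector.close();
        }
    }
}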
On Thu, Jun 3, 2010 at 9:15 AM, Jonathan Ellis <jbel...@gmail.com> wrote:
> Then the next step is to check StorageService.getRangeToEndpointMap via jmx
>
> On Tue, Jun 1, 2010 at 11:56 AM, Ran Tavory <ran...@gmail.com> wrote:
> > I'm using RackAwareStrategy. But it still doesn't make sense, I think...
> > let's see what I missed... According to http://wiki.apache.org/cassandra/Operations
> >
> > RackAwareStrategy: replica 2 is placed on the first node along the ring that
> > belongs to another data center than the first; the remaining N-2 replicas, if
> > any, are placed on the first nodes along the ring in the same rack as the first.
> >
> > 192.168.252.124  Up  803.33 MB  56713727820156410577229101238628035242   |<--|
> > 192.168.252.99   Up  352.85 MB  56713727820156410577229101238628035243   |   ^
> > 192.168.252.125  Up  134.24 MB  85070591730234615865843651857942052863   v   |
> > 192.168.254.57   Up  676.41 MB  113427455640312821154458202477256070485  |   ^
> > 192.168.254.58   Up  99.74 MB   141784319550391026443072753096570088106  v   |
> > 192.168.254.59   Up  99.94 MB   170141183460469231731687303715884105727  |-->|
> >
> > Alright, so I made a mistake and didn't use the alternate-datacenter
> > suggestion on the page, so the first node of every DC is overloaded with
> > replicas. However, the current situation still doesn't make sense to me.
> > .252.124 will be overloaded b/c it has the first token in the .252 DC.
> > .254.57 will also be overloaded since it has the first token in the .254 DC.
> > But for which node does .252.99 serve as a replica? It's not the first in the
> > DC and it's just one single token more than its predecessor (which is in the
> > same DC).
> >
> > On Tue, Jun 1, 2010 at 4:00 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
> >>
> >> I'm saying that .99 is getting a copy of all the data for which .124
> >> is the primary. (If you are using RackUnawarePartitioner. If you are
> >> using RackAware it is some other node.)
> >>
> >> On Tue, Jun 1, 2010 at 1:25 AM, Ran Tavory <ran...@gmail.com> wrote:
> >> > ok, let me try and translate your answer ;)
> >> > Are you saying that the data that was left on the node is
> >> > non-primary replicas of rows from the time before the move?
> >> > So this implies that when a node moves in the ring, it will affect the
> >> > distribution of:
> >> > - new keys
> >> > - old keys' primary node
> >> > -- but will not affect the distribution of old keys' non-primary replicas.
> >> > If so, I still don't understand something... I would expect even the
> >> > non-primary replicas of keys to be moved, since if they aren't, how would
> >> > they be found? I mean, upon reads the serving node should not care about
> >> > whether the row is new or old; it should have a consistent and global
> >> > mapping of tokens. So I guess this ruins my theory...
> >> > What did you mean then? Is this about deletions of non-primary replicated
> >> > data? How does the replication factor affect the load on the moved host then?
> >> >
> >> > On Tue, Jun 1, 2010 at 1:19 AM, Jonathan Ellis <jbel...@gmail.com> wrote:
> >> >>
> >> >> well, there you are then.
> >> >>
> >> >> On Mon, May 31, 2010 at 2:34 PM, Ran Tavory <ran...@gmail.com> wrote:
> >> >> > yes, replication factor = 2
> >> >> >
> >> >> > On Mon, May 31, 2010 at 10:07 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
> >> >> >>
> >> >> >> you have replication factor > 1 ?
> >> >> >>
> >> >> >> On Mon, May 31, 2010 at 7:23 AM, Ran Tavory <ran...@gmail.com> wrote:
> >> >> >> > I hope I understand nodetool cleanup correctly - it should clean up
> >> >> >> > all data that does not (currently) belong to this node. If so, I
> >> >> >> > think it might not be working correctly.
> >> >> >> > Look at nodes 192.168.252.124 and 192.168.252.99 below:
> >> >> >> >
> >> >> >> > 192.168.252.99   Up  279.35 MB  3544607988759775661076818827414252202    |<--|
> >> >> >> > 192.168.252.124  Up  167.23 MB  56713727820156410577229101238628035242   |   ^
> >> >> >> > 192.168.252.125  Up  82.91 MB   85070591730234615865843651857942052863   v   |
> >> >> >> > 192.168.254.57   Up  366.6 MB   113427455640312821154458202477256070485  |   ^
> >> >> >> > 192.168.254.58   Up  88.44 MB   141784319550391026443072753096570088106  v   |
> >> >> >> > 192.168.254.59   Up  88.45 MB   170141183460469231731687303715884105727  |-->|
> >> >> >> >
> >> >> >> > I wanted 124 to take all the load from 99, so I issued a move command:
> >> >> >> > $ nodetool -h cass99 -p 9004 move 56713727820156410577229101238628035243
> >> >> >> >
> >> >> >> > This command tells 99 to take the range
> >> >> >> > (56713727820156410577229101238628035242, 56713727820156410577229101238628035243],
> >> >> >> > which is basically just one item in the token space, almost nothing...
> >> >> >> > I wanted it to be very slim (just playing around).
> >> >> >> > So, next I get this:
> >> >> >> >
> >> >> >> > 192.168.252.124  Up  803.33 MB  56713727820156410577229101238628035242   |<--|
> >> >> >> > 192.168.252.99   Up  352.85 MB  56713727820156410577229101238628035243   |   ^
> >> >> >> > 192.168.252.125  Up  134.24 MB  85070591730234615865843651857942052863   v   |
> >> >> >> > 192.168.254.57   Up  676.41 MB  113427455640312821154458202477256070485  |   ^
> >> >> >> > 192.168.254.58   Up  99.74 MB   141784319550391026443072753096570088106  v   |
> >> >> >> > 192.168.254.59   Up  99.94 MB   170141183460469231731687303715884105727  |-->|
> >> >> >> >
> >> >> >> > The tokens are correct, but it seems that 99 still has a lot of data. Why?
> >> >> >> > OK, that might be b/c it didn't delete its moved data.
> >> >> >> > So next I issued a nodetool cleanup, which should have taken care of that.
> >> >> >> > Only it didn't; node 99 still has 352 MB of data. Why?
> >> >> >> > So, you know what, I waited for 1h. Still no good, the data wasn't cleaned up.
> >> >> >> > I restarted the server. Still, the data wasn't cleaned up... I issued a
> >> >> >> > cleanup again... still no good... what's up with this node?
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
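For anyone puzzling over the RackAwareStrategy placement rule quoted above, here is a rough, self-contained sketch of the ring walk it describes. This is not Cassandra's actual RackAwareStrategy code: the Node class, the DC labels taken from the third octet, and collapsing "rack" into "data center" are simplifications for illustration only.

import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class ReplicaWalk {

    // A toy ring node: its address and the data center it lives in.
    static class Node {
        final String address;
        final String dc;
        Node(String address, String dc) { this.address = address; this.dc = dc; }
    }

    // Picks rf replicas for the range owned by ring.get(primaryIndex), following
    // the quoted description: the primary first, then the first node along the
    // ring in a *different* data center, then the next nodes along the ring in
    // the *same* data center as the primary. The ring list must already be
    // sorted by token; "rack" is collapsed into "data center" to keep it short.
    static List<Node> replicasFor(List<Node> ring, int primaryIndex, int rf) {
        Set<Node> replicas = new LinkedHashSet<>();
        Node primary = ring.get(primaryIndex);
        replicas.add(primary);
        int n = ring.size();
        // Replica 2: the first node along the ring in another data center.
        for (int i = 1; i < n && rf >= 2; i++) {
            Node c = ring.get((primaryIndex + i) % n);
            if (!c.dc.equals(primary.dc)) { replicas.add(c); break; }
        }
        // Remaining replicas: the next nodes along the ring in the primary's DC.
        for (int i = 1; i < n && replicas.size() < rf; i++) {
            Node c = ring.get((primaryIndex + i) % n);
            if (c.dc.equals(primary.dc)) { replicas.add(c); }
        }
        return new ArrayList<>(replicas);
    }

    public static void main(String[] args) {
        List<Node> ring = new ArrayList<>();          // sorted by token, as in the
        ring.add(new Node("192.168.252.124", "252")); // nodetool ring output above
        ring.add(new Node("192.168.252.99",  "252"));
        ring.add(new Node("192.168.252.125", "252"));
        ring.add(new Node("192.168.254.57",  "254"));
        ring.add(new Node("192.168.254.58",  "254"));
        ring.add(new Node("192.168.254.59",  "254"));
        // With rf=2, every range owned by a .254 node gets its second replica on
        // .252.124, and every .252-owned range lands on .254.57: the first node
        // of each DC fills up, which matches the load imbalance in the thread.
        for (int primary = 0; primary < ring.size(); primary++) {
            System.out.println(ring.get(primary).address + " -> "
                    + replicasFor(ring, primary, 2).get(1).address);
        }
    }
}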
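And to spell out the "just one item in the token space" bit from the quoted move: the width of the half-open range .99 ends up owning can be checked directly with plain BigInteger arithmetic (nothing Cassandra-specific, just the two tokens from the move above).

import java.math.BigInteger;

public class RangeWidth {
    public static void main(String[] args) {
        // The two tokens from the move: .99 ends up owning the range (left, right].
        BigInteger left  = new BigInteger("56713727820156410577229101238628035242");
        BigInteger right = new BigInteger("56713727820156410577229101238628035243");
        // The width of (left, right] is right - left: this prints 1, i.e. a single
        // token out of a 2^127 space, so .99 should hold essentially no primary
        // data once the moved data is cleaned up.
        System.out.println(right.subtract(left));
    }
}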