getRangeToEndpointMap is very useful, thanks, I didn't know about it... However, I've reconfigured my cluster since (moved some nodes and tokens), so now the problem is gone. I guess I'll use getRangeToEndpointMap next time I see something like this...
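In case it helps anyone else who lands on this thread: below is a minimal sketch of calling that operation from a standalone Java JMX client. The host, the port (8080), the keyspace name "Keyspace1", and the exact operation signature are assumptions, not something from this thread - check your node's JMX settings and the StorageService MBean in jconsole first.

import java.util.Map;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class RangeToEndpointDump {
    public static void main(String[] args) throws Exception {
        // Assumed JMX endpoint - point this at any live node; the port may differ per install.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://192.168.252.99:8080/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            ObjectName ss = new ObjectName("org.apache.cassandra.service:type=StorageService");
            // Assumption: getRangeToEndpointMap takes the keyspace name; the argument
            // list and the exact return type may differ by Cassandra version.
            Map<?, ?> rangeToEndpoints = (Map<?, ?>) mbs.invoke(
                    ss,
                    "getRangeToEndpointMap",
                    new Object[] { "Keyspace1" },
                    new String[] { "java.lang.String" });
            // Print each range and the endpoints responsible for it.
            for (Map.Entry<?, ?> e : rangeToEndpoints.entrySet()) {
                System.out.println(e.getKey() + " -> " + e.getValue());
            }
        } finally {
            connector.close();
        }
    }
}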
On Thu, Jun 3, 2010 at 9:15 AM, Jonathan Ellis <jbel...@gmail.com> wrote:
> Then the next step is to check StorageService.getRangeToEndpointMap via jmx
>
> On Tue, Jun 1, 2010 at 11:56 AM, Ran Tavory <ran...@gmail.com> wrote:
> > I'm using RackAwareStrategy. But it still doesn't make sense, I think...
> > let's see what I missed... According to http://wiki.apache.org/cassandra/Operations
> >
> > RackAwareStrategy: replica 2 is placed on the first node along the ring that
> > belongs to another data center than the first; the remaining N-2 replicas, if
> > any, are placed on the first nodes along the ring in the same rack as the first.
> >
> > 192.168.252.124  Up  803.33 MB  56713727820156410577229101238628035242   |<--|
> > 192.168.252.99   Up  352.85 MB  56713727820156410577229101238628035243   |   ^
> > 192.168.252.125  Up  134.24 MB  85070591730234615865843651857942052863   v   |
> > 192.168.254.57   Up  676.41 MB  113427455640312821154458202477256070485  |   ^
> > 192.168.254.58   Up  99.74 MB   141784319550391026443072753096570088106  v   |
> > 192.168.254.59   Up  99.94 MB   170141183460469231731687303715884105727  |-->|
> >
> > Alright, so I made a mistake and didn't use the alternate-datacenter
> > suggestion on the page, so the first node of every DC is overloaded with
> > replicas. However, the current situation still doesn't make sense to me.
> > .252.124 will be overloaded b/c it has the first token in the .252 DC.
> > .254.57 will also be overloaded since it has the first token in the .254 DC.
> > But for which node does .252.99 serve as a replica? It's not the first in the
> > DC and it's just one single token more than its predecessor (which is in the
> > same DC).
> >
> > On Tue, Jun 1, 2010 at 4:00 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
> >>
> >> I'm saying that .99 is getting a copy of all the data for which .124
> >> is the primary. (If you are using RackUnawarePartitioner. If you are
> >> using RackAware it is some other node.)
> >>
> >> On Tue, Jun 1, 2010 at 1:25 AM, Ran Tavory <ran...@gmail.com> wrote:
> >> > ok, let me try and translate your answer ;)
> >> > Are you saying that the data that was left on the node is
> >> > non-primary replicas of rows from the time before the move?
> >> > So this implies that when a node moves in the ring, it will affect the
> >> > distribution of:
> >> > - new keys
> >> > - old keys' primary node
> >> > -- but will not affect the distribution of old keys' non-primary replicas.
> >> > If so, I still don't understand something... I would expect even the
> >> > non-primary replicas of keys to be moved, since if they aren't, how would
> >> > they be found? I mean, upon reads the serving node should not care about
> >> > whether the row is new or old; it should have a consistent and global
> >> > mapping of tokens. So I guess this ruins my theory...
> >> > What did you mean then? Is this about deletions of non-primary replicated
> >> > data? How does the replication factor affect the load on the moved host then?
> >> >
> >> > On Tue, Jun 1, 2010 at 1:19 AM, Jonathan Ellis <jbel...@gmail.com> wrote:
> >> >>
> >> >> well, there you are then.
> >> >>
> >> >> On Mon, May 31, 2010 at 2:34 PM, Ran Tavory <ran...@gmail.com> wrote:
> >> >> > yes, replication factor = 2
> >> >> >
> >> >> > On Mon, May 31, 2010 at 10:07 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
> >> >> >>
> >> >> >> you have replication factor > 1 ?
> >> >> >>
> >> >> >> On Mon, May 31, 2010 at 7:23 AM, Ran Tavory <ran...@gmail.com> wrote:
> >> >> >> > I hope I understand nodetool cleanup correctly - it should clean up
> >> >> >> > all data that does not (currently) belong to this node. If so, I
> >> >> >> > think it might not be working correctly.
> >> >> >> > Look at nodes 192.168.252.124 and 192.168.252.99 below:
> >> >> >> >
> >> >> >> > 192.168.252.99   Up  279.35 MB  3544607988759775661076818827414252202    |<--|
> >> >> >> > 192.168.252.124  Up  167.23 MB  56713727820156410577229101238628035242   |   ^
> >> >> >> > 192.168.252.125  Up  82.91 MB   85070591730234615865843651857942052863   v   |
> >> >> >> > 192.168.254.57   Up  366.6 MB   113427455640312821154458202477256070485  |   ^
> >> >> >> > 192.168.254.58   Up  88.44 MB   141784319550391026443072753096570088106  v   |
> >> >> >> > 192.168.254.59   Up  88.45 MB   170141183460469231731687303715884105727  |-->|
> >> >> >> >
> >> >> >> > I wanted 124 to take all the load from 99, so I issued a move command:
> >> >> >> > $ nodetool -h cass99 -p 9004 move 56713727820156410577229101238628035243
> >> >> >> >
> >> >> >> > This command tells 99 to take the range
> >> >> >> > (56713727820156410577229101238628035242, 56713727820156410577229101238628035243],
> >> >> >> > which is basically just one item in the token space, almost nothing...
> >> >> >> > I wanted it to be very slim (just playing around).
> >> >> >> > So, next I get this:
> >> >> >> >
> >> >> >> > 192.168.252.124  Up  803.33 MB  56713727820156410577229101238628035242   |<--|
> >> >> >> > 192.168.252.99   Up  352.85 MB  56713727820156410577229101238628035243   |   ^
> >> >> >> > 192.168.252.125  Up  134.24 MB  85070591730234615865843651857942052863   v   |
> >> >> >> > 192.168.254.57   Up  676.41 MB  113427455640312821154458202477256070485  |   ^
> >> >> >> > 192.168.254.58   Up  99.74 MB   141784319550391026443072753096570088106  v   |
> >> >> >> > 192.168.254.59   Up  99.94 MB   170141183460469231731687303715884105727  |-->|
> >> >> >> >
> >> >> >> > The tokens are correct, but it seems that 99 still has a lot of data. Why?
> >> >> >> > OK, that might be b/c it didn't delete its moved data.
> >> >> >> > So next I issued a nodetool cleanup, which should have taken care of that.
> >> >> >> > Only it didn't; node 99 still has 352 MB of data. Why?
> >> >> >> > So, you know what, I waited for 1h. Still no good, the data wasn't cleaned up.
> >> >> >> > I restarted the server. Still, the data wasn't cleaned up... I issued a
> >> >> >> > cleanup again... still no good... what's up with this node?
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
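For anyone puzzling over the RackAwareStrategy placement rule quoted above, here is a rough, self-contained sketch of the ring walk it describes. This is not Cassandra's actual RackAwareStrategy code: the Node class, the DC labels taken from the third octet, and collapsing "rack" into "data center" are simplifications for illustration only.

import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class ReplicaWalk {

    // A toy ring node: its address and the data center it lives in.
    static class Node {
        final String address;
        final String dc;
        Node(String address, String dc) { this.address = address; this.dc = dc; }
    }

    // Picks rf replicas for the range owned by ring.get(primaryIndex), following
    // the quoted description: the primary first, then the first node along the
    // ring in a *different* data center, then the next nodes along the ring in
    // the *same* data center as the primary. The ring list must already be
    // sorted by token; "rack" is collapsed into "data center" to keep it short.
    static List<Node> replicasFor(List<Node> ring, int primaryIndex, int rf) {
        Set<Node> replicas = new LinkedHashSet<>();
        Node primary = ring.get(primaryIndex);
        replicas.add(primary);
        int n = ring.size();
        // Replica 2: the first node along the ring in another data center.
        for (int i = 1; i < n && rf >= 2; i++) {
            Node c = ring.get((primaryIndex + i) % n);
            if (!c.dc.equals(primary.dc)) { replicas.add(c); break; }
        }
        // Remaining replicas: the next nodes along the ring in the primary's DC.
        for (int i = 1; i < n && replicas.size() < rf; i++) {
            Node c = ring.get((primaryIndex + i) % n);
            if (c.dc.equals(primary.dc)) { replicas.add(c); }
        }
        return new ArrayList<>(replicas);
    }

    public static void main(String[] args) {
        List<Node> ring = new ArrayList<>();          // sorted by token, as in the
        ring.add(new Node("192.168.252.124", "252")); // nodetool ring output above
        ring.add(new Node("192.168.252.99",  "252"));
        ring.add(new Node("192.168.252.125", "252"));
        ring.add(new Node("192.168.254.57",  "254"));
        ring.add(new Node("192.168.254.58",  "254"));
        ring.add(new Node("192.168.254.59",  "254"));
        // With rf=2, every range owned by a .254 node gets its second replica on
        // .252.124, and every .252-owned range lands on .254.57: the first node
        // of each DC fills up, which matches the load imbalance in the thread.
        for (int primary = 0; primary < ring.size(); primary++) {
            System.out.println(ring.get(primary).address + " -> "
                    + replicasFor(ring, primary, 2).get(1).address);
        }
    }
}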
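And to spell out the "just one item in the token space" bit from the quoted move: the width of the half-open range .99 ends up owning can be checked directly with plain BigInteger arithmetic (nothing Cassandra-specific, just the two tokens from the move above).

import java.math.BigInteger;

public class RangeWidth {
    public static void main(String[] args) {
        // The two tokens from the move: .99 ends up owning the range (left, right].
        BigInteger left  = new BigInteger("56713727820156410577229101238628035242");
        BigInteger right = new BigInteger("56713727820156410577229101238628035243");
        // The width of (left, right] is right - left: this prints 1, i.e. a single
        // token out of a 2^127 space, so .99 should hold essentially no primary
        // data once the moved data is cleaned up.
        System.out.println(right.subtract(left));
    }
}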