>  Does cleanup only clean up keys that no longer belong to that node?
Yes.

I guess it could be an artefact of the bulk load. It's not been reported 
previously though. Try the cleanup and see how it goes. 
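If you want to script it, something along these lines is roughly what I'd run - the hostnames are made up, substitute your own, and do one node at a time so the extra compaction I/O doesn't hit the whole cluster at once:

    # run cleanup on each node in turn, then compare the reported load
    for host in node1 node2 node3 node4 node5 node6; do
        nodetool -h "$host" cleanup
    done
    nodetool -h node1 ring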

Cheers


-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 21/06/2012, at 1:34 AM, Raj N wrote:

> Nick, thanks for the response. Does cleanup only clean up keys that no longer
> belong to that node? Just to add more color: when I bulk loaded all my data
> into these 6 nodes, all of them had the same amount of data. After the first
> nodetool repair, the first node started having more data than the rest of the
> cluster, and since then it has never come back down. When I run cfstats on
> that node, the amount of data for every column family is almost 2 times the
> amount of data on the other nodes. This is true for the number of keys
> estimate as well. For 1 CF I see more than double the number of keys, and
> that's the largest CF as well, with 34 GB of data.
> 
> Thanks
> -Rajesh
> 
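If it helps to narrow it down, something like this (made-up hostnames - point it at the big node and one of its replicas) pulls the per-CF live space and key estimates out of cfstats so you can compare them side by side:

    for host in node1 node2; do
        echo "== $host =="
        nodetool -h "$host" cfstats | grep -E 'Keyspace:|Column Family:|Space used \(live\)|Number of Keys'
    done

If the gap is still there after a cleanup on the big node, those per-CF numbers would be the useful thing to post back.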
> On Wed, Jun 20, 2012 at 12:32 AM, Nick Bailey <n...@datastax.com> wrote:
> No. Cleanup will scan each sstable to remove data that is no longer
> owned by that specific node. It won't compact the sstables together
> however.
> 
> On Tue, Jun 19, 2012 at 11:11 PM, Raj N <raj.cassan...@gmail.com> wrote:
> > But won't that also run a major compaction, which is not recommended anymore?
> >
> > -Raj
> >
> >
> > On Sun, Jun 17, 2012 at 11:58 PM, aaron morton <aa...@thelastpickle.com>
> > wrote:
> >>
> >> Assuming you have been running repair, it can't hurt.
> >>
> >> Cheers
> >>
> >> -----------------
> >> Aaron Morton
> >> Freelance Developer
> >> @aaronmorton
> >> http://www.thelastpickle.com
> >>
> >> On 17/06/2012, at 4:06 AM, Raj N wrote:
> >>
> >> Nick, do you think I should still run cleanup on the first node.
> >>
> >> -Rajesh
> >>
> >> On Fri, Jun 15, 2012 at 3:47 PM, Raj N <raj.cassan...@gmail.com> wrote:
> >>>
> >>> I did run nodetool move, but that was when I was setting up the cluster,
> >>> which means I didn't have any data at that time.
> >>>
> >>> -Raj
> >>>
> >>>
> >>> On Fri, Jun 15, 2012 at 1:29 PM, Nick Bailey <n...@datastax.com> wrote:
> >>>>
> >>>> Did you start all your nodes at the correct tokens or did you balance
> >>>> by moving them? Moving nodes around won't delete unneeded data after
> >>>> the move is done.
> >>>>
> >>>> Try running 'nodetool cleanup' on all of your nodes.
> >>>>
> >>>> On Fri, Jun 15, 2012 at 12:24 PM, Raj N <raj.cassan...@gmail.com> wrote:
> >>>> > Actually I am not worried about the percentage. It's the data I am
> >>>> > concerned about. Look at the first node: it has 102.07 GB of data, and
> >>>> > the other nodes have around 60 GB (one has 69, but let's ignore that
> >>>> > one). I don't understand why the first node has almost double the data.
> >>>> >
> >>>> > Thanks
> >>>> > -Raj
> >>>> >
> >>>> >
> >>>> > On Fri, Jun 15, 2012 at 11:06 AM, Nick Bailey <n...@datastax.com>
> >>>> > wrote:
> >>>> >>
> >>>> >> This is just a known problem with the nodetool output and multiple
> >>>> >> DCs. Your configuration is correct. The problem with nodetool is
> >>>> >> fixed in 1.1.1:
> >>>> >>
> >>>> >> https://issues.apache.org/jira/browse/CASSANDRA-3412
> >>>> >>
> >>>> >> On Fri, Jun 15, 2012 at 9:59 AM, Raj N <raj.cassan...@gmail.com>
> >>>> >> wrote:
> >>>> >> > Hi experts,
> >>>> >> >     I have a 6 node cluster across 2 DCs (DC1: 3, DC2: 3). I have
> >>>> >> > assigned tokens using the first strategy (adding 1) mentioned here -
> >>>> >> >
> >>>> >> > http://wiki.apache.org/cassandra/Operations?#Token_selection
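For what it's worth, that strategy boils down to "spread one DC evenly around the ring, then offset the second DC by one". A quick way to sanity-check the numbers (exact integer maths here; the tokens in your ring output differ in the low digits, presumably a rounding difference in whichever script generated them, which doesn't matter as long as they are distinct and evenly spaced):

    # DC1 tokens: i * 2^127 / 3; DC2 tokens: the same values plus one
    for i in 0 1 2; do
        echo "$i * 2^127 / 3"     | bc
        echo "$i * 2^127 / 3 + 1" | bc
    done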
> >>>> >> >
> >>>> >> > But when I run nodetool ring on my cluster, this is the result I
> >>>> >> > get -
> >>>> >> >
> >>>> >> > Address         DC  Rack  Status State   Load        Owns    Token
> >>>> >> >                                                              113427455640312814857969558651062452225
> >>>> >> > 172.17.72.91    DC1 RAC13 Up     Normal  102.07 GB   33.33%  0
> >>>> >> > 45.10.80.144    DC2 RAC5  Up     Normal  59.1 GB     0.00%   1
> >>>> >> > 172.17.72.93    DC1 RAC18 Up     Normal  59.57 GB    33.33%  56713727820156407428984779325531226112
> >>>> >> > 45.10.80.146    DC2 RAC7  Up     Normal  59.64 GB    0.00%   56713727820156407428984779325531226113
> >>>> >> > 172.17.72.95    DC1 RAC19 Up     Normal  69.58 GB    33.33%  113427455640312814857969558651062452224
> >>>> >> > 45.10.80.148    DC2 RAC9  Up     Normal  59.31 GB    0.00%   113427455640312814857969558651062452225
> >>>> >> >
> >>>> >> >
> >>>> >> > As you can see, the first node has considerably more load than the
> >>>> >> > others (almost double), which is surprising since all of these are
> >>>> >> > replicas of each other. I am running Cassandra 0.8.4. Is there an
> >>>> >> > explanation for this behaviour?
> >>>> >> > Could https://issues.apache.org/jira/browse/CASSANDRA-2433 be the
> >>>> >> > cause for this?
> >>>> >> >
> >>>> >> > Thanks
> >>>> >> > -Raj
> >>>> >
> >>>> >
> >>>
> >>>
> >>
> >>
> >
> 
