> Does cleanup only cleanup keys that no longer belong to that node?

Yes.
I guess it could be an artefact of the bulk load. It's not been reported previously though. Try the cleanup and see how it goes.

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 21/06/2012, at 1:34 AM, Raj N wrote:

> Nick, thanks for the response. Does cleanup only clean up keys that no longer
> belong to that node? Just to add more color, when I bulk loaded all my data
> into these 6 nodes, all of them had the same amount of data. After the first
> nodetool repair, the first node started having more data than the rest of the
> cluster, and since then it has never come back down. When I run cfstats on
> that node, the amount of data for every column family is almost twice the
> amount of data on the other nodes. This is true for the number-of-keys
> estimate as well. For one CF I see more than double the number of keys, and
> that's the largest CF as well, with 34 GB of data.
>
> Thanks
> -Rajesh
>
> On Wed, Jun 20, 2012 at 12:32 AM, Nick Bailey <n...@datastax.com> wrote:
> No. Cleanup will scan each sstable to remove data that is no longer
> owned by that specific node. It won't compact the sstables together,
> however.
>
> On Tue, Jun 19, 2012 at 11:11 PM, Raj N <raj.cassan...@gmail.com> wrote:
> > But won't that also run a major compaction, which is not recommended anymore?
> >
> > -Raj
> >
> > On Sun, Jun 17, 2012 at 11:58 PM, aaron morton <aa...@thelastpickle.com>
> > wrote:
> >>
> >> Assuming you have been running repair, it can't hurt.
> >>
> >> Cheers
> >>
> >> -----------------
> >> Aaron Morton
> >> Freelance Developer
> >> @aaronmorton
> >> http://www.thelastpickle.com
> >>
> >> On 17/06/2012, at 4:06 AM, Raj N wrote:
> >>
> >> Nick, do you think I should still run cleanup on the first node?
> >>
> >> -Rajesh
> >>
> >> On Fri, Jun 15, 2012 at 3:47 PM, Raj N <raj.cassan...@gmail.com> wrote:
> >>>
> >>> I did run nodetool move, but that was when I was setting up the cluster,
> >>> which means I didn't have any data at that time.
> >>>
> >>> -Raj
> >>>
> >>> On Fri, Jun 15, 2012 at 1:29 PM, Nick Bailey <n...@datastax.com> wrote:
> >>>>
> >>>> Did you start all your nodes at the correct tokens, or did you balance
> >>>> by moving them? Moving nodes around won't delete unneeded data after
> >>>> the move is done.
> >>>>
> >>>> Try running 'nodetool cleanup' on all of your nodes.
> >>>>
> >>>> On Fri, Jun 15, 2012 at 12:24 PM, Raj N <raj.cassan...@gmail.com> wrote:
> >>>> > Actually I am not worried about the percentage. It's the data I am
> >>>> > concerned about. Look at the first node: it has 102.07 GB of data, and
> >>>> > the other nodes have around 60 GB (one has 69, but let's ignore that
> >>>> > one). I am not understanding why the first node has almost double the
> >>>> > data.
> >>>> >
> >>>> > Thanks
> >>>> > -Raj
> >>>> >
> >>>> > On Fri, Jun 15, 2012 at 11:06 AM, Nick Bailey <n...@datastax.com>
> >>>> > wrote:
> >>>> >>
> >>>> >> This is just a known problem with the nodetool output and multiple
> >>>> >> DCs. Your configuration is correct. The problem with nodetool is
> >>>> >> fixed in 1.1.1.
> >>>> >>
> >>>> >> https://issues.apache.org/jira/browse/CASSANDRA-3412
> >>>> >>
> >>>> >> On Fri, Jun 15, 2012 at 9:59 AM, Raj N <raj.cassan...@gmail.com>
> >>>> >> wrote:
> >>>> >> > Hi experts,
> >>>> >> > I have a 6-node cluster across 2 DCs (DC1:3, DC2:3). I have assigned
> >>>> >> > tokens using the first strategy (adding 1) mentioned here -
> >>>> >> >
> >>>> >> > http://wiki.apache.org/cassandra/Operations?#Token_selection
> >>>> >> >
> >>>> >> > But when I run nodetool ring on my cluster, this is the result I
> >>>> >> > get -
> >>>> >> >
> >>>> >> > Address       DC   Rack   Status  State   Load       Owns    Token
> >>>> >> >                                                               113427455640312814857969558651062452225
> >>>> >> > 172.17.72.91  DC1  RAC13  Up      Normal  102.07 GB  33.33%  0
> >>>> >> > 45.10.80.144  DC2  RAC5   Up      Normal  59.1 GB    0.00%   1
> >>>> >> > 172.17.72.93  DC1  RAC18  Up      Normal  59.57 GB   33.33%  56713727820156407428984779325531226112
> >>>> >> > 45.10.80.146  DC2  RAC7   Up      Normal  59.64 GB   0.00%   56713727820156407428984779325531226113
> >>>> >> > 172.17.72.95  DC1  RAC19  Up      Normal  69.58 GB   33.33%  113427455640312814857969558651062452224
> >>>> >> > 45.10.80.148  DC2  RAC9   Up      Normal  59.31 GB   0.00%   113427455640312814857969558651062452225
> >>>> >> >
> >>>> >> > As you can see, the first node has considerably more load than the
> >>>> >> > others (almost double), which is surprising since all these nodes
> >>>> >> > are replicas of each other. I am running Cassandra 0.8.4. Is there
> >>>> >> > an explanation for this behaviour?
> >>>> >> > Could https://issues.apache.org/jira/browse/CASSANDRA-2433 be the
> >>>> >> > cause for this?
> >>>> >> >
> >>>> >> > Thanks
> >>>> >> > -Raj
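
A note on the token-selection strategy Raj describes (the first option on the wiki page he links): pick evenly spaced tokens for the nodes in DC1, then give each DC2 node the corresponding DC1 token plus 1 so no two nodes share a token. Below is a minimal sketch of that calculation in Python. It is not the generator Raj actually used; it uses exact integer division, so the low-order digits differ slightly from the tokens in the ring output above, but the layout (the 0/1 and ...224/...225 pairs) is the same.

# Sketch of the "evenly spaced tokens, plus 1 per extra data centre" strategy
# from the Cassandra wiki. Not the exact generator used in this thread; the
# integer division here makes the low-order digits differ slightly from the
# tokens shown in the ring output above.

RING_SIZE = 2 ** 127  # RandomPartitioner token space: 0 .. 2**127

def tokens_for_dc(nodes_per_dc, dc_offset):
    """Evenly spaced tokens for one DC, shifted by the DC's offset."""
    return [i * RING_SIZE // nodes_per_dc + dc_offset
            for i in range(nodes_per_dc)]

if __name__ == "__main__":
    for dc_name, offset in (("DC1", 0), ("DC2", 1)):
        for token in tokens_for_dc(3, offset):
            print(dc_name, token)

Each DC2 token is exactly one greater than the matching DC1 token, so each DC2 node's raw range is only a single token wide. That is why the pre-1.1.1 nodetool shows 0.00% ownership for DC2 even though those nodes hold a full replica of the data, which is the display problem CASSANDRA-3412 fixed.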