Nick, thanks for the response. Does cleanup only clean up keys that no longer belong to that node? To add more color: when I bulk loaded all my data into these 6 nodes, all of them had the same amount of data. After the first nodetool repair, the first node started having more data than the rest of the cluster, and it has never come back down since. When I run cfstats on that node, the amount of data for every column family is almost twice the amount on the other nodes. This is true for the key count estimates as well. For one CF I see more than double the number of keys, and that is also the largest CF, with 34 GB of data.
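(For anyone following along in the archive, a minimal sketch of the compare-then-cleanup sequence being discussed, using the first node's address from the ring output further down; assumes the default JMX settings, so adjust host/port for your setup:)

    # A sketch, not a recipe: check per-CF load, clean up, then re-check.
    nodetool -h 172.17.72.91 cfstats   # note "Space used" and key estimates per CF
    nodetool -h 172.17.72.91 cleanup   # rewrites sstables, dropping ranges this node no longer owns
    nodetool -h 172.17.72.91 cfstats   # compare once cleanup has finished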
Thanks
-Rajesh

On Wed, Jun 20, 2012 at 12:32 AM, Nick Bailey <n...@datastax.com> wrote:
> No. Cleanup will scan each sstable to remove data that is no longer
> owned by that specific node. It won't compact the sstables together,
> however.
>
> On Tue, Jun 19, 2012 at 11:11 PM, Raj N <raj.cassan...@gmail.com> wrote:
> > But won't that also run a major compaction, which is not recommended
> > anymore?
> >
> > -Raj
> >
> > On Sun, Jun 17, 2012 at 11:58 PM, aaron morton <aa...@thelastpickle.com>
> > wrote:
> >> Assuming you have been running repair, it can't hurt.
> >>
> >> Cheers
> >>
> >> -----------------
> >> Aaron Morton
> >> Freelance Developer
> >> @aaronmorton
> >> http://www.thelastpickle.com
> >>
> >> On 17/06/2012, at 4:06 AM, Raj N wrote:
> >>
> >> Nick, do you think I should still run cleanup on the first node?
> >>
> >> -Rajesh
> >>
> >> On Fri, Jun 15, 2012 at 3:47 PM, Raj N <raj.cassan...@gmail.com> wrote:
> >>> I did run nodetool move, but that was when I was setting up the
> >>> cluster, which means I didn't have any data at that time.
> >>>
> >>> -Raj
> >>>
> >>> On Fri, Jun 15, 2012 at 1:29 PM, Nick Bailey <n...@datastax.com> wrote:
> >>>> Did you start all your nodes at the correct tokens or did you balance
> >>>> by moving them? Moving nodes around won't delete unneeded data after
> >>>> the move is done.
> >>>>
> >>>> Try running 'nodetool cleanup' on all of your nodes.
> >>>>
> >>>> On Fri, Jun 15, 2012 at 12:24 PM, Raj N <raj.cassan...@gmail.com> wrote:
> >>>> > Actually I am not worried about the percentage. It's the data I am
> >>>> > concerned about. Look at the first node. It has 102.07 GB of data,
> >>>> > and the other nodes have around 60 GB (one has 69, but let's ignore
> >>>> > that one). I am not understanding why the first node has almost
> >>>> > double the data.
> >>>> >
> >>>> > Thanks
> >>>> > -Raj
> >>>> >
> >>>> > On Fri, Jun 15, 2012 at 11:06 AM, Nick Bailey <n...@datastax.com>
> >>>> > wrote:
> >>>> >> This is just a known problem with the nodetool output and multiple
> >>>> >> DCs. Your configuration is correct. The problem with nodetool is
> >>>> >> fixed in 1.1.1.
> >>>> >>
> >>>> >> https://issues.apache.org/jira/browse/CASSANDRA-3412
> >>>> >>
> >>>> >> On Fri, Jun 15, 2012 at 9:59 AM, Raj N <raj.cassan...@gmail.com>
> >>>> >> wrote:
> >>>> >> > Hi experts,
> >>>> >> > I have a 6 node cluster across 2 DCs (DC1:3, DC2:3). I have
> >>>> >> > assigned tokens using the first strategy (adding 1) mentioned
> >>>> >> > here -
> >>>> >> >
> >>>> >> > http://wiki.apache.org/cassandra/Operations?#Token_selection
> >>>> >> >
> >>>> >> > But when I run nodetool ring on my cluster, this is the result
> >>>> >> > I get -
> >>>> >> >
> >>>> >> > Address       DC   Rack   Status  State   Load       Owns    Token
> >>>> >> >                                                               113427455640312814857969558651062452225
> >>>> >> > 172.17.72.91  DC1  RAC13  Up      Normal  102.07 GB  33.33%  0
> >>>> >> > 45.10.80.144  DC2  RAC5   Up      Normal  59.1 GB    0.00%   1
> >>>> >> > 172.17.72.93  DC1  RAC18  Up      Normal  59.57 GB   33.33%  56713727820156407428984779325531226112
> >>>> >> > 45.10.80.146  DC2  RAC7   Up      Normal  59.64 GB   0.00%   56713727820156407428984779325531226113
> >>>> >> > 172.17.72.95  DC1  RAC19  Up      Normal  69.58 GB   33.33%  113427455640312814857969558651062452224
> >>>> >> > 45.10.80.148  DC2  RAC9   Up      Normal  59.31 GB   0.00%   113427455640312814857969558651062452225
> >>>> >> >
> >>>> >> > As you can see, the first node has considerably more load than
> >>>> >> > the others (almost double), which is surprising since all these
> >>>> >> > are replicas of each other. I am running Cassandra 0.8.4. Is
> >>>> >> > there an explanation for this behaviour?
> >>>> >> > Could https://issues.apache.org/jira/browse/CASSANDRA-2433 be
> >>>> >> > the cause for this?
> >>>> >> >
> >>>> >> > Thanks
> >>>> >> > -Raj
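(For completeness, a minimal sketch of the "adding 1" token selection strategy from the wiki page linked above: space DC1's tokens evenly around the 2^127 RandomPartitioner ring and give each DC2 node its DC1 counterpart's token plus one. The low-order digits may differ from the ring output above depending on how the division was rounded when the tokens were generated:)

    # Evenly spaced RandomPartitioner tokens for a 3-node DC, with the
    # second DC offset by 1; bc gives arbitrary-precision arithmetic.
    for i in 0 1 2; do
      t=$(echo "$i * (2^127 / 3)" | bc)
      echo "DC1 node $i token: $t"
      echo "DC2 node $i token: $(echo "$t + 1" | bc)"
    done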