Nick, thanks for the response. Does cleanup only remove keys that no
longer belong to that node? To add more color: when I bulk-loaded all my
data into these 6 nodes, all of them had the same amount of data. After
the first nodetool repair, the first node started holding more data than
the rest of the cluster, and since then it has never come back down. When
I run cfstats on that node, the amount of data for every column family is
almost 2 times the amount on the other nodes. The same is true for the
estimated number of keys: for one CF I see more than double the number of
keys, and that's also the largest CF, at 34 GB of data.
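
In case it's useful, here's the quick Python sketch I'm using to compare
the per-CF numbers between two nodes. It just parses `nodetool cfstats`
output; the hosts are from my ring, and the "Column Family:" /
"Space used (live):" labels are what 0.8.x prints here, so adjust if
yours differ:

    import subprocess

    HOSTS = ["172.17.72.91", "172.17.72.93"]  # first node vs. a DC1 peer

    def cf_sizes(host):
        """Return {column_family: live_bytes} parsed from nodetool cfstats."""
        out = subprocess.check_output(
            ["nodetool", "-h", host, "cfstats"], universal_newlines=True)
        sizes, cf = {}, None
        for line in out.splitlines():
            line = line.strip()
            if line.startswith("Column Family:"):
                cf = line.split(":", 1)[1].strip()
            elif cf and line.startswith("Space used (live):"):
                sizes[cf] = int(line.split(":", 1)[1])
        return sizes

    a, b = cf_sizes(HOSTS[0]), cf_sizes(HOSTS[1])
    for cf in sorted(set(a) & set(b)):
        print("%-30s %14d %14d  ratio %.2f"
              % (cf, a[cf], b[cf], a[cf] / float(b[cf])))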

Thanks
-Rajesh

On Wed, Jun 20, 2012 at 12:32 AM, Nick Bailey <n...@datastax.com> wrote:

> No. Cleanup will scan each sstable to remove data that is no longer
> owned by that specific node. It won't compact the sstables together
> however.
>
> On Tue, Jun 19, 2012 at 11:11 PM, Raj N <raj.cassan...@gmail.com> wrote:
> > But won't that also run a major compaction, which is not recommended
> > anymore?
> >
> > -Raj
> >
> >
> > On Sun, Jun 17, 2012 at 11:58 PM, aaron morton <aa...@thelastpickle.com>
> > wrote:
> >>
> >> Assuming you have been running repair, it can't hurt.
> >>
> >> Cheers
> >>
> >> -----------------
> >> Aaron Morton
> >> Freelance Developer
> >> @aaronmorton
> >> http://www.thelastpickle.com
> >>
> >> On 17/06/2012, at 4:06 AM, Raj N wrote:
> >>
> >> Nick, do you think I should still run cleanup on the first node?
> >>
> >> -Rajesh
> >>
> >> On Fri, Jun 15, 2012 at 3:47 PM, Raj N <raj.cassan...@gmail.com> wrote:
> >>>
> >>> I did run nodetool move. But that was when I was setting up the cluster,
> >>> which means I didn't have any data at that time.
> >>>
> >>> -Raj
> >>>
> >>>
> >>> On Fri, Jun 15, 2012 at 1:29 PM, Nick Bailey <n...@datastax.com>
> wrote:
> >>>>
> >>>> Did you start all your nodes at the correct tokens or did you balance
> >>>> by moving them? Moving nodes around won't delete unneeded data after
> >>>> the move is done.
> >>>>
> >>>> Try running 'nodetool cleanup' on all of your nodes.
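> >>>>
> >>>> A minimal sketch of what I mean (the host list is just your ring from
> >>>> the earlier mail; it runs serially, since cleanup rewrites sstables
> >>>> and is I/O heavy):
> >>>>
> >>>>     import subprocess
> >>>>
> >>>>     # Run cleanup one node at a time across the ring.
> >>>>     NODES = ["172.17.72.91", "45.10.80.144", "172.17.72.93",
> >>>>              "45.10.80.146", "172.17.72.95", "45.10.80.148"]
> >>>>
> >>>>     for host in NODES:
> >>>>         print("cleaning up %s ..." % host)
> >>>>         subprocess.check_call(["nodetool", "-h", host, "cleanup"])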
> >>>>
> >>>> On Fri, Jun 15, 2012 at 12:24 PM, Raj N <raj.cassan...@gmail.com>
> wrote:
> >>>> > Actually I am not worried about the percentage. It's the data I am
> >>>> > concerned about. Look at the first node: it has 102.07 GB of data,
> >>>> > and the other nodes have around 60 GB (one has 69, but let's ignore
> >>>> > that one). I don't understand why the first node has almost double
> >>>> > the data.
> >>>> >
> >>>> > Thanks
> >>>> > -Raj
> >>>> >
> >>>> >
> >>>> > On Fri, Jun 15, 2012 at 11:06 AM, Nick Bailey <n...@datastax.com>
> >>>> > wrote:
> >>>> >>
> >>>> >> This is just a known problem with the nodetool output and multiple
> >>>> >> DCs. Your configuration is correct. The problem with nodetool is
> >>>> >> fixed in 1.1.1:
> >>>> >>
> >>>> >> https://issues.apache.org/jira/browse/CASSANDRA-3412
> >>>> >>
> >>>> >> On Fri, Jun 15, 2012 at 9:59 AM, Raj N <raj.cassan...@gmail.com>
> >>>> >> wrote:
> >>>> >> > Hi experts,
> >>>> >> >     I have a 6-node cluster across 2 DCs (DC1:3, DC2:3). I have
> >>>> >> > assigned tokens using the first strategy (adding 1) mentioned
> >>>> >> > here -
> >>>> >> >
> >>>> >> > http://wiki.apache.org/cassandra/Operations?#Token_selection
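> >>>> >> >
> >>>> >> > For reference, a sketch of that token math (the "add 1" scheme;
> >>>> >> > depending on how the generator you use rounds the division, the
> >>>> >> > low-order digits can differ slightly from my ring below):
> >>>> >> >
> >>>> >> >     # RandomPartitioner tokens live in 0..2**127.
> >>>> >> >     # DC1 gets evenly spaced tokens; DC2 gets the same
> >>>> >> >     # tokens offset by 1.
> >>>> >> >     nodes_per_dc = 3
> >>>> >> >     for i in range(nodes_per_dc):
> >>>> >> >         t = i * (2 ** 127 // nodes_per_dc)
> >>>> >> >         print("DC1 token: %d  DC2 token: %d" % (t, t + 1))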
> >>>> >> >
> >>>> >> > But when I run nodetool ring on my cluster, this is the result I
> >>>> >> > get -
> >>>> >> >
> >>>> >> > Address         DC   Rack   Status  State   Load        Owns     Token
> >>>> >> >                                                                  113427455640312814857969558651062452225
> >>>> >> > 172.17.72.91    DC1  RAC13  Up      Normal  102.07 GB   33.33%   0
> >>>> >> > 45.10.80.144    DC2  RAC5   Up      Normal  59.1 GB     0.00%    1
> >>>> >> > 172.17.72.93    DC1  RAC18  Up      Normal  59.57 GB    33.33%   56713727820156407428984779325531226112
> >>>> >> > 45.10.80.146    DC2  RAC7   Up      Normal  59.64 GB    0.00%    56713727820156407428984779325531226113
> >>>> >> > 172.17.72.95    DC1  RAC19  Up      Normal  69.58 GB    33.33%   113427455640312814857969558651062452224
> >>>> >> > 45.10.80.148    DC2  RAC9   Up      Normal  59.31 GB    0.00%    113427455640312814857969558651062452225
> >>>> >> >
> >>>> >> >
> >>>> >> > As you can see, the first node has considerably more load than
> >>>> >> > the others (almost double), which is surprising since all of
> >>>> >> > these are replicas of each other. I am running Cassandra 0.8.4.
> >>>> >> > Is there an explanation for this behaviour?
> >>>> >> > Could https://issues.apache.org/jira/browse/CASSANDRA-2433 be
> >>>> >> > the cause for this?
> >>>> >> >
> >>>> >> > Thanks
> >>>> >> > -Raj
> >>>> >
> >>>> >
> >>>
> >>>
> >>
> >>
> >
>
