Re: Bootstrapping taking long

Ran Tavory Wed, 05 Jan 2011 05:51:37 -0800
I haven't tried repair.  Should I?
On Jan 5, 2011 3:48 PM, "Jake Luciani" <jak...@gmail.com> wrote:
> Have you tried not bootstrapping but setting the token and manually calling
> repair?
>
> On Wed, Jan 5, 2011 at 7:07 AM, Ran Tavory <ran...@gmail.com> wrote:
>
>> My conclusion is lame: I tried this on several hosts and saw the same
>> behavior. The only way I was able to join new nodes was to first start them
>> while they are *not in* their own seeds list and, after they finish
>> transferring the data, restart them with themselves *in* their own seeds
>> list. After doing that the node would join the ring.
>> This is either my misunderstanding or a bug, but the only place I found it
>> documented stated that the new node should not be in its own seeds list.
>> Version 0.6.6.
>>
>> On Wed, Jan 5, 2011 at 10:35 AM, David Boxenhorn <da...@lookin2.com> wrote:
>>
>>> My nodes all have themselves in their list of seeds - always did - and
>>> everything works. (You may ask why I did this. I don't know, I must have
>>> copied it from an example somewhere.)
>>>
>>> On Wed, Jan 5, 2011 at 9:42 AM, Ran Tavory <ran...@gmail.com> wrote:
>>>
>>>> I was able to make the node join the ring but I'm confused.
>>>> When I first added the node, it was not in its own seeds list. AFAIK
>>>> this is how it's supposed to be. So it was able to transfer all data to
>>>> itself from other nodes but then it stayed in the bootstrapping state.
>>>> So what I did (and I don't know why it works) is add this node to the
>>>> seeds list in its own storage-conf.xml file. Then I restarted the server
>>>> and finally I see it in the ring...
>>>> If I had added the node to its own seeds list when first joining it, it
>>>> would not join the ring, but when I do it in two phases it does work.
>>>> So it's either my misunderstanding or a bug...
>>>>
>>>>
>>>> On Wed, Jan 5, 2011 at 7:14 AM, Ran Tavory <ran...@gmail.com> wrote:
>>>>
>>>>> The new node does not see itself as part of the ring, it sees all others
>>>>> but itself, so from that perspective the view is consistent.
>>>>> The only problem is that the node never finishes bootstrapping. It stays
>>>>> in this state for hours (it's been 20 hours now...)
>>>>>
>>>>>
>>>>> $ bin/nodetool -p 9004 -h localhost streams
>>>>>> Mode: Bootstrapping
>>>>>> Not sending any streams.
>>>>>> Not receiving any streams.
>>>>>
>>>>>
>>>>> On Wed, Jan 5, 2011 at 1:20 AM, Nate McCall <n...@riptano.com> wrote:
>>>>>
>>>>>> Does the new node have itself in the list of seeds, by any chance? This
>>>>>> could cause some issues if so.
>>>>>>
>>>>>> On Tue, Jan 4, 2011 at 4:10 PM, Ran Tavory <ran...@gmail.com> wrote:
>>>>>> > I'm still at a loss. I haven't been able to resolve this. I tried
>>>>>> > adding another node at a different location on the ring but this node
>>>>>> > too remains stuck in the bootstrapping state for many hours without
>>>>>> > any of the other nodes being busy with anti-compaction or anything
>>>>>> > else. I don't know what's keeping it from finishing the bootstrap: no
>>>>>> > CPU, no IO, files were already streamed, so what is it waiting for?
>>>>>> > I read the release notes of 0.6.7 and 0.6.8 and there didn't seem to
>>>>>> > be anything addressing a similar issue so I figured there was no point
>>>>>> > in upgrading. But let me know if you think there is.
>>>>>> > Or any other advice...
>>>>>> >
>>>>>> > On Tuesday, January 4, 2011, Ran Tavory <ran...@gmail.com> wrote:
>>>>>> >> Thanks Jake, but unfortunately the streams directory is empty so I
>>>>>> >> don't think that any of the nodes is anti-compacting data right now
>>>>>> >> or has been in the past 5 hours. It seems that all the data was
>>>>>> >> already transferred to the joining host, but the joining node, after
>>>>>> >> having received the data, still remains in bootstrapping mode and
>>>>>> >> does not join the cluster. I'm not sure that *all* data was
>>>>>> >> transferred (perhaps other nodes need to transfer more data) but
>>>>>> >> nothing is actually happening so I assume all has been moved.
>>>>>> >> Perhaps it's a configuration error on my part. Should I use
>>>>>> >> AutoBootstrap=true? Anything else I should look out for in the
>>>>>> >> configuration file or something else?
>>>>>> >>
>>>>>> >>
>>>>>> >> On Tue, Jan 4, 2011 at 4:08 PM, Jake Luciani <jak...@gmail.com> wrote:
>>>>>> >>
>>>>>> >> In 0.6, locate the node doing anti-compaction and look in the
>>>>>> >> "streams" subdirectory in the keyspace data dir to monitor the
>>>>>> >> anti-compaction progress (it puts new SSTables for the bootstrapping
>>>>>> >> node in there).
>>>>>> >>
>>>>>> >>
>>>>>> >> On Tue, Jan 4, 2011 at 8:01 AM, Ran Tavory <ran...@gmail.com> wrote:
>>>>>> >>
>>>>>> >>
>>>>>> >> Running nodetool decommission didn't help. Actually the node refused
>>>>>> >> to decommission itself (b/c it wasn't part of the ring). So I simply
>>>>>> >> stopped the process, deleted all the data directories and started it
>>>>>> >> again. It worked in the sense that the node bootstrapped again but, as
>>>>>> >> before, after it had finished moving the data nothing happened for a
>>>>>> >> long time (I'm still waiting, but nothing seems to be happening).
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> >> Any hints how to analyze a "stuck" bootstrapping node? Thanks
>>>>>> >> On Tue, Jan 4, 2011 at 1:51 PM, Ran Tavory <ran...@gmail.com> wrote:
>>>>>> >> Thanks Shimi, so indeed anticompaction was run on one of the other
>>>>>> >> nodes from the same DC but to my understanding it has already ended,
>>>>>> >> a few hours ago...
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> >> I see plenty of log messages such as [1], which ended a couple of
>>>>>> >> hours ago, and I've seen the new node streaming and accepting the data
>>>>>> >> from the node which performed the anticompaction, and so far it was
>>>>>> >> normal, so it seemed that the data is in its right place. But now the
>>>>>> >> new node seems sort of stuck. None of the other nodes is
>>>>>> >> anticompacting right now or has been anticompacting since then.
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> >> The new node's CPU is close to zero, its iostats are almost zero, so
>>>>>> >> I can't find another bottleneck that would keep it hanging.
>>>>>> >> On the IRC someone suggested I maybe retry joining this node,
>>>>>> >> e.g. decommission and rejoin it again. I'll try it now...
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> >> [1] INFO [COMPACTION-POOL:1] 2011-01-04 04:04:09,721 CompactionManager.java (line 338) AntiCompacting [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvAds-6449-Data.db')]
>>>>>> >>
>>>>>> >> INFO [COMPACTION-POOL:1] 2011-01-04 04:34:18,683 CompactionManager.java (line 338) AntiCompacting [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3874-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3873-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3876-Data.db')]
>>>>>> >>
>>>>>> >> INFO [COMPACTION-POOL:1] 2011-01-04 04:34:19,132 CompactionManager.java (line 338) AntiCompacting [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-951-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-976-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-978-Data.db')]
>>>>>> >>
>>>>>> >> INFO [COMPACTION-POOL:1] 2011-01-04 04:34:26,486 CompactionManager.java (line 338) AntiCompacting [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvAds-6449-Data.db')]
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> >> On Tue, Jan 4, 2011 at 12:45 PM, shimi <shim...@gmail.com> wrote:
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> >> In my experience most of the time it takes for a node to join the
>>>>>> >> cluster is the anticompaction on the other nodes. The streaming part
>>>>>> >> is very fast.
>>>>>> >> Check the other nodes' logs to see if there is any node doing
>>>>>> >> anticompaction. I don't remember how much data I had in the cluster
>>>>>> >> when I needed to add/remove nodes. I do remember that it took a few
>>>>>> >> hours.
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> >> The node will join the ring only when it finishes the bootstrap.
>>>>>> >> --
>>>>>> >> /Ran
>>>>>> >>
>>>>>> >>
>>>>>> >
>>>>>> > --
>>>>>> > /Ran
>>>>>> >
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> /Ran
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> /Ran
>>>>
>>>>
>>>
>>
>>
>> --
>> /Ran
>>
>>
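
For anyone hitting the same symptom, the stuck state quoted in this thread (`Mode: Bootstrapping` with no streams in either direction) can be checked mechanically. What follows is only a rough sketch: it assumes the three-line `nodetool streams` output format exactly as pasted above for 0.6.x, it does not know what the output looks like while streams are active, and the function names `parse_streams_output` and `looks_stuck` are made up for illustration.

```python
def parse_streams_output(text):
    """Parse text printed by `nodetool streams`, in the format quoted in this thread.

    Returns a dict with the reported mode and whether the node said it was
    idle in each direction. None means the corresponding line was not seen.
    """
    status = {"mode": None, "sending_idle": None, "receiving_idle": None}
    for raw in text.splitlines():
        # Tolerate mail-quote markers and indentation in pasted output.
        line = raw.lstrip("> \t").strip()
        if line.startswith("Mode:"):
            status["mode"] = line.split(":", 1)[1].strip()
        elif line == "Not sending any streams.":
            status["sending_idle"] = True
        elif line == "Not receiving any streams.":
            status["receiving_idle"] = True
    return status


def looks_stuck(status):
    """True only when the node reports Bootstrapping and is idle both ways."""
    return (
        status["mode"] == "Bootstrapping"
        and status["sending_idle"] is True
        and status["receiving_idle"] is True
    )
```

To use it, feed it the captured output of `bin/nodetool -p 9004 -h localhost streams`. A single positive result proves little, since a node may be briefly idle between transfers, so it probably only means something if it stays true across several checks spread over time.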