Re: Bootstrapping taking long

David Boxenhorn Wed, 05 Jan 2011 06:58:53 -0800

If "seed list should be the same across the cluster" that means that nodes
*should* have themselves as a seed. If that doesn't work for Ran, then that
is the first problem, no?
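
One way to compare notes on this is to check, per node, whether its own address appears in its seed list. Below is a minimal sketch, not anything taken from Cassandra itself: it assumes the 0.6-style storage-conf.xml layout with `<Seed>` entries under `<Seeds>`, and the addresses and the helper names `seed_list` and `lists_itself_as_seed` are made up for illustration.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical config fragment; real files contain many more settings.
EXAMPLE_CONF = """<Storage>
  <AutoBootstrap>true</AutoBootstrap>
  <Seeds>
    <Seed>10.0.0.1</Seed>
    <Seed>10.0.0.2</Seed>
  </Seeds>
</Storage>"""


def seed_list(storage_conf_xml: str) -> List[str]:
    """Return every non-empty <Seed> value found in the config text."""
    root = ET.fromstring(storage_conf_xml)
    return [seed.text.strip() for seed in root.iter("Seed") if seed.text]


def lists_itself_as_seed(storage_conf_xml: str, own_address: str) -> bool:
    """True if the node's own address is one of its configured seeds."""
    return own_address in seed_list(storage_conf_xml)


if __name__ == "__main__":
    print(seed_list(EXAMPLE_CONF))
    print(lists_itself_as_seed(EXAMPLE_CONF, "10.0.0.2"))
```

Running this against each node's config with that node's own address would show whether the seed lists really are identical across the cluster.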



On Wed, Jan 5, 2011 at 3:56 PM, Jake Luciani <jak...@gmail.com> wrote:

> Well your ring issues don't make sense to me, seed list should be the same
> across the cluster.
> I'm just thinking of other things to try, non-bootstrapped nodes should join
> the ring instantly but reads will fail if you aren't using quorum.
>
>
> On Wed, Jan 5, 2011 at 8:51 AM, Ran Tavory <ran...@gmail.com> wrote:
>
>> I haven't tried repair.  Should I?
>> On Jan 5, 2011 3:48 PM, "Jake Luciani" <jak...@gmail.com> wrote:
>> > Have you tried not bootstrapping but setting the token and manually
>> calling
>> > repair?
>> >
>> > On Wed, Jan 5, 2011 at 7:07 AM, Ran Tavory <ran...@gmail.com> wrote:
>> >
>> >> My conclusion is lame: I tried this on several hosts and saw the same
>> >> behavior, the only way I was able to join new nodes was to first start
>> them
>> >> when they are *not in* their own seeds list and after they
>> >> finish transferring the data, then restart them with themselves *in*
>> their
>> >> own seeds list. After doing that the node would join the ring.
>> >> This is either my misunderstanding or a bug, but the only place I found
>> it
>> >> documented stated that the new node should not be in its own seeds
>> list.
>> >> Version 0.6.6.
>> >>
>> >> On Wed, Jan 5, 2011 at 10:35 AM, David Boxenhorn <da...@lookin2.com
>> >wrote:
>> >>
>> >>> My nodes all have themselves in their list of seeds - always did - and
>> >>> everything works. (You may ask why I did this. I don't know, I must
>> have
>> >>> copied it from an example somewhere.)
>> >>>
>> >>> On Wed, Jan 5, 2011 at 9:42 AM, Ran Tavory <ran...@gmail.com> wrote:
>> >>>
>> >>>> I was able to make the node join the ring but I'm confused.
>> >>>> What I did is, first when adding the node, this node was not in the
>> seeds
>> >>>> list of itself. AFAIK this is how it's supposed to be. So it was able
>> to
>> >>>> transfer all data to itself from other nodes but then it stayed in
>> the
>> >>>> bootstrapping state.
>> >>>> So what I did (and I don't know why it works), is add this node to
>> the
>> >>>> seeds list in its own storage-conf.xml file. Then restart the server
>> and
>> >>>> then I finally see it in the ring...
>> >>>> If I had added the node to the seeds list of itself when first
>> joining
>> >>>> it, it would not join the ring but if I do it in two phases it did
>> work.
>> >>>> So it's either my misunderstanding or a bug...
>> >>>>
>> >>>>
>> >>>> On Wed, Jan 5, 2011 at 7:14 AM, Ran Tavory <ran...@gmail.com> wrote:
>> >>>>
>> >>>>> The new node does not see itself as part of the ring, it sees all
>> others
>> >>>>> but itself, so from that perspective the view is consistent.
>> >>>>> The only problem is that the node never finishes bootstrapping. It
>> stays
>> >>>>> in this state for hours (It's been 20 hours now...)
>> >>>>>
>> >>>>>
>> >>>>> $ bin/nodetool -p 9004 -h localhost streams
>> >>>>>> Mode: Bootstrapping
>> >>>>>> Not sending any streams.
>> >>>>>> Not receiving any streams.
>> >>>>>
>> >>>>>
>> >>>>> On Wed, Jan 5, 2011 at 1:20 AM, Nate McCall <n...@riptano.com>
>> wrote:
>> >>>>>
>> >>>>>> Does the new node have itself in the list of seeds per chance? This
>> >>>>>> could cause some issues if so.
>> >>>>>>
>> >>>>>> On Tue, Jan 4, 2011 at 4:10 PM, Ran Tavory <ran...@gmail.com>
>> wrote:
>> >>>>>> > I'm still at a loss. I haven't been able to resolve this. I tried
>> >>>>>> > adding another node at a different location on the ring but this
>> node
>> >>>>>> > too remains stuck in the bootstrapping state for many hours
>> without
>> >>>>>> > any of the other nodes being busy with anti compaction or
>> anything
>> >>>>>> > else. I don't know what's keeping it from finishing the
>> bootstrap, no
>> >>>>>> > CPU, no io, files were already streamed so what is it waiting
>> for?
>> >>>>>> > I read the release notes of 0.6.7 and 0.6.8 and there didn't seem
>> to
>> >>>>>> > be anything addressing a similar issue so I figured there was no
>> >>>>>> point
>> >>>>>> > in upgrading. But let me know if you think there is.
>> >>>>>> > Or any other advice...
>> >>>>>> >
>> >>>>>> > On Tuesday, January 4, 2011, Ran Tavory <ran...@gmail.com>
>> wrote:
>> >>>>>> >> Thanks Jake, but unfortunately the streams directory is empty so
>> I
>> >>>>>> don't think that any of the nodes is anti-compacting data right now
>> or had
>> >>>>>> been in the past 5 hours. It seems that all the data was already
>> transferred
>> >>>>>> to the joining host but the joining node, after having received the
>> data
>> >>>>>> would still remain in bootstrapping mode and not join the cluster.
>> I'm not
>> >>>>>> sure that *all* data was transferred (perhaps other nodes need to
>> transfer
>> >>>>>> more data) but nothing is actually happening so I assume all has
>> been moved.
>> >>>>>> >> Perhaps it's a configuration error on my part. Should I
>> use
>> >>>>>> AutoBootstrap=true? Anything else I should look out for in the
>> >>>>>> configuration file or something else?
>> >>>>>> >>
>> >>>>>> >>
>> >>>>>> >> On Tue, Jan 4, 2011 at 4:08 PM, Jake Luciani <jak...@gmail.com>
>> >>>>>> wrote:
>> >>>>>> >>
>> >>>>>> >> In 0.6, locate the node doing anti-compaction and look in the
>> >>>>>> "streams" subdirectory in the keyspace data dir to monitor the
>> >>>>>> anti-compaction progress (it puts new SSTables for bootstrapping
>> node in
>> >>>>>> there)
>> >>>>>> >>
>> >>>>>> >>
>> >>>>>> >> On Tue, Jan 4, 2011 at 8:01 AM, Ran Tavory <ran...@gmail.com>
>> >>>>>> wrote:
>> >>>>>> >>
>> >>>>>> >>
>> >>>>>> >> Running nodetool decommission didn't help. Actually the node
>> refused
>> >>>>>> to decommission itself (b/c it wasn't part of the ring). So I
>> simply stopped
>> >>>>>> the process, deleted all the data directories and started it again.
>> It
>> >>>>>> worked in the sense of the node bootstrapped again but as before,
>> after it
>> >>>>>> had finished moving the data nothing happened for a long time (I'm
>> still
>> >>>>>> waiting, but nothing seems to be happening).
>> >>>>>> >>
>> >>>>>> >>
>> >>>>>> >>
>> >>>>>> >>
>> >>>>>> >> Any hints on how to analyze a "stuck" bootstrapping node? Thanks
>> >>>>>> >> On Tue, Jan 4, 2011 at 1:51 PM, Ran Tavory <ran...@gmail.com>
>> >>>>>> wrote:
>> >>>>>> >> Thanks Shimi, so indeed anticompaction was run on one of the
>> other
>> >>>>>> nodes from the same DC but to my understanding it has already
>> ended. A few
>> >>>>>> hours ago...
>> >>>>>> >>
>> >>>>>> >>
>> >>>>>> >>
>> >>>>>> >> I see plenty of log messages such as [1] which ended a couple of
>> hours
>> >>>>>> ago, and I've seen the new node streaming and accepting the data
>> from the
>> >>>>>> node which performed the anticompaction and so far it was normal so
>> it
>> >>>>>> seemed that data is at its right place. But now the new node seems
>> sort of
>> >>>>>> stuck. None of the other nodes is anticompacting right now or had
>> been
>> >>>>>> anticompacting since then.
>> >>>>>> >>
>> >>>>>> >>
>> >>>>>> >>
>> >>>>>> >>
>> >>>>>> >> The new node's CPU is close to zero, its iostats are almost
>> zero so
>> >>>>>> I can't find another bottleneck that would keep it hanging.
>> >>>>>> >> On the IRC someone suggested I'd maybe retry to join this node,
>> >>>>>> e.g. decommission and rejoin it again. I'll try it now...
>> >>>>>> >>
>> >>>>>> >>
>> >>>>>> >>
>> >>>>>> >>
>> >>>>>> >>
>> >>>>>> >>
>> >>>>>> >> [1] INFO [COMPACTION-POOL:1] 2011-01-04 04:04:09,721
>> >>>>>> CompactionManager.java (line 338) AntiCompacting
>> >>>>>>
>> [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvAds-6449-Data.db')]
>> >>>>>> >>
>> >>>>>> >>
>> >>>>>> >>
>> >>>>>> >>
>> >>>>>> >> INFO [COMPACTION-POOL:1] 2011-01-04 04:34:18,683
>> >>>>>> CompactionManager.java (line 338) AntiCompacting
>> >>>>>>
>> [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3874-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3873-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3876-Data.db')]
>> >>>>>> >>
>> >>>>>> >>
>> >>>>>> >>
>> >>>>>> >>
>> >>>>>> >> INFO [COMPACTION-POOL:1] 2011-01-04 04:34:19,132
>> >>>>>> CompactionManager.java (line 338) AntiCompacting
>> >>>>>>
>> [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-951-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-976-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-978-Data.db')]
>> >>>>>> >>
>> >>>>>> >>
>> >>>>>> >>
>> >>>>>> >>
>> >>>>>> >> INFO [COMPACTION-POOL:1] 2011-01-04 04:34:26,486
>> >>>>>> CompactionManager.java (line 338) AntiCompacting
>> >>>>>>
>> [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvAds-6449-Data.db')]
>> >>>>>> >>
>> >>>>>> >>
>> >>>>>> >>
>> >>>>>> >>
>> >>>>>> >>
>> >>>>>> >> On Tue, Jan 4, 2011 at 12:45 PM, shimi <shim...@gmail.com>
>> wrote:
>> >>>>>> >>
>> >>>>>> >>
>> >>>>>> >>
>> >>>>>> >>
>> >>>>>> >>
>> >>>>>> >> In my experience most of the time it takes for a node to join
>> the
>> >>>>>> cluster is the anticompaction on the other nodes. The streaming
>> part is very
>> >>>>>> fast.
>> >>>>>> >> Check the other nodes logs to see if there is any node doing
>> >>>>>> anticompaction. I don't remember how much data I had in the cluster
>> when I
>> >>>>>> needed to add/remove nodes. I do remember that it took a few hours.
>> >>>>>> >>
>> >>>>>> >>
>> >>>>>> >>
>> >>>>>> >>
>> >>>>>> >>
>> >>>>>> >>
>> >>>>>> >> The node will join the ring only when it finishes the
>> bootstrap.
>> >>>>>> >> --
>> >>>>>> >> /Ran
>> >>>>>> >>
>> >>>>>> >>
>> >>>>>> >
>> >>>>>> > --
>> >>>>>> > /Ran
>> >>>>>> >
>> >>>>>>
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> --
>> >>>>> /Ran
>> >>>>>
>> >>>>>
>> >>>>
>> >>>>
>> >>>> --
>> >>>> /Ran
>> >>>>
>> >>>>
>> >>>
>> >>
>> >>
>> >> --
>> >> /Ran
>> >>
>> >>
>>
>
>
