In storage-conf.xml I see this comment [1], from which I understand that the recommended way to bootstrap a new node is to set AutoBootstrap=true and remove the node from its own seeds list. I also tried setting AutoBootstrap=true with the node in its own seeds list, but it would not bootstrap. I don't recall the exact message, but it was something like "I found myself in the seeds list, therefore I'm not going to bootstrap even though AutoBootstrap is true".
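A minimal sketch of the setup that comment describes for a joining node, assuming the 0.6 storage-conf.xml layout. The seed addresses are hypothetical placeholders, and all other settings are omitted:

```xml
<Storage>
  <!-- Joining node: migrate its share of the data from the existing nodes. -->
  <AutoBootstrap>true</AutoBootstrap>

  <!-- Left empty so the node picks a token itself
       (per the comment, half the range of the most-loaded node). -->
  <InitialToken></InitialToken>

  <!-- Existing nodes only (placeholder addresses);
       this node's own address is deliberately absent. -->
  <Seeds>
    <Seed>192.0.2.1</Seed>
    <Seed>192.0.2.2</Seed>
  </Seeds>
</Storage>
```

The two-phase workaround described in the quoted messages amounts to adding the node's own address as one more <Seed> entry once the data transfer has finished, then restarting it.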
[1]
<!--
 ~ Turn on to make new [non-seed] nodes automatically migrate the right data
 ~ to themselves. (If no InitialToken is specified, they will pick one
 ~ such that they will get half the range of the most-loaded node.)
 ~ If a node starts up without bootstrapping, it will mark itself bootstrapped
 ~ so that you can't subsequently accidentally bootstrap a node with
 ~ data on it. (You can reset this by wiping your data and commitlog
 ~ directories.)
 ~
 ~ Off by default so that new clusters and upgraders from 0.4 don't
 ~ bootstrap immediately. You should turn this on when you start adding
 ~ new nodes to a cluster that already has data on it. (If you are upgrading
 ~ from 0.4, start your cluster with it off once before changing it to true.
 ~ Otherwise, no data will be lost but you will incur a lot of unnecessary
 ~ I/O before your cluster starts up.)
-->
<AutoBootstrap>false</AutoBootstrap>

On Wed, Jan 5, 2011 at 4:58 PM, David Boxenhorn <da...@lookin2.com> wrote:
> If "seed list should be the same across the cluster", that means that nodes
> *should* have themselves as a seed. If that doesn't work for Ran, then that
> is the first problem, no?
>
> On Wed, Jan 5, 2011 at 3:56 PM, Jake Luciani <jak...@gmail.com> wrote:
>
>> Well, your ring issues don't make sense to me; the seed list should be the
>> same across the cluster.
>> I'm just thinking of other things to try: non-bootstrapped nodes should
>> join the ring instantly, but reads will fail if you aren't using quorum.
>>
>> On Wed, Jan 5, 2011 at 8:51 AM, Ran Tavory <ran...@gmail.com> wrote:
>>
>>> I haven't tried repair. Should I?
>>> On Jan 5, 2011 3:48 PM, "Jake Luciani" <jak...@gmail.com> wrote:
>>>> Have you tried not bootstrapping, but setting the token and manually
>>>> calling repair?
>>>>
>>>> On Wed, Jan 5, 2011 at 7:07 AM, Ran Tavory <ran...@gmail.com> wrote:
>>>>
>>>>> My conclusion is lame: I tried this on several hosts and saw the same
>>>>> behavior. The only way I was able to join new nodes was to first start
>>>>> them when they are *not in* their own seeds list and, after they
>>>>> finish transferring the data, restart them with themselves *in* their
>>>>> own seeds list. After doing that, the node would join the ring.
>>>>> This is either my misunderstanding or a bug, but the only place I found
>>>>> it documented stated that the new node should not be in its own seeds
>>>>> list.
>>>>> Version 0.6.6.
>>>>>
>>>>> On Wed, Jan 5, 2011 at 10:35 AM, David Boxenhorn <da...@lookin2.com> wrote:
>>>>>
>>>>>> My nodes all have themselves in their list of seeds - always did - and
>>>>>> everything works. (You may ask why I did this. I don't know, I must have
>>>>>> copied it from an example somewhere.)
>>>>>>
>>>>>> On Wed, Jan 5, 2011 at 9:42 AM, Ran Tavory <ran...@gmail.com> wrote:
>>>>>>
>>>>>>> I was able to make the node join the ring, but I'm confused.
>>>>>>> What I did: when first adding the node, it was not in its own seeds
>>>>>>> list. AFAIK this is how it's supposed to be. It was able to
>>>>>>> transfer all the data to itself from the other nodes, but then it
>>>>>>> stayed in the bootstrapping state.
>>>>>>> So what I did (and I don't know why it works) was add this node to the
>>>>>>> seeds list in its own storage-conf.xml file, then restart the server,
>>>>>>> and then I finally saw it in the ring...
>>>>>>> If I had added the node to its own seeds list when it first joined,
>>>>>>> it would not have joined the ring, but doing it in two phases worked.
>>>>>>> So it's either my misunderstanding or a bug...
>>>>>>>
>>>>>>> On Wed, Jan 5, 2011 at 7:14 AM, Ran Tavory <ran...@gmail.com> wrote:
>>>>>>>
>>>>>>>> The new node does not see itself as part of the ring; it sees all the
>>>>>>>> others but itself, so from that perspective the view is consistent.
>>>>>>>> The only problem is that the node never finishes bootstrapping. It
>>>>>>>> stays in this state for hours (it's been 20 hours now...)
>>>>>>>>
>>>>>>>>     $ bin/nodetool -p 9004 -h localhost streams
>>>>>>>>     Mode: Bootstrapping
>>>>>>>>     Not sending any streams.
>>>>>>>>     Not receiving any streams.
>>>>>>>>
>>>>>>>> On Wed, Jan 5, 2011 at 1:20 AM, Nate McCall <n...@riptano.com> wrote:
>>>>>>>>
>>>>>>>>> Does the new node have itself in the list of seeds by any chance?
>>>>>>>>> This could cause some issues if so.
>>>>>>>>>
>>>>>>>>> On Tue, Jan 4, 2011 at 4:10 PM, Ran Tavory <ran...@gmail.com> wrote:
>>>>>>>>>> I'm still at a loss. I haven't been able to resolve this. I tried
>>>>>>>>>> adding another node at a different location on the ring, but this
>>>>>>>>>> node too remains stuck in the bootstrapping state for many hours
>>>>>>>>>> without any of the other nodes being busy with anticompaction or
>>>>>>>>>> anything else. I don't know what's keeping it from finishing the
>>>>>>>>>> bootstrap: no CPU, no I/O, the files were already streamed, so what
>>>>>>>>>> is it waiting for?
>>>>>>>>>> I read the release notes of 0.6.7 and 0.6.8 and there didn't seem
>>>>>>>>>> to be anything addressing a similar issue, so I figured there was no
>>>>>>>>>> point in upgrading. But let me know if you think there is.
>>>>>>>>>> Or any other advice...
>>>>>>>>>>
>>>>>>>>>> On Tuesday, January 4, 2011, Ran Tavory <ran...@gmail.com> wrote:
>>>>>>>>>>> Thanks Jake, but unfortunately the streams directory is empty, so I don't think that any of the nodes is anticompacting data right now or has been in the past 5 hours. It seems that all the data was already transferred to the joining host, but the joining node, after having received the data, still remains in bootstrapping mode and does not join the cluster. I'm not sure that *all* data was transferred (perhaps other nodes need to transfer more data), but nothing is actually happening, so I assume everything has been moved.
>>>>>>>>>>> Perhaps it's a configuration error on my part. Should I use AutoBootstrap=true? Is there anything else I should look out for in the configuration file or elsewhere?
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Jan 4, 2011 at 4:08 PM, Jake Luciani <jak...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> In 0.6, locate the node doing anti-compaction and look in the "streams" subdirectory in the keyspace data dir to monitor the anti-compaction progress (it puts new SSTables for the bootstrapping node in there).
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Jan 4, 2011 at 8:01 AM, Ran Tavory <ran...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Running nodetool decommission didn't help. Actually, the node refused to decommission itself (b/c it wasn't part of the ring). So I simply stopped the process, deleted all the data directories and started it again. It worked in the sense that the node bootstrapped again, but as before, after it had finished moving the data, nothing happened for a long time (I'm still waiting, but nothing seems to be happening).
>>>>>>>>>>>
>>>>>>>>>>> Any hints on how to analyze a "stuck" bootstrapping node? Thanks
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Jan 4, 2011 at 1:51 PM, Ran Tavory <ran...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Thanks Shimi, so indeed anticompaction was run on one of the other nodes from the same DC, but to my understanding it has already ended. A few hours ago...
>>>>>>>>>>>
>>>>>>>>>>> I see plenty of log messages such as [1], which ended a couple of hours ago, and I've seen the new node streaming and accepting the data from the node which performed the anticompaction. So far that was normal, so it seemed that the data is in its right place. But now the new node seems sort of stuck. None of the other nodes is anticompacting right now or has been anticompacting since then.
>>>>>>>>>>>
>>>>>>>>>>> The new node's CPU is close to zero and its iostats are almost zero, so I can't find another bottleneck that would keep it hanging.
>>>>>>>>>>> On IRC someone suggested that I retry joining this node, e.g. decommission it and rejoin it. I'll try it now...
>>>>>>>>>>>
>>>>>>>>>>> [1] INFO [COMPACTION-POOL:1] 2011-01-04 04:04:09,721 CompactionManager.java (line 338) AntiCompacting [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvAds-6449-Data.db')]
>>>>>>>>>>> INFO [COMPACTION-POOL:1] 2011-01-04 04:34:18,683 CompactionManager.java (line 338) AntiCompacting [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3874-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3873-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3876-Data.db')]
>>>>>>>>>>> INFO [COMPACTION-POOL:1] 2011-01-04 04:34:19,132 CompactionManager.java (line 338) AntiCompacting [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-951-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-976-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-978-Data.db')]
>>>>>>>>>>> INFO [COMPACTION-POOL:1] 2011-01-04 04:34:26,486 CompactionManager.java (line 338) AntiCompacting [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvAds-6449-Data.db')]
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Jan 4, 2011 at 12:45 PM, shimi <shim...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> In my experience, most of the time it takes for a node to join the cluster is spent on anticompaction on the other nodes. The streaming part is very fast.
>>>>>>>>>>> Check the other nodes' logs to see if any node is doing anticompaction. I don't remember how much data I had in the cluster when I needed to add/remove nodes. I do remember that it took a few hours.
>>>>>>>>>>>
>>>>>>>>>>> The node will join the ring only when it finishes the bootstrap.
>>>>>>>>>>> --
>>>>>>>>>>> /Ran
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> /Ran
>>>>>>>>
>>>>>>>> --
>>>>>>>> /Ran
>>>>>>>
>>>>>>> --
>>>>>>> /Ran
>>>>>
>>>>> --
>>>>> /Ran
>
--
/Ran