On Wed, Jan 5, 2011 at 10:05 AM, Ran Tavory <ran...@gmail.com> wrote:
> In storage-conf I see this comment [1] from which I understand that the
> recommended way to bootstrap a new node is to set AutoBootstrap=true and
> remove itself from the seeds list.
> Moreover, I did try to set AutoBootstrap=true and have the node in its own
> seeds list, but it would not bootstrap. I don't recall the exact message,
> but it was something like "I found myself in the seeds list therefore I'm
> not going to bootstrap even though AutoBootstrap is true".
>
> [1]
> <!--
>  ~ Turn on to make new [non-seed] nodes automatically migrate the right
>  ~ data to themselves. (If no InitialToken is specified, they will pick one
>  ~ such that they will get half the range of the most-loaded node.)
>  ~ If a node starts up without bootstrapping, it will mark itself
>  ~ bootstrapped so that you can't subsequently accidentally bootstrap a
>  ~ node with data on it. (You can reset this by wiping your data and
>  ~ commitlog directories.)
>  ~
>  ~ Off by default so that new clusters and upgraders from 0.4 don't
>  ~ bootstrap immediately. You should turn this on when you start adding
>  ~ new nodes to a cluster that already has data on it. (If you are
>  ~ upgrading from 0.4, start your cluster with it off once before changing
>  ~ it to true. Otherwise, no data will be lost but you will incur a lot of
>  ~ unnecessary I/O before your cluster starts up.)
> -->
> <AutoBootstrap>false</AutoBootstrap>
>
> On Wed, Jan 5, 2011 at 4:58 PM, David Boxenhorn <da...@lookin2.com> wrote:
>>
>> If "seed list should be the same across the cluster", that means that
>> nodes *should* have themselves as a seed. If that doesn't work for Ran,
>> then that is the first problem, no?
>>
>> On Wed, Jan 5, 2011 at 3:56 PM, Jake Luciani <jak...@gmail.com> wrote:
>>>
>>> Well, your ring issues don't make sense to me; the seed list should be
>>> the same across the cluster.
>>> I'm just thinking of other things to try; non-bootstrapped nodes should
>>> join the ring instantly, but reads will fail if you aren't using quorum.
>>>
>>> On Wed, Jan 5, 2011 at 8:51 AM, Ran Tavory <ran...@gmail.com> wrote:
>>>>
>>>> I haven't tried repair. Should I?
>>>>
>>>> On Jan 5, 2011 3:48 PM, "Jake Luciani" <jak...@gmail.com> wrote:
>>>> > Have you tried not bootstrapping but setting the token and manually
>>>> > calling repair?
>>>> >
>>>> > On Wed, Jan 5, 2011 at 7:07 AM, Ran Tavory <ran...@gmail.com> wrote:
>>>> >
>>>> >> My conclusion is lame: I tried this on several hosts and saw the same
>>>> >> behavior. The only way I was able to join new nodes was to first
>>>> >> start them when they are *not in* their own seeds list and, after
>>>> >> they finish transferring the data, restart them with themselves *in*
>>>> >> their own seeds list. After doing that the node would join the ring.
>>>> >> This is either my misunderstanding or a bug, but the only place I
>>>> >> found it documented stated that the new node should not be in its
>>>> >> own seeds list. Version 0.6.6.
>>>> >>
>>>> >> On Wed, Jan 5, 2011 at 10:35 AM, David Boxenhorn <da...@lookin2.com> wrote:
>>>> >>
>>>> >>> My nodes all have themselves in their list of seeds - always did -
>>>> >>> and everything works. (You may ask why I did this. I don't know, I
>>>> >>> must have copied it from an example somewhere.)
>>>> >>>
>>>> >>> On Wed, Jan 5, 2011 at 9:42 AM, Ran Tavory <ran...@gmail.com> wrote:
>>>> >>>
>>>> >>>> I was able to make the node join the ring, but I'm confused.
>>>> >>>> What I did is, first, when adding the node, this node was not in
>>>> >>>> the seeds list of itself. AFAIK this is how it's supposed to be.
>>>> >>>> So it was able to transfer all data to itself from other nodes,
>>>> >>>> but then it stayed in the bootstrapping state.
>>>> >>>> So what I did (and I don't know why it works) is add this node to
>>>> >>>> the seeds list in its own storage-conf.xml file, then restart the
>>>> >>>> server, and then I finally see it in the ring...
>>>> >>>> If I had added the node to the seeds list of itself when first
>>>> >>>> joining it, it would not join the ring, but if I do it in two
>>>> >>>> phases it did work.
>>>> >>>> So it's either my misunderstanding or a bug...
>>>> >>>>
>>>> >>>> On Wed, Jan 5, 2011 at 7:14 AM, Ran Tavory <ran...@gmail.com> wrote:
>>>> >>>>
>>>> >>>>> The new node does not see itself as part of the ring; it sees all
>>>> >>>>> others but itself, so from that perspective the view is
>>>> >>>>> consistent. The only problem is that the node never finishes
>>>> >>>>> bootstrapping. It stays in this state for hours (it's been 20
>>>> >>>>> hours now...)
>>>> >>>>>
>>>> >>>>> $ bin/nodetool -p 9004 -h localhost streams
>>>> >>>>>> Mode: Bootstrapping
>>>> >>>>>> Not sending any streams.
>>>> >>>>>> Not receiving any streams.
>>>> >>>>>
>>>> >>>>> On Wed, Jan 5, 2011 at 1:20 AM, Nate McCall <n...@riptano.com> wrote:
>>>> >>>>>
>>>> >>>>>> Does the new node have itself in the list of seeds per chance?
>>>> >>>>>> This could cause some issues if so.
>>>> >>>>>>
>>>> >>>>>> On Tue, Jan 4, 2011 at 4:10 PM, Ran Tavory <ran...@gmail.com> wrote:
>>>> >>>>>> > I'm still at a loss. I haven't been able to resolve this. I
>>>> >>>>>> > tried adding another node at a different location on the ring,
>>>> >>>>>> > but this node too remains stuck in the bootstrapping state for
>>>> >>>>>> > many hours, without any of the other nodes being busy with
>>>> >>>>>> > anti-compaction or anything else.
>>>> >>>>>> > I don't know what's keeping it from finishing the bootstrap:
>>>> >>>>>> > no CPU, no IO, files were already streamed, so what is it
>>>> >>>>>> > waiting for?
>>>> >>>>>> > I read the release notes of 0.6.7 and 0.6.8 and there didn't
>>>> >>>>>> > seem to be anything addressing a similar issue, so I figured
>>>> >>>>>> > there was no point in upgrading. But let me know if you think
>>>> >>>>>> > there is. Or any other advice...
>>>> >>>>>> >
>>>> >>>>>> > On Tuesday, January 4, 2011, Ran Tavory <ran...@gmail.com> wrote:
>>>> >>>>>> >> Thanks Jake, but unfortunately the streams directory is
>>>> >>>>>> >> empty, so I don't think that any of the nodes is
>>>> >>>>>> >> anti-compacting data right now or had been in the past 5
>>>> >>>>>> >> hours. It seems that all the data was already transferred to
>>>> >>>>>> >> the joining host, but the joining node, after having received
>>>> >>>>>> >> the data, would still remain in bootstrapping mode and not
>>>> >>>>>> >> join the cluster. I'm not sure that *all* data was
>>>> >>>>>> >> transferred (perhaps other nodes need to transfer more data),
>>>> >>>>>> >> but nothing is actually happening, so I assume all has been
>>>> >>>>>> >> moved.
>>>> >>>>>> >> Perhaps it's a configuration error on my part. Should I use
>>>> >>>>>> >> AutoBootstrap=true? Anything else I should look out for in
>>>> >>>>>> >> the configuration file or something else?
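[Ran's configuration question above can be checked mechanically. The sketch below writes a minimal stand-in for a 0.6-era storage-conf.xml (the file path and IP addresses are hypothetical, not from the thread) and verifies the two conditions discussed: AutoBootstrap is on, and the joining node is absent from its own <Seeds> list.]

```shell
# Sketch of a pre-bootstrap sanity check for a 0.6-era storage-conf.xml.
# The config written here is a minimal stand-in; the IPs are hypothetical.
cat > /tmp/storage-conf-sketch.xml <<'EOF'
<Storage>
  <AutoBootstrap>true</AutoBootstrap>
  <Seeds>
    <Seed>10.0.0.1</Seed> <!-- an existing ring member, not the new node -->
  </Seeds>
</Storage>
EOF

NEW_NODE_IP=10.0.0.5   # the joining node's own address (hypothetical)
if grep -q "<Seed>$NEW_NODE_IP</Seed>" /tmp/storage-conf-sketch.xml; then
  echo "WARNING: node lists itself as a seed; it will refuse to bootstrap"
else
  echo "ok: node is not in its own seeds list"
fi
grep -o '<AutoBootstrap>.*</AutoBootstrap>' /tmp/storage-conf-sketch.xml
```

[Per the thread, a node that finds itself in its own seeds list will not bootstrap at all, so this check is worth running before the first start.]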
>>>> >>>>>> >>
>>>> >>>>>> >> On Tue, Jan 4, 2011 at 4:08 PM, Jake Luciani <jak...@gmail.com> wrote:
>>>> >>>>>> >>
>>>> >>>>>> >> In 0.6, locate the node doing anti-compaction and look in the
>>>> >>>>>> >> "streams" subdirectory in the keyspace data dir to monitor
>>>> >>>>>> >> the anti-compaction progress (it puts new SSTables for the
>>>> >>>>>> >> bootstrapping node in there).
>>>> >>>>>> >>
>>>> >>>>>> >> On Tue, Jan 4, 2011 at 8:01 AM, Ran Tavory <ran...@gmail.com> wrote:
>>>> >>>>>> >>
>>>> >>>>>> >> Running nodetool decommission didn't help. Actually the node
>>>> >>>>>> >> refused to decommission itself (b/c it wasn't part of the
>>>> >>>>>> >> ring). So I simply stopped the process, deleted all the data
>>>> >>>>>> >> directories and started it again. It worked in the sense that
>>>> >>>>>> >> the node bootstrapped again, but as before, after it had
>>>> >>>>>> >> finished moving the data nothing happened for a long time
>>>> >>>>>> >> (I'm still waiting, but nothing seems to be happening).
>>>> >>>>>> >>
>>>> >>>>>> >> Any hints on how to analyze a "stuck" bootstrapping node?
>>>> >>>>>> >> Thanks.
>>>> >>>>>> >>
>>>> >>>>>> >> On Tue, Jan 4, 2011 at 1:51 PM, Ran Tavory <ran...@gmail.com> wrote:
>>>> >>>>>> >> Thanks Shimi, so indeed anticompaction was run on one of the
>>>> >>>>>> >> other nodes from the same DC, but to my understanding it has
>>>> >>>>>> >> already ended. A few hours ago...
>>>> >>>>>> >>
>>>> >>>>>> >> I see plenty of log messages such as [1], which ended a
>>>> >>>>>> >> couple of hours ago, and I've seen the new node streaming and
>>>> >>>>>> >> accepting the data from the node which performed the
>>>> >>>>>> >> anticompaction, and so far it was normal, so it seemed that
>>>> >>>>>> >> the data is in its right place.
>>>> >>>>>> >> But now the new node seems sort of stuck. None of the other
>>>> >>>>>> >> nodes is anticompacting right now or has been anticompacting
>>>> >>>>>> >> since then.
>>>> >>>>>> >>
>>>> >>>>>> >> The new node's CPU is close to zero, its iostats are almost
>>>> >>>>>> >> zero, so I can't find another bottleneck that would keep it
>>>> >>>>>> >> hanging.
>>>> >>>>>> >> On the IRC someone suggested I'd maybe retry to join this
>>>> >>>>>> >> node, e.g. decommission and rejoin it again. I'll try it
>>>> >>>>>> >> now...
>>>> >>>>>> >>
>>>> >>>>>> >> [1] INFO [COMPACTION-POOL:1] 2011-01-04 04:04:09,721
>>>> >>>>>> >> CompactionManager.java (line 338) AntiCompacting
>>>> >>>>>> >> [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvAds-6449-Data.db')]
>>>> >>>>>> >>
>>>> >>>>>> >> INFO [COMPACTION-POOL:1] 2011-01-04 04:34:18,683
>>>> >>>>>> >> CompactionManager.java (line 338) AntiCompacting
>>>> >>>>>> >> [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3874-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3873-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3876-Data.db')]
>>>> >>>>>> >>
>>>> >>>>>> >> INFO [COMPACTION-POOL:1] 2011-01-04 04:34:19,132
>>>> >>>>>> >> CompactionManager.java (line 338) AntiCompacting
>>>> >>>>>> >> [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-951-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-976-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-978-Data.db')]
>>>> >>>>>> >>
>>>> >>>>>> >> INFO [COMPACTION-POOL:1] 2011-01-04 04:34:26,486
>>>> >>>>>> >> CompactionManager.java (line 338) AntiCompacting
>>>> >>>>>> >> [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvAds-6449-Data.db')]
>>>> >>>>>> >>
>>>> >>>>>> >> On Tue, Jan 4, 2011 at 12:45 PM, shimi <shim...@gmail.com> wrote:
>>>> >>>>>> >>
>>>> >>>>>> >> In my experience, most of the time it takes for a node to
>>>> >>>>>> >> join the cluster is the anticompaction on the other nodes.
>>>> >>>>>> >> The streaming part is very fast.
>>>> >>>>>> >> Check the other nodes' logs to see if there is any node doing
>>>> >>>>>> >> anticompaction. I don't remember how much data I had in the
>>>> >>>>>> >> cluster when I needed to add/remove nodes. I do remember that
>>>> >>>>>> >> it took a few hours.
>>>> >>>>>> >>
>>>> >>>>>> >> The node will join the ring only when it finishes the
>>>> >>>>>> >> bootstrap.
>>>> >>>>>> >> --
>>>> >>>>>> >> /Ran
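[The diagnosis that runs through the thread boils down to two checks: poll `nodetool streams` on the joining node, and look for pending anti-compaction output in the "streams" subdirectory on the sending nodes, as Jake described. A sketch of that check follows; the nodetool call is stubbed out with the sample 0.6 output quoted earlier so the logic is visible end to end, and the data-directory path is a stand-in, not the real one from the thread.]

```shell
# Sketch: decide whether a joining 0.6 node is still receiving data.
# nodetool_streams stands in for: bin/nodetool -h <newnode> -p 9004 streams
nodetool_streams() {
  printf 'Mode: Bootstrapping\nNot sending any streams.\nNot receiving any streams.\n'
}

# An empty "streams" subdirectory on the sending nodes means no pending
# anti-compaction output either (directory name per Jake's message above).
STREAMS_DIR=/tmp/keyspace_data/streams   # stand-in for the real data dir
mkdir -p "$STREAMS_DIR"

if nodetool_streams | grep -q 'Not receiving any streams.' \
   && [ -z "$(ls -A "$STREAMS_DIR")" ]; then
  echo "no data in flight; a node still in Mode: Bootstrapping is stuck"
fi
```

[When both checks come back empty yet the node still reports Mode: Bootstrapping, that matches the "stuck" state Ran describes: nothing left to wait for, but the node never marks itself joined.]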
If non-auto-bootstrap nodes do not join, check to make sure good old
iptables is not on.

Edward
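[Edward's firewall point is easy to rule in or out. The sketch below probes the two ports a 0.6 cluster uses by default, 7000 for inter-node storage traffic and 9160 for Thrift clients; the host list is a placeholder, and the ports should be adjusted to whatever your storage-conf.xml actually sets.]

```shell
# Sketch: probe the ports Cassandra 0.6 needs between ring members.
# 7000 = StoragePort (inter-node), 9160 = ThriftPort (clients).
# The host below is a placeholder; list the actual ring members instead.
for host in 127.0.0.1; do
  for port in 7000 9160; do
    if timeout 1 bash -c "echo > /dev/tcp/$host/$port" 2>/dev/null; then
      echo "$host:$port reachable"
    else
      echo "$host:$port unreachable - check iptables on $host"
    fi
  done
done
```

[Run this from each node toward every other node; a port that is open locally but unreachable from a peer points at iptables (or another filter) rather than at Cassandra.]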