If "seed list should be the same across the cluster", that means that nodes *should* have themselves as a seed. If that doesn't work for Ran, then that is the first problem, no?
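One quick way to check this across a cluster is to compare each node's configured seeds. The Python sketch below is a hypothetical helper, not part of Cassandra. It assumes the 0.6-style storage-conf.xml layout, where seeds are listed as `<Seed>` elements inside `<Seeds>`, and the 10.0.0.x addresses are made up. It reports whether all nodes share one seed list and which nodes appear in their own list.

```python
import xml.etree.ElementTree as ET


def read_seeds(storage_conf_xml: str) -> list[str]:
    """Return the seed hosts listed in a storage-conf.xml document.

    Assumes the 0.6-style layout <Seeds><Seed>host</Seed>...</Seeds>;
    adjust the tag names if your config differs.
    """
    root = ET.fromstring(storage_conf_xml)
    return [
        seed.text.strip()
        for seed in root.iter("Seed")
        if seed.text and seed.text.strip()
    ]


def check_seeds(seeds_by_node: dict[str, list[str]]) -> dict:
    """Report whether every node uses the same seed list, and which
    nodes appear in their own seed list."""
    distinct_lists = {tuple(sorted(seeds)) for seeds in seeds_by_node.values()}
    self_seeded = sorted(
        node for node, seeds in seeds_by_node.items() if node in seeds
    )
    return {"consistent": len(distinct_lists) <= 1, "self_seeded": self_seeded}


if __name__ == "__main__":
    # Hypothetical three-node cluster where every node has the same config.
    conf = (
        "<Storage><Seeds>"
        "<Seed>10.0.0.1</Seed><Seed>10.0.0.2</Seed>"
        "</Seeds></Storage>"
    )
    seeds = read_seeds(conf)
    nodes = ("10.0.0.1", "10.0.0.2", "10.0.0.3")
    print(check_seeds({host: seeds for host in nodes}))
    # {'consistent': True, 'self_seeded': ['10.0.0.1', '10.0.0.2']}
```

With a single shared list, the seed nodes (10.0.0.1 and 10.0.0.2 here) necessarily list themselves, while a non-seed node such as 10.0.0.3 does not. A freshly added node that is kept out of its own seeds list, as Ran describes, falls into the second group.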
On Wed, Jan 5, 2011 at 3:56 PM, Jake Luciani <jak...@gmail.com> wrote:
> Well, your ring issues don't make sense to me; seed list should be the same
> across the cluster.
> I'm just thinking of other things to try. Non-bootstrapped nodes should join
> the ring instantly, but reads will fail if you aren't using quorum.
>
> On Wed, Jan 5, 2011 at 8:51 AM, Ran Tavory <ran...@gmail.com> wrote:
>> I haven't tried repair. Should I?
>> On Jan 5, 2011 3:48 PM, "Jake Luciani" <jak...@gmail.com> wrote:
>> > Have you tried not bootstrapping but setting the token and manually
>> > calling repair?
>> >
>> > On Wed, Jan 5, 2011 at 7:07 AM, Ran Tavory <ran...@gmail.com> wrote:
>> >> My conclusion is lame: I tried this on several hosts and saw the same
>> >> behavior. The only way I was able to join new nodes was to first start
>> >> them when they are *not in* their own seeds list and, after they finish
>> >> transferring the data, restart them with themselves *in* their own seeds
>> >> list. After doing that the node would join the ring.
>> >> This is either my misunderstanding or a bug, but the only place I found
>> >> it documented stated that the new node should not be in its own seeds
>> >> list.
>> >> Version 0.6.6.
>> >>
>> >> On Wed, Jan 5, 2011 at 10:35 AM, David Boxenhorn <da...@lookin2.com> wrote:
>> >>> My nodes all have themselves in their list of seeds (always did), and
>> >>> everything works. (You may ask why I did this. I don't know; I must have
>> >>> copied it from an example somewhere.)
>> >>>
>> >>> On Wed, Jan 5, 2011 at 9:42 AM, Ran Tavory <ran...@gmail.com> wrote:
>> >>>> I was able to make the node join the ring, but I'm confused.
>> >>>> What I did: first, when adding the node, this node was not in its own
>> >>>> seeds list. AFAIK this is how it's supposed to be. So it was able to
>> >>>> transfer all data to itself from other nodes, but then it stayed in the
>> >>>> bootstrapping state.
>> >>>> So what I did (and I don't know why it works) is add this node to the
>> >>>> seeds list in its own storage-conf.xml file, then restart the server,
>> >>>> and then I finally see it in the ring...
>> >>>> If I had added the node to its own seeds list when first joining it, it
>> >>>> would not join the ring, but when I did it in two phases it worked.
>> >>>> So it's either my misunderstanding or a bug...
>> >>>>
>> >>>> On Wed, Jan 5, 2011 at 7:14 AM, Ran Tavory <ran...@gmail.com> wrote:
>> >>>>> The new node does not see itself as part of the ring; it sees all
>> >>>>> others but itself, so from that perspective the view is consistent.
>> >>>>> The only problem is that the node never finishes bootstrapping. It
>> >>>>> stays in this state for hours (it's been 20 hours now...)
>> >>>>>
>> >>>>>   $ bin/nodetool -p 9004 -h localhost streams
>> >>>>>   Mode: Bootstrapping
>> >>>>>   Not sending any streams.
>> >>>>>   Not receiving any streams.
>> >>>>>
>> >>>>> On Wed, Jan 5, 2011 at 1:20 AM, Nate McCall <n...@riptano.com> wrote:
>> >>>>>> Does the new node have itself in the list of seeds, by any chance?
>> >>>>>> This could cause some issues if so.
>> >>>>>>
>> >>>>>> On Tue, Jan 4, 2011 at 4:10 PM, Ran Tavory <ran...@gmail.com> wrote:
>> >>>>>> > I'm still at a loss. I haven't been able to resolve this. I tried
>> >>>>>> > adding another node at a different location on the ring, but this
>> >>>>>> > node too remains stuck in the bootstrapping state for many hours
>> >>>>>> > without any of the other nodes being busy with anticompaction or
>> >>>>>> > anything else. I don't know what's keeping it from finishing the
>> >>>>>> > bootstrap: no CPU, no I/O, the files were already streamed, so what
>> >>>>>> > is it waiting for?
>> >>>>>> > I read the release notes of 0.6.7 and 0.6.8 and there didn't seem to
>> >>>>>> > be anything addressing a similar issue, so I figured there was no
>> >>>>>> > point in upgrading. But let me know if you think there is.
>> >>>>>> > Or any other advice...
>> >>>>>> >
>> >>>>>> > On Tuesday, January 4, 2011, Ran Tavory <ran...@gmail.com> wrote:
>> >>>>>> >> Thanks Jake, but unfortunately the streams directory is empty, so I
>> >>>>>> >> don't think that any of the nodes is anti-compacting data right now
>> >>>>>> >> or has been in the past 5 hours. It seems that all the data was
>> >>>>>> >> already transferred to the joining host, but the joining node, after
>> >>>>>> >> having received the data, still remains in bootstrapping mode and
>> >>>>>> >> does not join the cluster. I'm not sure that *all* data was
>> >>>>>> >> transferred (perhaps other nodes need to transfer more data), but
>> >>>>>> >> nothing is actually happening, so I assume it has all been moved.
>> >>>>>> >> Perhaps it's a configuration error on my part. Should I use
>> >>>>>> >> AutoBootstrap=true? Is there anything else I should look out for in
>> >>>>>> >> the configuration file, or elsewhere?
>> >>>>>> >>
>> >>>>>> >> On Tue, Jan 4, 2011 at 4:08 PM, Jake Luciani <jak...@gmail.com> wrote:
>> >>>>>> >>
>> >>>>>> >> In 0.6, locate the node doing anti-compaction and look in the
>> >>>>>> >> "streams" subdirectory in the keyspace data dir to monitor the
>> >>>>>> >> anti-compaction progress (it puts new SSTables for the bootstrapping
>> >>>>>> >> node in there).
>> >>>>>> >>
>> >>>>>> >> On Tue, Jan 4, 2011 at 8:01 AM, Ran Tavory <ran...@gmail.com> wrote:
>> >>>>>> >>
>> >>>>>> >> Running nodetool decommission didn't help. Actually, the node
>> >>>>>> >> refused to decommission itself (because it wasn't part of the ring).
>> >>>>>> >> So I simply stopped the process, deleted all the data directories
>> >>>>>> >> and started it again.
>> >>>>>> >> It worked in the sense that the node bootstrapped again, but as
>> >>>>>> >> before, after it had finished moving the data nothing happened for
>> >>>>>> >> a long time (I'm still waiting, but nothing seems to be happening).
>> >>>>>> >>
>> >>>>>> >> Any hints on how to analyze a "stuck" bootstrapping node? Thanks
>> >>>>>> >>
>> >>>>>> >> On Tue, Jan 4, 2011 at 1:51 PM, Ran Tavory <ran...@gmail.com> wrote:
>> >>>>>> >> Thanks Shimi, so indeed anticompaction was run on one of the other
>> >>>>>> >> nodes from the same DC, but to my understanding it has already
>> >>>>>> >> ended, a few hours ago...
>> >>>>>> >>
>> >>>>>> >> I see plenty of log messages such as [1], which ended a couple of
>> >>>>>> >> hours ago, and I've seen the new node streaming and accepting the
>> >>>>>> >> data from the node which performed the anticompaction. So far it was
>> >>>>>> >> normal, so it seemed that the data is in its right place. But now
>> >>>>>> >> the new node seems sort of stuck. None of the other nodes is
>> >>>>>> >> anticompacting right now or has been anticompacting since then.
>> >>>>>> >>
>> >>>>>> >> The new node's CPU is close to zero and its iostats are almost zero,
>> >>>>>> >> so I can't find another bottleneck that would keep it hanging.
>> >>>>>> >> On IRC someone suggested I retry joining this node, e.g. decommission
>> >>>>>> >> and rejoin it. I'll try it now...
>> >>>>>> >>
>> >>>>>> >> [1] INFO [COMPACTION-POOL:1] 2011-01-04 04:04:09,721 CompactionManager.java (line 338) AntiCompacting [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvAds-6449-Data.db')]
>> >>>>>> >>
>> >>>>>> >> INFO [COMPACTION-POOL:1] 2011-01-04 04:34:18,683 CompactionManager.java (line 338) AntiCompacting [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3874-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3873-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3876-Data.db')]
>> >>>>>> >>
>> >>>>>> >> INFO [COMPACTION-POOL:1] 2011-01-04 04:34:19,132 CompactionManager.java (line 338) AntiCompacting [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-951-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-976-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-978-Data.db')]
>> >>>>>> >>
>> >>>>>> >> INFO [COMPACTION-POOL:1] 2011-01-04 04:34:26,486 CompactionManager.java (line 338) AntiCompacting [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvAds-6449-Data.db')]
>> >>>>>> >>
>> >>>>>> >> On Tue, Jan 4, 2011 at 12:45 PM, shimi <shim...@gmail.com> wrote:
>> >>>>>> >>
>> >>>>>> >> In my experience most of the time it takes for a node to join the
>> >>>>>> >> cluster is the anticompaction on the other nodes. The streaming part
>> >>>>>> >> is very fast.
>> >>>>>> >> Check the other nodes' logs to see if there is any node doing
>> >>>>>> >> anticompaction. I don't remember how much data I had in the cluster
>> >>>>>> >> when I needed to add/remove nodes. I do remember that it took a few
>> >>>>>> >> hours.
>> >>>>>> >>
>> >>>>>> >> The node will join the ring only when it finishes the bootstrap.
>> >>>>>> >> --
>> >>>>>> >> /Ran
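Shimi's advice (check the other nodes' logs for anticompaction) and Ran's nodetool output suggest a simple heuristic for a stuck bootstrap: the node reports "Mode: Bootstrapping", it is neither sending nor receiving streams, and no peer has logged an AntiCompacting line for a while. The Python sketch below only parses the text formats quoted in this thread (the `nodetool streams` output and the CompactionManager log lines). The function names and the one-hour quiet window are hypothetical choices, not anything Cassandra provides.

```python
import re
from datetime import datetime, timedelta
from typing import Iterable, Optional

# Matches the timestamp of a CompactionManager "AntiCompacting" log line, e.g.
# " INFO [COMPACTION-POOL:1] 2011-01-04 04:34:26,486 CompactionManager.java (line 338) AntiCompacting [...]"
ANTICOMPACTING = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) .*\bAntiCompacting\b"
)


def parse_streams(output: str) -> dict:
    """Extract the mode and stream activity from `nodetool streams` output."""
    mode = None
    for line in output.splitlines():
        line = line.strip()
        if line.startswith("Mode:"):
            mode = line.split(":", 1)[1].strip()
    return {
        "mode": mode,
        # The idle messages are the ones quoted in the thread; their absence
        # is treated as stream activity.
        "sending": "Not sending any streams." not in output,
        "receiving": "Not receiving any streams." not in output,
    }


def last_anticompaction(log_lines: Iterable[str]) -> Optional[datetime]:
    """Return the newest AntiCompacting timestamp in a peer's log, if any."""
    latest = None
    for line in log_lines:
        match = ANTICOMPACTING.search(line)
        if match:
            ts = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")
            if latest is None or ts > latest:
                latest = ts
    return latest


def looks_stuck(streams_output: str, peer_log_lines: Iterable[str],
                now: datetime, quiet: timedelta = timedelta(hours=1)) -> bool:
    """Heuristic: bootstrapping, no streams, and no recent peer anticompaction."""
    status = parse_streams(streams_output)
    if status["mode"] != "Bootstrapping" or status["sending"] or status["receiving"]:
        return False
    last = last_anticompaction(peer_log_lines)
    return last is None or now - last > quiet


if __name__ == "__main__":
    streams = "Mode: Bootstrapping\nNot sending any streams.\nNot receiving any streams.\n"
    peer_log = [
        " INFO [COMPACTION-POOL:1] 2011-01-04 04:34:26,486 CompactionManager.java "
        "(line 338) AntiCompacting [...]",
    ]
    print(looks_stuck(streams, peer_log, now=datetime(2011, 1, 4, 13, 51)))  # True
```

The heuristic only flags the symptom; it cannot say *why* the node is stuck. Applied to the state Ran reports (bootstrapping, no streams, no anticompaction on the other nodes for hours), it would return True.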