I have just solved the problem by removing the second keyspace (manually moving its column families to the first). So it seems the problem appears when there are multiple keyspaces.
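For anyone who prefers the other workaround mentioned in this thread (giving each node a token by hand before adding data), here is a rough sketch of how evenly spaced tokens can be computed. It assumes RandomPartitioner, whose token space runs from 0 to 2**127; the function name initial_tokens is only illustrative and is not part of Cassandra.

```python
# Sketch only: evenly spaced tokens for a ring using RandomPartitioner.
# Assumption: the token space is 0 .. 2**127, so node i of N gets i * 2**127 / N.
# initial_tokens is a made-up helper name, not a Cassandra API.

def initial_tokens(node_count):
    """Return one token per node, spread evenly around the ring."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    ring_size = 2 ** 127
    # Integer division keeps the tokens as exact integers.
    return [i * ring_size // node_count for i in range(node_count)]


if __name__ == "__main__":
    # Example: print tokens for a three-node cluster.
    for i, token in enumerate(initial_tokens(3)):
        print("node %d: token %d" % (i, token))
```

Each value would then go into the corresponding node's config before it first starts, or be applied afterwards with nodetool move.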

2010/11/8 Thibaut Britz <thibaut.br...@trendiction.com>

> Hi,
>
> No, I didn't solve the problem. I reinitialized the cluster and manually
> gave each node a token before adding data. There are a few messages in
> multiple threads related to this, so I suspect it's very common and I hope
> it's gone with 0.7.
>
> Thibaut
>
> On Sun, Nov 7, 2010 at 6:57 PM, Marc Canaleta <mcanal...@gmail.com> wrote:
>
>> Hi,
>>
>> Did you solve this problem? I'm having the same problem. I'm trying to
>> bootstrap a third node in a 0.6.6 cluster. It has two keyspaces: Keyspace1
>> and KeyspaceLogs, both with replication factor 2.
>>
>> It starts bootstrapping and receives some streams, but then it keeps
>> waiting for streams. I enabled debug mode. These lines may be useful:
>>
>> DEBUG [main] 2010-11-07 17:39:50,052 BootStrapper.java (line 70) Beginning bootstrap process
>> DEBUG [main] 2010-11-07 17:39:50,082 StorageService.java (line 160) Added /10.204.93.16/Keyspace1 as a bootstrap source
>> ...
>> DEBUG [main] 2010-11-07 17:39:50,090 StorageService.java (line 160) Added /10.204.93.16/KeyspaceLogs as a bootstrap source
>> ... (streaming messages)
>> DEBUG [Thread-56] 2010-11-07 17:45:51,706 StorageService.java (line 171) Removed /10.204.93.16/Keyspace1 as a bootstrap source; remaining is [/10.204.93.16]
>> ...
>> (and never ends).
>>
>> It seems it is waiting for [/10.204.93.16] when it should be waiting for
>> /10.204.93.16/KeyspaceLogs.
>>
>> The third node is 64-bit, while the two existing nodes are 32-bit. Can
>> this be a problem?
>>
>> Thank you.
>>
>> 2010/10/28 Dimitry Lvovsky <dimi...@reviewpro.com>
>>
>>> Maybe your <StoragePort>7000</StoragePort> is being blocked by iptables
>>> or some firewall, or maybe you have it bound (<ListenAddress> tag) to
>>> localhost instead of an IP address.
>>>
>>> Hope this helps,
>>> Dimitry.
>>>
>>> On Thu, Oct 28, 2010 at 5:35 PM, Thibaut Britz <thibaut.br...@trendiction.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I have the same problem with 0.6.5.
>>>>
>>>> New nodes will hang forever in bootstrap mode (no streams are being
>>>> opened) and the receiver thread just waits for data forever:
>>>>
>>>> INFO [Thread-53] 2010-10-27 20:33:37,399 SSTableReader.java (line 120) Sampling index for /hd2/cassandra/data/table_xyz/table_xyz-3-Data.db
>>>> INFO [Thread-53] 2010-10-27 20:33:37,444 StreamCompletionHandler.java (line 64) Streaming added /hd2/cassandra/data/table_xyz/table_xyz-3-Data.db
>>>>
>>>> Stacktrace:
>>>>
>>>> "pool-1-thread-53" prio=10 tid=0x00000000412f2800 nid=0x215c runnable [0x00007fd7cf217000]
>>>>    java.lang.Thread.State: RUNNABLE
>>>>         at java.net.SocketInputStream.socketRead0(Native Method)
>>>>         at java.net.SocketInputStream.read(SocketInputStream.java:129)
>>>>         at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
>>>>         at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
>>>>         at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
>>>>         - locked <0x00007fd7e77e0520> (a java.io.BufferedInputStream)
>>>>         at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:126)
>>>>         at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
>>>>         at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:314)
>>>>         at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:262)
>>>>         at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:192)
>>>>         at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:1154)
>>>>         at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:167)
>>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>>>         at java.lang.Thread.run(Thread.java:662)
>>>>
>>>> On Thu, Oct 28, 2010 at 12:44 PM, aaron morton <aa...@thelastpickle.com> wrote:
>>>>
>>>>> The best approach is to manually select the tokens; see the Load
>>>>> Balancing section of http://wiki.apache.org/cassandra/Operations
>>>>>
>>>>> Also, are there any log messages in the existing nodes or the new one
>>>>> which mention each other?
>>>>>
>>>>> Is this a production system? Is it still running?
>>>>>
>>>>> Sorry there is not a lot to go on; it sounds like you've done the right
>>>>> thing. I'm assuming things like the Cluster Name, seed list and port
>>>>> numbers are set correctly, as the new node got some data.
>>>>>
>>>>> You'll need to dig through the logs a bit more to see that the
>>>>> bootstrapping started and what the last message it logged was.
>>>>>
>>>>> Good Luck.
>>>>> Aaron
>>>>>
>>>>> On 27 Oct 2010, at 22:40, Dimitry Lvovsky wrote:
>>>>>
>>>>> Hi Aaron,
>>>>> Thanks for your reply.
>>>>>
>>>>> We still haven't solved this, unfortunately.
>>>>>
>>>>>> How did you start the bootstrap for the .18 node?
>>>>>
>>>>> Standard way: we set "AutoBootstrap" to true and added all the servers
>>>>> from the working ring as seeds.
>>>>>
>>>>>> Was it the .18 or the .17 node you tried to add
>>>>>
>>>>> We first tried adding .17. It streamed for a while, took on 50 GB of
>>>>> load, and stopped streaming, but then didn't enter the ring. We left it
>>>>> for a few days to see if it would come in, but no luck. After that we
>>>>> did decommission and removeToken operations (in that order).
>>>>> Since we couldn't get .17 in, we tried again with .18. Before doing so
>>>>> we increased RpcTimeoutInMillis from 1000 to 10000, having read that
>>>>> this setting may be behind nodes not entering the ring. It's been going
>>>>> since Friday and still, like .17, won't come into the ring.
>>>>>
>>>>>> Does it have a token in the config or did you use nodetool move to set
>>>>>> it
>>>>>
>>>>> No, we didn't manually set the token in the config; rather, we were
>>>>> relying on the token being assigned during bootstrap by the
>>>>> RandomPartitioner.
>>>>>
>>>>> Again, thanks for the help.
>>>>>
>>>>> Dimitry.
>>>>>
>>>>> On Tue, Oct 26, 2010 at 10:14 PM, Aaron Morton <aa...@thelastpickle.com> wrote:
>>>>>
>>>>>> Dimitry, did you get anywhere with this?
>>>>>>
>>>>>> Was it the .18 or the .17 node you tried to add? How did you start
>>>>>> the bootstrap for the .18 node? Does it have a token in the config or
>>>>>> did you use nodetool move to set it?
>>>>>>
>>>>>> I had a quick look at the code. AFAIK the message about removing the
>>>>>> fat client is logged when the node does not have a record of the token
>>>>>> the other node has.
>>>>>>
>>>>>> Aaron
>>>>>>
>>>>>> On 26 Oct, 2010, at 10:42 PM, Dimitry Lvovsky <dimi...@reviewpro.com> wrote:
>>>>>>
>>>>>> Hi All,
>>>>>> We recently upgraded from 0.6.5 to 0.6.6, after which we tried adding
>>>>>> a new node to our cluster. We left it bootstrapping and after 3 days it
>>>>>> still refused to join the ring. The strange thing is that nodetool info
>>>>>> shows 50 GB of load and nodetool ring shows that it sees the rest of
>>>>>> the ring, which it is not part of. We tried the process again with
>>>>>> another server, and again the same thing happened:
>>>>>>
>>>>>> //from machine 192.168.2.18
>>>>>>
>>>>>> /opt/cassandra/bin/nodetool -h localhost -p 8999 info
>>>>>> 131373516047318302934572185119435768941
>>>>>> Load             : 52.85 GB
>>>>>> Generation No    : 1287761987
>>>>>> Uptime (seconds) : 323157
>>>>>> Heap Memory (MB) : 795.42 / 1945.63
>>>>>>
>>>>>> /opt/cassandra/bin/nodetool -h localhost -p 8999 ring
>>>>>> Address       Status  Load      Range                                    Ring
>>>>>>                                 158573510920250391466717289405976537674
>>>>>> 192.168.2.22  Up      59.45 GB  28203205416427384773583427414698832202   |<--|
>>>>>> 192.168.2.23  Up      44.95 GB  60562227403709245514637766500430120055   |   |
>>>>>> 192.168.2.20  Up      47.15 GB  104160057322065544623939416372654814065  |   |
>>>>>> 192.168.2.21  Up      61.04 GB  158573510920250391466717289405976537674  |-->|
>>>>>>
>>>>>> /opt/cassandra/bin/nodetool -h localhost -p 8999 streams
>>>>>> Mode: Bootstrapping
>>>>>> Not sending any streams.
>>>>>> Not receiving any streams.
>>>>>>
>>>>>> What's more, while looking at the log of one of the nodes I see gossip
>>>>>> messages from 192.168.2.17, the first node we tried to add to the
>>>>>> cluster, which was not running at the time of the log message:
>>>>>>
>>>>>> INFO [Timer-0] 2010-10-26 02:13:20,340 Gossiper.java (line 406) FatClient /192.168.2.17 has been silent for 3600000ms, removing from gossip
>>>>>> INFO [GMFD:1] 2010-10-26 02:13:51,398 Gossiper.java (line 591) Node /192.168.2.17 is now part of the cluster
>>>>>>
>>>>>> Thanks in advance for the help,
>>>>>> Dimitry
>>>>>
>>>>> --
>>>>> Dimitry Lvovsky
>>>>> Director of Engineering
>>>>> ReviewPro
>>>>> www.reviewpro.com
>>>>> +34 616 337 103