I also had multiple keyspaces defined (more than 20). All nodes were 64-bit; there was no mix of 32-bit and 64-bit machines.
On Mon, Nov 8, 2010 at 8:23 PM, Dimitry Lvovsky <dimi...@reviewpro.com> wrote:
> We didn't solve it, unfortunately, and ended up regenerating the entire
> cluster. But, if it helps anyone in the future, we too had multiple
> keyspaces when we encountered the problem.
>
> On Mon, Nov 8, 2010 at 5:47 PM, Marc Canaleta <mcanal...@gmail.com> wrote:
>> I have just solved the problem by removing the second keyspace (manually
>> moving its column families to the first). So it seems the problem appears
>> when there are multiple keyspaces.
>>
>> 2010/11/8 Thibaut Britz <thibaut.br...@trendiction.com>
>>> Hi,
>>>
>>> No, I didn't solve the problem. I reinitialized the cluster and manually
>>> gave each node a token before adding data. There are a few messages in
>>> multiple threads related to this, so I suspect it's very common, and I hope
>>> it's gone in 0.7.
>>>
>>> Thibaut
>>>
>>> On Sun, Nov 7, 2010 at 6:57 PM, Marc Canaleta <mcanal...@gmail.com> wrote:
>>>> Hi,
>>>>
>>>> Did you solve this problem? I'm having the same problem. I'm trying to
>>>> bootstrap a third node in a 0.6.6 cluster. It has two keyspaces, Keyspace1
>>>> and KeyspaceLogs, both with replication factor 2.
>>>>
>>>> It starts bootstrapping and receives some streams, but then it keeps
>>>> waiting for more streams. I enabled debug mode. These lines may be useful:
>>>>
>>>> DEBUG [main] 2010-11-07 17:39:50,052 BootStrapper.java (line 70) Beginning bootstrap process
>>>> DEBUG [main] 2010-11-07 17:39:50,082 StorageService.java (line 160) Added /10.204.93.16/Keyspace1 as a bootstrap source
>>>> ...
>>>> DEBUG [main] 2010-11-07 17:39:50,090 StorageService.java (line 160) Added /10.204.93.16/KeyspaceLogs as a bootstrap source
>>>> ... (streaming messages)
>>>> DEBUG [Thread-56] 2010-11-07 17:45:51,706 StorageService.java (line 171) Removed /10.204.93.16/Keyspace1 as a bootstrap source; remaining is [/10.204.93.16]
>>>> ...
>>>> (and never ends).
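A hypothetical model of the bookkeeping suggested by the debug log above may help when reading it. It assumes (this is not Cassandra's actual code) that the bootstrapping node records one pending entry per (source host, keyspace) pair, that each "Removed ... as a bootstrap source" line clears one such pair, and that the "remaining is [...]" list prints only host addresses. Under those assumptions bootstrap can finish only after every pair is cleared, so a single keyspace whose stream never arrives keeps the node bootstrapping indefinitely, and "[/10.204.93.16]" could simply mean that KeyspaceLogs is still outstanding. The class name `PendingBootstrapSources` is made up for illustration.

```python
from collections import defaultdict


class PendingBootstrapSources:
    """Hypothetical model (not Cassandra code): pending (host, keyspace) transfers."""

    def __init__(self):
        # host -> keyspaces still expected from that host
        self._pending = defaultdict(set)

    def add(self, host, keyspace):
        # Mirrors an "Added <host>/<keyspace> as a bootstrap source" log line.
        self._pending[host].add(keyspace)

    def remove(self, host, keyspace):
        # Mirrors a "Removed <host>/<keyspace> as a bootstrap source" log line.
        self._pending[host].discard(keyspace)
        if not self._pending[host]:
            del self._pending[host]
        # Report hosts only: a host stays listed while any of its keyspaces is pending.
        return sorted(self._pending)

    def finished(self):
        # Bootstrap can complete only once nothing is pending from any host.
        return not self._pending


sources = PendingBootstrapSources()
sources.add("/10.204.93.16", "Keyspace1")
sources.add("/10.204.93.16", "KeyspaceLogs")
print(sources.remove("/10.204.93.16", "Keyspace1"))  # ['/10.204.93.16']
print(sources.finished())  # False: KeyspaceLogs is still outstanding
```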
>>>>
>>>> It seems it is waiting for [/10.204.93.16] when it should be waiting
>>>> for /10.204.93.16/KeyspaceLogs.
>>>>
>>>> The third node is 64-bit, while the two existing nodes are 32-bit. Can
>>>> this be a problem?
>>>>
>>>> Thank you.
>>>>
>>>> 2010/10/28 Dimitry Lvovsky <dimi...@reviewpro.com>
>>>>> Maybe your <StoragePort>7000</StoragePort> is being blocked by
>>>>> iptables or some other firewall, or maybe you have it bound (the
>>>>> <ListenAddress> tag) to localhost instead of an IP address.
>>>>>
>>>>> Hope this helps,
>>>>> Dimitry.
>>>>>
>>>>> On Thu, Oct 28, 2010 at 5:35 PM, Thibaut Britz <thibaut.br...@trendiction.com> wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I have the same problem with 0.6.5.
>>>>>>
>>>>>> New nodes hang forever in bootstrap mode (no streams are being
>>>>>> opened), and the receiver thread just waits for data forever:
>>>>>>
>>>>>> INFO [Thread-53] 2010-10-27 20:33:37,399 SSTableReader.java (line 120) Sampling index for /hd2/cassandra/data/table_xyz/table_xyz-3-Data.db
>>>>>> INFO [Thread-53] 2010-10-27 20:33:37,444 StreamCompletionHandler.java (line 64) Streaming added /hd2/cassandra/data/table_xyz/table_xyz-3-Data.db
>>>>>>
>>>>>> Stack trace:
>>>>>>
>>>>>> "pool-1-thread-53" prio=10 tid=0x00000000412f2800 nid=0x215c runnable [0x00007fd7cf217000]
>>>>>>    java.lang.Thread.State: RUNNABLE
>>>>>>     at java.net.SocketInputStream.socketRead0(Native Method)
>>>>>>     at java.net.SocketInputStream.read(SocketInputStream.java:129)
>>>>>>     at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
>>>>>>     at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
>>>>>>     at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
>>>>>>     - locked <0x00007fd7e77e0520> (a java.io.BufferedInputStream)
>>>>>>     at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:126)
>>>>>>     at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
>>>>>>     at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:314)
>>>>>>     at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:262)
>>>>>>     at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:192)
>>>>>>     at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:1154)
>>>>>>     at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:167)
>>>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>>>>>     at java.lang.Thread.run(Thread.java:662)
>>>>>>
>>>>>> On Thu, Oct 28, 2010 at 12:44 PM, aaron morton <aa...@thelastpickle.com> wrote:
>>>>>>> The best approach is to manually select the tokens; see the Load
>>>>>>> Balancing section of http://wiki.apache.org/cassandra/Operations. Also:
>>>>>>>
>>>>>>> Are there any log messages on the existing nodes or the new one that
>>>>>>> mention each other?
>>>>>>>
>>>>>>> Is this a production system? Is it still running?
>>>>>>>
>>>>>>> Sorry, there is not a lot to go on; it sounds like you've done the
>>>>>>> right thing. I'm assuming things like the Cluster Name, seed list, and
>>>>>>> port numbers are set correctly, as the new node got some data.
>>>>>>>
>>>>>>> You'll need to dig through the logs a bit more to see that the
>>>>>>> bootstrapping started and what the last message it logged was.
>>>>>>>
>>>>>>> Good luck.
>>>>>>> Aaron
>>>>>>>
>>>>>>> On 27 Oct 2010, at 22:40, Dimitry Lvovsky wrote:
>>>>>>>
>>>>>>> Hi Aaron,
>>>>>>> Thanks for your reply.
>>>>>>>
>>>>>>> We still haven't solved this, unfortunately.
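Aaron's suggestion above, selecting tokens manually, can be sketched for the RandomPartitioner, whose tokens are assumed here to lie in the range 0 to 2**127. For a fresh cluster of N nodes the sketch uses the evenly spaced scheme i * 2**127 / N (assumed to match the Load Balancing advice Aaron links); for adding one node to an existing ring it picks the token halfway between two neighbours, so the new node takes over half of one existing range. The function names are illustrative only; the chosen value would go in the new node's InitialToken setting before its first start.

```python
RING_SIZE = 2 ** 127  # RandomPartitioner tokens are assumed to lie in [0, 2**127)


def balanced_tokens(node_count):
    """Evenly spaced initial tokens for a fresh cluster of node_count nodes."""
    return [i * RING_SIZE // node_count for i in range(node_count)]


def midpoint_token(left, right):
    """Token halfway along the range (left, right], walking clockwise and wrapping past 0."""
    distance = (right - left) % RING_SIZE
    if distance == 0:
        distance = RING_SIZE  # single-node ring: the whole ring is one range
    return (left + distance // 2) % RING_SIZE


for token in balanced_tokens(4):
    print(token)

# Split the range ending at 192.168.2.21's token, the most loaded node
# in Dimitry's nodetool ring output later in the thread.
print(midpoint_token(104160057322065544623939416372654814065,
                     158573510920250391466717289405976537674))
```

Integer arithmetic is used throughout because the tokens are far larger than a double can represent exactly.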
>>>>>>>
>>>>>>>> How did you start the bootstrap for the .18 node?
>>>>>>>
>>>>>>> The standard way: we set "AutoBootstrap" to true and added all the
>>>>>>> servers from the working ring as seeds.
>>>>>>>
>>>>>>>> Was it the .18 or the .17 node you tried to add?
>>>>>>>
>>>>>>> We first tried adding .17. It streamed for a while, took on 50GB of
>>>>>>> load, and stopped streaming, but then didn't enter the ring. We left it
>>>>>>> for a few days to see if it would come in, but no luck. After that we
>>>>>>> ran decommission and removeToken (in that order).
>>>>>>> Since we couldn't get .17 in, we tried again with .18. Before doing
>>>>>>> so, we increased RpcTimeoutInMillis from 1000 to 10000, having read that
>>>>>>> a low timeout may cause nodes not to enter the ring. It's been going
>>>>>>> since Friday and still, like .17, won't come into the ring.
>>>>>>>
>>>>>>>> Does it have a token in the config or did you use nodetool move to
>>>>>>>> set it?
>>>>>>>
>>>>>>> No, we didn't manually set the token in the config; rather, we were
>>>>>>> relying on the token being assigned during bootstrap by the
>>>>>>> RandomPartitioner.
>>>>>>>
>>>>>>> Again, thanks for the help.
>>>>>>>
>>>>>>> Dimitry.
>>>>>>>
>>>>>>> On Tue, Oct 26, 2010 at 10:14 PM, Aaron Morton <aa...@thelastpickle.com> wrote:
>>>>>>>> Dimitry, did you get anywhere with this?
>>>>>>>>
>>>>>>>> Was it the .18 or the .17 node you tried to add? How did you start
>>>>>>>> the bootstrap for the .18 node? Does it have a token in the config, or
>>>>>>>> did you use nodetool move to set it?
>>>>>>>>
>>>>>>>> I had a quick look at the code. AFAIK, the message about removing the
>>>>>>>> fat client is logged when the node does not have a record of the token
>>>>>>>> the other node has.
>>>>>>>>
>>>>>>>> Aaron
>>>>>>>>
>>>>>>>> On 26 Oct, 2010, at 10:42 PM, Dimitry Lvovsky <dimi...@reviewpro.com> wrote:
>>>>>>>>
>>>>>>>> Hi All,
>>>>>>>> We recently upgraded from 0.6.5 to 0.6.6, after which we tried adding a
>>>>>>>> new node to our cluster. We left it bootstrapping, and after 3 days it
>>>>>>>> still refused to join the ring. The strange thing is that nodetool info
>>>>>>>> shows 50GB of load and nodetool ring shows that it sees the rest of the
>>>>>>>> ring, which it is not part of. We tried the process again with another
>>>>>>>> server, with the same result:
>>>>>>>>
>>>>>>>> // from machine 192.168.2.18
>>>>>>>>
>>>>>>>> /opt/cassandra/bin/nodetool -h localhost -p 8999 info
>>>>>>>> 131373516047318302934572185119435768941
>>>>>>>> Load             : 52.85 GB
>>>>>>>> Generation No    : 1287761987
>>>>>>>> Uptime (seconds) : 323157
>>>>>>>> Heap Memory (MB) : 795.42 / 1945.63
>>>>>>>>
>>>>>>>> /opt/cassandra/bin/nodetool -h localhost -p 8999 ring
>>>>>>>> Address       Status  Load      Range                                     Ring
>>>>>>>>                                 158573510920250391466717289405976537674
>>>>>>>> 192.168.2.22  Up      59.45 GB  28203205416427384773583427414698832202    |<--|
>>>>>>>> 192.168.2.23  Up      44.95 GB  60562227403709245514637766500430120055    |   |
>>>>>>>> 192.168.2.20  Up      47.15 GB  104160057322065544623939416372654814065   |   |
>>>>>>>> 192.168.2.21  Up      61.04 GB  158573510920250391466717289405976537674   |-->|
>>>>>>>>
>>>>>>>> /opt/cassandra/bin/nodetool -h localhost -p 8999 streams
>>>>>>>> Mode: Bootstrapping
>>>>>>>> Not sending any streams.
>>>>>>>> Not receiving any streams.
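A rough check for the state shown in the streams output above (the node reports it is bootstrapping but has no streams in either direction) can be scripted around the same nodetool command. This is a heuristic sketch that assumes the output format exactly as printed above; the helper names are hypothetical, and the nodetool path, host, and port are copied from this message and would need adjusting elsewhere. A single idle snapshot is not proof of a hang, so the check is more meaningful when repeated over a long interval.

```python
import subprocess

# Command as used in the message above; path, host, and JMX port are installation-specific.
NODETOOL_STREAMS = ["/opt/cassandra/bin/nodetool", "-h", "localhost", "-p", "8999", "streams"]


def read_streams():
    """Run nodetool streams and return its text output (requires Python 3.7+)."""
    result = subprocess.run(NODETOOL_STREAMS, capture_output=True, text=True, check=True)
    return result.stdout


def looks_stalled(streams_output):
    """Heuristic: mode is Bootstrapping and nothing is being sent or received."""
    lines = [line.strip() for line in streams_output.splitlines() if line.strip()]
    mode = None
    for line in lines:
        if line.startswith("Mode:"):
            mode = line.split(":", 1)[1].strip()
    idle = "Not sending any streams." in lines and "Not receiving any streams." in lines
    return mode == "Bootstrapping" and idle


sample = """Mode: Bootstrapping
Not sending any streams.
Not receiving any streams.
"""
print(looks_stalled(sample))  # True
```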
>>>>>>>>
>>>>>>>> What's more, while looking at the log of one of the nodes, I see
>>>>>>>> gossip messages from 192.168.2.17 (the first node we tried to add to
>>>>>>>> the cluster), even though it was not running at the time of the log
>>>>>>>> message:
>>>>>>>>
>>>>>>>> INFO [Timer-0] 2010-10-26 02:13:20,340 Gossiper.java (line 406) FatClient /192.168.2.17 has been silent for 3600000ms, removing from gossip
>>>>>>>> INFO [GMFD:1] 2010-10-26 02:13:51,398 Gossiper.java (line 591) Node /192.168.2.17 is now part of the cluster
>>>>>>>>
>>>>>>>> Thanks in advance for the help,
>>>>>>>> Dimitry
>>>>>>>
>>>>>>> --
>>>>>>> Dimitry Lvovsky
>>>>>>> Director of Engineering
>>>>>>> ReviewPro
>>>>>>> www.reviewpro.com
>>>>>>> +34 616 337 103
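Dimitry's suggestion earlier in the thread, that StoragePort 7000 may be blocked by iptables or another firewall, or that ListenAddress is bound to localhost, can be checked with a plain TCP connection attempt from each existing node to the new node's address, and back. A minimal sketch, assuming Python 3 is available on the nodes; `port_reachable` is a hypothetical helper name, and a successful connection only shows that something is listening on that port, not that internode streaming works.

```python
import socket


def port_reachable(host, port=7000, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution errors,
        # all of which are OSError subclasses in Python 3.
        return False
```

Run on an existing node as `port_reachable("<new node IP>")`; a False result there, while the same check against the new node's own LAN address succeeds locally, would point at a firewall rule rather than at ListenAddress.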