Re: Nodes not added to existing cluster

Aaron Morton Wed, 20 Nov 2013 23:32:31 -0800

> - broadcast_address is set to the instance's public address
You only need this if you have a multi-region setup.


>  I’ve gisted the results here: 
> https://gist.github.com/skyebook/be5ee75a000a1e6d65d0

This error

TRACE [HANDSHAKE-/NODE_1_PUBLIC_IP] 2013-11-18 06:57:13,984 OutboundTcpConnection.java (line 393) Cannot handshake version with /NODE_1_PUBLIC_IP
java.nio.channels.AsynchronousCloseException
        at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:205)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:402)
        at sun.nio.ch.SocketAdaptor$SocketInputStream.read(SocketAdaptor.java:201)
        at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103)
        at java.io.InputStream.read(InputStream.java:101)
        at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:81)
        at java.io.DataInputStream.readInt(DataInputStream.java:387)
        at org.apache.cassandra.net.OutboundTcpConnection$1.run(OutboundTcpConnection.java:387)

is preventing the node from reading the version, and results in this line being 
printed (-2147483648 is the "no version" flag):

> OutboundTcpConnection.java (line 333) Target max version is -2147483648; no version information yet, will retry

 
I'm not really sure why that exception is being thrown; the Javadoc does not 
make it clear:
http://docs.oracle.com/javase/7/docs/api/java/nio/channels/AsynchronousCloseException.html
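For anyone reading the trace later: AsynchronousCloseException is what a thread receives when another thread closes the channel it is blocked reading from. The standalone sketch below reproduces that shape under stated assumptions; it is not Cassandra's code. A helper thread blocks in DataInputStream.readInt() waiting for a 4-byte version from the peer, the caller waits a bounded time, and if nothing arrives it closes the channel. The version then stays at the sentinel Integer.MIN_VALUE, which prints as -2147483648. The class and method names and the 500 ms wait are hypothetical, and the "peer" is a local socket that accepts the connection but never writes.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.channels.Channels;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

public class HandshakeTimeoutDemo {
    // Sentinel for "no version received yet"; Integer.MIN_VALUE == -2147483648.
    static final int NO_VERSION = Integer.MIN_VALUE;

    static final class Outcome {
        final int version;
        final IOException error; // null if the read succeeded
        Outcome(int version, IOException error) {
            this.version = version;
            this.error = error;
        }
    }

    // Reads a 4-byte version on a helper thread; if nothing arrives within
    // waitMillis, closes the channel, which wakes the blocked reader.
    static Outcome handshake(SocketChannel channel, long waitMillis) throws InterruptedException {
        AtomicInteger version = new AtomicInteger(NO_VERSION);
        AtomicReference<IOException> error = new AtomicReference<>();
        CountDownLatch done = new CountDownLatch(1);
        Thread reader = new Thread(() -> {
            try {
                DataInputStream in = new DataInputStream(Channels.newInputStream(channel));
                version.set(in.readInt()); // blocks until the peer sends 4 bytes
            } catch (IOException e) {
                error.set(e); // the "Cannot handshake version" case
            } finally {
                done.countDown();
            }
        }, "HANDSHAKE-demo");
        reader.start();
        if (!done.await(waitMillis, TimeUnit.MILLISECONDS)) {
            try {
                channel.close(); // reader, still blocked in readInt(), is woken with an exception
            } catch (IOException ignored) {
                // nothing useful to do if close itself fails
            }
        }
        reader.join();
        return new Outcome(version.get(), error.get());
    }

    // Connects to a local "peer" that accepts the connection but never writes.
    static Outcome handshakeWithSilentPeer(long waitMillis) throws IOException, InterruptedException {
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            try (SocketChannel client = SocketChannel.open(server.getLocalAddress());
                 SocketChannel silentPeer = server.accept()) {
                return handshake(client, waitMillis);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Outcome outcome = handshakeWithSilentPeer(500);
        System.out.println("version = " + outcome.version);
        System.out.println("error   = " + outcome.error);
    }
}
```

Running it should print the sentinel version and the exception the reader received. If Cassandra's handshake works the same way (an assumption here), the trace would mean the other end accepted the connection but sent no version before the wait expired.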

Check the networking. 
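One quick networking check is whether the new node can open a plain TCP connection to each seed's internode port at all. A minimal sketch, assuming the default storage_port of 7000 (the default ssl_storage_port, used with internode encryption, is 7001); SeedReachability and canConnect are hypothetical names. A successful connect only shows that something accepts TCP connections on that port, not that it will answer the version handshake.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class SeedReachability {
    // Assumption: Cassandra's default internode storage_port (ssl_storage_port defaults to 7001).
    static final int DEFAULT_STORAGE_PORT = 7000;

    // Hypothetical helper: true if a plain TCP connect to host:port succeeds within timeoutMillis.
    // It says nothing about whether the peer will answer Cassandra's version handshake.
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) { // refused, timed out, unknown host, ...
            return false;
        }
    }

    public static void main(String[] args) {
        // Usage: java SeedReachability <seed address> [<seed address> ...]
        for (String seed : args) {
            boolean ok = canConnect(seed, DEFAULT_STORAGE_PORT, 2000);
            System.out.println(seed + ":" + DEFAULT_STORAGE_PORT + (ok ? " reachable" : " NOT reachable"));
        }
    }
}
```

Run it on the new node with the seeds' addresses, e.g. `java SeedReachability NODE_1_PUBLIC_IP NODE_2_PUBLIC_IP`.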

Hope that helps. 

-----------------
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 18/11/2013, at 8:36 pm, Skye Book <skye.b...@gmail.com> wrote:

> Hi there,
> 
> I’m bringing this thread back, as it’s something I thought was solved but 
> apparently isn’t fixed on my end.
> 
> To recap, I’m having trouble getting a node to join a cluster. The configuration 
> seems all right using the Ec2MultiRegionSnitch, but new nodes are unable to 
> handshake with seeds.
> 
> - Security group has ports 22 and 1024-65535 open
> - Nodes are configured with password authentication using CassandraAuthorizer
> - internode_authenticator is commented out in configuration
> - rpc_address is set to the instance’s private address
> - listen_address is set to the instance’s private address
> - broadcast_address is set to the instance's public address
> 
> As was suggested earlier, I’ve enabled TRACE logging for 
> OutboundTcpConnection and get the following dumped into system.log when the 
> new node is started without itself in the seed list (if its own IP is in 
> the list, it just creates a new single-node cluster). I’ve gisted the results 
> here: https://gist.github.com/skyebook/be5ee75a000a1e6d65d0
> 
> It looks like the handshake process completely and utterly fails, as the node 
> seems unable to get any information from the other nodes, as evidenced by:
> OutboundTcpConnection.java (line 386) Handshaking version with /NODE_1_PUBLIC_IP
> OutboundTcpConnection.java (line 386) Handshaking version with /NODE_2_PUBLIC_IP
> OutboundTcpConnection.java (line 333) Target max version is -2147483648; no version information yet, will retry
> 
> Thanks in advance for any light you all might be able to shed on what’s going 
> on.
> 
> On Sep 26, 2013, at 9:03 PM, Aaron Morton <aa...@thelastpickle.com> wrote:
> 
>>>  INFO 05:03:49,015 Cannot handshake version with /aa.bb.cc.dd
>>>  INFO 05:03:49,017 Handshaking version with /aa.bb.cc.dd
>> If you can turn up logging to TRACE for 
>> org.apache.cassandra.net.OutboundTcpConnection, it will include the full 
>> error. 
>> 
>>> The two addresses that it is unable to handshake with are the other two 
>>> addresses of nodes in the cluster I'm unable to join.
>> Are you mixing versions?
>> 
>> 
>> Cheers
>> 
>> On 26/09/2013, at 5:13 PM, Skye Book <skye.b...@gmail.com> wrote:
>> 
>>> Hi Aaron, thanks for the clarification.
>>> 
>>> As might be expected, having the broadcast_address fixed hasn't fixed 
>>> anything.  What I did find after writing my last email is that output.log 
>>> is littered with these:
>>> 
>>>  INFO 05:03:49,015 Cannot handshake version with /aa.bb.cc.dd
>>>  INFO 05:03:49,017 Handshaking version with /aa.bb.cc.dd
>>>  INFO 05:03:49,803 Cannot handshake version with /ww.xx.yy.zz
>>>  INFO 05:03:49,805 Handshaking version with /ww.xx.yy.zz
>>> 
>>> The two addresses that it is unable to handshake with are the other two 
>>> addresses of nodes in the cluster I'm unable to join.  I started thinking 
>>> that maybe EC2 was having an unadvertised problem communicating between AZs, 
>>> but bringing up nodes in both of the other availability zones resulted in 
>>> the same wrong behavior.
>>> 
>>> I've gisted my cassandra.yaml; it's pretty standard and hasn't caused an 
>>> issue for me in the past:
>>> https://gist.github.com/skyebook/ec9364cdcec02e803ffc
>>> 
>>> Skye Book
>>> http://skyebook.net -- @sbook
>>> 
>>> On Sep 26, 2013, at 12:34 AM, Aaron Morton <aa...@thelastpickle.com> wrote:
>>> 
>>>>>  I am curious, though, how any of this worked in the first place spread 
>>>>> across three AZ's without that being set?
>>>> broadcast_address is only needed when you are going cross-region (IIRC 
>>>> it's the Ec2MultiRegionSnitch that sets it).
>>>> 
>>>> As Rob said, make sure the seed list includes one of the other nodes and 
>>>> that the cluster_name is set.
>>>> 
>>>> Cheers
>>>> 
>>>> On 26/09/2013, at 8:12 AM, Skye Book <skye.b...@gmail.com> wrote:
>>>> 
>>>>> Thank you both, Michael and Robert, for your suggestions. I actually saw 
>>>>> 5760, but we were running 2.0.0, in which it seems this was fixed.
>>>>> 
>>>>> That said, I noticed that my Chef scripts were failing to set the 
>>>>> broadcast_address correctly, which I'm guessing is the cause of the 
>>>>> problem; I'm fixing that and trying a redeploy. I am curious, though, how 
>>>>> any of this worked in the first place spread across three AZ's without 
>>>>> that being set?
>>>>> 
>>>>> -Skye
>>>>> 
>>>>> On Sep 25, 2013, at 3:56 PM, Robert Coli <rc...@eventbrite.com> wrote:
>>>>> 
>>>>>> On Wed, Sep 25, 2013 at 12:41 PM, Skye Book <skye.b...@gmail.com> wrote:
>>>>>> I have a three node cluster using the EC2 Multi-Region Snitch currently 
>>>>>> operating only in US-EAST.  On having a node go down this morning, I 
>>>>>> started a new node with an identical configuration, except for the seed 
>>>>>> list, the listen address and the rpc address.  The new node comes up and 
>>>>>> creates its own cluster rather than joining the pre-existing ring.  I've 
>>>>>> tried creating a node both before and after using `nodetool remove` for 
>>>>>> the bad node, each time with the same result.
>>>>>> 
>>>>>> What version of Cassandra?
>>>>>> 
>>>>>> This particular confusing behavior is fixed upstream, in a version you 
>>>>>> should not deploy to production yet. Take some solace, however, that you 
>>>>>> may be the last Cassandra administrator to die for a broken code path!
>>>>>> 
>>>>>> https://issues.apache.org/jira/browse/CASSANDRA-5768
>>>>>> 
>>>>>> Does anyone have any suggestions for where to look that might put me on 
>>>>>> the right track?
>>>>>> 
>>>>>> It must be that your seed list is wrong in some way, or your node state 
>>>>>> is wrong. If you're trying to bootstrap a node, note that you can't 
>>>>>> bootstrap a node when it is in its own seed list.
>>>>>> 
>>>>>> If you have installed Cassandra via Debian package, there is a 
>>>>>> possibility that your node has started before you explicitly started it. 
>>>>>> If so, it might have invalid node state.
>>>>>> 
>>>>>> Have you tried wiping the data directory and trying again?
>>>>>> 
>>>>>> What is your seed list? Are you sure the new node can reach the seeds on 
>>>>>> the network layer?
>>>>>> 
>>>>>> =Rob
>>>>> 
>>>> 
>>> 
>> 
>
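On Rob's point that a node can't bootstrap when it is in its own seed list (and Skye's observation that it then starts a new single-node cluster), a check before starting the node can catch that. A minimal sketch, assuming the seeds value is the usual comma-separated string from seed_provider in cassandra.yaml; SeedListCheck, parseSeeds and listsItself are hypothetical names. It compares strings literally and does not resolve hostnames, so pass every address the node might appear as, such as both its private and public IP.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class SeedListCheck {
    // Splits a comma-separated "seeds" value (as in seed_provider in cassandra.yaml).
    static List<String> parseSeeds(String seeds) {
        return Arrays.stream(seeds.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
    }

    // Hypothetical helper: true if any of this node's own addresses appears in the seed list.
    // Compares strings literally; hostnames are not resolved.
    static boolean listsItself(String seeds, String... ownAddresses) {
        List<String> seedList = parseSeeds(seeds);
        return Arrays.stream(ownAddresses).anyMatch(seedList::contains);
    }

    public static void main(String[] args) {
        String seeds = "10.0.0.1, 10.0.0.2";
        // New node known by a private and a public address, neither of which is a seed.
        System.out.println(listsItself(seeds, "10.0.0.3", "203.0.113.3")); // false
        // Per Rob, a node whose own address is in the list would not bootstrap.
        System.out.println(listsItself(seeds, "10.0.0.2")); // true
    }
}
```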
