> - broadcast_address is set to the instance's public address

You only need this if you have a multi region setup.

> I’ve gisted the results here:
> https://gist.github.com/skyebook/be5ee75a000a1e6d65d0

This error

TRACE [HANDSHAKE-/NODE_1_PUBLIC_IP] 2013-11-18 06:57:13,984 OutboundTcpConnection.java (line 393) Cannot handshake version with /NODE_1_PUBLIC_IP
java.nio.channels.AsynchronousCloseException
    at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:205)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:402)
    at sun.nio.ch.SocketAdaptor$SocketInputStream.read(SocketAdaptor.java:201)
    at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103)
    at java.io.InputStream.read(InputStream.java:101)
    at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:81)
    at java.io.DataInputStream.readInt(DataInputStream.java:387)
    at org.apache.cassandra.net.OutboundTcpConnection$1.run(OutboundTcpConnection.java:387)

is preventing the node from reading the version and results in this line being printed (-2147483648 is the "no version" flag):

> OutboundTcpConnection.java (line 333) Target max version is -2147483648; no
> version information yet, will retry

Not really sure why that exception is being thrown; the help does not make it clear:
http://docs.oracle.com/javase/7/docs/api/java/nio/channels/AsynchronousCloseException.html

Check the networking.

Hope that helps.

-----------------
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 18/11/2013, at 8:36 pm, Skye Book <skye.b...@gmail.com> wrote:

> Hi there,
>
> I’m bringing this thread back as it’s something that I thought was solved and
> is apparently not fixed on my end.
>
> To recap, I’m having trouble getting a node to join a cluster. Configuration
> seems all right using the EC2MultiRegionSnitch but new nodes are unable to
> handshake with seeds.
>
> - Security Group has 22 && 1024-65535 open
> - Nodes are configured with password authentication using CassandraAuthorizer
> - internode_authenticator is commented out in configuration
> - rpc_address is set to the instance’s private address
> - listen_address is set to the instance’s private address
> - broadcast_address is set to the instance's public address
>
> As was suggested earlier, I’ve enabled TRACE logging for
> OutboundTcpConnection and get the following dumped into system.log when the
> new node is started up without itself in the seed list (if its own IP is in
> the list it just creates a new single node cluster). I’ve gisted the results
> here: https://gist.github.com/skyebook/be5ee75a000a1e6d65d0
>
> It looks like the handshake process completely and utterly fails as it seems
> unable to get any information from the other nodes as evidenced by:
> OutboundTcpConnection.java (line 386) Handshaking version with
> /NODE_1_PUBLIC_IP
> OutboundTcpConnection.java (line 386) Handshaking version with
> /NODE_2_PUBLIC_IP
> OutboundTcpConnection.java (line 333) Target max version is -2147483648; no
> version information yet, will retry
>
> Thanks in advance for any light you all might be able to shed on what’s going
> on.
>
> On Sep 26, 2013, at 9:03 PM, Aaron Morton <aa...@thelastpickle.com> wrote:
>
>>> INFO 05:03:49,015 Cannot handshake version with /aa.bb.cc.dd
>>> INFO 05:03:49,017 Handshaking version with /aa.bb.cc.dd
>> If you can turn up logging to TRACE for
>> org.apache.cassandra.net.OutboundTcpConnection it will include the full
>> error.
>>
>>> The two addresses that it is unable to handshake with are the other two
>>> addresses of nodes in the cluster I'm unable to join.
>> Are you mixing versions ?
>>
>>
>> Cheers
>>
>> -----------------
>> Aaron Morton
>> New Zealand
>> @aaronmorton
>>
>> Co-Founder & Principal Consultant
>> Apache Cassandra Consulting
>> http://www.thelastpickle.com
>>
>> On 26/09/2013, at 5:13 PM, Skye Book <skye.b...@gmail.com> wrote:
>>
>>> Hi Aaron, thanks for the clarification.
>>>
>>> As might be expected, having the broadcast_address fixed hasn't fixed
>>> anything. What I did find after writing my last email is that output.log
>>> is littered with these:
>>>
>>> INFO 05:03:49,015 Cannot handshake version with /aa.bb.cc.dd
>>> INFO 05:03:49,017 Handshaking version with /aa.bb.cc.dd
>>> INFO 05:03:49,803 Cannot handshake version with /ww.xx.yy.zz
>>> INFO 05:03:49,805 Handshaking version with /ww.xx.yy.zz
>>>
>>> The two addresses that it is unable to handshake with are the other two
>>> addresses of nodes in the cluster I'm unable to join. I started thinking
>>> that maybe EC2 was having an unadvertised problem communicating between AZ's
>>> but bringing up nodes in both of the other availability zones resulted in
>>> the same wrong behavior.
>>>
>>> I've gist'd my cassandra.yaml, it's pretty standard and hasn't caused an
>>> issue in the past for me.
>>> https://gist.github.com/skyebook/ec9364cdcec02e803ffc
>>>
>>> Skye Book
>>> http://skyebook.net -- @sbook
>>>
>>> On Sep 26, 2013, at 12:34 AM, Aaron Morton <aa...@thelastpickle.com> wrote:
>>>
>>>>> I am curious, though, how any of this worked in the first place spread
>>>>> across three AZ's without that being set?
>>>> broadcast_address is only needed when you are going cross region (IIRC
>>>> it's the EC2MultiRegionSnitch that sets it).
>>>>
>>>> As Rob said, make sure the seed list includes one of the other nodes and
>>>> that the cluster_name is set.
>>>>
>>>> Cheers
>>>>
>>>> -----------------
>>>> Aaron Morton
>>>> New Zealand
>>>> @aaronmorton
>>>>
>>>> Co-Founder & Principal Consultant
>>>> Apache Cassandra Consulting
>>>> http://www.thelastpickle.com
>>>>
>>>> On 26/09/2013, at 8:12 AM, Skye Book <skye.b...@gmail.com> wrote:
>>>>
>>>>> Thank you, both Michael and Robert for your suggestions. I actually saw
>>>>> 5760, but we were running on 2.0.0, which it seems like this was fixed in.
>>>>>
>>>>> That said, I noticed that my Chef scripts were failing to set the
>>>>> broadcast_address correctly, which I'm guessing is the cause of the
>>>>> problem, fixing that and trying a redeploy. I am curious, though, how
>>>>> any of this worked in the first place spread across three AZ's without
>>>>> that being set?
>>>>>
>>>>> -Skye
>>>>>
>>>>> On Sep 25, 2013, at 3:56 PM, Robert Coli <rc...@eventbrite.com> wrote:
>>>>>
>>>>>> On Wed, Sep 25, 2013 at 12:41 PM, Skye Book <skye.b...@gmail.com> wrote:
>>>>>> I have a three node cluster using the EC2 Multi-Region Snitch currently
>>>>>> operating only in US-EAST. On having a node go down this morning, I
>>>>>> started a new node with an identical configuration, except for the seed
>>>>>> list, the listen address and the rpc address. The new node comes up and
>>>>>> creates its own cluster rather than joining the pre-existing ring. I've
>>>>>> tried creating a node both before and after using `nodetool remove` for
>>>>>> the bad node, each time with the same result.
>>>>>>
>>>>>> What version of Cassandra?
>>>>>>
>>>>>> This particular confusing behavior is fixed upstream, in a version you
>>>>>> should not deploy to production yet. Take some solace, however, that you
>>>>>> may be the last Cassandra administrator to die for a broken code path!
>>>>>>
>>>>>> https://issues.apache.org/jira/browse/CASSANDRA-5768
>>>>>>
>>>>>> Does anyone have any suggestions for where to look that might put me on
>>>>>> the right track?
>>>>>>
>>>>>> It must be that your seed list is wrong in some way, or your node state
>>>>>> is wrong. If you're trying to bootstrap a node, note that you can't
>>>>>> bootstrap a node when it is in its own seed list.
>>>>>>
>>>>>> If you have installed Cassandra via debian package, there is a
>>>>>> possibility that your node has started before you explicitly started it.
>>>>>> If so, it might have invalid node state.
>>>>>>
>>>>>> Have you tried wiping the data directory and trying again?
>>>>>>
>>>>>> What is your seed list? Are you sure the new node can reach the seeds on
>>>>>> the network layer?
>>>>>>
>>>>>> =Rob
>>>>>
>>>>
>>>
>>
>
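
The suggestions in this thread to check the networking, and to confirm that the new node can reach the seeds at the network layer, can be approximated with a plain TCP probe run from the new node. The following is only a minimal sketch, not something posted by anyone in the thread. It assumes the cluster uses Cassandra's default storage_port of 7000 (adjust if cassandra.yaml sets a different one), and the helper names can_connect and check_seeds are hypothetical. A successful connect only shows that the port is reachable; it does not prove that the version handshake itself will succeed.

```python
import socket
import sys


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_seeds(seeds, port=7000, timeout=3.0):
    """Probe each seed address and return a dict of address -> reachable.

    port=7000 is an assumption: Cassandra's default storage_port.
    """
    return {seed: can_connect(seed, port, timeout) for seed in seeds}


if __name__ == "__main__":
    # Usage: python check_seeds.py SEED_IP [SEED_IP ...]
    for seed, ok in check_seeds(sys.argv[1:]).items():
        print(f"{seed}: {'reachable' if ok else 'NOT reachable'}")
```

With the configuration described above, it may be worth probing both the public and the private address of each seed, since broadcast_address and listen_address differ.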