Are you using EC2?

On 11 May 2012, at 16:13, Pavel Polushkin wrote:

> We use version 1.0.8.
>  
> From: David Leimbach [mailto:leim...@gmail.com] 
> Sent: Friday, May 11, 2012 18:48
> To: user@cassandra.apache.org
> Subject: Re: Cassandra stucks
>  
> What's the version number of Cassandra?
> 
> On Fri, May 11, 2012 at 7:38 AM, Pavel Polushkin <ppolush...@enkata.com> 
> wrote:
> Hello,
> 
>  
> 
> We ran into a strange problem while performance testing a Cassandra 
> cluster. After some time all nodes went down and stayed down for several 
> days. Now all nodes are back up except one, which is still down.
> 
>  
> 
> Nodetool on the down node throws an exception:
> 
> Error connection to remote JMX agent!
> 
> java.io.IOException: Failed to retrieve RMIServer stub: javax.naming.CommunicationException [Root exception is java.rmi.ConnectIOException: error during JRMP connection establishment; nested exception is:
>         java.net.SocketTimeoutException: Read timed out]
>         at javax.management.remote.rmi.RMIConnector.connect(RMIConnector.java:340)
>         at javax.management.remote.JMXConnectorFactory.connect(JMXConnectorFactory.java:248)
>         at org.apache.cassandra.tools.NodeProbe.connect(NodeProbe.java:144)
>         at org.apache.cassandra.tools.NodeProbe.<init>(NodeProbe.java:114)
>         at org.apache.cassandra.tools.NodeCmd.main(NodeCmd.java:623)
> Caused by: javax.naming.CommunicationException [Root exception is java.rmi.ConnectIOException: error during JRMP connection establishment; nested exception is:
>         java.net.SocketTimeoutException: Read timed out]
>         at com.sun.jndi.rmi.registry.RegistryContext.lookup(RegistryContext.java:101)
>         at com.sun.jndi.toolkit.url.GenericURLContext.lookup(GenericURLContext.java:185)
>         at javax.naming.InitialContext.lookup(InitialContext.java:392)
>         at javax.management.remote.rmi.RMIConnector.findRMIServerJNDI(RMIConnector.java:1888)
>         at javax.management.remote.rmi.RMIConnector.findRMIServer(RMIConnector.java:1858)
>         at javax.management.remote.rmi.RMIConnector.connect(RMIConnector.java:257)
>         ... 4 more
> Caused by: java.rmi.ConnectIOException: error during JRMP connection establishment; nested exception is:
>         java.net.SocketTimeoutException: Read timed out
>         at sun.rmi.transport.tcp.TCPChannel.createConnection(TCPChannel.java:286)
>         at sun.rmi.transport.tcp.TCPChannel.newConnection(TCPChannel.java:184)
>         at sun.rmi.server.UnicastRef.newCall(UnicastRef.java:322)
>         at sun.rmi.registry.RegistryImpl_Stub.lookup(Unknown Source)
>         at com.sun.jndi.rmi.registry.RegistryContext.lookup(RegistryContext.java:97)
>         ... 9 more
> Caused by: java.net.SocketTimeoutException: Read timed out
>         at java.net.SocketInputStream.socketRead0(Native Method)
>         at java.net.SocketInputStream.read(SocketInputStream.java:129)
>         at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
>         at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
>         at java.io.DataInputStream.readByte(DataInputStream.java:248)
>         at sun.rmi.transport.tcp.TCPChannel.createConnection(TCPChannel.java:228)
>         ... 13 more
> 
>  
> 
> The system log of the down node shows an endless stream of messages like these:
> 
> INFO [GossipStage:1] 2012-05-10 23:18:27,579 Gossiper.java (line 804) InetAddress /172.15.2.161 is now UP
> INFO [GossipStage:1] 2012-05-10 23:18:27,580 Gossiper.java (line 804) InetAddress /172.15.2.162 is now UP
> INFO [GossipStage:1] 2012-05-10 23:18:27,580 Gossiper.java (line 804) InetAddress /172.15.2.163 is now UP
> INFO [GossipStage:1] 2012-05-10 23:18:27,580 Gossiper.java (line 804) InetAddress /172.15.2.165 is now UP
> INFO [GossipTasks:1] 2012-05-10 23:18:29,291 Gossiper.java (line 818) InetAddress /172.15.2.161 is now dead.
> INFO [GossipTasks:1] 2012-05-10 23:18:29,291 Gossiper.java (line 818) InetAddress /172.15.2.165 is now dead.
> INFO [GossipTasks:1] 2012-05-10 23:18:29,291 Gossiper.java (line 818) InetAddress /172.15.2.162 is now dead.
> INFO [GossipTasks:1] 2012-05-10 23:18:29,291 Gossiper.java (line 818) InetAddress /172.15.2.163 is now dead.
> INFO [GossipStage:1] 2012-05-10 23:18:29,291 Gossiper.java (line 804) InetAddress /172.15.2.161 is now UP
> INFO [GossipStage:1] 2012-05-10 23:18:29,292 Gossiper.java (line 804) InetAddress /172.15.2.162 is now UP
> INFO [GossipStage:1] 2012-05-10 23:18:29,292 Gossiper.java (line 804) InetAddress /172.15.2.163 is now UP
> INFO [GossipStage:1] 2012-05-10 23:18:29,292 Gossiper.java (line 804) InetAddress /172.15.2.165 is now UP
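The gossip messages quoted above flip each peer between UP and dead within a couple of seconds. To see how often that happens per peer over a whole log file, a rough Python sketch like the following may help. It assumes the log lines keep the "InetAddress /<ip> is now UP|dead" wording shown in the quote; the function name count_transitions and the sample text are illustrative only and are not part of Cassandra.

```python
import re
from collections import Counter

# Matches the state-change text seen in the quoted system.log, e.g.
#   ... Gossiper.java (line 804) InetAddress /172.15.2.161 is now UP
#   ... Gossiper.java (line 818) InetAddress /172.15.2.161 is now dead.
TRANSITION = re.compile(r"InetAddress /(\S+) is now (UP|dead)")


def count_transitions(log_text):
    """Return {address: Counter({'UP': n, 'dead': m})} for a chunk of log text."""
    counts = {}
    for address, state in TRANSITION.findall(log_text):
        counts.setdefault(address, Counter())[state] += 1
    return counts


# Hypothetical sample built from lines in the quoted message.
sample = """\
INFO [GossipStage:1] 2012-05-10 23:18:27,579 Gossiper.java (line 804) InetAddress /172.15.2.161 is now UP
INFO [GossipStage:1] 2012-05-10 23:18:27,580 Gossiper.java (line 804) InetAddress /172.15.2.162 is now UP
INFO [GossipTasks:1] 2012-05-10 23:18:29,291 Gossiper.java (line 818) InetAddress /172.15.2.161 is now dead.
INFO [GossipStage:1] 2012-05-10 23:18:29,291 Gossiper.java (line 804) InetAddress /172.15.2.161 is now UP
"""

if __name__ == "__main__":
    for address, states in sorted(count_transitions(sample).items()):
        print(address, "UP:", states["UP"], "dead:", states["dead"])
```

If every peer shows roughly equal UP and dead counts, the flapping is more likely local to the node writing the log than a problem on each peer.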
> 
>  
> 
> The suspicious fact is that this node has several TCP connections with 
> other nodes on port 7000 (listed by netstat as afs3-fileserver) in the 
> CLOSE_WAIT state:
> 
> Active Internet connections (servers and established)
> Proto Recv-Q Send-Q Local Address           Foreign Address         State
> tcp   869073      0 rcwocas:afs3-fileserver rcwocas03.enkata.:34274 CLOSE_WAIT
> tcp   463429      0 rcwocas:afs3-fileserver rcwocas02.enkata.:39654 CLOSE_WAIT
> tcp   873838      0 rcwocas:afs3-fileserver rcwocas01.enkata.:49486 CLOSE_WAIT
> tcp   860245      0 rcwocas:afs3-fileserver rcwocas05.enkata.:43028 CLOSE_WAIT
> tcp      112      0 rcwocas:afs3-fileserver rcwocas02.enkata.:40321 CLOSE_WAIT
> tcp     2124      0 rcwocas:afs3-fileserver rcwocas03.enkata.:39338 CLOSE_WAIT
> tcp        0      0 rcwocas:afs3-fileserver rcwocas01.enkata.:56408 ESTABLISHED
> tcp      184      0 rcwocas:afs3-fileserver rcwocas01.enkata.:48862 CLOSE_WAIT
> tcp   534489      0 rcwocas:afs3-fileserver rcwocas02.enkata.:35331 ESTABLISHED
> tcp      886      0 rcwocas:afs3-fileserver rcwocas03.enkata.:56034 CLOSE_WAIT
> tcp        0      0 rcwocas04.Enkata.:48800 rcwocas:afs3-fileserver ESTABLISHED
> tcp        0      0 rcwocas:afs3-fileserver rcwocas01.enkata.:51348 ESTABLISHED
> tcp      187      0 rcwocas:afs3-fileserver rcwocas05.enkata.:45538 CLOSE_WAIT
> tcp      253      0 rcwocas:afs3-fileserver rcwocas03.enkata.:51359 CLOSE_WAIT
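CLOSE_WAIT means the peer has closed its side of the connection while the local process has not yet closed the socket, and a non-zero Recv-Q is data that has arrived but that the application has not read. To summarize a listing like the one quoted above, a small Python sketch along these lines could be used. It assumes the six-column netstat layout shown in the quote; the function name summarize and its service parameter are illustrative only.

```python
from collections import defaultdict


def summarize(netstat_text, service="afs3-fileserver"):
    """Group connections whose local side is `service` by TCP state.

    Returns {state: (connection_count, total_recv_q_bytes)}.
    """
    totals = defaultdict(lambda: [0, 0])
    for line in netstat_text.splitlines():
        fields = line.split()
        # Expected columns: proto, recv-q, send-q, local, foreign, state.
        if len(fields) != 6 or not fields[0].startswith("tcp"):
            continue  # skips headers and anything that is not a tcp row
        _, recv_q, _, local, _, state = fields
        if not local.endswith(":" + service):
            continue  # outbound connection or a different local port
        totals[state][0] += 1
        totals[state][1] += int(recv_q)
    return {state: tuple(pair) for state, pair in totals.items()}


# Sample rows copied from the quoted netstat output.
sample = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp   869073      0 rcwocas:afs3-fileserver rcwocas03.enkata.:34274 CLOSE_WAIT
tcp      112      0 rcwocas:afs3-fileserver rcwocas02.enkata.:40321 CLOSE_WAIT
tcp        0      0 rcwocas:afs3-fileserver rcwocas01.enkata.:56408 ESTABLISHED
tcp        0      0 rcwocas04.Enkata.:48800 rcwocas:afs3-fileserver ESTABLISHED
"""

if __name__ == "__main__":
    print(summarize(sample))
    # {'CLOSE_WAIT': (2, 869185), 'ESTABLISHED': (1, 0)}
```

A large Recv-Q total under CLOSE_WAIT would suggest that the process stopped reading from its inbound sockets, which is worth comparing against the thread dump.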
> 
>  
> 
> I have also attached a thread dump.
> 
>  
> 
> Thanks,
> 
> Pavel
> 
>  
>  
