tcpdump shows bidirectional communication with ACKs during a known problem period. I did not have TRACE logging enabled during the period covered by the tcpdump capture, but I assume the 'INFO error connecting to' messages are caused by ConnectExceptions.

For instance...

    lpc03:~$ telnet fs02 7000

...connects during the problem period. I wish the ConnectException included the port so I could be sure what it was trying to connect to, but my setup uses the default Gossip ports. The only non-standard interface setting is that the JMX ports are set to 8081 on all hosts.

Hopefully I'll be able to run another experiment in an hour or so, but then I'm going camping for a couple of days.

AJ

On Sat, Jun 19, 2010 at 5:05 PM, Peter Schuller <peter.schul...@infidyne.com> wrote:
>> TRACE 14:42:06,248 unable to connect to /10.33.3.20
>> java.net.ConnectException: Connection refused
>>         at java.net.PlainSocketImpl.socketConnect(Native Method)
>
> So that's interesting since it is a clear failure that comes from the
> operating system and indicates something which can be observed outside
> of cassandra using system tools. Presumably either cassandra is
> somehow connecting to the wrong port, or this is a
> firewalling/os/network issue, or the 'other' cassandra is not
> listening on the port. Using tcpdump/netstat -nlp should narrow that
> down.
>
> Is it possible connections only succeed in one direction for example?
>
> --
> / Peter Schuller
>
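
For the one-direction question, the telnet check could be scripted and run from each host against the other. This is only a rough sketch: the function name check_port is made up for illustration, and port 7000 is simply the one used in the telnet test above. It tells "Connection refused" (the OS actively rejected the connect, as in the TRACE output) apart from a timeout (packets silently dropped, which would point more towards firewalling).

```python
import socket

def check_port(host, port, timeout=3.0):
    """Attempt a TCP connect and classify the result, like a scripted telnet.

    check_port is a hypothetical helper, not part of Cassandra or any library.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The peer's OS answered with RST: nothing is listening on that port.
        return "refused"
    except socket.timeout:
        # No answer within the timeout: packets are probably being dropped.
        return "timeout"
    except OSError as exc:
        # Anything else, e.g. name resolution failure or no route to host.
        return f"error: {exc}"
```

Calling check_port("fs02", 7000) on lpc03, and the reverse on fs02, during a problem period should show whether connects fail in only one direction.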