> [snip]
>
> I'm not sure that is the case.
>
> When the server gets into the unrecoverable state, the repeating exceptions
> are indeed "SocketException: Too many open files".

[snip]

> Although this is unquestionably a network error, I don't think it is
> actually a network problem per se, as the maximum number of sockets open
> by the Cassandra server at this point is about 8. When I kill the client,
> the only sockets held are the listening sockets - no sockets in
> ESTABLISHED or TIME_WAIT.
Is this based on netstat, lsof, or something similar? When the node is in the state where it's giving these errors, try inspecting /proc/<pid>/fd or running lsof against the process. Presumably you'll see thousands of fds of some category, either sockets or regular files. (If you already did this, sorry!)

-- 
/ Peter Schuller
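P.S. If you want a quick tally rather than scrolling through `ls -l /proc/<pid>/fd`, something like the following should do. It's a minimal sketch, assuming Linux and read access to the Cassandra process's /proc entry. The function names are made up for illustration. It reads each symlink under /proc/<pid>/fd and buckets it by what it points at: sockets show up as `socket:[inode]`, pipes as `pipe:[inode]`, and regular files as plain paths.

```python
import os
import sys
from collections import Counter


def classify_fd_target(target: str) -> str:
    """Map a /proc/<pid>/fd symlink target to a coarse category."""
    # Sockets and pipes appear as "socket:[inode]" / "pipe:[inode]";
    # other kernel objects look like "anon_inode:[eventpoll]".
    if target.startswith("socket:"):
        return "socket"
    if target.startswith("pipe:"):
        return "pipe"
    if target.startswith("anon_inode:"):
        return "anon_inode"
    # Anything else is normally a path (regular file, device, directory...).
    return "file"


def count_fds(pid: int) -> Counter:
    """Count the open fds of process `pid` by category (Linux only)."""
    fd_dir = f"/proc/{pid}/fd"
    counts = Counter()
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The fd may have been closed between listdir() and readlink().
            continue
        counts[classify_fd_target(target)] += 1
    return counts


if __name__ == "__main__":
    # Defaults to inspecting this script's own process if no pid is given.
    pid = int(sys.argv[1]) if len(sys.argv) > 1 else os.getpid()
    for category, n in count_fds(pid).most_common():
        print(f"{category:12} {n}")
```

Run it with the Cassandra JVM's pid as the argument. If "file" dominates, `lsof -p <pid>` will tell you which files they are. /proc/<pid>/limits shows the "Max open files" limit that these counts run up against.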