That definitely appears to be the issue. Thanks for pointing that out!

https://issues.apache.org/jira/browse/CASSANDRA-8116

It looks like 2.0.12 will check for the default and throw an exception (thanks Mike Adamson), and it also includes a bit more text in the config file. Even so, I think 2.0.12 should be pushed out sooner rather than later: anyone using hsha with the default settings will simply have their cluster stop working a few minutes after the upgrade, without any indication of the actual problem.

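For anyone who wants to check a node before upgrading to 2.0.11, here is a rough sketch of the kind of test I have in mind. It is not the check that 2.0.12 will perform, just a line-based scan of the cassandra.yaml text for the combination described in this thread: rpc_server_type set to hsha with rpc_max_threads left unset (commented out or absent). The function names are made up for illustration, and the scan only looks at simple top-level "key: value" lines rather than parsing real YAML.

```python
import re


def top_level_setting(config_text, key):
    """Return the value of an uncommented top-level `key: value` line, or None.

    Commented-out lines (starting with '#') and indented lines are ignored.
    """
    pattern = re.compile(r"^" + re.escape(key) + r"\s*:\s*(.*?)\s*(?:#.*)?$")
    for line in config_text.splitlines():
        match = pattern.match(line)
        if match:
            # Strip optional quotes; treat an empty value as unset.
            return match.group(1).strip("'\"") or None
    return None


def hsha_with_default_max_threads(config_text):
    """True if the config selects hsha and leaves rpc_max_threads unset."""
    server_type = top_level_setting(config_text, "rpc_server_type")
    max_threads = top_level_setting(config_text, "rpc_max_threads")
    return server_type == "hsha" and max_threads is None


if __name__ == "__main__":
    sample = "rpc_server_type: hsha\n# rpc_max_threads: 2048\n"
    if hsha_with_default_max_threads(sample):
        print("hsha with default rpc_max_threads: set an explicit limit first")
```

To run it against a real node, read the config file into a string and pass it to hsha_with_default_max_threads(). The file location depends on how Cassandra was installed, so I have left it out of the sketch.
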

Peter

On Wed, Oct 29, 2014 at 5:23 AM, Duncan Sands <duncan.sa...@gmail.com> wrote:
> Hi Peter, are you using the hsha RPC server type on this node? If you are,
> then it looks like rpc_max_threads threads will be allocated on startup in
> 2.0.11 while this wasn't the case before. This can exhaust your heap if the
> value of rpc_max_threads is too large (eg if you use the default).
>
> Ciao, Duncan.
>
>
> On 29/10/14 01:08, Peter Haggerty wrote:
>>
>> On a 3 node test cluster we recently upgraded one node from 2.0.10 to
>> 2.0.11. This is a cluster that had been happily running 2.0.10 for
>> weeks and that has very little load and very capable hardware. The
>> upgrade was just your typical package upgrade:
>>
>> $ dpkg -s cassandra | egrep '^Ver|^Main'
>> Maintainer: Eric Evans <eev...@apache.org>
>> Version: 2.0.11
>>
>> Immediately after started it ran a couple of ParNews and then started
>> executing CMS runs. In 10 minutes the node had become unreachable and
>> was marked as down by the two other nodes in the ring, which are still
>> 2.0.10.
>>
>> We have jstack output and the server logs but nothing seems to be
>> jumping out. Has anyone else run into this? What should we be looking
>> for?
>>
>>
>> Peter
>>
>