First, let's check whether the timeouts are client side or server side. What did the stack trace for the timeout error look like? Were they (Python/Thrift) socket timeouts, or TimedOutExceptions raised by the Cassandra Thrift code?
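As a rough way to tell the two cases apart in the application logs, something like the sketch below could be wrapped around the failing calls. The helper name `classify_timeout` is made up for illustration, and matching the server-side exception by class name is an assumption made so the snippet runs without the Thrift bindings installed; in the real app you would more likely import the generated TimedOutException class and use isinstance.

```python
import socket


def classify_timeout(exc):
    """Return 'client', 'server', or 'other' for a caught exception.

    'client' : the local socket gave up waiting (socket.timeout).
    'server' : the coordinator node reported that replicas did not answer
               in time (an exception class named TimedOutException).
    'other'  : anything else, including wrapper or retry exceptions.
    """
    # A socket-level timeout is raised on the client side of the connection.
    if isinstance(exc, socket.timeout):
        return "client"
    # Assumption: match on the class name instead of importing the
    # Thrift-generated class, so this sketch has no third-party imports.
    if type(exc).__name__ == "TimedOutException":
        return "server"
    return "other"
```

Calling this from an `except Exception as e:` block around each read and write, and logging the result together with the operation type, should show whether the errors are client-side socket timeouts or server-side TimedOutExceptions.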
Is it across all requests / clients, or, say, just reads?

Have you tried asking on http://groups.google.com/group/pycassa-discuss ?

Hope that helps.
Aaron

On 13 Apr 2011, at 04:30, Jason Harvey wrote:

> Interesting issue this morning.
>
> My apps started throwing a bunch of pycassa timeouts all of a sudden.
> The ring looked perfect. No load issues anywhere, and no errors in the
> logs.
>
> The site was basically down, so I got desperate and whacked a random
> node in the ring. As soon as gossip saw it go down, the timeouts went
> away. Thinking that was kinda crazy, I started the node back up. As
> soon as it rejoined the ring, pycassa started timing out again. I then
> killed another random node, far away from the first node I killed, and
> the timeouts stopped again. Started it back up, and the timeouts
> started again when it rejoined the ring.
>
> Repeated this process once more just to make sure I wasn't insane, and
> the same result happened. Killing any single node, anywhere in the
> ring, fixes my timeouts.
>
> Actively able to repro this. I am having to just keep one node down
> right now so the site doesn't break. Desperate for any suggestions or
> advice on this.
>
> Using pycassa 1.0.7. Timeout is set to 15 seconds, with 3 retries.
> Reads and writes are in quorum. 27 nodes in the ring, with an RF of 3.
>
> Thanks,
> Jason