Re: pycassa timeouts resolved by killing a random node in the ring

aaron morton Tue, 12 Apr 2011 19:25:26 -0700

First, let's check whether the timeouts are client-side or server-side.
 
What was the stack trace for the timeout errors?
Were they (Python/Thrift) socket timeouts, or TimedOutExceptions raised by the
Cassandra Thrift code?
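A rough way to sort the errors into those two buckets is sketched below. It assumes server-side timeouts surface as the Thrift-generated TimedOutException class (matched by name, so pycassa and Thrift need not be installed) and client-side ones as a plain socket.timeout. Depending on how the client wraps errors, a socket timeout may instead arrive wrapped in a Thrift transport or retry exception; this sketch only inspects the exception it is given.

```python
import socket


def classify_timeout(exc):
    """Guess whether a timeout came from the client socket or the Cassandra server.

    Assumption: server-side timeouts are raised as a class named
    ``TimedOutException`` (the Cassandra Thrift exception); it is matched by
    name so this sketch has no pycassa/Thrift dependency.
    """
    # Collect the names of the exception's class and all its base classes,
    # so subclasses of TimedOutException are recognised too.
    names = {cls.__name__ for cls in type(exc).__mro__}
    if "TimedOutException" in names:
        # Cassandra itself gave up waiting for enough replicas to respond.
        return "server"
    if isinstance(exc, socket.timeout):
        # The client's own socket read timed out (e.g. the 15 second setting).
        return "client"
    return "unknown"
```

Wrapping the failing call in `try: ... except Exception as exc: print(classify_timeout(exc))` and tallying the results would show whether the errors are mostly client or server timeouts.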


Is it happening across all requests and clients, or just, say, reads?
Have you tried asking on http://groups.google.com/group/pycassa-discuss ?

Hope that helps. 
Aaron

On 13 Apr 2011, at 04:30, Jason Harvey wrote:

> Interesting issue this morning.
> 
> My apps started throwing a bunch of pycassa timeouts all of a sudden.
> The ring looked perfect. No load issues anywhere, and no errors in the
> logs.
> 
> The site was basically down, so I got desperate and whacked a random
> node in the ring. As soon as gossip saw it go down, the timeouts went
> away. Thinking that was kinda crazy, I started the node back up. As
> soon as it rejoined the ring, pycassa started timing out again. I then
> killed another random node, far away from the first node I killed, and
> the timeouts stopped again. Started it back up, and the timeouts
> started again when it rejoined the ring.
> 
> Repeated this process once more just to make sure I wasn't insane, and
> the same result happened. Killing any single node, anywhere in the
> ring, fixes my timeouts.
> 
> Actively able to repro this. I am having to just keep one node down
> right now so the site doesn't break. Desperate for any suggestions or
> advice on this.
> 
> Using pycassa 1.0.7. Timeout is set to 15 seconds, with 3 retries.
> Reads and writes are in quorum. 27 nodes in the ring, with an RF of 3.
> 
> Thanks,
> Jason
