Also, once you've got your phi_convict_threshold sorted, if you see these errors again, check:
http://status.aws.amazon.com/

AWS does occasionally have the odd increased latency issue / outage.

Ben Bromhead
Instaclustr | www.instaclustr.com | @instaclustr | +61 415 936 359

On 19/05/2014, at 1:15 PM, Nate McCall <n...@thelastpickle.com> wrote:

> It's a good idea to increase phi_convict_threshold to at least 12 on EC2.
> Using placement groups and single-tenant systems will certainly help.
>
> Another optimization would be dedicating an Elastic Network Interface
> (http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-eni.html)
> specifically for gossip traffic.
>
>
> On Mon, May 19, 2014 at 1:36 PM, Phil Burress <philburress...@gmail.com>
> wrote:
> Has anyone experienced network I/O issues with EC2? We are seeing a lot of
> these in our logs:
>
> HintedHandOffManager.java (line 477) Timed out replaying hints to /10.0.x.xxx; aborting (15 delivered)
>
> and these...
>
> Cannot handshake version with /10.0.x.xxx
>
> and these...
>
> java.io.IOException: Cannot proceed on repair because a neighbor (/10.0.x.xxx) is dead: session failed
>
> This occurs on all of our nodes, even though in every case the host being
> reported as down or unavailable is up and readily 'pingable'.
>
> We are using shared tenancy on all our nodes (instance type m1.xlarge) with
> Cassandra 2.0.7. Any suggestions on how to debug these errors?
>
> Is there a recommendation to move to Placement Groups for Cassandra?
>
> Thanks!
>
> Phil
>
>
>
> --
> -----------------
> Nate McCall
> Austin, TX
> @zznate
>
> Co-Founder & Sr. Technical Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com
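
For anyone wondering why raising phi_convict_threshold from the default of 8 to 12 helps on a jittery network, here is a rough Python sketch of the phi accrual idea. It is a simplified model, not Cassandra's actual code: it assumes heartbeat gaps are exponentially distributed, and the function names and the 1-second mean gap are made up for illustration.

```python
import math


def phi(silence_s: float, mean_interval_s: float) -> float:
    """Suspicion level after `silence_s` seconds without a heartbeat.

    Assumes exponentially distributed gaps with the given mean, so
    P(gap > t) = exp(-t / mean) and phi = -log10(P(gap > t)).
    """
    return silence_s / (mean_interval_s * math.log(10))


def silence_tolerated(threshold: float, mean_interval_s: float) -> float:
    """Seconds of silence before phi reaches `threshold` (inverse of phi)."""
    return threshold * mean_interval_s * math.log(10)


if __name__ == "__main__":
    mean = 1.0  # hypothetical average gap between heartbeats, in seconds
    for threshold in (8, 12):
        seconds = silence_tolerated(threshold, mean)
        print(f"threshold {threshold}: ~{seconds:.1f}s of silence tolerated")
```

Under this model the tolerated silence grows linearly with the threshold, so going from 8 to 12 gives a peer about 50% more time to answer before it is marked down. Real numbers depend on the heartbeat gaps each node actually observes.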