I may be mistaken on the exact configuration option for the timeout you're hitting, but I believe this may be the general `request_timeout_in_ms: 10000` in conf/cassandra.yaml.
A reasonable timeout for "node down" detection and processing is needed: a very short interval would cause nodes to flap randomly between up and down. Applications should also retry on a host-unavailable exception like this one, because over the long run it is to be expected from time to time due to network partitions, node failures, maintenance cycles, etc.

--
Kind regards,
Michael

On 03/10/2017 04:07 AM, Shalom Sagges wrote:
> Hi Daniel,
>
> I don't think that's a network issue, because ~10 seconds after the node
> stopped, the queries were successful again without any timeout issues.
>
> Thanks!
>
> Shalom Sagges
> DBA
>
> On Fri, Mar 10, 2017 at 12:01 PM, Daniel Hölbling-Inzko
> <daniel.hoelbling-in...@bitmovin.com> wrote:
>
> > Could there be network issues in connecting between the nodes? If
> > node A becomes the query coordinator but can't reach B, and C is
> > obviously down, it won't get a quorum.
> >
> > Greetings
> >
> > Shalom Sagges <shal...@liveperson.com> wrote on Fri, Mar 10, 2017 at 10:55:
> >
> > > @Ryan, my keyspace replication settings are as follows:
> > >
> > > CREATE KEYSPACE mykeyspace WITH replication = {'class':
> > > 'NetworkTopologyStrategy', 'DC1': '3', 'DC2': '3', 'DC3': '3'}
> > > AND durable_writes = true;
> > >
> > > CREATE TABLE mykeyspace.test (
> > >     column1 text,
> > >     column2 text,
> > >     column3 text,
> > >     PRIMARY KEY (column1, column2)
> > > );
> > >
> > > The query is:
> > > select * from mykeyspace.test where column1='xxxxx';
> > >
> > > @Daniel, the replication factor is 3. That's why I don't
> > > understand why I get these timeouts when only one node drops.
> > >
> > > Also, when I enabled tracing, I got the following error:
> > >
> > > Unable to fetch query trace: ('Unable to complete the operation
> > > against any hosts', {<Host: 127.0.0.1 DC1>: Unavailable('Error
> > > from server: code=1000 [Unavailable exception] message="Cannot
> > > achieve consistency level LOCAL_QUORUM"
> > > info={\'required_replicas\': 2, \'alive_replicas\': 1,
> > > \'consistency\': \'LOCAL_QUORUM\'}',)})
> > >
> > > But nodetool status shows that only 1 replica was down:
> > >
> > > --  Address    Load       Tokens  Owns  Host ID                               Rack
> > > DN  x.x.x.235  134.32 MB  256     ?     c0920d11-08da-4f18-a7f3-dbfb8c155b19  RAC1
> > > UN  x.x.x.236  134.02 MB  256     ?     2cc0a27b-b1e4-461f-a3d2-186d3d82ff3d  RAC1
> > > UN  x.x.x.237  134.34 MB  256     ?     5b2162aa-8803-4b54-88a9-ff2e70b3d830  RAC1
> > >
> > > I tried to run the same scenario on all 3 nodes, and only the
> > > 3rd node didn't fail the query when I dropped it. The nodes were
> > > installed and configured with Puppet, so the configuration is the
> > > same on all 3 nodes.
> > >
> > > Thanks!
> > >
> > > On Fri, Mar 10, 2017 at 10:25 AM, Daniel Hölbling-Inzko
> > > <daniel.hoelbling-in...@bitmovin.com> wrote:
> > >
> > > > The LOCAL_QUORUM works on the available replicas in the DC.
> > > > So if your replication factor is 2 and you have 10 nodes you
> > > > can still only lose 1. With a replication factor of 3 you
> > > > can lose one node and still satisfy the query.
> > > >
> > > > Ryan Svihla <r...@foundev.pro> wrote on Thu, Mar 9, 2017 at 18:09:
> > > >
> > > > > What are your keyspace replication settings, and what's your
> > > > > query?
> > > > >
> > > > > On Thu, Mar 9, 2017 at 9:32 AM, Shalom Sagges
> > > > > <shal...@liveperson.com> wrote:
> > > > >
> > > > > > Hi Cassandra Users,
> > > > > >
> > > > > > I hope someone could help me understand the
> > > > > > following scenario:
> > > > > >
> > > > > > Version: 3.0.9
> > > > > > 3 nodes per DC
> > > > > > 3 DCs in the cluster.
> > > > > > Consistency Local_Quorum.
> > > > > >
> > > > > > I did a small resiliency test and dropped a node to
> > > > > > check the availability of the data.
> > > > > > What I assumed would happen is nothing at all. If a
> > > > > > node is down in a 3-node DC, Local_Quorum should
> > > > > > still be satisfied.
> > > > > > However, during the first ~10 seconds after stopping
> > > > > > the service, I got timeout errors (tried it both
> > > > > > from the client and from cqlsh).
> > > > > >
> > > > > > This is the error I get:
> > > > > >
> > > > > > ServerError:
> > > > > > com.google.common.util.concurrent.UncheckedExecutionException:
> > > > > > com.google.common.util.concurrent.UncheckedExecutionException:
> > > > > > java.lang.RuntimeException:
> > > > > > org.apache.cassandra.exceptions.ReadTimeoutException:
> > > > > > Operation timed out - received only 4 responses.
> > > > > >
> > > > > > After ~10 seconds, the same query is successful with
> > > > > > no timeout errors. The dropped node is still down.
> > > > > >
> > > > > > Any idea what could cause this and how to fix it?
> > > > > >
> > > > > > Thanks!
> > > > > >
> > > > > > This message may contain confidential and/or
> > > > > > privileged information.
> > > > > > If you are not the addressee or authorized to
> > > > > > receive this on behalf of the addressee you must not
> > > > > > use, copy, disclose or take action based on this
> > > > > > message or any information herein.
> > > > > > If you have received this message in error, please
> > > > > > advise the sender immediately by reply email and
> > > > > > delete this message. Thank you.
> > > > >
> > > > > --
> > > > > Thanks,
> > > > > Ryan Svihla
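
P.S. As a rough illustration of the retry advice at the top of this reply, here is a minimal sketch in plain Python. It is not tied to any particular Cassandra driver: `HostUnavailableError`, `execute_with_retry`, and the default attempt count and delays are all hypothetical names and values chosen for the example. A real driver raises its own exception types and may offer built-in retry policies, so treat this only as the general shape of "retry with backoff", not as a recommended implementation.

```python
import time


class HostUnavailableError(Exception):
    """Hypothetical stand-in for a driver's "unavailable" or timeout error."""


def execute_with_retry(run_query, max_attempts=4, base_delay_s=0.5, sleep=time.sleep):
    """Call run_query(), retrying on HostUnavailableError with exponential backoff.

    run_query is any zero-argument callable that performs the query.
    sleep is injectable so the delays can be observed or skipped in tests.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return run_query()
        except HostUnavailableError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error to the caller
            # Wait base_delay_s, then 2x, then 4x, ... before the next attempt.
            sleep(base_delay_s * 2 ** (attempt - 1))
```

With these (assumed) defaults the waits between attempts are 0.5 s, 1 s and 2 s, so a query that fails during a short detection window gets a few more chances before the error reaches the caller. Whether that is long enough depends on how quickly the cluster marks the node as down.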