Could there be network issues in connecting between the nodes? If node a gets to be the query coordinator but can't reach b, and c is obviously down, it won't get a quorum.
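As an illustrative aside (not part of the original thread): the quorum arithmetic behind that statement can be sketched in a few lines of Python. The helper name `local_quorum` is made up, but floor(RF/2) + 1 is the formula Cassandra uses to compute a quorum.

```python
def local_quorum(rf):
    # LOCAL_QUORUM requires a majority of the replicas in the local DC:
    # floor(rf / 2) + 1
    return rf // 2 + 1

rf = 3                       # replication factor in the local DC
required = local_quorum(rf)  # 2 replicas must respond
alive = 1                    # coordinator a reaches only itself: b unreachable, c down
print(f"need {required}, have {alive}: quorum met = {alive >= required}")
# need 2, have 1: quorum met = False
```

Note that with RF=3 a single dead node still leaves 2 live replicas, which is why the timeouts in the thread below are surprising: the quorum should be reachable on paper.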
Greetings

Shalom Sagges <shal...@liveperson.com> wrote on Fri, 10 Mar 2017 at 10:55:

> @Ryan, my keyspace replication settings are as follows:
>
>     CREATE KEYSPACE mykeyspace WITH replication = {'class':
>         'NetworkTopologyStrategy', 'DC1': '3', 'DC2': '3', 'DC3': '3'}
>         AND durable_writes = true;
>
>     CREATE TABLE mykeyspace.test (
>         column1 text,
>         column2 text,
>         column3 text,
>         PRIMARY KEY (column1, column2)
>     );
>
> The query is: select * from mykeyspace.test where column1='xxxxx';
>
> @Daniel, the replication factor is 3. That's why I don't understand why I
> get these timeouts when only one node drops.
>
> Also, when I enabled tracing, I got the following error:
>
>     Unable to fetch query trace: ('Unable to complete the operation against
>     any hosts', {<Host: 127.0.0.1 DC1>: Unavailable('Error from server:
>     code=1000 [Unavailable exception] message="Cannot achieve consistency
>     level LOCAL_QUORUM" info={\'required_replicas\': 2, \'alive_replicas\': 1,
>     \'consistency\': \'LOCAL_QUORUM\'}',)})
>
> But nodetool status shows that only 1 replica was down:
>
>     --  Address    Load       Tokens  Owns  Host ID                               Rack
>     DN  x.x.x.235  134.32 MB  256     ?     c0920d11-08da-4f18-a7f3-dbfb8c155b19  RAC1
>     UN  x.x.x.236  134.02 MB  256     ?     2cc0a27b-b1e4-461f-a3d2-186d3d82ff3d  RAC1
>     UN  x.x.x.237  134.34 MB  256     ?     5b2162aa-8803-4b54-88a9-ff2e70b3d830  RAC1
>
> I tried to run the same scenario on all 3 nodes, and only the 3rd node
> didn't fail the query when I dropped it. The nodes were installed and
> configured with Puppet, so the configuration is the same on all 3 nodes.
>
> Thanks!
>
> On Fri, Mar 10, 2017 at 10:25 AM, Daniel Hölbling-Inzko
> <daniel.hoelbling-in...@bitmovin.com> wrote:
>
>> LOCAL_QUORUM works on the available replicas in the DC. So even if you
>> have 10 nodes, with a replication factor of 2 you can still only lose 1.
>> With a replication factor of 3 you can lose one node and still satisfy
>> the query.
>>
>> Ryan Svihla <r...@foundev.pro> wrote on Thu, 9 Mar 2017 at 18:09:
>>
>>> What are your keyspace replication settings, and what's your query?
>>>
>>> On Thu, Mar 9, 2017 at 9:32 AM, Shalom Sagges <shal...@liveperson.com>
>>> wrote:
>>>
>>>> Hi Cassandra Users,
>>>>
>>>> I hope someone could help me understand the following scenario:
>>>>
>>>> Version: 3.0.9
>>>> 3 nodes per DC, 3 DCs in the cluster
>>>> Consistency: LOCAL_QUORUM
>>>>
>>>> I did a small resiliency test and dropped a node to check the
>>>> availability of the data. What I assumed would happen is nothing at
>>>> all: if a node is down in a 3-node DC, LOCAL_QUORUM should still be
>>>> satisfied. However, during the first ~10 seconds after stopping the
>>>> service, I got timeout errors (tried it both from the client and from
>>>> cqlsh).
>>>>
>>>> This is the error I get:
>>>>
>>>>     ServerError:
>>>>     com.google.common.util.concurrent.UncheckedExecutionException:
>>>>     com.google.common.util.concurrent.UncheckedExecutionException:
>>>>     java.lang.RuntimeException:
>>>>     org.apache.cassandra.exceptions.ReadTimeoutException:
>>>>     Operation timed out - received only 4 responses.
>>>>
>>>> After ~10 seconds, the same query is successful with no timeout
>>>> errors. The dropped node is still down.
>>>>
>>>> Any idea what could cause this and how to fix it?
>>>>
>>>> Thanks!
>>>>
>>>> This message may contain confidential and/or privileged information.
>>>> If you are not the addressee or authorized to receive this on behalf
>>>> of the addressee you must not use, copy, disclose or take action based
>>>> on this message or any information herein. If you have received this
>>>> message in error, please advise the sender immediately by reply email
>>>> and delete this message. Thank you.
>>>
>>> --
>>> Thanks,
>>> Ryan Svihla
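As an editorial footnote to the thread: the `nodetool status` output quoted above can be checked mechanically against the LOCAL_QUORUM requirement. The following is a hedged Python sketch (not from the thread; `quorum_ok` and the abbreviated host IDs are illustrative) that counts Up/Normal (`UN`) nodes and compares them with the floor(RF/2) + 1 replicas a quorum needs.

```python
def quorum_ok(status_lines, rf=3):
    # Count nodes that nodetool reports as Up/Normal ("UN");
    # "DN" means Down/Normal. LOCAL_QUORUM needs floor(rf/2) + 1 of them.
    alive = sum(1 for line in status_lines if line.startswith("UN"))
    required = rf // 2 + 1
    return alive >= required

# Abbreviated reconstruction of the status output from the thread.
lines = [
    "DN  x.x.x.235  134.32 MB  256  ?  c0920d11-...  RAC1",
    "UN  x.x.x.236  134.02 MB  256  ?  2cc0a27b-...  RAC1",
    "UN  x.x.x.237  134.34 MB  256  ?  5b2162aa-...  RAC1",
]
print(quorum_ok(lines))  # True
```

With 2 of 3 replicas alive the quorum is satisfiable, which supports the suspicion voiced at the top of the thread: the transient failures are more plausibly a coordinator that has not yet detected the dead node (or cannot reach a peer) than an actual shortage of replicas.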