Tom, I don't believe so; it seems the symptom would be an indefinite (or very long) hang.
To clarify, is this issue restricted to LOCAL_QUORUM? Can you issue a LOCAL_ONE SELECT and retrieve the expected data back?
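A rough sketch of such a check with the DataStax Java driver (2.x API assumed); the keyspace name "myks" and the bind values are placeholders, and the contact point is the DC1 coordinator from the trace below:

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.ConsistencyLevel;
    import com.datastax.driver.core.ResultSet;
    import com.datastax.driver.core.Session;
    import com.datastax.driver.core.SimpleStatement;
    import com.datastax.driver.core.Statement;

    public class LocalOneCheck {
        public static void main(String[] args) {
            // Contact point is the DC1 coordinator from the trace; "myks" is a
            // placeholder keyspace name (the real one isn't given in the thread).
            Cluster cluster = Cluster.builder().addContactPoint("10.55.156.67").build();
            Session session = cluster.connect("myks");

            // Same query as in the trace below, but pinned to LOCAL_ONE.
            Statement stmt = new SimpleStatement(
                    "SELECT * FROM Aggregate WHERE type=? AND typeId=?",
                    "someType", "someId");              // placeholder bind values
            stmt.setConsistencyLevel(ConsistencyLevel.LOCAL_ONE);
            System.out.println("CL = " + stmt.getConsistencyLevel());

            ResultSet rs = session.execute(stmt);
            System.out.println("Rows returned: " + rs.all().size());

            cluster.close();
        }
    }

If LOCAL_ONE consistently returns the rows while LOCAL_QUORUM intermittently returns none, that would point at how the read is being routed rather than at missing data in DC1.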
On Tue, Sep 8, 2015 at 12:02 PM, Tom van den Berge <tom.vandenbe...@gmail.com> wrote:

> Just to be sure: can this bug result in a 0-row result while it should be > 0?
> On 8 Sep 2015 6:29 PM, "Tyler Hobbs" <ty...@datastax.com> wrote:
>
>> See https://issues.apache.org/jira/browse/CASSANDRA-9753
>>
>> On Tue, Sep 8, 2015 at 10:22 AM, Tom van den Berge <tom.vandenbe...@gmail.com> wrote:
>>
>>> I've been bugging you a few times, but now I've got trace data for a query with LOCAL_QUORUM that is being sent to a remote data center.
>>>
>>> The setup is as follows:
>>> NetworkTopologyStrategy: {"DC1":"1","DC2":"2"}
>>> Both DC1 and DC2 have 2 nodes.
>>> In DC2, one node is currently being rebuilt, and therefore does not contain all data (yet).
>>>
>>> The client app connects to a node in DC1 and sends a SELECT query with CL LOCAL_QUORUM, which in this case means (1/2)+1 = 1.
>>> If all is ok, the query always produces a result, because the requested rows are guaranteed to be available in DC1.
>>>
>>> However, the query sometimes produces no result. I've been able to record the traces of these queries, and it turns out that the coordinator node in DC1 sometimes sends the query to DC2, to the node that is being rebuilt and does not have the requested rows. I've included an example trace below.
>>>
>>> The coordinator node is 10.55.156.67, which is in DC1. The 10.88.4.194 node is in DC2.
>>> I've verified that the CL is LOCAL_QUORUM by printing it when the query is sent (I'm using the DataStax Java driver).
>>>
>>> activity | source | source_elapsed | thread
>>> ---------------------------------------------------------------------------+--------------+----------------+-----------------------------------------
>>> Message received from /10.55.156.67 | 10.88.4.194 | 48 | MessagingService-Incoming-/10.55.156.67
>>> Executing single-partition query on aggregate | 10.88.4.194 | 286 | SharedPool-Worker-2
>>> Acquiring sstable references | 10.88.4.194 | 306 | SharedPool-Worker-2
>>> Merging memtable tombstones | 10.88.4.194 | 321 | SharedPool-Worker-2
>>> Partition index lookup allows skipping sstable 107 | 10.88.4.194 | 458 | SharedPool-Worker-2
>>> Bloom filter allows skipping sstable 1 | 10.88.4.194 | 489 | SharedPool-Worker-2
>>> Skipped 0/2 non-slice-intersecting sstables, included 0 due to tombstones | 10.88.4.194 | 496 | SharedPool-Worker-2
>>> Merging data from memtables and 0 sstables | 10.88.4.194 | 500 | SharedPool-Worker-2
>>> Read 0 live and 0 tombstone cells | 10.88.4.194 | 513 | SharedPool-Worker-2
>>> Enqueuing response to /10.55.156.67 | 10.88.4.194 | 613 | SharedPool-Worker-2
>>> Sending message to /10.55.156.67 | 10.88.4.194 | 672 | MessagingService-Outgoing-/10.55.156.67
>>> Parsing SELECT * FROM Aggregate WHERE type=? AND typeId=?; | 10.55.156.67 | 10 | SharedPool-Worker-4
>>> Sending message to /10.88.4.194 | 10.55.156.67 | 4335 | MessagingService-Outgoing-/10.88.4.194
>>> Message received from /10.88.4.194 | 10.55.156.67 | 6328 | MessagingService-Incoming-/10.88.4.194
>>> Seeking to partition beginning in data file | 10.55.156.67 | 10417 | SharedPool-Worker-3
>>> Key cache hit for sstable 389 | 10.55.156.67 | 10586 | SharedPool-Worker-3
>>>
>>> My question is: how is it possible that the query is sent to a node in DC2?
>>> Since DC1 has 2 nodes and RF 1, the query should always be sent to the other node in DC1 if the coordinator does not have a replica, right?
>>>
>>> Thanks,
>>> Tom
>>
>> --
>> Tyler Hobbs
>> DataStax <http://datastax.com/>
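For completeness, the kind of trace quoted above can also be captured directly from the client. A rough sketch (DataStax Java driver 2.x API, reusing a Session from the same placeholder cluster setup as the earlier sketch; the table and query text come from the trace, the bind values are placeholders) that enables tracing on the statement and prints which node each event ran on:

    import com.datastax.driver.core.ConsistencyLevel;
    import com.datastax.driver.core.QueryTrace;
    import com.datastax.driver.core.ResultSet;
    import com.datastax.driver.core.Session;
    import com.datastax.driver.core.SimpleStatement;
    import com.datastax.driver.core.Statement;

    class ReadTracer {
        // Runs the same read at LOCAL_QUORUM with tracing enabled and prints
        // which node each trace event ran on. The Session comes from the same
        // (placeholder) cluster setup as the earlier LOCAL_ONE sketch.
        static void traceRead(Session session, String type, String typeId) {
            Statement stmt = new SimpleStatement(
                    "SELECT * FROM Aggregate WHERE type=? AND typeId=?", type, typeId);
            stmt.setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM);
            stmt.enableTracing();                   // ask Cassandra to record a trace

            ResultSet rs = session.execute(stmt);
            QueryTrace trace = rs.getExecutionInfo().getQueryTrace();
            System.out.println("Coordinator: " + trace.getCoordinator());

            // Each event records the node it ran on, so a read that was routed to a
            // DC2 node (e.g. 10.88.4.194) shows up directly in the source column.
            for (QueryTrace.Event e : trace.getEvents()) {
                System.out.printf("%s | %s | %d | %s%n",
                        e.getDescription(), e.getSource(),
                        e.getSourceElapsedMicros(), e.getThreadName());
            }
        }
    }

Running this around the intermittently failing query should reproduce the same cross-DC routing shown in the trace above.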