I've been bugging you about this a few times already, but now I've got trace data for a query with LOCAL_QUORUM that is being sent to a remote data center.
The setup is as follows:

NetworkTopologyStrategy: {"DC1":"1","DC2":"2"}

Both DC1 and DC2 have 2 nodes. In DC2, one node is currently being rebuilt and therefore does not contain all data (yet).

The client app connects to a node in DC1 and sends a SELECT query with CL LOCAL_QUORUM. With RF 1 in the local DC, the quorum is (1/2)+1 = 1 (integer division), i.e. one replica in DC1 must respond. If all is well, the query should always return a result, because the requested rows are guaranteed to be present in DC1.

However, the query sometimes returns no result. I've been able to record the traces of these queries, and it turns out that the coordinator node in DC1 sometimes sends the query to DC2, to the node that is being rebuilt and does not yet have the requested rows.

I've included an example trace below. The coordinator node is 10.55.156.67, which is in DC1; 10.88.4.194 is the node in DC2. I've verified that CL=LOCAL_QUORUM by printing it when the query is sent (I'm using the DataStax Java driver; see the P.S. below for roughly what that code looks like).

activity                                                                   | source       | source_elapsed | thread
---------------------------------------------------------------------------+--------------+----------------+-----------------------------------------
Message received from /10.55.156.67                                        | 10.88.4.194  | 48             | MessagingService-Incoming-/10.55.156.67
Executing single-partition query on aggregate                              | 10.88.4.194  | 286            | SharedPool-Worker-2
Acquiring sstable references                                               | 10.88.4.194  | 306            | SharedPool-Worker-2
Merging memtable tombstones                                                | 10.88.4.194  | 321            | SharedPool-Worker-2
Partition index lookup allows skipping sstable 107                         | 10.88.4.194  | 458            | SharedPool-Worker-2
Bloom filter allows skipping sstable 1                                     | 10.88.4.194  | 489            | SharedPool-Worker-2
Skipped 0/2 non-slice-intersecting sstables, included 0 due to tombstones  | 10.88.4.194  | 496            | SharedPool-Worker-2
Merging data from memtables and 0 sstables                                 | 10.88.4.194  | 500            | SharedPool-Worker-2
Read 0 live and 0 tombstone cells                                          | 10.88.4.194  | 513            | SharedPool-Worker-2
Enqueuing response to /10.55.156.67                                        | 10.88.4.194  | 613            | SharedPool-Worker-2
Sending message to /10.55.156.67                                           | 10.88.4.194  | 672            | MessagingService-Outgoing-/10.55.156.67
Parsing SELECT * FROM Aggregate WHERE type=? AND typeId=?;                 | 10.55.156.67 | 10             | SharedPool-Worker-4
Sending message to /10.88.4.194                                            | 10.55.156.67 | 4335           | MessagingService-Outgoing-/10.88.4.194
Message received from /10.88.4.194                                         | 10.55.156.67 | 6328           | MessagingService-Incoming-/10.88.4.194
Seeking to partition beginning in data file                                | 10.55.156.67 | 10417          | SharedPool-Worker-3
Key cache hit for sstable 389                                              | 10.55.156.67 | 10586          | SharedPool-Worker-3

My question is: how is it possible that the query is sent to a node in DC2? Since DC1 has 2 nodes and an RF of 1 there, the query should always be sent to the other node in DC1 if the coordinator itself does not hold the replica, right?

Thanks,
Tom
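P.S. For reference, this is roughly how the statement is built and how I print the CL. It's a minimal sketch against the 2.x DataStax Java driver API; the class name, keyspace name ("myks"), and bind values are placeholders, the contact point is the DC1 coordinator from the trace above:

import com.datastax.driver.core.BoundStatement;
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;

public class CLCheck {
    public static void main(String[] args) {
        // Connect via a node in DC1 (the coordinator seen in the trace).
        Cluster cluster = Cluster.builder()
                .addContactPoint("10.55.156.67")
                .build();
        Session session = cluster.connect("myks"); // keyspace name is a placeholder

        // Prepare and bind the query from the trace; bind values are placeholders.
        PreparedStatement ps = session.prepare(
                "SELECT * FROM Aggregate WHERE type=? AND typeId=?;");
        BoundStatement stmt = ps.bind("someType", "someTypeId");
        stmt.setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM);

        // The check mentioned above: print the CL right before executing.
        System.out.println("CL = " + stmt.getConsistencyLevel()); // prints LOCAL_QUORUM

        session.execute(stmt);
        cluster.close();
    }
}

This prints "CL = LOCAL_QUORUM" for every query, so as far as I can tell the consistency level really is set on the statement that goes out.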