Hmmm... I am using:

    endpoint_snitch: com.datastax.bdp.snitch.DseDelegateSnitch

which in turn uses:

    delegated_snitch: org.apache.cassandra.locator.PropertyFileSnitch

(for this specific test cluster). I did not check the code - is this snitch on by
default and, maybe, used as a wrapper around the configured endpoint_snitch? It
would certainly explain the difference in the inter-DC traffic. It would also not
affect the local-DC traffic, since all nodes there are replicas for the data anyway.

On Thu, Nov 20, 2014 at 12:03 PM, Tyler Hobbs <ty...@datastax.com> wrote:

> The difference is likely due to the DynamicEndpointSnitch (aka dynamic
> snitch), which picks replicas to send messages to based on recently
> observed latency and self-reported load (accounting for compactions,
> repair, etc.). If you want to confirm this, you can disable the dynamic
> snitch by adding this line to cassandra.yaml: "dynamic_snitch: false".
>
> On Thu, Nov 20, 2014 at 9:52 AM, Nikolai Grigoriev <ngrigor...@gmail.com>
> wrote:
>
>> Hi,
>>
>> There is something odd I observed when testing a configuration with two
>> DCs for the first time. I wanted to do a simple functional test to prove
>> to myself (and my pessimistic colleagues ;) ) that it works.
>>
>> I have a test cluster of 6 nodes, 3 in each DC, and a keyspace that is
>> replicated as follows:
>>
>> CREATE KEYSPACE xxxxxxx WITH replication = {
>>     'class': 'NetworkTopologyStrategy',
>>     'DC2': '3',
>>     'DC1': '3'
>> };
>>
>> I have disabled traffic compression between the DCs to get more accurate
>> numbers.
>>
>> I have set up a bunch of IP accounting rules on each node so they count
>> the outgoing traffic from that node to each other node. I had rules for
>> different ports but, of course, it is mostly about port 7000 (or 7001)
>> when talking about inter-node traffic. In any case, I have a table that
>> shows the traffic from any node to any other node's port 7000.
>>
>> I ran a test with DCAwareRoundRobinPolicy and the client talking only to
>> DC1 nodes.
>> Everything looks fine - the client sent an identical amount of data to
>> each of the 3 nodes in DC1. The nodes inside DC1 (I was writing with
>> LOCAL_ONE consistency) sent similar amounts of data to each other,
>> representing exactly the two extra replicas.
>>
>> However, when I look at the traffic from the nodes in DC1 to the nodes
>> in DC2, the picture is different:
>>
>> source       destination  port      bytes
>> 10.3.45.156  10.3.45.159  dpt:7000  117,273,075
>> 10.3.45.156  10.3.45.160  dpt:7000  228,326,091
>> 10.3.45.156  10.3.45.161  dpt:7000   46,924,339
>> 10.3.45.157  10.3.45.159  dpt:7000  118,978,269
>> 10.3.45.157  10.3.45.160  dpt:7000  230,444,929
>> 10.3.45.157  10.3.45.161  dpt:7000   47,394,179
>> 10.3.45.158  10.3.45.159  dpt:7000  113,969,248
>> 10.3.45.158  10.3.45.160  dpt:7000  225,844,838
>> 10.3.45.158  10.3.45.161  dpt:7000   46,338,939
>>
>> Nodes 10.3.45.156-158 are in DC1, .159-161 in DC2. As you can see, each
>> of the nodes in DC1 has sent a different amount of traffic to the remote
>> nodes: ~117 MB, ~228 MB and ~46 MB respectively. Both DCs have one rack.
>>
>> So, here is my question: how does a node select the node in the remote
>> DC to send the message to? I did a quick sweep through the code and I
>> could only find the sorting by proximity (checking the rack and DC). So,
>> since the targets for each request I fire are all 3 nodes in the remote
>> DC, the list will contain all 3 nodes in DC2. And, if I understood
>> correctly, the first node from the list is picked to send the message.
>>
>> So, it seems to me that no round-robin-type logic is applied when
>> selecting the target node (from the list of targets in the remote DC) to
>> forward the write to.
>>
>> If this is true (and the numbers kind of show it is, right?), then
>> probably the list of equal-proximity targets should be shuffled
>> randomly?
>> Or, instead of picking the first target, should a random one be picked?
>>
>> --
>> Nikolai Grigoriev
>>
>
> --
> Tyler Hobbs
> DataStax <http://datastax.com/>

--
Nikolai Grigoriev
(514) 772-5178
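[Editor's note] The behavior discussed above can be illustrated with a purely
hypothetical Python sketch (this is not Cassandra's actual forwarding code):
it compares "always take the first node of a proximity-sorted target list"
against "shuffle the equal-proximity slice first" when one remote-DC node is
chosen to receive a forwarded write. The node IPs are taken from the thread;
everything else is invented for the illustration.

```python
import random
from collections import Counter

REMOTE_DC = ["10.3.45.159", "10.3.45.160", "10.3.45.161"]  # DC2 nodes from the thread
N_WRITES = 30_000  # arbitrary number of simulated forwarded writes

def pick_first(targets):
    # Proximity sorting leaves equally-close peers in a stable order,
    # so the same node is chosen for every write.
    return targets[0]

def pick_shuffled(targets):
    # Shuffling the equal-proximity list before taking the head
    # spreads the forwarded writes roughly evenly.
    t = list(targets)
    random.shuffle(t)
    return t[0]

def simulate(picker):
    counts = Counter()
    for _ in range(N_WRITES):
        counts[picker(REMOTE_DC)] += 1
    return counts

print("first-of-list:", dict(simulate(pick_first)))   # all traffic to one node
print("shuffled:     ", dict(simulate(pick_shuffled)))  # roughly N_WRITES/3 each
```

With the first-of-list strategy, one remote node absorbs all forwarded
traffic, which matches the skew the thread describes; the dynamic snitch's
latency-based reordering, as Tyler notes, is another mechanism that perturbs
this static order.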