Hi all,

We are seeing strange behavior when we try to bootstrap a
new node. The Recent Write Latency goes to 2s on all
the other Cassandra nodes (which are receiving user traffic), which
corresponds to our setting of "write_request_timeout_in_ms: 2000".

We use Cassandra 2.0.10 and are trying to convert to vnodes and increase the
replication factor. So we are adding a new node in a new DC (marked as
DCXA) as the only node in that DC, with replication factor 3. The reason
for the higher RF is that we will be converting another 2 existing servers
to the new DC (vnodes) and we want them to get all the data.

The replication settings look like this:
ALTER KEYSPACE slw WITH replication = {
  'class': 'NetworkTopologyStrategy',
  'DC4': '1',
  'DC5': '1',
  'DC2': '1',
  'DC3': '1',
  'DC0': '1',
  'DC1': '1',
  'DC0A': '3',
  'DC1A': '3',
  'DC2A': '3',
  'DC3A': '3',
  'DC4A': '3',
  'DC5A': '3'
};

We added the nodes to DC0A through DC4A without any effect on the existing
nodes (DCX without A). When we try to add DC5A, the above-mentioned
problem happens, 100% reproducibly.

I tried increasing concurrent_writes from 32 to 128 on the
old nodes, and also tried increasing the number of flush writers, both with no
effect. The strange thing is that the load, CPU usage, GC, and network
throughput are all fine on the old nodes that are reporting 2s
of write latency. nodetool tpstats does not show any blocked/pending
operations.

I think I must be hitting some limit somewhere (because of the overall
number of replicas?).
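
To put a number on the replica count, here is a minimal Python sketch that sums the replication factors from the keyspace definition above. The function names and the per-DC node counts are illustrative assumptions, not something taken from the cluster. The second function relies on the fact that NetworkTopologyStrategy cannot place more replicas in a DC than that DC has nodes.

```python
# Replication factors copied from the ALTER KEYSPACE statement above.
replication = {
    "DC0": 1, "DC1": 1, "DC2": 1, "DC3": 1, "DC4": 1, "DC5": 1,
    "DC0A": 3, "DC1A": 3, "DC2A": 3, "DC3A": 3, "DC4A": 3, "DC5A": 3,
}


def configured_replicas(rf_by_dc):
    """Total replicas per row as configured in the keyspace."""
    return sum(rf_by_dc.values())


def effective_replicas(rf_by_dc, nodes_by_dc):
    """Replicas that can actually be placed: a DC cannot hold more
    replicas of a row than it has nodes."""
    return sum(min(rf, nodes_by_dc.get(dc, 0)) for dc, rf in rf_by_dc.items())


# ASSUMPTION (illustrative only): exactly one node in every DC.
# Substitute the real node counts from `nodetool status`.
nodes = {dc: 1 for dc in replication}

print("configured:", configured_replicas(replication))
print("effective: ", effective_replicas(replication, nodes))
```

With the real node counts filled in, the second number is the count of endpoints each write is sent to. This does not include the joining node, which may also receive writes while it bootstraps.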

Any input would be greatly appreciated.

Thanks
Jirka H.
