Read queries on a secondary index are somehow causing an excessively high
CPU load on all nodes in my DC.

The table has some 60K records, and the cardinality of the index is very
low (~10 distinct values). The returned result set typically contains
10-30K records.
The same queries on nodes in another DC are working fine. The nodes with
the high CPU are in a newly set up DC (see my previous message below). The
hardware in both DCs is the same, as well as the C* version (2.1.6). The
only difference in the C* setup is that the new DC is using vnodes (256),
while the old DC is not. Both DCs have 4 nodes, and RF=2.
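
To put the vnode difference in numbers, here is a rough sketch. It assumes one token per node in the old DC and 256 tokens per node in the new DC, as described above. The function name is made up for illustration. Whether the work of an index query really scales with the number of token ranges is an assumption, not something I have confirmed.

```python
def tokens_in_dc(nodes, tokens_per_node):
    """Number of tokens (and so primary token ranges) held by the nodes of one DC."""
    return nodes * tokens_per_node


# Old DC: 4 nodes, no vnodes (assumed to mean one token per node).
old_dc = tokens_in_dc(nodes=4, tokens_per_node=1)

# New DC: 4 nodes, 256 vnodes each.
new_dc = tokens_in_dc(nodes=4, tokens_per_node=256)

print("old DC tokens:", old_dc)
print("new DC tokens:", new_dc)
print("ratio:", new_dc // old_dc)
```

If each range led to its own local index read, a factor of this size could explain a large gap in read counts, but that is a hypothesis only.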

I've rebuilt the index, but that didn't help.

It looks a bit like CASSANDRA-8530
<https://issues.apache.org/jira/browse/CASSANDRA-8530> (unresolved).

What really surprised me is that executing a single query on this secondary
index makes the "Local read count" in the cfstats for the index go up by
almost 200,000! When doing the same query on one of my "good" nodes, it only
increases by a small number, as I would expect.
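
For anyone who wants to reproduce the measurement, here is a minimal sketch of how the increase can be computed from two captures of the cfstats output, taken before and after a single query. The function names are hypothetical. The assumed layout is a "Table: ..." or "Table (index): ..." line followed by a "Local read count: N" line, so adjust the patterns if your version prints something different.

```python
import re


def local_read_counts(cfstats_text):
    """Map each table or index name to its 'Local read count' value."""
    counts = {}
    current = None
    for raw in cfstats_text.splitlines():
        line = raw.strip()
        # Assumed header format: "Table: name" or "Table (index): name".
        header = re.match(r"Table(?: \(index\))?:\s*(\S+)", line)
        if header:
            current = header.group(1)
            continue
        value = re.match(r"Local read count:\s*(\d+)", line)
        if value and current is not None:
            counts[current] = int(value.group(1))
    return counts


def read_count_delta(before_text, after_text):
    """Increase in local read count per table between two captures."""
    before = local_read_counts(before_text)
    after = local_read_counts(after_text)
    return {name: after[name] - before.get(name, 0) for name in after}
```

Save the cfstats output to a file before and after running the query on one node, read both files, and pass their contents to read_count_delta.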

Could it be that the use of vnodes is causing these problems?

Regards,
Tom



On Mon, Sep 14, 2015 at 8:09 PM, Tom van den Berge <
tom.vandenbe...@gmail.com> wrote:

> I have a DC of 4 nodes that must be expanded to accommodate an expected
> growth in data. Since the DC is not using vnodes, we have decided to set up
> a new DC with vnodes enabled, start using the new DC, and decommission the
> old DC.
>
> Both DCs have 4 nodes. The idea is to add additional nodes to the new DC
> later on.
> The servers in both DCs are very similar: quad-core machines with 8GB.
>
> We have bootstrapped/rebuilt the nodes in the new DC. When that finished,
> the nodes in the new DC were showing little CPU activity, as you would
> expect, because they are receiving writes from the other DC. So far, so
> good.
>
> Then we switched the clients from the old DC to the new DC. The CPU load
> on all nodes in the new DC immediately rose to excessively high levels
> (load averages of 15-25), which made the servers effectively unavailable.
> The load did not drop to a sustained lower level within 20 minutes, so we
> had to switch the clients back to the old DC. Then the load dropped again.
>
> What can be the reason for the high CPU loads on the new nodes?
>
> Performance tests show that the servers in the new DC perform slightly
> better (both IO and CPU) than the servers in the old DC.
> I did not see anything abnormal in the Cassandra logs, like garbage
> collection warnings. I also did not see any strange things in the tpstats.
> The only difference I'm aware of between the old and new DC is the use of
> vnodes.
>
> Any help is appreciated!
> Thanks,
> Tom
>
