We have a monitoring service that runs on all of our Cassandra nodes and
performs different query types to ensure the cluster is healthy. We use
different consistency levels for the queries and alert if any of them
fail. All of our query types consistently succeed apart from our ALL range
query.
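For anyone wanting to script a similar check, here is a minimal sketch using
the DataStax Python driver; the contact point, keyspace, and table names are
placeholders, and the probe is a plain range scan to mirror the ALL range
query above:

    from cassandra.cluster import Cluster
    from cassandra import ConsistencyLevel
    from cassandra.query import SimpleStatement

    cluster = Cluster(['127.0.0.1'])        # placeholder contact point
    session = cluster.connect('health_ks')  # placeholder keyspace

    # Run the same range scan at several consistency levels; a failure at
    # ALL but not at ONE/QUORUM usually points at one or more replicas.
    for name in ('ONE', 'QUORUM', 'ALL'):
        stmt = SimpleStatement("SELECT id FROM probe LIMIT 10",
                               consistency_level=getattr(ConsistencyLevel, name))
        try:
            session.execute(stmt, timeout=10)
            print("%s ok" % name)
        except Exception as exc:
            print("%s FAILED: %s" % (name, exc))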
You're correct that the timeout is only driver side. The server will
have its own timeouts configured in the cassandra.yaml file.
I suspect that you either have a node down in your cluster (or 4), or your
queries are gradually getting slower. This kind of aligns with the slow
query statements i
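To make the driver-side/server-side split concrete, a small sketch
(placeholder contact point; the cassandra.yaml keys shown in the comments are
the standard server-side read timeouts):

    from cassandra.cluster import Cluster

    cluster = Cluster(['127.0.0.1'])  # placeholder
    session = cluster.connect()

    # Driver side: how long THIS client waits for an answer.
    session.default_timeout = 20  # seconds

    # Server side: independent limits in cassandra.yaml, e.g.
    #   read_request_timeout_in_ms:  5000   # single-partition reads
    #   range_request_timeout_in_ms: 10000  # range scans
    # The coordinator gives up at these limits no matter how patient the
    # driver is, so raising the driver timeout alone won't help.
    session.execute("SELECT release_version FROM system.local")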
There are many scenarios where it can be useful, but to address what seems
to be your main concern: you could simply restore and then read only at ALL
until your repair completes.
If you use snapshot restore with commitlog archiving, you're in a better
state, but granted, the case you described can
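As a rough sketch of the "read at ALL until repair completes" idea (contact
point and keyspace name are placeholders):

    from cassandra.cluster import Cluster
    from cassandra import ConsistencyLevel

    cluster = Cluster(['127.0.0.1'])    # placeholder
    session = cluster.connect('my_ks')  # placeholder keyspace

    # After restoring the snapshot, read at ALL so every replica is
    # consulted and stale data on the restored node can't win; drop back
    # to the normal level once `nodetool repair` has finished.
    session.default_consistency_level = ConsistencyLevel.ALL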
Dear All,
We are currently using Cassandra 2.1.13, and it has grown to 5 TB in size
with 32 nodes in one DC.
For monitoring, OpsCenter does not send alarms and is not free in higher
versions, so we have to use a simple JMX+Zabbix template. We also plan to use
Jolokia+JMX2Graphite to draw the metrics charts.
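Since Jolokia exposes JMX over HTTP, pulling a metric is a one-liner once the
agent is attached; a sketch assuming the agent listens on its default port
8778:

    import requests

    # Coordinator-side read request counter (standard Cassandra metric MBean).
    mbean = "org.apache.cassandra.metrics:type=ClientRequest,scope=Read,name=Latency"
    url = "http://localhost:8778/jolokia/read/%s/Count" % mbean

    resp = requests.get(url, timeout=5)
    resp.raise_for_status()
    print("read request count:", resp.json()["value"])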
Hi Kurt
Thanks for the response. Few comments in line:
On Wed, Jun 28, 2017 at 1:17 PM, kurt greaves wrote:
> You're correct that the timeout is only driver side. The server will
> have its own timeouts configured in the cassandra.yaml file.
>
Yup, OK.
> I suspect that you either have a node down in your cluster (or 4), or
> your queries are gradually getting slower.
Thanks Kurt.
I think the main scenario which MUST be addressed by snapshots is Backup/Restore,
so that a node can be restored in minimal time and the lengthy procedure of
bootstrapping with join_ring=false followed by a full repair can be avoided. The
plain restore snapshot + repair scenario see
If you suspect this is different than #13004, please don't keep it a secret -
even if you haven't fixed it, if you can describe the steps to repro, that'd be
incredibly helpful.
- Jeff
On 2017-06-27 13:42 (-0700), Michael Shuler wrote:
> To clarify, are you talking about the same issue fixed
Hm, I did recall seeing a ticket for this particular use case, which is
certainly useful, I just didn't think it had been implemented yet. Turns
out it's been in since 2.0.7, so you should be receiving writes with
join_ring=false. If you confirm you aren't receiving writes then we have an
issue. ht
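One hedged way to confirm the node is receiving writes: watch its local write
counter climb while normal traffic flows, e.g. by parsing nodetool cfstats on
the join_ring=false node itself (keyspace/table names are placeholders):

    import re
    import subprocess
    import time

    def local_write_count(ks_table):
        # "Local write count" counts writes applied on this replica.
        out = subprocess.check_output(["nodetool", "cfstats", ks_table]).decode()
        m = re.search(r"Local write count:\s*(\d+)", out)
        return int(m.group(1)) if m else 0

    before = local_write_count("my_ks.my_table")
    time.sleep(60)  # let normal write traffic flow
    after = local_write_count("my_ks.my_table")
    print("writes received while join_ring=false:", after - before)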
I'd say that no, a range query probably isn't the best for monitoring, but
it really depends on how important it is that the range you select is
consistent.
From those traces it does seem that the bulk of the time spent was waiting
for responses from the replicas, which may indicate a network iss
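For what it's worth, the same traces can be captured programmatically; a
sketch with the Python driver (contact point, keyspace, and table are
placeholders):

    from cassandra.cluster import Cluster
    from cassandra.query import SimpleStatement

    cluster = Cluster(['127.0.0.1'])    # placeholder
    session = cluster.connect('my_ks')  # placeholder keyspace

    rs = session.execute(SimpleStatement("SELECT id FROM probe LIMIT 10"),
                         trace=True)
    trace = rs.get_query_trace()
    for event in (trace.events if trace else []):
        # source is the node, source_elapsed how far into the request it
        # fired -- long gaps here are the "waiting for replicas" time.
        print(event.source, event.source_elapsed, event.description)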
I'm using JMX+Prometheus and Grafana.
JMX = https://github.com/prometheus/jmx_exporter
Prometheus + Grafana = https://prometheus.io/docs/visualization/grafana/
There are some dashboard examples, like this one:
https://grafana.com/dashboards/371
Looks good.
Thanks,
Petrus Silva
On Wed, Jun 28, 2017 at
nodetool repair has a trace option:
nodetool repair -tr yourkeyspacename
See if that provides you with additional information.
Regards,
Akhil
> On 28/06/2017, at 2:25 AM, Balaji Venkatesan
> wrote:
>
>
> We use Apache Cassandra 3.10-13
>
> On Jun 26, 2017 8:41 PM, "Michael Shuler"
Hello,
We are using C* version 2.1.6 and lately we are seeing an issue where
nodetool removenode causes the schema to go out of sync, causing clients
to fail for 2-3 minutes.
C* cluster is in 8 Datacenters with RF=3 and has 50 nodes.
We have 130 Keyspaces and 500 CF in the cluster.
Here are
On 2017-06-28 18:51 (-0700), Jai Bheemsen Rao Dhanwada
wrote:
> Hello,
>
> We are using C* version 2.1.6 and lately we are seeing an issue where
> nodetool removenode causes the schema to go out of sync, causing clients
> to fail for 2-3 minutes.
>
> C* cluster is in 8 Datacenters with RF=3 and has 50 nodes.
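A quick way to watch for this kind of schema disagreement from the client
side is to compare the schema_version every node reports (contact point is a
placeholder):

    from cassandra.cluster import Cluster

    cluster = Cluster(['127.0.0.1'])  # placeholder
    session = cluster.connect()

    # Each node stores the schema version it currently holds; after a
    # removenode these should converge back to a single version.
    versions = {str(session.execute(
        "SELECT schema_version FROM system.local").one().schema_version)}
    for row in session.execute("SELECT peer, schema_version FROM system.peers"):
        versions.add(str(row.schema_version))

    print("schema agreement" if len(versions) == 1
          else "schema DISAGREEMENT: %s" % versions)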