We have a monitoring service that runs on all of our Cassandra nodes and
performs different query types to ensure the cluster is healthy. We use
different consistency levels for the queries and alert if any of them
fail. All of our query types consistently succeed apart from our ALL range
query.
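For anyone wanting to script a similar check, here is a minimal sketch using
the DataStax Python driver; the contact point, keyspace, and table names are
placeholders, and the probe is a plain range scan to mirror the ALL range
query above:

    from cassandra.cluster import Cluster
    from cassandra import ConsistencyLevel
    from cassandra.query import SimpleStatement

    cluster = Cluster(['127.0.0.1'])        # placeholder contact point
    session = cluster.connect('health_ks')  # placeholder keyspace

    # Run the same range scan at several consistency levels; a failure at
    # ALL but not at ONE/QUORUM usually points at one or more replicas.
    for name in ('ONE', 'QUORUM', 'ALL'):
        stmt = SimpleStatement("SELECT id FROM probe LIMIT 10",
                               consistency_level=getattr(ConsistencyLevel, name))
        try:
            session.execute(stmt, timeout=10)
            print("%s ok" % name)
        except Exception as exc:
            print("%s FAILED: %s" % (name, exc))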
You're correct that the timeout is only driver side. The server will
have its own timeouts configured in the cassandra.yaml file.
I suspect that you either have a node down in your cluster (or 4), or your
queries are gradually getting slower. This kind of aligns with the slow
query statements i
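To make the driver-side/server-side split concrete, a small sketch
(placeholder contact point; the cassandra.yaml keys shown in the comments are
the standard server-side read timeouts):

    from cassandra.cluster import Cluster

    cluster = Cluster(['127.0.0.1'])  # placeholder
    session = cluster.connect()

    # Driver side: how long THIS client waits for an answer.
    session.default_timeout = 20  # seconds

    # Server side: independent limits in cassandra.yaml, e.g.
    #   read_request_timeout_in_ms:  5000   # single-partition reads
    #   range_request_timeout_in_ms: 10000  # range scans
    # The coordinator gives up at these limits no matter how patient the
    # driver is, so raising the driver timeout alone won't help.
    session.execute("SELECT release_version FROM system.local")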
There are many scenarios where it can be useful, but to address what seems
to be your main concern: you could simply restore and then read only at ALL
until your repair completes.
If you use snapshot restore with commitlog archiving, you're in a better
state, but granted, the case you described can
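As a rough sketch of the "read at ALL until repair completes" idea (contact
point and keyspace name are placeholders):

    from cassandra.cluster import Cluster
    from cassandra import ConsistencyLevel

    cluster = Cluster(['127.0.0.1'])    # placeholder
    session = cluster.connect('my_ks')  # placeholder keyspace

    # After restoring the snapshot, read at ALL so every replica is
    # consulted and stale data on the restored node can't win; drop back
    # to the normal level once `nodetool repair` has finished.
    session.default_consistency_level = ConsistencyLevel.ALL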
Dear All,
We are currently using Cassandra 2.1.13, and it has grown to 5 TB in size
with 32 nodes in one DC.
For monitoring, OpsCenter does not send alarms and is not free in higher
versions, so we have to use a simple JMX+Zabbix template. We also plan to use
Jolokia+JMX2Graphite to draw the metrics charts.
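Since Jolokia exposes JMX over HTTP, pulling a metric is a one-liner once the
agent is attached; a sketch assuming the agent listens on its default port
8778:

    import requests

    # Coordinator-side read request counter (standard Cassandra metric MBean).
    mbean = "org.apache.cassandra.metrics:type=ClientRequest,scope=Read,name=Latency"
    url = "http://localhost:8778/jolokia/read/%s/Count" % mbean

    resp = requests.get(url, timeout=5)
    resp.raise_for_status()
    print("read request count:", resp.json()["value"])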
Hi Kurt
Thanks for the response. Few comments in line:
On Wed, Jun 28, 2017 at 1:17 PM, kurt greaves wrote:
> You're correct that the timeout is only driver side. The server will
> have its own timeouts configured in the cassandra.yaml file.
>
Yup, OK.
> I suspect that you either have a node down in your cluster (or 4), or
> your queries are gradually getting slower.
Thanks Kurt.
I think the main scenario which MUST be addressed by snapshots is Backup/Restore,
so that a node can be restored in minimal time and the lengthy procedure of
bootstrapping with join_ring=false followed by a full repair can be avoided. The
plain restore snapshot + repair scenario see
If you suspect this is different than #13004, please don't keep it a secret -
even if you haven't fixed it, if you can describe the steps to repro, that'd be
incredibly helpful.
- Jeff
On 2017-06-27 13:42 (-0700), Michael Shuler wrote:
> To clarify, are you talking about the same issue fixed
Hm, I did recall seeing a ticket for this particular use case, which is
certainly useful, I just didn't think it had been implemented yet. Turns
out it's been in since 2.0.7, so you should be receiving writes with
join_ring=false. If you confirm you aren't receiving writes then we have an
issue. ht
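One hedged way to confirm the node is receiving writes: watch its local write
counter climb while normal traffic flows, e.g. by parsing nodetool cfstats on
the join_ring=false node itself (keyspace/table names are placeholders):

    import re
    import subprocess
    import time

    def local_write_count(ks_table):
        # "Local write count" counts writes applied on this replica.
        out = subprocess.check_output(["nodetool", "cfstats", ks_table]).decode()
        m = re.search(r"Local write count:\s*(\d+)", out)
        return int(m.group(1)) if m else 0

    before = local_write_count("my_ks.my_table")
    time.sleep(60)  # let normal write traffic flow
    after = local_write_count("my_ks.my_table")
    print("writes received while join_ring=false:", after - before)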
I'd say that no, a range query probably isn't the best for monitoring, but
it really depends on how important it is that the range you select is
consistent.
From those traces it does seem that the bulk of the time spent was waiting
for responses from the replicas, which may indicate a network iss
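For what it's worth, the same traces can be captured programmatically; a
sketch with the Python driver (contact point, keyspace, and table are
placeholders):

    from cassandra.cluster import Cluster
    from cassandra.query import SimpleStatement

    cluster = Cluster(['127.0.0.1'])    # placeholder
    session = cluster.connect('my_ks')  # placeholder keyspace

    rs = session.execute(SimpleStatement("SELECT id FROM probe LIMIT 10"),
                         trace=True)
    trace = rs.get_query_trace()
    for event in (trace.events if trace else []):
        # source is the node, source_elapsed how far into the request it
        # fired -- long gaps here are the "waiting for replicas" time.
        print(event.source, event.source_elapsed, event.description)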
I'm using JMX+Prometheus and Grafana.
JMX = https://github.com/prometheus/jmx_exporter
Prometheus + Grafana = https://prometheus.io/docs/visualization/grafana/
There are some dashboard examples, like this one:
https://grafana.com/dashboards/371
Looks good.
Thanks,
Petrus Silva
On Wed, Jun 28, 2017 at
nodetool repair has a trace option:
nodetool repair -tr yourkeyspacename
See if that provides you with additional information.
Regards,
Akhil
> On 28/06/2017, at 2:25 AM, Balaji Venkatesan
> wrote:
>
>
> We use Apache Cassandra 3.10-13
>
> On Jun 26, 2017 8:41 PM, "Michael Shuler"
Hello,
We are using C* version 2.1.6 and lately we are seeing an issue where
nodetool removenode causes the schema to go out of sync, causing clients
to fail for 2-3 minutes.
C* cluster is in 8 Datacenters with RF=3 and has 50 nodes.
We have 130 Keyspaces and 500 CF in the cluster.
Here are
On 2017-06-28 18:51 (-0700), Jai Bheemsen Rao Dhanwada
wrote:
> Hello,
>
> We are using C* version 2.1.6 and lately we are seeing an issue where
> nodetool removenode causes the schema to go out of sync, causing clients
> to fail for 2-3 minutes.
>
> C* cluster is in 8 Datacenters with RF=3 and has 50 nodes.
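A quick way to watch for this kind of schema disagreement from the client
side is to compare the schema_version every node reports (contact point is a
placeholder):

    from cassandra.cluster import Cluster

    cluster = Cluster(['127.0.0.1'])  # placeholder
    session = cluster.connect()

    # Each node stores the schema version it currently holds; after a
    # removenode these should converge back to a single version.
    versions = {str(session.execute(
        "SELECT schema_version FROM system.local").one().schema_version)}
    for row in session.execute("SELECT peer, schema_version FROM system.peers"):
        versions.add(str(row.schema_version))

    print("schema agreement" if len(versions) == 1
          else "schema DISAGREEMENT: %s" % versions)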