On Thu, Dec 4, 2014 at 7:19 AM, Robert Wille <rwi...@fold3.com> wrote:

> Does anybody have any idea what might cause this? That it happens at all
> is bizarre, and that it happens on only three nodes is even more bizarre.
> Also, it really doesn’t seem to have difficulty creating snapshots, so the
> snapshot failure creation errors are quite a mystery.
>

My conjecture is that the large number of snapshots comes from some
automated process accidentally running repair over and over.

Since early 2.0, repair defaults to serial repair. In order to do a
serial repair, it creates a snapshot.

https://issues.apache.org/jira/browse/CASSANDRA-5950

Is the ticket in which the Cassandra team (IMO) unreasonably and without
justification changed this default, resulting in lots of operators
suddenly experiencing dramatically different behavior in a minor point
release.

If you, as an operator of Cassandra in production, don't like this kind of
surprise major change to defaults in a minor version without any
justification, your input is welcome on that JIRA, or on this one:

https://issues.apache.org/jira/browse/CASSANDRA-8177

FWIW, the snapshotting is broken throughout 2.x, and repair over-snapshots
and over-repairs as a result.

https://issues.apache.org/jira/browse/CASSANDRA-7024
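
If you want to see which tables are accumulating snapshots, a rough sketch
like the following may help. It assumes the usual on-disk layout of
<data_dir>/<keyspace>/<table>/snapshots/<snapshot_name>/ and the function
name count_snapshots is just something made up for illustration, not part
of any Cassandra tooling.

```python
import os
from collections import Counter


def count_snapshots(data_dir):
    """Count snapshot directories per table under a Cassandra data directory.

    Assumes the layout <data_dir>/<keyspace>/<table>/snapshots/<name>/.
    Returns a Counter keyed by "keyspace/table"; tables with no snapshots
    are left out.
    """
    counts = Counter()
    for keyspace in sorted(os.listdir(data_dir)):
        ks_path = os.path.join(data_dir, keyspace)
        if not os.path.isdir(ks_path):
            continue  # skip stray files at the top level
        for table in sorted(os.listdir(ks_path)):
            snap_root = os.path.join(ks_path, table, "snapshots")
            if not os.path.isdir(snap_root):
                continue  # this table has never been snapshotted
            n = sum(
                1
                for name in os.listdir(snap_root)
                if os.path.isdir(os.path.join(snap_root, name))
            )
            if n:
                counts[f"{keyspace}/{table}"] = n
    return counts
```

Running count_snapshots("/var/lib/cassandra/data").most_common(10) on each
node (adjust the path if your data_file_directories setting differs) should
show whether the three affected nodes really hold far more snapshots than
the rest.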

> And while we’re talking repairs, I have some questions about monitoring
> them. Even when not running an explicit repair, I randomly see repair tasks
> in OpsCenter. They usually only last a few seconds, and the progress
> percentage often goes into the quadruple digits. When I run repair using
> nodetool, it takes several hours, but again, all I ever see in OpsCenter
> are these random, short-lived repair tasks. Is there any way to monitor
> repairs? I frequently see posts about stalled repairs. How do you know a
> repair has stalled when you can’t see it? And, how do you know if a repair
> actually succeeded or not?
>

I have no idea why OpsCenter would spawn random repair tasks.

https://issues.apache.org/jira/browse/CASSANDRA-5483

Is the ticket for the work on improved tracing of repair sessions.

=Rob
