On Thu, Dec 4, 2014 at 7:19 AM, Robert Wille <rwi...@fold3.com> wrote:
> Does anybody have any idea what might cause this? That it happens at all
> is bizarre, and that it happens on only three nodes is even more bizarre.
> Also, it really doesn’t seem to have difficulty creating snapshots, so the
> snapshot failure creation errors are quite a mystery.

I suspect the large number of snapshots is related to some automated process accidentally running repair over and over.

Since early 2.0, repair has used serial (sequential) repair by default. To do a serial repair, it creates a snapshot.

https://issues.apache.org/jira/browse/CASSANDRA-5950

is the ticket in which the Cassandra team (IMO) unreasonably, and without justification, changed this default. As a result, lots of operators suddenly saw dramatically different behavior in a minor point release.

If you, as an operator of Cassandra in production, don't like this kind of surprise major change to defaults in a minor version without any justification, your input is welcome on that JIRA, or on this one:

https://issues.apache.org/jira/browse/CASSANDRA-8177

FWIW, snapshotting is broken throughout 2.x, and repair over-snapshots and over-repairs as a result:

https://issues.apache.org/jira/browse/CASSANDRA-7024

> And while we’re talking repairs, I have some questions about monitoring
> them. Even when not running an explicit repair, I randomly see repair tasks
> in OpsCenter. They usually only last a few seconds, and the progress
> percentage often goes into the quadruple digits. When I run repair using
> nodetool, it takes several hours, but again, all I ever see in OpsCenter
> are these random, short-lived repair tasks. Is there any way to monitor
> repairs? I frequently see posts about stalled repairs. How do you know a
> repair has stalled when you can’t see it? And, how do you know if a repair
> actually succeeded or not?

I have no idea why OpsCenter would spawn random repair tasks.

https://issues.apache.org/jira/browse/CASSANDRA-5483

is the work for improved tracing of repair sessions.

=Rob