We integrate Ganglia.

On Mon, Jun 28, 2010 at 1:53 AM, Jonathan Ellis <jbel...@gmail.com> wrote:
> short version:
>
> if o.a.c.concurrent.{ROW-READ-STAGE,ROW-MUTATION-STAGE} and
> o.a.c.db.CompactionManager have
>
> - completed task count increasing
> - pending tasks stable (for RRS and RMS, stable in the low hundreds or
> less; for CM, stable in single digits or less)
> - the log isn't spitting out Error lines
>
> then the node is completing requests and keeping up with demand reasonably
> well.
>
> On Tue, Jun 22, 2010 at 3:41 PM, Andrew Psaltis
> <andrew.psal...@webtrends.com> wrote:
> > All,
> >
> > We have been working through some operations scenarios so that we are
> > ready to deploy our first Cassandra cluster into production in the coming
> > months. During this process our operations folks have asked us to provide a
> > Health Check service. I am using the word service here very liberally;
> > really we just need to provide a way for the folks in our NOC to know that
> > not only is the Cassandra process running (which they will get with their
> > monitoring tools), but that it is actually alive and well. We do not intend
> > to verify that the data is valid, just that every node in the cluster that
> > is known to be running is actually alive and healthy. My questions are:
> > What does it mean for a Cassandra node to be healthy? What are the minimal
> > checks (in terms of performance impact on a node) we can run to make sure
> > that a node is not a zombie?
> >
> > Any and all input is greatly appreciated.
> >
> > Thanks,
> > Andrew
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
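
As a rough illustration, the three checks in Jonathan's short version can be written as a pure function over two samples of per-stage counters taken some time apart. This is a minimal sketch under stated assumptions: how the counters are collected (for example over JMX) is left out, and the `StageSample`, `stage_healthy` and `node_healthy` names, the 300-task cutoff standing in for "low hundreds", and the substring match on `ERROR` in log lines are all hypothetical choices, not anything Cassandra provides.

```python
from dataclasses import dataclass

# Hypothetical pending-task ceilings based on the rules of thumb quoted above:
# "low hundreds or less" for the read/mutation stages (300 is an assumed cutoff)
# and "single digits or less" for the compaction manager.
PENDING_LIMITS = {
    "ROW-READ-STAGE": 300,
    "ROW-MUTATION-STAGE": 300,
    "CompactionManager": 9,
}


@dataclass
class StageSample:
    completed: int  # cumulative completed-task count at sample time
    pending: int    # pending-task count at sample time


def stage_healthy(name: str, before: StageSample, after: StageSample) -> bool:
    """True if the stage made progress and its backlog stayed under its ceiling."""
    limit = PENDING_LIMITS[name]
    progressed = after.completed > before.completed
    bounded = before.pending <= limit and after.pending <= limit
    return progressed and bounded


def node_healthy(before: dict, after: dict, log_lines: list) -> bool:
    """Apply all three checks: progress, bounded backlog, no error lines in the log.

    Matching the substring "ERROR" is an assumption about the log format.
    """
    no_errors = not any("ERROR" in line for line in log_lines)
    return no_errors and all(
        stage_healthy(name, before[name], after[name]) for name in PENDING_LIMITS
    )
```

Completed counts only move while the node is doing work, so the sampling window has to be long enough to observe activity (compactions in particular are infrequent); otherwise an idle node will be reported as unhealthy.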