Howdy,

I’ve been engaged in the Cassandra user community for a long time, almost 8
years, and have worked on hundreds of Cassandra deployments. One of the
things I’ve noticed in myself and in a lot of my peers who have done
consulting or support, or worked on really big deployments, is that we get
burnt out. We fight a lot of the same fires over and over again, and don’t
get to work on new or interesting stuff. Also, what we do is really hard to
transfer to other people because it’s based on experience.

Over the past year my team and I have been working to close that gap by
building an assistant that’s able to scale some of this knowledge. We’ve
got it to the point where it’s able to classify known root causes for an
outage or an SLA breach in Cassandra with an accuracy greater than 90%. It
can accurately diagnose bugs, data-modeling issues, or misuse of certain
features, and when it does, it gives you specific remediation steps with
links to knowledge base articles.

We think we’ve seeded our database with enough root causes that it’ll catch
the vast majority of issues, but there is always the possibility that we’ll
run into something previously unknown like CASSANDRA-11170 (one of the
issues our system found in the wild).

We’re looking for feedback and would like to know if anyone is interested
in giving the product a trial. The process would be a collaboration, where
we both get to learn from each other and improve how we’re doing things.

Thanks,
Matt Stump
