Hi, really cool that this discussion gets attention.
You are right, my question was quite open.

For me it would already be helpful to compile a list, like the one Ben started, of scenarios that can happen to a cluster and the actions/strategies you have to take to resolve the incident without losing data and while ending up with a healthy cluster. Ideally we would add some kind of rating of how hard each scenario is to resolve, so that teams can go through a kind of learning curve.

For the beginning, I think it would already be sufficient to document the steps for getting a cluster into the situation described in each scenario.

Hope it’s a bit clearer now what I mean.

Is there some kind of community space where we could start a document for this purpose?

Best,

Malte

> On 1 Mar 2017, at 13:33, Stefan Podkowinski <s...@apache.org> wrote:
>
> I've been thinking about this for a while, but haven't found a practical
> solution yet, although the term "fire drill" leaves a lot of room for
> interpretation. The most basic requirements I'd have for these kind of
> trainings would start with automated cluster provisioning for each
> scenario (either for teams or individuals) and provisioning of test data
> for the cluster, with optionally some kind of load generator constantly
> running in the background. I started to work on some Ansible scripts
> that would do that on AWS a couple of months ago, but it turned out to
> be a lot of work with all the details you have to take care of. So I'd
> be happy to hear about any existing resources on that as well!
>
>
> On 01.03.2017 10:59, Malte Pickhan wrote:
>> Hi Cassandra users,
>>
>> I am looking for some resources/guides for firedrill scenarios with apache
>> cassandra.
>>
>> Do you know anything like that?
>>
>> Best,
>>
>> Malte
>>