Hi Colin,

The Kibosh code is just a README for now. Is it going to be published soon?
Tim

On Tue, Aug 22, 2017 at 11:44 AM, Colin McCabe <cmcc...@apache.org> wrote:
> Hi all,
>
> I've been working on a fault injector for Apache Kafka. The general
> idea is to create faults such as network partitions or disk failures,
> and see what happens in the cluster. The fault injector can run as part
> of a ducktape system test, or standalone.
>
> The fault injector has two processes: a coordinator, and an agent. The
> agent process is responsible for actually implementing the faults. For
> example, it might run iptables, send signals to processes, generate a
> lot of load, or do something else to disrupt the computer it is running
> on. We run an agent process on each node where we would like to
> potentially inject faults. So it will run alongside the brokers,
> zookeeper nodes, etc.
>
> The coordinator process is responsible for communicating with the agent
> processes and for scheduling faults. For example, the coordinator can
> be instructed to create a fault immediately on several nodes. Or it can
> be instructed to create faults over time, based on a pseudorandom seed.
> Both the coordinator and the agent expose a REST interface that accepts
> objects serialized via JSON.
>
> I think two kinds of faults will be especially interesting: network
> faults, and disk errors. Simulating network faults in a Linux
> environment is relatively straightforward using iptables. Disk errors
> are tougher to simulate, but I have written a FUSE filesystem to do
> this. The filesystem essentially simulates a bind mount in most cases,
> but it can take a JSON specification telling it to inject certain
> faults. (Disk errors seem especially relevant to the ongoing work on
> JBOD.)
>
> Although it's not a user-visible component, I think having a fault
> injector will be really great for Kafka users. It will really help us
> stress test Kafka in more situations. I'm going to post some patches in
> a day or two-- it would be great to get some feedback. Check out
> https://cwiki.apache.org/confluence/display/KAFKA/Fault+Injection
>
> best,
> Colin
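
For anyone following the thread, the idea of a coordinator creating faults over time from a pseudorandom seed could look roughly like the sketch below. This is not the fault injector's code: the function name, the fault kinds, the node names, and the JSON field names are all assumptions made up for illustration. The only point it shows is that a fixed seed gives a reproducible fault plan.

```python
import json
import random

# Hypothetical fault kinds, chosen only to mirror the two categories
# mentioned in the mail (network faults and disk errors).
FAULT_KINDS = ["network_partition", "disk_error"]


def schedule_faults(seed, nodes, count, horizon_s):
    """Build a reproducible fault plan from a pseudorandom seed.

    Returns a list of dicts sorted by start time. The field names are
    illustrative and do not reflect the real coordinator's JSON format.
    """
    rng = random.Random(seed)  # private generator, so the plan depends only on the seed
    faults = []
    for _ in range(count):
        faults.append({
            "node": rng.choice(nodes),
            "kind": rng.choice(FAULT_KINDS),
            "start_s": rng.randrange(horizon_s),  # offset in seconds from test start
        })
    faults.sort(key=lambda f: f["start_s"])
    return faults


if __name__ == "__main__":
    plan = schedule_faults(seed=42, nodes=["broker1", "broker2", "zk1"],
                           count=4, horizon_s=600)
    # A coordinator could serialize a plan like this to JSON for its agents.
    print(json.dumps(plan, indent=2))
```

Running it twice with the same seed prints the same plan, which is what makes a failing run replayable.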