Hello, I am currently testing the 0.8 branch (and it works quite well). We plan not to use the replication feature for now since we don't really need it; we can afford to lose data in case of an unrecoverable broker failure.
However, we really don't want producers and consumers to fail if a broker is down. The ideal scenario (which worked in 0.7) is that producers just produce to the available partitions and consumers consume from the available partitions. If the broker comes back online, the consumers catch up; if not, we can decide to throw away the data.

Is this feasible in 0.8? Right now, if I kill a broker, everything fails...

Multiple issues will come up:

- Since the partitions are now set globally and never change, the availability of a topic varies depending on where its partitions are located.
- We would need tools to make sure topics are spread widely enough and to rebalance them accordingly (using the "DDL" I heard about; I'm not sure yet how it works. I tried editing the JSON strings in ZK, which somehow works, and there's the reassignment admin command too).

That looks rather complicated, or maybe I'm missing something? The model used in 0.7 looked much easier to operate (it had drawbacks and couldn't do intra-cluster replication, but at least the availability of the cluster was much higher).

Thanks in advance for any help/clues,
Maxime