[ https://issues.apache.org/jira/browse/KAFKA-2510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14733244#comment-14733244 ]

Gwen Shapira commented on KAFKA-2510:
-------------------------------------

No, the issue I'm trying to prevent is definitely not resolved by controlled 
shutdown.

Here's the scenario (I realize it sounds a bit contrived, but I've seen it 
happen twice):
* Shut down an entire Kafka cluster for maintenance (Kafka upgrade, OS upgrade, 
hardware upgrade, whatever)
* Sysadmin deploys a configuration change via automated tool. The tool replaces 
all the configuration on the machine, including server.properties.
* Unfortunately, the new server.properties has a typo in the log.dirs 
parameter, pointing to the wrong location.
* Bring up the cluster. Everything looks normal for a while, but all historical 
data is gone. By the time you realize what went wrong, you face the choice of 
either getting your old data back and losing the last few hours / days of new 
data, or saying goodbye to your history.

It is obviously the sysadmin's fault for misconfiguring, but most other 
datastores would refuse to start in a similar scenario (i.e., they keep 
multiple sources of truth about the existing data and will not start when 
those sources disagree). It looks like we have the ability to make Kafka safer 
for our users, and I don't see a reason not to do so.

> Prevent broker from re-replicating / losing data due to disk misconfiguration
> -----------------------------------------------------------------------------
>
>                 Key: KAFKA-2510
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2510
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Gwen Shapira
>
> Currently Kafka assumes that whatever it sees in the data directory is the 
> correct state of the data.
> This means that if an admin mistakenly configures Chef to use the wrong data 
> directory, one of the following can happen:
> 1. The broker will re-replicate a bunch of partitions and saturate the network
> 2. If you did this to enough brokers, you can lose entire topics and 
> partitions.
> We have information about existing topics, partitions and their ISR in 
> ZooKeeper.
> We need a mode in which, if a broker starts, is in the ISR for a partition, 
> and has no data or directory for that partition, the broker logs a huge ERROR 
> and refuses to do anything for the partition.
> [~fpj] worked on the problem for ZK and had some ideas on what is required 
> here. 
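The check proposed in the description (refuse to act for a partition when ZooKeeper lists this broker in its ISR but no local data exists) could look roughly like the following. This is a minimal, hypothetical sketch in Python, not Kafka's code: `partitions_to_refuse`, `list_local_partition_dirs` and the `isr_by_partition` map (standing in for the ISR state read from ZooKeeper) are illustrative names. The only detail taken from Kafka is that each partition's log lives in a `<topic>-<partition>` directory under one of the `log.dirs` paths. It checks only for the directory, not for the data inside it.

```python
import logging
import os
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical logger name; not a real Kafka logger.
logger = logging.getLogger("hypothetical.startup_check")

# (topic, partition) pair, e.g. ("orders", 3)
TopicPartition = Tuple[str, int]


def partition_dir_name(tp: TopicPartition) -> str:
    # Kafka keeps each partition's log in a directory named "<topic>-<partition>"
    # inside one of the directories listed in log.dirs.
    topic, partition = tp
    return f"{topic}-{partition}"


def list_local_partition_dirs(log_dirs: Iterable[str]) -> Set[str]:
    # Collect the names of all subdirectories across the configured log dirs.
    # A log dir that does not exist (e.g. a typo in log.dirs) contributes nothing.
    names: Set[str] = set()
    for d in log_dirs:
        if os.path.isdir(d):
            names.update(
                entry for entry in os.listdir(d)
                if os.path.isdir(os.path.join(d, entry))
            )
    return names


def partitions_to_refuse(
    broker_id: int,
    isr_by_partition: Dict[TopicPartition, Set[int]],
    local_dirs: Set[str],
) -> List[TopicPartition]:
    # isr_by_partition is a stand-in for the ISR state read from ZooKeeper.
    refused = []
    for tp in sorted(isr_by_partition):
        in_isr = broker_id in isr_by_partition[tp]
        if in_isr and partition_dir_name(tp) not in local_dirs:
            # Cluster metadata says this broker holds the data, but the disk
            # disagrees: log loudly and refuse to act for this partition
            # instead of silently re-replicating it from scratch.
            logger.error(
                "Broker %d is in ISR for %s-%d but has no local log directory; "
                "refusing to serve this partition", broker_id, tp[0], tp[1])
            refused.append(tp)
    return refused


if __name__ == "__main__":
    isr = {("orders", 0): {1, 2, 3}, ("orders", 1): {2, 3}, ("clicks", 0): {1, 3}}
    # Broker 1 has only orders-0 on disk, so clicks-0 is refused.
    print(partitions_to_refuse(1, isr, {"orders-0"}))  # [('clicks', 0)]
```

Keeping the ZooKeeper-versus-disk comparison in a pure function makes the decision easy to test. With a mistyped `log.dirs`, `list_local_partition_dirs` returns an empty set, so every partition where the broker is in the ISR is refused rather than re-replicated.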



