Hi Dong, Thanks for the explanation. Comments inline.
On Fri, Aug 4, 2017 at 6:47 PM, Dong Lin <lindon...@gmail.com> wrote: > 1. Yes it has been considered. Here are the reasons why we don't do it > through controller. > > - There can be use-cases where we only want to rebalance the load of log > directories on a given broker. It seems unnecessary to go through > controller in this case. > Even though this is true, not sure how common it will be. - If controller is responsible for sending ChangeReplicaDirRequest, and if > the user-specified log directory is either invalid or offline, then > controller probably needs a way to tell user that the partition > reassignment has failed. We currently don't have a way to do this since > kafka-reassign-partition.sh simply creates the reassignment znode without > waiting for response. I am not sure that is a good solution to this. > Since the JSON is provided by the user, we would ideally validate its contents before storing it. Why are the log directories different than the other information in the JSON? - If controller is responsible for sending ChangeReplicaDirRequest, the > controller logic would be more complicated because controller needs to > first send ChangeReplicaRequest so that the broker memorize the partition > -> log directory mapping, send LeaderAndIsrRequest, and keep sending > ChangeReplicaDirRequest (just in case broker restarted) until replica is > created. Note that the last step needs repeat and timeout as the proposed > in the KIP-113. > > Overall I think this adds quite a bit complexity to controller and we > probably want to do this only if there is strong clear of doing so. > Currently in KIP-113 the kafka-reassign-partitions.sh is responsible for > sending ChangeReplicaDirRequest with repeat and provides error to user if > it either fails or timeout. It seems to be much simpler and user shouldn't > care whether it is done through controller. > If I understand correctly, the logic is the same in both cases, it's just a question of where it lives. The advantage of placing it in the Controller is that the whole reassignment logic is in one place (instead of split between the tool and the Controller). That seems easier to reason about. Also, how do you think things would work in the context of KIP-179? Would the tool still invoke these requests or would it be done by the broker receiving the alterTopics/reassignPartitions protocol call? And thanks for the suggestion. I will add this to the Rejected Alternative > Section in the KIP-113. > > 2) I think user needs to be able to specify different log directories for > the replicas of the same partition in order to rebalance load across log > directories of all brokers. I am not sure I understand the question. Can > you explain a bit more why "that the log directory has to be the same for > all replicas of a given partition"? I think I misunderstood the schema. The KIP has the following example: "partitions" : [ { "topic" : str, "partition" : int, "replicas" : [int], "log_dirs" : [str] <-- NEW. A log directory can be either "any", or a valid absolute path that begins with '/'. This is an optional filed. It is treated as an array of "any" if this field is not explicitly specified in the json file. }, ... ] Is it right that `log_dirs` is an array in the same order as `replicas`? That's a bit obscure and we should document it more clearly. Did we discard the option of a more readable schema (i.e. a JSON object mapping a replica id to a log dir) due to efficiency concerns? It would be good to include that in the KIP. Thanks, Ismael