Hi Jun,

The inter-broker movement case has two subcases:
1. Where no log dir is supplied. This corresponds to the existing kafka-reassign-partitions script, and just needs the appropriate JSON to be written to the reassignment znode.
2. Where the log dir is supplied. This is covered in KIP-113 (in addition to the intra-broker case), and that KIP defines an algorithm where an initial AlterReplicaDirRequest is sent to each receiving broker, then the znode gets updated, then further AlterReplicaDirRequests are sent.

In the first case, the JSON lacks any log dir information. In the second case, the JSON includes log dir information. I'm suggesting that a single PartitionReassignmentRequest class could be used to represent (and be convertible to) both kinds of JSON. (In fact, one JSON schema is a subset of the other.)

So PartitionReassignmentRequest would indeed only be necessary for inter-broker movement, but it would be necessary in both the with- and without-log-dir cases of that. While I could have a PartitionReassignmentRequest that only dealt with inter-broker-without-log-dirs data movement, that wouldn't be enough to address the needs of KIP-179, because the inter-broker-with-log-dirs case still needs to update the znode, and KIP-179 is all about the script/command no longer talking to ZooKeeper.

Does that make sense to you?

Cheers,

Tom

On 11 August 2017 at 16:22, Jun Rao <j...@confluent.io> wrote:

> Hi, Tom,
>
> One approach is to have a PartitionReassignmentRequest that only deals with
> inter-broker data movement (i.e., w/o any log dirs in the request). The
> request is directed to any broker, which then just writes the reassignment
> JSON to ZK. There is a separate AlterReplicaDirRequest that only deals with
> intra-broker data movement (i.e., with the log dirs in the request). This
> request is directed to the specific broker whose replicas need to be moved
> between log dirs. This seems to be what's in your original proposal in
> KIP-179, which I think makes sense.
>
> In your earlier email, I thought you were proposing to have
> PartitionReassignmentRequest dealing with both inter- and intra-broker data
> movement (i.e., including log dirs in the request). Then, I am not sure how
> this request would be processed on the broker. So, you were not proposing
> that?
>
> Thanks,
>
> Jun
>
> On Fri, Aug 11, 2017 at 5:37 AM, Tom Bentley <t.j.bent...@gmail.com>
> wrote:
>
> > Hi Jun and Dong,
> >
> > Thanks for your replies...
> >
> > On 10 August 2017 at 20:43, Dong Lin <lindon...@gmail.com> wrote:
> >
> > > This is a very good idea. I have updated KIP-113 so that
> > > DescribeDirResponse returns lag instead of LEO.
> >
> > Excellent!
> >
> > On Thu, Aug 10, 2017 at 10:21 AM, Jun Rao <j...@confluent.io> wrote:
> >
> > > 2. Tom, note that currently, the LeaderAndIsrRequest doesn't specify the
> > > log dir. So, I am not sure in your new proposal, how the log dir info is
> > > communicated to all brokers. Is the broker receiving the
> > > ReassignPartitionsRequest going to forward that to all brokers?
> >
> > My understanding of KIP-113 is that each broker has its own set of log dirs
> > (even though in practice they might all have the same names, might all be
> > distributed across the brokers' disks in the same way, and all those disks
> > might be identical), so it doesn't make sense for one broker to be told
> > about the log dirs of another broker.
> >
> > Furthermore, it is the AlterReplicaDirRequest sent to the receiving broker
> > that associates the partition with the log dir on that broker.
> > To quote from KIP-113 (specifically, the notes in this section
> > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-113%3A+Support+replicas+movement+between+log+directories#KIP-113:Supportreplicasmovementbetweenlogdirectories-1%29Howtomovereplicabetweenlogdirectoriesonthesamebroker>):
> >
> > > - If the broker doesn't already have a replica created for the specified
> > > topicPartition when it receives AlterReplicaDirRequest, it will reply
> > > ReplicaNotAvailableException AND remember the (replica, destination log
> > > directory) pair in memory, to create the replica in the specified log
> > > directory when it receives LeaderAndIsrRequest later.
> >
> > I've not proposed anything to change that, really. All I've done is change
> > who creates the znode that causes the LeaderAndIsrRequest. Because KIP-113
> > has been accepted, I've tried to avoid changing it too much.
> >
> > Cheers,
> >
> > Tom
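A minimal Python sketch of the two ideas in this thread, under stated assumptions. `PartitionReassignmentRequest` here is a hypothetical class, not Kafka's actual API, and the JSON keys (`version`, `partitions`, `topic`, `partition`, `replicas`, `log_dirs`) are modeled loosely on the reassignment JSON and should be treated as assumptions. The sketch shows one object rendering both the with- and without-log-dir JSON (the latter being a subset of the former), and splitting log dirs per receiving broker so that each broker only learns about its own dirs. `Broker` is a toy model of the KIP-113 behaviour quoted above: an AlterReplicaDirRequest for a replica that doesn't exist yet gets a not-available reply, and the requested dir is remembered until LeaderAndIsrRequest arrives. The status strings and `default_dir` are illustrative stand-ins.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

TopicPartition = Tuple[str, int]


@dataclass
class PartitionReassignmentRequest:
    # Hypothetical: target replica broker ids per partition (inter-broker move).
    replicas: Dict[TopicPartition, List[int]]
    # Hypothetical: optional target log dir per replica, parallel to `replicas`.
    log_dirs: Dict[TopicPartition, List[str]] = field(default_factory=dict)

    def to_json(self, include_log_dirs: bool = True) -> str:
        """Render as reassignment JSON; without log dirs the keys are a subset."""
        partitions = []
        for (topic, partition), brokers in sorted(self.replicas.items()):
            entry = {"topic": topic, "partition": partition, "replicas": brokers}
            dirs = self.log_dirs.get((topic, partition))
            if include_log_dirs and dirs is not None:
                if len(dirs) != len(brokers):
                    raise ValueError("expected one log dir per replica")
                entry["log_dirs"] = dirs
            partitions.append(entry)
        return json.dumps({"version": 1, "partitions": partitions})

    def dirs_by_broker(self) -> Dict[int, Dict[TopicPartition, str]]:
        """Group log dirs by receiving broker, so each broker only sees its own."""
        result: Dict[int, Dict[TopicPartition, str]] = {}
        for tp, dirs in self.log_dirs.items():
            for broker_id, log_dir in zip(self.replicas[tp], dirs):
                result.setdefault(broker_id, {})[tp] = log_dir
        return result


@dataclass
class Broker:
    broker_id: int
    default_dir: str  # assumption: used when no dir was requested for a replica
    replicas: Dict[TopicPartition, str] = field(default_factory=dict)
    pending_dirs: Dict[TopicPartition, str] = field(default_factory=dict)

    def on_alter_replica_dir(self, tp: TopicPartition, log_dir: str) -> str:
        if tp not in self.replicas:
            # No replica yet: remember (replica, dir) until LeaderAndIsr arrives.
            self.pending_dirs[tp] = log_dir
            return "REPLICA_NOT_AVAILABLE"
        self.replicas[tp] = log_dir  # stand-in for an intra-broker move
        return "NONE"

    def on_leader_and_isr(self, tp: TopicPartition) -> None:
        # Create the replica in the remembered dir, else in the default dir.
        if tp not in self.replicas:
            self.replicas[tp] = self.pending_dirs.pop(tp, self.default_dir)


if __name__ == "__main__":
    req = PartitionReassignmentRequest(
        replicas={("foo", 0): [1, 2]},
        log_dirs={("foo", 0): ["/data/a", "/data/b"]},
    )
    brokers = {1: Broker(1, "/data/default"), 2: Broker(2, "/data/default")}
    # Step 1: per-broker AlterReplicaDir calls, before the replicas exist.
    for broker_id, dirs in req.dirs_by_broker().items():
        for tp, log_dir in dirs.items():
            brokers[broker_id].on_alter_replica_dir(tp, log_dir)
    # Step 2: the znode update would lead the controller to send LeaderAndIsr.
    for broker in brokers.values():
        broker.on_leader_and_isr(("foo", 0))
    print(req.to_json(include_log_dirs=False))
    print(brokers[2].replicas)  # {('foo', 0): '/data/b'}
```

The design choice in this sketch is to make log dirs optional data on the same request object rather than a separate type, which mirrors the suggestion that one class could cover both inter-broker subcases while each broker still only receives the dirs that concern it.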