Hi Igor, Thanks for the KIP. It looks like it's on a good track. I have a few suggestions to throw into the mix:
1. (nit): Instead of "storage id," maybe we could call it "directory id"? It seems a little clear since each log dir gets a unique id. 2. Rather than introducing a new RPC to communicate offline directories, would it be reasonable to add it to BrokerHeartbeat? My thinking is that we can let broker registration include the complete list of log dirs. The heartbeat can then be used to communicate online/offline status of each log dir. 3. I think we have an opportunity to consolidate normal partition reassignment and log dir reassignment here. Since we are modifying the `Assignment` to consist of both replica id and storage id, couldn't we make a similar change to the `AlterPartitionReassignment` API? Basically that would let us treat all reassignments as moving from one log dir to another. The case handled by `AlterReplicaLogDir` currently would just become a special case where the replica id happens to be the same. This might also let us get rid of all the custom logic surrounding JBOD, such as the additional fetcher, which has proven difficult to maintain. Thanks, Jason On Tue, Aug 2, 2022 at 2:24 AM Igor Soarez <i...@soarez.me> wrote: > Hi José, > > Thanks for having a look at this KIP and thanks for pointing this out, > I've had a look at KIP-856. > > It's good to see there's some overlap in our proposals. we're both > proposing: > > - Identifying log directories with a UUID > - Extending the storage tool to ensure each log directory has a UUID > assigned > - Expanding the topic partition identity to include the log directory UUID > > There were differences in our proposals as to how the UUID is to be > persisted, but I've changed my proposal to match yours — I think adding > storage.id to meta.properties makes sense. > > -- > Igor > > > On Wed, Jul 27, 2022, at 4:42 PM, José Armando García Sancio wrote: > > Hi Igor, > > > > Thanks for the KIP. Looking forward to this improvement. I'll review > your KIP. > > > > I should mention that I started a discussion thread on KIP-856: KRaft > > Disk Failure Recovery at > > https://lists.apache.org/thread/ytv0t18cplwwwqcp77h6vry7on378jzj > > > > Both keep introducing similar concepts. For example both KIP introduce > > a storage uuid that is stored in meta.properties. At first glance > > there are some minor differences. I suggest that we review each > > other's KIP so that we can remove these minor differences. What do you > > think? > > > > Thanks! > > -- > > -José >