Thanks Dong. I have updated the KIP. Instead of using a configure to specify the timeout, I switch it to use internal timer. User doesn't need a new configuration to use this feature.
Xiongqi (Wesley) Wu On Mon, Oct 29, 2018 at 4:40 PM xiongqi wu <xiongq...@gmail.com> wrote: > Dong, > > Thanks for the comments. > > 1) With KIP-380, in theory we don't need the timeout phase. > However, once orphan partitions are removed, they cannot be recovered. > The question is should we rely on the fact that the first leaderandISR > always contains correct information. > > For retention enabled topic, the deletion phase (step 3 in this KIP) will > protect against deletion of new segments. > For log compaction topic, since log segments can be relative old, delete > phase might delete useful segments if by any chance first leaderandISR is > incorrect. > > Here is the different with/without timeout phase: > Solution 1: without timeout phase, we rely on the first leaderandISR and > understand that if first leaderandISR is incorrect, we might loss data. We > don't protect against bug. > Solution 2: with timeout phase, we rely on the fact that, during > timeout period, there is at least one valid leaderandISR for any given > partition hosted by the broker. > With the complexity of adding a timeout configuration. > > The solution 2 is a more safer option that comes with the cost of timeout > configuration. > *What is your opinion on these two solutions?* > > > For your second comment: > > I will change the metric description. Thanks for pointing out the right > metric format. > > > Xiongqi (Wesley) Wu > > > On Sun, Oct 28, 2018 at 9:39 PM Dong Lin <lindon...@gmail.com> wrote: > >> Hey Xiongqi, >> >> Thanks for the KIP. Here are some comments: >> >> 1) KIP provides two motivation for the timeout/correction phase. One >> motivation is to handle outdated requests. Would this still be an issue >> after KIP-380? The second motivation seems to be mainly for performance >> optimization when there is reassignment. In general we expect data >> movement >> when we reassign partitions to new brokers. So this is probably not a >> strong reason for adding a new config. >> >> 2) The KIP says "Adding metrics to keep track of the number of orphan >> partitions and the size of these orphan partitions". Can you add the >> specification of these new metrics? Here are example doc in >> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-237%3A+More+Controller+Health+Metrics >> . >> >> Thanks, >> Dong >> >> On Thu, Sep 20, 2018 at 5:40 PM xiongqi wu <xiongq...@gmail.com> wrote: >> >> > Colin, >> > >> > Thanks for the comment. >> > 1) >> > auto.orphan.partition.removal.delay.ms refers to timeout since the >> first >> > leader and ISR request was received. The idea is we want to wait enough >> > time to receive up-to-dated leaderandISR request and any old or new >> > partitions reassignment requests. >> > >> > 2) >> > Is there any logic to remove the partition folders on disk? I can only >> > find references to removing older log segments, but not the folder, in >> the >> > KIP. >> > ==> yes, the plan is to remove partition folders as well. >> > >> > I will update the KIP to make it more clear. >> > >> > >> > Xiongqi (Wesley) Wu >> > >> > >> > On Thu, Sep 20, 2018 at 5:02 PM Colin McCabe <cmcc...@apache.org> >> wrote: >> > >> > > Hi Xiongqi, >> > > >> > > Thanks for the KIP. >> > > >> > > Can you be a bit more clear what the timeout >> > > auto.orphan.partition.removal.delay.ms refers to? Is the timeout >> > > measured since the partition was supposed to be on the broker? Or is >> the >> > > timeout measured since the broker started up? >> > > >> > > Is there any logic to remove the partition folders on disk? I can >> only >> > > find references to removing older log segments, but not the folder, in >> > the >> > > KIP. >> > > >> > > best, >> > > Colin >> > > >> > > On Wed, Sep 19, 2018, at 10:53, xiongqi wu wrote: >> > > > Any comments? >> > > > >> > > > Xiongqi (Wesley) Wu >> > > > >> > > > >> > > > On Mon, Sep 10, 2018 at 3:04 PM xiongqi wu <xiongq...@gmail.com> >> > wrote: >> > > > >> > > > > Here is the implementation for the KIP 370. >> > > > > >> > > > > >> > > > > >> > > >> > >> https://github.com/xiowu0/kafka/commit/f1bd3085639f41a7af02567550a8e3018cfac3e9 >> > > > > >> > > > > >> > > > > The purpose is to do one time cleanup (after a configured delay) >> of >> > > orphan >> > > > > partitions when a broker starts up. >> > > > > >> > > > > >> > > > > Xiongqi (Wesley) Wu >> > > > > >> > > > > >> > > > > On Wed, Sep 5, 2018 at 10:51 AM xiongqi wu <xiongq...@gmail.com> >> > > wrote: >> > > > > >> > > > >> >> > > > >> This KIP enables broker to remove orphan partitions >> automatically. >> > > > >> >> > > > >> >> > > > >> >> > > >> > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-370%3A+Remove+Orphan+Partitions >> > > > >> >> > > > >> >> > > > >> Xiongqi (Wesley) Wu >> > > > >> >> > > > > >> > > >> > >> >