On Tue, Sep 4, 2018, at 20:25, xiongqi wu wrote: > Thanks for comments. > > Today, when creating topic, client only does simple local validation > and doesn't check against broker's configurations. > > We cannot just let users to create a configuration in zookeeper and > dishonor the user's choice in broker side.
Hi xiongqi, I wasn't suggesting overriding the user's choices. Given that this is intended for GDPR purposes, that would indeed be a very bad idea. I was suggesting that we disallow users from creating topics with invalid settings. The broker could have a configuration that specifies the minimum segment rotation time it is willing to accept. This could be a dynamic config to make it easier to manage. best, Colin > > I agree we need a better way to enforce the right value is set such as > segment.ms, but it is not through a simple override in the broker side. > > If you have better solution, let me know. If it is require more > discussions, I would rather track this issue outside this KIP. > > > Xiongqi (Wesley) Wu > > > On Tue, Sep 4, 2018 at 6:38 PM Colin McCabe <cmcc...@apache.org> wrote: > > > On Tue, Sep 4, 2018, at 17:47, xiongqi wu wrote: > > > Colin, > > > Thank you for comments. > > > see my inline reply below. > > > > > > Xiongqi (Wesley) Wu > > > > > > > > > On Tue, Sep 4, 2018 at 5:24 PM Colin McCabe <cmcc...@apache.org> wrote: > > > > > > > Hi Xiongqi, > > > > > > > > Thanks for this KIP. > > > > > > > > The name seems a bit ambiguous. Our compaction policies are already > > > > time-based, after all. It seems like this change is focused around > > adding > > > > a “max.compaction.lag.ms." Perhaps the KIP title should be something > > > > like "add maximum compaction lag time"? > > > > > > > > ==========> sure. I will change the title. > > > > > > > The active segment is forced to roll when either " > > max.compaction.lag.ms" > > > > > or "segment.ms" (log.roll.ms and log.roll.hours) has reached. > > > > > > > > If the max.compaction.lag.ms is low, it seems like segments will be > > > > rolled very frequently. This can be a source of problems in the > > cluster, > > > > since creating many different small log segments consumes a huge > > amount of > > > > cluster resources. Therefore, I would suggest adding a broker-level > > > > configuration which allows us to set a minimum value for > > > > max.compaction.lag.ms. If we let users set it on a per-topic basis, > > > > someone could set a value of 1 ms or something, and cause chaos. > > > > > > > > =========> this applies to segment.ms as well. Today users can set " > > > segment.ms" to a very low value, and cause a frequent rolling of active > > > segments. > > > > Hi Xiongqi, > > > > I agree that this is an existing problem with segment.ms. However, that > > doesn't mean that we shouldn't fix it. As you noted, there will be more > > interest in these topic-level retention settings as a result of GDPR. It > > seems likely that pre-existing problems will cause more trouble. > > > > The fix seems relatively straightforward here -- add a broker-level > > minimum segment.ms that overrides per-topic minimums. We can also fail > > with a helpful error message when someone attempts to set an invalid > > configuration. > > > > > In my option, the minimum of "max.compaction.lag.ms" should be > > > based on the minimum of "segment.ms". Since today the minimum of > > segment.ms > > > is 1, "max.compaction.lag.ms" also starts with 1. "0" means disable. I > > > can use -1 as disable, but it is hard to define the meaning of 0 because > > we > > > cannot just roll the active segment immediately. > > > > That's a fair point. We should make 0 = disable, to be consistent with > > the other settings. > > > > best, > > Colin > > > > > > > > > -- Note that an alternative configuration is to use -1 as "disabled" > > and > > > > 0 > > > > > as "immediate compaction". Because compaction lag is still > > determined > > > > > based on min.compaction.lag and how long to roll an active segment, > > > > the > > > > > actual lag for compaction is undetermined if we use "0". On the > > other > > > > > hand, we can already set "min.cleanable.dirty.ratio" to achieve the > > > > same > > > > > goal. So here we choose "0" as "disabled". > > > > > > > > I would prefer -1 to be the invalid setting. Treating 0 differently > > than > > > > 1 seems strange to me. > > > > > > > > =====> see my previous comment, I am not strongly against, but 0 is > > not a > > > valid configuration in my option. So I use "0" as disabled state. > > > > > > best, > > > > Colin > > > > > > > > > > > > On Tue, Sep 4, 2018, at 15:04, xiongqi wu wrote: > > > > > Let's VOTE for this KIP. > > > > > KIP: > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-354 > > > > > %3A+Time-based+log+compaction+policy > > > > > > > > > > Implementation: > > > > > > > > > > https://github.com/apache/kafka/pull/5611 > > > > > > > > > > > > > > > > > > > > Xiongqi (Wesley) Wu > > > > > >