Colin, When a user creates a topic, the user doesn't know broker's configures. If there is 100 brokers, the user can't contact all the brokers to find out the minimum allowed value for segment.ms. Today, the topic creation is done via zookeeper, at this stage, the user doesn't have right view of broker's configuration.
In the long term, we do need a way to enforce the minimum of segment.ms. But I think it is not a trivial change, so I rather keep this item as separated PR. Xiongqi (Wesley) Wu On Wed, Sep 5, 2018 at 8:33 AM Colin McCabe <cmcc...@apache.org> wrote: > On Tue, Sep 4, 2018, at 20:25, xiongqi wu wrote: > > Thanks for comments. > > > > Today, when creating topic, client only does simple local validation > > and doesn't check against broker's configurations. > > > > We cannot just let users to create a configuration in zookeeper and > > dishonor the user's choice in broker side. > > Hi xiongqi, > > I wasn't suggesting overriding the user's choices. Given that this is > intended for GDPR purposes, that would indeed be a very bad idea. I was > suggesting that we disallow users from creating topics with invalid > settings. The broker could have a configuration that specifies the minimum > segment rotation time it is willing to accept. This could be a dynamic > config to make it easier to manage. > > best, > Colin > > > > > I agree we need a better way to enforce the right value is set such as > > segment.ms, but it is not through a simple override in the broker side. > > > > If you have better solution, let me know. If it is require more > > discussions, I would rather track this issue outside this KIP. > > > > > > Xiongqi (Wesley) Wu > > > > > > On Tue, Sep 4, 2018 at 6:38 PM Colin McCabe <cmcc...@apache.org> wrote: > > > > > On Tue, Sep 4, 2018, at 17:47, xiongqi wu wrote: > > > > Colin, > > > > Thank you for comments. > > > > see my inline reply below. > > > > > > > > Xiongqi (Wesley) Wu > > > > > > > > > > > > On Tue, Sep 4, 2018 at 5:24 PM Colin McCabe <cmcc...@apache.org> > wrote: > > > > > > > > > Hi Xiongqi, > > > > > > > > > > Thanks for this KIP. > > > > > > > > > > The name seems a bit ambiguous. Our compaction policies are > already > > > > > time-based, after all. It seems like this change is focused around > > > adding > > > > > a “max.compaction.lag.ms." Perhaps the KIP title should be > something > > > > > like "add maximum compaction lag time"? > > > > > > > > > > ==========> sure. I will change the title. > > > > > > > > > The active segment is forced to roll when either " > > > max.compaction.lag.ms" > > > > > > or "segment.ms" (log.roll.ms and log.roll.hours) has reached. > > > > > > > > > > If the max.compaction.lag.ms is low, it seems like segments will > be > > > > > rolled very frequently. This can be a source of problems in the > > > cluster, > > > > > since creating many different small log segments consumes a huge > > > amount of > > > > > cluster resources. Therefore, I would suggest adding a > broker-level > > > > > configuration which allows us to set a minimum value for > > > > > max.compaction.lag.ms. If we let users set it on a per-topic > basis, > > > > > someone could set a value of 1 ms or something, and cause chaos. > > > > > > > > > > =========> this applies to segment.ms as well. Today users can > set " > > > > segment.ms" to a very low value, and cause a frequent rolling of > active > > > > segments. > > > > > > Hi Xiongqi, > > > > > > I agree that this is an existing problem with segment.ms. However, > that > > > doesn't mean that we shouldn't fix it. As you noted, there will be > more > > > interest in these topic-level retention settings as a result of GDPR. > It > > > seems likely that pre-existing problems will cause more trouble. > > > > > > The fix seems relatively straightforward here -- add a broker-level > > > minimum segment.ms that overrides per-topic minimums. We can also > fail > > > with a helpful error message when someone attempts to set an invalid > > > configuration. > > > > > > > In my option, the minimum of "max.compaction.lag.ms" should be > > > > based on the minimum of "segment.ms". Since today the minimum of > > > segment.ms > > > > is 1, "max.compaction.lag.ms" also starts with 1. "0" means > disable. I > > > > can use -1 as disable, but it is hard to define the meaning of 0 > because > > > we > > > > cannot just roll the active segment immediately. > > > > > > That's a fair point. We should make 0 = disable, to be consistent with > > > the other settings. > > > > > > best, > > > Colin > > > > > > > > > > > > -- Note that an alternative configuration is to use -1 as > "disabled" > > > and > > > > > 0 > > > > > > as "immediate compaction". Because compaction lag is still > > > determined > > > > > > based on min.compaction.lag and how long to roll an active > segment, > > > > > the > > > > > > actual lag for compaction is undetermined if we use "0". On the > > > other > > > > > > hand, we can already set "min.cleanable.dirty.ratio" to achieve > the > > > > > same > > > > > > goal. So here we choose "0" as "disabled". > > > > > > > > > > I would prefer -1 to be the invalid setting. Treating 0 > differently > > > than > > > > > 1 seems strange to me. > > > > > > > > > > =====> see my previous comment, I am not strongly against, but 0 > is > > > not a > > > > valid configuration in my option. So I use "0" as disabled state. > > > > > > > > best, > > > > > Colin > > > > > > > > > > > > > > > On Tue, Sep 4, 2018, at 15:04, xiongqi wu wrote: > > > > > > Let's VOTE for this KIP. > > > > > > KIP: > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-354 > > > > > > %3A+Time-based+log+compaction+policy > > > > > > > > > > > > Implementation: > > > > > > > > > > > > https://github.com/apache/kafka/pull/5611 > > > > > > > > > > > > > > > > > > > > > > > > Xiongqi (Wesley) Wu > > > > > > > > >