On Wed, Sep 5, 2018, at 10:26, xiongqi wu wrote: > Colin, > > When a user creates a topic, the user doesn't know broker's configures. If > there is 100 brokers, the user can't contact all the brokers to find out > the minimum allowed value for segment.ms. Today, the topic creation is > done via zookeeper, at this stage, the user doesn't have right view of > broker's configuration.
Hi Ziongqi, I think I understand your concerns now. Yes, there are some tools which use ZK directly now. We can't put any safeguards on these, of course. However, more and more code is switching over to use Admin Client. For example, Streams now uses admin client to create topics, and there is a KIP to switch over the kafka-topics.sh script to do the same. Admin Client can check and reject invalid configurations. Please keep in mind that I am only proposing that the broker configuration prevent topics from being created with the wrong configuration. Once the topic has been created then we should honor whatever configuration exists. Also, tools that modify ZooKeeper themselves can ignore all safeguards (and indeed destroy the whole system if used incorrectly). This is an existing bug which we are in the process of fixing as part of KIP-4. > > In the long term, we do need a way to enforce the minimum of segment.ms. > But I think it is not a trivial change, so I rather keep this item as > separated PR. It should be a pretty trivial change if you just add it to the admin client topic creation code path in KafkaApis. best, Colin > > > Xiongqi (Wesley) Wu > > > On Wed, Sep 5, 2018 at 8:33 AM Colin McCabe <cmcc...@apache.org> wrote: > > > On Tue, Sep 4, 2018, at 20:25, xiongqi wu wrote: > > > Thanks for comments. > > > > > > Today, when creating topic, client only does simple local validation > > > and doesn't check against broker's configurations. > > > > > > We cannot just let users to create a configuration in zookeeper and > > > dishonor the user's choice in broker side. > > > > Hi xiongqi, > > > > I wasn't suggesting overriding the user's choices. Given that this is > > intended for GDPR purposes, that would indeed be a very bad idea. I was > > suggesting that we disallow users from creating topics with invalid > > settings. The broker could have a configuration that specifies the minimum > > segment rotation time it is willing to accept. This could be a dynamic > > config to make it easier to manage. > > > > best, > > Colin > > > > > > > > I agree we need a better way to enforce the right value is set such as > > > segment.ms, but it is not through a simple override in the broker side. > > > > > > If you have better solution, let me know. If it is require more > > > discussions, I would rather track this issue outside this KIP. > > > > > > > > > Xiongqi (Wesley) Wu > > > > > > > > > On Tue, Sep 4, 2018 at 6:38 PM Colin McCabe <cmcc...@apache.org> wrote: > > > > > > > On Tue, Sep 4, 2018, at 17:47, xiongqi wu wrote: > > > > > Colin, > > > > > Thank you for comments. > > > > > see my inline reply below. > > > > > > > > > > Xiongqi (Wesley) Wu > > > > > > > > > > > > > > > On Tue, Sep 4, 2018 at 5:24 PM Colin McCabe <cmcc...@apache.org> > > wrote: > > > > > > > > > > > Hi Xiongqi, > > > > > > > > > > > > Thanks for this KIP. > > > > > > > > > > > > The name seems a bit ambiguous. Our compaction policies are > > already > > > > > > time-based, after all. It seems like this change is focused around > > > > adding > > > > > > a “max.compaction.lag.ms." Perhaps the KIP title should be > > something > > > > > > like "add maximum compaction lag time"? > > > > > > > > > > > > ==========> sure. I will change the title. > > > > > > > > > > > The active segment is forced to roll when either " > > > > max.compaction.lag.ms" > > > > > > > or "segment.ms" (log.roll.ms and log.roll.hours) has reached. > > > > > > > > > > > > If the max.compaction.lag.ms is low, it seems like segments will > > be > > > > > > rolled very frequently. This can be a source of problems in the > > > > cluster, > > > > > > since creating many different small log segments consumes a huge > > > > amount of > > > > > > cluster resources. Therefore, I would suggest adding a > > broker-level > > > > > > configuration which allows us to set a minimum value for > > > > > > max.compaction.lag.ms. If we let users set it on a per-topic > > basis, > > > > > > someone could set a value of 1 ms or something, and cause chaos. > > > > > > > > > > > > =========> this applies to segment.ms as well. Today users can > > set " > > > > > segment.ms" to a very low value, and cause a frequent rolling of > > active > > > > > segments. > > > > > > > > Hi Xiongqi, > > > > > > > > I agree that this is an existing problem with segment.ms. However, > > that > > > > doesn't mean that we shouldn't fix it. As you noted, there will be > > more > > > > interest in these topic-level retention settings as a result of GDPR. > > It > > > > seems likely that pre-existing problems will cause more trouble. > > > > > > > > The fix seems relatively straightforward here -- add a broker-level > > > > minimum segment.ms that overrides per-topic minimums. We can also > > fail > > > > with a helpful error message when someone attempts to set an invalid > > > > configuration. > > > > > > > > > In my option, the minimum of "max.compaction.lag.ms" should be > > > > > based on the minimum of "segment.ms". Since today the minimum of > > > > segment.ms > > > > > is 1, "max.compaction.lag.ms" also starts with 1. "0" means > > disable. I > > > > > can use -1 as disable, but it is hard to define the meaning of 0 > > because > > > > we > > > > > cannot just roll the active segment immediately. > > > > > > > > That's a fair point. We should make 0 = disable, to be consistent with > > > > the other settings. > > > > > > > > best, > > > > Colin > > > > > > > > > > > > > > > -- Note that an alternative configuration is to use -1 as > > "disabled" > > > > and > > > > > > 0 > > > > > > > as "immediate compaction". Because compaction lag is still > > > > determined > > > > > > > based on min.compaction.lag and how long to roll an active > > segment, > > > > > > the > > > > > > > actual lag for compaction is undetermined if we use "0". On the > > > > other > > > > > > > hand, we can already set "min.cleanable.dirty.ratio" to achieve > > the > > > > > > same > > > > > > > goal. So here we choose "0" as "disabled". > > > > > > > > > > > > I would prefer -1 to be the invalid setting. Treating 0 > > differently > > > > than > > > > > > 1 seems strange to me. > > > > > > > > > > > > =====> see my previous comment, I am not strongly against, but 0 > > is > > > > not a > > > > > valid configuration in my option. So I use "0" as disabled state. > > > > > > > > > > best, > > > > > > Colin > > > > > > > > > > > > > > > > > > On Tue, Sep 4, 2018, at 15:04, xiongqi wu wrote: > > > > > > > Let's VOTE for this KIP. > > > > > > > KIP: > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-354 > > > > > > > %3A+Time-based+log+compaction+policy > > > > > > > > > > > > > > Implementation: > > > > > > > > > > > > > > https://github.com/apache/kafka/pull/5611 > > > > > > > > > > > > > > > > > > > > > > > > > > > > Xiongqi (Wesley) Wu > > > > > > > > > > > >