Luis, I meant to update the "Rejected Alternative" sections, what you have done already. Thx.
Originally, I also had the idea about a second config, but thought it might be easier to just change the allowed values to be `offset`, `timestamp`, `header=<key>`. (We try to keep the number of configs small if possible, as more configs are more confusing to users.) I don't think that using `_offset_`, `_timestamp_` and `<key>` solves the problem because users still might use `_something_` as header key -- and if we want to introduce a new compaction strategy "something" later we face the same issues as without the underscores. We only reduce the likelihood that it happens. Using `header=` as prefix or introducing a second config, that is only effective if the strategy is set to `header` seems to be a cleaner solution. @Luis: why do you think that using `header=<key>` is an "incorrect approach"? > Though I would still prefer to keep it as it is, as its a much simple> and > cleaner approach – I’m not so sure that a potential client would > really be so inconvenienced for having to use “_offset” or > “_timestamp_” as a header I don't think that it's about the issue that people cannot use `_offset_` or `_timestamp_` in their header (by "use" I mean for compaction). With the current KIP, they cannot use `offset` or `timestamp` either. The issue is, that we cannot introduce a new system supported compaction strategy in the future without potentially breaking something, as we basically assign the whole space of Strings (minus `offset`, `timestamp`) as valid configs to enable header based compaction. Personally, I prefer either adding a config or going with `header=<key>`. Using `_timestamp_`, `_offset_`, and `<key>` might be good enough (even if this is the solution I like least)---for this case, we should state explicitly, that the whole space of `_*_` is reserved and users are not allowed to set those for header compaction. In fact, I would also add a check for the config that only allows for `_offset_` and `_timestamp_` and throws an exception for all other `_*_` configs. -Matthias On 6/18/18 2:03 PM, Luís Cabral wrote: > I’m ok with that... > > Ted / Matthias? > > > From: Guozhang Wang > Sent: 18 June 2018 22:49 > To: dev@kafka.apache.org > Subject: Re: [VOTE] KIP-280: Enhanced log compaction > > How about make the preserved values to be "_offset_" and "_timestamp_" > then? Currently in the KIP they are reserved as "offset" and "timestamp". > > > Guozhang > > On Mon, Jun 18, 2018 at 1:40 PM, Luís Cabral <luis_cab...@yahoo.com.invalid> > wrote: > >> Hi Guozhang, >> >> Yes, that is what I meant (separate configs). >> Though I would still prefer to keep it as it is, as its a much simpler and >> cleaner approach – I’m not so sure that a potential client would really be >> so inconvenienced for having to use “_offset” or “_timestamp_” as a header >> >> Cheers, >> Luís >> >> >> From: Guozhang Wang >> Sent: 18 June 2018 19:35 >> To: dev@kafka.apache.org >> Subject: Re: [VOTE] KIP-280: Enhanced log compaction >> >> Hello Luís, >> >> I agree that having an expression evaluation as a config value is not the >> best approach; if there are better ideas to allow users to specify the >> header key which happen to be the same as the preserved config values >> "offset" and "timestamp" (although the likelihood may be small, as Ted >> mentioned there may be more preserved config values added in the future), >> then I'd be happily follow the suggestions. For example, we could have the >> config value for header keys as "header-<key-name>"? Is that what you've >> suggested? Or do you suggest using two configs instead, and the second >> config specifying the key name, and will only be considered if the first >> (i.e. current proposed) config's value is `header`, otherwise be ignored? >> >> >> Guozhang >> >> >> On Mon, Jun 18, 2018 at 12:20 AM, Luís Cabral >> <luis_cab...@yahoo.com.invalid >>> wrote: >> >>> Hi Ted / Guozhang / Matthias, >>> >>> @Ted: I've now added your argument to the "Rejected Alternatives" portion >>> of the KIP. Please keep in mind that I would like to keep this as >> backwards >>> compatible as possible, so a lot of decisions are inferred from that >> intent. >>> >>> @Guozhang: IMHO, adding expression evaluation to configuration is an >>> incorrect approach. If you absolutely insist on having this clear >>> distinction between header/key, then I would suggest instead to have a >>> dedicated property for the "key" part. Of course, this is your project so >>> I'll just continue whatever approach moves this KIP forward... >>> >>> @Matthias: Sorry, but update the KIP according to what? >>> >>> Cheers, >>> Luís >>> >>> On Monday, June 18, 2018, 2:55:17 AM GMT+2, Matthias J. Sax < >>> matth...@confluent.io> wrote: >>> >>> Well, for "offset" and "timestamp" policy, not communication between >>> both is required. >>> >>> Only if headers are used, user A should communicate the corresponding >>> header key to user B. >>> >>> >>> @Luis: can you update the KIP accordingly? >>> >>> >>> >>> -Matthias >>> >>> On 6/17/18 5:36 PM, Ted Yu wrote: >>>> My previous reply was just an alternative for consideration. >>>> >>>> bq. than a second user B can add a header with key "offset" and thus >>> break >>>> the intention of user A >>>> >>>> I didn't see such scenario after reading the KIP. Maybe add this as >>>> reasoning for the current approach ? >>>> >>>> I wonder how user B gets to know the intention of user A. Meaning, if >>> user >>>> B doesn't follow the norm set by user A, there still would be issue, >>> right ? >>>> >>>> >>>> On Sun, Jun 17, 2018 at 4:58 PM, Matthias J. Sax < >> matth...@confluent.io> >>>> wrote: >>>> >>>>> Let me rephrase your answer to make sure I understand what you >> suggest: >>>>> >>>>> If compaction strategy is configured to use "offset", and if there is >> a >>>>> header in the record with `key == offset`, than we should use the >> value >>>>> of the record header instead of the actual record offset? >>>>> >>>>> Do I understand this correctly? If yes, what is the advantage of doing >>>>> this? From my point of view, it might be problematic, because if user >> A >>>>> creates a topic and configures "offset" compaction (with the intend >> that >>>>> the record offset should be uses), than a second user B can add a >> header >>>>> with key "offset" and thus break the intention of user A. >>>>> >>>>> Also, if existing topics might have data with record header key >>>>> "offset", the change would not be backward compatible either. >>>>> >>>>> >>>>> -Matthias >>>>> >>>>> On 6/16/18 6:59 PM, Ted Yu wrote: >>>>>> Pardon the brevity in my previous reply. >>>>>> I was talking about this bullet: >>>>>> >>>>>> bq. When this configuration is set to anything other than "*offset*" >>> or " >>>>>> *timestamp*", then the record headers are scanned for a key matching >>> this >>>>>> value. >>>>>> >>>>>> My point is that if matching key in the header is found, its value >>> should >>>>>> take precedence over the value of the configuration. >>>>>> I understand that such interpretation may have slight performance >> cost. >>>>>> >>>>>> Cheers >>>>>> >>>>>> On Sat, Jun 16, 2018 at 6:29 PM, Matthias J. Sax < >>> matth...@confluent.io> >>>>>> wrote: >>>>>> >>>>>>> Ted, >>>>>>> >>>>>>> I am also not sure what you mean by "Shouldn't the selection in >> header >>>>>>> have higher precedence over the configuration"? What selection do >> you >>>>>>> mean? And want configuration? >>>>>>> >>>>>>> >>>>>>> About the first point, I think this is actually a valid concern: To >>>>>>> address this issue, it seems that we would need to change the >> accepted >>>>>>> format of the config. Instead of "offset", "timestamp", >>> "<header-key>", >>>>>>> we could replace the last one with "header=<header-key>". >>>>>>> >>>>>>> WDYT? >>>>>>> >>>>>>> >>>>>>> -Matthias >>>>>>> >>>>>>> On 6/15/18 3:06 AM, Ted Yu wrote: >>>>>>>> If selection exists in header, the selection should override the >>> config >>>>>>> value. >>>>>>>> Cheers >>>>>>>> -------- Original message --------From: Luis Cabral >>>>>>> <luis_cab...@yahoo.com.INVALID> Date: 6/15/18 1:40 AM (GMT-08:00) >>> To: >>>>>>> dev@kafka.apache.org Subject: Re: [VOTE] KIP-280: Enhanced log >>>>> compaction >>>>>>>> Hi, >>>>>>>> >>>>>>>> bq. Can the value be determined now ? My thinking is that what if >>> there >>>>>>> is a third compaction strategy proposed in the future ? We should >>> guard >>>>>>> against user unknowingly choosing the 'future' strategy. >>>>>>>> >>>>>>>> The idea is that the header name to use is flexible, which protects >>>>>>> current clients that may want to use this from having to adapt their >>>>>>> already existing header names (they can just specify a new name). >>>>>>>> >>>>>>>> bq. Shouldn't the selection in header have higher precedence over >> the >>>>>>> configuration ? >>>>>>>> >>>>>>>> Not sure what you mean here, could you clarify? >>>>>>>> >>>>>>>> bq. Please create JIRA if you haven't already. >>>>>>>> >>>>>>>> Done: https://issues.apache.org/jira/browse/KAFKA-7061 >>>>>>>> >>>>>>>> Cheers, >>>>>>>> Luís >>>>>>>> >>>>>>>>> On 11 Jun 2018, at 01:50, Ted Yu <yuzhih...@gmail.com> wrote: >>>>>>>>> >>>>>>>>> bq. When this configuration is set to anything other than >> "*offset*" >>>>> or >>>>>>> " >>>>>>>>> *timestamp*", then the record headers are scanned for a key >> matching >>>>>>> this >>>>>>>>> value. >>>>>>>>> >>>>>>>>> Can the value be determined now ? My thinking is that what if >> there >>>>> is a >>>>>>>>> third compaction strategy proposed in the future ? We should guard >>>>>>> against >>>>>>>>> user unknowingly choosing the 'future' strategy. >>>>>>>>> >>>>>>>>> bq. If this header is found >>>>>>>>> >>>>>>>>> Shouldn't the selection in header have higher precedence over the >>>>>>> configuration >>>>>>>>> ? >>>>>>>>> >>>>>>>>> Please create JIRA if you haven't already. >>>>>>>>> >>>>>>>>> Thanks >>>>>>>>> >>>>>>>>> On Sat, Jun 9, 2018 at 12:39 AM, Luís Cabral >>>>>>> <luis_cab...@yahoo.com.invalid> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> Hi all, >>>>>>>>>> >>>>>>>>>> Any takers on having a look at this KIP and voting on it? >>>>>>>>>> >>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP- >>>>>>>>>> 280%3A+Enhanced+log+compaction >>>>>>>>>> >>>>>>>>>> Cheers, >>>>>>>>>> Luis >>>>>>>>>> >>>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>> >>>>> >>>> >>> >>> >> >> >> >> -- >> -- Guozhang >> >> > >
signature.asc
Description: OpenPGP digital signature