+1 from me as well. On Mon, Jun 25, 2018 at 8:16 AM, Matthias J. Sax <matth...@confluent.io> wrote:
> +1 from my side for using `compaction.strategy` with values "offset", > "timestamp" and "header" and `compaction.strategy.header` > > -Matthias > > On 6/25/18 1:25 AM, Luís Cabral wrote: > > Hi, > > > > So, is everyone OK using the approach with 2 properties? > > > > E.g.: > > > > Scenario 1: > > compaction.strategy: offset > > > > :- Behaviour is the same as what currently exists, where the > compaction is done only via the 'offset' > > > > > > Scenario 2: > > compaction.strategy: timestamp > > > > :- Similar to 'offset', but the record timestamp is used instead > > > > > > Scenario 3: > > compaction.strategy: header > > > > compaction.strategy.header: xyz > > > > :- Searches the headers for 'xyz' key when performing the > compaction. Defaults to 'offset' strategy if this header does not exist > (special note on the '.header' suffix, as this would allow additional > strategies to add whatever extra configuration they need). > > > > Scenario 4 (hypothetical future): > > compaction.strategy: foo > > > > compaction.strategy.foo.name: bar > > compaction.strategy.foo.order: DESC > > compaction.strategy.foo.fallback: timestamp > > > > > > :- This one is just to show what I meant with the '.header' suffix > mentioned in {Scenario 3} > > > > > > > > Regards, > > Luís > > > > > > On Monday, June 18, 2018, 11:56:51 PM GMT+2, Guozhang Wang < > wangg...@gmail.com> wrote: > > > > Hi Matthias, > > > > Yes, we are effectively assigning the the whole space of Strings minus > > current preserved ones as header keys; honestly I think in practice users > > wanting to use `_something_` would be very rare, but I admit it may still > > be possible in theory. > > > > I think Luis' point about "header=<header-value>" is that having a > expression > > evaluation as the config value is a bit weird, and thinking about it > twice > > it is still not flawless: we can still argue that we are effectively > > assigning the whole sub-space of "header=*" of Strings for headers, and > > what if users want to use preserved value falling into that sub-space > > (again, should not really happen in practice, just being paranoid here). > > > > It seems that two configs are the common choice that everyone is happy > with. > > > > Guozhang > > > > > > On Mon, Jun 18, 2018 at 2:35 PM, Matthias J. Sax <matth...@confluent.io> > > wrote: > > > >> Luis, > >> > >> I meant to update the "Rejected Alternative" sections, what you have > >> done already. Thx. > >> > >> Originally, I also had the idea about a second config, but thought it > >> might be easier to just change the allowed values to be `offset`, > >> `timestamp`, `header=<key>`. (We try to keep the number of configs small > >> if possible, as more configs are more confusing to users.) > >> > >> I don't think that using `_offset_`, `_timestamp_` and `<key>` solves > >> the problem because users still might use `_something_` as header key -- > >> and if we want to introduce a new compaction strategy "something" later > >> we face the same issues as without the underscores. We only reduce the > >> likelihood that it happens. > >> > >> Using `header=` as prefix or introducing a second config, that is only > >> effective if the strategy is set to `header` seems to be a cleaner > >> solution. > >> > >> @Luis: why do you think that using `header=<key>` is an "incorrect > >> approach"? > >> > >>> Though I would still prefer to keep it as it is, as its a much simple> > >> and cleaner approach – I’m not so sure that a potential client would > >>> really be so inconvenienced for having to use “_offset” or > >>> “_timestamp_” as a header > >> > >> I don't think that it's about the issue that people cannot use > >> `_offset_` or `_timestamp_` in their header (by "use" I mean for > >> compaction). With the current KIP, they cannot use `offset` or > >> `timestamp` either. The issue is, that we cannot introduce a new system > >> supported compaction strategy in the future without potentially breaking > >> something, as we basically assign the whole space of Strings (minus > >> `offset`, `timestamp`) as valid configs to enable header based > compaction. > >> > >> Personally, I prefer either adding a config or going with > >> `header=<key>`. Using `_timestamp_`, `_offset_`, and `<key>` might be > >> good enough (even if this is the solution I like least)---for this case, > >> we should state explicitly, that the whole space of `_*_` is reserved > >> and users are not allowed to set those for header compaction. In fact, I > >> would also add a check for the config that only allows for `_offset_` > >> and `_timestamp_` and throws an exception for all other `_*_` configs. > >> > >> > >> -Matthias > >> > >> > >> On 6/18/18 2:03 PM, Luís Cabral wrote: > >>> I’m ok with that... > >>> > >>> Ted / Matthias? > >>> > >>> > >>> From: Guozhang Wang > >>> Sent: 18 June 2018 22:49 > >>> To: dev@kafka.apache.org > >>> Subject: Re: [VOTE] KIP-280: Enhanced log compaction > >>> > >>> How about make the preserved values to be "_offset_" and "_timestamp_" > >>> then? Currently in the KIP they are reserved as "offset" and > "timestamp". > >>> > >>> > >>> Guozhang > >>> > >>> On Mon, Jun 18, 2018 at 1:40 PM, Luís Cabral > >> <luis_cab...@yahoo.com.invalid> > >>> wrote: > >>> > >>>> Hi Guozhang, > >>>> > >>>> Yes, that is what I meant (separate configs). > >>>> Though I would still prefer to keep it as it is, as its a much simpler > >> and > >>>> cleaner approach – I’m not so sure that a potential client would > really > >> be > >>>> so inconvenienced for having to use “_offset” or “_timestamp_” as a > >> header > >>>> > >>>> Cheers, > >>>> Luís > >>>> > >>>> > >>>> From: Guozhang Wang > >>>> Sent: 18 June 2018 19:35 > >>>> To: dev@kafka.apache.org > >>>> Subject: Re: [VOTE] KIP-280: Enhanced log compaction > >>>> > >>>> Hello Luís, > >>>> > >>>> I agree that having an expression evaluation as a config value is not > >> the > >>>> best approach; if there are better ideas to allow users to specify the > >>>> header key which happen to be the same as the preserved config values > >>>> "offset" and "timestamp" (although the likelihood may be small, as Ted > >>>> mentioned there may be more preserved config values added in the > >> future), > >>>> then I'd be happily follow the suggestions. For example, we could have > >> the > >>>> config value for header keys as "header-<key-name>"? Is that what > you've > >>>> suggested? Or do you suggest using two configs instead, and the second > >>>> config specifying the key name, and will only be considered if the > first > >>>> (i.e. current proposed) config's value is `header`, otherwise be > >> ignored? > >>>> > >>>> > >>>> Guozhang > >>>> > >>>> > >>>> On Mon, Jun 18, 2018 at 12:20 AM, Luís Cabral > >>>> <luis_cab...@yahoo.com.invalid > >>>>> wrote: > >>>> > >>>>> Hi Ted / Guozhang / Matthias, > >>>>> > >>>>> @Ted: I've now added your argument to the "Rejected Alternatives" > >> portion > >>>>> of the KIP. Please keep in mind that I would like to keep this as > >>>> backwards > >>>>> compatible as possible, so a lot of decisions are inferred from that > >>>> intent. > >>>>> > >>>>> @Guozhang: IMHO, adding expression evaluation to configuration is an > >>>>> incorrect approach. If you absolutely insist on having this clear > >>>>> distinction between header/key, then I would suggest instead to have > a > >>>>> dedicated property for the "key" part. Of course, this is your > project > >> so > >>>>> I'll just continue whatever approach moves this KIP forward... > >>>>> > >>>>> @Matthias: Sorry, but update the KIP according to what? > >>>>> > >>>>> Cheers, > >>>>> Luís > >>>>> > >>>>> On Monday, June 18, 2018, 2:55:17 AM GMT+2, Matthias J. Sax < > >>>>> matth...@confluent.io> wrote: > >>>>> > >>>>> Well, for "offset" and "timestamp" policy, not communication > between > >>>>> both is required. > >>>>> > >>>>> Only if headers are used, user A should communicate the corresponding > >>>>> header key to user B. > >>>>> > >>>>> > >>>>> @Luis: can you update the KIP accordingly? > >>>>> > >>>>> > >>>>> > >>>>> -Matthias > >>>>> > >>>>> On 6/17/18 5:36 PM, Ted Yu wrote: > >>>>>> My previous reply was just an alternative for consideration. > >>>>>> > >>>>>> bq. than a second user B can add a header with key "offset" and > thus > >>>>> break > >>>>>> the intention of user A > >>>>>> > >>>>>> I didn't see such scenario after reading the KIP. Maybe add this as > >>>>>> reasoning for the current approach ? > >>>>>> > >>>>>> I wonder how user B gets to know the intention of user A. Meaning, > if > >>>>> user > >>>>>> B doesn't follow the norm set by user A, there still would be issue, > >>>>> right ? > >>>>>> > >>>>>> > >>>>>> On Sun, Jun 17, 2018 at 4:58 PM, Matthias J. Sax < > >>>> matth...@confluent.io> > >>>>>> wrote: > >>>>>> > >>>>>>> Let me rephrase your answer to make sure I understand what you > >>>> suggest: > >>>>>>> > >>>>>>> If compaction strategy is configured to use "offset", and if there > is > >>>> a > >>>>>>> header in the record with `key == offset`, than we should use the > >>>> value > >>>>>>> of the record header instead of the actual record offset? > >>>>>>> > >>>>>>> Do I understand this correctly? If yes, what is the advantage of > >> doing > >>>>>>> this? From my point of view, it might be problematic, because if > user > >>>> A > >>>>>>> creates a topic and configures "offset" compaction (with the intend > >>>> that > >>>>>>> the record offset should be uses), than a second user B can add a > >>>> header > >>>>>>> with key "offset" and thus break the intention of user A. > >>>>>>> > >>>>>>> Also, if existing topics might have data with record header key > >>>>>>> "offset", the change would not be backward compatible either. > >>>>>>> > >>>>>>> > >>>>>>> -Matthias > >>>>>>> > >>>>>>> On 6/16/18 6:59 PM, Ted Yu wrote: > >>>>>>>> Pardon the brevity in my previous reply. > >>>>>>>> I was talking about this bullet: > >>>>>>>> > >>>>>>>> bq. When this configuration is set to anything other than > "*offset*" > >>>>> or " > >>>>>>>> *timestamp*", then the record headers are scanned for a key > matching > >>>>> this > >>>>>>>> value. > >>>>>>>> > >>>>>>>> My point is that if matching key in the header is found, its value > >>>>> should > >>>>>>>> take precedence over the value of the configuration. > >>>>>>>> I understand that such interpretation may have slight performance > >>>> cost. > >>>>>>>> > >>>>>>>> Cheers > >>>>>>>> > >>>>>>>> On Sat, Jun 16, 2018 at 6:29 PM, Matthias J. Sax < > >>>>> matth...@confluent.io> > >>>>>>>> wrote: > >>>>>>>> > >>>>>>>>> Ted, > >>>>>>>>> > >>>>>>>>> I am also not sure what you mean by "Shouldn't the selection in > >>>> header > >>>>>>>>> have higher precedence over the configuration"? What selection do > >>>> you > >>>>>>>>> mean? And want configuration? > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> About the first point, I think this is actually a valid concern: > To > >>>>>>>>> address this issue, it seems that we would need to change the > >>>> accepted > >>>>>>>>> format of the config. Instead of "offset", "timestamp", > >>>>> "<header-key>", > >>>>>>>>> we could replace the last one with "header=<header-key>". > >>>>>>>>> > >>>>>>>>> WDYT? > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> -Matthias > >>>>>>>>> > >>>>>>>>> On 6/15/18 3:06 AM, Ted Yu wrote: > >>>>>>>>>> If selection exists in header, the selection should override the > >>>>> config > >>>>>>>>> value. > >>>>>>>>>> Cheers > >>>>>>>>>> -------- Original message --------From: Luis Cabral > >>>>>>>>> <luis_cab...@yahoo.com.INVALID> Date: 6/15/18 1:40 AM > >> (GMT-08:00) > >>>>> To: > >>>>>>>>> dev@kafka.apache.org Subject: Re: [VOTE] KIP-280: Enhanced log > >>>>>>> compaction > >>>>>>>>>> Hi, > >>>>>>>>>> > >>>>>>>>>> bq. Can the value be determined now ? My thinking is that what > if > >>>>> there > >>>>>>>>> is a third compaction strategy proposed in the future ? We should > >>>>> guard > >>>>>>>>> against user unknowingly choosing the 'future' strategy. > >>>>>>>>>> > >>>>>>>>>> The idea is that the header name to use is flexible, which > >> protects > >>>>>>>>> current clients that may want to use this from having to adapt > >> their > >>>>>>>>> already existing header names (they can just specify a new name). > >>>>>>>>>> > >>>>>>>>>> bq. Shouldn't the selection in header have higher precedence > over > >>>> the > >>>>>>>>> configuration ? > >>>>>>>>>> > >>>>>>>>>> Not sure what you mean here, could you clarify? > >>>>>>>>>> > >>>>>>>>>> bq. Please create JIRA if you haven't already. > >>>>>>>>>> > >>>>>>>>>> Done: https://issues.apache.org/jira/browse/KAFKA-7061 > >>>>>>>>>> > >>>>>>>>>> Cheers, > >>>>>>>>>> Luís > >>>>>>>>>> > >>>>>>>>>>> On 11 Jun 2018, at 01:50, Ted Yu <yuzhih...@gmail.com> wrote: > >>>>>>>>>>> > >>>>>>>>>>> bq. When this configuration is set to anything other than > >>>> "*offset*" > >>>>>>> or > >>>>>>>>> " > >>>>>>>>>>> *timestamp*", then the record headers are scanned for a key > >>>> matching > >>>>>>>>> this > >>>>>>>>>>> value. > >>>>>>>>>>> > >>>>>>>>>>> Can the value be determined now ? My thinking is that what if > >>>> there > >>>>>>> is a > >>>>>>>>>>> third compaction strategy proposed in the future ? We should > >> guard > >>>>>>>>> against > >>>>>>>>>>> user unknowingly choosing the 'future' strategy. > >>>>>>>>>>> > >>>>>>>>>>> bq. If this header is found > >>>>>>>>>>> > >>>>>>>>>>> Shouldn't the selection in header have higher precedence over > the > >>>>>>>>> configuration > >>>>>>>>>>> ? > >>>>>>>>>>> > >>>>>>>>>>> Please create JIRA if you haven't already. > >>>>>>>>>>> > >>>>>>>>>>> Thanks > >>>>>>>>>>> > >>>>>>>>>>> On Sat, Jun 9, 2018 at 12:39 AM, Luís Cabral > >>>>>>>>> <luis_cab...@yahoo.com.invalid> > >>>>>>>>>>> wrote: > >>>>>>>>>>> > >>>>>>>>>>>> Hi all, > >>>>>>>>>>>> > >>>>>>>>>>>> Any takers on having a look at this KIP and voting on it? > >>>>>>>>>>>> > >>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP- > >>>>>>>>>>>> 280%3A+Enhanced+log+compaction > >>>>>>>>>>>> > >>>>>>>>>>>> Cheers, > >>>>>>>>>>>> Luis > >>>>>>>>>>>> > >>>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>> > >>>>>>> > >>>>>>> > >>>>>> > >>>>> > >>>>> > >>>> > >>>> > >>>> > >>>> -- > >>>> -- Guozhang > >>>> > >>>> > >>> > >>> > >> > >> > > > > > > -- -- Guozhang