Luis,

I meant to update the "Rejected Alternative" sections, what you have
done already. Thx.

Originally, I also had the idea about a second config, but thought it
might be easier to just change the allowed values to be `offset`,
`timestamp`, `header=<key>`. (We try to keep the number of configs small
if possible, as more configs are more confusing to users.)

I don't think that using `_offset_`, `_timestamp_` and `<key>` solves
the problem because users still might use `_something_` as header key --
and if we want to introduce a new compaction strategy "something" later
we face the same issues as without the underscores. We only reduce the
likelihood that it happens.

Using `header=` as prefix or introducing a second config, that is only
effective if the strategy is set to `header` seems to be a cleaner solution.

@Luis: why do you think that using `header=<key>` is an "incorrect
approach"?

> Though I would still prefer to keep it as it is, as its a much simple> and 
> cleaner approach – I’m not so sure that a potential client would
> really be so inconvenienced for having to use “_offset” or
> “_timestamp_” as a header

I don't think that it's about the issue that people cannot use
`_offset_` or `_timestamp_` in their header (by "use" I mean for
compaction). With the current KIP, they cannot use `offset` or
`timestamp` either. The issue is, that we cannot introduce a new system
supported compaction strategy in the future without potentially breaking
something, as we basically assign the whole space of Strings (minus
`offset`, `timestamp`) as valid configs to enable header based compaction.

Personally, I prefer either adding a config or going with
`header=<key>`. Using `_timestamp_`, `_offset_`, and `<key>` might be
good enough (even if this is the solution I like least)---for this case,
we should state explicitly, that the whole space of `_*_` is reserved
and users are not allowed to set those for header compaction. In fact, I
would also add a check for the config that only allows for `_offset_`
and `_timestamp_` and throws an exception for all other `_*_` configs.


-Matthias


On 6/18/18 2:03 PM, Luís Cabral wrote:
> I’m ok with that...
> 
> Ted / Matthias?
> 
> 
> From: Guozhang Wang
> Sent: 18 June 2018 22:49
> To: dev@kafka.apache.org
> Subject: Re: [VOTE] KIP-280: Enhanced log compaction
> 
> How about make the preserved values to be "_offset_" and "_timestamp_"
> then? Currently in the KIP they are reserved as "offset" and "timestamp".
> 
> 
> Guozhang
> 
> On Mon, Jun 18, 2018 at 1:40 PM, Luís Cabral <luis_cab...@yahoo.com.invalid>
> wrote:
> 
>> Hi Guozhang,
>>
>> Yes, that is what I meant (separate configs).
>> Though I would still prefer to keep it as it is, as its a much simpler and
>> cleaner approach – I’m not so sure that a potential client would really be
>> so inconvenienced for having to use “_offset” or “_timestamp_” as a header
>>
>> Cheers,
>> Luís
>>
>>
>> From: Guozhang Wang
>> Sent: 18 June 2018 19:35
>> To: dev@kafka.apache.org
>> Subject: Re: [VOTE] KIP-280: Enhanced log compaction
>>
>> Hello Luís,
>>
>> I agree that having an expression evaluation as a config value is not the
>> best approach; if there are better ideas to allow users to specify the
>> header key which happen to be the same as the preserved config values
>> "offset" and "timestamp" (although the likelihood may be small, as Ted
>> mentioned there may be more preserved config values added in the future),
>> then I'd be happily follow the suggestions. For example, we could have the
>> config value for header keys as "header-<key-name>"? Is that what you've
>> suggested? Or do you suggest using two configs instead, and the second
>> config specifying the key name, and will only be considered if the first
>> (i.e. current proposed) config's value is `header`, otherwise be ignored?
>>
>>
>> Guozhang
>>
>>
>> On Mon, Jun 18, 2018 at 12:20 AM, Luís Cabral
>> <luis_cab...@yahoo.com.invalid
>>> wrote:
>>
>>>  Hi Ted / Guozhang / Matthias,
>>>
>>> @Ted: I've now added your argument to the "Rejected Alternatives" portion
>>> of the KIP. Please keep in mind that I would like to keep this as
>> backwards
>>> compatible as possible, so a lot of decisions are inferred from that
>> intent.
>>>
>>> @Guozhang: IMHO, adding expression evaluation to configuration is an
>>> incorrect approach. If you absolutely insist on having this clear
>>> distinction between header/key, then I would suggest instead to have a
>>> dedicated property for the "key" part. Of course, this is your project so
>>> I'll just continue whatever approach moves this KIP forward...
>>>
>>> @Matthias: Sorry, but update the KIP according to what?
>>>
>>> Cheers,
>>> Luís
>>>
>>>     On Monday, June 18, 2018, 2:55:17 AM GMT+2, Matthias J. Sax <
>>> matth...@confluent.io> wrote:
>>>
>>>  Well, for "offset" and "timestamp" policy, not communication between
>>> both is required.
>>>
>>> Only if headers are used, user A should communicate the corresponding
>>> header key to user B.
>>>
>>>
>>> @Luis: can you update the KIP accordingly?
>>>
>>>
>>>
>>> -Matthias
>>>
>>> On 6/17/18 5:36 PM, Ted Yu wrote:
>>>> My previous reply was just an alternative for consideration.
>>>>
>>>> bq.  than a second user B can add a header with key "offset" and thus
>>> break
>>>> the intention of user A
>>>>
>>>> I didn't see such scenario after reading the KIP. Maybe add this as
>>>> reasoning for the current approach ?
>>>>
>>>> I wonder how user B gets to know the intention of user A. Meaning, if
>>> user
>>>> B doesn't follow the norm set by user A, there still would be issue,
>>> right ?
>>>>
>>>>
>>>> On Sun, Jun 17, 2018 at 4:58 PM, Matthias J. Sax <
>> matth...@confluent.io>
>>>> wrote:
>>>>
>>>>> Let me rephrase your answer to make sure I understand what you
>> suggest:
>>>>>
>>>>> If compaction strategy is configured to use "offset", and if there is
>> a
>>>>> header in the record with `key == offset`, than we should use the
>> value
>>>>> of the record header instead of the actual record offset?
>>>>>
>>>>> Do I understand this correctly? If yes, what is the advantage of doing
>>>>> this? From my point of view, it might be problematic, because if user
>> A
>>>>> creates a topic and configures "offset" compaction (with the intend
>> that
>>>>> the record offset should be uses), than a second user B can add a
>> header
>>>>> with key "offset" and thus break the intention of user A.
>>>>>
>>>>> Also, if existing topics might have data with record header key
>>>>> "offset", the change would not be backward compatible either.
>>>>>
>>>>>
>>>>> -Matthias
>>>>>
>>>>> On 6/16/18 6:59 PM, Ted Yu wrote:
>>>>>> Pardon the brevity in my previous reply.
>>>>>> I was talking about this bullet:
>>>>>>
>>>>>> bq. When this configuration is set to anything other than "*offset*"
>>> or "
>>>>>> *timestamp*", then the record headers are scanned for a key matching
>>> this
>>>>>> value.
>>>>>>
>>>>>> My point is that if matching key in the header is found, its value
>>> should
>>>>>> take precedence over the value of the configuration.
>>>>>> I understand that such interpretation may have slight performance
>> cost.
>>>>>>
>>>>>> Cheers
>>>>>>
>>>>>> On Sat, Jun 16, 2018 at 6:29 PM, Matthias J. Sax <
>>> matth...@confluent.io>
>>>>>> wrote:
>>>>>>
>>>>>>> Ted,
>>>>>>>
>>>>>>> I am also not sure what you mean by "Shouldn't the selection in
>> header
>>>>>>> have higher precedence over the configuration"? What selection do
>> you
>>>>>>> mean? And want configuration?
>>>>>>>
>>>>>>>
>>>>>>> About the first point, I think this is actually a valid concern: To
>>>>>>> address this issue, it seems that we would need to change the
>> accepted
>>>>>>> format of the config. Instead of "offset", "timestamp",
>>> "<header-key>",
>>>>>>> we could replace the last one with "header=<header-key>".
>>>>>>>
>>>>>>> WDYT?
>>>>>>>
>>>>>>>
>>>>>>> -Matthias
>>>>>>>
>>>>>>> On 6/15/18 3:06 AM, Ted Yu wrote:
>>>>>>>> If selection exists in header, the selection should override the
>>> config
>>>>>>> value.
>>>>>>>> Cheers
>>>>>>>> -------- Original message --------From: Luis Cabral
>>>>>>> <luis_cab...@yahoo.com.INVALID> Date: 6/15/18  1:40 AM  (GMT-08:00)
>>> To:
>>>>>>> dev@kafka.apache.org Subject: Re: [VOTE] KIP-280: Enhanced log
>>>>> compaction
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> bq. Can the value be determined now ? My thinking is that what if
>>> there
>>>>>>> is a third compaction strategy proposed in the future ? We should
>>> guard
>>>>>>> against user unknowingly choosing the 'future' strategy.
>>>>>>>>
>>>>>>>> The idea is that the header name to use is flexible, which protects
>>>>>>> current clients that may want to use this from having to adapt their
>>>>>>> already existing header names (they can just specify a new name).
>>>>>>>>
>>>>>>>> bq. Shouldn't the selection in header have higher precedence over
>> the
>>>>>>> configuration ?
>>>>>>>>
>>>>>>>> Not sure what you mean here, could you clarify?
>>>>>>>>
>>>>>>>> bq. Please create JIRA if you haven't already.
>>>>>>>>
>>>>>>>> Done: https://issues.apache.org/jira/browse/KAFKA-7061
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Luís
>>>>>>>>
>>>>>>>>> On 11 Jun 2018, at 01:50, Ted Yu <yuzhih...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> bq. When this configuration is set to anything other than
>> "*offset*"
>>>>> or
>>>>>>> "
>>>>>>>>> *timestamp*", then the record headers are scanned for a key
>> matching
>>>>>>> this
>>>>>>>>> value.
>>>>>>>>>
>>>>>>>>> Can the value be determined now ? My thinking is that what if
>> there
>>>>> is a
>>>>>>>>> third compaction strategy proposed in the future ? We should guard
>>>>>>> against
>>>>>>>>> user unknowingly choosing the 'future' strategy.
>>>>>>>>>
>>>>>>>>> bq. If this header is found
>>>>>>>>>
>>>>>>>>> Shouldn't the selection in header have higher precedence over the
>>>>>>> configuration
>>>>>>>>> ?
>>>>>>>>>
>>>>>>>>> Please create JIRA if you haven't already.
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>>
>>>>>>>>> On Sat, Jun 9, 2018 at 12:39 AM, Luís Cabral
>>>>>>> <luis_cab...@yahoo.com.invalid>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi all,
>>>>>>>>>>
>>>>>>>>>> Any takers on having a look at this KIP and voting on it?
>>>>>>>>>>
>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>>>>>>>>>> 280%3A+Enhanced+log+compaction
>>>>>>>>>>
>>>>>>>>>> Cheers,
>>>>>>>>>> Luis
>>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>
>>
>>
>> --
>> -- Guozhang
>>
>>
> 
> 

Attachment: signature.asc
Description: OpenPGP digital signature

Reply via email to