Re: [DISCUSS] KIP-280: Enhanced log compaction

Matthias J. Sax Tue, 05 Nov 2019 23:43:15 -0800

Thanks for updating the KIP, Senthil.

@Eric: good point about using the last found header for the key instead
of the first!


I don't have any further comments at this point.


-Matthias

On 11/5/19 11:37 AM, Senthilnathan Muthusamy wrote:
> Hi Guozhang,
> 
> Sure and I have made a note in the JIRA item to make sure the wiki is updated.
> 
> Thanks,
> Senthil
> 
> -----Original Message-----
> From: Guozhang Wang <wangg...@gmail.com> 
> Sent: Monday, November 4, 2019 11:00 AM
> To: dev <dev@kafka.apache.org>
> Subject: Re: [DISCUSS] KIP-280: Enhanced log compaction
> 
> Hello Senthilnathan,
> 
> Thanks for revamping on the KIP. I have only one comment about the wiki 
> otherwise LGTM.
> 
> 1. We should emphasize that the newly introduced config yields to the 
> existing "log.cleanup.policy", i.e. if the latter's value is `delete` not 
> `compact`, then the previous config would be ignored.
> 
> 
> Guozhang
> 
> On Mon, Nov 4, 2019 at 9:52 AM Senthilnathan Muthusamy 
> <senth...@microsoft.com.invalid> wrote:
> 
>> Hi all,
>>
>> I will start the vote thread shortly for this updated KIP. If there 
>> are any more thoughts I would love to hear them.
>>
>> Thanks,
>> Senthil
>>
>> -----Original Message-----
>> From: Senthilnathan Muthusamy <senth...@microsoft.com.INVALID>
>> Sent: Thursday, October 31, 2019 3:51 AM
>> To: dev@kafka.apache.org
>> Subject: RE: [DISCUSS] KIP-280: Enhanced log compaction
>>
>> Hi Matthias
>>
>> Thanks for the response.
>>
>> (1) Yes
>>
>> (2) Yes, and the config name will be the same (i.e.
>> `log.cleaner.compaction.strategy` &
>> `log.cleaner.compaction.strategy.header`) at broker level and topic 
>> level (to override broker level default compact strategy). Please let 
>> me know if we need to keep it in different naming convention. Note: 
>> Broker level (which will be in the server.properties) configuration is 
>> optional and default it to offset. Topic level configuration will be 
>> default to broker level config...
>>
>> (3) By this new way, it avoids another config parameter and also in 
>> feature if any new strategy like header need addition info, no 
>> additional config required. As this got discussed already and agreed 
>> to have separate config, I will revert it. KIP updated...
>>
>> (4) Done
>>
>> (5) Updated
>>
>> (6) Updated to pick the first header in the list
>>
>> Please let me know if you have any other questions.
>>
>> Thanks,
>> Senthil
>>
>> -----Original Message-----
>> From: Matthias J. Sax <matth...@confluent.io>
>> Sent: Thursday, October 31, 2019 12:13 AM
>> To: dev@kafka.apache.org
>> Subject: Re: [DISCUSS] KIP-280: Enhanced log compaction
>>
>> Thanks for picking up this KIP, Senthil.
>>
>> (1) As far as I remember, the main issue of the original proposal was 
>> a missing topic level configuration for the compaction strategy. With 
>> this being addressed, I am in favor of this KIP.
>>
>> (2) With regard to (1), it seems we would need a new topic level 
>> config `compaction.strategy`, and `log.cleaner.compaction.strategy` 
>> would be the default strategy (ie, broker level config) if a topic does not 
>> overwrite it?
>>
>> (3) Why did you remove `log.cleaner.compaction.strategy.header`
>> parameter and change the accepted values of 
>> `log.cleaner.compaction.strategy` to "header.<key>" instead of keeping 
>> "header"? The original approach seems to be cleaner, and I think this 
>> was discussed on the original discuss thread already.
>>
>> (4) Nit: For the "timestamp" compaction strategy you changed the KIP 
>> to
>>
>> -> `The record [create] timestamp`
>>
>> This is miss leading IMHO, because it depends on the broker/log 
>> configuration `(log.)message.timestamp.type` that can either be 
>> `CreateTime` or `LogAppendTime` what the actual record timestamp is. I 
>> would just remove "create" to keep it unspecified.
>>
>> (5) Nit: the section "Public Interfaces" should list the newly 
>> introduced configs -- configuration parameters are a public interface.
>>
>> (6) What do you mean by "first level header lookup"? The term "first 
>> level" indicates some hierarchy, but headers don't have any hierarchy 
>> -- it's just a list of key-value pairs? If you mean the _order_ of the 
>> headers, ie, pick the first header in the list that matches the key, 
>> please rephrase it to make it clearer.
>>
>>
>>
>> @Tom: I agree with all you are saying, however, I still think that 
>> this KIP will improve the overall situation, because everything you 
>> pointed out is actually true with offset based compaction, too.
>>
>> The KIP is not a silver bullet that solves all issue for interleaved 
>> writes, but I personally believe, it's a good improvement.
>>
>>
>>
>> -Matthias
>>
>>
>> On 10/30/19 9:45 AM, Senthilnathan Muthusamy wrote:
>>> Hi,
>>>
>>> Please let me know if anyone has any questions on this updated KIP-280...
>>>
>>> Thanks,
>>>
>>> Senthil
>>>
>>> -----Original Message-----
>>> From: Senthilnathan Muthusamy <senth...@microsoft.com.INVALID>
>>> Sent: Monday, October 28, 2019 11:36 PM
>>> To: dev@kafka.apache.org
>>> Subject: RE: [DISCUSS] KIP-280: Enhanced log compaction
>>>
>>> Hi Tom,
>>>
>>> Sorry for the delayed response.
>>>
>>> Regarding the fall back to offset decision for both timestamp & 
>>> header
>> value is based on the previous author discuss
>> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Flist
>> s.apache.org%2Fthread.html%2Ff44317eb6cd34f91966654c80509d4a457dbbccdd
>> 02b86645782be67%40%253Cdev.kafka.apache.org%253E&amp;data=02%7C01%7Cse
>> nthilm%40microsoft.com%7C8f6cae776082459c793b08d761595294%7C72f988bf86
>> f141af91ab2d7cd011db47%7C1%7C0%7C637085022516423400&amp;sdata=WpEW5ylu
>> %2FsLMyGS2ULWDZ7vA1OzQwFYWSuioLCbABhM%3D&amp;reserved=0
>> and as per the discussion, it is really required to avoid duplicates.
>>>
>>> And the timestamp strategy is from the original KIP author and we 
>>> are
>> keeping it as is.
>>>
>>> Finally on the sequence order guarantee by the producer, it is not
>> feasible on waiting for ack in async / multi-threads/processes 
>> scenarios and hence the header sequence based compact strategy with 
>> producer's responsibility to have a unique sequence generation for the 
>> topic-partition-key level.
>>>
>>> Hoping this clarifies all your questions. Please let us know if you 
>>> have
>> any further questions.
>>>
>>> @Guozhang Wang / @Matthias J. Sax, I see you both had a detail
>> discussion on the original KIP with previous author and it would great 
>> to hear your inputs as well.
>>>
>>> Thanks,
>>> Senthil
>>>
>>> -----Original Message-----
>>> From: Tom Bentley <tbent...@redhat.com>
>>> Sent: Tuesday, October 22, 2019 2:32 AM
>>> To: dev@kafka.apache.org
>>> Subject: Re: [DISCUSS] KIP-280: Enhanced log compaction
>>>
>>> Hi Senthilnathan,
>>>
>>> In the motivation isn't it a little misleading to say "On the 
>>> producer side, we clearly preserve an order for the two messages, 
>>> <K1, V1> <K1,
>>> V2>"? IMHO, the semantics of the producer are clear that having an 
>>> V2>observed
>>> order of sending records from different producers is not sufficient 
>>> to
>> guarantee ordering on the broker. You really need to send the 2nd 
>> record only after the 1st record is acked. It's the difficultly of 
>> achieving that in practice that's the true motivation for your KIP.
>>>
>>> I can see the attraction of using timestamps, but it would be 
>>> helpful to
>> explain how that really solves the problem. When the producers are in 
>> different processes on different machines you're relying on their 
>> clocks being synchronized, which is a whole subject in itself. Even if 
>> they're synchronized the resolution of System.currentTimeMillis() is 
>> typically many milliseconds. If your producers are in different 
>> threads of the same process that could be a real problem because it makes 
>> ties quite likely.
>>> And you don't explain why it's OK to resolve ties using the offset. 
>>> The
>> basis of your argument is that the offset is giving you the wrong answer.
>>> So it seems to me that using it as a tiebreaker is just narrowing 
>>> the
>> chances of getting the wrong answer. Maybe none of this matters for 
>> your use case, but I think it should be spelled out in the KIP, 
>> because it surely would matter for similar use cases.
>>>
>>> Using a sequence at least removes the problem of ties, but the
>> interesting bit is now in how you deal with races between 
>> threads/processes in getting a sequence number allocated (which is out 
>> of scope of the KIP, I guess).
>>> How is resolving that race any simpler that resolving the motivating
>> race by waiting for the ack of the first record sent?
>>>
>>> Kind regards,
>>>
>>> Tom
>>>
>>> On Mon, Oct 21, 2019 at 9:06 PM Senthilnathan Muthusamy <
>> senth...@microsoft.com.invalid> wrote:
>>>
>>>> Hi All,
>>>>
>>>> We are bring back the KIP-280 to live with small correct for the 
>>>> discussion & voting. Thanks to previous author Luis Cabral on the
>>>> KIP-280 initiation and we are taking over to complete and get it 
>>>> into
>> 2.4...
>>>>
>>>> Below is the correction that we made to the existing KIP-280:
>>>>
>>>>   *   Allowing the compact strategy configuration at the topic level as
>>>> the log compaction is at the topic level and a broker can have 
>>>> multiple topics. This allows the flexibility to have the strategy 
>>>> at both broker level (i.e. for all topics within the broker) and 
>>>> topic level (i.e. for a subset of topics within a broker) as well...
>>>>
>>>> KIP-280:
>>>> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fc
>>>> wi
>>>> k
>>>> i.apache.org%2Fconfluence%2Fdisplay%2FKAFKA%2FKIP-280%253A%2BEnhanc
>>>> ed
>>>> %
>>>> 2Blog%2Bcompaction&amp;data=02%7C01%7Csenthilm%40microsoft.com%7C68
>>>> 6c
>>>> 3
>>>> 2fa4a554d61ae1408d756d409f6%7C72f988bf86f141af91ab2d7cd011db47%7C1%
>>>> 7C
>>>> 0
>>>> %7C637073341017520406&amp;sdata=KrRem2KWCBscHX963Ah8wZ%2Fj9dkhCeAa7
>>>> Gs
>>>> 6
>>>> XqJ%2F5SQ%3D&amp;reserved=0 PULL REQUEST:
>>>> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fg
>>>> it
>>>> h
>>>> ub.com%2Fapache%2Fkafka%2Fpull%2F7528&amp;data=02%7C01%7Csenthilm%4
>>>> 0m
>>>> i
>>>> crosoft.com%7C686c32fa4a554d61ae1408d756d409f6%7C72f988bf86f141af91
>>>> ab
>>>> 2
>>>> d7cd011db47%7C1%7C0%7C637073341017520406&amp;sdata=bt32PgDUjJjpXohE
>>>> Wp
>>>> t
>>>> Fxv6mPERCwcRFlVROzinBtnk%3D&amp;reserved=0 (unit test coverage in
>>>> progress)
>>>>
>>>> Previous Thread DISCUSS:
>>>> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fl
>>>> is
>>>> t
>>>> s.apache.org%2Fthread.html%2F79aa6e50d7c737ddf83455dd8063692a535a1a
>>>> fa
>>>> 5
>>>> 58620fe1a1496d3%40%253Cdev.kafka.apache.org%253E&amp;data=02%7C01%7
>>>> Cs
>>>> e
>>>> nthilm%40microsoft.com%7C686c32fa4a554d61ae1408d756d409f6%7C72f988b
>>>> f8
>>>> 6
>>>> f141af91ab2d7cd011db47%7C1%7C0%7C637073341017520406&amp;sdata=XwcUW
>>>> WY
>>>> D
>>>> PV1nA%2BbkDGLFNlXZ5bysVblWUTDQEzAaKxM%3D&amp;reserved=0
>>>> Previous Thread VOTE:
>>>> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fl
>>>> is
>>>> t
>>>> s.apache.org%2Fthread.html%2Fb2ecd73ce849741f0c40b4f801c3f765058349
>>>> 78
>>>> 1
>>>> 2713e240e1ac2b7%40%253Cdev.kafka.apache.org%253E&amp;data=02%7C01%7
>>>> Cs
>>>> e
>>>> nthilm%40microsoft.com%7C686c32fa4a554d61ae1408d756d409f6%7C72f988b
>>>> f8
>>>> 6
>>>> f141af91ab2d7cd011db47%7C1%7C0%7C637073341017520406&amp;sdata=8cKQc
>>>> Am
>>>> 2
>>>> DDVGVLTKtciYKGMiI%2FgOADW6tam9nem4lsg%3D&amp;reserved=0
>>>>
>>>> Appreciate your timely action.
>>>>
>>>> PS: Initiating a separate thread as I was not able to reply to the 
>>>> existing threads...
>>>>
>>>> Thanks,
>>>> Senthil
>>>>
>>
>>
> 
> --
> -- Guozhang
>

signature.asc
Description: OpenPGP digital signature

Re: [DISCUSS] KIP-280: Enhanced log compaction

Reply via email to