Thanks for updating the KIP, Senthil. @Eric: good point about using the last found header for the key instead of the first!
I don't have any further comments at this point. -Matthias On 11/5/19 11:37 AM, Senthilnathan Muthusamy wrote: > Hi Guozhang, > > Sure and I have made a note in the JIRA item to make sure the wiki is updated. > > Thanks, > Senthil > > -----Original Message----- > From: Guozhang Wang <wangg...@gmail.com> > Sent: Monday, November 4, 2019 11:00 AM > To: dev <dev@kafka.apache.org> > Subject: Re: [DISCUSS] KIP-280: Enhanced log compaction > > Hello Senthilnathan, > > Thanks for revamping on the KIP. I have only one comment about the wiki > otherwise LGTM. > > 1. We should emphasize that the newly introduced config yields to the > existing "log.cleanup.policy", i.e. if the latter's value is `delete` not > `compact`, then the previous config would be ignored. > > > Guozhang > > On Mon, Nov 4, 2019 at 9:52 AM Senthilnathan Muthusamy > <senth...@microsoft.com.invalid> wrote: > >> Hi all, >> >> I will start the vote thread shortly for this updated KIP. If there >> are any more thoughts I would love to hear them. >> >> Thanks, >> Senthil >> >> -----Original Message----- >> From: Senthilnathan Muthusamy <senth...@microsoft.com.INVALID> >> Sent: Thursday, October 31, 2019 3:51 AM >> To: dev@kafka.apache.org >> Subject: RE: [DISCUSS] KIP-280: Enhanced log compaction >> >> Hi Matthias >> >> Thanks for the response. >> >> (1) Yes >> >> (2) Yes, and the config name will be the same (i.e. >> `log.cleaner.compaction.strategy` & >> `log.cleaner.compaction.strategy.header`) at broker level and topic >> level (to override broker level default compact strategy). Please let >> me know if we need to keep it in different naming convention. Note: >> Broker level (which will be in the server.properties) configuration is >> optional and default it to offset. Topic level configuration will be >> default to broker level config... >> >> (3) By this new way, it avoids another config parameter and also in >> feature if any new strategy like header need addition info, no >> additional config required. As this got discussed already and agreed >> to have separate config, I will revert it. KIP updated... >> >> (4) Done >> >> (5) Updated >> >> (6) Updated to pick the first header in the list >> >> Please let me know if you have any other questions. >> >> Thanks, >> Senthil >> >> -----Original Message----- >> From: Matthias J. Sax <matth...@confluent.io> >> Sent: Thursday, October 31, 2019 12:13 AM >> To: dev@kafka.apache.org >> Subject: Re: [DISCUSS] KIP-280: Enhanced log compaction >> >> Thanks for picking up this KIP, Senthil. >> >> (1) As far as I remember, the main issue of the original proposal was >> a missing topic level configuration for the compaction strategy. With >> this being addressed, I am in favor of this KIP. >> >> (2) With regard to (1), it seems we would need a new topic level >> config `compaction.strategy`, and `log.cleaner.compaction.strategy` >> would be the default strategy (ie, broker level config) if a topic does not >> overwrite it? >> >> (3) Why did you remove `log.cleaner.compaction.strategy.header` >> parameter and change the accepted values of >> `log.cleaner.compaction.strategy` to "header.<key>" instead of keeping >> "header"? The original approach seems to be cleaner, and I think this >> was discussed on the original discuss thread already. >> >> (4) Nit: For the "timestamp" compaction strategy you changed the KIP >> to >> >> -> `The record [create] timestamp` >> >> This is miss leading IMHO, because it depends on the broker/log >> configuration `(log.)message.timestamp.type` that can either be >> `CreateTime` or `LogAppendTime` what the actual record timestamp is. I >> would just remove "create" to keep it unspecified. >> >> (5) Nit: the section "Public Interfaces" should list the newly >> introduced configs -- configuration parameters are a public interface. >> >> (6) What do you mean by "first level header lookup"? The term "first >> level" indicates some hierarchy, but headers don't have any hierarchy >> -- it's just a list of key-value pairs? If you mean the _order_ of the >> headers, ie, pick the first header in the list that matches the key, >> please rephrase it to make it clearer. >> >> >> >> @Tom: I agree with all you are saying, however, I still think that >> this KIP will improve the overall situation, because everything you >> pointed out is actually true with offset based compaction, too. >> >> The KIP is not a silver bullet that solves all issue for interleaved >> writes, but I personally believe, it's a good improvement. >> >> >> >> -Matthias >> >> >> On 10/30/19 9:45 AM, Senthilnathan Muthusamy wrote: >>> Hi, >>> >>> Please let me know if anyone has any questions on this updated KIP-280... >>> >>> Thanks, >>> >>> Senthil >>> >>> -----Original Message----- >>> From: Senthilnathan Muthusamy <senth...@microsoft.com.INVALID> >>> Sent: Monday, October 28, 2019 11:36 PM >>> To: dev@kafka.apache.org >>> Subject: RE: [DISCUSS] KIP-280: Enhanced log compaction >>> >>> Hi Tom, >>> >>> Sorry for the delayed response. >>> >>> Regarding the fall back to offset decision for both timestamp & >>> header >> value is based on the previous author discuss >> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Flist >> s.apache.org%2Fthread.html%2Ff44317eb6cd34f91966654c80509d4a457dbbccdd >> 02b86645782be67%40%253Cdev.kafka.apache.org%253E&data=02%7C01%7Cse >> nthilm%40microsoft.com%7C8f6cae776082459c793b08d761595294%7C72f988bf86 >> f141af91ab2d7cd011db47%7C1%7C0%7C637085022516423400&sdata=WpEW5ylu >> %2FsLMyGS2ULWDZ7vA1OzQwFYWSuioLCbABhM%3D&reserved=0 >> and as per the discussion, it is really required to avoid duplicates. >>> >>> And the timestamp strategy is from the original KIP author and we >>> are >> keeping it as is. >>> >>> Finally on the sequence order guarantee by the producer, it is not >> feasible on waiting for ack in async / multi-threads/processes >> scenarios and hence the header sequence based compact strategy with >> producer's responsibility to have a unique sequence generation for the >> topic-partition-key level. >>> >>> Hoping this clarifies all your questions. Please let us know if you >>> have >> any further questions. >>> >>> @Guozhang Wang / @Matthias J. Sax, I see you both had a detail >> discussion on the original KIP with previous author and it would great >> to hear your inputs as well. >>> >>> Thanks, >>> Senthil >>> >>> -----Original Message----- >>> From: Tom Bentley <tbent...@redhat.com> >>> Sent: Tuesday, October 22, 2019 2:32 AM >>> To: dev@kafka.apache.org >>> Subject: Re: [DISCUSS] KIP-280: Enhanced log compaction >>> >>> Hi Senthilnathan, >>> >>> In the motivation isn't it a little misleading to say "On the >>> producer side, we clearly preserve an order for the two messages, >>> <K1, V1> <K1, >>> V2>"? IMHO, the semantics of the producer are clear that having an >>> V2>observed >>> order of sending records from different producers is not sufficient >>> to >> guarantee ordering on the broker. You really need to send the 2nd >> record only after the 1st record is acked. It's the difficultly of >> achieving that in practice that's the true motivation for your KIP. >>> >>> I can see the attraction of using timestamps, but it would be >>> helpful to >> explain how that really solves the problem. When the producers are in >> different processes on different machines you're relying on their >> clocks being synchronized, which is a whole subject in itself. Even if >> they're synchronized the resolution of System.currentTimeMillis() is >> typically many milliseconds. If your producers are in different >> threads of the same process that could be a real problem because it makes >> ties quite likely. >>> And you don't explain why it's OK to resolve ties using the offset. >>> The >> basis of your argument is that the offset is giving you the wrong answer. >>> So it seems to me that using it as a tiebreaker is just narrowing >>> the >> chances of getting the wrong answer. Maybe none of this matters for >> your use case, but I think it should be spelled out in the KIP, >> because it surely would matter for similar use cases. >>> >>> Using a sequence at least removes the problem of ties, but the >> interesting bit is now in how you deal with races between >> threads/processes in getting a sequence number allocated (which is out >> of scope of the KIP, I guess). >>> How is resolving that race any simpler that resolving the motivating >> race by waiting for the ack of the first record sent? >>> >>> Kind regards, >>> >>> Tom >>> >>> On Mon, Oct 21, 2019 at 9:06 PM Senthilnathan Muthusamy < >> senth...@microsoft.com.invalid> wrote: >>> >>>> Hi All, >>>> >>>> We are bring back the KIP-280 to live with small correct for the >>>> discussion & voting. Thanks to previous author Luis Cabral on the >>>> KIP-280 initiation and we are taking over to complete and get it >>>> into >> 2.4... >>>> >>>> Below is the correction that we made to the existing KIP-280: >>>> >>>> * Allowing the compact strategy configuration at the topic level as >>>> the log compaction is at the topic level and a broker can have >>>> multiple topics. This allows the flexibility to have the strategy >>>> at both broker level (i.e. for all topics within the broker) and >>>> topic level (i.e. for a subset of topics within a broker) as well... >>>> >>>> KIP-280: >>>> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fc >>>> wi >>>> k >>>> i.apache.org%2Fconfluence%2Fdisplay%2FKAFKA%2FKIP-280%253A%2BEnhanc >>>> ed >>>> % >>>> 2Blog%2Bcompaction&data=02%7C01%7Csenthilm%40microsoft.com%7C68 >>>> 6c >>>> 3 >>>> 2fa4a554d61ae1408d756d409f6%7C72f988bf86f141af91ab2d7cd011db47%7C1% >>>> 7C >>>> 0 >>>> %7C637073341017520406&sdata=KrRem2KWCBscHX963Ah8wZ%2Fj9dkhCeAa7 >>>> Gs >>>> 6 >>>> XqJ%2F5SQ%3D&reserved=0 PULL REQUEST: >>>> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fg >>>> it >>>> h >>>> ub.com%2Fapache%2Fkafka%2Fpull%2F7528&data=02%7C01%7Csenthilm%4 >>>> 0m >>>> i >>>> crosoft.com%7C686c32fa4a554d61ae1408d756d409f6%7C72f988bf86f141af91 >>>> ab >>>> 2 >>>> d7cd011db47%7C1%7C0%7C637073341017520406&sdata=bt32PgDUjJjpXohE >>>> Wp >>>> t >>>> Fxv6mPERCwcRFlVROzinBtnk%3D&reserved=0 (unit test coverage in >>>> progress) >>>> >>>> Previous Thread DISCUSS: >>>> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fl >>>> is >>>> t >>>> s.apache.org%2Fthread.html%2F79aa6e50d7c737ddf83455dd8063692a535a1a >>>> fa >>>> 5 >>>> 58620fe1a1496d3%40%253Cdev.kafka.apache.org%253E&data=02%7C01%7 >>>> Cs >>>> e >>>> nthilm%40microsoft.com%7C686c32fa4a554d61ae1408d756d409f6%7C72f988b >>>> f8 >>>> 6 >>>> f141af91ab2d7cd011db47%7C1%7C0%7C637073341017520406&sdata=XwcUW >>>> WY >>>> D >>>> PV1nA%2BbkDGLFNlXZ5bysVblWUTDQEzAaKxM%3D&reserved=0 >>>> Previous Thread VOTE: >>>> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fl >>>> is >>>> t >>>> s.apache.org%2Fthread.html%2Fb2ecd73ce849741f0c40b4f801c3f765058349 >>>> 78 >>>> 1 >>>> 2713e240e1ac2b7%40%253Cdev.kafka.apache.org%253E&data=02%7C01%7 >>>> Cs >>>> e >>>> nthilm%40microsoft.com%7C686c32fa4a554d61ae1408d756d409f6%7C72f988b >>>> f8 >>>> 6 >>>> f141af91ab2d7cd011db47%7C1%7C0%7C637073341017520406&sdata=8cKQc >>>> Am >>>> 2 >>>> DDVGVLTKtciYKGMiI%2FgOADW6tam9nem4lsg%3D&reserved=0 >>>> >>>> Appreciate your timely action. >>>> >>>> PS: Initiating a separate thread as I was not able to reply to the >>>> existing threads... >>>> >>>> Thanks, >>>> Senthil >>>> >> >> > > -- > -- Guozhang >
signature.asc
Description: OpenPGP digital signature