RE: [DISCUSS] KIP-280: Enhanced log compaction

Senthilnathan Muthusamy Mon, 11 Nov 2019 17:16:31 -0800

Hi Jun,

Thanks for the response and please find below the response!


#50 - got it...

#51 - not sure how the last record will be deleted bcoz of this new compact 
strategy. The reason I am asking is, the compaction is based out of offsetmap 
and the new strategy logic is purely within the offsetmap... the offsetmap will 
always keep track of the latest offset irrespective of the compaction strategy. 
You can have a look at the PR of the new compaction strategy changes: 
https://github.com/apache/kafka/pull/7528/files 

#52 - sure, I have updated JIRA to include this details in the wiki.

#53 - as I am pointed out in #51, the tombstone is abstract to this change 
(i.e. the tombstone is handled within LogCleaner and the compact strategy is by 
the offsetmap). this is what my understand on the tombstone based on the code 
walk-thru... please let me know if I am missing anything here...

Thanks,
Senthil

-----Original Message-----
From: Jun Rao <j...@confluent.io> 
Sent: Thursday, November 7, 2019 4:32 PM
To: dev <dev@kafka.apache.org>
Subject: Re: [DISCUSS] KIP-280: Enhanced log compaction

Hi, Senthil,

Thanks for bringing back this KIP. Overall, this seems like a useful feature. A 
few comments below.

50. One use case for the timestamp based compaction is to resolve conflicts 
during data center failures. The failover of a data center typically happens 
much longer tha millisec. So, timestamp could be enough to determine the value 
to keep.

51. With the timestamp/header strategy, it seems that it may now be possible 
that the last record could be removed during compaction. For example, if the 
active segment is empty, the last record in the previous segment could be 
removed due to compaction. A new replica then won't see the true end offset of 
the partition. If that replica ever becomes the leader, it could write a 
different record on the same end offset, which will be weird.

52. With the timestamp/header strategy, the behavior of the application may 
need to change. In particular, the application can't just blindly take the 
record with a larger offset and assuming that it's the value to keep. It needs 
to check the timestamp or the header now. So, it would be useful to at least 
document this.

53. This also adds complexity for deletes. Currently, we use a null payload to 
indicate a delete tombstone. The tombstone can be removed once all previous 
records with the same key have been removed. If the new strategies apply to 
tombstones, it's not clear when a tombstone can be removed since subsequent 
records could have timestamp/sequenceId smaller than that in the tombstone. It 
would be useful to think this through and document the expected behavior.

Jun

On Tue, Nov 5, 2019 at 11:37 AM Senthilnathan Muthusamy 
<senth...@microsoft.com.invalid> wrote:

> Hi Guozhang,
>
> Sure and I have made a note in the JIRA item to make sure the wiki is 
> updated.
>
> Thanks,
> Senthil
>
> -----Original Message-----
> From: Guozhang Wang <wangg...@gmail.com>
> Sent: Monday, November 4, 2019 11:00 AM
> To: dev <dev@kafka.apache.org>
> Subject: Re: [DISCUSS] KIP-280: Enhanced log compaction
>
> Hello Senthilnathan,
>
> Thanks for revamping on the KIP. I have only one comment about the 
> wiki otherwise LGTM.
>
> 1. We should emphasize that the newly introduced config yields to the 
> existing "log.cleanup.policy", i.e. if the latter's value is `delete` 
> not `compact`, then the previous config would be ignored.
>
>
> Guozhang
>
> On Mon, Nov 4, 2019 at 9:52 AM Senthilnathan Muthusamy < 
> senth...@microsoft.com.invalid> wrote:
>
> > Hi all,
> >
> > I will start the vote thread shortly for this updated KIP. If there 
> > are any more thoughts I would love to hear them.
> >
> > Thanks,
> > Senthil
> >
> > -----Original Message-----
> > From: Senthilnathan Muthusamy <senth...@microsoft.com.INVALID>
> > Sent: Thursday, October 31, 2019 3:51 AM
> > To: dev@kafka.apache.org
> > Subject: RE: [DISCUSS] KIP-280: Enhanced log compaction
> >
> > Hi Matthias
> >
> > Thanks for the response.
> >
> > (1) Yes
> >
> > (2) Yes, and the config name will be the same (i.e.
> > `log.cleaner.compaction.strategy` &
> > `log.cleaner.compaction.strategy.header`) at broker level and topic 
> > level (to override broker level default compact strategy). Please 
> > let me know if we need to keep it in different naming convention. Note:
> > Broker level (which will be in the server.properties) configuration 
> > is optional and default it to offset. Topic level configuration will 
> > be default to broker level config...
> >
> > (3) By this new way, it avoids another config parameter and also in 
> > feature if any new strategy like header need addition info, no 
> > additional config required. As this got discussed already and agreed 
> > to have separate config, I will revert it. KIP updated...
> >
> > (4) Done
> >
> > (5) Updated
> >
> > (6) Updated to pick the first header in the list
> >
> > Please let me know if you have any other questions.
> >
> > Thanks,
> > Senthil
> >
> > -----Original Message-----
> > From: Matthias J. Sax <matth...@confluent.io>
> > Sent: Thursday, October 31, 2019 12:13 AM
> > To: dev@kafka.apache.org
> > Subject: Re: [DISCUSS] KIP-280: Enhanced log compaction
> >
> > Thanks for picking up this KIP, Senthil.
> >
> > (1) As far as I remember, the main issue of the original proposal 
> > was a missing topic level configuration for the compaction strategy. 
> > With this being addressed, I am in favor of this KIP.
> >
> > (2) With regard to (1), it seems we would need a new topic level 
> > config `compaction.strategy`, and `log.cleaner.compaction.strategy` 
> > would be the default strategy (ie, broker level config) if a topic 
> > does
> not overwrite it?
> >
> > (3) Why did you remove `log.cleaner.compaction.strategy.header`
> > parameter and change the accepted values of 
> > `log.cleaner.compaction.strategy` to "header.<key>" instead of 
> > keeping "header"? The original approach seems to be cleaner, and I 
> > think this was discussed on the original discuss thread already.
> >
> > (4) Nit: For the "timestamp" compaction strategy you changed the KIP 
> > to
> >
> > -> `The record [create] timestamp`
> >
> > This is miss leading IMHO, because it depends on the broker/log 
> > configuration `(log.)message.timestamp.type` that can either be 
> > `CreateTime` or `LogAppendTime` what the actual record timestamp is. 
> > I would just remove "create" to keep it unspecified.
> >
> > (5) Nit: the section "Public Interfaces" should list the newly 
> > introduced configs -- configuration parameters are a public interface.
> >
> > (6) What do you mean by "first level header lookup"? The term "first 
> > level" indicates some hierarchy, but headers don't have any 
> > hierarchy
> > -- it's just a list of key-value pairs? If you mean the _order_ of 
> > the headers, ie, pick the first header in the list that matches the 
> > key, please rephrase it to make it clearer.
> >
> >
> >
> > @Tom: I agree with all you are saying, however, I still think that 
> > this KIP will improve the overall situation, because everything you 
> > pointed out is actually true with offset based compaction, too.
> >
> > The KIP is not a silver bullet that solves all issue for interleaved 
> > writes, but I personally believe, it's a good improvement.
> >
> >
> >
> > -Matthias
> >
> >
> > On 10/30/19 9:45 AM, Senthilnathan Muthusamy wrote:
> > > Hi,
> > >
> > > Please let me know if anyone has any questions on this updated
> KIP-280...
> > >
> > > Thanks,
> > >
> > > Senthil
> > >
> > > -----Original Message-----
> > > From: Senthilnathan Muthusamy <senth...@microsoft.com.INVALID>
> > > Sent: Monday, October 28, 2019 11:36 PM
> > > To: dev@kafka.apache.org
> > > Subject: RE: [DISCUSS] KIP-280: Enhanced log compaction
> > >
> > > Hi Tom,
> > >
> > > Sorry for the delayed response.
> > >
> > > Regarding the fall back to offset decision for both timestamp & 
> > > header
> > value is based on the previous author discuss 
> > https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fli
> > st 
> > s.apache.org%2Fthread.html%2Ff44317eb6cd34f91966654c80509d4a457dbbcc
> > dd 
> > 02b86645782be67%40%253Cdev.kafka.apache.org%253E&amp;data=02%7C01%7C
> > se
> > nthilm%40microsoft.com%7C8f6cae776082459c793b08d761595294%7C72f988bf
> > 86 
> > f141af91ab2d7cd011db47%7C1%7C0%7C637085022516423400&amp;sdata=WpEW5y
> > lu
> > %2FsLMyGS2ULWDZ7vA1OzQwFYWSuioLCbABhM%3D&amp;reserved=0
> > and as per the discussion, it is really required to avoid duplicates.
> > >
> > > And the timestamp strategy is from the original KIP author and we 
> > > are
> > keeping it as is.
> > >
> > > Finally on the sequence order guarantee by the producer, it is not
> > feasible on waiting for ack in async / multi-threads/processes 
> > scenarios and hence the header sequence based compact strategy with 
> > producer's responsibility to have a unique sequence generation for 
> > the topic-partition-key level.
> > >
> > > Hoping this clarifies all your questions. Please let us know if 
> > > you have
> > any further questions.
> > >
> > > @Guozhang Wang / @Matthias J. Sax, I see you both had a detail
> > discussion on the original KIP with previous author and it would 
> > great to hear your inputs as well.
> > >
> > > Thanks,
> > > Senthil
> > >
> > > -----Original Message-----
> > > From: Tom Bentley <tbent...@redhat.com>
> > > Sent: Tuesday, October 22, 2019 2:32 AM
> > > To: dev@kafka.apache.org
> > > Subject: Re: [DISCUSS] KIP-280: Enhanced log compaction
> > >
> > > Hi Senthilnathan,
> > >
> > > In the motivation isn't it a little misleading to say "On the 
> > > producer side, we clearly preserve an order for the two messages, 
> > > <K1, V1> <K1,
> > > V2>"? IMHO, the semantics of the producer are clear that having an 
> > > V2>observed
> > > order of sending records from different producers is not 
> > > sufficient to
> > guarantee ordering on the broker. You really need to send the 2nd 
> > record only after the 1st record is acked. It's the difficultly of 
> > achieving that in practice that's the true motivation for your KIP.
> > >
> > > I can see the attraction of using timestamps, but it would be 
> > > helpful to
> > explain how that really solves the problem. When the producers are 
> > in different processes on different machines you're relying on their 
> > clocks being synchronized, which is a whole subject in itself. Even 
> > if they're synchronized the resolution of System.currentTimeMillis() 
> > is typically many milliseconds. If your producers are in different 
> > threads of the same process that could be a real problem because it
> makes ties quite likely.
> > > And you don't explain why it's OK to resolve ties using the offset.
> > > The
> > basis of your argument is that the offset is giving you the wrong answer.
> > > So it seems to me that using it as a tiebreaker is just narrowing 
> > > the
> > chances of getting the wrong answer. Maybe none of this matters for 
> > your use case, but I think it should be spelled out in the KIP, 
> > because it surely would matter for similar use cases.
> > >
> > > Using a sequence at least removes the problem of ties, but the
> > interesting bit is now in how you deal with races between 
> > threads/processes in getting a sequence number allocated (which is 
> > out of scope of the KIP, I guess).
> > > How is resolving that race any simpler that resolving the 
> > > motivating
> > race by waiting for the ack of the first record sent?
> > >
> > > Kind regards,
> > >
> > > Tom
> > >
> > > On Mon, Oct 21, 2019 at 9:06 PM Senthilnathan Muthusamy <
> > senth...@microsoft.com.invalid> wrote:
> > >
> > >> Hi All,
> > >>
> > >> We are bring back the KIP-280 to live with small correct for the 
> > >> discussion & voting. Thanks to previous author Luis Cabral on the
> > >> KIP-280 initiation and we are taking over to complete and get it 
> > >> into
> > 2.4...
> > >>
> > >> Below is the correction that we made to the existing KIP-280:
> > >>
> > >>   *   Allowing the compact strategy configuration at the topic level
> as
> > >> the log compaction is at the topic level and a broker can have 
> > >> multiple topics. This allows the flexibility to have the strategy 
> > >> at both broker level (i.e. for all topics within the broker) and 
> > >> topic level (i.e. for a subset of topics within a broker) as well...
> > >>
> > >> KIP-280:
> > >> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2
> > >> Fc
> > >> wi
> > >> k
> > >> i.apache.org%2Fconfluence%2Fdisplay%2FKAFKA%2FKIP-280%253A%2BEnha
> > >> nc
> > >> ed
> > >> %
> > >> 2Blog%2Bcompaction&amp;data=02%7C01%7Csenthilm%40microsoft.com%7C
> > >> 68
> > >> 6c
> > >> 3
> > >> 2fa4a554d61ae1408d756d409f6%7C72f988bf86f141af91ab2d7cd011db47%7C
> > >> 1%
> > >> 7C
> > >> 0
> > >> %7C637073341017520406&amp;sdata=KrRem2KWCBscHX963Ah8wZ%2Fj9dkhCeA
> > >> a7
> > >> Gs
> > >> 6
> > >> XqJ%2F5SQ%3D&amp;reserved=0 PULL REQUEST:
> > >> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2
> > >> Fg
> > >> it
> > >> h
> > >> ub.com%2Fapache%2Fkafka%2Fpull%2F7528&amp;data=02%7C01%7Csenthilm
> > >> %4
> > >> 0m
> > >> i
> > >> crosoft.com%7C686c32fa4a554d61ae1408d756d409f6%7C72f988bf86f141af
> > >> 91
> > >> ab
> > >> 2
> > >> d7cd011db47%7C1%7C0%7C637073341017520406&amp;sdata=bt32PgDUjJjpXo
> > >> hE
> > >> Wp
> > >> t
> > >> Fxv6mPERCwcRFlVROzinBtnk%3D&amp;reserved=0 (unit test coverage in
> > >> progress)
> > >>
> > >> Previous Thread DISCUSS:
> > >> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2
> > >> Fl
> > >> is
> > >> t
> > >> s.apache.org%2Fthread.html%2F79aa6e50d7c737ddf83455dd8063692a535a
> > >> 1a
> > >> fa
> > >> 5
> > >> 58620fe1a1496d3%40%253Cdev.kafka.apache.org%253E&amp;data=02%7C01
> > >> %7
> > >> Cs
> > >> e
> > >> nthilm%40microsoft.com%7C686c32fa4a554d61ae1408d756d409f6%7C72f98
> > >> 8b
> > >> f8
> > >> 6
> > >> f141af91ab2d7cd011db47%7C1%7C0%7C637073341017520406&amp;sdata=Xwc
> > >> UW
> > >> WY
> > >> D
> > >> PV1nA%2BbkDGLFNlXZ5bysVblWUTDQEzAaKxM%3D&amp;reserved=0
> > >> Previous Thread VOTE:
> > >> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2
> > >> Fl
> > >> is
> > >> t
> > >> s.apache.org%2Fthread.html%2Fb2ecd73ce849741f0c40b4f801c3f7650583
> > >> 49
> > >> 78
> > >> 1
> > >> 2713e240e1ac2b7%40%253Cdev.kafka.apache.org%253E&amp;data=02%7C01
> > >> %7
> > >> Cs
> > >> e
> > >> nthilm%40microsoft.com%7C686c32fa4a554d61ae1408d756d409f6%7C72f98
> > >> 8b
> > >> f8
> > >> 6
> > >> f141af91ab2d7cd011db47%7C1%7C0%7C637073341017520406&amp;sdata=8cK
> > >> Qc
> > >> Am
> > >> 2
> > >> DDVGVLTKtciYKGMiI%2FgOADW6tam9nem4lsg%3D&amp;reserved=0
> > >>
> > >> Appreciate your timely action.
> > >>
> > >> PS: Initiating a separate thread as I was not able to reply to 
> > >> the existing threads...
> > >>
> > >> Thanks,
> > >> Senthil
> > >>
> >
> >
>
> --
> -- Guozhang
>

RE: [DISCUSS] KIP-280: Enhanced log compaction

Reply via email to