Hi Jun / Ismael,

Any chance to get your opinion on this? Thanks in advance!
Regards,
Luís

> On 22 May 2018, at 17:30, Guozhang Wang <wangg...@gmail.com> wrote:
>
> Hello Luís,
>
> While reviewing your PR I realized my previous calculation on the memory
> usage was incorrect: in fact, in the current implementation, each entry in
> the memory-bounded cache is 16 (default MD5 hash digest length) + 8 (long
> type) = 24 bytes, and if we add the long-typed version value it is 32
> bytes. I.e. each entry will be increased by 33%, not doubled.
>
> After redoing the math I'm leaning a bit towards just adding this entry
> for all cases rather than treating the timestamp differently from the
> others (sorry for being back and forth, but I just want to make sure we've
> got a good balance between efficiency and semantics consistency). I've
> also chatted with Jun and Ismael about this (cc'ed), and maybe you guys
> can chime in here as well.
>
> Guozhang
>
> On Tue, May 22, 2018 at 6:45 AM, Luís Cabral <luis_cab...@yahoo.com.invalid> wrote:
>
>> Hi Matthias / Guozhang,
>>
>> Were the questions clarified?
>> Please feel free to add more feedback; otherwise it would be nice to move
>> this topic onwards 😊
>>
>> Kind Regards,
>> Luís Cabral
>>
>> From: Guozhang Wang
>> Sent: 09 May 2018 20:00
>> To: dev@kafka.apache.org
>> Subject: Re: [DISCUSS] KIP-280: Enhanced log compaction
>>
>> I have thought about consistency in strategy vs. the practical concern of
>> storage convenience and its impact on compaction effectiveness.
>>
>> The difference between the timestamp and the header key-value pairs is
>> that for the latter, as I mentioned before, "it is arguably out of
>> Kafka's control, and indeed users may (mistakenly) generate many records
>> with the same key and the same header value." So giving up tie breakers
>> may result in very poor compaction effectiveness when that happens, while
>> for timestamps the likelihood of this is considered very small.
>>
>> Guozhang
>>
>> On Sun, May 6, 2018 at 8:55 PM, Matthias J. Sax <matth...@confluent.io> wrote:
>>
>>> Thanks.
>>>
>>> To reverse the question: if this argument holds, why does it not apply
>>> to the case when the header key is used as the compaction attribute?
>>>
>>> I am not against keeping both records in case timestamps are equal, but
>>> shouldn't we apply the same strategy for all cases and not use the
>>> offset as a tie-breaker at all?
>>>
>>> -Matthias
>>>
>>>> On 5/6/18 8:47 PM, Guozhang Wang wrote:
>>>> Hello Matthias,
>>>>
>>>> The related discussion was in the PR:
>>>> https://github.com/apache/kafka/pull/4822#discussion_r184588037
>>>>
>>>> The concern is that, to use the offset as a tie breaker, we would need
>>>> to double the size of each entry in the bounded compaction cache, and
>>>> hence largely reduce the effectiveness of the compaction itself. Since,
>>>> with millisecond timestamps, ties on the same key are expected to be
>>>> rare, I think it is a reasonable tradeoff to make.
>>>>
>>>> Guozhang
>>>>
>>>> On Sun, May 6, 2018 at 9:37 AM, Matthias J. Sax <matth...@confluent.io> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I just updated myself on this KIP. One question (maybe it was
>>>>> discussed and I missed it): what is the motivation for not using the
>>>>> offset as the tie breaker for the "timestamp" case? Isn't this
>>>>> inconsistent behavior?
>>>>>
>>>>> -Matthias
>>>>>
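For reference, a minimal Scala sketch of the per-entry size arithmetic discussed in the messages above. The object and value names are illustrative only, not actual Kafka code:

    // Illustrative only, not Kafka code: rough per-entry footprint of the
    // cleaner's memory-bounded offset cache, per the numbers quoted above.
    object EntrySizeSketch extends App {
      val hashBytes    = 16 // MD5 digest of the record key
      val offsetBytes  = 8  // long offset
      val versionBytes = 8  // long compaction "version" (timestamp or header value)

      val current  = hashBytes + offsetBytes // 24 bytes per entry today
      val proposed = current + versionBytes  // 32 bytes with the added version
      val increase = (proposed - current) * 100.0 / current

      println(f"current=$current B, proposed=$proposed B, increase=$increase%.0f%%") // ~33%, not doubled
    }
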
>>>>>> On 5/2/18 2:07 PM, Guozhang Wang wrote:
>>>>>> Hello Luís,
>>>>>>
>>>>>> Sorry for the late reply.
>>>>>>
>>>>>> My understanding is that such duplicates will only happen if the
>>>>>> non-offset version value, either the timestamp or some long-typed
>>>>>> header key, is the same (i.e. we cannot break ties).
>>>>>>
>>>>>> 1. For the timestamp, which is in milliseconds, I think in practice
>>>>>> the likelihood of records with the same key and the same millisecond
>>>>>> timestamp is very small, and hence the duplicate amount should be
>>>>>> very small.
>>>>>>
>>>>>> 2. For a long-typed header key, it is arguably out of Kafka's
>>>>>> control, and indeed users may (mistakenly) generate many records with
>>>>>> the same key and the same header value.
>>>>>>
>>>>>> So I'd like to propose a counter-offer: for 1), we still use only 8
>>>>>> bytes and allow for potential duplicates due to ties; for 2), we use
>>>>>> 16 bytes to always break ties. The motivation for distinguishing 1)
>>>>>> and 2) is that I expect 1) to be much more common, and hence worth
>>>>>> handling specially to be more effective in cleaning. WDYT?
>>>>>>
>>>>>> Guozhang
>>>>>>
>>>>>> On Wed, May 2, 2018 at 2:36 AM, Luís Cabral <luis_cab...@yahoo.com.invalid> wrote:
>>>>>>
>>>>>>> Hi Guozhang,
>>>>>>>
>>>>>>> Have you managed to have a look at my reply?
>>>>>>> How do you feel about this?
>>>>>>>
>>>>>>> Kind Regards,
>>>>>>> Luís Cabral
>>>>>>>
>>>>>>> On Monday, April 30, 2018, 9:27:15 AM GMT+2, Luís Cabral <luis_cab...@yahoo.com> wrote:
>>>>>>>
>>>>>>> Hi Guozhang,
>>>>>>>
>>>>>>> I understand the argument, but this is a hazardous compromise for
>>>>>>> using Kafka as an event store (as is my original intention).
>>>>>>>
>>>>>>> I expect to have many duplicated messages in Kafka, as the overall
>>>>>>> architecture being used allows the producer to re-send a fresh state
>>>>>>> of the backed data into Kafka. Though this scenario is not common,
>>>>>>> as the intention is for Kafka to bear the weight of replaying all
>>>>>>> the records for new consumers, it will occasionally happen.
>>>>>>>
>>>>>>> As there are plenty of records which are not updated frequently,
>>>>>>> this would leave the topic with a surplus of quite a few million
>>>>>>> duplicate records (increasing every time the above-mentioned
>>>>>>> function is applied).
>>>>>>>
>>>>>>> I would prefer to have the temporary memory footprint of 8 bytes per
>>>>>>> record whenever compaction is run (only when not in 'offset' mode)
>>>>>>> than to allow the topic to run into this state.
>>>>>>>
>>>>>>> What do you think? Is this scenario too specific to me, or do you
>>>>>>> believe that it could happen to other clients as well?
>>>>>>>
>>>>>>> Thanks again for the continued discussion!
>>>>>>> Cheers,
>>>>>>> Luis
>>>>>>>
>>>>>>> On Friday, April 27, 2018, 8:21:13 PM GMT+2, Guozhang Wang <wangg...@gmail.com> wrote:
>>>>>>>
>>>>>>> Hello Luis,
>>>>>>>
>>>>>>> When comparing the versions returns `equal`, the original proposal
>>>>>>> is to use the offset as the tie breaker. My previous comment is
>>>>>>> that:
>>>>>>>
>>>>>>> 1) when we build the map calling `put`, if there is already an entry
>>>>>>> for the key, compare its stored version, and replace it if the put
>>>>>>> record's version is "no smaller than" the stored record's: this is
>>>>>>> because when building the map we are always going from smaller
>>>>>>> offsets to larger ones.
>>>>>>>
>>>>>>> 2) when making a second pass to determine if each record should be
>>>>>>> retained based on the map, we do not try to break the tie if the
>>>>>>> map's returned version is the same, but always treat it as "keep".
>>>>>>> In this case, when we are comparing a record with itself stored in
>>>>>>> the offset map, the version comparison would return `equals`. As I
>>>>>>> mentioned in the PR, one caveat is that we may indeed have multiple
>>>>>>> records with the same key and the same version, but once a new
>>>>>>> versioned record is appended, they will be deleted.
>>>>>>>
>>>>>>> Does that make sense?
>>>>>>>
>>>>>>> Guozhang
>>>>>>>
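A minimal Scala sketch of the first-pass `put` rule described above, assuming a plain in-memory map. The class and method names are hypothetical; the real cleaner uses a hashed, memory-bounded offset map rather than a general Map:

    // Illustrative only: the "no smaller than" replacement rule while
    // building the cache from smaller to larger offsets.
    import scala.collection.mutable

    object PutRuleSketch {
      final case class Entry(offset: Long, version: Long)

      final class VersionedOffsetCache {
        private val entries = mutable.Map.empty[String, Entry]

        // On a version tie the later record wins the slot, because records
        // are visited in increasing offset order.
        def put(key: String, offset: Long, version: Long): Unit =
          entries.get(key) match {
            case Some(stored) if version < stored.version => () // keep stored entry
            case _ => entries.update(key, Entry(offset, version))
          }

        def get(key: String): Option[Entry] = entries.get(key)
      }
    }

With this rule, several records carrying the same key and the same version collapse to the latest offset while the map is being built.
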
>>>>>>> On Fri, Apr 27, 2018 at 4:20 AM, Luís Cabral <luis_cab...@yahoo.com.invalid> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I was updating the PR to match the latest decisions and noticed (or
>>>>>>>> rather, the integration tests noticed) that without storing the
>>>>>>>> offset, the cache doesn't know when to keep the record itself.
>>>>>>>>
>>>>>>>> This is because, after the cache is populated, all the records are
>>>>>>>> compared against the stored ones, so "Record{key:A, offset:1,
>>>>>>>> version:1}" will compare against itself and be flagged as "don't
>>>>>>>> keep", since we only compared based on the version and didn't check
>>>>>>>> whether the offset was the same or not.
>>>>>>>>
>>>>>>>> This sort of invalidates not storing the offset in the cache,
>>>>>>>> unfortunately, and the binary footprint increases two-fold when
>>>>>>>> "offset" is not used as the compaction strategy.
>>>>>>>>
>>>>>>>> Guozhang: Is it ok with you if we go back on this decision and
>>>>>>>> leave the offset as a tie-breaker?
>>>>>>>>
>>>>>>>> Kind Regards,
>>>>>>>> Luis
>>>>>>>>
>>>>>>>> On Friday, April 27, 2018, 11:11:55 AM GMT+2, Luís Cabral <luis_cab...@yahoo.com.INVALID> wrote:
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> The KIP is now updated with the results of the byte array
>>>>>>>> discussion.
>>>>>>>>
>>>>>>>> This is my first contribution to Kafka, so I'm not sure what the
>>>>>>>> processes are. Is it now acceptable to take this into a vote, or
>>>>>>>> should I ask for more contributors to join the discussion first?
>>>>>>>>
>>>>>>>> Kind Regards,
>>>>>>>> Luis
>>>>>>>>
>>>>>>>> On Friday, April 27, 2018, 6:12:03 AM GMT+2, Guozhang Wang <wangg...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> Hello Luís,
>>>>>>>>
>>>>>>>>> Offset is an integer? I've only noticed it being resolved as a
>>>>>>>>> long so far.
>>>>>>>>
>>>>>>>> You are right, offset is a long.
>>>>>>>>
>>>>>>>> As for timestamp / other types, I left a comment in your PR about
>>>>>>>> handling tie breakers.
>>>>>>>>
>>>>>>>>> Given these arguments, is this point something that you absolutely
>>>>>>>>> must have?
>>>>>>>>
>>>>>>>> No, I do not have a strong use case in mind to go with arbitrary
>>>>>>>> byte arrays; I was just thinking that if we are going to enhance
>>>>>>>> log compaction, why not generalize it more :)
>>>>>>>>
>>>>>>>> Your concern about the memory usage makes sense. I'm happy to take
>>>>>>>> my suggestion back and enforce only long-typed fields.
>>>>>>>>
>>>>>>>> Guozhang
>>>>>>>>
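A minimal Scala sketch of the second-pass retention check, illustrating why the cached offset is needed as a tie-breaker, per the exchange above. The object and parameter names are hypothetical; `cachedOffset`/`cachedVersion` stand in for the entry produced by the first pass:

    // Illustrative only, not Kafka code.
    object RetainCheckSketch {
      def shouldRetain(recordOffset: Long, recordVersion: Long,
                       cachedOffset: Long, cachedVersion: Long): Boolean =
        if (recordVersion != cachedVersion) recordVersion > cachedVersion
        // On a version tie, a record always ties with its own cache entry;
        // without the stored offset it would be flagged "don't keep" and
        // wrongly dropped. With it, the record that produced the cache entry
        // (and anything later) is retained.
        else recordOffset >= cachedOffset
    }
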
>>>>>>>> On Thu, Apr 26, 2018 at 1:44 AM, Luís Cabral <luis_cab...@yahoo.com.invalid> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> bq. have a integer typed OffsetMap (for offset)
>>>>>>>>>
>>>>>>>>> Offset is an integer? I've only noticed it being resolved as a
>>>>>>>>> long so far.
>>>>>>>>>
>>>>>>>>> bq. long typed OffsetMap (for timestamp)
>>>>>>>>>
>>>>>>>>> We would still need to store the offset, as it is functioning as a
>>>>>>>>> tie-breaker. Not that this is a big deal; we can easily have both
>>>>>>>>> (as currently done on the PR).
>>>>>>>>>
>>>>>>>>> bq. For the byte array typed offset map, we can use a general
>>>>>>>>> hashmap, where the hashmap's CAPACITY will be reasoned from the
>>>>>>>>> given "val memory: Int" parameter
>>>>>>>>>
>>>>>>>>> If you have a map with 128 bytes of capacity, then store a value
>>>>>>>>> with 16 bytes and another with 32 bytes, how many free slots do
>>>>>>>>> you have left in this map?
>>>>>>>>>
>>>>>>>>> You can make this work, but I think you would need to re-design
>>>>>>>>> the whole log cleaner approach, which implies changing some of the
>>>>>>>>> already existing configurations (like
>>>>>>>>> "log.cleaner.io.buffer.load.factor"). I would rather maintain
>>>>>>>>> backwards compatibility as much as possible in this KIP, and if
>>>>>>>>> this means that using "foo" / "bar" or "2.1-a" / "3.20-b" as
>>>>>>>>> record versions is not viable, then so be it.
>>>>>>>>>
>>>>>>>>> Given these arguments, is this point something that you absolutely
>>>>>>>>> must have? I'm still sort of hoping that you are just entertaining
>>>>>>>>> the idea and are ok with having a long (now conceded to be
>>>>>>>>> unsigned, so the byte arrays can be compared directly).
>>>>>>>>>
>>>>>>>>> Kind Regards,
>>>>>>>>> Luis
>>>>>>>>
>>>>>>>> --
>>>>>>>> -- Guozhang
>>>>>>>
>>>>>>> --
>>>>>>> -- Guozhang
>>
>> --
>> -- Guozhang
>
> --
> -- Guozhang
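Finally, a minimal Scala sketch of the capacity reasoning behind the fixed-size-entry argument quoted above: with a fixed bytes-per-entry the slot count can be derived up front from the memory budget, which is not possible for arbitrary-length byte-array versions. The constants and names are illustrative only, not Kafka code:

    object SlotCapacitySketch extends App {
      val memory        = 128 // bytes given to the map (tiny, for illustration)
      val hashBytes     = 16  // MD5 digest of the key
      val offsetBytes   = 8
      val versionBytes  = 8   // works only because the version is a fixed-width long
      val bytesPerEntry = hashBytes + offsetBytes + versionBytes

      println(s"slots = ${memory / bytesPerEntry}") // 128 / 32 = 4; unanswerable for variable-size values
    }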