Re: RE: RE: [DISCUSS] KIP-280: Enhanced log compaction

2018-04-24 Thread Guozhang Wang
> That is definitely clearer, KIP updated! > > > > From: Guozhang Wang > Sent: 23 April 2018 23:44 > To: dev@kaf

Re: RE: RE: [DISCUSS] KIP-280: Enhanced log compaction

2018-04-24 Thread Luís Cabral
That is definitely clearer, KIP updated! From: Guozhang Wang Sent: 23 April 2018 23:44 To: dev@kafka.apache.org Subject: Re: RE: [DISCUSS] KIP-280: Enhanced log compaction Thanks Luís. The KIP looks good to me. Just

RE: RE: [DISCUSS] KIP-280: Enhanced log compaction

2018-04-23 Thread Luís Cabral
That is definitely clearer, KIP updated! From: Guozhang Wang Sent: 23 April 2018 23:44 To: dev@kafka.apache.org Subject: Re: RE: [DISCUSS] KIP-280: Enhanced log compaction Thanks Luís. The KIP looks good to me. Just that what I left as a minor: `When both records being compared contain a

Re: RE: [DISCUSS] KIP-280: Enhanced log compaction

2018-04-23 Thread Guozhang Wang
wrote: > Hello Guozhang, > > The KIP is now updated to reflect this choice in strategy. > Please let me know your thoughts there. > > Kind Regards, > Luís > > From: Guozhang Wang > Sent: 23 April 2018 19:32 > To: dev@kafka.apache.org > Subject: Re: RE: [DISCUSS]

RE: RE: [DISCUSS] KIP-280: Enhanced log compaction

2018-04-23 Thread Luís Cabral
Hello Guozhang, The KIP is now updated to reflect this choice in strategy. Please let me know your thoughts there. Kind Regards, Luís From: Guozhang Wang Sent: 23 April 2018 19:32 To: dev@kafka.apache.org Subject: Re: RE: [DISCUSS] KIP-280: Enhanced log compaction Hi Luis, I think by

Re: RE: [DISCUSS] KIP-280: Enhanced log compaction

2018-04-23 Thread Guozhang Wang
ich ends up > being ok for my own use case... > This would then generally guarantee the lexicographic ordering, as you say. > Is this what you mean? Should I then add this restriction to the KIP? > > Cheers, > Luis > > From: Guozhang Wang > Sent: 23 April 2018 17:55 > To:

RE: RE: [DISCUSS] KIP-280: Enhanced log compaction

2018-04-23 Thread Luís Cabral
antee the lexicographic ordering, as you say. Is this what you mean? Should I then add this restriction to the KIP? Cheers, Luis From: Guozhang Wang Sent: 23 April 2018 17:55 To: dev@kafka.apache.org Subject: Re: RE: [DISCUSS] KIP-280: Enhanced log compaction Hello Luis, Thanks for your

Re: RE: [DISCUSS] KIP-280: Enhanced log compaction

2018-04-23 Thread Guozhang Wang
Hello Luis, Thanks for your email, replying to your points in the following: > I don't personally see advantages in it, but also the only disadvantage that I can think of is putting multiple meanings on this field. If we do not treat timestamp as a special value of the config, then I cannot use

Re: RE: [DISCUSS] KIP-280: Enhanced log compaction

2018-04-23 Thread Luís Cabral
Hi Guozhang, Thank you very much for the patience in explaining your points, I've learnt quite a bit in researching and experimenting after your replies. bq. I still think it is worth defining `timestamp` as a special compaction value I don't personally see advantages in it, but also the only

Re: RE: [DISCUSS] KIP-280: Enhanced log compaction

2018-04-20 Thread Guozhang Wang
Hi Luís, What I'm thinking primarily is that we only need to compare the compaction values as LONG for the offset and timestamp "type" (I still think it is worth defining `timestamp` as a special compaction value, with the reasons below). Not sure if you've seen my other comment earlier regarding
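
To illustrate the point about comparing the offset and timestamp "types" as LONG, here is a minimal sketch. It assumes the config accepts the literal values "offset" and "timestamp"; the `Rec` class and both method names are hypothetical and are not Kafka APIs. Header-based values are deliberately left out, since how to compare them is what the thread is still debating, and the tie-breaking rule shown is an arbitrary choice for the sketch.

```java
class CompactionValueSketch {
    // Hypothetical stand-in for a log record; not a Kafka class.
    static final class Rec {
        final long offset;
        final long timestamp;

        Rec(long offset, long timestamp) {
            this.offset = offset;
            this.timestamp = timestamp;
        }
    }

    // Picks the long value that represents the record under the given strategy.
    static long compactionValue(Rec r, String strategy) {
        switch (strategy) {
            case "offset":
                return r.offset;
            case "timestamp":
                return r.timestamp;
            default:
                // Assumption: anything else would name a header, which this sketch does not cover.
                throw new IllegalArgumentException("not covered by this sketch: " + strategy);
        }
    }

    // Returns the record that would be kept; on a tie the later argument wins (arbitrary here).
    static Rec keep(Rec earlier, Rec later, String strategy) {
        long a = compactionValue(earlier, strategy);
        long b = compactionValue(later, strategy);
        return Long.compare(b, a) >= 0 ? later : earlier;
    }

    public static void main(String[] args) {
        Rec first = new Rec(10L, 2000L);  // lower offset, newer timestamp
        Rec second = new Rec(11L, 1000L); // higher offset, older timestamp
        System.out.println(keep(first, second, "offset") == second);   // true
        System.out.println(keep(first, second, "timestamp") == first); // true
    }
}
```

With two records for the same key, the "offset" strategy keeps the one appended later, while "timestamp" keeps the one with the larger timestamp even if it was appended earlier.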

Re: RE: [DISCUSS] KIP-280: Enhanced log compaction

2018-04-20 Thread Luís Cabral
Guozhang, is this reply ok with you? If you insist on the byte[] comparison directly, then I would need some suggestions on how to represent a "version" with it, and then the KIP could be changed to that. On Tuesday, April 17, 2018, 2:44:16 PM GMT+2, Luís Cabral wrote: Oops, missed

Re: RE: [DISCUSS] KIP-280: Enhanced log compaction

2018-04-17 Thread Luís Cabral
Oops, missed that email... bq. It is because when we compare the bytes we do not treat them as longs at all, so we just compare them based on bytes; I admit that if users' header types have some semantic meanings (e.g. it is encoded from a long) we are forcing them to choose the encoder that

Re: RE: [DISCUSS] KIP-280: Enhanced log compaction

2018-04-17 Thread Ted Yu
Can you respond to: http://search-hadoop.com/m/Kafka/uyzND1OlYaSzZ3SM1?subj=Re+RE+DISCUSS+KIP+280+Enhanced+log+compaction Original message From: Luís Cabral Date: 4/17/18 2:41 AM (GMT-08:00) To: dev@kafka.apache.org Subject: Re: RE: [DISCUSS] KIP-280: Enhanced log compaction

Re: RE: [DISCUSS] KIP-280: Enhanced log compaction

2018-04-17 Thread Luís Cabral
Hi all, There aren't that many discussions on this KIP, does that mean it should now move to voting? I'm not sure on the process here... Cheers

Re: RE: [DISCUSS] KIP-280: Enhanced log compaction

2018-04-11 Thread Guozhang Wang
Yup, lazy copy-paste punishment :P Guozhang On Wed, Apr 11, 2018 at 10:19 AM, Ted Yu wrote: > bq. 2. if the config value is "timestamp", look into the offset field; > > I think you meant looking into timestamp field. > > Cheers > > On Wed, Apr 11, 2018 at 10:18 AM, Guozhang Wang > wrote: > >

Re: RE: [DISCUSS] KIP-280: Enhanced log compaction

2018-04-11 Thread Guozhang Wang
If you are referring to, for example: -4611686018427387904 > 0; -4611686018427387904 > 4611686018427387903 It is because when we compare the bytes we do not treat them as longs at all, so we just compare them based on bytes; I admit that if users' header types have some semantic meanings (e.g. i
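
The two inequalities above can be reproduced with a short sketch. It assumes the header value is the 8-byte big-endian two's-complement encoding of a long and that the byte arrays are compared lexicographically with each byte treated as unsigned; the thread does not spell out either detail, and the class and method names are made up for illustration.

```java
import java.nio.ByteBuffer;

class ByteOrderDemo {
    // Encodes a long as 8 big-endian bytes (ByteBuffer's default byte order).
    static byte[] encode(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }

    // Lexicographic comparison that treats every byte as unsigned (0..255).
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(a.length, b.length);
    }

    public static void main(String[] args) {
        long negative = -4611686018427387904L; // 0xC000000000000000
        long positive = 4611686018427387903L;  // 0x3FFFFFFFFFFFFFFF

        // As raw bytes the negative value sorts above both 0 and the positive value,
        // because its leading byte 0xC0 is larger than 0x00 and 0x3F.
        System.out.println(compareBytes(encode(negative), encode(0L)) > 0);       // true
        System.out.println(compareBytes(encode(negative), encode(positive)) > 0); // true

        // As longs the order is the opposite.
        System.out.println(Long.compare(negative, positive) < 0);                 // true
    }
}
```

The sign bit is the highest bit of the first byte, so any negative long starts with a byte of 0x80 or above and outranks every non-negative long under this byte-wise ordering.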

Re: RE: [DISCUSS] KIP-280: Enhanced log compaction

2018-04-11 Thread Ted Yu
bq. 2. if the config value is "timestamp", look into the offset field; I think you meant looking into timestamp field. Cheers On Wed, Apr 11, 2018 at 10:18 AM, Guozhang Wang wrote: > > I do not mean that it is "used", but if what you meant is that you would > prefer to use that field instead o

Re: RE: [DISCUSS] KIP-280: Enhanced log compaction

2018-04-11 Thread Guozhang Wang
> I do not mean that it is "used", but if what you meant is that you would prefer to use that field instead of a header? > This is in relation to a previous point of yours: I think maybe we have a mis-communication here: I'm not against the idea of using headers, but just trying to argue that we c

Re: RE: [DISCUSS] KIP-280: Enhanced log compaction

2018-04-11 Thread Luís Cabral
Hi Guozhang, bq. I'm not sure I understand your statement that it is used to determine the "version" of the record I do not mean that it is "used", but if what you meant is that you would prefer to use that field instead of a header? This is in relation to a previous point of yours: >>> 1) I'm

Re: RE: [DISCUSS] KIP-280: Enhanced log compaction

2018-04-11 Thread Guozhang Wang
Hello Luís, Regarding the timestamp: it is designed to be mainly used for indicating the time when this record is generated (i.e. CREATE_TIME at the producer side, it will set the timestamp), or when the record has been appended to Kafka brokers (i.e. LOG_APPEND_TIME at the broker side, where prod

Re: RE: [DISCUSS] KIP-280: Enhanced log compaction

2018-04-11 Thread Luís Cabral
Hi all, On my own previous statement: bq. Not that I mind doing it directly (I intend to use a Java client), but please be aware that a String binary representation is based on the charset encoding, while the Long binary representation varies according to the language. I went back to double