I updated the KIP page <https://cwiki.apache.org/confluence/display/KAFKA/KIP-110%3A+Add+Codec+for+ZStandard+Compression> following the discussion here. Please take a look when you are free. If you have any opinion, don't hesitate to give me a message.
Best, Dongjin On Fri, Aug 31, 2018 at 11:35 PM Dongjin Lee <dong...@apache.org> wrote: > I just updated the draft implementation[^1], rebasing against the latest > trunk and implementing error routine (i.e., Error code 74 for > UnsupportedCompressionTypeException.) Since we decided to disallow all > fetch request below version 2.1.0 for the topics specifying ZStandard, I > added an error logic only. > > Please have a look when you are free. > > Thanks, > Dongjin > > [^1]: Please check the last commit here: > https://github.com/apache/kafka/pull/2267 > > On Thu, Aug 23, 2018, 8:55 AM Dongjin Lee <dong...@apache.org> wrote: > >> Jason, >> >> Great. +1 for UNSUPPORTED_COMPRESSION_TYPE. >> >> Best, >> Dongjin >> >> On Thu, Aug 23, 2018 at 8:19 AM Jason Gustafson <ja...@confluent.io> >> wrote: >> >>> Hey Dongjin, >>> >>> Yeah that's right. For what it's worth, librdkafka also appears to handle >>> unexpected error codes. I expect that most client implementations would >>> either pass through the raw type or convert to an enum using something >>> like >>> what the java client does. Since we're expecting the client to fail >>> anyway, >>> I'm probably in favor of using the UNSUPPORTED_COMPRESSION_TYPE error >>> code. >>> >>> -Jason >>> >>> On Wed, Aug 22, 2018 at 1:46 AM, Dongjin Lee <dong...@apache.org> wrote: >>> >>> > Jason and Ismael, >>> > >>> > It seems like the only thing we need to regard if we define a new error >>> > code (i.e., UNSUPPORTED_COMPRESSION_TYPE) would be the implementation >>> of >>> > the other language clients, right? At least, this strategy causes any >>> > problem for Java client. Do I understand correctly? >>> > >>> > Thanks, >>> > Dongjin >>> > >>> > On Wed, Aug 22, 2018 at 5:43 PM Dongjin Lee <dong...@apache.org> >>> wrote: >>> > >>> > > Jason, >>> > > >>> > > > I think we would only use this error code when we /know/ that zstd >>> was >>> > > in use and the client doesn't support it? This is true if either 1) >>> the >>> > > message needs down-conversion and we encounter a zstd compressed >>> message, >>> > > or 2) if the topic is explicitly configured to use zstd. >>> > > >>> > > Yes, it is right. And you know, the case 1 includes 1.a) old clients' >>> > > request v0, v1 records or 1.b) implicit zstd, the compression type of >>> > > "producer" with Zstd compressed data. >>> > > >>> > > > However, if the compression type is set to "producer," then the >>> fetched >>> > > data may or may not be compressed with zstd. In this case, we return >>> the >>> > > data to the client and expect it to fail parsing. Is that correct? >>> > > >>> > > Exactly. >>> > > >>> > > Following your message, I reviewed the implementation of >>> > > `KafkaApis#handleFetchRequest,` which handles the fetch request. And >>> > found >>> > > that the information we can use is like the following: >>> > > >>> > > 1. Client's fetch request version. (`versionId` variable) >>> > > 2. Log's compression type. (`logConfig` variable) >>> > > >>> > > We can't detect the actual compression type of the data, unless we >>> > inspect >>> > > the `RecordBatch` included in the `Records` instance (i.e., >>> > > `unconvertedRecords` variable.) Since it requires some performance >>> issue, >>> > > it is not our option - in short, we can't be sure if given chunks of >>> data >>> > > are compressed with zstd or not. >>> > > >>> > > So, conclusion: we can return an error in the case of 1.a and 2 >>> easily, >>> > > with the information above. In the case 1.b (implicit zstd), we can >>> just >>> > > return the data by do nothing special and expect it to fail parsing. >>> > > >>> > > Thanks, >>> > > Dongjin >>> > > >>> > > On Wed, Aug 22, 2018 at 12:02 PM Ismael Juma <isma...@gmail.com> >>> wrote: >>> > > >>> > >> Jason, that's an interesting point regarding the Java client. Do we >>> know >>> > >> what clients in other languages do in these cases? >>> > >> >>> > >> Ismael >>> > >> >>> > >> On Tue, 21 Aug 2018, 17:30 Jason Gustafson, <ja...@confluent.io> >>> wrote: >>> > >> >>> > >> > Hi Dongjin, >>> > >> > >>> > >> > One of the complications is that old versions of the API will not >>> > >> expect a >>> > >> > new error code. However, since we expect this to be a fatal error >>> > anyway >>> > >> > for old clients, it may still be more useful to return the correct >>> > error >>> > >> > code. For example, the Kafka clients use the following code to >>> convert >>> > >> the >>> > >> > error code: >>> > >> > >>> > >> > public static Errors forCode(short code) { >>> > >> > Errors error = codeToError.get(code); >>> > >> > if (error != null) { >>> > >> > return error; >>> > >> > } else { >>> > >> > log.warn("Unexpected error code: {}.", code); >>> > >> > return UNKNOWN_SERVER_ERROR; >>> > >> > } >>> > >> > } >>> > >> > >>> > >> > If we return an unsupported error code, it will be converted to an >>> > >> UNKNOWN >>> > >> > error, but at least we will get the message in the log with the >>> > correct >>> > >> > code. That seems preferable to returning a misleading error code. >>> So I >>> > >> > wonder if we can use the new UNSUPPORTED_COMPRESSION_TYPE error >>> even >>> > for >>> > >> > older versions. >>> > >> > >>> > >> > Also, one question just to check my understanding. I think we >>> would >>> > only >>> > >> > use this error code when we /know/ that zstd was in use and the >>> client >>> > >> > doesn't support it? This is true if either 1) the message needs >>> > >> > down-conversion and we encounter a zstd compressed message, or 2) >>> if >>> > the >>> > >> > topic is explicitly configured to use zstd. However, if the >>> > compression >>> > >> > type is set to "producer," then the fetched data may or may not be >>> > >> > compressed with zstd. In this case, we return the data to the >>> client >>> > and >>> > >> > expect it to fail parsing. Is that correct? >>> > >> > >>> > >> > Thanks, >>> > >> > Jason >>> > >> > >>> > >> > >>> > >> > >>> > >> > On Tue, Aug 21, 2018 at 9:08 AM, Dongjin Lee <dong...@apache.org> >>> > >> wrote: >>> > >> > >>> > >> > > Ismael, Jason and all, >>> > >> > > >>> > >> > > I rewrote the backward compatibility strategy & its alternatives >>> > like >>> > >> > > following, based on Ismael & Jason's comments. Since it is not >>> > >> updated to >>> > >> > > the wiki yet, don't hesitate to give me a message if you have >>> any >>> > >> opinion >>> > >> > > on it. >>> > >> > > >>> > >> > > ``` >>> > >> > > *Backward Compatibility* >>> > >> > > >>> > >> > > We need to establish some backward-compatibility strategy for >>> the >>> > >> case an >>> > >> > > old client subscribes a topic using ZStandard implicitly (i.e., >>> > >> > > 'compression.type' configuration of given topic is 'producer' >>> and >>> > the >>> > >> > > producer compressed the records with ZStandard). We have the >>> > following >>> > >> > > options for this situation: >>> > >> > > >>> > >> > > *A. Support ZStandard to the old clients which can understand >>> v0, v1 >>> > >> > > messages only.* >>> > >> > > >>> > >> > > This strategy necessarily requires the down-conversion of v2 >>> message >>> > >> > > compressed with Zstandard into v0 or v1 messages, which means a >>> > >> > > considerable performance degradation. So we rejected this >>> strategy. >>> > >> > > >>> > >> > > *B. Bump the API version and support only v2-available clients* >>> > >> > > >>> > >> > > With this approach, we can message the old clients that they >>> are old >>> > >> and >>> > >> > > should be upgraded. However, there are still several options >>> for the >>> > >> > Error >>> > >> > > code. >>> > >> > > >>> > >> > > *B.1. INVALID_REQUEST (42)* >>> > >> > > >>> > >> > > This option gives the client so little information; the user >>> can be >>> > >> > > confused about why the client worked correctly in the past >>> suddenly >>> > >> > > encounters a problem. So we rejected this strategy. >>> > >> > > >>> > >> > > *B.2. CORRUPT_MESSAGE (2)* >>> > >> > > >>> > >> > > This option gives inaccurate information; the user can be >>> surprised >>> > >> and >>> > >> > > misunderstand that the log files are broken in some way. So we >>> > >> rejected >>> > >> > > this strategy. >>> > >> > > >>> > >> > > *B.3 UNSUPPORTED_FOR_MESSAGE_FORMAT (43)* >>> > >> > > >>> > >> > > The advantage of this approach is we don't need to define a new >>> > error >>> > >> > code; >>> > >> > > we can reuse it and that's all. >>> > >> > > >>> > >> > > The disadvantage of this approach is that it is also a little >>> bit >>> > >> vague; >>> > >> > > This error code is defined as a work for KIP-98[^1] and now >>> returned >>> > >> in >>> > >> > the >>> > >> > > transaction error. >>> > >> > > >>> > >> > > *B.4. UNSUPPORTED_COMPRESSION_TYPE (new)* >>> > >> > > >>> > >> > > The advantage of this approach is that it is clear and provides >>> an >>> > >> exact >>> > >> > > description. The disadvantage is we need to add a new error >>> code. >>> > >> > > ``` >>> > >> > > >>> > >> > > *It seems like what we need to choose is now so clear: >>> > >> > > UNSUPPORTED_FOR_MESSAGE_FORMAT (B.3) or >>> UNSUPPORTED_COMPRESSION_TYPE >>> > >> > > (B.4).* >>> > >> > > The first one doesn't need a new error message but the latter is >>> > more >>> > >> > > explicit. Which one do you prefer? Since all of you have much >>> more >>> > >> > > experience and knowledge than me, I will follow your decision. >>> The >>> > >> wiki >>> > >> > > page will be updated following the decision also. >>> > >> > > >>> > >> > > Best, >>> > >> > > Dongjin >>> > >> > > >>> > >> > > [^1]: https://issues.apache.org/jira/browse/KAFKA-4990 >>> > >> > > >>> > >> > > On Sun, Aug 19, 2018 at 4:58 AM Ismael Juma <isma...@gmail.com> >>> > >> wrote: >>> > >> > > >>> > >> > > > Sounds reasonable to me. >>> > >> > > > >>> > >> > > > Ismael >>> > >> > > > >>> > >> > > > On Sat, 18 Aug 2018, 12:20 Jason Gustafson, < >>> ja...@confluent.io> >>> > >> > wrote: >>> > >> > > > >>> > >> > > > > Hey Ismael, >>> > >> > > > > >>> > >> > > > > Your summary looks good to me. I think it might also be a >>> good >>> > >> idea >>> > >> > to >>> > >> > > > add >>> > >> > > > > a new UNSUPPORTED_COMPRESSION_TYPE error code to go along >>> with >>> > the >>> > >> > > > version >>> > >> > > > > bumps. We won't be able to use it for old api versions >>> since the >>> > >> > > clients >>> > >> > > > > will not understand it, but we can use it going forward so >>> that >>> > >> we're >>> > >> > > not >>> > >> > > > > stuck in a similar situation with a new message format and >>> a new >>> > >> > codec >>> > >> > > to >>> > >> > > > > support. Another option is to use UNSUPPORTED_FOR_MESSAGE_ >>> > FORMAT, >>> > >> but >>> > >> > > it >>> > >> > > > is >>> > >> > > > > not as explicit. >>> > >> > > > > >>> > >> > > > > -Jason >>> > >> > > > > >>> > >> > > > > On Fri, Aug 17, 2018 at 5:19 PM, Ismael Juma < >>> ism...@juma.me.uk >>> > > >>> > >> > > wrote: >>> > >> > > > > >>> > >> > > > > > Hi Dongjin and Jason, >>> > >> > > > > > >>> > >> > > > > > I would agree. My summary: >>> > >> > > > > > >>> > >> > > > > > 1. Support zstd with message format 2 only. >>> > >> > > > > > 2. Bump produce and fetch request versions. >>> > >> > > > > > 3. Provide broker errors whenever possible based on the >>> > request >>> > >> > > version >>> > >> > > > > and >>> > >> > > > > > rely on clients for the cases where the broker can't >>> validate >>> > >> > > > efficiently >>> > >> > > > > > (example message format 2 consumer that supports the >>> latest >>> > >> fetch >>> > >> > > > version >>> > >> > > > > > but doesn't support zstd). >>> > >> > > > > > >>> > >> > > > > > If there's general agreement on this, I suggest we update >>> the >>> > >> KIP >>> > >> > to >>> > >> > > > > state >>> > >> > > > > > the proposal and to move the rejected options to its own >>> > >> section. >>> > >> > And >>> > >> > > > > then >>> > >> > > > > > start a vote! >>> > >> > > > > > >>> > >> > > > > > Ismael >>> > >> > > > > > >>> > >> > > > > > On Fri, Aug 17, 2018 at 4:00 PM Jason Gustafson < >>> > >> > ja...@confluent.io> >>> > >> > > > > > wrote: >>> > >> > > > > > >>> > >> > > > > > > Hi Dongjin, >>> > >> > > > > > > >>> > >> > > > > > > Yes, that's a good summary. For clients which support >>> v2, >>> > the >>> > >> > > client >>> > >> > > > > can >>> > >> > > > > > > parse the message format and hopefully raise a useful >>> error >>> > >> > message >>> > >> > > > > > > indicating the unsupported compression type. For older >>> > >> clients, >>> > >> > our >>> > >> > > > > > options >>> > >> > > > > > > are probably (1) to down-convert to the old format >>> using no >>> > >> > > > compression >>> > >> > > > > > > type, or (2) to return an error code. I'm leaning >>> toward the >>> > >> > latter >>> > >> > > > as >>> > >> > > > > > the >>> > >> > > > > > > simpler solution, but the challenge is finding a good >>> error >>> > >> code. >>> > >> > > Two >>> > >> > > > > > > possibilities might be INVALID_REQUEST or >>> CORRUPT_MESSAGE. >>> > The >>> > >> > > > downside >>> > >> > > > > > is >>> > >> > > > > > > that old clients probably won't get a helpful message. >>> > >> However, >>> > >> > at >>> > >> > > > > least >>> > >> > > > > > > the behavior will be consistent in the sense that all >>> > clients >>> > >> > will >>> > >> > > > fail >>> > >> > > > > > if >>> > >> > > > > > > they do not support zstandard. >>> > >> > > > > > > >>> > >> > > > > > > What do you think? >>> > >> > > > > > > >>> > >> > > > > > > Thanks, >>> > >> > > > > > > Jason >>> > >> > > > > > > >>> > >> > > > > > > On Fri, Aug 17, 2018 at 8:08 AM, Dongjin Lee < >>> > >> dong...@apache.org >>> > >> > > >>> > >> > > > > wrote: >>> > >> > > > > > > >>> > >> > > > > > > > Thanks Jason, I reviewed the down-converting logic >>> > following >>> > >> > your >>> > >> > > > > > > > explanation.[^1] You mean the following routines, >>> right? >>> > >> > > > > > > > >>> > >> > > > > > > > - >>> > >> > > > > > > > https://github.com/apache/kafka/blob/trunk/core/src/ >>> > >> > > > > > > > main/scala/kafka/server/KafkaApis.scala#L534 >>> > >> > > > > > > > - >>> > >> > > > > > > > >>> https://github.com/apache/kafka/blob/trunk/clients/src/ >>> > >> > > > > > > > main/java/org/apache/kafka/common/record/ >>> > >> > > LazyDownConversionRecords. >>> > >> > > > > > > > java#L165 >>> > >> > > > > > > > - >>> > >> > > > > > > > >>> https://github.com/apache/kafka/blob/trunk/clients/src/ >>> > >> > > > > > > > >>> > >> main/java/org/apache/kafka/common/record/RecordsUtil.java#L40 >>> > >> > > > > > > > >>> > >> > > > > > > > It seems like your stance is like following: >>> > >> > > > > > > > >>> > >> > > > > > > > 1. In principle, Kafka does not change the compression >>> > codec >>> > >> > when >>> > >> > > > > > > > down-converting, since it requires inspecting the >>> fetched >>> > >> data, >>> > >> > > > which >>> > >> > > > > > is >>> > >> > > > > > > > expensive. >>> > >> > > > > > > > 2. However, there are some cases the fetched data is >>> > >> inspected >>> > >> > > > > anyway. >>> > >> > > > > > In >>> > >> > > > > > > > this case, we can provide compression conversion from >>> > >> Zstandard >>> > >> > > to >>> > >> > > > > > > > classical ones[^2]. >>> > >> > > > > > > > >>> > >> > > > > > > > And from what I understand, the cases where the client >>> > >> without >>> > >> > > > > > ZStandard >>> > >> > > > > > > > support receives ZStandard compressed records can be >>> > >> organized >>> > >> > > into >>> > >> > > > > two >>> > >> > > > > > > > cases: >>> > >> > > > > > > > >>> > >> > > > > > > > a. The 'compression.type' configuration of given >>> topic is >>> > >> > > > 'producer' >>> > >> > > > > > and >>> > >> > > > > > > > the producer compressed the records with ZStandard. >>> (that >>> > >> is, >>> > >> > > using >>> > >> > > > > > > > ZStandard implicitly.) >>> > >> > > > > > > > b. The 'compression.type' configuration of given >>> topic is >>> > >> > > 'zstd'; >>> > >> > > > > that >>> > >> > > > > > > is, >>> > >> > > > > > > > using ZStandard explicitly. >>> > >> > > > > > > > >>> > >> > > > > > > > As you stated, we don't have to handle the case b >>> > specially. >>> > >> > So, >>> > >> > > It >>> > >> > > > > > seems >>> > >> > > > > > > > like we can narrow the focus of the problem by joining >>> > case >>> > >> 1 >>> > >> > and >>> > >> > > > > case >>> > >> > > > > > b >>> > >> > > > > > > > like the following: >>> > >> > > > > > > > >>> > >> > > > > > > > > Given the topic with 'producer' as its >>> > 'compression.type' >>> > >> > > > > > > configuration, >>> > >> > > > > > > > ZStandard compressed records and old client without >>> > >> ZStandard, >>> > >> > is >>> > >> > > > > there >>> > >> > > > > > > any >>> > >> > > > > > > > case we need to inspect the records and can change the >>> > >> > > compression >>> > >> > > > > > type? >>> > >> > > > > > > If >>> > >> > > > > > > > so, can we provide compression type converting? >>> > >> > > > > > > > >>> > >> > > > > > > > Do I understand correctly? >>> > >> > > > > > > > >>> > >> > > > > > > > Best, >>> > >> > > > > > > > Dongjin >>> > >> > > > > > > > >>> > >> > > > > > > > [^1]: I'm sorry, I found that I was a little bit >>> > >> > misunderstanding >>> > >> > > > how >>> > >> > > > > > API >>> > >> > > > > > > > version works, after reviewing the downconvert logic >>> & the >>> > >> > > protocol >>> > >> > > > > > > > documentation <https://kafka.apache.org/protocol>. >>> > >> > > > > > > > [^2]: None, Gzip, Snappy, Lz4. >>> > >> > > > > > > > >>> > >> > > > > > > > On Tue, Aug 14, 2018 at 2:16 AM Jason Gustafson < >>> > >> > > > ja...@confluent.io> >>> > >> > > > > > > > wrote: >>> > >> > > > > > > > >>> > >> > > > > > > > > > >>> > >> > > > > > > > > > But in my opinion, since the client will fail >>> with the >>> > >> API >>> > >> > > > > version, >>> > >> > > > > > > so >>> > >> > > > > > > > we >>> > >> > > > > > > > > > don't need to down-convert the messages anyway. >>> Isn't >>> > >> it? >>> > >> > > So, I >>> > >> > > > > > think >>> > >> > > > > > > > we >>> > >> > > > > > > > > > don't care about this case. (I'm sorry, I am not >>> > >> familiar >>> > >> > > with >>> > >> > > > > > > > > down-convert >>> > >> > > > > > > > > > logic.) >>> > >> > > > > > > > > >>> > >> > > > > > > > > >>> > >> > > > > > > > > Currently the broker down-converts automatically >>> when it >>> > >> > > receives >>> > >> > > > > an >>> > >> > > > > > > old >>> > >> > > > > > > > > version of the fetch request (a version which is >>> known >>> > to >>> > >> > > predate >>> > >> > > > > the >>> > >> > > > > > > > > message format in use). Typically when >>> down-converting >>> > the >>> > >> > > > message >>> > >> > > > > > > > format, >>> > >> > > > > > > > > we use the same compression type, but there is not >>> much >>> > >> point >>> > >> > > in >>> > >> > > > > > doing >>> > >> > > > > > > so >>> > >> > > > > > > > > when we know the client doesn't support it. So if >>> > >> zstandard >>> > >> > is >>> > >> > > in >>> > >> > > > > > use, >>> > >> > > > > > > > and >>> > >> > > > > > > > > we have to down-convert anyway, then we can choose >>> to >>> > use >>> > >> a >>> > >> > > > > different >>> > >> > > > > > > > > compression type or no compression type. >>> > >> > > > > > > > > >>> > >> > > > > > > > > From my perspective, there is no significant >>> downside to >>> > >> > > bumping >>> > >> > > > > the >>> > >> > > > > > > > > protocol version and it has several potential >>> benefits. >>> > >> > Version >>> > >> > > > > bumps >>> > >> > > > > > > are >>> > >> > > > > > > > > cheap. The main question mark in my mind is about >>> > >> > > > down-conversion. >>> > >> > > > > > > > Figuring >>> > >> > > > > > > > > out whether down-conversion is needed is hard >>> generally >>> > >> > without >>> > >> > > > > > > > inspecting >>> > >> > > > > > > > > the fetched data, which is expensive. I think we >>> agree >>> > in >>> > >> > > > principle >>> > >> > > > > > > that >>> > >> > > > > > > > we >>> > >> > > > > > > > > do not want to have to pay this cost generally and >>> > prefer >>> > >> the >>> > >> > > > > clients >>> > >> > > > > > > to >>> > >> > > > > > > > > fail when they see an unhandled compression type. >>> The >>> > >> point I >>> > >> > > was >>> > >> > > > > > > making >>> > >> > > > > > > > is >>> > >> > > > > > > > > that there are some cases where we are either >>> inspecting >>> > >> the >>> > >> > > data >>> > >> > > > > > > anyway >>> > >> > > > > > > > > (because we have to down-convert the message >>> format), or >>> > >> we >>> > >> > > have >>> > >> > > > an >>> > >> > > > > > > easy >>> > >> > > > > > > > > way to tell whether zstandard is in use (the topic >>> has >>> > it >>> > >> > > > > configured >>> > >> > > > > > > > > explicitly). In the latter case, we don't have to >>> handle >>> > >> it >>> > >> > > > > > specially. >>> > >> > > > > > > > But >>> > >> > > > > > > > > we do have to decide how we will handle >>> down-conversion >>> > to >>> > >> > > older >>> > >> > > > > > > formats. >>> > >> > > > > > > > > >>> > >> > > > > > > > > -Jason >>> > >> > > > > > > > > >>> > >> > > > > > > > > On Sun, Aug 12, 2018 at 5:15 PM, Dongjin Lee < >>> > >> > > dong...@apache.org >>> > >> > > > > >>> > >> > > > > > > wrote: >>> > >> > > > > > > > > >>> > >> > > > > > > > > > Colin and Jason, >>> > >> > > > > > > > > > >>> > >> > > > > > > > > > Thanks for your opinions. In summarizing, the >>> Pros and >>> > >> Cons >>> > >> > > of >>> > >> > > > > > > bumping >>> > >> > > > > > > > > > fetch API version are: >>> > >> > > > > > > > > > >>> > >> > > > > > > > > > Cons: >>> > >> > > > > > > > > > >>> > >> > > > > > > > > > - The Broker can't know whether a given message >>> batch >>> > is >>> > >> > > > > compressed >>> > >> > > > > > > > with >>> > >> > > > > > > > > > zstd or not. >>> > >> > > > > > > > > > - Need some additional logic for the topic >>> explicitly >>> > >> > > > configured >>> > >> > > > > to >>> > >> > > > > > > use >>> > >> > > > > > > > > > zstd. >>> > >> > > > > > > > > > >>> > >> > > > > > > > > > Pros: >>> > >> > > > > > > > > > >>> > >> > > > > > > > > > - The broker doesn't need to conduct expensive >>> > >> > > down-conversion. >>> > >> > > > > > > > > > - Can message the users to update their client. >>> > >> > > > > > > > > > >>> > >> > > > > > > > > > So, opinions for the backward-compatibility >>> policy by >>> > >> far: >>> > >> > > > > > > > > > >>> > >> > > > > > > > > > - A: bump the API version - +2 (Colin, Jason) >>> > >> > > > > > > > > > - B: leave unchanged - +1 (Viktor) >>> > >> > > > > > > > > > >>> > >> > > > > > > > > > Here are my additional comments: >>> > >> > > > > > > > > > >>> > >> > > > > > > > > > @Colin >>> > >> > > > > > > > > > >>> > >> > > > > > > > > > I greatly appreciate your response. In the case >>> of the >>> > >> > > > dictionary >>> > >> > > > > > > > > support, >>> > >> > > > > > > > > > of course, this issue should be addressed later >>> so we >>> > >> don't >>> > >> > > > need >>> > >> > > > > it >>> > >> > > > > > > in >>> > >> > > > > > > > > the >>> > >> > > > > > > > > > first version. You are right - it is not late to >>> try >>> > it >>> > >> > after >>> > >> > > > > some >>> > >> > > > > > > > > > benchmarks. What I mean is, we should keep in >>> mind on >>> > >> that >>> > >> > > > > > potential >>> > >> > > > > > > > > > feature. >>> > >> > > > > > > > > > >>> > >> > > > > > > > > > @Jason >>> > >> > > > > > > > > > >>> > >> > > > > > > > > > You wrote, >>> > >> > > > > > > > > > >>> > >> > > > > > > > > > > Similarly, if we have to down-convert anyway >>> because >>> > >> the >>> > >> > > > client >>> > >> > > > > > > does >>> > >> > > > > > > > > not >>> > >> > > > > > > > > > understand the message format, then we could also >>> use >>> > a >>> > >> > > > different >>> > >> > > > > > > > > > compression type. >>> > >> > > > > > > > > > >>> > >> > > > > > > > > > But in my opinion, since the client will fail >>> with the >>> > >> API >>> > >> > > > > version, >>> > >> > > > > > > so >>> > >> > > > > > > > we >>> > >> > > > > > > > > > don't need to down-convert the messages anyway. >>> Isn't >>> > >> it? >>> > >> > > So, I >>> > >> > > > > > think >>> > >> > > > > > > > we >>> > >> > > > > > > > > > don't care about this case. (I'm sorry, I am not >>> > >> familiar >>> > >> > > with >>> > >> > > > > > > > > down-convert >>> > >> > > > > > > > > > logic.) >>> > >> > > > > > > > > > >>> > >> > > > > > > > > > Please give more opinions. Thanks! >>> > >> > > > > > > > > > >>> > >> > > > > > > > > > - Dongjin >>> > >> > > > > > > > > > >>> > >> > > > > > > > > > >>> > >> > > > > > > > > > On Wed, Aug 8, 2018 at 6:41 AM Jason Gustafson < >>> > >> > > > > ja...@confluent.io >>> > >> > > > > > > >>> > >> > > > > > > > > wrote: >>> > >> > > > > > > > > > >>> > >> > > > > > > > > > > Hey Colin, >>> > >> > > > > > > > > > > >>> > >> > > > > > > > > > > The problem for the fetch API is that the broker >>> > does >>> > >> not >>> > >> > > > > > generally >>> > >> > > > > > > > > know >>> > >> > > > > > > > > > if >>> > >> > > > > > > > > > > a batch was compressed with zstd unless it >>> parses >>> > it. >>> > >> I >>> > >> > > think >>> > >> > > > > the >>> > >> > > > > > > > goal >>> > >> > > > > > > > > > here >>> > >> > > > > > > > > > > is to avoid the expensive down-conversion that >>> is >>> > >> needed >>> > >> > to >>> > >> > > > > > ensure >>> > >> > > > > > > > > > > compatibility because it is only necessary if >>> zstd >>> > is >>> > >> > > > actually >>> > >> > > > > in >>> > >> > > > > > > > use. >>> > >> > > > > > > > > > But >>> > >> > > > > > > > > > > as long as old clients can parse the message >>> format, >>> > >> they >>> > >> > > > > should >>> > >> > > > > > > get >>> > >> > > > > > > > a >>> > >> > > > > > > > > > > reasonable error if they see an unsupported >>> > >> compression >>> > >> > > type >>> > >> > > > in >>> > >> > > > > > the >>> > >> > > > > > > > > > > attributes. Basically the onus is on users to >>> ensure >>> > >> that >>> > >> > > > their >>> > >> > > > > > > > > consumers >>> > >> > > > > > > > > > > have been updated prior to using zstd. It seems >>> > like a >>> > >> > > > > reasonable >>> > >> > > > > > > > > > tradeoff >>> > >> > > > > > > > > > > to me. There are a couple cases that might be >>> worth >>> > >> > > thinking >>> > >> > > > > > > through: >>> > >> > > > > > > > > > > >>> > >> > > > > > > > > > > 1. If a topic is explicitly configured to use >>> zstd, >>> > >> then >>> > >> > we >>> > >> > > > > don't >>> > >> > > > > > > > need >>> > >> > > > > > > > > to >>> > >> > > > > > > > > > > check the fetched data for the compression type >>> to >>> > >> know >>> > >> > if >>> > >> > > we >>> > >> > > > > > need >>> > >> > > > > > > > > > > down-conversion. If we did bump the Fetch API >>> > version, >>> > >> > then >>> > >> > > > we >>> > >> > > > > > > could >>> > >> > > > > > > > > > handle >>> > >> > > > > > > > > > > this case by either down-converting using a >>> > different >>> > >> > > > > compression >>> > >> > > > > > > > type >>> > >> > > > > > > > > or >>> > >> > > > > > > > > > > returning an error. >>> > >> > > > > > > > > > > 2. Similarly, if we have to down-convert anyway >>> > >> because >>> > >> > the >>> > >> > > > > > client >>> > >> > > > > > > > does >>> > >> > > > > > > > > > not >>> > >> > > > > > > > > > > understand the message format, then we could >>> also >>> > use >>> > >> a >>> > >> > > > > different >>> > >> > > > > > > > > > > compression type. >>> > >> > > > > > > > > > > >>> > >> > > > > > > > > > > For the produce API, I think it's reasonable to >>> bump >>> > >> the >>> > >> > > api >>> > >> > > > > > > version. >>> > >> > > > > > > > > > This >>> > >> > > > > > > > > > > can be used by clients to check whether a broker >>> > >> supports >>> > >> > > > zstd. >>> > >> > > > > > For >>> > >> > > > > > > > > > > example, we might support a list of preferred >>> > >> compression >>> > >> > > > types >>> > >> > > > > > in >>> > >> > > > > > > > the >>> > >> > > > > > > > > > > producer and we could use the broker to detect >>> which >>> > >> > > version >>> > >> > > > to >>> > >> > > > > > > use. >>> > >> > > > > > > > > > > >>> > >> > > > > > > > > > > -Jason >>> > >> > > > > > > > > > > >>> > >> > > > > > > > > > > On Tue, Aug 7, 2018 at 1:32 PM, Colin McCabe < >>> > >> > > > > cmcc...@apache.org >>> > >> > > > > > > >>> > >> > > > > > > > > wrote: >>> > >> > > > > > > > > > > >>> > >> > > > > > > > > > > > Thanks for bumping this, Dongjin. ZStd is a >>> good >>> > >> > > > compression >>> > >> > > > > > > codec >>> > >> > > > > > > > > > and I >>> > >> > > > > > > > > > > > hope we can get this support in soon! >>> > >> > > > > > > > > > > > >>> > >> > > > > > > > > > > > I would say we can just bump the API version >>> to >>> > >> > indicate >>> > >> > > > that >>> > >> > > > > > > ZStd >>> > >> > > > > > > > > > > support >>> > >> > > > > > > > > > > > is expected in new clients. We probably need >>> some >>> > >> way >>> > >> > of >>> > >> > > > > > > > indicating >>> > >> > > > > > > > > to >>> > >> > > > > > > > > > > the >>> > >> > > > > > > > > > > > older clients that they can't consume the >>> > >> partitions, >>> > >> > as >>> > >> > > > > well. >>> > >> > > > > > > > > Perhaps >>> > >> > > > > > > > > > > we >>> > >> > > > > > > > > > > > can use the UNSUPPORTED_FOR_MESSAGE_FORMAT >>> error? >>> > >> > > > > > > > > > > > >>> > >> > > > > > > > > > > > The license thing seems straightforward -- >>> it's >>> > >> just a >>> > >> > > > matter >>> > >> > > > > > of >>> > >> > > > > > > > > adding >>> > >> > > > > > > > > > > > the text to the right files as per ASF >>> guidelines. >>> > >> > > > > > > > > > > > >>> > >> > > > > > > > > > > > With regard to the dictionary support, do we >>> > really >>> > >> > need >>> > >> > > > that >>> > >> > > > > > in >>> > >> > > > > > > > the >>> > >> > > > > > > > > > > first >>> > >> > > > > > > > > > > > version? Hopefully message batches are big >>> enough >>> > >> that >>> > >> > > > this >>> > >> > > > > > > isn't >>> > >> > > > > > > > > > > needed. >>> > >> > > > > > > > > > > > Some benchmarks might help here. >>> > >> > > > > > > > > > > > >>> > >> > > > > > > > > > > > best, >>> > >> > > > > > > > > > > > Colin >>> > >> > > > > > > > > > > > >>> > >> > > > > > > > > > > > >>> > >> > > > > > > > > > > > On Tue, Aug 7, 2018, at 08:02, Dongjin Lee >>> wrote: >>> > >> > > > > > > > > > > > > As Kafka 2.0.0 was released, let's reboot >>> this >>> > >> issue, >>> > >> > > > > KIP-110 >>> > >> > > > > > > > > > > > > <https://cwiki.apache.org/ >>> > >> > > confluence/display/KAFKA/KIP- >>> > >> > > > > > > > > > > > 110%3A+Add+Codec+for+ZStandard+Compression> >>> > >> > > > > > > > > > > > > . >>> > >> > > > > > > > > > > > > >>> > >> > > > > > > > > > > > > For newcomers, Here is some summary of the >>> > >> history: >>> > >> > > > KIP-110 >>> > >> > > > > > was >>> > >> > > > > > > > > > > > originally >>> > >> > > > > > > > > > > > > worked for the issue KAFKA-4514 but, it >>> lacked >>> > >> > > benchmark >>> > >> > > > > > > results >>> > >> > > > > > > > to >>> > >> > > > > > > > > > get >>> > >> > > > > > > > > > > > the >>> > >> > > > > > > > > > > > > agreement of the community. Later, Ivan >>> Babrou >>> > and >>> > >> > some >>> > >> > > > > other >>> > >> > > > > > > > users >>> > >> > > > > > > > > > who >>> > >> > > > > > > > > > > > > adopted the patch provided their excellent >>> > >> > performance >>> > >> > > > > report >>> > >> > > > > > > > which >>> > >> > > > > > > > > > is >>> > >> > > > > > > > > > > > now >>> > >> > > > > > > > > > > > > included in the KIP, but it postponed again >>> > >> because >>> > >> > of >>> > >> > > > the >>> > >> > > > > > > > > community >>> > >> > > > > > > > > > > was >>> > >> > > > > > > > > > > > > busy for 2.0.0 release. It is why I now >>> reboot >>> > >> this >>> > >> > > > issue. >>> > >> > > > > > > > > > > > > >>> > >> > > > > > > > > > > > > The following is the current status of the >>> > >> feature: >>> > >> > You >>> > >> > > > can >>> > >> > > > > > > check >>> > >> > > > > > > > > the >>> > >> > > > > > > > > > > > > current draft implementation here >>> > >> > > > > > > > > > > > > <https://github.com/apache/kafka/pull/2267>. >>> It >>> > >> is >>> > >> > > based >>> > >> > > > > on >>> > >> > > > > > > zstd >>> > >> > > > > > > > > > 1.3.5 >>> > >> > > > > > > > > > > > and >>> > >> > > > > > > > > > > > > periodically rebased onto the latest >>> trunk[^1]. >>> > >> > > > > > > > > > > > > >>> > >> > > > > > > > > > > > > The issues that should be addressed is like >>> > >> > following: >>> > >> > > > > > > > > > > > > >>> > >> > > > > > > > > > > > > *1. Backward Compatibility* >>> > >> > > > > > > > > > > > > >>> > >> > > > > > > > > > > > > To support old consumers, we need to take a >>> > >> strategy >>> > >> > to >>> > >> > > > > > handle >>> > >> > > > > > > > the >>> > >> > > > > > > > > > old >>> > >> > > > > > > > > > > > > consumers. Current candidates are: >>> > >> > > > > > > > > > > > > >>> > >> > > > > > > > > > > > > - Bump API version >>> > >> > > > > > > > > > > > > - Leave unchanged: let the old clients fail. >>> > >> > > > > > > > > > > > > - Improve the error messages: >>> > >> > > > > > > > > > > > > >>> > >> > > > > > > > > > > > > *2. Dictionary Support* >>> > >> > > > > > > > > > > > > >>> > >> > > > > > > > > > > > > To support zstd's dictionary feature in the >>> > future >>> > >> > (if >>> > >> > > > > > needed), >>> > >> > > > > > > > we >>> > >> > > > > > > > > > need >>> > >> > > > > > > > > > > > to >>> > >> > > > > > > > > > > > > sketch how it should be and leave some room >>> for >>> > >> it. >>> > >> > As >>> > >> > > of >>> > >> > > > > > now, >>> > >> > > > > > > > > there >>> > >> > > > > > > > > > > has >>> > >> > > > > > > > > > > > > been no discussion on this topic yet. >>> > >> > > > > > > > > > > > > >>> > >> > > > > > > > > > > > > *3. License* >>> > >> > > > > > > > > > > > > >>> > >> > > > > > > > > > > > > To use this feature, we need to add license >>> of >>> > >> zstd >>> > >> > and >>> > >> > > > > > > zstd-jni >>> > >> > > > > > > > to >>> > >> > > > > > > > > > the >>> > >> > > > > > > > > > > > > project. (Thanks to Viktor Somogyi for >>> raising >>> > >> this >>> > >> > > > issue!) >>> > >> > > > > > It >>> > >> > > > > > > > > seems >>> > >> > > > > > > > > > > like >>> > >> > > > > > > > > > > > > what Apache Spark did would be a good >>> example >>> > but >>> > >> > there >>> > >> > > > has >>> > >> > > > > > > been >>> > >> > > > > > > > no >>> > >> > > > > > > > > > > > > discussion yet. >>> > >> > > > > > > > > > > > > >>> > >> > > > > > > > > > > > > You can find the details of the above >>> issues in >>> > >> the >>> > >> > KIP >>> > >> > > > > > > document. >>> > >> > > > > > > > > > > Please >>> > >> > > > > > > > > > > > > have a look when you are free, and give me >>> > >> feedback. >>> > >> > > All >>> > >> > > > > > kinds >>> > >> > > > > > > of >>> > >> > > > > > > > > > > > > participating are welcome. >>> > >> > > > > > > > > > > > > >>> > >> > > > > > > > > > > > > Best, >>> > >> > > > > > > > > > > > > Dongjin >>> > >> > > > > > > > > > > > > >>> > >> > > > > > > > > > > > > [^1]: At the time of writing, commit >>> 6b4fb8152. >>> > >> > > > > > > > > > > > > >>> > >> > > > > > > > > > > > > On Sat, Jul 14, 2018 at 10:45 PM Dongjin >>> Lee < >>> > >> > > > > > > dong...@apache.org >>> > >> > > > > > > > > >>> > >> > > > > > > > > > > wrote: >>> > >> > > > > > > > > > > > > >>> > >> > > > > > > > > > > > > > Sorry for the late reply. >>> > >> > > > > > > > > > > > > > >>> > >> > > > > > > > > > > > > > In short, I could not submit the updated >>> KIP >>> > by >>> > >> the >>> > >> > > > > feature >>> > >> > > > > > > > > freeze >>> > >> > > > > > > > > > > > > > deadline of 2.0.0. For this reason, it >>> will >>> > not >>> > >> be >>> > >> > > > > included >>> > >> > > > > > > in >>> > >> > > > > > > > > the >>> > >> > > > > > > > > > > > 2.0.0 >>> > >> > > > > > > > > > > > > > release and all discussion for this issue >>> were >>> > >> > > > postponed >>> > >> > > > > > > after >>> > >> > > > > > > > > the >>> > >> > > > > > > > > > > > release >>> > >> > > > > > > > > > > > > > of 2.0.0. >>> > >> > > > > > > > > > > > > > >>> > >> > > > > > > > > > > > > > I have been updating the PR following >>> recent >>> > >> > updates. >>> > >> > > > > Just >>> > >> > > > > > > > now, I >>> > >> > > > > > > > > > > > rebased >>> > >> > > > > > > > > > > > > > it against the latest trunk and updated >>> the >>> > zstd >>> > >> > > > version >>> > >> > > > > > into >>> > >> > > > > > > > > > 1.3.5. >>> > >> > > > > > > > > > > > If you >>> > >> > > > > > > > > > > > > > need some request, don't hesitate to >>> notify >>> > me. >>> > >> > (But >>> > >> > > > not >>> > >> > > > > > this >>> > >> > > > > > > > > > thread >>> > >> > > > > > > > > > > - >>> > >> > > > > > > > > > > > just >>> > >> > > > > > > > > > > > > > send me the message directly.) >>> > >> > > > > > > > > > > > > > >>> > >> > > > > > > > > > > > > > Best, >>> > >> > > > > > > > > > > > > > Dongjin >>> > >> > > > > > > > > > > > > > >>> > >> > > > > > > > > > > > > > On Tue, Jul 10, 2018 at 11:57 PM Bobby >>> Evans < >>> > >> > > > > > > bo...@apache.org >>> > >> > > > > > > > > >>> > >> > > > > > > > > > > wrote: >>> > >> > > > > > > > > > > > > > >>> > >> > > > > > > > > > > > > >> I there any update on this. The >>> performance >>> > >> > > > > improvements >>> > >> > > > > > > are >>> > >> > > > > > > > > > quite >>> > >> > > > > > > > > > > > > >> impressive and I really would like to >>> stop >>> > >> forking >>> > >> > > > kafka >>> > >> > > > > > > just >>> > >> > > > > > > > to >>> > >> > > > > > > > > > get >>> > >> > > > > > > > > > > > this >>> > >> > > > > > > > > > > > > >> in. >>> > >> > > > > > > > > > > > > >> >>> > >> > > > > > > > > > > > > >> Thanks, >>> > >> > > > > > > > > > > > > >> >>> > >> > > > > > > > > > > > > >> Bobby >>> > >> > > > > > > > > > > > > >> >>> > >> > > > > > > > > > > > > >> On Wed, Jun 13, 2018 at 8:56 PM Dongjin >>> Lee < >>> > >> > > > > > > > dong...@apache.org >>> > >> > > > > > > > > > >>> > >> > > > > > > > > > > > wrote: >>> > >> > > > > > > > > > > > > >> >>> > >> > > > > > > > > > > > > >> > Ismael, >>> > >> > > > > > > > > > > > > >> > >>> > >> > > > > > > > > > > > > >> > Oh, I forgot all of you are on working >>> > frenzy >>> > >> > for >>> > >> > > > 2.0! >>> > >> > > > > > No >>> > >> > > > > > > > > > problem, >>> > >> > > > > > > > > > > > take >>> > >> > > > > > > > > > > > > >> > your time. I am also working at another >>> > issue >>> > >> > now. >>> > >> > > > > Thank >>> > >> > > > > > > you >>> > >> > > > > > > > > for >>> > >> > > > > > > > > > > > > >> letting me >>> > >> > > > > > > > > > > > > >> > know. >>> > >> > > > > > > > > > > > > >> > >>> > >> > > > > > > > > > > > > >> > Best, >>> > >> > > > > > > > > > > > > >> > Dongjin >>> > >> > > > > > > > > > > > > >> > >>> > >> > > > > > > > > > > > > >> > On Wed, Jun 13, 2018, 11:44 PM Ismael >>> Juma >>> > < >>> > >> > > > > > > > isma...@gmail.com >>> > >> > > > > > > > > > >>> > >> > > > > > > > > > > > wrote: >>> > >> > > > > > > > > > > > > >> > >>> > >> > > > > > > > > > > > > >> > > Sorry for the delay Dongjin. >>> Everyone is >>> > >> busy >>> > >> > > > > > finalising >>> > >> > > > > > > > > > 2.0.0. >>> > >> > > > > > > > > > > > This >>> > >> > > > > > > > > > > > > >> KIP >>> > >> > > > > > > > > > > > > >> > > seems like a great candidate for >>> 2.1.0 >>> > and >>> > >> > > > hopefully >>> > >> > > > > > > there >>> > >> > > > > > > > > > will >>> > >> > > > > > > > > > > be >>> > >> > > > > > > > > > > > > >> more >>> > >> > > > > > > > > > > > > >> > of >>> > >> > > > > > > > > > > > > >> > > a discussion next week. :) >>> > >> > > > > > > > > > > > > >> > > >>> > >> > > > > > > > > > > > > >> > > Ismael >>> > >> > > > > > > > > > > > > >> > > >>> > >> > > > > > > > > > > > > >> > > On Wed, 13 Jun 2018, 05:17 Dongjin >>> Lee, < >>> > >> > > > > > > > dong...@apache.org >>> > >> > > > > > > > > > >>> > >> > > > > > > > > > > > wrote: >>> > >> > > > > > > > > > > > > >> > > >>> > >> > > > > > > > > > > > > >> > > > Hello. I just updated my draft >>> > >> > implementation: >>> > >> > > > > > > > > > > > > >> > > > >>> > >> > > > > > > > > > > > > >> > > > 1. Rebased to latest trunk (commit >>> > >> 5145d6b) >>> > >> > > > > > > > > > > > > >> > > > 2. Apply ZStd 1.3.4 >>> > >> > > > > > > > > > > > > >> > > > >>> > >> > > > > > > > > > > > > >> > > > You can check out the >>> implementation >>> > from >>> > >> > here >>> > >> > > > > > > > > > > > > >> > > > < >>> > >> https://github.com/apache/kafka/pull/2267 >>> > >> > >. >>> > >> > > If >>> > >> > > > > you >>> > >> > > > > > > > > > > experience >>> > >> > > > > > > > > > > > any >>> > >> > > > > > > > > > > > > >> > > problem >>> > >> > > > > > > > > > > > > >> > > > running it, don't hesitate to give >>> me a >>> > >> > > mention. >>> > >> > > > > > > > > > > > > >> > > > >>> > >> > > > > > > > > > > > > >> > > > Best, >>> > >> > > > > > > > > > > > > >> > > > Dongjin >>> > >> > > > > > > > > > > > > >> > > > >>> > >> > > > > > > > > > > > > >> > > > On Tue, Jun 12, 2018 at 6:50 PM >>> Dongjin >>> > >> Lee >>> > >> > < >>> > >> > > > > > > > > > > dong...@apache.org >>> > >> > > > > > > > > > > > > >>> > >> > > > > > > > > > > > > >> > wrote: >>> > >> > > > > > > > > > > > > >> > > > >>> > >> > > > > > > > > > > > > >> > > > > Here is the short conclusion >>> about >>> > the >>> > >> > > license >>> > >> > > > > > > > problem: >>> > >> > > > > > > > > > *We >>> > >> > > > > > > > > > > > can >>> > >> > > > > > > > > > > > > >> use >>> > >> > > > > > > > > > > > > >> > > zstd >>> > >> > > > > > > > > > > > > >> > > > > and zstd-jni without any >>> problem, but >>> > >> we >>> > >> > > need >>> > >> > > > to >>> > >> > > > > > > > include >>> > >> > > > > > > > > > > their >>> > >> > > > > > > > > > > > > >> > license, >>> > >> > > > > > > > > > > > > >> > > > > e.g., BSD license.* >>> > >> > > > > > > > > > > > > >> > > > > >>> > >> > > > > > > > > > > > > >> > > > > Both of BSD 2 Clause License & 3 >>> > Clause >>> > >> > > > License >>> > >> > > > > > > > requires >>> > >> > > > > > > > > > to >>> > >> > > > > > > > > > > > > >> include >>> > >> > > > > > > > > > > > > >> > the >>> > >> > > > > > > > > > > > > >> > > > > license used, and BSD 3 Clause >>> > License >>> > >> > > > requires >>> > >> > > > > > that >>> > >> > > > > > > > the >>> > >> > > > > > > > > > > name >>> > >> > > > > > > > > > > > of >>> > >> > > > > > > > > > > > > >> the >>> > >> > > > > > > > > > > > > >> > > > > contributor can't be used to >>> endorse >>> > or >>> > >> > > > promote >>> > >> > > > > > the >>> > >> > > > > > > > > > product. >>> > >> > > > > > > > > > > > > >> That's >>> > >> > > > > > > > > > > > > >> > it >>> > >> > > > > > > > > > > > > >> > > > > < >>> > >> > > > > > > > > > > > > >> > > > >>> > >> > > > > > > > > > > > > >> > > >>> > >> > > > > > > > > > > > > >> > >>> > >> > > > > > > > > > > > > >> >>> > >> > http://www.mikestratton.net/2011/12/is-bsd-license- >>> > >> > > > > > > > > > > > compatible-with-apache-2-0-license/ >>> > >> > > > > > > > > > > > > >> > > > > >>> > >> > > > > > > > > > > > > >> > > > > - They are not listed in the >>> list of >>> > >> > > > prohibited >>> > >> > > > > > > > licenses >>> > >> > > > > > > > > > > > > >> > > > > <https://www.apache.org/legal/ >>> > >> > > > > > > > resolved.html#category-x> >>> > >> > > > > > > > > > > also. >>> > >> > > > > > > > > > > > > >> > > > > >>> > >> > > > > > > > > > > > > >> > > > > Here is how Spark did for it >>> > >> > > > > > > > > > > > > >> > > > > <https://issues.apache.org/ >>> > >> > > > > > jira/browse/SPARK-19112 >>> > >> > > > > > > >: >>> > >> > > > > > > > > > > > > >> > > > > >>> > >> > > > > > > > > > > > > >> > > > > - They made a directory >>> dedicated to >>> > >> the >>> > >> > > > > > dependency >>> > >> > > > > > > > > > license >>> > >> > > > > > > > > > > > files >>> > >> > > > > > > > > > > > > >> > > > > < >>> > >> > > > > > > https://github.com/apache/spark/tree/master/licenses >>> > >> > > > > > > > > >>> > >> > > > > > > > > > and >>> > >> > > > > > > > > > > > added >>> > >> > > > > > > > > > > > > >> > > > licenses >>> > >> > > > > > > > > > > > > >> > > > > for Zstd >>> > >> > > > > > > > > > > > > >> > > > > < >>> > >> > > > > > > > > > > > > >> > >>> > >> > > > https://github.com/apache/spark/blob/master/licenses/ >>> > >> > > > > > > > > > > > LICENSE-zstd.txt >>> > >> > > > > > > > > > > > > >> > > > >>> > >> > > > > > > > > > > > > >> > > > & >>> > >> > > > > > > > > > > > > >> > > > > Zstd-jni >>> > >> > > > > > > > > > > > > >> > > > > < >>> > >> > > > > > > > > > > > > >> > > > >>> > >> > > > > > > > > > > > > >> > > >>> > >> > > > > > > > > > > > > >> > >>> > >> > > > > > > > > > > > > >> https://github.com/apache/ >>> > >> > > spark/blob/master/licenses/ >>> > >> > > > > > > > > > > > LICENSE-zstd-jni.txt >>> > >> > > > > > > > > > > > > >> > >>> > >> > > > > > > > > > > > > >> > > > > . >>> > >> > > > > > > > > > > > > >> > > > > - Added a link to the original >>> > license >>> > >> > files >>> > >> > > > in >>> > >> > > > > > > > LICENSE. >>> > >> > > > > > > > > > > > > >> > > > > < >>> > >> > > > > https://github.com/apache/spark/pull/18805/files >>> > >> > > > > > > >>> > >> > > > > > > > > > > > > >> > > > > >>> > >> > > > > > > > > > > > > >> > > > > If needed, I can make a similar >>> > update. >>> > >> > > > > > > > > > > > > >> > > > > >>> > >> > > > > > > > > > > > > >> > > > > Thanks for pointing out this >>> problem, >>> > >> > > Viktor! >>> > >> > > > > Nice >>> > >> > > > > > > > > catch! >>> > >> > > > > > > > > > > > > >> > > > > >>> > >> > > > > > > > > > > > > >> > > > > Best, >>> > >> > > > > > > > > > > > > >> > > > > Dongjin >>> > >> > > > > > > > > > > > > >> > > > > >>> > >> > > > > > > > > > > > > >> > > > > >>> > >> > > > > > > > > > > > > >> > > > > >>> > >> > > > > > > > > > > > > >> > > > > On Mon, Jun 11, 2018 at 11:50 PM >>> > >> Dongjin >>> > >> > > Lee < >>> > >> > > > > > > > > > > > dong...@apache.org> >>> > >> > > > > > > > > > > > > >> > > wrote: >>> > >> > > > > > > > > > > > > >> > > > > >>> > >> > > > > > > > > > > > > >> > > > >> I greatly appreciate your >>> > >> comprehensive >>> > >> > > > > > reasoning. >>> > >> > > > > > > > so: >>> > >> > > > > > > > > +1 >>> > >> > > > > > > > > > > > for b >>> > >> > > > > > > > > > > > > >> > until >>> > >> > > > > > > > > > > > > >> > > > now. >>> > >> > > > > > > > > > > > > >> > > > >> >>> > >> > > > > > > > > > > > > >> > > > >> For the license issues, I will >>> have >>> > a >>> > >> > check >>> > >> > > > on >>> > >> > > > > > how >>> > >> > > > > > > > the >>> > >> > > > > > > > > > over >>> > >> > > > > > > > > > > > > >> projects >>> > >> > > > > > > > > > > > > >> > > are >>> > >> > > > > > > > > > > > > >> > > > >> doing and share the results. >>> > >> > > > > > > > > > > > > >> > > > >> >>> > >> > > > > > > > > > > > > >> > > > >> Best, >>> > >> > > > > > > > > > > > > >> > > > >> Dongjin >>> > >> > > > > > > > > > > > > >> > > > >> >>> > >> > > > > > > > > > > > > >> > > > >> On Mon, Jun 11, 2018 at 10:08 PM >>> > >> Viktor >>> > >> > > > > Somogyi < >>> > >> > > > > > > > > > > > > >> > > > viktorsomo...@gmail.com> >>> > >> > > > > > > > > > > > > >> > > > >> wrote: >>> > >> > > > > > > > > > > > > >> > > > >> >>> > >> > > > > > > > > > > > > >> > > > >>> Hi Dongjin, >>> > >> > > > > > > > > > > > > >> > > > >>> >>> > >> > > > > > > > > > > > > >> > > > >>> A couple of comments: >>> > >> > > > > > > > > > > > > >> > > > >>> I would vote for option b. in >>> the >>> > >> > > "backward >>> > >> > > > > > > > > > compatibility" >>> > >> > > > > > > > > > > > > >> section. >>> > >> > > > > > > > > > > > > >> > > My >>> > >> > > > > > > > > > > > > >> > > > >>> reasoning for this is that >>> users >>> > >> > upgrading >>> > >> > > > to >>> > >> > > > > a >>> > >> > > > > > > zstd >>> > >> > > > > > > > > > > > compatible >>> > >> > > > > > > > > > > > > >> > > version >>> > >> > > > > > > > > > > > > >> > > > >>> won't start to use it >>> > automatically, >>> > >> so >>> > >> > > > manual >>> > >> > > > > > > > > > > > reconfiguration >>> > >> > > > > > > > > > > > > >> is >>> > >> > > > > > > > > > > > > >> > > > >>> required. >>> > >> > > > > > > > > > > > > >> > > > >>> Therefore an upgrade won't >>> mess up >>> > >> the >>> > >> > > > > cluster. >>> > >> > > > > > If >>> > >> > > > > > > > not >>> > >> > > > > > > > > > all >>> > >> > > > > > > > > > > > the >>> > >> > > > > > > > > > > > > >> > > clients >>> > >> > > > > > > > > > > > > >> > > > >>> are >>> > >> > > > > > > > > > > > > >> > > > >>> upgraded but just some of them >>> and >>> > >> > they'd >>> > >> > > > > start >>> > >> > > > > > to >>> > >> > > > > > > > use >>> > >> > > > > > > > > > > zstd >>> > >> > > > > > > > > > > > > >> then it >>> > >> > > > > > > > > > > > > >> > > > would >>> > >> > > > > > > > > > > > > >> > > > >>> cause errors in the cluster. >>> I'd >>> > >> like to >>> > >> > > > > presume >>> > >> > > > > > > > > though >>> > >> > > > > > > > > > > that >>> > >> > > > > > > > > > > > > >> this >>> > >> > > > > > > > > > > > > >> > is >>> > >> > > > > > > > > > > > > >> > > a >>> > >> > > > > > > > > > > > > >> > > > >>> very >>> > >> > > > > > > > > > > > > >> > > > >>> obvious failure case and nobody >>> > >> should >>> > >> > be >>> > >> > > > > > > surprised >>> > >> > > > > > > > if >>> > >> > > > > > > > > > it >>> > >> > > > > > > > > > > > didn't >>> > >> > > > > > > > > > > > > >> > > work. >>> > >> > > > > > > > > > > > > >> > > > >>> I wouldn't choose a. as I >>> think we >>> > >> > should >>> > >> > > > bump >>> > >> > > > > > the >>> > >> > > > > > > > > fetch >>> > >> > > > > > > > > > > and >>> > >> > > > > > > > > > > > > >> > produce >>> > >> > > > > > > > > > > > > >> > > > >>> requests if it's a change in >>> the >>> > >> message >>> > >> > > > > format. >>> > >> > > > > > > > > > Moreover >>> > >> > > > > > > > > > > if >>> > >> > > > > > > > > > > > > >> some >>> > >> > > > > > > > > > > > > >> > of >>> > >> > > > > > > > > > > > > >> > > > the >>> > >> > > > > > > > > > > > > >> > > > >>> producers and the brokers are >>> > >> upgraded >>> > >> > but >>> > >> > > > > some >>> > >> > > > > > of >>> > >> > > > > > > > the >>> > >> > > > > > > > > > > > consumers >>> > >> > > > > > > > > > > > > >> > are >>> > >> > > > > > > > > > > > > >> > > > not, >>> > >> > > > > > > > > > > > > >> > > > >>> then we wouldn't prevent the >>> error >>> > >> when >>> > >> > > the >>> > >> > > > > old >>> > >> > > > > > > > > consumer >>> > >> > > > > > > > > > > > tries >>> > >> > > > > > > > > > > > > >> to >>> > >> > > > > > > > > > > > > >> > > > consume >>> > >> > > > > > > > > > > > > >> > > > >>> the zstd compressed messages. >>> > >> > > > > > > > > > > > > >> > > > >>> I wouldn't choose c. either as >>> I >>> > >> think >>> > >> > > > binding >>> > >> > > > > > the >>> > >> > > > > > > > > > > > compression >>> > >> > > > > > > > > > > > > >> type >>> > >> > > > > > > > > > > > > >> > > to >>> > >> > > > > > > > > > > > > >> > > > an >>> > >> > > > > > > > > > > > > >> > > > >>> API is not so obvious from the >>> > >> > developer's >>> > >> > > > > > > > > perspective. >>> > >> > > > > > > > > > > > > >> > > > >>> >>> > >> > > > > > > > > > > > > >> > > > >>> I would also prefer to use the >>> > >> existing >>> > >> > > > > binding, >>> > >> > > > > > > > > however >>> > >> > > > > > > > > > > we >>> > >> > > > > > > > > > > > must >>> > >> > > > > > > > > > > > > >> > > > respect >>> > >> > > > > > > > > > > > > >> > > > >>> the licenses: >>> > >> > > > > > > > > > > > > >> > > > >>> "The code for these JNI >>> bindings is >>> > >> > > licenced >>> > >> > > > > > under >>> > >> > > > > > > > > > > 2-clause >>> > >> > > > > > > > > > > > BSD >>> > >> > > > > > > > > > > > > >> > > > license. >>> > >> > > > > > > > > > > > > >> > > > >>> The native Zstd library is >>> licensed >>> > >> > under >>> > >> > > > > > 3-clause >>> > >> > > > > > > > BSD >>> > >> > > > > > > > > > > > license >>> > >> > > > > > > > > > > > > >> and >>> > >> > > > > > > > > > > > > >> > > > GPL2" >>> > >> > > > > > > > > > > > > >> > > > >>> Based on the FAQ page >>> > >> > > > > > > > > > > > > >> > > > >>> https://www.apache.org/legal/ >>> > >> > > > > > > > resolved.html#category-a >>> > >> > > > > > > > > > > > > >> > > > >>> we may use 2- and 3-clause BSD >>> > >> licenses >>> > >> > > but >>> > >> > > > > the >>> > >> > > > > > > > Apache >>> > >> > > > > > > > > > > > license >>> > >> > > > > > > > > > > > > >> is >>> > >> > > > > > > > > > > > > >> > not >>> > >> > > > > > > > > > > > > >> > > > >>> compatible with GPL2. I'm >>> hoping >>> > that >>> > >> > the >>> > >> > > > > > > "3-clause >>> > >> > > > > > > > > BSD >>> > >> > > > > > > > > > > > license >>> > >> > > > > > > > > > > > > >> and >>> > >> > > > > > > > > > > > > >> > > > GPL2" >>> > >> > > > > > > > > > > > > >> > > > >>> is really not an AND but an OR >>> in >>> > >> this >>> > >> > > case, >>> > >> > > > > but >>> > >> > > > > > > I'm >>> > >> > > > > > > > > no >>> > >> > > > > > > > > > > > lawyer, >>> > >> > > > > > > > > > > > > >> > just >>> > >> > > > > > > > > > > > > >> > > > >>> wanted >>> > >> > > > > > > > > > > > > >> > > > >>> to make the point that we >>> should >>> > >> watch >>> > >> > out >>> > >> > > > for >>> > >> > > > > > > > > licenses. >>> > >> > > > > > > > > > > :) >>> > >> > > > > > > > > > > > > >> > > > >>> >>> > >> > > > > > > > > > > > > >> > > > >>> Regards, >>> > >> > > > > > > > > > > > > >> > > > >>> Viktor >>> > >> > > > > > > > > > > > > >> > > > >>> >>> > >> > > > > > > > > > > > > >> > > > >>> >>> > >> > > > > > > > > > > > > >> > > > >>> On Sun, Jun 10, 2018 at 3:02 AM >>> > Ivan >>> > >> > > Babrou >>> > >> > > > < >>> > >> > > > > > > > > > > > ibob...@gmail.com> >>> > >> > > > > > > > > > > > > >> > > wrote: >>> > >> > > > > > > > > > > > > >> > > > >>> >>> > >> > > > > > > > > > > > > >> > > > >>> > Hello, >>> > >> > > > > > > > > > > > > >> > > > >>> > >>> > >> > > > > > > > > > > > > >> > > > >>> > This is Ivan and I still very >>> > much >>> > >> > > support >>> > >> > > > > the >>> > >> > > > > > > > fact >>> > >> > > > > > > > > > that >>> > >> > > > > > > > > > > > zstd >>> > >> > > > > > > > > > > > > >> > > > >>> compression >>> > >> > > > > > > > > > > > > >> > > > >>> > should be included out of the >>> > box. >>> > >> > > > > > > > > > > > > >> > > > >>> > >>> > >> > > > > > > > > > > > > >> > > > >>> > Please think about the >>> > environment, >>> > >> > you >>> > >> > > > can >>> > >> > > > > > save >>> > >> > > > > > > > > > quite a >>> > >> > > > > > > > > > > > lot >>> > >> > > > > > > > > > > > > >> of >>> > >> > > > > > > > > > > > > >> > > > >>> hardware >>> > >> > > > > > > > > > > > > >> > > > >>> > with it. >>> > >> > > > > > > > > > > > > >> > > > >>> > >>> > >> > > > > > > > > > > > > >> > > > >>> > Thank you. >>> > >> > > > > > > > > > > > > >> > > > >>> > >>> > >> > > > > > > > > > > > > >> > > > >>> > On Sat, Jun 9, 2018 at 14:14 >>> > >> Dongjin >>> > >> > > Lee < >>> > >> > > > > > > > > > > > dong...@apache.org> >>> > >> > > > > > > > > > > > > >> > > wrote: >>> > >> > > > > > > > > > > > > >> > > > >>> > >>> > >> > > > > > > > > > > > > >> > > > >>> > > Since there are no >>> responses >>> > for >>> > >> a >>> > >> > > > week, I >>> > >> > > > > > > > decided >>> > >> > > > > > > > > > to >>> > >> > > > > > > > > > > > > >> > reinitiate >>> > >> > > > > > > > > > > > > >> > > > the >>> > >> > > > > > > > > > > > > >> > > > >>> > > discussion thread. >>> > >> > > > > > > > > > > > > >> > > > >>> > > >>> > >> > > > > > >> >> -- *Dongjin Lee* *A hitchhiker in the mathematical world.* *github: <http://goog_969573159/>github.com/dongjinleekr <http://github.com/dongjinleekr>linkedin: kr.linkedin.com/in/dongjinleekr <http://kr.linkedin.com/in/dongjinleekr>slideshare: www.slideshare.net/dongjinleekr <http://www.slideshare.net/dongjinleekr>*