Good work on this, Dana. I'll test it with librdkafka (which uses the official liblz4) and report back.
2016-05-03 20:02 GMT+02:00 Dana Powers <dana.pow...@gmail.com>:

> Yes, great point. The intent of adding "naive" support for the remaining LZ4 header flags (contentsize and contentchecksum) is to avoid rejecting any lz4 messages framed according to the interoperable spec (v1.5.1). If we can accept all such messages (which this KIP does), we should never have to make similar 'breaking' changes in the future.
>
> That said, there is one feature that is left unsupported: block-dependent (de)compression, aka the lz4 "streaming" api. If a library *only* supported lz4 streaming, it would be incompatible with this kafka implementation. I haven't done a survey of libraries, but I would be shocked to find an lz4 implementation that only supported streaming -- or even that used streaming as the default. This is because it is an optimization on the default block api (this is what kafka has used and continues to use -- no change here). There's more detail on this here:
> https://github.com/Cyan4973/lz4/wiki/LZ4-Streaming-API-Basics
>
> I should also note that we do not support the old dictionary-id flag, which was available in earlier lz4f spec versions but has since been removed. I can't find any evidence of it ever being used, which is probably why it was dropped from the spec.
>
> Re testing, kafka-python uses https://github.com/darkdragn/lz4tools which implements 1.5.0 of the lz4f spec and is listed as one of the interoperable libraries. I'm not sure which library Magnus uses in librdkafka, but it would be great to get independent verification from him that this patch works there as well. You'll note that there is currently no interoperable java library :( There is some desire to merge the kafka classes back into jpountz/lz4-java, which I think would be reasonable after these compatibility fixes. See https://github.com/jpountz/lz4-java/issues/21 .
>
> -Dana
>
> On Tue, May 3, 2016 at 10:38 AM, Ewen Cheslack-Postava <e...@confluent.io> wrote:
> > +1
> >
> > One caveat on the vote though -- I don't know the details of LZ4 (format, libraries, etc) well enough to have a handle on whether the changes under "KafkaLZ4* code" are going to be sufficient to get broad support from other LZ4 libraries. Are we going to have multiple implementations we can test a PR with before we merge? Or would any subsequent fixes also have to come with bumping the produce request version to indicate varying feature support if we had to further change that code? What I want to avoid is clients in some languages having to work with broker version numbers instead of protocol version numbers due to further incompatibilities we might find.
> >
> > -Ewen
> >
> > On Thu, Apr 28, 2016 at 9:01 AM, Dana Powers <dana.pow...@gmail.com> wrote:
> >
> >> Sure thing. Yes, the substantive change is fixing the HC checksum.
> >>
> >> But to further improve interoperability, the kafka LZ4 class would no longer reject messages that have these optional header flags set. The flags might get set if the client/user chooses to use a non-java lz4 compression library that includes them. In practice, naive support for the flags just means reading a few extra bytes in the header and/or footer of the payload. The KIP does not intend to use or validate this extra data.
> >>
> >> ContentSize is described as: "This field has no impact on decoding, it just informs the decoder how much data the frame holds (for example, to display it during decoding process, or for verification purpose). It can be safely skipped by a conformant decoder." We skip it.
> >>
> >> ContentChecksum is "Content Checksum validates the result, that all blocks were fully transmitted in the correct order and without error, and also that the encoding/decoding process itself generated no distortion." We skip it.
> >>
> >> -Dana
> >>
> >> On Thu, Apr 28, 2016 at 7:43 AM, Jun Rao <j...@confluent.io> wrote:
> >> > Hi, Dana,
> >> >
> >> > Could you explain the following from the KIP a bit more? The KIP is intended to just fix the HC checksum, but the following seems to suggest there are other format changes?
> >> >
> >> > KafkaLZ4* code:
> >> >
> >> > - add naive support for optional header flags (ContentSize, ContentChecksum) to enable interoperability with off-the-shelf lz4 libraries
> >> > - the only flag left unsupported is dependent-block compression, which our implementation does not currently support.
> >> >
> >> > Thanks,
> >> >
> >> > Jun
> >> >
> >> > On Mon, Apr 25, 2016 at 2:26 PM, Dana Powers <dana.pow...@gmail.com> wrote:
> >> >
> >> >> Hi all,
> >> >>
> >> >> Initiating a vote thread because the KIP-57 proposal is specific to the 0.10 release.
> >> >>
> >> >> KIP-57 can be accessed here:
> >> >> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-57+-+Interoperable+LZ4+Framing>.
> >> >>
> >> >> The related JIRA is https://issues.apache.org/jira/browse/KAFKA-3160 and working github PR at https://github.com/apache/kafka/pull/1212
> >> >>
> >> >> The vote will run for 72 hours.
> >> >>
> >> >> +1 (non-binding)
> >> >>
> >
> >
> > --
> > Thanks,
> > Ewen
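For anyone skimming the thread who wants a concrete picture of what the "naive" header handling described above amounts to, here is a rough Java sketch. It is not the actual KafkaLZ4* patch -- the class, method, and constant names are made up for illustration, and it assumes the lz4-java xxhash classes are on the classpath. Per the lz4 frame spec, ContentSize adds 8 bytes to the frame descriptor, and the one-byte header checksum (HC) is the second byte of XXH32 (seed 0) computed over the frame descriptor only, not the magic bytes:

import net.jpountz.xxhash.XXHash32;
import net.jpountz.xxhash.XXHashFactory;

// Illustrative-only sketch with hypothetical names; not the KafkaLZ4* classes.
public class Lz4FrameHeaderSketch {

    private static final int MAGIC = 0x184D2204;            // LZ4 frame magic number (stored little-endian)
    private static final int FLG_CONTENT_SIZE_MASK = 0x08;  // FLG bit 3: optional 8-byte ContentSize present

    private static final XXHash32 XXHASH = XXHashFactory.fastestInstance().hash32();

    // Parses the frame header at the start of `frame` and returns the offset
    // of the first block. Accepts (and skips) the optional ContentSize field
    // instead of rejecting frames that carry it.
    public static int readHeader(byte[] frame) {
        int magic = (frame[0] & 0xFF)
                  | (frame[1] & 0xFF) << 8
                  | (frame[2] & 0xFF) << 16
                  | (frame[3] & 0xFF) << 24;
        if (magic != MAGIC) {
            throw new IllegalArgumentException("Not an LZ4 frame");
        }

        int descriptorStart = 4;                  // FLG byte follows the 4-byte magic
        int flg = frame[descriptorStart] & 0xFF;
        int descriptorLen = 2;                    // FLG + BD
        if ((flg & FLG_CONTENT_SIZE_MASK) != 0) {
            descriptorLen += 8;                   // "naive" support: just skip the ContentSize bytes
        }

        // Header checksum = second byte of XXH32 (seed 0) over the frame
        // descriptor only; the magic bytes are NOT part of the checksum input.
        int expectedHc = (XXHASH.hash(frame, descriptorStart, descriptorLen, 0) >> 8) & 0xFF;
        int actualHc = frame[descriptorStart + descriptorLen] & 0xFF;
        if (expectedHc != actualHc) {
            throw new IllegalArgumentException("Bad LZ4 frame header checksum");
        }

        return descriptorStart + descriptorLen + 1;  // first block starts right after HC
    }
}

A frame with the ContentChecksum flag set simply carries 4 extra bytes after the end mark; a decoder that does not validate the content checksum can read and discard them, which is all the "naive" support in the footer means.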