Good work on this, Dana.

I'll test it with librdkafka (which uses the official liblz4) and report
back.


2016-05-03 20:02 GMT+02:00 Dana Powers <dana.pow...@gmail.com>:

> Yes, great point. The intent of adding "naive" support for the
> remaining LZ4 header flags (contentsize and contentchecksum) is to
> avoid rejecting any lz4 messages framed according to the interoperable
> spec (v1.5.1). If we can accept all such messages (which this KIP
> does), we should never have to make similar 'breaking' changes in the
> future.
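>
> For reference, all of these flags live in the FLG byte of the frame
> descriptor, which is why accepting them is cheap. A rough sketch of the
> relevant bit masks (constant names here are mine for illustration, not
> necessarily what the patch uses):
>
>     // FLG byte layout per the lz4 frame spec v1.5.1 (names are illustrative)
>     static final int FLG_VERSION_MASK       = 0xC0; // bits 7-6, must be 01
>     static final int FLG_BLOCK_INDEPENDENCE = 0x20; // bit 5
>     static final int FLG_BLOCK_CHECKSUM     = 0x10; // bit 4
>     static final int FLG_CONTENT_SIZE       = 0x08; // bit 3 -> 8 extra bytes in the header
>     static final int FLG_CONTENT_CHECKSUM   = 0x04; // bit 2 -> 4 extra bytes in the footer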
>
> That said, there is one feature that is left unsupported:
> block-dependent (de)compression, aka the lz4 "streaming" api. If a
> library *only* supported lz4 streaming, it would be incompatible with
> this kafka implementation. I haven't done a survey of libraries, but I
> would be shocked to find an lz4 implementation that only supported
> streaming -- or even that used streaming as the default. This is
> because it is an optimization on the default block api (this is what
> kafka has used and continues to use -- no change here). There's more
> detail on this here:
> https://github.com/Cyan4973/lz4/wiki/LZ4-Streaming-API-Basics
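>
> Concretely, a block-API-only decoder just needs to check the
> block-independence bit of FLG and bail out on dependent-block frames --
> something along these lines (a sketch, not the exact Kafka code):
>
>     int flg = in.read();                 // first byte of the frame descriptor
>     if ((flg & 0x20) == 0) {             // block-independence bit not set
>         throw new IOException("dependent-block (streaming) frames not supported");
>     }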
>
> I should also note that we do not support the old dictionary-id flag,
> which was available in earlier lz4f spec versions but has since been
> removed. I can't find any evidence of it ever being used, which is
> probably why it was dropped from the spec.
>
> Re testing, kafka-python uses https://github.com/darkdragn/lz4tools,
> which implements v1.5.0 of the lz4f spec and is listed as one of the
> interoperable libraries. I'm not sure which library Magnus uses in
> librdkafka, but it would be great to get independent verification from
> him that this patch works there as well. You'll note that there is
> currently no interoperable java library :( There is some desire to
> merge the kafka classes back into jpountz/lz4-java, which I think
> would be reasonable after these compatibility fixes. See
> https://github.com/jpountz/lz4-java/issues/21 .
>
> -Dana
>
> On Tue, May 3, 2016 at 10:38 AM, Ewen Cheslack-Postava
> <e...@confluent.io> wrote:
> > +1
> >
> > One caveat on the vote though -- I don't know the details of LZ4 (format,
> > libraries, etc) well enough to have a handle on whether the changes under
> > "KafkaLZ4* code" are going to be sufficient to get broad support from other
> > LZ4 libraries. Are we going to have multiple implementations we can test a
> > PR with before we merge? Or would any subsequent fixes also have to come
> > with bumping the produce request version to indicate varying feature
> > support if we had to further change that code? What I want to avoid is
> > clients in some languages having to work with broker version numbers
> > instead of protocol version numbers due to further incompatibilities we
> > might find.
> >
> > -Ewen
> >
> > On Thu, Apr 28, 2016 at 9:01 AM, Dana Powers <dana.pow...@gmail.com> wrote:
> >
> >> Sure thing. Yes, the substantive change is fixing the HC checksum.
> >>
> >> But to further improve interoperability, the kafka LZ4 class would no
> >> longer reject messages that have these optional header flags set. The
> >> flags might get set if the client/user chooses to use a non-java lz4
> >> compression library that includes them. In practice, naive support for
> >> the flags just means reading a few extra bytes in the header and/or
> >> footer of the payload. The KIP does not intend to use or validate this
> >> extra data.
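> >>
> >> For reference, the HC fix itself boils down to hashing the right byte
> >> range: per the frame spec, the header checksum is the second byte of
> >> xxh32 (seed 0) over the frame descriptor fields only, not including
> >> the 4-byte magic. A rough sketch using lz4-java's xxhash bindings
> >> (illustrative, not the literal patch):
> >>
> >>     import net.jpountz.xxhash.XXHash32;
> >>     import net.jpountz.xxhash.XXHashFactory;
> >>
> >>     static int headerChecksum(byte[] descriptor, int off, int len) {
> >>         XXHash32 xxh = XXHashFactory.fastestInstance().hash32();
> >>         return (xxh.hash(descriptor, off, len, 0) >> 8) & 0xFF; // second byte of xxh32
> >>     }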
> >>
> >> ContentSize is described as: "This field has no impact on decoding, it
> >> just informs the decoder how much data the frame holds (for example,
> >> to display it during decoding process, or for verification purpose).
> >> It can be safely skipped by a conformant decoder." We skip it.
> >>
> >> ContentChecksum is "Content Checksum validates the result, that all
> >> blocks were fully transmitted in the correct order and without error,
> >> and also that the encoding/decoding process itself generated no
> >> distortion." We skip it.
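> >>
> >> In code terms, the "naive support" amounts to a couple of fixed-size
> >> skips -- something along these lines (a sketch, not the exact patch):
> >>
> >>     // after reading FLG/BD and validating the header checksum:
> >>     if ((flg & 0x08) != 0) {   // ContentSize flag set
> >>         in.skip(8);            // 8-byte uncompressed size: read past it, ignore it
> >>     }
> >>     // ... decompress the blocks exactly as before ...
> >>     if ((flg & 0x04) != 0) {   // ContentChecksum flag set
> >>         in.skip(4);            // 4-byte xxh32 of the content, not validated
> >>     }
> >>
> >> (A real implementation should loop until those bytes are actually
> >> consumed; the point is just how little the extra flags cost.)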
> >>
> >> -Dana
> >>
> >>
> >> On Thu, Apr 28, 2016 at 7:43 AM, Jun Rao <j...@confluent.io> wrote:
> >> > Hi, Dana,
> >> >
> >> > Could you explain the following from the KIP a bit more? The KIP is
> >> > intended to just fix the HC checksum, but the following seems to suggest
> >> > there are other format changes?
> >> >
> >> > KafkaLZ4* code:
> >> >
> >> >    - add naive support for optional header flags (ContentSize,
> >> >    ContentChecksum) to enable interoperability with off-the-shelf lz4
> >> >    libraries
> >> >    - the only flag left unsupported is dependent-block compression,
> >> >    which our implementation does not currently support.
> >> >
> >> >
> >> > Thanks,
> >> >
> >> > Jun
> >> >
> >> > On Mon, Apr 25, 2016 at 2:26 PM, Dana Powers <dana.pow...@gmail.com> wrote:
> >> >
> >> >> Hi all,
> >> >>
> >> >> Initiating a vote thread because the KIP-57 proposal is specific to
> >> >> the 0.10 release.
> >> >>
> >> >> KIP-57 can be accessed here:
> >> >> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-57+-+Interoperable+LZ4+Framing>.
> >> >>
> >> >> The related JIRA is https://issues.apache.org/jira/browse/KAFKA-3160
> >> >> and working github PR at https://github.com/apache/kafka/pull/1212
> >> >>
> >> >> The vote will run for 72 hours.
> >> >>
> >> >> +1 (non-binding)
> >> >>
> >>
> >
> >
> >
> > --
> > Thanks,
> > Ewen
>
