I use lz4 compression for a topic with JSON data. I don't remember the exact numbers, but the lz4 compression ratio was higher than gzip's, with lower CPU load. Not a huge difference, but still significant.
But I found one serious issue: lz4 compression is not compatible with Spark 1.5 or higher. Spark 1.5 updated the lz4 library to version 1.3.0, but Kafka depends on version 1.2.0. The two versions are not binary compatible, so you can expect "method not found" errors (the checkRange method was moved from the Utils class to the SafeUtils class).

After I patched the Kafka client classes KafkaLZ4BlockInputStream and KafkaLZ4BlockOutputStream to use SafeUtils instead of Utils, everything worked again. Alternatively, you could patch the Utils class in the lz4 library and backport the checkRange method from SafeUtils to Utils. Not very elegant or safe, but it works.

Regards,
Marcin

On 21 March 2016 at 21:17, Dana Powers <dana.pow...@gmail.com> wrote:

> The LZ4 implementation "works" but has a framing bug that can make third
> party client use difficult. See KAFKA-3160. If you only plan to use the
> official Java client then that issue shouldn't be a problem.
>
> -Dana
> On Mar 21, 2016 12:26 PM, "Pete Wright" <pwri...@rubiconproject.com>
> wrote:
>
> >
> >
> > On 03/17/2016 04:03 PM, Virendra Pratap Singh wrote:
> >
> >> More like getting a feel from the community about using lz4 for
> >> compression? Has anyone used in the kafka setup.
> >> I am aware that gzip and snappy are more older implementation and
> >> regressed. Given that lz4 has better compression/decompression cycles
> >> (though slightly less compression ratio), was thinking to leveraging the
> >> same.
> >> Regards,Virendra
> >>
> >>
> > i use the lz4 compression algorithm quite extensively in conjunction with
> > ZFS (ZFS can configure a filesystem to do transparent compression) and have
> > not had any issues with it under load. i've also found that it does a
> > better job than snappy with negligible overhead that i've been able to
> > observe.
> >
> > i tend to avoid gzip in production as i have measured more overhead using
> > this algorithm, and generally speaking i've found lz4 to compact data
> > better.
> >
> > i am not super familiar with the lz4 code as implemented in kafka, but i
> > would assume the java implementation is pretty solid.
> >
> > hope this helps,
> > -pete
> >
> > --
> > Pete Wright
> > Lead Systems Architect
> > Rubicon Project
> > pwri...@rubiconproject.com
> > 310.309.9298
> >
>
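
P.S. For anyone who wants to check their classpath before patching anything, here is a rough sketch of a reflection probe for the Utils/SafeUtils mismatch described at the top of this message. It reports which of the two classes, if any, exposes the static range-check helper. The class name Lz4RangeCheckProbe and its helper methods are made up for illustration. The package net.jpountz.util, the method name checkRange, and the (byte[], int, int) signature are my assumptions about the lz4 library, so verify them against the jar you actually have.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical helper, not part of Kafka, Spark, or lz4.
public class Lz4RangeCheckProbe {

    // Assumed owners of the range-check helper: Utils in lz4 1.2.0,
    // SafeUtils in lz4 1.3.0 (package name is an assumption).
    public static final List<String> CANDIDATES = Arrays.asList(
            "net.jpountz.util.Utils",
            "net.jpountz.util.SafeUtils");

    // True if the class can be loaded and has a public static method
    // with the given name and parameter types; false otherwise.
    public static boolean hasPublicStaticMethod(String className,
                                                String methodName,
                                                Class<?>... paramTypes) {
        try {
            // 'false' avoids running static initializers of the probed class.
            Class<?> cls = Class.forName(
                    className, false, Lz4RangeCheckProbe.class.getClassLoader());
            Method m = cls.getMethod(methodName, paramTypes);
            return Modifier.isStatic(m.getModifiers());
        } catch (ClassNotFoundException | NoSuchMethodException | LinkageError e) {
            return false;
        }
    }

    // Returns the first candidate class that exposes the method, if any.
    public static Optional<String> findOwner(List<String> candidates,
                                             String methodName,
                                             Class<?>... paramTypes) {
        for (String candidate : candidates) {
            if (hasPublicStaticMethod(candidate, methodName, paramTypes)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Assumed signature: checkRange(byte[] buf, int off, int len).
        Optional<String> owner = findOwner(
                CANDIDATES, "checkRange", byte[].class, int.class, int.class);
        System.out.println(owner
                .map(o -> "checkRange found on " + o)
                .orElse("no lz4 range-check helper found on this classpath"));
    }
}
```

Run it with the same classpath as the Spark job. If it reports SafeUtils while the Kafka client on that classpath was built against lz4 1.2.0, that would be consistent with the "method not found" errors above.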