I have some updates on this. I tried this on the latest Kafka, 2.8, and ran my application. The results are the same: snappy and lz4 don't seem to be working, as the uncompressed and compressed storage both measure the same.
I even tried the kafka-producer-perf-test tool. Below are the results.

Without any compression:
==========================>>
sh bin/kafka-producer-perf-test.sh --num-records 100000 --throughput 10000 --record-size 102400 --topic perf-test-uncompressed --producer-props compression.type=none bootstrap.servers=localhost:9092 --print-metrics

100000 records sent, 862.113558 records/sec (84.19 MB/sec), 376.08 ms avg latency, 1083.00 ms max latency, 371 ms 50th, 610 ms 95th, 778 ms 99th, 1061 ms 99.9th.
...
producer-topic-metrics:compression-rate:{client-id=producer-1, topic=perf-test-uncompressed} : 1.000

With snappy compression:
==========================>>
sh bin/kafka-producer-perf-test.sh --num-records 100000 --throughput 10000 --record-size 102400 --topic perf-test-uncompressed --producer-props compression.type=snappy batch.size=100000 linger.ms=5 bootstrap.servers=localhost:9092 --print-metrics

100000 records sent, 599.905215 records/sec (58.58 MB/sec), 540.79 ms avg latency, 1395.00 ms max latency, 521 ms 50th, 816 ms 95th, 1016 ms 99th, 1171 ms 99.9th.
...
producer-topic-metrics:compression-rate:{client-id=producer-1, topic=perf-test-uncompressed} : 1.001
<<======++++===============

The compression-rate reported above didn't change even with snappy enabled.

With gzip compression:
==========================>>
sh bin/kafka-producer-perf-test.sh --num-records 100000 --throughput 10000 --record-size 102400 --topic perf-test-compressed --producer-props compression.type=gzip bootstrap.servers=localhost:9092 batch.size=100000 linger.ms=5 --print-metrics

100000 records sent, 200.760078 records/sec (19.61 MB/sec), 1531.40 ms avg latency, 2744.00 ms max latency, 1514 ms 50th, 1897 ms 95th, 2123 ms 99th, 2610 ms 99.9th.
...
producer-topic-metrics:compression-rate:{client-id=producer-1, topic=perf-test-compressed} : 0.635
<<============================

To summarise:

compression type   messages sent   throughput                              effective compression-rate
none               100000          862.113558 records/sec (84.19 MB/sec)   1.000
snappy             100000          599.905215 records/sec (58.58 MB/sec)   1.001
gzip               100000          200.760078 records/sec (19.61 MB/sec)   0.635

In short, snappy = uncompressed!! Why is this happening?

On Wed, May 12, 2021 at 11:40 AM Shantanu Deshmukh <shantanu...@gmail.com> wrote:

> Hey Nitin,
>
> I have already done that. I used dump-log-segments option. And I can see
> the codec used is snappy/gzip/lz4. My question is, only gzip is giving me
> compression. Rest are equivalent to uncompressed storage,
>
> On Wed, May 12, 2021 at 11:16 AM nitin agarwal <nitingarg...@gmail.com> wrote:
>
>> You can read the data from the disk and see compression type.
>> https://thehoard.blog/how-kafkas-storage-internals-work-3a29b02e026
>>
>> Thanks,
>> Nitin
>>
>> On Wed, May 12, 2021 at 11:10 AM Shantanu Deshmukh <shantanu...@gmail.com> wrote:
>>
>> > I am trying snappy compression on my producer.
>> > Here's my setup
>> >
>> > Kafka - 2.0.0
>> > Spring-Kafka - 2.1.2
>> >
>> > Here's my producer config
>> >
>> > compressed producer ==========
>> >
>> > configProps.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServer);
>> > configProps.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
>> > configProps.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
>> > configProps.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "snappy");
>> > configProps.put(ProducerConfig.LINGER_MS_CONFIG, 10);
>> >
>> > config of un-compressed producer ============
>> >
>> > configProps.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServer);
>> > configProps.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
>> > configProps.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
>> >
>> > My payload is almost 1mb worth of string. After sending 1000 compressed and
>> > 1000 uncompressed such messages this is the result
>> > =======================
>> > [shantanu@oc0148610736 uncompressed-string-test-0]$ du -hsc /data/compressed-string-test-0/*
>> > 8.0K /data/compressed-string-test-0/00000000000000000000.index
>> > 990M /data/compressed-string-test-0/00000000000000000000.log
>> > 12K /data/compressed-string-test-0/00000000000000000000.timeindex
>> > 4.0K /data/compressed-string-test-0/leader-epoch-checkpoint
>> > 990M total
>> >
>> > [shantanu@oc0148610736 uncompressed-string-test-0]$ du -shc /data/uncompressed-string-test-0/*
>> > 8.0K /data/uncompressed-string-test-0/00000000000000000000.index
>> > 992M /data/uncompressed-string-test-0/00000000000000000000.log
>> > 12K /data/uncompressed-string-test-0/00000000000000000000.timeindex
>> > 4.0K /data/uncompressed-string-test-0/leader-epoch-checkpoint
>> > 992M total
>> > =======================
>> >
>> > Here we can see the difference is merely 2MB.
>> > Is compression even working?
>> > I used dump-log-segment tool
>> > =======================
>> > [shantanu@oc0148610736 kafka_2.11-2.0.0]$ sh bin/kafka-run-class.sh kafka.tools.DumpLogSegments --files /data/compressed-string-test-0/00000000000000000000.log --print-data-log | head | grep compresscodec
>> >
>> > offset: 0 position: 0 CreateTime: 1620744081357 isvalid: true keysize: -1 valuesize: 1039999 magic: 2 compresscodec: SNAPPY producerId: -1 producerEpoch: -1 sequence: -1 isTransactional: false headerKeys: [] payload: klxhbpyxmcazvhekqnltuenwhsewjjfmctcqyrppellyfqglfnvhqctlfplslhpuulknsncbgzzndizwmlnelotcbniyprdgihdazwn
>> > =======================
>> >
>> > I can see SNAPPY is mentioned as compression codec. But the difference
>> > between compressed and uncompressed disk size is negligible.
>> >
>> > I tried gzip later on. And results are
>> > =======================
>> > [shantanu@oc0148610736 uncompressed-string-test-0]$ du -hsc /data/compressed-string-test-0/*
>> > 8.0K /data/compressed-string-test-0/00000000000000000000.index
>> > 640M /data/compressed-string-test-0/00000000000000000000.log
>> > 12K /data/compressed-string-test-0/00000000000000000000.timeindex
>> > 4.0K /data/compressed-string-test-0/leader-epoch-checkpoint
>> > 640M total
>> > =======================
>> >
>> > So gzip seems to have worked somehow. I tried lz4 compression as well.
>> > Results were same as that of snappy.
>> >
>> > Is snappy/lz4 compression really working here? Gzip seems to be working but
>> > I have read a lot that snappy gives best CPU usage to compression ratio
>> > balance. So we want to go ahead with snappy.
>> >
>> > Please help
>> >
>> > Thanks & Regards,
>> > Shantanu
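
One thing that may be worth checking independently of Kafka is how compressible the payload itself is. The payload sample in the DumpLogSegments output above looks like random lowercase letters. Snappy and lz4 work by finding repeated byte sequences and have no entropy-coding stage, whereas gzip (DEFLATE) adds Huffman coding and can still shrink text drawn from a small alphabet. The sketch below is only an illustration under that assumption: it uses the JDK's built-in GZIPOutputStream (snappy and lz4 need third-party libraries, so they are not shown), and the class name, helper names, and sample data are all hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Random;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, not part of Kafka: measures how well a payload gzips on its own.
final class CompressibilityCheck {

    // Compressed size divided by original size; a value near 1.0 means almost no gain.
    static double gzipRatio(byte[] input) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(sink)) {
            gzip.write(input);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The stream is closed here, so the gzip trailer has been written to the sink.
        return (double) sink.size() / input.length;
    }

    // Uniformly random lowercase letters (assumed to resemble the test payload in the thread).
    static byte[] randomLowercase(int length, long seed) {
        Random random = new Random(seed);
        byte[] out = new byte[length];
        for (int i = 0; i < length; i++) {
            out[i] = (byte) ('a' + random.nextInt(26));
        }
        return out;
    }

    // Repetitive text: one short made-up line repeated until `length` bytes are filled.
    static byte[] repetitive(int length) {
        byte[] unit = "{\"status\":\"ok\",\"code\":200}\n".getBytes(StandardCharsets.US_ASCII);
        byte[] out = new byte[length];
        for (int i = 0; i < length; i++) {
            out[i] = unit[i % unit.length];
        }
        return out;
    }

    public static void main(String[] args) {
        int size = 1_000_000; // roughly the 1 MB message size mentioned in the thread
        System.out.printf("random lowercase: %.3f%n", gzipRatio(randomLowercase(size, 42L)));
        System.out.printf("repetitive text:  %.3f%n", gzipRatio(repetitive(size)));
    }
}
```

If the real production payload behaves like the repetitive case rather than the random one, rerunning the Kafka test with representative data should give a more meaningful comparison between the codecs.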