I am trying snappy compression on my producer. Here's my setup:

Kafka - 2.0.0
Spring-Kafka - 2.1.2
Here's my producer config.

Compressed producer
==========
configProps.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServer);
configProps.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
configProps.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
configProps.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "snappy");
configProps.put(ProducerConfig.LINGER_MS_CONFIG, 10);
==========

Config of the un-compressed producer
==========
configProps.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServer);
configProps.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
configProps.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
==========

My payload is almost 1 MB worth of string. After sending 1000 compressed and 1000 uncompressed such messages, this is the result:

=======================
[shantanu@oc0148610736 uncompressed-string-test-0]$ du -hsc /data/compressed-string-test-0/*
8.0K    /data/compressed-string-test-0/00000000000000000000.index
990M    /data/compressed-string-test-0/00000000000000000000.log
12K     /data/compressed-string-test-0/00000000000000000000.timeindex
4.0K    /data/compressed-string-test-0/leader-epoch-checkpoint
990M    total
[shantanu@oc0148610736 uncompressed-string-test-0]$ du -shc /data/uncompressed-string-test-0/*
8.0K    /data/uncompressed-string-test-0/00000000000000000000.index
992M    /data/uncompressed-string-test-0/00000000000000000000.log
12K     /data/uncompressed-string-test-0/00000000000000000000.timeindex
4.0K    /data/uncompressed-string-test-0/leader-epoch-checkpoint
992M    total
=======================

Here we can see the difference is merely 2 MB. Is compression even working?
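One way to check whether the payload itself is compressible at all, independent of Kafka, is to compress a similar string locally. This is only a sketch using the JDK's built-in Deflater (the DEFLATE algorithm gzip uses, not snappy), and the ~1 MB random lowercase payload below is a hypothetical stand-in for the real one:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Random;
import java.util.zip.Deflater;

public class CompressionCheck {

    // Compress a byte[] with DEFLATE and return the compressed size in bytes.
    static int compressedSize(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return out.size();
    }

    public static void main(String[] args) {
        int size = 1_000_000;

        // Random lowercase letters, similar to the payload shown by DumpLogSegments.
        Random rnd = new Random(42);
        StringBuilder sb = new StringBuilder(size);
        for (int i = 0; i < size; i++) {
            sb.append((char) ('a' + rnd.nextInt(26)));
        }
        byte[] randomPayload = sb.toString().getBytes(StandardCharsets.UTF_8);

        // A highly repetitive payload of the same length, for comparison.
        byte[] repetitivePayload =
                "abcdefghij".repeat(size / 10).getBytes(StandardCharsets.UTF_8);

        System.out.println("random payload compressed to:     " + compressedSize(randomPayload) + " bytes");
        System.out.println("repetitive payload compressed to: " + compressedSize(repetitivePayload) + " bytes");
    }
}
```

Random lowercase letters carry roughly 4.7 bits of entropy per character, so even DEFLATE can only shave off about a third of their size, while a repetitive string of the same length shrinks to almost nothing. Snappy and lz4, which trade compression ratio for speed, tend to do even worse on such high-entropy data.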
I used the DumpLogSegments tool:

=======================
[shantanu@oc0148610736 kafka_2.11-2.0.0]$ sh bin/kafka-run-class.sh kafka.tools.DumpLogSegments --files /data/compressed-string-test-0/00000000000000000000.log --print-data-log | head | grep compresscodec
offset: 0 position: 0 CreateTime: 1620744081357 isvalid: true keysize: -1 valuesize: 1039999 magic: 2 compresscodec: SNAPPY producerId: -1 producerEpoch: -1 sequence: -1 isTransactional: false headerKeys: [] payload: klxhbpyxmcazvhekqnltuenwhsewjjfmctcqyrppellyfqglfnvhqctlfplslhpuulknsncbgzzndizwmlnelotcbniyprdgihdazwn
=======================

I can see SNAPPY mentioned as the compression codec, but the difference in disk size between the compressed and uncompressed topics is negligible.

I tried gzip later on, and the results were:

=======================
[shantanu@oc0148610736 uncompressed-string-test-0]$ du -hsc /data/compressed-string-test-0/*
8.0K    /data/compressed-string-test-0/00000000000000000000.index
640M    /data/compressed-string-test-0/00000000000000000000.log
12K     /data/compressed-string-test-0/00000000000000000000.timeindex
4.0K    /data/compressed-string-test-0/leader-epoch-checkpoint
640M    total
=======================

So gzip seems to have worked somehow. I tried lz4 compression as well; the results were the same as with snappy.

Is snappy/lz4 compression really working here? Gzip seems to be working, but I have read in many places that snappy gives the best balance between CPU usage and compression ratio, so we want to go ahead with snappy.

Please help.

*Thanks & Regards,*
*Shantanu*