Hey Nitin,

I have already done that, using the DumpLogSegments tool, and I can see
that the codec recorded is snappy, gzip, or lz4, matching what I configured.
My question is why only gzip gives me any compression. The rest are
equivalent to uncompressed storage.
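
One thing that might be worth checking is whether the payload itself is the cause. Snappy and LZ4 only replace repeated byte sequences and have no entropy-coding stage, while gzip (DEFLATE) also applies Huffman coding. If the test string is random lowercase letters, as the dumped payload in the quoted message suggests, there would be few repeats for snappy/lz4 to exploit, but Huffman coding could still shrink 8-bit characters drawn from a 26-letter alphabet. The sketch below is only an illustration under that assumption: it uses `java.util.zip.Deflater` from the JDK (not Kafka's own compression path), and the class name, helper names, and seed are hypothetical.

```java
import java.util.Random;
import java.util.zip.Deflater;

public class PayloadCompressibility {

    // Builds uniformly random lowercase letters (an ASSUMED shape of the test payload).
    public static byte[] randomLowercase(int length, long seed) {
        Random rnd = new Random(seed);
        byte[] out = new byte[length];
        for (int i = 0; i < length; i++) {
            out[i] = (byte) ('a' + rnd.nextInt(26));
        }
        return out;
    }

    // Returns the number of bytes DEFLATE produces for the input with the given strategy.
    public static int deflatedSize(byte[] input, int strategy) {
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION);
        deflater.setStrategy(strategy);
        deflater.setInput(input);
        deflater.finish();
        byte[] buf = new byte[64 * 1024];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(buf);
        }
        deflater.end();
        return total;
    }

    public static void main(String[] args) {
        byte[] payload = randomLowercase(1_000_000, 42L);
        // Full DEFLATE: match finding plus Huffman coding.
        int full = deflatedSize(payload, Deflater.DEFAULT_STRATEGY);
        // Huffman coding only: no match finding at all.
        int huffmanOnly = deflatedSize(payload, Deflater.HUFFMAN_ONLY);
        System.out.printf("full DEFLATE ratio: %.2f%n", (double) full / payload.length);
        System.out.printf("Huffman-only ratio: %.2f%n", (double) huffmanOnly / payload.length);
    }
}
```

If the two ratios come out close to each other, nearly all of the saving comes from the entropy-coding stage, which snappy and lz4 do not have. In that case, repeating the disk-usage test with a payload closer to real production data (JSON, logs, and so on) would give a fairer comparison of the codecs.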

On Wed, May 12, 2021 at 11:16 AM nitin agarwal <nitingarg...@gmail.com>
wrote:

> You can read the data from the disk and see compression type.
> https://thehoard.blog/how-kafkas-storage-internals-work-3a29b02e026
>
> Thanks,
> Nitin
>
> On Wed, May 12, 2021 at 11:10 AM Shantanu Deshmukh <shantanu...@gmail.com>
> wrote:
>
> > I am trying snappy compression on my producer. Here's my setup
> >
> > Kafka - 2.0.0
> > Spring-Kafka - 2.1.2
> >
> > Here's my producer config
> >
> > compressed producer ==========
> >
> > configProps.put( ProducerConfig.BOOTSTRAP_SERVERS_CONFIG,
> >             bootstrapServer);
> >     configProps.put(
> >             ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
> >             StringSerializer.class);
> >     configProps.put(
> >             ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
> >             StringSerializer.class);
> >     configProps.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "snappy");
> >     configProps.put(ProducerConfig.LINGER_MS_CONFIG, 10);
> >
> > config of un-compressed producer ============
> >
> > configProps.put(
> >             ProducerConfig.BOOTSTRAP_SERVERS_CONFIG,
> >             bootstrapServer);
> >     configProps.put(
> >             ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
> >             StringSerializer.class);
> >     configProps.put(
> >             ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
> >             StringSerializer.class);
> >
> > My payload is a string of almost 1 MB. After sending 1000 compressed and
> > 1000 uncompressed messages of this kind, this is the result
> > =======================
> > [shantanu@oc0148610736 uncompressed-string-test-0]$ du -hsc
> > /data/compressed-string-test-0/*
> > 8.0K /data/compressed-string-test-0/00000000000000000000.index
> > 990M /data/compressed-string-test-0/00000000000000000000.log
> > 12K /data/compressed-string-test-0/00000000000000000000.timeindex
> > 4.0K /data/compressed-string-test-0/leader-epoch-checkpoint
> > 990M total
> >
> > [shantanu@oc0148610736 uncompressed-string-test-0]$ du -shc
> > /data/uncompressed-string-test-0/*
> > 8.0K    /data/uncompressed-string-test-0/00000000000000000000.index
> > 992M    /data/uncompressed-string-test-0/00000000000000000000.log
> > 12K /data/uncompressed-string-test-0/00000000000000000000.timeindex
> > 4.0K    /data/uncompressed-string-test-0/leader-epoch-checkpoint
> > 992M    total
> > =======================
> >
> > Here we can see the difference is merely 2 MB. Is compression even working?
> > I used the DumpLogSegments tool
> > =======================
> > [shantanu@oc0148610736 kafka_2.11-2.0.0]$ sh bin/kafka-run-class.sh
> > kafka.tools.DumpLogSegments --files
> > /data/compressed-string-test-0/00000000000000000000.log --print-data-log |
> > head | grep compresscodec
> >
> > offset: 0 position: 0 CreateTime: 1620744081357 isvalid: true keysize:
> > -1 valuesize: 1039999 magic: 2 compresscodec: SNAPPY producerId: -1
> > producerEpoch: -1 sequence: -1 isTransactional: false headerKeys: []
> > payload: klxhbpyxmcazvhekqnltuenwhsewjjfmctcqyrppellyfqglfnvhqctlfplslhpuulknsncbgzzndizwmlnelotcbniyprdgihdazwn
> > =======================
> >
> > I can see SNAPPY is mentioned as compression codec. But the difference
> > between compressed and uncompressed disk size is negligible.
> >
> > I tried gzip later on. And results are
> > =======================
> > [shantanu@oc0148610736 uncompressed-string-test-0]$ du -hsc
> > /data/compressed-string-test-0/*
> > 8.0K /data/compressed-string-test-0/00000000000000000000.index
> > 640M /data/compressed-string-test-0/00000000000000000000.log
> > 12K /data/compressed-string-test-0/00000000000000000000.timeindex
> > 4.0K /data/compressed-string-test-0/leader-epoch-checkpoint
> > 640M total
> > =======================
> >
> > So gzip seems to have worked somehow. I tried lz4 compression as well.
> > The results were the same as with snappy.
> >
> > Is snappy/lz4 compression really working here? Gzip seems to be working, but
> > I have read a lot that snappy gives the best balance of CPU usage to
> > compression ratio, so we want to go ahead with snappy.
> >
> > Please help
> >
> > *Thanks & Regards,*
> > *Shantanu*
> >
>
