Re: [DISCUSS] KIP-390: Support Compression Level (rebooted)

Dongjin Lee Wed, 09 Jun 2021 08:09:44 -0700

> I am OK with doing compression level first, but I don't want to rule out
> the buffer size change without understanding better.


I see. I am now rerunning the benchmark with the buffer size configuration.
As soon as I get a promising result, I will update the KIP.
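
For reference, the kind of comparison suggested in this thread (a small vs. a large block size) can be sketched roughly as below. This is only a minimal illustration under several assumptions: it uses Python's zlib as a stand-in for Snappy, it compresses every block independently, the data is synthetic, and the function names are made up for this sketch. It is not the benchmark from the KIP.

```python
import zlib


def compressed_size(data: bytes, block_size: int, level: int = 6) -> int:
    """Compress data in independent blocks of block_size bytes.

    Returns the total number of compressed bytes across all blocks.
    """
    total = 0
    for start in range(0, len(data), block_size):
        total += len(zlib.compress(data[start:start + block_size], level))
    return total


def compare_block_sizes(data: bytes, block_sizes=(1024, 32 * 1024), level: int = 6):
    """Map each block size to its compressed/original size ratio."""
    return {bs: compressed_size(data, bs, level) / len(data) for bs in block_sizes}


# Synthetic, repetitive JSON-like records (hypothetical data, not the KIP dataset).
sample = b"".join(
    b'{"id": %d, "status": "ok", "region": "us-east-1"}\n' % i for i in range(20000)
)

ratios = compare_block_sizes(sample)
for bs, ratio in ratios.items():
    print(f"block={bs:>6} B  compressed/original={ratio:.3f}")
```

A lower ratio means better compression. If the two ratios come out close on real data, the larger block mostly costs memory; if they differ a lot, the larger block is paying for itself.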

Thanks,
Dongjin

On Wed, Jun 9, 2021 at 12:36 AM Ismael Juma <[email protected]> wrote:

> Btw, I am OK with doing compression level first, but I don't want to rule
> out the buffer size change without understanding better.
>
> Ismael
>
> On Tue, Jun 8, 2021 at 8:33 AM Ismael Juma <[email protected]> wrote:
>
> > Hi Dongjin,
> >
> > I was thinking of a simple test: Snappy with 1 KB block size vs 32 KB
> > block size. If the compression rate is similar for both, then it seems
> > very wasteful to use 32 KB. I suspect you will see a significant
> > difference though.
> >
> > Ismael
> >
> > On Tue, Jun 8, 2021 at 8:27 AM Dongjin Lee <[email protected]> wrote:
> >
> >> Hi Ismael,
> >>
> >> I added the linear write benchmark result to the proposal. As in the
> >> producer benchmark, the lowest compression level showed the best MB/sec
> >> in every case. I tested several configurations, but the results were
> >> almost the same.
> >>
> >> If you have any suggestions for the benchmark, please let me know. I am
> >> new to running the linear write benchmark.
> >>
> >> Best,
> >> Dongjin
> >>
> >> On Sun, Jun 6, 2021 at 8:20 AM Dongjin Lee <[email protected]> wrote:
> >>
> >> > Hi Ismael,
> >> >
> >> > Thanks for the reply.
> >> >
> >> > > So you're saying that reducing the buffer size didn't reduce the
> >> > > compression rate for codecs like lz4?
> >> >
> >> > Of course, there were some improvements in compressed size when I
> >> > tried the 'buffer.size' option, but the gain was not significant. I
> >> > tried several datasets, but the result was the same. That made me
> >> > skeptical about adding this option, which seemed only to make the
> >> > configuration more complex.
> >> >
> >> > In contrast, 'compression.level' showed its effectiveness immediately.
> >> > That is why I decided to focus on 'compression.level' in this rework.
> >> >
> >> > As you can see in the updated KIP with the benchmark, IMHO, the true
> >> > value of supporting the compression option may not be the compressed
> >> > size or rate, but speed. Tweaking the compression level slightly
> >> > yielded a large gain in produce performance.
> >> >
> >> > Thanks,
> >> > Dongjin
> >> >
> >> >
> >> > On Sun, Jun 6, 2021 at 6:48 AM Ismael Juma <[email protected]> wrote:
> >> >
> >> >> Thanks Dongjin. So you're saying that reducing the buffer size didn't
> >> >> reduce the compression rate for codecs like lz4? If so, that would
> >> >> suggest reducing the default value, but that seems odd.
> >> >>
> >> >> Ismael
> >> >>
> >> >> On Sat, Jun 5, 2021, 9:25 AM Dongjin Lee <[email protected]> wrote:
> >> >>
> >> >> > Hello Kafka dev,
> >> >> >
> >> >> > I hope to reboot the discussion of KIP-390: Support Compression
> >> >> > Level
> >> >> > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-390%3A+Support+Compression+Level>.
> >> >> > It proposes to add a new option, 'compression.level', that controls
> >> >> > the compression level.
> >> >> >
> >> >> > This KIP was submitted more than a year ago but had been neglected
> >> >> > for a long time. Recently I reworked it from scratch with the
> >> >> > following differences:
> >> >> >
> >> >> > 1. Tested how it works with a real-world dataset. As you can see in
> >> >> > the updated KIP, *this feature can improve the producer's
> >> >> > message/second rate by more than 50%*, a significant enhancement.
> >> >> > 2. Dropped the 'compression.buffer.size' option that was in the
> >> >> > initial work. In repeated benchmarks, I could not find any evidence
> >> >> > that this option results in meaningful differences, so I removed it.
> >> >> >
> >> >> > All feedback will be highly appreciated.
> >> >> >
> >> >> > Best,
> >> >> > Dongjin
> >> >> >
> >> >> >
> >> >> > --
> >> >> > *Dongjin Lee*
> >> >> >
> >> >> > *A hitchhiker in the mathematical world.*
> >> >> >
> >> >> >
> >> >> >
> >> >> > *github: github.com/dongjinleekr <https://github.com/dongjinleekr>
> >> >> > keybase: https://keybase.io/dongjinleekr
> >> >> > linkedin: kr.linkedin.com/in/dongjinleekr <https://kr.linkedin.com/in/dongjinleekr>
> >> >> > speakerdeck: speakerdeck.com/dongjin <https://speakerdeck.com/dongjin>*
> >> >> >
> >> >>
> >> >
> >> >
> >> >
> >>
> >>
> >>
> >
>


-- 
*Dongjin Lee*

*A hitchhiker in the mathematical world.*



*github: github.com/dongjinleekr <https://github.com/dongjinleekr>
keybase: https://keybase.io/dongjinleekr
linkedin: kr.linkedin.com/in/dongjinleekr <https://kr.linkedin.com/in/dongjinleekr>
speakerdeck: speakerdeck.com/dongjin <https://speakerdeck.com/dongjin>*
