Thanks Colt!

For a library's default configs, I think the principle should be "it
runs appropriately out of the box the first time you play with it". I'm
not suggesting we try to make the defaults a generally good combination
for a wide range of production usage, since in that case most people
would prefer to customize rather than blindly accept the default config
values anyway. Hence, what I had in mind regarding "benchmarks" is
something light, like running the stateful examples in our tutorials
(https://github.com/apache/kafka/tree/trunk/streams/examples/src/main/java/org/apache/kafka/streams/examples),
and seeing whether the new configs give better overall performance. It
does not need to be very comprehensive. If you could help with that
validation, it would be great.
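
To make "something light" concrete, here is a rough sketch of one way such a
comparison could be timed: run the same workload a few times under each config
and compare median wall-clock times. Everything in it is an assumption for
illustration: the class name LightBenchmark, the helper names, and the
busyWork stand-in are hypothetical, and a real run would replace the two
Runnables with code that starts one of the stateful example topologies under
the current and the proposed RocksDB settings.

```java
import java.util.Arrays;

public class LightBenchmark {

    /** Median of the samples; for an even count, the lower of the two middle values. */
    public static long median(long[] samples) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        return sorted[(sorted.length - 1) / 2];
    }

    /** Runs the workload untimed for warm-up, then returns the median wall-clock time in ms. */
    public static long medianMillis(Runnable workload, int warmups, int runs) {
        for (int i = 0; i < warmups; i++) {
            workload.run();
        }
        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            workload.run();
            samples[i] = (System.nanoTime() - start) / 1_000_000;
        }
        return median(samples);
    }

    // Stand-in workload so the sketch runs on its own; NOT a real Streams job.
    static long busyWork(int n) {
        long acc = 0;
        for (int i = 0; i < n; i++) {
            acc += Integer.hashCode(i);
        }
        return acc;
    }

    public static void main(String[] args) {
        // Hypothetical placeholders: each would run a stateful example topology
        // over a fixed input with the given RocksDB settings.
        Runnable currentDefaults = () -> busyWork(200_000);
        Runnable proposedDefaults = () -> busyWork(200_000);

        System.out.println("current defaults:  " + medianMillis(currentDefaults, 2, 5) + " ms");
        System.out.println("proposed defaults: " + medianMillis(proposedDefaults, 2, 5) + " ms");
    }
}
```

The median is used rather than the mean so that a single slow run (for
example, one that happens to hit a compaction) does not dominate the result.
State store restoration time could be measured the same way.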


Guozhang

On Wed, Jul 26, 2023 at 8:38 AM Colt McNealy <c...@littlehorse.io> wrote:
>
> Guozhang,
>
> Thanks for your response. That makes a lot of sense; I can't promise any
> super-formal benchmarks but we will definitely play with the configurations
> you sent and report back within a month about our high-level findings.
>
> For our purposes (a workflow engine), we will mostly monitor workflow
> execution metrics + state store restoration times. But in the interest of a
> formal benchmark that could be included in a KIP—what monitoring software
> tooling and setup environment would you recommend? If it doesn't involve
> writing copious amounts of custom code, perhaps (no promises) my team could
> put something together that's more suitable for a general Streams audience
> rather than just our own internal usage.
>
> Cheers,
> Colt McNealy
>
> *Founder, LittleHorse.dev*
>
>
> On Sun, Jul 23, 2023 at 11:21 AM Guozhang Wang <guozhang.wang...@gmail.com>
> wrote:
>
> > Yeah, I can shed some light here: I used Universal originally since at
> > the beginning of the Kafka Streams journey there were user reports
> > complaining about its storage amplification. But soon enough (around
> > 2019) I realized that, as an OOTB config, level compaction may be
> > preferable.
> >
> > I had a PR dating back to that time where I suggested changing a bunch
> > of OOTB configs for RocksDB, including the compaction config:
> > https://github.com/apache/kafka/pull/6406/files. Unfortunately it was
> > not merged, since I wanted to run some benchmarks to make sure it does
> > not have any gotchas but never got the time to do so. I would in fact be
> > very happy if someone could pick that up, re-examine whether the changes
> > still make sense, and if so drive it through and merge.
> >
> > Guozhang
> >
> >
> > On Sun, Jul 23, 2023 at 10:29 AM Matthias J. Sax <mj...@apache.org> wrote:
> > >
> > > Do you happen to know?
> > >
> > >
> > > -------- Forwarded Message --------
> > > Subject: Streams/RocksDB: Why Universal Compaction?
> > > Date: Fri, 23 Jun 2023 13:19:36 -0700
> > > From: Colt McNealy <c...@littlehorse.io>
> > > Reply-To: users@kafka.apache.org
> > > To: users@kafka.apache.org
> > >
> > > Hello there!
> > >
> > > I was wondering if anyone (perhaps an early developer or power-user of
> > > Kafka Streams) knows why the Streams developers made the default setting
> > > for RocksDB compaction "Universal" compaction rather than "Level"
> > > compaction?
> > >
> > > My understanding (in which I am extremely UNconfident) is as follows—
> > >
> > > Supposedly Universal compaction leads to lower write amplification after
> > > compaction finishes. In a run of Universal compaction, all data is
> > > compacted; as per the RocksDB documentation it is possible for temporary
> > > write amplification of up to 2x during this process. There have also been
> > > reports of "write stalls" during this process [1].
> > >
> > > > In Level compaction, only certain levels (tiers of SST files) are
> > > > compacted at once, meaning that the compaction process is shorter and
> > > > less intensive, but that write amplification after compaction finishes
> > > > is higher than with universal compaction.
> > >
> > > Can anyone confirm/deny/correct this?
> > >
> > > [1] https://github.com/solana-labs/solana/issues/14586 (not
> > > Streams-related, but it is RocksDB)
> > >
> > > Thanks in advance,
> > > Colt McNealy
> > >
> > > *Founder, LittleHorse.dev*
> > >
> >
