Hi
Adding this here because I asked and may have found the answer:
The memory consumption may not be exposed as RocksDB metrics, but the
values should be available as properties of the RocksDB instance itself.
RocksDBStore could easily make them available by allowing access
to db.getProperty.
Available data (examples):
rocksdb.block-cache-usage
rocksdb.estimate-table-readers-mem
rocksdb.cur-size-all-mem-tables

Reference: https://github.com/facebook/rocksdb/wiki/Memory-usage-in-RocksDB
The names of the properties can be found in
https://github.com/facebook/rocksdb/blob/master/include/rocksdb/db.h
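
For illustration, reading these properties through the RocksDB Java API
could look roughly like this (untested sketch; the store path is just a
placeholder, RocksDBStore already holds the db instance and could read the
same properties internally):

import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;

public class RocksDBMemoryProbe {

    public static void main(final String[] args) throws RocksDBException {
        RocksDB.loadLibrary();
        try (final Options options = new Options().setCreateIfMissing(true);
             final RocksDB db = RocksDB.open(options, "/tmp/memory-probe")) {
            // Memory currently pinned in the block cache.
            System.out.println(db.getProperty("rocksdb.block-cache-usage"));
            // Estimated memory used by table readers (index and filter blocks).
            System.out.println(db.getProperty("rocksdb.estimate-table-readers-mem"));
            // Size of the active and unflushed immutable memtables.
            System.out.println(db.getProperty("rocksdb.cur-size-all-mem-tables"));
        }
    }
}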

Would it make sense to allow this kind of access?
Should this be part of a follow-up KIP?

best regards

Patrik

On Wed, 19 Jun 2019 at 21:55, Bill Bejeck <bbej...@gmail.com> wrote:

> Hi Bruno,
>
> Just getting caught up on this KIP thread.  Looks good to me, and I don't
> have any additional comments to what's already been presented.
>
> Thanks,
> Bill
>
> On Wed, Jun 19, 2019 at 1:42 PM Bruno Cadonna <br...@confluent.io> wrote:
>
> > John and Guozhang,
> >
> > thank you for your comments.
> >
> > @Guozhang could you please also vote on the voting thread so that we
> > have all votes in one place.
> >
> > @John, the only situation I can think of where a non-uniform
> > configuration of segments would make sense is to account for
> > seasonality. But this would be a really advanced configuration IMO.
> >
> > Best,
> > Bruno
> >
> > On Wed, Jun 19, 2019 at 7:18 PM John Roesler <j...@confluent.io> wrote:
> > >
> > > One last thought. I think it makes sense what you propose for merging
> > > the metrics when a logical store is composed of multiple physical
> > > stores.
> > >
> > > The basic standard for these metrics is that they should be relevant
> > > to performance, and they should be controllable via configurations,
> > > specifically via RocksDBConfigSetter. The config setter takes as input
> > > the store name. For Key/Value stores, this name is visible in the tags
> > > as "rocksdb-state-id", so the workflow is that I notice (e.g.) the
> > > metric rocksdb-state-id=STORE_X is showing a low block cache hit
> > > ratio, so I add a condition in my config setter to increase the block
> > > cache size for stores named STORE_X.
> > >
> > > For this case where the logical store has multiple physical stores,
> > > we're really talking about segmented stores, i.e. window stores and
> > > session stores. In these stores, every segment has a different name
> > > (logicalStoreName + "." + segmentId * segmentInterval), so the config
> > > setter would need to use a prefix match with respect to the metric tag
> > > (e.g.) rocksdb-window-state-id. But at the same time, this is totally
> > > doable.
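> > >
> > > For illustration, a config setter along these lines might look roughly
> > > like this (untested sketch; the store name prefix and cache size are
> > > made up):
> > >
> > > import java.util.Map;
> > > import org.apache.kafka.streams.state.RocksDBConfigSetter;
> > > import org.rocksdb.BlockBasedTableConfig;
> > > import org.rocksdb.Options;
> > >
> > > public class MyRocksDBConfigSetter implements RocksDBConfigSetter {
> > >     @Override
> > >     public void setConfig(final String storeName, final Options options,
> > >                           final Map<String, Object> configs) {
> > >         // Segments are named logicalStoreName + "." + segmentId * segmentInterval,
> > >         // so match the logical store name as a prefix.
> > >         if (storeName.startsWith("STORE_X")) {
> > >             final BlockBasedTableConfig tableConfig = new BlockBasedTableConfig();
> > >             tableConfig.setBlockCacheSize(64 * 1024 * 1024L); // made-up size
> > >             options.setTableFormatConfig(tableConfig);
> > >         }
> > >     }
> > > }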
> > >
> > > It's also perfectly reasonable: since segments get rotated out all the
> > > time, it's implausible that you'd ever want a non-uniform
> > > configuration over the segments in a store. For this reason,
> > > specifically, it makes more sense just to summarize the metrics than
> > > to present them individually. Might be worth documenting the
> > > relationship, though.
> > >
> > > Thanks again, I'll vote now,
> > > -John
> > >
> > > On Wed, Jun 19, 2019 at 12:03 PM John Roesler <j...@confluent.io> wrote:
> > > >
> > > > Just taking a look over the metrics again, I had one thought...
> > > >
> > > > Metrics for stuff that happens in a background thread (like compaction)
> > > > can't directly identify compactions as a bottleneck from Streams'
> > > > perspective. I.e., a DB might do a lot of compactions, but if those
> > > > compactions never delay a write or read, then they cannot be a
> > > > bottleneck.
> > > >
> > > > Thus, the "stall" metric should be the starting point for bottleneck
> > > > identification, and then the flush/compaction metrics can be used to
> > > > secondarily identify what to do to relieve the bottleneck.
> > > >
> > > > This doesn't affect the metrics you proposed, but I'd suggest saying
> > > > something to this effect in whatever documentation or descriptions we
> > > > provide.
> > > >
> > > > Thanks,
> > > > -John
> > > >
> > > > On Wed, Jun 19, 2019 at 11:25 AM John Roesler <j...@confluent.io> wrote:
> > > > >
> > > > > Thanks for the updates.
> > > > >
> > > > > Personally, I'd be in favor of not going out on a limb with
> > > > > unsupported metrics APIs. We should take care to make sure that what
> > > > > we add in KIP-471 is stable and well supported, even if it's not the
> > > > > complete picture. We can always do follow-on work to tackle complex
> > > > > metrics as an isolated design exercise.
> > > > >
> > > > > Just my two cents.
> > > > > Thanks,
> > > > > -John
> > > > >
> > > > > On Wed, Jun 19, 2019 at 6:02 AM Bruno Cadonna <br...@confluent.io> wrote:
> > > > > >
> > > > > > Hi Guozhang,
> > > > > >
> > > > > > Regarding your comments about the wiki page:
> > > > > >
> > > > > > 1) Exactly, I rephrased the paragraph to make it more clear.
> > > > > >
> > > > > > 2) Yes, I used the wrong term. All hit related metrics are ratios.
> > > > > > I corrected the names of the affected metrics.
> > > > > >
> > > > > > Regarding your meta comments:
> > > > > >
> > > > > > 1) The plan is to expose the hit ratio. I used the wrong term. The
> > > > > > formulas compute ratios. Regarding your question about a metric to
> > > > > > know from where a read is served when it is not in the memtable,
> > > > > > there are metrics in RocksDB that give you the number of get()
> > > > > > queries that are served from L0, L1, and L2_AND_UP. I could not find
> > > > > > any metric that gives you information about whether a query was
> > > > > > served from disk vs. OS cache. One metric that could be used to
> > > > > > indirectly measure whether disk or OS cache is accessed seems to be
> > > > > > READ_BLOCK_GET_MICROS, which gives you the time for an IO read of a
> > > > > > block. If it is high, the block was read from disk, otherwise from
> > > > > > the OS cache. A similar strategy to monitor the performance is
> > > > > > described in [1]. DISCLAIMER: READ_BLOCK_GET_MICROS is not
> > > > > > documented. I had to look into the C++ code to understand its
> > > > > > meaning. I could have missed something.
> > > > > >
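> > > > > > For illustration, reading those counters with the Java API could
> > > > > > look roughly like this (untested fragment; it assumes `options` is
> > > > > > the org.rocksdb.Options used to open the store and that statistics
> > > > > > are enabled before opening):
> > > > > >
> > > > > > import org.rocksdb.Statistics;
> > > > > > import org.rocksdb.TickerType;
> > > > > >
> > > > > > final Statistics statistics = new Statistics();
> > > > > > options.setStatistics(statistics); // must happen before opening the db
> > > > > > // ... after some get() queries have been served:
> > > > > > final long hitsL0 = statistics.getTickerCount(TickerType.GET_HIT_L0);
> > > > > > final long hitsL1 = statistics.getTickerCount(TickerType.GET_HIT_L1);
> > > > > > final long hitsDeeper = statistics.getTickerCount(TickerType.GET_HIT_L2_AND_UP);
> > > > > >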
> > > > > > 2) There are some additional compaction statistics that contain
> > > > > > sizes of files on disk and numbers about write amplification that
> > > > > > you can get programmatically in RocksDB, but they are for debugging
> > > > > > purposes [2]. To get this data and publish it into a metric, one
> > > > > > has to parse a string. Since this data is for debugging purposes, I
> > > > > > do not know how stable the output format is. One thing we could do
> > > > > > is to dump the string with the compaction statistics into our log
> > > > > > files at DEBUG level. But that is outside of the scope of this KIP.
> > > > > >
> > > > > > Best,
> > > > > > Bruno
> > > > > >
> > > > > > [1] https://github.com/facebook/rocksdb/wiki/Perf-Context-and-IO-Stats-Context#block-cache-and-os-page-cache-efficiency
> > > > > > [2] https://github.com/facebook/rocksdb/wiki/RocksDB-Tuning-Guide#rocksdb-statistics
> > > > > >
> > > > > > On Tue, Jun 18, 2019 at 8:24 PM Guozhang Wang <wangg...@gmail.com> wrote:
> > > > > > >
> > > > > > > Hello Bruno,
> > > > > > >
> > > > > > > I've read through the aggregation section and I think it looks
> > > > > > > good to me.
> > > > > > > There are a few minor comments about the wiki page itself:
> > > > > > >
> > > > > > > 1) A state store might consist of multiple state stores -> You
> > > > > > > mean a `logical` state store might consist of multiple `physical`
> > > > > > > store instances?
> > > > > > >
> > > > > > > 2) The "Hit Rates" calculation seems to be referring to the `Hit
> > > > > > > Ratio` (which is a percentage) rather than the `Hit Rate`?
> > > > > > >
> > > > > > > And a couple further meta comments:
> > > > > > >
> > > > > > > 1) For memtable / block cache, instead of the hit-rate do you
> > > > > > > think we should expose the hit-ratio? I feel it is more useful
> > > > > > > for users to debug the root cause of unexpectedly slow
> > > > > > > performance.
> > > > > > >
> > > > > > > And for block cache misses, is it easy to provide a metric for
> > > > > > > the "target read", i.e. from where a read is served (from which
> > > > > > > level, either in the OS cache or in SST files), similar to Fig. 11
> > > > > > > in http://cidrdb.org/cidr2017/papers/p82-dong-cidr17.pdf?
> > > > > > >
> > > > > > > 2) As @Patrik mentioned, is there a good way we can expose the
> > > > > > > total amount of memory and disk usage for each state store as
> > > > > > > well? I think it would also be very helpful for users to
> > > > > > > understand their capacity needs and read/write amplification.
> > > > > > >
> > > > > > >
> > > > > > > Guozhang
> > > > > > >
> > > > > > > On Fri, Jun 14, 2019 at 6:55 AM Bruno Cadonna <br...@confluent.io> wrote:
> > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > I decided to go for the option in which metrics are exposed for
> > > > > > > > each logical state store. I revisited the KIP correspondingly
> > > > > > > > and added a section on how to aggregate metrics over multiple
> > > > > > > > physical RocksDB instances within one logical state store. It
> > > > > > > > would be great if you could take a look and give feedback. If
> > > > > > > > nobody has complaints about the chosen option, I will proceed
> > > > > > > > with voting on this KIP since this was the last open question.
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Bruno
> > > > > > > >
> > > > > > > > On Fri, Jun 7, 2019 at 9:38 PM Patrik Kleindl <pklei...@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > Hi Sophie
> > > > > > > > > This will be a good change, I have been thinking about
> > > > > > > > > proposing something similar or even passing the properties
> > > > > > > > > per store.
> > > > > > > > > RocksDB should probably know how much memory was reserved but
> > > > > > > > > maybe does not expose it.
> > > > > > > > > We are limiting it already as you suggested but this is a
> > > > > > > > > rather crude tool.
> > > > > > > > > Especially in a larger topology with mixed loads per topic it
> > > > > > > > > would be helpful to get more insight into which store puts a
> > > > > > > > > lot of load on memory.
> > > > > > > > > Regarding the limiting capability, I think I remember reading
> > > > > > > > > that those limits only affect some parts of the memory and
> > > > > > > > > others can still exceed this limit. I'll try to look up the
> > > > > > > > > difference.
> > > > > > > > > Best regards
> > > > > > > > > Patrik
> > > > > > > > >
> > > > > > > > > > On 07.06.2019 at 21:03, Sophie Blee-Goldman <sop...@confluent.io> wrote:
> > > > > > > > > >
> > > > > > > > > > Hi Patrik,
> > > > > > > > > >
> > > > > > > > > > As of 2.3 you will be able to use the RocksDBConfigSetter
> > > > > > > > > > to effectively bound the total memory used by RocksDB for a
> > > > > > > > > > single app instance. You should already be able to limit
> > > > > > > > > > the memory used per rocksdb store, though as you mention
> > > > > > > > > > there can be a lot of them. I'm not sure you can monitor
> > > > > > > > > > the memory usage if you are not limiting it though.
> > > > > > > > > >
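> > > > > > > > > > A sketch of what bounding the total memory via the config
> > > > > > > > > > setter could look like (untested; the cache and write buffer
> > > > > > > > > > sizes are placeholders):
> > > > > > > > > >
> > > > > > > > > > import java.util.Map;
> > > > > > > > > > import org.apache.kafka.streams.state.RocksDBConfigSetter;
> > > > > > > > > > import org.rocksdb.BlockBasedTableConfig;
> > > > > > > > > > import org.rocksdb.Cache;
> > > > > > > > > > import org.rocksdb.LRUCache;
> > > > > > > > > > import org.rocksdb.Options;
> > > > > > > > > > import org.rocksdb.WriteBufferManager;
> > > > > > > > > >
> > > > > > > > > > public class BoundedMemoryConfigSetter implements RocksDBConfigSetter {
> > > > > > > > > >     // Shared across all stores of the app instance; sizes are placeholders.
> > > > > > > > > >     private static final Cache CACHE = new LRUCache(512 * 1024 * 1024L);
> > > > > > > > > >     private static final WriteBufferManager WBM =
> > > > > > > > > >         new WriteBufferManager(256 * 1024 * 1024L, CACHE);
> > > > > > > > > >
> > > > > > > > > >     @Override
> > > > > > > > > >     public void setConfig(final String storeName, final Options options,
> > > > > > > > > >                           final Map<String, Object> configs) {
> > > > > > > > > >         final BlockBasedTableConfig tableConfig = new BlockBasedTableConfig();
> > > > > > > > > >         tableConfig.setBlockCache(CACHE);           // all stores share one cache
> > > > > > > > > >         tableConfig.setCacheIndexAndFilterBlocks(true);
> > > > > > > > > >         options.setTableFormatConfig(tableConfig);
> > > > > > > > > >         options.setWriteBufferManager(WBM);         // memtables counted against the cache
> > > > > > > > > >     }
> > > > > > > > > > }
> > > > > > > > > >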
> > > > > > > > > >> On Fri, Jun 7, 2019 at 2:06 AM Patrik Kleindl <pklei...@gmail.com> wrote:
> > > > > > > > > >>
> > > > > > > > > >> Hi
> > > > > > > > > >> Thanks Bruno for the KIP, this is a very good idea.
> > > > > > > > > >>
> > > > > > > > > >> I have one question: are there metrics available for the
> > > > > > > > > >> memory consumption of RocksDB?
> > > > > > > > > >> As they are running outside the JVM, we have run into
> > > > > > > > > >> issues because they were using all the other memory.
> > > > > > > > > >> And with multiple streams applications on the same machine,
> > > > > > > > > >> each with several KTables and 10+ partitions per topic, the
> > > > > > > > > >> number of stores can get out of hand pretty easily.
> > > > > > > > > >> Or did I miss something obvious about how those can be
> > > > > > > > > >> monitored better?
> > > > > > > > > >>
> > > > > > > > > >> best regards
> > > > > > > > > >>
> > > > > > > > > >> Patrik
> > > > > > > > > >>
> > > > > > > > > >>> On Fri, 17 May 2019 at 23:54, Bruno Cadonna <br...@confluent.io> wrote:
> > > > > > > > > >>>
> > > > > > > > > >>> Hi all,
> > > > > > > > > >>>
> > > > > > > > > >>> this KIP describes the extension of the Kafka Streams'
> > > > > > > > > >>> metrics to include RocksDB's internal statistics.
> > > > > > > > > >>>
> > > > > > > > > >>> Please have a look at it and let me know what you think.
> > > > > > > > > >>> Since I am not a RocksDB expert, I am thankful for any
> > > > > > > > > >>> additional pair of eyes that evaluates this KIP.
> > > > > > > > > >>>
> > > > > > > > > >>>
> > > > > > > > > >>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-471:+Expose+RocksDB+Metrics+in+Kafka+Streams
> > > > > > > > > >>>
> > > > > > > > > >>> Best regards,
> > > > > > > > > >>> Bruno
> > > > > > > > > >>>
> > > > > > > > > >>
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > -- Guozhang
> >
>
