Re: [DISCUSS] KIP-471: Expose RocksDB Metrics in Kafka Streams

Bill Bejeck Wed, 19 Jun 2019 12:55:37 -0700

Hi Bruno,

Just getting caught up on this KIP thread.  Looks good to me, and I don't
have any additional comments to what's already been presented.


Thanks,
Bill

On Wed, Jun 19, 2019 at 1:42 PM Bruno Cadonna <[email protected]> wrote:

> John and Guozhang,
>
> thank you for your comments.
>
> @Guozhang could you please also vote on the voting thread so that we
> have all votes in one place.
>
> @John, the only situation I can think of where a non-uniform
> configuration of segments would make sense is to account for
> seasonality. But this would be a really advanced configuration IMO.
>
> Best,
> Bruno
>
> On Wed, Jun 19, 2019 at 7:18 PM John Roesler <[email protected]> wrote:
> >
> > One last thought. I think it makes sense what you propose for merging
> > the metrics when a logical store is composed of multiple physical
> > stores.
> >
> > The basic standard for these metrics is that they should be relevant
> > to performance, and they should be controllable via configurations,
> > specifically via RocksDBConfigSetter. The config setter takes as input
> > the store name. For Key/Value stores, this name is visible in the tags
> > as "rocksdb-state-id", so the workflow is that I notice (e.g.) the
> > metric rocksdb-state-id=STORE_X is showing a low block cache hit
> > ratio, so I add a condition in my config setter to increase the block
> > cache size for stores named STORE_X.
> >
> > For this case where the logical store has multiple physical stores,
> > we're really talking about segmented stores, window stores and
> > segmented stores. In these stores, every segment has a different name
> > (logicalStoreName + "." + segmentId * segmentInterval), so the config
> > setter would need to use a prefix match with respect to the metric tag
> > (e.g.) rocksdb-window-state-id. But at the same time, this is totally
> > doable.
> >
> > It's also perfectly reasonable, since segments get rotated out all the
> > time, it's implausible that you'd ever want a non-uniform
> > configuration over the segments in a store. For this reason,
> > specifically, it makes more sense just to summarize the metrics than
> > to present them individually. Might be worth documenting the
> > relationship, though.
> >
> > Thanks again, I'll vote now,
> > -John
> >
> > On Wed, Jun 19, 2019 at 12:03 PM John Roesler <[email protected]> wrote:
> > >
> > > Just taking a look over the metrics again, I had one thought...
> > >
> > > Stuff that happens in a background thread (like compaction metrics)
> > > can't directly identify compactions as a bottleneck from Streams'
> > > perspective. I.e., a DB might do a lot of compactions, but if those
> > > compactions never delay a write or read, then they cannot be a
> > > bottleneck.
> > >
> > > Thus, the "stall" metric should be the starting point for bottleneck
> > > identification, and then the flush/compaction metrics can be used to
> > > secondarily identify what to do to relieve the bottleneck.
> > >
> > > This doesn't affect the metrics you proposed, but I'd suggest saying
> > > something to this effect in whatever documentation or descriptions we
> > > provide.
> > >
> > > Thanks,
> > > -John
> > >
> > > On Wed, Jun 19, 2019 at 11:25 AM John Roesler <[email protected]>
> wrote:
> > > >
> > > > Thanks for the updates.
> > > >
> > > > Personally, I'd be in favor of not going out on a limb with
> > > > unsupported metrics APIs. We should take care to make sure that what
> > > > we add in KIP-471 is stable and well supported, even if it's not the
> > > > complete picture. We can always do follow-on work to tackle complex
> > > > metrics as an isolated design exercise.
> > > >
> > > > Just my two cents.
> > > > Thanks,
> > > > -John
> > > >
> > > > On Wed, Jun 19, 2019 at 6:02 AM Bruno Cadonna <[email protected]>
> wrote:
> > > > >
> > > > > Hi Guozhang,
> > > > >
> > > > > Regarding your comments about the wiki page:
> > > > >
> > > > > 1) Exactly, I rephrased the paragraph to make it more clear.
> > > > >
> > > > > 2) Yes, I used the wrong term. All hit related metrics are ratios.
> I
> > > > > corrected the names of the affected metrics.
> > > > >
> > > > > Regarding your meta comments:
> > > > >
> > > > > 1) The plan is to expose the hit ratio. I used the wrong term. The
> > > > > formulas compute ratios. Regarding your question about a metric to
> > > > > know from where a read is served when it is not in the memtable,
> there
> > > > > are metrics in RocksDB that give you the number of get() queries
> that
> > > > > are served from L0, L1, and L2_AND_UP. I could not find any metric
> > > > > that give you information about whether a query was served from
> disk
> > > > > vs. OS cache. One metric that could be used to indirectly measure
> > > > > whether disk or OS cache is accessed seems to be
> READ_BLOCK_GET_MICROS
> > > > > that gives you the time for an IO read of a block. If it is high,
> it
> > > > > was read from disk, otherwise from the OS cache. A similar
> strategy to
> > > > > monitor the performance is described in [1]. DISCLAIMER:
> > > > > READ_BLOCK_GET_MICROS is not documented. I had to look into the C++
> > > > > code to understand its meaning. I could have missed something.
> > > > >
> > > > > 2) There are some additional compaction statistics that contain
> sizes
> > > > > of files on disk and numbers about write amplification that you can
> > > > > get programmatically in RocksDB, but they are for debugging
> purposes
> > > > > [2]. To get this data and publish it into a metric, one has to
> parse a
> > > > > string. Since this data is for debugging purposes, I do not know
> how
> > > > > stable the output format is. One thing, we could do, is to dump the
> > > > > string with the compaction statistics into our log files at DEBUG
> > > > > level. But that is outside of the scope of this KIP.
> > > > >
> > > > > Best,
> > > > > Bruno
> > > > >
> > > > > [1]
> https://github.com/facebook/rocksdb/wiki/Perf-Context-and-IO-Stats-Context#block-cache-and-os-page-cache-efficiency
> > > > > [2]
> https://github.com/facebook/rocksdb/wiki/RocksDB-Tuning-Guide#rocksdb-statistics
> > > > >
> > > > > On Tue, Jun 18, 2019 at 8:24 PM Guozhang Wang <[email protected]>
> wrote:
> > > > > >
> > > > > > Hello Bruno,
> > > > > >
> > > > > > I've read through the aggregation section and I think they look
> good to me.
> > > > > > There are a few minor comments about the wiki page itself:
> > > > > >
> > > > > > 1) A state store might consist of multiple state stores -> You
> mean a
> > > > > > `logical` state store be consistent of multiple `physical` store
> instances?
> > > > > >
> > > > > > 2) The "Hit Rates" calculation seems to be referring to the `Hit
> Ratio`
> > > > > > (which is a percentage) than `Hit Rate`?
> > > > > >
> > > > > > And a couple further meta comments:
> > > > > >
> > > > > > 1) For memtable / block cache, instead of the hit-rate do you
> think we
> > > > > > should expose the hit-ratio? I felt it is more useful for users
> to debug
> > > > > > what's the root cause of unexpected slow performance.
> > > > > >
> > > > > > And for block cache misses, is it easy to provide a metric as of
> "target
> > > > > > read" of where a read is served (from which level, either in OS
> cache or in
> > > > > > SST files), similar to Fig.11 in
> > > > > > http://cidrdb.org/cidr2017/papers/p82-dong-cidr17.pdf?
> > > > > >
> > > > > > 2) As @Patrik mentioned, is there a good way we can expose the
> total amount
> > > > > > of memory and disk usage for each state store as well? I think
> it would
> > > > > > also be very helpful for users to understand their capacity
> needs and read
> > > > > > / write amplifications.
> > > > > >
> > > > > >
> > > > > > Guozhang
> > > > > >
> > > > > > On Fri, Jun 14, 2019 at 6:55 AM Bruno Cadonna <
> [email protected]> wrote:
> > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > I decided to go for the option in which metrics are exposed
> for each
> > > > > > > logical state store. I revisited the KIP correspondingly and
> added a
> > > > > > > section on how to aggregate metrics over multiple physical
> RocksDB
> > > > > > > instances within one logical state store. Would be great, if
> you could
> > > > > > > take a look and give feedback. If nobody has complaints about
> the
> > > > > > > chosen option I would proceed with voting on this KIP since
> this was
> > > > > > > the last open question.
> > > > > > >
> > > > > > > Best,
> > > > > > > Bruno
> > > > > > >
> > > > > > > On Fri, Jun 7, 2019 at 9:38 PM Patrik Kleindl <
> [email protected]> wrote:
> > > > > > > >
> > > > > > > > Hi Sophie
> > > > > > > > This will be a good change, I have been thinking about
> proposing
> > > > > > > something similar or even passing the properties per store.
> > > > > > > > RocksDB should probably know how much memory was reserved
> but maybe does
> > > > > > > not expose it.
> > > > > > > > We are limiting it already as you suggested but this is a
> rather crude
> > > > > > > tool.
> > > > > > > > Especially in a larger topology with mixed loads par topic
> it would be
> > > > > > > helpful to get more insights which store puts a lot of load on
> memory.
> > > > > > > > Regarding the limiting capability, I think I remember
> reading that those
> > > > > > > only affect some parts of the memory and others can still
> exceed this
> > > > > > > limit. I‘ll try to look up the difference.
> > > > > > > > Best regards
> > > > > > > > Patrik
> > > > > > > >
> > > > > > > > > Am 07.06.2019 um 21:03 schrieb Sophie Blee-Goldman <
> > > > > > > [email protected]>:
> > > > > > > > >
> > > > > > > > > Hi Patrik,
> > > > > > > > >
> > > > > > > > > As of 2.3 you will be able to use the RocksDBConfigSetter
> to
> > > > > > > effectively
> > > > > > > > > bound the total memory used by RocksDB for a single app
> instance. You
> > > > > > > > > should already be able to limit the memory used per
> rocksdb store,
> > > > > > > though
> > > > > > > > > as you mention there can be a lot of them. I'm not sure
> you can
> > > > > > > monitor the
> > > > > > > > > memory usage if you are not limiting it though.
> > > > > > > > >
> > > > > > > > >> On Fri, Jun 7, 2019 at 2:06 AM Patrik Kleindl <
> [email protected]>
> > > > > > > wrote:
> > > > > > > > >>
> > > > > > > > >> Hi
> > > > > > > > >> Thanks Bruno for the KIP, this is a very good idea.
> > > > > > > > >>
> > > > > > > > >> I have one question, are there metrics available for the
> memory
> > > > > > > consumption
> > > > > > > > >> of RocksDB?
> > > > > > > > >> As they are running outside the JVM we have run into
> issues because
> > > > > > > they
> > > > > > > > >> were using all the other memory.
> > > > > > > > >> And with multiple streams applications on the same
> machine, each with
> > > > > > > > >> several KTables and 10+ partitions per topic the number
> of stores can
> > > > > > > get
> > > > > > > > >> out of hand pretty easily.
> > > > > > > > >> Or did I miss something obvious how those can be
> monitored better?
> > > > > > > > >>
> > > > > > > > >> best regards
> > > > > > > > >>
> > > > > > > > >> Patrik
> > > > > > > > >>
> > > > > > > > >>> On Fri, 17 May 2019 at 23:54, Bruno Cadonna <
> [email protected]>
> > > > > > > wrote:
> > > > > > > > >>>
> > > > > > > > >>> Hi all,
> > > > > > > > >>>
> > > > > > > > >>> this KIP describes the extension of the Kafka Streams'
> metrics to
> > > > > > > include
> > > > > > > > >>> RocksDB's internal statistics.
> > > > > > > > >>>
> > > > > > > > >>> Please have a look at it and let me know what you think.
> Since I am
> > > > > > > not a
> > > > > > > > >>> RocksDB expert, I am thankful for any additional pair of
> eyes that
> > > > > > > > >>> evaluates this KIP.
> > > > > > > > >>>
> > > > > > > > >>>
> > > > > > > > >>>
> > > > > > > > >>
> > > > > > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-471:+Expose+RocksDB+Metrics+in+Kafka+Streams
> > > > > > > > >>>
> > > > > > > > >>> Best regards,
> > > > > > > > >>> Bruno
> > > > > > > > >>>
> > > > > > > > >>
> > > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > -- Guozhang
>

Re: [DISCUSS] KIP-471: Expose RocksDB Metrics in Kafka Streams

Reply via email to