Re: [DISCUSS] KIP-607: Add Metrics to Record the Memory Used by RocksDB to Kafka Streams

Guozhang Wang Mon, 20 Jul 2020 15:40:03 -0700

Hello Bruno,

Thanks for the updated KIP. I made a pass and here are some comments:


1) What's the motivation of keeping it as INFO while KIP-471 metrics are
defined in DEBUG?

2) Some namings are a bit inconsistent with others and with KIP-471, for
example:

2.a) KIP-471 uses "memtable" while in this KIP we use "mem-table", also the
"memtable" is prefixed and then the metric name. I'd suggest we keep them
consistent. e.g. "num-immutable-mem-table" => "immutable-memtable-count",
"cur-size-active-mem-table" => "active-memable-bytes"

2.b) "immutable" are abbreviated as "imm" in some names but not in others,
I'd suggest we do not use abbreviations across all names,
e.g. "num-entries-imm-mem-tables" => "immutable-memtable-num-entries".

2.c) "-size" "-num" semantics is usually a bit unclear, and I'd suggest we
just more concrete terms, e.g. "total-sst-files-size" =>
"total-sst-files-bytes", "num-live-versions" => "live-versions-count",
"background-errors" => "background-errors-count".

3) Some metrics are a bit confusing, e.g.

3.a) What's the difference between "cur-size-all-mem-tables" and
"size-all-mem-tables"?

3.b) And the explanation of "estimate-table-readers-mem" does not read very
clear to me either, does it refer to "estimate-sst-file-read-buffer-bytes"?

3.c) How does "estimate-oldest-key-time" help with memory usage debugging?

4) For my own education, does "estimate-pending-compaction-bytes" capture
all the memory usage for compaction buffers?

5) This is just of a nit comment to help readers better understand rocksDB:
maybe we can explain in the wiki doc which part of rocksDB uses memory
(block cache, OS cache, memtable, compaction buffer, read buffer), and
which of them are on-heap and wich of them are off-heap, which can be hard
bounded and which can only be soft bounded and which cannot be bounded at
all, etc.


Guozhang


On Mon, Jul 20, 2020 at 11:00 AM Bruno Cadonna <br...@confluent.io> wrote:

> Hi,
>
> During the implementation of this KIP and after some discussion about
> RocksDB metrics, I decided to make some major modifications to this KIP
> and kick off discussion again.
>
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-607%3A+Add+Metrics+to+Kafka+Streams+to+Report+Properties+of+RocksDB
>
> Best,
> Bruno
>
> On 15.05.20 17:11, Bill Bejeck wrote:
> > Thanks for the KIP, Bruno. Having sensible, easy to access RocksDB memory
> > reporting will be a welcomed addition.
> >
> > FWIW I also agree to have the metrics reported on a store level. I'm glad
> > you changed the KIP to that effect.
> >
> > -Bill
> >
> >
> >
> > On Wed, May 13, 2020 at 6:24 PM Guozhang Wang <wangg...@gmail.com>
> wrote:
> >
> >> Hi Bruno,
> >>
> >> Sounds good to me.
> >>
> >> I think I'm just a bit more curious to see its impact on performance: as
> >> long as we have one INFO level rocksDB metrics, then we'd have to turn
> on
> >> the scheduled rocksdb metrics recorder whereas previously, we can
> decide to
> >> not turn on the recorder at all if all are set as DEBUG and we
> configure at
> >> INFO level in production. But this is an implementation detail anyways
> and
> >> maybe the impact is negligible after all. We can check and re-discuss
> this
> >> afterwards :)
> >>
> >>
> >> Guozhang
> >>
> >>
> >> On Wed, May 13, 2020 at 9:34 AM Sophie Blee-Goldman <
> sop...@confluent.io>
> >> wrote:
> >>
> >>> Thanks Bruno! I took a look at the revised KIP and it looks good to me.
> >>>
> >>> Sophie
> >>>
> >>> On Wed, May 13, 2020 at 6:59 AM Bruno Cadonna <br...@confluent.io>
> >> wrote:
> >>>
> >>>> Hi John,
> >>>>
> >>>> Thank you for the feedback!
> >>>>
> >>>> I agree and I will change the KIP as I stated in my previous e-mail to
> >>>> Guozhang.
> >>>>
> >>>> Best,
> >>>> Bruno
> >>>>
> >>>> On Tue, May 12, 2020 at 3:07 AM John Roesler <vvcep...@apache.org>
> >>> wrote:
> >>>>>
> >>>>> Thanks, all.
> >>>>>
> >>>>> If you don’t mind, I’ll pitch in a few cents’ worth.
> >>>>>
> >>>>> In my life I’ve generally found more granular metrics to be more
> >>> useful,
> >>>> as long as there’s a sane way to roll them up. It does seem nice to
> see
> >>> it
> >>>> on the per-store level. For roll-up purposes, the task and thread tags
> >>>> should be sufficient.
> >>>>>
> >>>>> I think the only reason we make some metrics Debug is that
> >> _recording_
> >>>> them can be expensive. If there’s no added expense, I think we can
> just
> >>>> register store-level metrics at Info level.
> >>>>>
> >>>>> Thanks for the KIP, Bruno!
> >>>>> -John
> >>>>>
> >>>>> On Mon, May 11, 2020, at 17:32, Guozhang Wang wrote:
> >>>>>> Hello Sophie / Bruno,
> >>>>>>
> >>>>>> I've also thought about the leveling question, and one motivation I
> >>>> had for
> >>>>>> setting it in instance-level is that we want to expose it in INFO
> >>>> level:
> >>>>>> today our report leveling is not very finer grained --- which I
> >> think
> >>>> is
> >>>>>> sth. worth itself --- such that one have to either turn on all
> >> DEBUG
> >>>>>> metrics recording or none of them. If we can allow users to e.g.
> >>>> specify
> >>>>>> "turn on streams-metrics and stream-state-metrics, but not others"
> >>> and
> >>>> then
> >>>>>> I think it should be just at store-level. However, right now if we
> >>>> want to
> >>>>>> set it as store-level then it would be DEBUG and not exposed by
> >>>> default.
> >>>>>>
> >>>>>> So it seems we have several options in addition to the proposed
> >> one:
> >>>>>>
> >>>>>> a) we set it at store-level as INFO; but then one can argue why
> >> this
> >>> is
> >>>>>> INFO while others (bytes-written, etc) are DEBUG.
> >>>>>> b) we set it at store-level as DEBUG, believing that we do not
> >>> usually
> >>>> need
> >>>>>> to turn it on.
> >>>>>> c) maybe, we can set it at task-level (? I'm not so sure myself
> >> about
> >>>>>> this.. :P) as INFO.
> >>>>>>
> >>>>>>
> >>>>>> Guozhang
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On Mon, May 11, 2020 at 12:29 PM Sophie Blee-Goldman <
> >>>> sop...@confluent.io>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> Hey Bruno,
> >>>>>>>
> >>>>>>> Thanks for the KIP! I have one high-level concern, which is that
> >> we
> >>>> should
> >>>>>>> consider
> >>>>>>> reporting these metrics on the per-store level rather than
> >>>> instance-wide. I
> >>>>>>> know I was
> >>>>>>> the one who first proposed making it instance-wide, so bear with
> >>> me:
> >>>>>>>
> >>>>>>> While I would still argue that the instance-wide memory usage is
> >>>> probably
> >>>>>>> the most *useful*,
> >>>>>>> exposing them at the store-level does not prevent users from
> >>>> monitoring the
> >>>>>>> instance-wide
> >>>>>>> memory. They should be able to roll up all the store-level
> >> metrics
> >>>> on an
> >>>>>>> instance to
> >>>>>>> compute the total off-heap memory. But rolling it up for the
> >> users
> >>>> does
> >>>>>>> prevent them from
> >>>>>>> using this to debug rare cases where one store may be using
> >>>> significantly
> >>>>>>> more memory than
> >>>>>>> expected.
> >>>>>>>
> >>>>>>> It's also worth considering that some users may be using the
> >>> bounded
> >>>> memory
> >>>>>>> config setter
> >>>>>>> to put a cap on the off-heap memory of the entire process, in
> >> which
> >>>> case
> >>>>>>> the memory usage
> >>>>>>> metric for any one store should reflect the memory usage of the
> >>>> entire
> >>>>>>> instance. In that case
> >>>>>>> any effort to roll up the memory usages ourselves would just be
> >>>> wasted.
> >>>>>>>
> >>>>>>> Sorry for the reversal, but after a second thought I'm pretty
> >>>> strongly in
> >>>>>>> favor of reporting these
> >>>>>>> at the store level.
> >>>>>>>
> >>>>>>> Best,
> >>>>>>> Sophie
> >>>>>>>
> >>>>>>> On Wed, May 6, 2020 at 8:41 AM Bruno Cadonna <br...@confluent.io
> >>>
> >>>> wrote:
> >>>>>>>
> >>>>>>>> Hi all,
> >>>>>>>>
> >>>>>>>> I'd like to discuss KIP-607 that aims to add RocksDB memory
> >> usage
> >>>>>>>> metrics to Kafka Streams.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>
> >>>
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-607%3A+Add+Metrics+to+Record+the+Memory+Used+by+RocksDB+to+Kafka+Streams
> >>>>>>>>
> >>>>>>>> Best,
> >>>>>>>> Bruno
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>>
> >>>>>> --
> >>>>>> -- Guozhang
> >>>>>>
> >>>>
> >>>
> >>
> >>
> >> --
> >> -- Guozhang
> >>
> >
>


-- 
-- Guozhang

Re: [DISCUSS] KIP-607: Add Metrics to Record the Memory Used by RocksDB to Kafka Streams

Reply via email to