Hi Patrik,

Ah, I guess that was an important distinction! Thanks for clarifying.

If there's a method to call on RocksDB, then you probably want to consider just registering the metric as a Gauge that delegates to RocksDB to get the value, as opposed to the normal pattern of using a Sensor with sampled stats metrics.
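To sketch the idea (untested, illustrative only — the metric name, group, and tag are made up, and it assumes the org.apache.kafka.common.metrics.Gauge interface plus a RocksJava handle):

    import java.util.Collections;
    import org.apache.kafka.common.metrics.Gauge;
    import org.apache.kafka.common.metrics.Metrics;
    import org.rocksdb.RocksDB;
    import org.rocksdb.RocksDBException;

    // Sketch: the gauge reads the current value from RocksDB on demand,
    // instead of recording samples into a Sensor.
    static void registerBlockCacheUsage(final Metrics metrics,
                                        final RocksDB db,
                                        final String storeName) {
        metrics.addMetric(
            metrics.metricName(
                "block-cache-usage",    // hypothetical metric name
                "stream-state-metrics", // hypothetical group
                Collections.singletonMap("rocksdb-state-id", storeName)),
            (Gauge<Long>) (config, now) -> {
                try {
                    return db.getLongProperty("rocksdb.block-cache-usage");
                } catch (final RocksDBException e) {
                    return -1L; // sketch: report a sentinel on failure
                }
            });
    }

Since the gauge is only evaluated when the metric is read, it adds no work on the read/write hot path.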
Thanks,
-John

On Mon, Jul 15, 2019 at 1:05 PM Patrik Kleindl <pklei...@gmail.com> wrote:

Hi John

I translated it wrong, I meant 'might' instead of 'may'. If I find the time, I'll take a look at the code to see how they could be added as metrics. Thanks for your input.

Regards
Patrik

On 15.07.2019 at 18:03, John Roesler <j...@confluent.io> wrote:

Hey Patrik,

Since KIP-471 is already accepted, and since this idea is not a trivial extension of the KIP, I think we'd need to do a new KIP.

Some points to consider: these additions could not be made to the KeyValueStore interface, since they'd only be applicable to RocksDB-backed stores, and there is no RocksDBKeyValueStore<K,V> that could be used to query the property. Rather, the RocksDB interface is wrapped inside multiple layers that comprise the KeyValueStore. Since these layers themselves are not public interfaces, it would be a challenge to get access to these methods from user-space code.

On the other hand, could we just add new metrics to access those properties? Offhand, it's unclear to me why you said they "may not be exposed".

Thanks,
-John

On Mon, Jul 15, 2019 at 5:11 AM Patrik Kleindl <pklei...@gmail.com> wrote:

Hi

Adding this here because I asked and may have found the answer: the memory consumption might not be exposed as RocksDB metrics, but the values should be available as properties of the RocksDB instance itself. RocksDBStore could easily make this available by allowing access to db.getProperty.

Available data (examples):
rocksdb.block-cache-usage
rocksdb.estimate-table-readers-mem
rocksdb.cur-size-all-mem-tables

Reference: https://github.com/facebook/rocksdb/wiki/Memory-usage-in-RocksDB; I found the names of the properties here: https://github.com/facebook/rocksdb/blob/master/include/rocksdb/db.h

Would it make sense to allow this kind of access? Should this be part of a follow-up KIP?
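For concreteness, a minimal sketch of what that access could look like (untested; assumes an already-open RocksJava org.rocksdb.RocksDB handle, and uses the property names listed above):

    import org.rocksdb.RocksDB;
    import org.rocksdb.RocksDBException;

    // Sketch: read the memory-related properties from an open RocksDB
    // instance. getLongProperty() parses the numeric properties directly.
    static void printMemoryProperties(final RocksDB db) throws RocksDBException {
        for (final String property : new String[] {
                "rocksdb.block-cache-usage",
                "rocksdb.estimate-table-readers-mem",
                "rocksdb.cur-size-all-mem-tables"}) {
            System.out.println(property + " = " + db.getLongProperty(property));
        }
    }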
best regards

Patrik

On Wed, 19 Jun 2019 at 21:55, Bill Bejeck <bbej...@gmail.com> wrote:

Hi Bruno,

Just getting caught up on this KIP thread. Looks good to me, and I don't have any additional comments to what's already been presented.

Thanks,
Bill

On Wed, Jun 19, 2019 at 1:42 PM Bruno Cadonna <br...@confluent.io> wrote:

John and Guozhang,

thank you for your comments.

@Guozhang could you please also vote on the voting thread so that we have all votes in one place.

@John, the only situation I can think of where a non-uniform configuration of segments would make sense is to account for seasonality. But this would be a really advanced configuration IMO.

Best,
Bruno

On Wed, Jun 19, 2019 at 7:18 PM John Roesler <j...@confluent.io> wrote:

One last thought. I think what you propose for merging the metrics when a logical store is composed of multiple physical stores makes sense.

The basic standard for these metrics is that they should be relevant to performance, and they should be controllable via configurations, specifically via RocksDBConfigSetter. The config setter takes the store name as input. For key/value stores, this name is visible in the tags as "rocksdb-state-id", so the workflow is that I notice (e.g.) that the metric with rocksdb-state-id=STORE_X is showing a low block cache hit ratio, so I add a condition in my config setter to increase the block cache size for stores named STORE_X.

For the case where the logical store has multiple physical stores, we're really talking about segmented stores, i.e., window stores and session stores. In these stores, every segment has a different name (logicalStoreName + "." + segmentId * segmentInterval), so the config setter would need to use a prefix match with respect to the metric tag (e.g., rocksdb-window-state-id). But this is totally doable.

It's also perfectly reasonable: since segments get rotated out all the time, it's implausible that you'd ever want a non-uniform configuration over the segments in a store. For this reason specifically, it makes more sense to summarize the metrics than to present them individually. Might be worth documenting the relationship, though.
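A rough sketch of that workflow in a RocksDBConfigSetter (untested; STORE_X, the STORE_Y segment prefix, and the cache size are made up):

    import java.util.Map;
    import org.apache.kafka.streams.state.RocksDBConfigSetter;
    import org.rocksdb.BlockBasedTableConfig;
    import org.rocksdb.LRUCache;
    import org.rocksdb.Options;

    public class TunedRocksDBConfigSetter implements RocksDBConfigSetter {
        @Override
        public void setConfig(final String storeName,
                              final Options options,
                              final Map<String, Object> configs) {
            // Exact match for a key/value store; prefix match for a segmented
            // store, whose segments are named logicalStoreName + "." + ...
            if (storeName.equals("STORE_X") || storeName.startsWith("STORE_Y.")) {
                final BlockBasedTableConfig tableConfig = new BlockBasedTableConfig();
                tableConfig.setBlockCache(new LRUCache(64 * 1024 * 1024L)); // 64 MB, made up
                options.setTableFormatConfig(tableConfig);
            }
        }
    }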
Thanks again, I'll vote now,
-John

On Wed, Jun 19, 2019 at 12:03 PM John Roesler <j...@confluent.io> wrote:

Just taking a look over the metrics again, I had one thought...

Stuff that happens in a background thread (like compactions) can't directly be identified as a bottleneck from Streams' perspective. I.e., a DB might do a lot of compactions, but if those compactions never delay a write or read, then they cannot be a bottleneck.

Thus, the "stall" metric should be the starting point for bottleneck identification, and then the flush/compaction metrics can be used to secondarily identify what to do to relieve the bottleneck.

This doesn't affect the metrics you proposed, but I'd suggest saying something to this effect in whatever documentation or descriptions we provide.

Thanks,
-John

On Wed, Jun 19, 2019 at 11:25 AM John Roesler <j...@confluent.io> wrote:

Thanks for the updates.

Personally, I'd be in favor of not going out on a limb with unsupported metrics APIs. We should take care to make sure that what we add in KIP-471 is stable and well supported, even if it's not the complete picture. We can always do follow-on work to tackle complex metrics as an isolated design exercise.

Just my two cents.
Thanks,
-John

On Wed, Jun 19, 2019 at 6:02 AM Bruno Cadonna <br...@confluent.io> wrote:

Hi Guozhang,

Regarding your comments about the wiki page:

1) Exactly, I rephrased the paragraph to make it clearer.

2) Yes, I used the wrong term. All hit-related metrics are ratios. I corrected the names of the affected metrics.

Regarding your meta comments:

1) The plan is to expose the hit ratio. I used the wrong term. The formulas compute ratios. Regarding your question about a metric to know from where a read is served when it is not in the memtable: there are metrics in RocksDB that give you the number of get() queries that are served from L0, L1, and L2_AND_UP. I could not find any metric that gives you information about whether a query was served from disk vs. OS cache. One metric that could be used to indirectly measure whether disk or OS cache is accessed seems to be READ_BLOCK_GET_MICROS, which gives you the time for an IO read of a block. If it is high, the block was read from disk, otherwise from the OS cache. A similar strategy to monitor the performance is described in [1]. DISCLAIMER: READ_BLOCK_GET_MICROS is not documented. I had to look into the C++ code to understand its meaning. I could have missed something.

2) There are some additional compaction statistics that contain sizes of files on disk and numbers about write amplification that you can get programmatically in RocksDB, but they are for debugging purposes [2]. To get this data and publish it into a metric, one has to parse a string. Since this data is for debugging purposes, I do not know how stable the output format is. One thing we could do is to dump the string with the compaction statistics into our log files at DEBUG level. But that is outside the scope of this KIP.

Best,
Bruno

[1] https://github.com/facebook/rocksdb/wiki/Perf-Context-and-IO-Stats-Context#block-cache-and-os-page-cache-efficiency
[2] https://github.com/facebook/rocksdb/wiki/RocksDB-Tuning-Guide#rocksdb-statistics
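For illustration, the histogram Bruno mentions can be read via RocksJava roughly like this (an untested sketch — and, per the disclaimer above, the histogram itself is undocumented):

    import org.rocksdb.HistogramData;
    import org.rocksdb.HistogramType;
    import org.rocksdb.Options;
    import org.rocksdb.Statistics;

    // Sketch: enable statistics on the Options used to open the store, then
    // inspect the block-read latency histogram. High values suggest reads
    // served from disk; low values suggest the OS page cache.
    static Options optionsWithStatistics(final Statistics statistics) {
        return new Options().setCreateIfMissing(true).setStatistics(statistics);
    }

    static void reportBlockReadLatency(final Statistics statistics) {
        final HistogramData data =
            statistics.getHistogramData(HistogramType.READ_BLOCK_GET_MICROS);
        System.out.println("block read p95 (micros): " + data.getPercentile95());
    }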
On Tue, Jun 18, 2019 at 8:24 PM Guozhang Wang <wangg...@gmail.com> wrote:

Hello Bruno,

I've read through the aggregation section and I think it looks good to me. There are a few minor comments about the wiki page itself:

1) "A state store might consist of multiple state stores" — you mean a logical state store may consist of multiple physical store instances?

2) The "Hit Rates" calculation seems to be referring to the "Hit Ratio" (which is a percentage) rather than the "Hit Rate"?

And a couple of further meta comments:

1) For the memtable / block cache, instead of the hit rate do you think we should expose the hit ratio? I feel it is more useful for users when debugging the root cause of unexpectedly slow performance.

And for block cache misses, is it easy to provide a metric for the "target" of a read, i.e., from where a read is served (which level, either in the OS cache or in SST files), similar to Fig. 11 in http://cidrdb.org/cidr2017/papers/p82-dong-cidr17.pdf?

2) As @Patrik mentioned, is there a good way we can expose the total amount of memory and disk usage for each state store as well? I think it would also be very helpful for users to understand their capacity needs and read/write amplification.

Guozhang

On Fri, Jun 14, 2019 at 6:55 AM Bruno Cadonna <br...@confluent.io> wrote:

Hi,

I decided to go for the option in which metrics are exposed for each logical state store. I revised the KIP accordingly and added a section on how to aggregate metrics over multiple physical RocksDB instances within one logical state store. It would be great if you could take a look and give feedback. If nobody has complaints about the chosen option, I will proceed with voting on this KIP, since this was the last open question.

Best,
Bruno

On Fri, Jun 7, 2019 at 9:38 PM Patrik Kleindl <pklei...@gmail.com> wrote:

Hi Sophie

This will be a good change; I have been thinking about proposing something similar, or even passing the properties per store. RocksDB should probably know how much memory was reserved but maybe does not expose it. We are limiting it already as you suggested, but this is a rather crude tool. Especially in a larger topology with mixed loads per topic, it would be helpful to get more insight into which store puts a lot of load on memory. Regarding the limiting capability, I think I remember reading that those limits only affect some parts of the memory and other parts can still exceed the limit. I'll try to look up the difference.

Best regards
Patrik

On 07.06.2019 at 21:03, Sophie Blee-Goldman <sop...@confluent.io> wrote:

Hi Patrik,

As of 2.3 you will be able to use the RocksDBConfigSetter to effectively bound the total memory used by RocksDB for a single app instance. You should already be able to limit the memory used per RocksDB store, though as you mention there can be a lot of them. I'm not sure you can monitor the memory usage if you are not limiting it, though.
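The 2.3 mechanism Sophie mentions boils down to sharing a single cache and write-buffer manager across all stores via the config setter — roughly as follows (an untested sketch; the sizes are made-up assumptions):

    import java.util.Map;
    import org.apache.kafka.streams.state.RocksDBConfigSetter;
    import org.rocksdb.BlockBasedTableConfig;
    import org.rocksdb.Cache;
    import org.rocksdb.LRUCache;
    import org.rocksdb.Options;
    import org.rocksdb.WriteBufferManager;

    public class BoundedMemoryConfigSetter implements RocksDBConfigSetter {
        // Sketch: static instances are shared by every store in the app, so
        // their sizes bound RocksDB's total off-heap memory.
        private static final Cache CACHE = new LRUCache(512 * 1024 * 1024L); // assumption
        private static final WriteBufferManager WBM =
            new WriteBufferManager(128 * 1024 * 1024L, CACHE); // assumption

        @Override
        public void setConfig(final String storeName,
                              final Options options,
                              final Map<String, Object> configs) {
            final BlockBasedTableConfig tableConfig = new BlockBasedTableConfig();
            tableConfig.setBlockCache(CACHE);
            tableConfig.setCacheIndexAndFilterBlocks(true); // count index/filter blocks against the cache
            options.setTableFormatConfig(tableConfig);
            options.setWriteBufferManager(WBM); // memtables draw from the same cache
        }
    }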
On Fri, Jun 7, 2019 at 2:06 AM Patrik Kleindl <pklei...@gmail.com> wrote:

Hi

Thanks Bruno for the KIP, this is a very good idea.

I have one question: are there metrics available for the memory consumption of RocksDB? As the RocksDB instances run outside the JVM, we have run into issues because they were using all the remaining memory. And with multiple Streams applications on the same machine, each with several KTables and 10+ partitions per topic, the number of stores can get out of hand pretty easily. Or did I miss something obvious about how those can be monitored better?

best regards

Patrik

On Fri, 17 May 2019 at 23:54, Bruno Cadonna <br...@confluent.io> wrote:

Hi all,

this KIP describes the extension of the Kafka Streams metrics to include RocksDB's internal statistics.

Please have a look at it and let me know what you think. Since I am not a RocksDB expert, I am thankful for any additional pair of eyes to evaluate this KIP.

https://cwiki.apache.org/confluence/display/KAFKA/KIP-471:+Expose+RocksDB+Metrics+in+Kafka+Streams

Best regards,
Bruno