Hi John,
I translated it wrong; I meant 'might' instead of 'may'. If I find the time, I'll take a look at the code to see how the properties could be added as metrics. Thanks for your input.
Regards,
Patrik
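For reference, a minimal sketch of what reading those properties looks like through the RocksDB Java API. This assumes direct access to the org.rocksdb.RocksDB handle, which RocksDBStore does not currently expose, and the database path is illustrative:

    import org.rocksdb.Options;
    import org.rocksdb.RocksDB;
    import org.rocksdb.RocksDBException;

    public class RocksDBMemoryProperties {
        public static void main(String[] args) throws RocksDBException {
            RocksDB.loadLibrary();
            try (Options options = new Options().setCreateIfMissing(true);
                 RocksDB db = RocksDB.open(options, "/tmp/rocksdb-example")) {
                // Each property returns its current value as a String.
                for (String property : new String[] {
                        "rocksdb.block-cache-usage",
                        "rocksdb.estimate-table-readers-mem",
                        "rocksdb.cur-size-all-mem-tables"}) {
                    System.out.println(property + " = " + db.getProperty(property));
                }
            }
        }
    }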
> On 15.07.2019 at 18:03, John Roesler <j...@confluent.io> wrote:
>
> Hey Patrik,
>
> Since KIP-471 is already accepted, and since this idea is not a trivial extension of the KIP, I think we'd need to do a new KIP.
>
> Some points to consider: these additions could not be made to the KeyValueStore interface, since they'd be applicable only to RocksDB-backed stores, but there is no RocksDBKeyValueStore<K,V> which could be used to query the property. Rather, the RocksDB interface is wrapped inside multiple layers that comprise the KeyValueStore. Since these layers themselves are not public interfaces, it would be a challenge to think of how to get access to these methods from user-space code.
>
> On the other hand, could we just add new metrics to access those properties? Offhand, it's unclear to me why you said they "may not be exposed".
>
> Thanks,
> -John
>
>> On Mon, Jul 15, 2019 at 5:11 AM Patrik Kleindl <pklei...@gmail.com> wrote:
>>
>> Hi
>> Adding this here because I asked and may have found the answer: the memory consumption may not be exposed as RocksDB metrics, but it should be available as properties of the RocksDB instance itself. RocksDBStore could easily make this available by allowing access to db.getProperty.
>> Available data (examples):
>> rocksdb.block-cache-usage
>> rocksdb.estimate-table-readers-mem
>> rocksdb.cur-size-all-mem-tables
>>
>> Reference: https://github.com/facebook/rocksdb/wiki/Memory-usage-in-RocksDB; the names of the properties are listed in https://github.com/facebook/rocksdb/blob/master/include/rocksdb/db.h
>>
>> Would it make sense to allow this kind of access? Should this be part of a follow-up KIP?
>>
>> best regards
>>
>> Patrik
>>
>>> On Wed, 19 Jun 2019 at 21:55, Bill Bejeck <bbej...@gmail.com> wrote:
>>>
>>> Hi Bruno,
>>>
>>> Just getting caught up on this KIP thread. Looks good to me, and I don't have any additional comments beyond what's already been presented.
>>>
>>> Thanks,
>>> Bill
>>>
>>>> On Wed, Jun 19, 2019 at 1:42 PM Bruno Cadonna <br...@confluent.io> wrote:
>>>>
>>>> John and Guozhang,
>>>>
>>>> thank you for your comments.
>>>>
>>>> @Guozhang could you please also vote on the voting thread so that we have all votes in one place.
>>>>
>>>> @John, the only situation I can think of where a non-uniform configuration of segments would make sense is to account for seasonality. But this would be a really advanced configuration IMO.
>>>>
>>>> Best,
>>>> Bruno
>>>>
>>>>> On Wed, Jun 19, 2019 at 7:18 PM John Roesler <j...@confluent.io> wrote:
>>>>>
>>>>> One last thought. I think what you propose for merging the metrics when a logical store is composed of multiple physical stores makes sense.
>>>>>
>>>>> The basic standard for these metrics is that they should be relevant to performance, and they should be controllable via configurations, specifically via RocksDBConfigSetter. The config setter takes the store name as input. For key/value stores, this name is visible in the tags as "rocksdb-state-id", so the workflow is that I notice (e.g.) the metric with rocksdb-state-id=STORE_X showing a low block cache hit ratio, and I add a condition in my config setter to increase the block cache size for stores named STORE_X.
>>>>>
>>>>> For the case where the logical store has multiple physical stores, we're really talking about the segmented stores: window stores and session stores. In these stores, every segment has a different name (logicalStoreName + "." + segmentId * segmentInterval), so the config setter would need to use a prefix match with respect to the metric tag (e.g.) rocksdb-window-state-id. But at the same time, this is totally doable.
>>>>>
>>>>> It's also perfectly reasonable: since segments get rotated out all the time, it's implausible that you'd ever want a non-uniform configuration over the segments in a store. For this reason, specifically, it makes more sense just to summarize the metrics than to present them individually. Might be worth documenting the relationship, though.
>>>>>
>>>>> Thanks again, I'll vote now,
>>>>> -John
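As a sketch of the workflow John describes, a config setter could prefix-match the store name and raise the block cache size. STORE_X and the 64 MB figure are illustrative, and setBlockCacheSize is deprecated in newer RocksDB versions in favor of setBlockCache:

    import java.util.Map;
    import org.apache.kafka.streams.state.RocksDBConfigSetter;
    import org.rocksdb.BlockBasedTableConfig;
    import org.rocksdb.Options;

    public class StoreXConfigSetter implements RocksDBConfigSetter {

        @Override
        public void setConfig(final String storeName,
                              final Options options,
                              final Map<String, Object> configs) {
            // Physical segments of a logical store are named
            // logicalStoreName + "." + suffix, so a prefix match
            // covers every segment of STORE_X.
            if (storeName.startsWith("STORE_X")) {
                final BlockBasedTableConfig tableConfig = new BlockBasedTableConfig();
                tableConfig.setBlockCacheSize(64 * 1024 * 1024L); // 64 MB, illustrative
                options.setTableFormatConfig(tableConfig);
            }
        }
    }

It would be registered via the rocksdb.config.setter config (StreamsConfig.ROCKSDB_CONFIG_SETTER_CLASS_CONFIG).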
>>>>> On Wed, Jun 19, 2019 at 12:03 PM John Roesler <j...@confluent.io> wrote:
>>>>>>
>>>>>> Just taking a look over the metrics again, I had one thought...
>>>>>>
>>>>>> Stuff that happens in a background thread (like compaction metrics) can't directly identify compactions as a bottleneck from Streams' perspective. I.e., a DB might do a lot of compactions, but if those compactions never delay a write or read, then they cannot be a bottleneck.
>>>>>>
>>>>>> Thus, the "stall" metric should be the starting point for bottleneck identification, and then the flush/compaction metrics can be used to secondarily identify what to do to relieve the bottleneck.
>>>>>>
>>>>>> This doesn't affect the metrics you proposed, but I'd suggest saying something to this effect in whatever documentation or descriptions we provide.
>>>>>>
>>>>>> Thanks,
>>>>>> -John
>>>>>>
>>>>>> On Wed, Jun 19, 2019 at 11:25 AM John Roesler <j...@confluent.io> wrote:
>>>>>>>
>>>>>>> Thanks for the updates.
>>>>>>>
>>>>>>> Personally, I'd be in favor of not going out on a limb with unsupported metrics APIs. We should take care to make sure that what we add in KIP-471 is stable and well supported, even if it's not the complete picture. We can always do follow-on work to tackle complex metrics as an isolated design exercise.
>>>>>>>
>>>>>>> Just my two cents.
>>>>>>> Thanks,
>>>>>>> -John
>>>>>>>
>>>>>>> On Wed, Jun 19, 2019 at 6:02 AM Bruno Cadonna <br...@confluent.io> wrote:
>>>>>>>>
>>>>>>>> Hi Guozhang,
>>>>>>>>
>>>>>>>> Regarding your comments about the wiki page:
>>>>>>>>
>>>>>>>> 1) Exactly, I rephrased the paragraph to make it more clear.
>>>>>>>>
>>>>>>>> 2) Yes, I used the wrong term. All hit-related metrics are ratios. I corrected the names of the affected metrics.
>>>>>>>>
>>>>>>>> Regarding your meta comments:
>>>>>>>>
>>>>>>>> 1) The plan is to expose the hit ratio; I used the wrong term. The formulas compute ratios. Regarding your question about a metric to know from where a read is served when it is not in the memtable, there are metrics in RocksDB that give you the number of get() queries that are served from L0, L1, and L2_AND_UP. I could not find any metric that gives you information about whether a query was served from disk vs. the OS cache. One metric that could be used to indirectly measure whether disk or OS cache is accessed seems to be READ_BLOCK_GET_MICROS, which gives you the time for an IO read of a block: if it is high, the block was read from disk, otherwise from the OS cache. A similar strategy to monitor the performance is described in [1]. DISCLAIMER: READ_BLOCK_GET_MICROS is not documented. I had to look into the C++ code to understand its meaning. I could have missed something.
>>>>>>>>
>>>>>>>> 2) There are some additional compaction statistics that contain sizes of files on disk and numbers about write amplification that you can get programmatically in RocksDB, but they are for debugging purposes [2]. To get this data and publish it into a metric, one has to parse a string. Since this data is for debugging purposes, I do not know how stable the output format is. One thing we could do is to dump the string with the compaction statistics into our log files at DEBUG level. But that is outside of the scope of this KIP.
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Bruno
>>>>>>>>
>>>>>>>> [1] https://github.com/facebook/rocksdb/wiki/Perf-Context-and-IO-Stats-Context#block-cache-and-os-page-cache-efficiency
>>>>>>>> [2] https://github.com/facebook/rocksdb/wiki/RocksDB-Tuning-Guide#rocksdb-statistics
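For concreteness, a sketch of the ratio computation against RocksDB's Java Statistics API; that these exact ticker types back the KIP's formulas is my assumption:

    import org.rocksdb.Statistics;
    import org.rocksdb.TickerType;

    public final class HitRatios {

        private HitRatios() {}

        // hit ratio = hits / (hits + misses); defined here as 1.0
        // when there were no lookups at all
        static double ratio(final long hits, final long misses) {
            final long lookups = hits + misses;
            return lookups == 0 ? 1.0 : (double) hits / lookups;
        }

        public static double memtableHitRatio(final Statistics stats) {
            return ratio(stats.getTickerCount(TickerType.MEMTABLE_HIT),
                         stats.getTickerCount(TickerType.MEMTABLE_MISS));
        }

        public static double blockCacheHitRatio(final Statistics stats) {
            return ratio(stats.getTickerCount(TickerType.BLOCK_CACHE_HIT),
                         stats.getTickerCount(TickerType.BLOCK_CACHE_MISS));
        }
    }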
>>>>>>>> On Tue, Jun 18, 2019 at 8:24 PM Guozhang Wang <wangg...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> Hello Bruno,
>>>>>>>>>
>>>>>>>>> I've read through the aggregation section and I think it looks good to me. There are a few minor comments about the wiki page itself:
>>>>>>>>>
>>>>>>>>> 1) "A state store might consist of multiple state stores" -> you mean a `logical` state store can consist of multiple `physical` store instances?
>>>>>>>>>
>>>>>>>>> 2) The "Hit Rates" calculation seems to be referring to the `Hit Ratio` (which is a percentage) rather than the `Hit Rate`?
>>>>>>>>>
>>>>>>>>> And a couple of further meta comments:
>>>>>>>>>
>>>>>>>>> 1) For the memtable / block cache, instead of the hit rate do you think we should expose the hit ratio? I feel it is more useful for users when debugging the root cause of unexpectedly slow performance.
>>>>>>>>>
>>>>>>>>> And for block cache misses, is it easy to provide a metric as of "target read" of where a read is served (from which level, either in the OS cache or in SST files), similar to Fig. 11 in http://cidrdb.org/cidr2017/papers/p82-dong-cidr17.pdf?
>>>>>>>>>
>>>>>>>>> 2) As @Patrik mentioned, is there a good way we can expose the total amount of memory and disk usage for each state store as well? I think it would also be very helpful for users to understand their capacity needs and read / write amplification.
>>>>>>>>>
>>>>>>>>> Guozhang
>>>>>>>>>
>>>>>>>>> On Fri, Jun 14, 2019 at 6:55 AM Bruno Cadonna <br...@confluent.io> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I decided to go for the option in which metrics are exposed for each logical state store. I revisited the KIP correspondingly and added a section on how to aggregate metrics over multiple physical RocksDB instances within one logical state store. It would be great if you could take a look and give feedback. If nobody has complaints about the chosen option, I will proceed with voting on this KIP since this was the last open question.
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Bruno
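A sketch of that aggregation for a size-like property, assuming one had handles to the physical org.rocksdb.RocksDB instances (Streams keeps these internal):

    import java.util.List;
    import org.rocksdb.RocksDB;
    import org.rocksdb.RocksDBException;

    public final class LogicalStoreMemory {

        private LogicalStoreMemory() {}

        // Sums a numeric property, e.g. "rocksdb.cur-size-all-mem-tables",
        // over all physical RocksDB instances (segments) of one logical store.
        public static long sumProperty(final List<RocksDB> segments,
                                       final String property) throws RocksDBException {
            long total = 0L;
            for (final RocksDB segment : segments) {
                total += Long.parseLong(segment.getProperty(property));
            }
            return total;
        }
    }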
>>>>>>>>>> On Fri, Jun 7, 2019 at 9:38 PM Patrik Kleindl <pklei...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi Sophie
>>>>>>>>>>> This will be a good change; I have been thinking about proposing something similar, or even passing the properties per store.
>>>>>>>>>>> RocksDB should probably know how much memory was reserved but maybe does not expose it.
>>>>>>>>>>> We are limiting it already as you suggested, but this is a rather crude tool. Especially in a larger topology with mixed loads per topic, it would be helpful to get more insight into which store puts a lot of load on memory.
>>>>>>>>>>> Regarding the limiting capability, I think I remember reading that those limits only affect some parts of the memory and others can still exceed the limit. I'll try to look up the difference.
>>>>>>>>>>> Best regards
>>>>>>>>>>> Patrik
>>>>>>>>>>>
>>>>>>>>>>>> On 07.06.2019 at 21:03, Sophie Blee-Goldman <sop...@confluent.io> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Hi Patrik,
>>>>>>>>>>>>
>>>>>>>>>>>> As of 2.3 you will be able to use the RocksDBConfigSetter to effectively bound the total memory used by RocksDB for a single app instance. You should already be able to limit the memory used per RocksDB store, though as you mention there can be a lot of them. I'm not sure you can monitor the memory usage if you are not limiting it, though.
>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, Jun 7, 2019 at 2:06 AM Patrik Kleindl <pklei...@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi
>>>>>>>>>>>>> Thanks Bruno for the KIP, this is a very good idea.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I have one question: are there metrics available for the memory consumption of RocksDB? As the RocksDB instances are running outside the JVM, we have run into issues because they were using all the other memory. And with multiple Streams applications on the same machine, each with several KTables and 10+ partitions per topic, the number of stores can get out of hand pretty easily. Or did I miss something obvious about how those can be monitored better?
>>>>>>>>>>>>>
>>>>>>>>>>>>> best regards
>>>>>>>>>>>>>
>>>>>>>>>>>>> Patrik
>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, 17 May 2019 at 23:54, Bruno Cadonna <br...@confluent.io> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> this KIP describes the extension of the Kafka Streams metrics to include RocksDB's internal statistics.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Please have a look at it and let me know what you think. Since I am not a RocksDB expert, I am thankful for any additional pair of eyes that evaluates this KIP.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-471:+Expose+RocksDB+Metrics+in+Kafka+Streams
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>> Bruno
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> -- Guozhang