Re: [DISCUSS] KIP-471: Expose RocksDB Metrics in Kafka Streams

2019-07-16 Thread John Roesler
Hi Patrik, Ah, I guess that was an important distinction! Thanks for clarifying. If there's a method to call on the RocksDB, then you probably want to consider just registering the metric as a Gauge that delegates to RocksDB to get the value, as opposed to the normal pattern of using a Sensor with
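
The gauge idea in the message above can be sketched roughly as follows. This is a minimal, library-free illustration, not the Kafka metrics API: `Registry` and `FakeStore` are hypothetical stand-ins for a metrics registry and for a RocksDB-backed store that can report a value on demand. The point is only that a gauge delegates to the source at read time, so nothing has to be recorded on the write path.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

public class GaugeSketch {

    /** Hypothetical registry: a gauge is just a named callback polled at read time. */
    static final class Registry {
        private final Map<String, LongSupplier> gauges = new ConcurrentHashMap<>();

        void registerGauge(String name, LongSupplier valueSource) {
            gauges.put(name, valueSource);
        }

        long read(String name) {
            LongSupplier source = gauges.get(name);
            if (source == null) {
                throw new IllegalArgumentException("unknown gauge: " + name);
            }
            return source.getAsLong();
        }
    }

    /** Hypothetical stand-in for a store that already tracks the value itself. */
    static final class FakeStore {
        private long bufferedBytes;

        void write(int bytes) {
            bufferedBytes += bytes;
        }

        long bufferedBytes() {
            return bufferedBytes;
        }
    }

    public static void main(String[] args) {
        Registry registry = new Registry();
        FakeStore store = new FakeStore();
        // No recording calls on the write path: the gauge asks the store when polled.
        registry.registerGauge("buffered-bytes", store::bufferedBytes);
        store.write(128);
        store.write(64);
        System.out.println(registry.read("buffered-bytes"));
    }
}
```

With a Sensor-style design, by contrast, each write would have to call a record method so that the metric could compute its value from the recorded samples.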

Re: [DISCUSS] KIP-471: Expose RocksDB Metrics in Kafka Streams

2019-07-15 Thread Patrik Kleindl
Hi John, I translated it wrong; I meant 'might' instead of 'may'. If I find the time I'll take a look at the code to see how they could be added as metrics. Thanks for your input. Regards Patrik > On 15.07.2019 at 18:03, John Roesler wrote: > > Hey Patrik, > > Since KIP-471 is already accepted, and s

Re: [DISCUSS] KIP-471: Expose RocksDB Metrics in Kafka Streams

2019-07-15 Thread John Roesler
Hey Patrik, Since KIP-471 is already accepted, and since this idea is not a trivial extension of the KIP, I think we'd need to do a new KIP. Some points to consider: these additions could not be made to the KeyValueStore interface, since they'd be only applicable to RocksDB-backed stores, but the

Re: [DISCUSS] KIP-471: Expose RocksDB Metrics in Kafka Streams

2019-07-15 Thread Patrik Kleindl
Hi, Adding this here because I asked and may have found the answer: Memory consumption may not be exposed through RocksDB metrics, but it should be available as properties of the RocksDB instance itself. RocksDBStore could easily make this available by allowing access to db.getProperty. Available d
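
As a rough sketch of the property-based approach described above: the helper below sums a few memory-related RocksDB properties. The lookup function is a stand-in for a call such as `db.getProperty(name)`; it is assumed here to return a decimal string, or null when the property is unavailable. The real RocksDB Java call throws a checked exception, so it would need a small wrapper. The property names are taken from the RocksDB documentation and should be checked against the version in use. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class MemoryProperties {

    // Property names as documented by RocksDB; verify against your version.
    static final List<String> MEMORY_PROPERTIES = Arrays.asList(
        "rocksdb.cur-size-all-mem-tables",
        "rocksdb.estimate-table-readers-mem",
        "rocksdb.block-cache-usage");

    /**
     * Sums the memory-related properties. The lookup is assumed to return a
     * decimal string, or null if the property is not available.
     */
    static long approximateMemoryBytes(Function<String, String> getProperty) {
        long total = 0L;
        for (String name : MEMORY_PROPERTIES) {
            String raw = getProperty.apply(name);
            if (raw != null) {
                total += Long.parseLong(raw.trim());
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Fake property values standing in for a live RocksDB instance.
        Map<String, String> fake = new HashMap<>();
        fake.put("rocksdb.cur-size-all-mem-tables", "1024");
        fake.put("rocksdb.estimate-table-readers-mem", "2048");
        fake.put("rocksdb.block-cache-usage", "4096");
        System.out.println(approximateMemoryBytes(fake::get));
    }
}
```

Note that when several stores share one block cache, adding up the block cache usage per store would count the same memory more than once.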

Re: [DISCUSS] KIP-471: Expose RocksDB Metrics in Kafka Streams

2019-06-19 Thread Bill Bejeck
Hi Bruno, Just getting caught up on this KIP thread. Looks good to me, and I don't have any additional comments to what's already been presented. Thanks, Bill On Wed, Jun 19, 2019 at 1:42 PM Bruno Cadonna wrote: > John and Guozhang, > > thank you for your comments. > > @Guozhang could you ple

Re: [DISCUSS] KIP-471: Expose RocksDB Metrics in Kafka Streams

2019-06-19 Thread Bruno Cadonna
John and Guozhang, thank you for your comments. @Guozhang could you please also vote on the voting thread so that we have all votes in one place. @John, the only situation I can think of where a non-uniform configuration of segments would make sense is to account for seasonality. But this would

Re: [DISCUSS] KIP-471: Expose RocksDB Metrics in Kafka Streams

2019-06-19 Thread John Roesler
One last thought. I think it makes sense what you propose for merging the metrics when a logical store is composed of multiple physical stores. The basic standard for these metrics is that they should be relevant to performance, and they should be controllable via configurations, specifically via

Re: [DISCUSS] KIP-471: Expose RocksDB Metrics in Kafka Streams

2019-06-19 Thread Guozhang Wang
Bruno, thanks for the clarification. I agree that we should not rely on parsing strings to expose as metrics since they 1) are not very reliable and 2) may change their format / representation over time. I think we can potentially add some documentation aligned with your explanations above to

Re: [DISCUSS] KIP-471: Expose RocksDB Metrics in Kafka Streams

2019-06-19 Thread John Roesler
Just taking a look over the metrics again, I had one thought... Stuff that happens in a background thread (like compaction metrics) can't directly identify compactions as a bottleneck from Streams' perspective. I.e., a DB might do a lot of compactions, but if those compactions never delay a write

Re: [DISCUSS] KIP-471: Expose RocksDB Metrics in Kafka Streams

2019-06-19 Thread John Roesler
Thanks for the updates. Personally, I'd be in favor of not going out on a limb with unsupported metrics APIs. We should take care to make sure that what we add in KIP-471 is stable and well supported, even if it's not the complete picture. We can always do follow-on work to tackle complex metrics

Re: [DISCUSS] KIP-471: Expose RocksDB Metrics in Kafka Streams

2019-06-19 Thread Bruno Cadonna
Hi Guozhang, Regarding your comments about the wiki page: 1) Exactly, I rephrased the paragraph to make it more clear. 2) Yes, I used the wrong term. All hit related metrics are ratios. I corrected the names of the affected metrics. Regarding your meta comments: 1) The plan is to expose the hi

Re: [DISCUSS] KIP-471: Expose RocksDB Metrics in Kafka Streams

2019-06-18 Thread Guozhang Wang
Hello Bruno, I've read through the aggregation section and I think it looks good to me. There are a few minor comments about the wiki page itself: 1) A state store might consist of multiple state stores -> Do you mean that a `logical` state store consists of multiple `physical` store instances? 2)

Re: [DISCUSS] KIP-471: Expose RocksDB Metrics in Kafka Streams

2019-06-14 Thread Bruno Cadonna
Hi, I decided to go for the option in which metrics are exposed for each logical state store. I revisited the KIP correspondingly and added a section on how to aggregate metrics over multiple physical RocksDB instances within one logical state store. Would be great, if you could take a look and gi
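
A minimal sketch of what aggregating over the physical instances of one logical store could look like. The exact aggregation rules are defined in the KIP, not here; this example assumes that totals are summed and that a hit ratio is computed from summed counters rather than by averaging per-instance ratios. `SegmentStats` and its fields are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;

public class SegmentAggregation {

    /** Hypothetical per-instance counters; names are illustrative only. */
    static final class SegmentStats {
        final long bytesWritten;
        final long cacheHits;
        final long cacheMisses;

        SegmentStats(long bytesWritten, long cacheHits, long cacheMisses) {
            this.bytesWritten = bytesWritten;
            this.cacheHits = cacheHits;
            this.cacheMisses = cacheMisses;
        }
    }

    /** Totals are assumed to add up across physical instances. */
    static long totalBytesWritten(List<SegmentStats> segments) {
        long total = 0L;
        for (SegmentStats s : segments) {
            total += s.bytesWritten;
        }
        return total;
    }

    /** Ratio over summed counters, so busy instances weigh more than idle ones. */
    static double cacheHitRatio(List<SegmentStats> segments) {
        long hits = 0L;
        long lookups = 0L;
        for (SegmentStats s : segments) {
            hits += s.cacheHits;
            lookups += s.cacheHits + s.cacheMisses;
        }
        return lookups == 0L ? 0.0 : (double) hits / lookups;
    }

    public static void main(String[] args) {
        List<SegmentStats> segments = Arrays.asList(
            new SegmentStats(100L, 90L, 10L),
            new SegmentStats(50L, 0L, 100L));
        System.out.println(totalBytesWritten(segments));
        System.out.println(cacheHitRatio(segments));
    }
}
```

Averaging the two per-instance ratios in this example would give a different number than the ratio over summed counters, which is why the choice of aggregation needs to be stated per metric.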

Re: [DISCUSS] KIP-471: Expose RocksDB Metrics in Kafka Streams

2019-06-07 Thread Patrik Kleindl
Hi Sophie This will be a good change, I have been thinking about proposing something similar or even passing the properties per store. RocksDB should probably know how much memory was reserved but maybe does not expose it. We are limiting it already as you suggested but this is a rather crude too

Re: [DISCUSS] KIP-471: Expose RocksDB Metrics in Kafka Streams

2019-06-07 Thread Sophie Blee-Goldman
Hi Patrik, As of 2.3 you will be able to use the RocksDBConfigSetter to effectively bound the total memory used by RocksDB for a single app instance. You should already be able to limit the memory used per rocksdb store, though as you mention there can be a lot of them. I'm not sure you can monito

Re: [DISCUSS] KIP-471: Expose RocksDB Metrics in Kafka Streams

2019-06-07 Thread Patrik Kleindl
Hi Thanks Bruno for the KIP, this is a very good idea. I have one question, are there metrics available for the memory consumption of RocksDB? As they are running outside the JVM we have run into issues because they were using all the other memory. And with multiple streams applications on the sam

Re: [DISCUSS] KIP-471: Expose RocksDB Metrics in Kafka Streams

2019-06-06 Thread Sophie Blee-Goldman
I'm not sure we can safely assume only the most recent segment is hot. Anything within the current window size is still being actively queried, and users can independently set windowSize and retentionPeriod as long as windowSize <= retentionPeriod. But the default segmentInterval is max(retentionPe
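
To illustrate why more than one segment can be hot, here is a small sketch. It assumes that a record's segment is determined by integer division of its timestamp by the segment interval, and that everything within the last `windowSize` milliseconds may still be queried. Both the mapping and the helper names are assumptions made for this example, and the segment interval is passed in explicitly rather than derived from any default.

```java
public class HotSegments {

    /** Assumed mapping for this sketch: segment id = timestamp / segment interval. */
    static long segmentId(long timestampMs, long segmentIntervalMs) {
        if (segmentIntervalMs <= 0L) {
            throw new IllegalArgumentException("segment interval must be positive");
        }
        return timestampMs / segmentIntervalMs;
    }

    /** Number of segments that overlap the range [now - windowSize, now]. */
    static long hotSegmentCount(long nowMs, long windowSizeMs, long segmentIntervalMs) {
        long oldestMs = Math.max(0L, nowMs - windowSizeMs);
        return segmentId(nowMs, segmentIntervalMs)
            - segmentId(oldestMs, segmentIntervalMs) + 1L;
    }

    public static void main(String[] args) {
        // A window shorter than the segment interval can stay within one segment...
        System.out.println(hotSegmentCount(9_000L, 1_000L, 5_000L));
        // ...while a window longer than the interval always spans several.
        System.out.println(hotSegmentCount(12_000L, 6_000L, 5_000L));
    }
}
```

Even a short window straddles two segments whenever it crosses a segment boundary, so reporting only the latest segment can miss part of the active reads.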

Re: [DISCUSS] KIP-471: Expose RocksDB Metrics in Kafka Streams

2019-06-06 Thread Bruno Cadonna
Hi, I like the idea of just exposing the metrics of the latest segment. I think it gives the most realistic picture of the current operations on the segmented RocksDB without exposing implementation details. The downside of this approach is that during the switch to a new segment the values of some me

Re: [DISCUSS] KIP-471: Expose RocksDB Metrics in Kafka Streams

2019-06-05 Thread Guozhang Wang
I think Bruno's 2) is that for a segmented store, the access rate on different segments will very likely be different. And in fact, most of the access should be on the "latest" segment unless 1) very late arrived data, which should be captured on the higher-level `lateness` metrics already, and 2)

Re: [DISCUSS] KIP-471: Expose RocksDB Metrics in Kafka Streams

2019-06-04 Thread Sophie Blee-Goldman
Hey Bruno, I tend to agree with Guozhang on this matter although you do bring up some good points that should be addressed. Regarding 1) I think it is probably fairly uncommon in practice for users to leverage the individual store names passed to RocksDBConfigSetter#setConfig in order to specify o

Re: [DISCUSS] KIP-471: Expose RocksDB Metrics in Kafka Streams

2019-06-04 Thread Bruno Cadonna
Hi Guozhang, After some thoughts, I tend to be in favour of the option with metrics for each physical RocksDB instance for the following reasons: 1) A user already needs to be aware of segmented state stores when providing a custom RocksDBConfigSetter. In RocksDBConfigSetter one can specify setti

Re: [DISCUSS] KIP-471: Expose RocksDB Metrics in Kafka Streams

2019-05-30 Thread Guozhang Wang
Hi Bruno: Regarding 2) I think either way has some shortcomings: exposing the metrics per RocksDB instance for window / session stores exposes some implementation internals (that we use segmented stores) and forces users to be aware of them. E.g. what if we want to silently change the internal imp

Re: [DISCUSS] KIP-471: Expose RocksDB Metrics in Kafka Streams

2019-05-28 Thread Bruno Cadonna
Hi, Thank you for your comments. @Bill: 1. It is like Guozhang wrote: - rocksdb-state-id is for key-value stores - rocksdb-session-state-id is for session stores - rocksdb-window-state-id is for window stores These tags are defined in the corresponding store builders and I think it is a good ide

Re: [DISCUSS] KIP-471: Expose RocksDB Metrics in Kafka Streams

2019-05-21 Thread Sophie Blee-Goldman
I definitely agree with Guozhang's "meta" comment: if it's possible to allow users to pick and choose individual RocksDB metrics that would be ideal. One further question is whether these will be debug or info level metrics, or a separate level altogether? If there is a nontrivial overhead associat

Re: [DISCUSS] KIP-471: Expose RocksDB Metrics in Kafka Streams

2019-05-21 Thread Guozhang Wang
Hello Bruno, Thanks for the KIP, I have a few minor comments and a meta one which are relatively aligned with other folks: Minor: 1) Regarding the "rocksdb-state-id = [store ID]", to be consistent with other state store metrics (see https://cwiki.apache.org/confluence/display/KAFKA/KIP-444%3A+Au

Re: [DISCUSS] KIP-471: Expose RocksDB Metrics in Kafka Streams

2019-05-21 Thread Dongjin Lee
Hi Bruno, I just read the KIP. I think this feature is great. As far as I know, most Kafka users monitor the host resources, JVM resources, and Kafka metrics only, not RocksDB, because configuring the statistics feature is a little bit tiresome. Since RocksDB impacts the performance of Kafka Streams, I

Re: [DISCUSS] KIP-471: Expose RocksDB Metrics in Kafka Streams

2019-05-20 Thread John Roesler
Hi Bruno, Looks really good overall. This is going to be an awesome addition. My only thought was that we have "bytes-flushed-(rate|total) and flush-time-(avg|min|max)" metrics, and the description states that these are specifically for Memtable flush operations. What do you think about calling i

Re: [DISCUSS] KIP-471: Expose RocksDB Metrics in Kafka Streams

2019-05-20 Thread Bill Bejeck
Hi Bruno, Thanks for the KIP, this will be a useful addition. Overall the KIP looks good to me, and I have two minor comments. 1. For the tags, I'm wondering if rocksdb-state-id should be rocksdb-store-id instead? 2. With the compaction metrics, would it be possible to add total compacti

Re: [DISCUSS] KIP-471: Expose RocksDB Metrics in Kafka Streams

2019-05-20 Thread Bruno Cadonna
Hi Sophie, Thank you for your comments. It's a good idea to supplement the metrics with configuration option to change the metrics. I also had some thoughts about it. However, I think I need some experimentation to get this right. I added the block cache hit rates for index and filter blocks to

Re: [DISCUSS] KIP-471: Expose RocksDB Metrics in Kafka Streams

2019-05-17 Thread Sophie Blee-Goldman
Actually I wonder if it might be useful to users to be able to break up the cache hit stats by type? Some people may choose to store index and filter blocks alongside data blocks, and it would probably be very helpful for them to know who is making more effective use of the cache in order to tune h

Re: [DISCUSS] KIP-471: Expose RocksDB Metrics in Kafka Streams

2019-05-17 Thread Sophie Blee-Goldman
Hey Bruno! This all looks pretty good to me, but one suggestion I have is to supplement each of the metrics with some info on how the user can control them. In other words, which options could/should they set in RocksDBConfigSetter should they discover a particular bottleneck? I don't think this

[DISCUSS] KIP-471: Expose RocksDB Metrics in Kafka Streams

2019-05-17 Thread Bruno Cadonna
Hi all, this KIP describes the extension of the Kafka Streams' metrics to include RocksDB's internal statistics. Please have a look at it and let me know what you think. Since I am not a RocksDB expert, I am thankful for any additional pair of eyes that evaluates this KIP. https://cwiki.apache.o