Hi Jay,

1. Agree with your assessment. Let me know when you start writing the metrics library to rule them all, I'm interested :-)
2. If I understood you correctly, you need to give an indication of what range of values you expect for a metric (e.g. a latency might be in log-scale buckets between 0.1ms and 30,000ms) -- that's what I meant by distribution. Or did I get it wrong? There are a bunch of interesting algorithms for estimating percentiles with a small memory footprint. This is probably getting too far into yak-shaving territory, but just in case, this short literature survey may be helpful: Chiranjeeb Buragohain and Subhash Suri: "Quantiles on Streams", in Encyclopedia of Database Systems, Springer, pp. 2235–2240, 2009. ISBN 978-0-387-35544-3. http://www.cs.ucsb.edu/~suri/psdir/ency.pdf

3. I don't know all the purposes for which quotas will be used, but instinctively your approach sounds good to me. I would be inclined to increase N a bit (perhaps to 4 or 5) to reduce the uncertainty introduced by the incomplete window, if memory usage allows.

5. Looking into open-sourcing it. I will also take a look at your code.

Best,
Martin

On 22 Feb 2014, at 18:53, Jay Kreps <jay.kr...@gmail.com> wrote:

> Hey Martin,
>
> Thanks for the great feedback.
>
> 1. I agree with the problems of mixing moving-window statistics with fixed-window statistics. That was one of my rationales. The other is that weighted statistics are very unintuitive for people compared to simple things like averages and percentiles, so they fail a bit as an intuitive monitoring mechanism. I actually think the moving windows are technically superior since they don't have hard boundaries, but a naive implementation based solely on events is actually totally wrong for the reasons you describe; the weighting needs to take into account the point in time of the estimate in its contribution to the average. This is an interesting problem and I started to think about it, but then decided that if I kept thinking about it I would never get anything finished. When I retire I plan to write a metrics library based solely on continuously weighted averages. :-)
>
> 2. Fair. To be clear, you needn't encode a distribution, just your preference about accuracy in the measurement. You are saying "I care equally about accuracy in the whole range" or "I don't care about fine-grained accuracy when the numbers themselves are large".
>
> 3. The reason the exception is good is that the actual quota may be low down in some part of the system, but a quota violation always needs to unwind all the way back up to the API layer to return the error to the client. So an exception is actually just what you need, because the catch will potentially be in a different place than the record() call. This lets you introduce quotas without each subsystem really needing to know about it.
>
> Your point about whether or not you should count the current event when a quota violation occurs is a good one. I actually think the right answer depends on the details of how you handle windowing. For example, one approach to windowing I have seen is to use the most recent COMPLETE window as the estimate while you fill up the current window. In this model, with a 30 second window, the estimate you give out is always 0-30 seconds old. In this case you have a real problem with quotas, because once the previous window is filled and you are in violation of your quota, you will keep throwing exceptions regardless of the client behavior for the duration of the next window. But worse, if you aren't counting the requests that got rejected, then even though the client behavior is still bad, your next window will record no values (because you rejected them all as quota violations). This is clearly a mess.
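(For concreteness, a minimal sketch of the "most recent complete window" scheme just described -- hypothetical names, not the actual kafka.common.metrics code:)

// The estimate comes only from the last completed window, so it is
// anywhere from 0 to one full window out of date.
public class LastCompleteWindowRate {
    private final long windowMs;
    private long windowStart = System.currentTimeMillis();
    private double completed = 0.0; // total from the last complete window
    private double current = 0.0;   // total in the partial current window

    public LastCompleteWindowRate(long windowMs) { this.windowMs = windowMs; }

    public synchronized void record(double value, long now) {
        if (now - windowStart >= windowMs) { // roll the window
            completed = current;
            current = 0.0;
            windowStart = now;
        }
        current += value;
    }

    public synchronized double ratePerSec() {
        return completed / (windowMs / 1000.0);
    }
}

(If a quota is checked against ratePerSec(), a violation recorded in the completed window keeps rejecting requests for the entire next window, no matter how the client behaves in the meantime.)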
> But that isn't quite how I'm doing windowing. The way I do it is that I always keep N windows (with the last window being partial), and the estimate is over all the windows. So with N=2 (the default), when you complete the current window, the previous window is cleared and used to record the new values. The downside of this is that with a 30 second window and N=2 your estimate is based on anything from 30 to 60 seconds of data. The upside is that the most recent data is always included. I feel this is inherently important for monitoring, but it is particularly important for quotas. In this case I feel that it is always the right thing to NOT count rejected measurements. Note that in this model, if the user goes over their quota and stays that way for a sustained period of time, the impact will not be the seesaw behavior I described where we reject all then none of their requests; instead we will reject just enough requests to keep them under their quota.
>
> 5. I would definitely be interested to see the code if it is open source, since I am interested in metrics. Overall, since you went down this path, I would be interested to get your opinion on my code. If you think what you did is better, I would be open to discussing it as a third alternative too. If we decide we do want to use this code for metrics, then we may want to implement a sampling histogram either in addition to or as a replacement for the existing histograms, and if you were up to contributing your implementation that would be great.
>
> -Jay
>
>
> On Sat, Feb 22, 2014 at 9:25 AM, Martin Kleppmann <mkleppm...@linkedin.com> wrote:
>
>> Not sure if you want yet another opinion added to the pile -- but since I had a similar problem on another project recently, I thought I'd weigh in. (On that project we were originally using Coda's library, but then switched to rolling our own metrics implementation because we needed to do a few things differently.)
>>
>> 1. Problems we encountered with Coda's library: it uses an exponentially weighted moving average (EWMA) for rates (e.g. messages/sec), and exponentially biased reservoir sampling for histograms (percentiles, averages). Those methods of calculation work well for events with a consistently high volume, but they give strange and misleading results for events that are bursty or rare (e.g. error rates). We found that a fixed-size window gives more predictable, easier-to-interpret results.
>>
>> 2. In defence of Coda's library, I think its histogram implementation is a good trade-off of memory for accuracy; I'm not totally convinced that your proposal (counts of events in a fixed set of buckets) would be much better. We would have to do some math to work out the expected accuracy in each case. The reservoir sampling can be configured to use a smaller sample if the default of 1028 samples is too expensive. Reservoir sampling also has the advantage that you don't need to hard-code a bucket distribution.
>>
>> 3. Quotas are an interesting use case. However, I'm not wild about using a QuotaViolationException for control flow -- I think an explicit conditional would be nicer than having to catch an exception.
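(To illustrate the two styles being contrasted -- a sketch with made-up names; only record() and QuotaViolationException come from the proposal, tryRecord() and the surrounding helpers are hypothetical:)

// Exception style: the catch can sit at the API layer, far from the
// record() call buried in some subsystem; intermediate layers need no changes.
try {
    requestRateSensor.record(requestSize);
} catch (QuotaViolationException e) {
    sendQuotaErrorToClient(e);
}

// Explicit-conditional style: clearer local control flow, but every layer
// between the measurement point and the API layer must propagate the result.
if (!requestRateSensor.tryRecord(requestSize))
    sendQuotaErrorToClient(new QuotaViolationException("quota exceeded"));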
>> One question in that context: if a quota is exceeded, do you still want to count the event towards the metric, or do you want to stop counting it until the quota is replenished? The answer may depend on the particular metric.
>>
>> 4. If you decide to go with Coda's library, I would advocate isolating the dependency into a separate module and using it via a facade -- somewhat like using SLF4J instead of Log4j directly. It's ok for Coda's library to be the default metrics implementation, but it should be easy to swap it out for something different in case someone has a version conflict or differing requirements. The facade should be at a low level (individual events), not at the reporter level (which deals with pre-aggregated values, and is already pluggable).
>>
>> 5. If it's useful, I can probably contribute my simple (but imho effective) metrics library for embedding into Kafka. It uses reservoir sampling for percentiles, like Coda's library, but uses a fixed-size window instead of an exponential bias, which avoids weird behaviour on bursty metrics.
>>
>> In summary, I would advocate one of the following approaches:
>> - Coda Hale library via a facade (allowing it to be swapped for something else), or
>> - our own metrics implementation, provided that we have confidence in its implementation of percentiles.
>>
>> Martin
>>
>>
>> On 22 Feb 2014, at 01:06, Jay Kreps <jay.kr...@gmail.com> wrote:
>>> Hey guys,
>>>
>>> Just picking up this thread again. I do want to drive a conclusion, as I will run out of work to do on the producer soon and will need to add metrics of some sort. We can vote on it, but I'm not sure if we actually got everything discussed.
>>>
>>> Joel, I wasn't fully sure how to interpret your comment. I think you are saying you are cool with the new metrics package as long as it really is better. Do you have any comment on whether you think the benefits I outlined are worth it? I agree with you that we could hold off on a second repo until someone else would actually want to use our code.
>>>
>>> Jun, I'm not averse to doing a sampling-based histogram and doing some comparison between the two approaches if you think this approach is otherwise better.
>>>
>>> Sriram, originally I thought you preferred just sticking to Coda Hale, but after your follow-up email I wasn't really sure...
>>>
>>> Joe/Clark, yes, this code allows pluggable reporting, so you could have a metrics reporter that just wraps each metric in a Coda Hale Gauge if that is useful. Though obviously if enough people were doing that, I would think it would be worth just using the Coda Hale package directly...
>>>
>>> -Jay
>>>
>>>
>>> On Thu, Feb 13, 2014 at 3:34 PM, Clark Breyman <cl...@breyman.com> wrote:
>>>
>>>> Not requiring the client to link Coda/Yammer metrics sounds like a compelling reason to pivot to new interfaces. If that's the agreed direction, I'm hoping that we'd get the choice of which backend to provide (e.g. a facade on Yammer metrics for those with an investment in that) rather than be forced onto the new backend. Having a metrics factory seems better for this than directly instantiating the singleton registry.
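(A rough sketch of the facade-plus-factory idea from point 4 and the message above -- hypothetical interfaces in the spirit of SLF4J, not an existing API:)

// The facade sees individual events; the backend is chosen once via the
// factory, so swapping Yammer for something else needs no caller changes.
public interface MetricsFacade {
    void record(String name, double value); // raw events, not pre-aggregated values
}

public class NoOpMetricsFacade implements MetricsFacade {
    public void record(String name, double value) {} // discard everything
}

public class MetricsFactory {
    private static volatile MetricsFacade backend = new NoOpMetricsFacade();

    public static void setBackend(MetricsFacade impl) { backend = impl; } // e.g. a Yammer-backed facade
    public static MetricsFacade metrics() { return backend; }
}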
>>>>
>>>> On Thu, Feb 13, 2014 at 2:39 PM, Joe Stein <joe.st...@stealth.ly> wrote:
>>>>
>>>>> Can we leave metrics as they are and have multiple supported KafkaMetricsGroup implementations, including a yammer-based implementation? ProducerRequestStats with your configured analytics group?
>>>>>
>>>>> On Thu, Feb 13, 2014 at 11:37 AM, Jay Kreps <jay.kr...@gmail.com> wrote:
>>>>>
>>>>>> I think we discussed the scala/java stuff more fully previously. Essentially the client is embedded everywhere. Scala is very incompatible with itself, so this makes it very hard to use for people using anything else in scala. Also, Scala stack traces are very confusing. Basically we thought plain java code would be a lot easier for people to use. Even if Scala is more fun to write, that isn't really what we are optimizing for.
>>>>>>
>>>>>> -Jay
>>>>>>
>>>>>>
>>>>>> On Thu, Feb 13, 2014 at 8:09 AM, S Ahmed <sahmed1...@gmail.com> wrote:
>>>>>>
>>>>>>> Jay, pretty impressive how you just write a 'quick version' like that :) Not to get off-topic, but why didn't you write this in scala?
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Feb 12, 2014 at 6:54 PM, Joel Koshy <jjkosh...@gmail.com> wrote:
>>>>>>>
>>>>>>>> I have not had a chance to review the new metrics code and its features carefully (apart from your write-up), but here are my general thoughts:
>>>>>>>>
>>>>>>>> Implementing a metrics package correctly is difficult; more so for people like me, because I'm not a statistician. However, if this new package (i) functions correctly (and we need to define and prove correctness), (ii) is easy to use, (iii) serves all our current and anticipated monitoring needs, and (iv) is not so complex that it becomes a burden to maintain and we would be better off with an available library, then I think it makes sense to embed it and use it within the Kafka code. The main wins are: (i) predictability (no changing APIs and intimate knowledge of the code) and (ii) control with respect to both functionality (e.g., there are hard-coded decay constants in metrics-core 2.x) and correctness (i.e., if we find a bug in an external metrics package we have to submit a pull request and wait for it to become mainstream). I'm not sure it would help very much to pull it into a separate repo, because that could potentially annul these benefits.
>>>>>>>>
>>>>>>>> Joel
>>>>>>>>
>>>>>>>> On Wed, Feb 12, 2014 at 02:50:43PM -0800, Jay Kreps wrote:
>>>>>>>>> Sriram,
>>>>>>>>>
>>>>>>>>> Makes sense. I am cool with moving this stuff into its own repo if people think that is better. I'm not sure it would get much contribution, but when I started messing with this I did have a lot of grand ideas of making the addition of metrics to a sensor dynamic, so you could add more stuff in real time (via jmx, say) and/or externalize all your metrics and config to a separate file like log4j, with only the points of instrumentation hard-coded.
>>>>>>>>>
>>>>>>>>> -Jay
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Feb 12, 2014 at 2:07 PM, Sriram Subramanian <srsubraman...@linkedin.com> wrote:
>>>>>>>>>
>>>>>>>>>> I am actually neutral to this change. I found the replies so far were more about the implementation and features. I would like the community to think about the questions below before making a decision.
>>>>>>>>>> My opinion on this is that it has the potential to be its own project, and it would attract developers who are specifically interested in contributing to metrics. I am skeptical that the Kafka contributors would focus on improving this library (apart from bug fixes) instead of developing/contributing to other core pieces. It would be useful to keep it decoupled from the rest of Kafka (if it resides in the Kafka code base) so that we can move it out to its own project at any time.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 2/12/14 1:21 PM, "Jay Kreps" <jay.kr...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hey Sriram,
>>>>>>>>>>>
>>>>>>>>>>> Not sure if these are actually meant as questions or more as veiled comments. In any case, I tried to give my 2 cents inline.
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Feb 11, 2014 at 11:12 PM, Sriram Subramanian <srsubraman...@linkedin.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I think answering the questions below would help us make a better decision. I am all for writing better code and having superior functionality, but it is worth thinking about stuff beyond just the code in this case --
>>>>>>>>>>>>
>>>>>>>>>>>> 1. Do metrics form a core piece of Kafka? Do they greatly help Kafka in providing better core functionality? I would always like a project to do one thing really well. Metrics is a non-trivial amount of code.
>>>>>>>>>>>
>>>>>>>>>>> Metrics are obviously important, and obviously improving our metrics system would be good. That said, this may or may not be better, and even if it is better, that betterness might not outweigh other considerations. That is what we are discussing.
>>>>>>>>>>>
>>>>>>>>>>>> 2. Does it make sense for it to be part of Kafka or its own project? If this metrics library has the potential to be better than metrics-core, I would be interested in other projects taking advantage of it.
>>>>>>>>>>>
>>>>>>>>>>> It could be either.
>>>>>>>>>>>
>>>>>>>>>>>> 3. Can Kafka maintain this library as new members join and old members leave? Would this be a piece of code that no one (in Kafka) spends time improving in the future if the original author left?
>>>>>>>>>>>
>>>>>>>>>>> I am not going anywhere in the near term, but if I did, yes, this would be like any other code we have. As with yammer metrics or any other code, at that point we would either use it as-is or someone would improve it.
>>>>>>>>>>>
>>>>>>>>>>>> 4. Does it affect the schedule of the producer rewrite? This needs its own stabilization, and modification of existing metric dashboards if the format is changed. Many times such costs are not factored in, and a project loses time before realizing the extra time required to make a library like this operational.
>>>>>>>>>>>
>>>>>>>>>>> Probably not. The metrics are going to change regardless of whether we use the same library or not. If we think this is better, I don't mind putting in a little extra effort to get there.
>>>>>>>>>>>
>>>>>>>>>>> Irrespective, I think this is probably not the right thing to optimize for.
>>>>>>>>>>>
>>>>>>>>>>>> I am sure we can do better when we write code for a specific use case (in this case, Kafka) rather than building a generic library that suits all (metrics-core), but I would like us to have answers to the questions above and be prepared before we proceed to support this with the producer rewrite.
>>>>>>>>>>>
>>>>>>>>>>> Naturally we are all considering exactly these things; that is exactly the reason I started the thread.
>>>>>>>>>>>
>>>>>>>>>>> -Jay
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> On 2/11/14 6:28 PM, "Jun Rao" <jun...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks for the detailed write-up. It's well thought through. A few comments:
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1. I have a couple of concerns about the percentiles. The first issue is that it requires the user to know the value range. Since the range for things like message size (in the millions) is quite different from things like request time (less than 100), it's going to be hard to pick a good global default range. Different apps could be dealing with different message sizes, so they will probably have to customize the range. Another issue is that it can only report values at the bucket boundaries. So, if you have 1000 buckets and a value range of 1 million, you will only see 1000 possible values as the quantile, which is probably too sparse. The implementation of the histogram in metrics-core keeps a fixed number of samples, which avoids both issues.
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2. We need to document the 3-part metric names better, since it's not obvious what the convention is. Also, currently the name of the sensor and the metrics defined in it are independent. Would it make sense to have the sensor name be a prefix of the metric name?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Overall, this approach seems to be cleaner than metrics-core by decoupling measuring and reporting. The main benefit of metrics-core seems to be the existing reporters. Since not that many people voted for metrics-core, I am ok with going with the new implementation. My only recommendation is to address the concern about percentiles.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Jun
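(To make the bucket-boundary concern concrete, a toy sketch -- not the actual histogram code: with fixed-width buckets, the reported quantile is always one of the bucket boundaries, so 1000 buckets over a range of 1,000,000 quantize every quantile to steps of 1000:)

// Returns the upper boundary of the bucket where the cumulative count
// crosses the requested quantile p (0 < p <= 1).
public static double quantile(int[] bucketCounts, double max, double p) {
    long total = 0;
    for (int c : bucketCounts)
        total += c;
    double width = max / bucketCounts.length;
    long needed = (long) Math.ceil(p * total);
    long seen = 0;
    for (int i = 0; i < bucketCounts.length; i++) {
        seen += bucketCounts[i];
        if (seen >= needed)
            return (i + 1) * width; // only bucketCounts.length distinct answers possible
    }
    return max;
}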
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Feb 6, 2014 at 12:51 PM, Jay Kreps <jay.kr...@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hey guys,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I wanted to kick off a quick discussion of metrics with respect to the new producer and consumer (and potentially the server).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> At a high level I think there are three approaches we could take:
>>>>>>>>>>>>>> 1. Plain vanilla JMX
>>>>>>>>>>>>>> 2. Use Coda Hale (AKA Yammer) Metrics
>>>>>>>>>>>>>> 3. Do our own metrics (with JMX as one output)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 1. This has the advantage that JMX is the most commonly used java mechanism and plugs in reasonably well to most metrics systems. JMX is included in the JDK, so it doesn't impose any additional dependencies on clients. It has the disadvantage that plain vanilla JMX is a pain to use. We would need a bunch of helper code for maintaining counters to make this reasonable.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2. Coda Hale metrics is pretty good and broadly used. It supports JMX output as well as direct output to many other types of systems. The primary downside we have had with Coda Hale has to do with client and library incompatibilities. We are currently on an older, more popular version. The newer version is a rewrite of the APIs and is incompatible. Originally these were totally incompatible and people had to choose one or the other. I think that has been improved, so now the new version is in a totally different package. But even in this case you end up with both versions if you use Kafka and we are on a different version than you, which is going to be pretty inconvenient.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 3. Doing our own has the downside of potentially reinventing the wheel, and potentially needing to work out any bugs in our code. The upside depends on how good the reinvention is. As it happens, I did a quick (~900 loc) version of a metrics library that is under kafka.common.metrics. I think it has some advantages over the Yammer metrics package for our usage, beyond just not causing incompatibilities. I will describe this code so we can discuss the pros and cons. Although I favor this approach, I have no emotional attachment and wouldn't be too sad if I ended up deleting it.
>>>>>>>>>>>>>> Here are javadocs for this code, though I haven't written much documentation yet since I might end up deleting it:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Here is a quick overview of this library.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> There are three main public interfaces:
>>>>>>>>>>>>>> Metrics - a repository of metrics being tracked
>>>>>>>>>>>>>> Metric - a single, named numerical value being measured (e.g. a counter)
>>>>>>>>>>>>>> Sensor - a thing that records values and updates zero or more metrics
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> So let's say we want to track four values about message sizes; specifically, say we want to record the average, the maximum, the total rate of bytes being sent, and a count of messages. Then we would do something like this:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> // setup code
>>>>>>>>>>>>>> Metrics metrics = new Metrics(); // this is a global "singleton"
>>>>>>>>>>>>>> Sensor sensor = metrics.sensor("kafka.producer.message.sizes");
>>>>>>>>>>>>>> sensor.add("kafka.producer.message-size.avg", new Avg());
>>>>>>>>>>>>>> sensor.add("kafka.producer.message-size.max", new Max());
>>>>>>>>>>>>>> sensor.add("kafka.producer.bytes-sent-per-sec", new Rate());
>>>>>>>>>>>>>> sensor.add("kafka.producer.message-count", new Count());
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> // now when we get a message we do this
>>>>>>>>>>>>>> sensor.record(messageSize);
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The above code creates the global metrics repository, creates a single Sensor, and defines four named metrics that are updated by that Sensor.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Like Yammer Metrics (YM), I allow you to plug in "reporters", including a JMX reporter. Unlike the Coda Hale JMX reporter, the reporter I have keys off the metric names, not the Sensor names, which I think is an improvement -- I just use the convention that the last portion of the name is the attribute name, the second-to-last is the mbean name, and the rest is the package. So in the above example there is a producer mbean that has an avg and a max attribute, and a producer mbean that has a bytes-sent-per-sec and a message-count attribute. This is nice because you can logically group the values reported irrespective of where in the program they are computed -- that is, an mbean can logically group attributes computed off different sensors. This means you can report values by logical subsystem.
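(A sketch of that naming convention as a hypothetical helper -- not the actual reporter code:)

// "kafka.producer.message-size.avg" -> package "kafka.producer",
// mbean "message-size", attribute "avg"
public static String[] splitMetricName(String name) {
    int attrDot = name.lastIndexOf('.');
    int beanDot = name.lastIndexOf('.', attrDot - 1);
    return new String[] {
        name.substring(0, beanDot),           // package
        name.substring(beanDot + 1, attrDot), // mbean name
        name.substring(attrDot + 1)           // attribute
    };
}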
>>>>>>>>>>>>>> I also allow the concept of hierarchical Sensors, which I think is a good convenience. I have noticed a common pattern in systems where you need to roll up the same values along different dimensions. A simple example is metrics about qps, data rate, etc. on the broker. These we want to capture in aggregate, but also broken down by topic id. You can do this purely by defining the sensor hierarchy:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Sensor allSizes = metrics.sensor("kafka.producer.sizes");
>>>>>>>>>>>>>> Sensor topicSizes = metrics.sensor("kafka.producer." + topic + ".sizes", allSizes);
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Now each actual update will go to the appropriate topicSizes sensor (based on the topic name), but the allSizes metrics will get updated too. I also support multiple parents for each sensor, as well as multiple layers of hierarchy, so you can define a more elaborate DAG of sensors. An example of how this would be useful is if you wanted to record your metrics broken down by topic AND client id as well as the global aggregate.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Each metric can take a configurable Quota value, which allows us to limit the maximum value of that metric. This is intended for use on the server as part of our quota implementation. The way this works is that you record metrics as usual:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> mySensor.record(42.0);
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> However, if this event occurrence causes one of the metrics to exceed its maximum allowable value (the quota), this call will throw a QuotaViolationException. The cool thing about this is that it means we can define quotas on anything we capture metrics for, which I think is pretty cool.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Another question is how to handle windowing of the values. Metrics want to record the "current" value, but the definition of current is inherently nebulous. One of the obvious gotchas is that if you define "current" to be a number of events, you can end up measuring an arbitrarily long window of time if the event rate is low (e.g. you think you are getting 50 messages/sec because that was the rate yesterday, when the events stopped).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Here is how I approach this. All the metrics use the same windowing approach. We define a single window by a length of time or a number of values (you can use either or both -- if both, the window ends when *either* the time bound or the event bound is hit). The typical problem with hard window boundaries is that at the beginning of the window you have no data and the first few samples are too small to be a valid sample.
>>>>>>>>>>>>>> (Consider if you were keeping an avg and the first value in the window happens to be very, very high: if you check the avg at this exact time, you will conclude the avg is very high -- but on a sample size of one.) One simple fix would be to always report the last complete window; however, this is not appropriate here because (1) we want to drive quotas off it, so it needs to be current, and (2) since this is for monitoring, you kind of care more about the current state. The ideal solution here would be to define a backwards-looking sliding window from the present, but many statistics are actually very hard to compute in this model without retaining all the values, which would be hopelessly inefficient. My solution to this is to keep a configurable number of windows (the default is two) and combine them for the estimate. So in the two-sample case, depending on when you ask, you have between one and two complete samples' worth of data to base the answer off of. Provided the sample window is large enough to get a valid result, this satisfies both of my criteria: incorporating the most recent data and having reasonable variance at all times.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Another approach is to use an exponential weighting scheme to combine all history but emphasize the recent past. I have not done this, as it has a lot of issues for practical operational metrics. I'd be happy to elaborate on this if anyone cares...
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The window size for metrics has a global default, which can be overridden at either the sensor or the individual metric level.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> In addition to these time series values, the user can directly expose some method of their choosing JMX-style by implementing the Measurable interface and registering that value. E.g.:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> metrics.addMetric("my.metric", new Measurable() {
>>>>>>>>>>>>>>     public double measure(MetricConfig config, long now) {
>>>>>>>>>>>>>>         return calculateValueToExpose();
>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>> });
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This is useful for exposing things like the accumulator free memory.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The set of metrics is extensible; new metrics can be added by just implementing the appropriate interfaces and registering them with a sensor.
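(To illustrate the extensibility, a toy custom stat -- this assumes a Stat-style interface with a record() callback alongside the measure() shown above; the actual interface names in kafka.common.metrics may differ:)

// A running mean over all time, registered like the built-ins:
// sensor.add("kafka.producer.message-size.running-avg", new RunningAvg());
public class RunningAvg implements Measurable {
    private double sum = 0.0;
    private long count = 0;

    public void record(double value, long now) { // invoked by the sensor on each event
        sum += value;
        count++;
    }

    public double measure(MetricConfig config, long now) {
        return count == 0 ? 0.0 : sum / count;
    }
}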
>>>>>>>>>>>>>> I implement the following metrics:
>>>>>>>>>>>>>> total - the sum of all values from the given sensor
>>>>>>>>>>>>>> count - a windowed count of values from the sensor
>>>>>>>>>>>>>> avg - the sample average within the windows
>>>>>>>>>>>>>> max - the max over the windows
>>>>>>>>>>>>>> min - the min over the windows
>>>>>>>>>>>>>> rate - the rate in the windows (e.g. the total or count divided by the elapsed time)
>>>>>>>>>>>>>> percentiles - a collection of percentiles computed over the window
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> My approach to percentiles is a little different from the yammer metrics package. My complaint about the yammer metrics approach is that it uses rather expensive sampling and kind of a lot of memory to get a reasonable sample. This is problematic for per-topic measurements.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Instead I use a fixed range for the histogram (e.g. 0.0 to 30000.0), which directly allows you to specify the desired memory use. Any value below the minimum is recorded as -Infinity and any value above the maximum as +Infinity. I think this is okay, as all metrics have an expected range except latency, which can be arbitrarily large; but for very high latency there is no need to model it exactly (e.g. 30 seconds+ really is effectively infinite). Within the range, values are recorded in buckets which can be either of fixed width or of increasing width. The increasing width is analogous to the idea of significant figures: if your value is in the range 0-10 you might want to be accurate to within 1ms, but if it is 20000 there is no need to be so accurate. I implemented a linear bucket sizing where the Nth bucket has width proportional to N. An exponential bucket sizing would also be sensible and could likely be derived directly from the floating-point representation of the value.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I'd like to get some feedback on this metrics code and make a decision on whether we want to use it before I actually go ahead and add all the instrumentation in the code (otherwise I'll have to redo it if we switch approaches). So the next topic of discussion will be which actual metrics to add.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -Jay
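(Finally, a sketch of the linear bucket sizing described above, where the Nth bucket has width proportional to N -- hypothetical code, not the actual kafka.common.metrics histogram:)

// Bucket widths w, 2w, 3w, ...: the first b buckets together cover
// w * b * (b + 1) / 2, so a max value and a bucket count determine w.
public class LinearBuckets {
    private final int buckets;
    private final double unitWidth; // w, the width of the first bucket

    public LinearBuckets(double max, int buckets) {
        this.buckets = buckets;
        this.unitWidth = max / (buckets * (buckets + 1) / 2.0);
    }

    public int bucketFor(double value) {
        if (value < 0)
            return -1; // below range: the -Infinity bucket
        double upperBound = 0.0;
        for (int b = 0; b < buckets; b++) {
            upperBound += unitWidth * (b + 1); // bucket b has width proportional to b + 1
            if (value <= upperBound)
                return b;
        }
        return buckets; // above range: the +Infinity bucket
    }
}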