Hi Jay,

1. Agree with your assessment. Let me know when you start writing the metrics 
library to rule them all, I'm interested :-)

2. If I understood you correctly, you need to give an indication of what range 
of values you expect for a metric (e.g. a latency might be in log-scale of 
buckets between 0.1ms and 30,000ms) -- that's what I meant with distribution. 
Or did I get it wrong?
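
To make that concrete, here's roughly what I mean -- purely illustrative, the
names below aren't from any existing code:

    // Log-scale bucket boundaries between 0.1ms and 30,000ms: a constant
    // multiplicative step, so small latencies get fine-grained buckets and
    // large ones get coarse buckets.
    public static double[] logBoundaries(double min, double max, int numBuckets) {
        double[] bounds = new double[numBuckets + 1];
        double step = Math.pow(max / min, 1.0 / numBuckets);
        bounds[0] = min;
        for (int i = 1; i <= numBuckets; i++)
            bounds[i] = bounds[i - 1] * step;
        return bounds;
    }

    // e.g. logBoundaries(0.1, 30000.0, 100) spans 0.1ms..30s in 100 buckets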

There are a bunch of interesting algorithms for estimating percentiles with a 
small memory footprint. This is probably getting too far into yak-shaving 
territory, but just in case, here is a short literature survey that may be 
useful:

Chiranjeeb Buragohain and Subhash Suri: "Quantiles on Streams" in Encyclopedia 
of Database Systems, Springer, pp 2235–2240, 2009. ISBN: 978-0-387-35544-3 
http://www.cs.ucsb.edu/~suri/psdir/ency.pdf

3. I don't know all the purposes for which quotas will be used, but 
instinctively your approach sounds good to me. I would be inclined to increase 
N a bit (perhaps to 4 or 5), to reduce the uncertainty introduced by the 
incomplete window, if memory usage allows.

5. Looking into open-sourcing it. I will also take a look at your code.

Best,
Martin

On 22 Feb 2014, at 18:53, Jay Kreps <jay.kr...@gmail.com> wrote:
> Hey Martin,
> 
> Thanks for the great feedback.
> 
> 1. I agree with the problems of mixing moving window statistics with fixed
> window statistics. That was one of my rationales. The other is that
> weighted statistics are very unintuitive for people compared to simple
> things like averages and percentiles so they fail a bit as an intuitive
> monitoring mechanism. I actually think the moving windows are technically
> superior since they don't have hard boundaries, but a naive implementation
> based solely on events is actually totally wrong for the reasons you
> describe: the weighting needs to take into account the point in time of the
> estimate in its contribution to the average. This is an interesting problem
> and I started to think about it but then decided that if I kept thinking
> about it I would never get anything finished. When I retire I plan to write
> a metrics library based solely on continuously weighted averages. :-)
> 
> 2. Fair. To be clear you needn't encode a distribution, just your
> preference about accuracy in the measurement. You are saying "I care
> equally about accuracy in the whole range" or "I don't care about fine
> grained accuracy when the numbers themselves are large".
> 
> 3. The reason the exception is good is that the actual quota check may be low
> down in some part of the system, but a quota violation always needs to
> unwind all the way back up to the API layer to return the error to the
> client. So an exception is just what you need, because the catch will
> potentially be in a different place than the record() call. This lets you
> introduce quotas without each subsystem really needing to know about them.
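> 
> For illustration, the shape is roughly this -- the surrounding method and the
> error-handling names are made up, only record() and QuotaViolationException
> come from the actual code:
> 
>   // Deep in some subsystem: just record, with no knowledge of quotas.
>   public void append(Record record) {
>       this.bytesInSensor.record(record.sizeInBytes()); // may throw QuotaViolationException
>       this.log.append(record);
>   }
> 
>   // Up at the API layer, possibly far from the record() call:
>   try {
>       handler.handle(request);
>   } catch (QuotaViolationException e) {
>       sendErrorResponse(request, e); // hypothetical helper returning the error to the client
>   }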
> 
> Your point about whether or not you should count the current event when a
> quota violation occurs is a good one. I actually think the right answer
> depends on the details of how you handle windowing. For example one
> approach to windowing I have seen is to use the most recent COMPLETE window
> as the estimate while you fill up the current window. In this model then
> with a 30 second window the estimate you give out is always 0-30 seconds
> old. In this case you have a real problem with quotas because once the
> previous window is filled and you are in violation of your quota you will
> keep throwing exceptions regardless of the client behavior for the duration
> of the next window. But worse if you aren't counting the requests that got
> rejected then even though the client behavior is still bad your next window
> will record no values (because you rejected them all as quota violations).
> This is clearly a mess.
> 
> But that isn't quite how I'm doing windowing. The way I do it is I always
> keep N windows (with the last window being partial) and the estimate is
> over all windows. So with N=2 (the default) when you complete the
> current window the previous window is cleared and used to record the
> new values. The downside of this is that with a 30 second window and N=2
> your estimate is based on anything from 30 seconds to 60 seconds of data.
> The upside is that the most recent data is always included. I feel this is
> inherently important for monitoring. But it is particularly important for
> Quotas. In this case I feel that it is always the right thing to NOT count
> rejected measurements. Note that in this model, if the user goes over their
> quota and stays that way for a sustained period of time, the impact will
> not be the seesaw behavior I described where we reject all then none of
> their requests; instead we will reject just enough requests to keep them
> under their quota.
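> 
> A simplified sketch of that windowing scheme, to make it concrete (this is
> not the actual code, and it glosses over idle gaps longer than one window):
> 
>   public class WindowedSum {
>       private final long windowMs;
>       private final double[] windows;   // one total per window
>       private long currentWindowStart;
>       private int current;              // index of the partial (most recent) window
> 
>       public WindowedSum(int numWindows, long windowMs) {
>           this.windows = new double[numWindows];
>           this.windowMs = windowMs;
>       }
> 
>       public void record(double value, long now) {
>           if (now - currentWindowStart >= windowMs) {
>               current = (current + 1) % windows.length; // clear and reuse the oldest window
>               windows[current] = 0.0;
>               currentWindowStart = now;
>           }
>           windows[current] += value;
>       }
> 
>       public double measure() {
>           double total = 0.0;
>           for (double w : windows)      // the complete windows plus the partial one
>               total += w;
>           return total;
>       }
>   }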
> 
> 5. I would definitely be interested to see the code if it is open source,
> since I am interested in metrics. Overall since you went down this path I
> would be interested to get your opinion on my code. If you think what you
> did is better I would be open to discussing it as a third alternative too.
> If we decide we do want to use this code for metrics then we may want to
> implement a sampling histogram either in addition to or as a replacement
> for the existing histograms, and if you were up for contributing your
> implementation, that would be great.
> 
> -Jay
> 
> 
> On Sat, Feb 22, 2014 at 9:25 AM, Martin Kleppmann
> <mkleppm...@linkedin.com>wrote:
> 
>> Not sure if you want yet another opinion added to the pile -- but since I
>> had a similar problem on another project recently, I thought I'd weigh in.
>> (On that project we were originally using Coda's library, but then switched
>> to rolling our own metrics implementation because we needed to do a few
>> things differently.)
>> 
>> 1. Problems we encountered with Coda's library: it uses an
>> exponentially-weighted moving average (EWMA) for rates (e.g. messages/sec),
>> and exponentially biased reservoir sampling for histograms (percentiles,
>> averages). Those methods of calculation work well for events with a
>> consistently high volume, but they give strange and misleading results for
>> events that are bursty or rare (e.g. error rates). We found that a fixed-size
>> window gives more predictable, easier-to-interpret results.
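>> 
>> For context, the EWMA is essentially the following (simplified, not the
>> actual metrics-core code), which is why a single burst lingers in the
>> reported rate long after the events have stopped:
>> 
>>   public class EwmaRate {
>>       private final double alpha;  // smoothing factor derived from the decay constant
>>       private double rate = 0.0;   // events per second
>> 
>>       public EwmaRate(double tickSeconds, double decaySeconds) {
>>           this.alpha = 1.0 - Math.exp(-tickSeconds / decaySeconds);
>>       }
>> 
>>       // called once per tick interval with the number of events seen in that tick
>>       public void tick(long eventsInTick, double tickSeconds) {
>>           double instantRate = eventsInTick / tickSeconds;
>>           rate += alpha * (instantRate - rate); // moves only a fraction of the way
>>       }
>> 
>>       public double rate() {
>>           return rate;
>>       }
>>   }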
>> 
>> 2. In defence of Coda's library, I think its histogram implementation is a
>> good trade-off of memory for accuracy; I'm not totally convinced that your
>> proposal (counts of events in a fixed set of buckets) would be much better.
>> Would have to do some math to work out the expected accuracy in each case.
>> The reservoir sampling can be configured to use a smaller sample if the
>> default of 1028 samples is too expensive. Reservoir sampling also has the
>> advantage that you don't need to hard-code a bucket distribution.
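>> 
>> For reference, the core of plain uniform reservoir sampling is small --
>> roughly the following sketch (not metrics-core's actual implementation,
>> which also offers the exponentially biased variant):
>> 
>>   public class Reservoir {
>>       private final double[] sample;
>>       private final java.util.Random random = new java.util.Random();
>>       private long count = 0;
>> 
>>       public Reservoir(int size) {
>>           this.sample = new double[size];
>>       }
>> 
>>       public void update(double value) {
>>           count++;
>>           if (count <= sample.length) {
>>               sample[(int) (count - 1)] = value;             // still filling up
>>           } else {
>>               long r = (long) (random.nextDouble() * count); // 0 <= r < count
>>               if (r < sample.length)
>>                   sample[(int) r] = value;                   // keep with probability size/count
>>           }
>>       }
>> 
>>       public double quantile(double q) {
>>           int n = (int) Math.min(count, sample.length);
>>           if (n == 0)
>>               return Double.NaN;
>>           double[] copy = java.util.Arrays.copyOf(sample, n);
>>           java.util.Arrays.sort(copy);
>>           return copy[Math.min(n - 1, (int) Math.floor(q * n))];
>>       }
>>   }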
>> 
>> 3. Quotas are an interesting use case. However, I'm not wild about using a
>> QuotaViolationException for control flow -- I think an explicit conditional
>> would be nicer than having to catch an exception. One question in that
>> context: if a quota is exceeded, do you still want to count the event
>> towards the metric, or do you want to stop counting it until the quota is
>> replenished? The answer may depend on the particular metric.
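>> 
>> To illustrate what I mean by an explicit conditional -- this is a
>> hypothetical API, wouldExceedQuota() and the handler methods below don't
>> exist anywhere:
>> 
>>   public void handle(Request request) {
>>       if (sensor.wouldExceedQuota(request.sizeInBytes())) {
>>           sendQuotaError(request);              // reject, optionally without recording
>>       } else {
>>           sensor.record(request.sizeInBytes()); // record and proceed as normal
>>           process(request);
>>       }
>>   }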
>> 
>> 4. If you decide to go with Coda's library, I would advocate isolating the
>> dependency into a separate module and using it via a facade -- somewhat
>> like using SLF4J instead of Log4j directly. It's ok for Coda's library to
>> be the default metrics implementation, but it should be easy to swap it out
>> for something different in case someone has a version conflict or differing
>> requirements. The facade should be at a low level (individual events), not
>> at the reporter level (which deals with pre-aggregated values, and is
>> already pluggable).
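>> 
>> Concretely, the facade could be as small as something like this (just a
>> sketch, the names are illustrative):
>> 
>>   // A minimal low-level facade: Kafka code sees only this interface, and the
>>   // binding to Coda Hale (or anything else) lives in a separate module.
>>   public interface MetricsRecorder {
>>       void count(String name);                // increment a counter
>>       void record(String name, double value); // record a value for histograms/averages
>>   }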
>> 
>> 5. If it's useful, I can probably contribute my simple (but imho
>> effective) metrics library, for embedding into Kafka. It uses reservoir
>> sampling for percentiles, like Coda's library, but uses a fixed-size window
>> instead of an exponential bias, which avoids weird behaviour on bursty
>> metrics.
>> 
>> In summary, I would advocate one of the following approaches:
>> - Coda Hale library via facade (allowing it to be swapped for something
>> else), or
>> - Own metrics implementation, provided that we have confidence in its
>> implementation of percentiles.
>> 
>> Martin
>> 
>> 
>> On 22 Feb 2014, at 01:06, Jay Kreps <jay.kr...@gmail.com> wrote:
>>> Hey guys,
>>> 
>>> Just picking up this thread again. I do want to drive a conclusion as I
>>> will run out of work to do on the producer soon and will need to add
>>> metrics of some sort. We can vote on it, but I'm not sure if we actually
>>> got everything discussed.
>>> 
>>> Joel, I wasn't fully sure how to interpret your comment. I think you are
>>> saying you are cool with the new metrics package as long as it really is
>>> better. Do you have any comment on whether you think the benefits I
>>> outlined are worth it? I agree with you that we could hold off on a
>> second
>>> repo until someone else would actually want to use our code.
>>> 
>>> Jun, I'm not averse to doing a sampling-based histogram and doing some
>>> comparison between the two approaches if you think this approach is
>>> otherwise better.
>>> 
>>> Sriram, originally I thought you preferred just sticking to Coda Hale,
>> but
>>> after your follow-up email I wasn't really sure...
>>> 
>>> Joe/Clark, yes this code allows pluggable reporting so you could have a
>>> metrics reporter that just wraps each metric in a Coda Hale Gauge if that
>>> is useful. Though obviously if enough people were doing that I would
>> think
>>> it would be worth just using the Coda Hale package directly...
>>> 
>>> -Jay
>>> 
>>> 
>>> 
>>> 
>>> On Thu, Feb 13, 2014 at 3:34 PM, Clark Breyman <cl...@breyman.com>
>> wrote:
>>> 
>>>> Not requiring the client to link Coda/Yammer metrics sounds like a
>>>> compelling reason to pivot to new interfaces. If that's the agreed
>>>> direction, I'm hoping that we'd get the choice of backend to provide
>> (e.g.
>>>> facade on Yammer metrics for those with an investment in that) rather
>> than
>>>> force the new backend.  Having a metrics factory seems better for this
>> than
>>>> directly instantiating the singleton registry.
>>>> 
>>>> 
>>>> On Thu, Feb 13, 2014 at 2:39 PM, Joe Stein <joe.st...@stealth.ly>
>> wrote:
>>>> 
>>>>> Can we leave metrics and have multiple supported KafkaMetricsGroup
>>>>> implementing a yammer based implementation?
>>>>> 
>>>>> ProducerRequestStats with your configured analytics group?
>>>>> 
>>>>> On Thu, Feb 13, 2014 at 11:37 AM, Jay Kreps <jay.kr...@gmail.com>
>> wrote:
>>>>> 
>>>>>> I think we discussed the scala/java stuff more fully previously.
>>>>>> Essentially the client is embedded everywhere. Scala is very
>>>> incompatible
>>>>>> with itself so this makes it very hard to use for people using
>> anything
>>>>>> else in scala. Also Scala stack traces are very confusing. Basically
>> we
>>>>>> thought plain java code would be a lot easier for people to use. Even
>>>> if
>>>>>> Scala is more fun to write, that isn't really what we are optimizing
>>>> for.
>>>>>> 
>>>>>> -Jay
>>>>>> 
>>>>>> 
>>>>>> On Thu, Feb 13, 2014 at 8:09 AM, S Ahmed <sahmed1...@gmail.com>
>> wrote:
>>>>>> 
>>>>>>> Jay, pretty impressive how you just write a 'quick version' like that
>>>>> :)
>>>>>>> Not to get off-topic but why didn't you write this in scala?
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> On Wed, Feb 12, 2014 at 6:54 PM, Joel Koshy <jjkosh...@gmail.com>
>>>>> wrote:
>>>>>>> 
>>>>>>>> I have not had a chance to review the new metrics code and its
>>>>>>>> features carefully (apart from your write-up), but here are my
>>>>> general
>>>>>>>> thoughts:
>>>>>>>> 
>>>>>>>> Implementing a metrics package correctly is difficult; more so for
>>>>>>>> people like me, because I'm not a statistician.  However, if this
>>>> new
>>>>>>>> package: {(i) functions correctly (and we need to define and prove
>>>>>>>> correctness), (ii) is easy to use, (iii) serves all our current and
>>>>>>>> anticipated monitoring needs, (iv) is not so complex that it
>>>>>>>> becomes a burden to maintain and we would be better off with an available
>>>>>>>> library;} then I think it makes sense to embed it and use it within
>>>>>>>> the Kafka code. The main wins are: (i) predictability (no changing
>>>>>>>> APIs and intimate knowledge of the code) and (ii) control with
>>>>> respect
>>>>>>>> to both functionality (e.g., there are hard-coded decay constants
>>>> in
>>>>>>>> metrics-core 2.x) and correctness (i.e., if we find a bug in the
>>>>>>>> metrics package we have to submit a pull request and wait for it to
>>>>>>>> become mainstream).  I'm not sure it would help very much to pull
>>>> it
>>>>>>>> into a separate repo because that could potentially annul these
>>>>>>>> benefits.
>>>>>>>> 
>>>>>>>> Joel
>>>>>>>> 
>>>>>>>> On Wed, Feb 12, 2014 at 02:50:43PM -0800, Jay Kreps wrote:
>>>>>>>>> Sriram,
>>>>>>>>> 
>>>>>>>>> Makes sense. I am cool moving this stuff into its own repo if
>>>>> people
>>>>>>>> think
>>>>>>>>> that is better. I'm not sure it would get much contribution but
>>>>> when
>>>>>> I
>>>>>>>>> started messing with this I did have a lot of grand ideas of
>>>> making
>>>>>>>> adding
>>>>>>>>> metrics to a sensor dynamic so you could add more stuff in
>>>>>>> real-time (via
>>>>>>>>> jmx, say) and/or externalize all your metrics and config to a
>>>>>> separate
>>>>>>>> file
>>>>>>>>> like log4j with only the points of instrumentation hard-coded.
>>>>>>>>> 
>>>>>>>>> -Jay
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On Wed, Feb 12, 2014 at 2:07 PM, Sriram Subramanian <
>>>>>>>>> srsubraman...@linkedin.com> wrote:
>>>>>>>>> 
>>>>>>>>>> I am actually neutral to this change. I found the replies were
>>>>> more
>>>>>>>>>> towards the implementation and features so far. I would like
>>>> the
>>>>>>>> community
>>>>>>>>>> to think about the questions below before making a decision. My
>>>>>>>> opinion on
>>>>>>>>>> this is that it has potential to be its own project and it
>>>> would
>>>>>>>> attract
>>>>>>>>>> developers who are specifically interested in contributing to
>>>>>>> metrics.
>>>>>>>> I
>>>>>>>>>> am skeptical that the Kafka contributors would focus on
>>>> improving
>>>>>>> this
>>>>>>>>>> library (apart from bug fixes) instead of
>>>> developing/contributing
>>>>>> to
>>>>>>>> other
>>>>>>>>>> core pieces. It would be useful to continue to keep it decoupled
>>>>>>>>>> from the rest of Kafka (if it resides in the Kafka code base) so that
>>>>>>>>>> we can move it out anytime to its own project.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On 2/12/14 1:21 PM, "Jay Kreps" <jay.kr...@gmail.com> wrote:
>>>>>>>>>> 
>>>>>>>>>>> Hey Sriram,
>>>>>>>>>>> 
>>>>>>>>>>> Not sure if these are actually meant as questions or more
>>>> veiled
>>>>>>>> comments.
>>>>>>>>>>> In any case I tried to give my 2 cents inline.
>>>>>>>>>>> 
>>>>>>>>>>> On Tue, Feb 11, 2014 at 11:12 PM, Sriram Subramanian <
>>>>>>>>>>> srsubraman...@linkedin.com> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> I think answering the questions below would help to make a
>>>>>> better
>>>>>>>>>>>> decision. I am all for writing better code and having
>>>> superior
>>>>>>>>>>>> functionalities but it is worth thinking about stuff outside
>>>>>> just
>>>>>>>> code
>>>>>>>>>>>> in
>>>>>>>>>>>> this case -
>>>>>>>>>>>> 
>>>>>>>>>>>> 1. Do metrics form a core piece of Kafka? Do they help
>>>> kafka
>>>>>>>> greatly in
>>>>>>>>>>>> providing better core functionalities? I would always like a
>>>>>>>> project to
>>>>>>>>>>>> do
>>>>>>>>>>>> one thing really well. Metrics is a non-trivial amount of
>>>>> code.
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> Metrics are obviously important, and obviously improving our
>>>>>> metrics
>>>>>>>>>>> system
>>>>>>>>>>> would be good. That said this may or may not be better, and
>>>> even
>>>>>> if
>>>>>>>> it is
>>>>>>>>>>> better that betterness might not outweigh other
>>>> considerations.
>>>>>> That
>>>>>>>> is
>>>>>>>>>>> what we are discussing.
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>>> 2. Does it make sense to be part of Kafka or its own
>>>> project?
>>>>> If
>>>>>>>> this
>>>>>>>>>>>> metrics library has the potential to be better than
>>>>>> metrics-core,
>>>>>>> I
>>>>>>>>>>>> would
>>>>>>>>>>>> be interested in other projects take advantage of it.
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> It could be either.
>>>>>>>>>>> 
>>>>>>>>>>> 3. Can Kafka maintain this library as new members join and old
>>>>>>> members
>>>>>>>>>>>> leave? Would this be a piece of code that no one (in Kafka)
>>>> in
>>>>>> the
>>>>>>>>>>>> future
>>>>>>>>>>>> spends time improving if the original author left?
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> I am not going anywhere in the near term, but if I did, yes,
>>>>> this
>>>>>>>> would be
>>>>>>>>>>> like any other code we have. As with yammer metrics or any
>>>> other
>>>>>>> code
>>>>>>>> at
>>>>>>>>>>> that point we would either use it as is or someone would
>>>> improve
>>>>>> it.
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>>> 4. Does it affect the schedule of producer rewrite? This
>>>> needs
>>>>>> its
>>>>>>>> own
>>>>>>>>>>>> stabilization and modification to existing metric dashboards
>>>>> if
>>>>>>> the
>>>>>>>>>>>> format
>>>>>>>>>>>> is changed. Many times such costs are not factored in, and a project
>>>>>>>>>>>> loses time before realizing the extra time required to make a library
>>>>>>>>>>>> like this operational.
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> Probably not. The metrics are going to change regardless of
>>>>>> whether
>>>>>>>> we use
>>>>>>>>>>> the same library or not. If we think this is better I don't
>>>> mind
>>>>>>>> putting
>>>>>>>>>>> in
>>>>>>>>>>> a little extra effort to get there.
>>>>>>>>>>> 
>>>>>>>>>>> Irrespective I think this is probably not the right thing to
>>>>>>> optimize
>>>>>>>> for.
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>>> I am sure we can do better when we write code to a specific
>>>>> use
>>>>>>>> case (in
>>>>>>>>>>>> this case, kafka) rather than building a generic library
>>>> that
>>>>>>> suits
>>>>>>>> all
>>>>>>>>>>>> (metrics-core) but I would like us to have answers to the
>>>>>>> questions
>>>>>>>>>>>> above
>>>>>>>>>>>> and be prepared before we proceed to support this with the
>>>>>>> producer
>>>>>>>>>>>> rewrite.
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> Naturally we are all considering exactly these things, that is
>>>>>>>> exactly the
>>>>>>>>>>> reason I started the thread.
>>>>>>>>>>> 
>>>>>>>>>>> -Jay
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>>> On 2/11/14 6:28 PM, "Jun Rao" <jun...@gmail.com> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> Thanks for the detailed write-up. It's well thought
>>>> through.
>>>>> A
>>>>>>> few
>>>>>>>>>>>>> comments:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 1. I have a couple of concerns on the percentiles. The
>>>> first
>>>>>>> issue
>>>>>>>> is
>>>>>>>>>>>> that
>>>>>>>>>>>>> it requires the user to know the value range. Since the
>>>> range
>>>>>> for
>>>>>>>>>>>> things
>>>>>>>>>>>>> like message size (in millions) is quite different from
>>>> those
>>>>>>> like
>>>>>>>>>>>> request
>>>>>>>>>>>>> time (less than 100), it's going to be hard to pick a good
>>>>>> global
>>>>>>>>>>>> default
>>>>>>>>>>>>> range. Different apps could be dealing with different
>>>> message
>>>>>>>> size. So
>>>>>>>>>>>>> they
>>>>>>>>>>>>> probably will have to customize the range. Another issue is
>>>>>> that
>>>>>>>> it can
>>>>>>>>>>>>> only report values at the bucket boundaries. So, if you
>>>> have
>>>>>> 1000
>>>>>>>>>>>> buckets
>>>>>>>>>>>>> and a value range of 1 million, you will only see 1000
>>>>> possible
>>>>>>>> values
>>>>>>>>>>>> as
>>>>>>>>>>>>> the quantile, which is probably too sparse. The
>>>>> implementation
>>>>>> of
>>>>>>>>>>>>> histogram
>>>>>>>>>>>>> in metrics-core keeps a fixed number of samples, which avoids
>>>>> both
>>>>>>>> issues.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 2. We need to document the 3-part metrics names better
>>>> since
>>>>>> it's
>>>>>>>> not
>>>>>>>>>>>>> obvious what the convention is. Also, currently the name of
>>>>> the
>>>>>>>> sensor
>>>>>>>>>>>> and
>>>>>>>>>>>>> the metrics defined in it are independent. Would it make
>>>>> sense
>>>>>> to
>>>>>>>> have
>>>>>>>>>>>> the
>>>>>>>>>>>>> sensor name be a prefix of the metric name?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Overall, this approach seems to be cleaner than
>>>> metrics-core
>>>>> by
>>>>>>>>>>>> decoupling
>>>>>>>>>>>>> measuring and reporting. The main benefit of metrics-core
>>>>> seems
>>>>>>> to
>>>>>>>> be
>>>>>>>>>>>> the
>>>>>>>>>>>>> existing reporters. Since not that many people voted for
>>>>>>>> metrics-core,
>>>>>>>>>>>> I
>>>>>>>>>>>>> am
>>>>>>>>>>>>> ok with going with the new implementation. My only
>>>>>> recommendation
>>>>>>>> is to
>>>>>>>>>>>>> address the concern on percentiles.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Jun
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Thu, Feb 6, 2014 at 12:51 PM, Jay Kreps <
>>>>>> jay.kr...@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Hey guys,
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I wanted to kick off a quick discussion of metrics with
>>>>>> respect
>>>>>>>> to
>>>>>>>>>>>> the
>>>>>>>>>>>>>> new
>>>>>>>>>>>>>> producer and consumer (and potentially the server).
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> At a high level I think there are three approaches we
>>>> could
>>>>>>> take:
>>>>>>>>>>>>>> 1. Plain vanilla JMX
>>>>>>>>>>>>>> 2. Use Coda Hale (AKA Yammer) Metrics
>>>>>>>>>>>>>> 3. Do our own metrics (with JMX as one output)
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 1. Has the advantage that JMX is the most commonly used
>>>>> java
>>>>>>>> thing
>>>>>>>>>>>> and
>>>>>>>>>>>>>> plugs in reasonably to most metrics systems. JMX is
>>>>> included
>>>>>> in
>>>>>>>> the
>>>>>>>>>>>> JDK
>>>>>>>>>>>>>> so
>>>>>>>>>>>>>> it doesn't impose any additional dependencies on clients.
>>>>> It
>>>>>>> has
>>>>>>>> the
>>>>>>>>>>>>>> disadvantage that plain vanilla JMX is a pain to use. We
>>>>>> would
>>>>>>>> need a
>>>>>>>>>>>>>> bunch
>>>>>>>>>>>>>> of helper code for maintaining counters to make this
>>>>>>> reasonable.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 2. Coda Hale metrics is pretty good and broadly used. It
>>>>>>>> supports JMX
>>>>>>>>>>>>>> output as well as direct output to many other types of
>>>>>> systems.
>>>>>>>> The
>>>>>>>>>>>>>> primary
>>>>>>>>>>>>>> downside we have had with Coda Hale has to do with the
>>>>>> clients
>>>>>>>> and
>>>>>>>>>>>>>> library
>>>>>>>>>>>>>> incompatibilities. We are currently on an older more
>>>>> popular
>>>>>>>> version.
>>>>>>>>>>>>>> The
>>>>>>>>>>>>>> newer version is a rewrite of the APIs and is
>>>> incompatible.
>>>>>>>>>>>> Originally
>>>>>>>>>>>>>> these were totally incompatible and people had to choose
>>>>> one
>>>>>> or
>>>>>>>> the
>>>>>>>>>>>>>> other.
>>>>>>>>>>>>>> I think that has been improved so now the new version is
>>>> a
>>>>>>>> totally
>>>>>>>>>>>>>> different package. But even in this case you end up with
>>>>> both
>>>>>>>>>>>> versions
>>>>>>>>>>>>>> if
>>>>>>>>>>>>>> you use Kafka and we are on a different version than you
>>>>>> which
>>>>>>> is
>>>>>>>>>>>> going
>>>>>>>>>>>>>> to
>>>>>>>>>>>>>> be pretty inconvenient.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 3. Doing our own has the downside of potentially
>>>>> reinventing
>>>>>>> the
>>>>>>>>>>>> wheel,
>>>>>>>>>>>>>> and
>>>>>>>>>>>>>> potentially needing to work out any bugs in our code. The
>>>>>>> upsides
>>>>>>>>>>>> would
>>>>>>>>>>>>> depend on how good the reinvention was. As it
>>>> happens I
>>>>>>> did a
>>>>>>>>>>>> quick
>>>>>>>>>>>>>> (~900 loc) version of a metrics library that is under
>>>>>>>>>>>>>> kafka.common.metrics.
>>>>>>>>>>>>>> I think it has some advantages over the Yammer metrics
>>>>>> package
>>>>>>>> for
>>>>>>>>>>>> our
>>>>>>>>>>>>>> usage beyond just not causing incompatibilities. I will
>>>>>>> describe
>>>>>>>> this
>>>>>>>>>>>>>> code
>>>>>>>>>>>>>> so we can discuss the pros and cons. Although I favor
>>>> this
>>>>>>>> approach I
>>>>>>>>>>>>>> have
>>>>>>>>>>>>>> no emotional attachment and wouldn't be too sad if I
>>>> ended
>>>>> up
>>>>>>>>>>>> deleting
>>>>>>>>>>>>>> it.
>>>>>>>>>>>>>> Here are javadocs for this code, though I haven't written
>>>>>> much
>>>>>>>>>>>>>> documentation yet since I might end up deleting it:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Here is a quick overview of this library.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> There are three main public interfaces:
>>>>>>>>>>>>>> Metrics - This is a repository of metrics being
>>>> tracked.
>>>>>>>>>>>>>> Metric - A single, named numerical value being measured
>>>>>>> (i.e. a
>>>>>>>>>>>>>> counter).
>>>>>>>>>>>>>> Sensor - This is a thing that records values and
>>>> updates
>>>>>> zero
>>>>>>>> or
>>>>>>>>>>>> more
>>>>>>>>>>>>>> metrics
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> So let's say we want to track four values about message
>>>>>> sizes;
>>>>>>>>>>>>>> specifically say we want to record the average, the
>>>>> maximum,
>>>>>>> the
>>>>>>>>>>>> total
>>>>>>>>>>>>>> rate
>>>>>>>>>>>>>> of bytes being sent, and a count of messages. Then we
>>>> would
>>>>>> do
>>>>>>>>>>>> something
>>>>>>>>>>>>>> like this:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>  // setup code
>>>>>>>>>>>>>>  Metrics metrics = new Metrics(); // this is a global
>>>>>>>> "singleton"
>>>>>>>>>>>>>>  Sensor sensor =
>>>>>>>> metrics.sensor("kafka.producer.message.sizes");
>>>>>>>>>>>>>>  sensor.add("kafka.producer.message-size.avg", new
>>>>> Avg());
>>>>>>>>>>>>>>  sensor.add("kafka.producer.message-size.max", new
>>>>> Max());
>>>>>>>>>>>>>>  sensor.add("kafka.producer.bytes-sent-per-sec", new
>>>>>> Rate());
>>>>>>>>>>>>>>  sensor.add("kafka.producer.message-count", new
>>>> Count());
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>  // now when we get a message we do this
>>>>>>>>>>>>>>  sensor.record(messageSize);
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> The above code creates the global metrics repository,
>>>>>> creates a
>>>>>>>>>>>> single
>>>>>>>>>>>>>> Sensor, and defines 4 named metrics that are updated by
>>>>> that
>>>>>>>> Sensor.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Like Yammer Metrics (YM) I allow you to plug in
>>>>> "reporters",
>>>>>>>>>>>> including a
>>>>>>>>>>>>>> JMX reporter. Unlike the Coda Hale JMX reporter the
>>>>> reporter
>>>>>> I
>>>>>>>> have
>>>>>>>>>>>> keys
>>>>>>>>>>>>>> off the metric names not the Sensor names, which I think
>>>> is
>>>>>> an
>>>>>>>>>>>>>> improvement--I just use the convention that the last
>>>>> portion
>>>>>> of
>>>>>>>> the
>>>>>>>>>>>>>> name is
>>>>>>>>>>>>>> the attribute name, the second to last is the mbean name,
>>>>> and
>>>>>>> the
>>>>>>>>>>>> rest
>>>>>>>>>>>>>> is
>>>>>>>>>>>>>> the package. So in the above example there is a producer
>>>>>> mbean
>>>>>>>> that
>>>>>>>>>>>> has
>>>>>>>>>>>>>> an
>>>>>>>>>>>>>> avg and max attribute and a producer mbean that has a
>>>>>>>>>>>> bytes-sent-per-sec
>>>>>>>>>>>>>> and message-count attribute. This is nice because you can
>>>>>>>> logically
>>>>>>>>>>>>>> group
>>>>>>>>>>>>>> the values reported irrespective of where in the program
>>>>> they
>>>>>>> are
>>>>>>>>>>>>>> computed--that is an mbean can logically group attributes
>>>>>>>> computed
>>>>>>>>>>>> off
>>>>>>>>>>>>>> different sensors. This means you can report values by
>>>>>> logical
>>>>>>>>>>>>>> subsystem.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I also allow the concept of hierarchical Sensors which I
>>>>>> think
>>>>>>>> is a
>>>>>>>>>>>> good
>>>>>>>>>>>>>> convenience. I have noticed a common pattern in systems
>>>>> where
>>>>>>> you
>>>>>>>>>>>> need
>>>>>>>>>>>>>> to
>>>>>>>>>>>>>> roll up the same values along different dimensions. A
>>>>> simple
>>>>>>>>>>>> example is
>>>>>>>>>>>>>> metrics about qps, data rate, etc on the broker. These we
>>>>>> want
>>>>>>> to
>>>>>>>>>>>>>> capture
>>>>>>>>>>>>>> in aggregate, but also broken down by topic-id. You can
>>>> do
>>>>>> this
>>>>>>>>>>>> purely
>>>>>>>>>>>>>> by
>>>>>>>>>>>>>> defining the sensor hierarchy:
>>>>>>>>>>>>>> Sensor allSizes = metrics.sensor("kafka.producer.sizes");
>>>>>>>>>>>>>> Sensor topicSizes = metrics.sensor("kafka.producer." +
>>>>> topic
>>>>>> +
>>>>>>>>>>>>>> ".sizes",
>>>>>>>>>>>>>> allSizes);
>>>>>>>>>>>>>> Now each actual update will go to the appropriate
>>>>> topicSizes
>>>>>>>> sensor
>>>>>>>>>>>>>> (based
>>>>>>>>>>>>>> on the topic name), but allSizes metrics will get updated
>>>>>> too.
>>>>>>> I
>>>>>>>> also
>>>>>>>>>>>>>> support multiple parents for each sensor as well as
>>>>> multiple
>>>>>>>> layers
>>>>>>>>>>>> of
>>>>>>>>>>>>>> hierarchy, so you can define a more elaborate DAG of
>>>>> sensors.
>>>>>> An
>>>>>>>>>>>> example
>>>>>>>>>>>>>> of
>>>>>>>>>>>>>> how this would be useful is if you wanted to record your
>>>>>>> metrics
>>>>>>>>>>>> broken
>>>>>>>>>>>>>> down by topic AND client id as well as the global
>>>>> aggregate.
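>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> For instance, reusing allSizes and topicSizes from above and assuming
>>>>>>>>>>>>>> sensor() can take more than one parent (the exact signature may differ),
>>>>>>>>>>>>>> that might look like:
>>>>>>>>>>>>>> Sensor clientSizes = metrics.sensor("kafka.producer." + clientId + ".sizes", allSizes);
>>>>>>>>>>>>>> Sensor topicClientSizes = metrics.sensor("kafka.producer." + topic + "." + clientId + ".sizes",
>>>>>>>>>>>>>>                                          topicSizes, clientSizes);
>>>>>>>>>>>>>> // one record() call updates the topic, client, and global aggregates
>>>>>>>>>>>>>> topicClientSizes.record(messageSize);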
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Each metric can take a configurable Quota value which
>>>>> allows
>>>>>> us
>>>>>>>> to
>>>>>>>>>>>> limit
>>>>>>>>>>>>>> the maximum value of that sensor. This is intended for
>>>> use
>>>>> on
>>>>>>> the
>>>>>>>>>>>>>> server as
>>>>>>>>>>>>>> part of our Quota implementation. The way this works is
>>>>> that
>>>>>>> you
>>>>>>>>>>>> record
>>>>>>>>>>>>>> metrics as usual:
>>>>>>>>>>>>>>  mySensor.record(42.0)
>>>>>>>>>>>>>> However if this event occurrence causes one of the metrics
>>>>> to
>>>>>>>> exceed
>>>>>>>>>>>> its
>>>>>>>>>>>>>> maximum allowable value (the quota) this call will throw
>>>> a
>>>>>>>>>>>>>> QuotaViolationException. The cool thing about this is
>>>> that
>>>>> it
>>>>>>>> means
>>>>>>>>>>>> we
>>>>>>>>>>>>>> can
>>>>>>>>>>>>>> define quotas on anything we capture metrics for, which I
>>>>>> think
>>>>>>>> is
>>>>>>>>>>>>>> pretty
>>>>>>>>>>>>>> cool.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Another question is how to handle windowing of the
>>>> values?
>>>>>>>> Metrics
>>>>>>>>>>>> want
>>>>>>>>>>>>>> to
>>>>>>>>>>>>>> record the "current" value, but the definition of current
>>>>> is
>>>>>>>>>>>> inherently
>>>>>>>>>>>>>> nebulous. A few of the obvious gotchas are that if you
>>>>> define
>>>>>>>>>>>> "current"
>>>>>>>>>>>>>> to
>>>>>>>>>>>>>> be a number of events you can end up measuring an
>>>>> arbitrarily
>>>>>>>> long
>>>>>>>>>>>>>> window
>>>>>>>>>>>>>> of time if the event rate is low (e.g. you think you are
>>>>>>> getting
>>>>>>>> 50
>>>>>>>>>>>>>> messages/sec because that was the rate yesterday when all
>>>>>>> events
>>>>>>>>>>>>>> stopped).
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Here is how I approach this. All the metrics use the same
>>>>>>>> windowing
>>>>>>>>>>>>>> approach. We define a single window by a length of time
>>>> or
>>>>>>>> number of
>>>>>>>>>>>>>> values
>>>>>>>>>>>>>> (you can use either or both--if both the window ends when
>>>>>>>> *either*
>>>>>>>>>>>> the
>>>>>>>>>>>>>> time
>>>>>>>>>>>>>> bound or event bound is hit). The typical problem with
>>>> hard
>>>>>>>> window
>>>>>>>>>>>>>> boundaries is that at the beginning of the window you
>>>> have
>>>>> no
>>>>>>>> data
>>>>>>>>>>>> and
>>>>>>>>>>>>>> the
>>>>>>>>>>>>>> first few samples are too small to be a valid sample.
>>>>>> (Consider
>>>>>>>> if
>>>>>>>>>>>> you
>>>>>>>>>>>>>> were
>>>>>>>>>>>>>> keeping an avg and the first value in the window happens
>>>> to
>>>>>> be
>>>>>>>> very
>>>>>>>>>>>> very
>>>>>>>>>>>>>> high; if you check the avg at this exact time you will
>>>>>> conclude
>>>>>>>> the
>>>>>>>>>>>> avg
>>>>>>>>>>>>>> is
>>>>>>>>>>>>>> very high but on a sample size of one). One simple fix
>>>>> would
>>>>>> be
>>>>>>>> to
>>>>>>>>>>>>>> always
>>>>>>>>>>>>>> report the last complete window, however this is not
>>>>>>> appropriate
>>>>>>>> here
>>>>>>>>>>>>>> because (1) we want to drive quotas off it so it needs to
>>>>> be
>>>>>>>> current,
>>>>>>>>>>>>>> and
>>>>>>>>>>>>>> (2) since this is for monitoring you kind of care more
>>>>> about
>>>>>>> the
>>>>>>>>>>>> current
>>>>>>>>>>>>>> state. The ideal solution here would be to define a
>>>>> backwards
>>>>>>>> looking
>>>>>>>>>>>>>> sliding window from the present, but many statistics are
>>>>>>> actually
>>>>>>>>>>>> very
>>>>>>>>>>>>>> hard
>>>>>>>>>>>>>> to compute in this model without retaining all the values
>>>>>> which
>>>>>>>>>>>> would be
>>>>>>>>>>>>>> hopelessly inefficient. My solution to this is to keep a
>>>>>>>> configurable
>>>>>>>>>>>>>> number of windows (default is two) and combine them for
>>>> the
>>>>>>>> estimate.
>>>>>>>>>>>>>> So in
>>>>>>>>>>>>>> a two sample case depending on when you ask you have
>>>>> between
>>>>>>> one
>>>>>>>> and
>>>>>>>>>>>> two
>>>>>>>>>>>>>> complete samples worth of data to base the answer off of.
>>>>>>>> Provided
>>>>>>>>>>>> the
>>>>>>>>>>>>>> sample window is large enough to get a valid result this
>>>>>>>> satisfies
>>>>>>>>>>>> both
>>>>>>>>>>>>>> of
>>>>>>>>>>>>>> my criteria of incorporating the most recent data and
>>>>> having
>>>>>>>>>>>> reasonable
>>>>>>>>>>>>>> variance at all times.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Another approach is to use an exponential weighting
>>>> scheme
>>>>> to
>>>>>>>> combine
>>>>>>>>>>>>>> all
>>>>>>>>>>>>>> history but emphasize the recent past. I have not done
>>>> this
>>>>>> as
>>>>>>> it
>>>>>>>>>>>> has a
>>>>>>>>>>>>>> lot
>>>>>>>>>>>>>> of issues for practical operational metrics. I'd be happy
>>>>> to
>>>>>>>>>>>> elaborate
>>>>>>>>>>>>>> on
>>>>>>>>>>>>>> this if anyone cares...
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> The window size for metrics has a global default which
>>>> can
>>>>> be
>>>>>>>>>>>>>> overridden at
>>>>>>>>>>>>>> either the sensor or individual metric level.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> In addition to these time series values the user can
>>>>> directly
>>>>>>>> expose
>>>>>>>>>>>>>> some
>>>>>>>>>>>>>> method of their choosing JMX-style by implementing the
>>>>>>> Measurable
>>>>>>>>>>>>>> interface
>>>>>>>>>>>>>> and registering that value. E.g.
>>>>>>>>>>>>>> metrics.addMetric("my.metric", new Measurable() {
>>>>>>>>>>>>>>    public double measure(MetricConfig config, long now) {
>>>>>>>>>>>>>>      return this.calculateValueToExpose();
>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>> });
>>>>>>>>>>>>>> This is useful for exposing things like the accumulator
>>>>> free
>>>>>>>> memory.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> The set of metrics is extensible, new metrics can be
>>>> added
>>>>> by
>>>>>>>> just
>>>>>>>>>>>>>> implementing the appropriate interfaces and registering
>>>>> with
>>>>>> a
>>>>>>>>>>>> sensor. I
>>>>>>>>>>>>>> implement the following metrics:
>>>>>>>>>>>>>> total - the sum of all values from the given sensor
>>>>>>>>>>>>>> count - a windowed count of values from the sensor
>>>>>>>>>>>>>> avg - the sample average within the windows
>>>>>>>>>>>>>> max - the max over the windows
>>>>>>>>>>>>>> min - the min over the windows
>>>>>>>>>>>>>> rate - the rate in the windows (e.g. the total or count divided by
>>>>>>>>>>>>>> the elapsed time)
>>>>>>>>>>>>>> percentiles - a collection of percentiles computed over
>>>>> the
>>>>>>>> window
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> My approach to percentiles is a little different from the
>>>>>>> yammer
>>>>>>>>>>>> metrics
>>>>>>>>>>>>>> package. My complaint about the yammer metrics approach
>>>> is
>>>>>> that
>>>>>>>> it
>>>>>>>>>>>> uses
>>>>>>>>>>>>>> rather expensive sampling and uses kind of a lot of
>>>> memory
>>>>> to
>>>>>>>> get a
>>>>>>>>>>>>>> reasonable sample. This is problematic for per-topic
>>>>>>>> measurements.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Instead I use a fixed range for the histogram (e.g. 0.0
>>>> to
>>>>>>>> 30000.0)
>>>>>>>>>>>>>> which
>>>>>>>>>>>>>> directly allows you to specify the desired memory use.
>>>> Any
>>>>>>> value
>>>>>>>>>>>> below
>>>>>>>>>>>>>> the
>>>>>>>>>>>>>> minimum is recorded as -Infinity and any value above the
>>>>>>> maximum
>>>>>>>> as
>>>>>>>>>>>>>> +Infinity. I think this is okay as all metrics have an
>>>>>> expected
>>>>>>>> range
>>>>>>>>>>>>>> except for latency which can be arbitrarily large, but
>>>> for
>>>>>> very
>>>>>>>> high
>>>>>>>>>>>>>> latency there is no need to model it exactly (e.g. 30
>>>>>> seconds +
>>>>>>>>>>>> really
>>>>>>>>>>>>>> is
>>>>>>>>>>>>>> effectively infinite). Within the range values are
>>>> recorded
>>>>>> in
>>>>>>>>>>>> buckets
>>>>>>>>>>>>>> which can be either fixed width or increasing width. The
>>>>>>>> increasing
>>>>>>>>>>>>>> width
>>>>>>>>>>>>>> is analogous to the idea of significant figures, that is
>>>> if
>>>>>>> your
>>>>>>>>>>>> value
>>>>>>>>>>>>>> is
>>>>>>>>>>>>>> in the range 0-10 you might want to be accurate to within
>>>>>> 1ms,
>>>>>>>> but if
>>>>>>>>>>>>>> it is
>>>>>>>>>>>>>> 20000 there is no need to be so accurate. I implemented a
>>>>>>> linear
>>>>>>>>>>>> bucket
>>>>>>>>>>>>>> size where the Nth bucket has width proportional to N. An
>>>>>>>> exponential
>>>>>>>>>>>>>> bucket size would also be sensible and could likely be
>>>>>> derived
>>>>>>>>>>>> directly
>>>>>>>>>>>>>> from the floating point representation of the value.
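>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> As a simplified sketch of the linear scheme (not the exact code): since the
>>>>>>>>>>>>>> Nth bucket has width proportional to N, the boundary of bucket N grows like
>>>>>>>>>>>>>> N*(N+1)/2, so the bucket for a value can be found by inverting that quadratic
>>>>>>>>>>>>>> in constant time rather than by searching:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> public class LinearBinScheme {
>>>>>>>>>>>>>>     private final int buckets;
>>>>>>>>>>>>>>     private final double max;
>>>>>>>>>>>>>>     private final double scale; // width of the first bucket
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>     public LinearBinScheme(int buckets, double max) {
>>>>>>>>>>>>>>         this.buckets = buckets;
>>>>>>>>>>>>>>         this.max = max;
>>>>>>>>>>>>>>         this.scale = 2.0 * max / (buckets * (buckets + 1.0));
>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>     public int toBin(double value) {
>>>>>>>>>>>>>>         if (value < 0.0)
>>>>>>>>>>>>>>             return 0;             // below the range: clamp to the first bucket
>>>>>>>>>>>>>>         if (value > max)
>>>>>>>>>>>>>>             return buckets - 1;   // above the range: clamp to the last bucket
>>>>>>>>>>>>>>         int bin = (int) (-0.5 + 0.5 * Math.sqrt(1.0 + 8.0 * value / scale));
>>>>>>>>>>>>>>         return Math.min(bin, buckets - 1); // guard the value == max boundary
>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>> }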
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I'd like to get some feedback on this metrics code and
>>>>> make a
>>>>>>>>>>>> decision
>>>>>>>>>>>>>> on
>>>>>>>>>>>>>> whether we want to use it before I actually go ahead and
>>>>> add
>>>>>>> all
>>>>>>>> the
>>>>>>>>>>>>>> instrumentation in the code (otherwise I'll have to redo
>>>> it
>>>>>> if
>>>>>>> we
>>>>>>>>>>>> switch
>>>>>>>>>>>>>> approaches). So the next topic of discussion will be
>>>> which
>>>>>>> actual
>>>>>>>>>>>>>> metrics
>>>>>>>>>>>>>> to add.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> -Jay
>>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>> 
>> 
