+1 for the thread safe metrics. This should be a rather low hanging fruit and easily added.
If we decide to add a histogram, then I would also be in favour of implementing our own version of a histogram. This avoids adding a hard dependency on Dropwizard or another metrics library to Flink core. Adding our own implementation would of course entail to also add a Dropwizard wrapper for reporting. Thus, if a user component required a Dropwizard histogram, then one could simply use the wrapper. Alternatively, one could also rely on external system to compute histograms. For example, statsD supports the generation of histograms from a stream of measurements. However, these histograms couldn't be used within Flink. Implementation wise, the histogram would most likely follow the implementation of Dropwizard's histogram: The basic idea is that a histogram can add samples to a reservoir which it uses to calculate a set of statistics when queried. The statistics comprises percentiles, mean, standard deviation and number of elements, for example. The reservoir defines the strategy of how to sample the input stream. There are different strategies conceivable: Uniform sampling, which constructs a long term distribution of the seen elements, exponentially decaying sampling, which favours more recent elements, sliding window or buckets. The question is now whether such an implementation already covers most use cases or whether histograms should support more functionaly. Feedback is highly appreciated :-) Cheers, Till On Mon, Jun 13, 2016 at 6:37 PM, Stephan Ewen <se...@apache.org> wrote: > I think it is totally fine to add a "ThreadsafeCounter" that uses an atomic > long internally > > On Sat, Jun 11, 2016 at 7:25 PM, Steve Cosenza <scose...@twitter.com> > wrote: > > > Also, we may be able to avoid the need for concurrent metrics by > > configuring each Finagle source to use a single thread. We'll investigate > > the performance implications this week and update you with the results. > > > > Thanks, > > Steve > > > > > > On Friday, June 10, 2016, Steve Cosenza <scose...@twitter.com> wrote: > > > >> +Chris Hogue who is also working on operationalizing Flink with me > >> > >> Hi Stephan, > >> > >> Thanks for the background on your current implementations! > >> > >> While we don't require a specific implementation for histogram, counter, > >> or gauge, it just became clear that we'll need threadsafe versions of > all > >> three of these metrics. This is because our messaging source is > implemented > >> using Finagle, and Finagle expects to be able to emit stats concurrently > >> from its managed threads. > >> > >> That being said, if adding threadsafe versions of the Flink counters is > >> not an option, we'd also be fine with directly reading and writing from > the > >> singleton Codahale MetricsRegistry that you start up in each > TaskManager. > >> > >> Thanks, > >> Steve > >> > >> On Fri, Jun 10, 2016 at 7:10 AM, Stephan Ewen <se...@apache.org> wrote: > >> > >>> A recent discussion brought up the point of adding a "histogram" metric > >>> type to Flink. This open thread is to gather some more of the > requirements > >>> for that metric. > >>> > >>> The most important question is whether users need Flink to offer > >>> specific implementations of "Histogram", like for example the " > >>> com.codahale.metrics.Histogram", or if a " > >>> org.apache.flink.metrics.Histogram" interface would work as well. > >>> The histogram could still be reported for example via dropwizard > >>> reporters. > >>> > >>> *Option (1):* If a Flink Histogram works as well, it would be simple to > >>> add one. The dropwizard reporter would need to wrap the Flink > Histogram for > >>> reporting. > >>> > >>> *Option (2)*: If the code needs the specific Dropwizard Histogram type, > >>> then one would need a wrapper class that makes a Flink Histogram look > like > >>> a dropwizard histogram. > >>> > >>> ---------- > >>> > >>> As a bit of background for the discussion, here are some thoughts > behind > >>> the way that Metrics are currently implemented in Flink. > >>> > >>> - The metric types in Flink are independent from libraries like > >>> "dropwizard" to reduce dependencies and retain freedom to swap > >>> implementations. > >>> > >>> - Metric reporting allows to reuse reporters from dropwizard > >>> > >>> - Some Flink metric implementations are also more lightweight than > for > >>> example in dropwizard. Counters for example are not thread safe, but > do not > >>> impose memory barriers. That is important for metrics deep in the > streaming > >>> runtime. > >>> > >>> > >>> > >> > > > > -- > > -Steve > > > > Sent from Gmail Mobile > > >