I'm much more interested in metrics as they happen than in job completion summaries, since these are stream processing jobs that should "never end". Ufuk's suggestion of a subtask-unique counter, combined with rate-of-change functions in a tool like InfluxDB, will probably work for my needs. So too would managing my own Dropwizard MetricRegistry.
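
To make the counter-plus-rate idea concrete, here is a minimal sketch in plain Java with no Flink or InfluxDB dependency. The class and method names (CounterRates, Sample, counterName, ratePerSecond) are hypothetical, invented for illustration. The calculation is only the general shape of a rate-of-change query over a monotonically increasing counter, not InfluxDB's actual implementation.

```java
/** Hypothetical helper: derives a per-second rate from two samples of a growing counter. */
final class CounterRates {

    /** One observation of a counter: its value and the time it was read. */
    static final class Sample {
        final long value;
        final long timestampMillis;

        Sample(long value, long timestampMillis) {
            this.value = value;
            this.timestampMillis = timestampMillis;
        }
    }

    /** Builds a subtask-unique name, mirroring the "counter-" + subtaskIndex suggestion. */
    static String counterName(String base, int subtaskIndex) {
        return base + "-" + subtaskIndex;
    }

    /** Change in counter value per second between two samples. */
    static double ratePerSecond(Sample previous, Sample current) {
        long elapsedMillis = current.timestampMillis - previous.timestampMillis;
        if (elapsedMillis <= 0) {
            throw new IllegalArgumentException("samples must be in increasing time order");
        }
        return (current.value - previous.value) * 1000.0 / elapsedMillis;
    }

    public static void main(String[] args) {
        // Assumed sample data: subtask 3 reported 100 at t=0 ms and 250 at t=5000 ms.
        Sample before = new Sample(100, 0);
        Sample after = new Sample(250, 5000);
        System.out.println(counterName("counter", 3) + " rate/s: " + ratePerSecond(before, after));
    }
}
```

Because each subtask reports under its own name, the per-subtask rates can be charted separately or summed downstream, which is what I'd want for monitoring a job that never completes.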
An observation: routing all online metrics through the heartbeat mechanism to a single host for display sounds like a scalability bottleneck. Doesn't this design limit the practical volume of metrics that can be exposed by the runtime and user applications?

On Thu, Nov 12, 2015 at 6:12 AM, Ufuk Celebi <u...@apache.org> wrote:

> Hey Nick,
>
> you can do the following for per task stats (this is kind of a
> workaround):
>
> Create an Accumulator with the subtask index in the name, e.g.
>
> int subtaskIndex = getRuntimeContext().getIndexOfThisSubtask();
> IntCounter counter = getRuntimeContext().getIntCounter("counter-" + subtaskIndex);
>
> This way you have one accumulator per subtask.
>
> The web interface will display the values as they are set (I’m not sure if
> it is in yet). You can also gather the stats from the execution result, e.g.
>
> ExecutionResult res = env.execute();
> res.getAllAccumulatorResults();
>
> You can furthermore add a custom Accumulator variant, which simply sets
> one value if this is what you need.
>
> Does this help?
>
> In any case, I agree that it would be nice to expose a special
> API/accumulator for this via the runtime context.
>
> – Ufuk
>
> > On 12 Nov 2015, at 11:55, Maximilian Michels <m...@apache.org> wrote:
> >
> > Hi Nick,
> >
> > I don't know if you have already come across the REST API. If not,
> > please have a look here:
> > https://ci.apache.org/projects/flink/flink-docs-master/internals/monitoring_rest_api.html
> >
> > I know that Christian Kreutzfeldt (cc) has been working on a
> > monitoring service which uses Akka messages to query the JobManager on
> > a job's status and accumulators. I'm wondering if you two could engage
> > in any way.
> >
> > Cheers,
> > Max
> >
> > On Wed, Nov 11, 2015 at 6:44 PM, Nick Dimiduk <ndimi...@gmail.com> wrote:
> >> Hello,
> >>
> >> I'm interested in exposing metrics from my UDFs. I see FLINK-1501 exposes
> >> task manager metrics via a UI; it would be nice to plug into the same
> >> MetricRegistry to register my own (i.e., gauges). I don't see this exposed via
> >> runtime context. This did lead me to discovering the Accumulators API. This
> >> looks more oriented to simple counts, which are summed across components of
> >> a batch job. In my case, I'd like to expose details of my stream processing
> >> vertices so that I can monitor their correctness and health re: runtime
> >> decisions. For instance, referring back to my previous thread, I would like
> >> to expose the number of filters loaded into my custom RichCoFlatMap so that
> >> I can easily monitor this value.
> >>
> >> Thanks,
> >> Nick