Hello,

I just wanted to give an update.

After some discussion, I've realized that it's best to break this up into
two concepts, with two separate ways of reporting monitoring data. These
two categories are:

   1. Metrics - Counters, Gauges, Distributions. These are well-defined
   concepts for monitoring information and need to integrate with existing
   metrics collection systems such as Dropwizard and Stackdriver. Most
   metrics will go through this model, which will allow runners to process
   new metrics without adding extra code to support them, forwarding them
   to metric collection systems.
   2. Monitoring State - This supports general monitoring data which may
   not fit into the standard model for Metrics. For example, an I/O source
   may provide a table of filenames + metadata for files which are old and
   blocking the system. I will propose a general approach, similar to the
   URN + payload approach used in the doc right now; a rough sketch of both
   follows below.
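
To make the split concrete, here is a rough Python sketch of the two
report shapes (all names and URNs below are illustrative, not the final
design):

    # 1. A metric: a fixed, well-known structure that maps directly onto
    #    systems like Dropwizard or Stackdriver.
    metric = {
        "name": "beam:metric:user:my_namespace:my_counter",
        "type": "COUNTER",
        "labels": {"ptransform": "MyTransform"},
        "int_value": 100,
    }

    # 2. Monitoring state: a URN naming the payload format, plus an opaque
    #    payload that only interested consumers need to decode.
    monitoring_state = {
        "urn": "myio:monitoring:stuck_files:v1",  # hypothetical URN
        "payload": b"...",  # e.g. an encoded table of filename + metadata rows
    }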

I will rewrite most of the doc to propose separating these two very
different use cases: one optimizes for integration with existing
monitoring systems, while the other optimizes for flexibility, allowing
more complex and custom metric formats for other debugging scenarios.

I just wanted to give a brief update on the direction of this change,
before writing it up in full detail.


On Mon, Apr 16, 2018 at 10:36 AM Robert Bradshaw <[email protected]>
wrote:

> I agree that the user/system dichotomy is false; the real question is how
> counters can be scoped to avoid accidental (or even intentional)
> interference. A system that entirely controls the interaction between the
> "user" (from its perspective) and the underlying system can do this by
> prefixing all requested "user" counters with a prefix it will not use
> itself. Of course this breaks down whenever the wrapping isn't complete
> (either on the production or consumption side), but may be worth doing for
> some components (like the SDKs that value being able to provide this
> isolation for better behavior). Actual (human) end users are likely to be
> much less careful about avoiding conflicts than library authors who in turn
> are generally less careful than authors of the system itself.
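>
> For instance (a rough Python sketch; the prefix and function name are
> invented for illustration):
>
>     USER_PREFIX = "beam:metric:user:"  # reserved; never used by the system
>
>     def scoped_counter_name(requested_name):
>         # Everything the "user" requests lands under the reserved prefix,
>         # so it can never collide with a system-defined counter.
>         return USER_PREFIX + requested_name
>
>     scoped_counter_name("my_namespace:my_timer")
>     # -> "beam:metric:user:my_namespace:my_timer"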
>
> We could alternatively allow for specifying fully qualified URNs for
> counter names in the SDK APIs, and letting "normal" user counters be in the
> empty namespace rather than something like beam:metrics:{user,other,...},
> perhaps with SDKs prohibiting certain conflicting prefixes (which is less
> than ideal). A layer above the SDK that has similar absolute control over
> its "users" would have a similar decision to make.
>
>
> On Sat, Apr 14, 2018 at 4:00 PM Kenneth Knowles <[email protected]> wrote:
>
>> One reason I resist the user/system distinction is that Beam is a
>> multi-party system with at least SDK, runner, and pipeline. Often there may
>> be a DSL like SQL or Scio, or similarly someone may be building a platform
>> for their company where there is no user authoring the pipeline. Should
>> Scio, SQL, or MyCompanyFramework metrics end up in "user"? Who decides to
>> tack on the prefix? It looks like it is the SDK harness? Are there just
>> three namespaces "runner", "sdk", and "user"?  Most of what you'd think
>> of as "user" version "system" should simply be the different between
>> dynamically defined & typed metrics and fields in control plane protos. If
>> that layer of the namespaces is not finite and limited, who can make
>> a valid extension? Just some questions that I think would flesh out the
>> meaning of the "user" prefix.
>>
>> Kenn
>>
>> On Fri, Apr 13, 2018 at 5:26 PM Andrea Foegler <[email protected]>
>> wrote:
>>
>>>
>>>
>>> On Fri, Apr 13, 2018 at 5:00 PM Robert Bradshaw <[email protected]>
>>> wrote:
>>>
>>>> On Fri, Apr 13, 2018 at 3:28 PM Andrea Foegler <[email protected]>
>>>> wrote:
>>>>
>>>>> Thanks, Robert!
>>>>>
>>>>> I think my lack of clarity is around the MetricSpec.  Maybe what's in
>>>>> my head and what's being proposed are the same thing.  When I read that
>>>>> the MetricSpec describes the proto structure, that sounds kind of
>>>>> complicated to me.  But I may be misinterpreting it.  What I picture is
>>>>> something like a
>>>>> MetricSpec that looks like (note: my picture looks a lot like Stackdriver
>>>>> :):
>>>>>
>>>>> {
>>>>> name: "my_timer"
>>>>>
>>>>
>>>> name: "beam:metric:user:my_namespace:my_timer" (assuming we want to
>>>> keep requiring namespaces). Or "beam:metric:[some non-user designation]"
>>>>
>>>
>>> Sure. Looks good.
>>>
>>>
>>>>
>>>>> labels: { "ptransform" }
>>>>>
>>>>
>>>> How does an SDK act on this information?
>>>>
>>>
>>> The SDK is obligated to submit any metric values for that spec with a
>>> "ptransform" -> "transformName" in the labels field.  Autogenerating code
>>> from the spec to avoid typos should be easy.
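>>>
>>> Roughly (a sketch; the helper and field names are invented):
>>>
>>>     def report(spec, labels, value):
>>>         # The spec declares which label keys are required, so a typo'd
>>>         # or missing label fails fast instead of silently diverging.
>>>         missing = set(spec["labels"]) - set(labels)
>>>         if missing:
>>>             raise ValueError("missing labels: %s" % sorted(missing))
>>>         return {"name": spec["name"], "labels": labels, "int_value": value}
>>>
>>>     spec = {"name": "beam:metric:user:my_namespace:my_timer",
>>>             "labels": ["ptransform"]}
>>>     report(spec, {"ptransform": "MyTransform"}, 100)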
>>>
>>>
>>>>
>>>>
>>>>> type: GAUGE
>>>>> value_type: int64
>>>>>
>>>>
>>>> I was lumping type and value_type into the same field, as a URN for
>>>> possible extensibility, as they're tightly coupled (e.g. quantiles,
>>>> distributions).
>>>>
>>>
>>> My inclination is that keeping this relatively small and fixed to a
>>> set that can be readily exported to external monitoring systems is more
>>> useful than the added indirection to support extensibility.  Lumping
>>> together seems reasonable.
>>>
>>>
>>>>
>>>>
>>>>> units: SECONDS
>>>>> description: "Times my stuff"
>>>>>
>>>>
>>>> Are both of these optional metadata, in the form of key-value fields,
>>>> or flattened into the field itself (along with every other kind of
>>>> metadata you may want to attach)?
>>>>
>>>
>>> Optional metadata in the form of fixed fields.  Is there a use case for
>>> arbitrary metadata?  What would you do with it when exporting?
>>>
>>>
>>>>
>>>>
>>>>> }
>>>>>
>>>>> Then metrics submitted would look like:
>>>>> {
>>>>> name: "my_timer"
>>>>> labels: {"ptransform": "MyTransform"}
>>>>> int_value: 100
>>>>> }
>>>>>
>>>>
>>>> Yes, or value could be a bytes field that is encoded according to
>>>> [value_]type above, if we want that extensibility (e.g. if we want to
>>>> bundle the pardo sub-timings together, we'd need a proto for the value, but
>>>> that seems too specific to hard-code into the basic structure).
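>>>>
>>>> Roughly (a sketch; the type URN is invented):
>>>>
>>>>     import struct
>>>>
>>>>     def encode_value(value_type_urn, value):
>>>>         # The spec's value_type URN tells consumers how to decode the
>>>>         # bytes; new types can be added without changing the envelope.
>>>>         if value_type_urn == "beam:metric:value_type:int64:v1":
>>>>             return struct.pack(">q", value)  # big-endian int64
>>>>         raise ValueError("unknown value type: %s" % value_type_urn)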
>>>>
>>>>
>>>>> The simplicity comes from the fact that there's only one proto format
>>>>> for the spec and for the value.  The only things that vary are the
>>>>> entries in the map and the value field set.  It's pretty easy to
>>>>> establish contracts around this type of spec and even generate protos
>>>>> for use in the SDK that make the expectations explicit.
>>>>>
>>>>>
>>>>> On Fri, Apr 13, 2018 at 2:23 PM Robert Bradshaw <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> On Fri, Apr 13, 2018 at 1:32 PM Kenneth Knowles <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>>
>>>>>>> Or just "beam:counter:<namespace>:<name>" or even
>>>>>>> "beam:metric:<namespace>:<name>" since metrics have a type separate from
>>>>>>> their name.
>>>>>>>
>>>>>>
>>>>>> I proposed keeping the "user" in there to avoid possible clashes with
>>>>>> the system namespaces. (No preference on counter vs. metric, I wasn't
>>>>>> trying to imply counter = SumInts)
>>>>>>
>>>>>>
>>>>>> On Fri, Apr 13, 2018 at 2:02 PM Andrea Foegler <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> I like the generalization from entity -> labels.  I view the purpose
>>>>>>> of those fields as providing context.  And labels feel like they
>>>>>>> support a richer set of contexts.
>>>>>>>
>>>>>>
>>>>>> If we think such a generalization provides value, I'm fine with doing
>>>>>> that now, as sets or key-value maps, if we have good enough examples to
>>>>>> justify this.
>>>>>>
>>>>>>
>>>>>>> The URN concept gets a little tricky.  I totally agree that the
>>>>>>> context fields should not be embedded in the name.
>>>>>>> There's a "name" which is the identifier that can be used to
>>>>>>> communicate what context values are supported / allowed for metrics with
>>>>>>> that name (for example, element_count expects a ptransform ID).  But 
>>>>>>> then
>>>>>>> there's the context.  In Stackdriver, this context is a map of key-value
>>>>>>> pairs; the type is considered metadata associated with the name, but not
>>>>>>> communicated with the value.
>>>>>>>
>>>>>>
>>>>>> I'm not quite following you here. If context contains a ptransform
>>>>>> id, then it cannot be associated with a single name.
>>>>>>
>>>>>>
>>>>>>> Could the URN be "beam:namespace:name" and every metric have a map
>>>>>>> of key-value pairs for context?
>>>>>>>
>>>>>>
>>>>>> The URN is the name. Something like
>>>>>> "beam:metric:ptransform_execution_times:v1."
>>>>>>
>>>>>>
>>>>>>> Not sure where this fits in the discussion or if this is handled
>>>>>>> somewhere, but allowing for a metric configuration that's provided
>>>>>>> independently of the value allows for configuring "type", "units", etc 
>>>>>>> in a
>>>>>>> uniform way without having to encode them in the metric name / value.
>>>>>>> Stackdriver expects each metric type has been configured ahead of time 
>>>>>>> with
>>>>>>> these annotations / metadata.  Then values are reported separately.  For
>>>>>>> system metrics, the definitions can be packaged with the SDK.  For user
>>>>>>> metrics, they'd be defined at runtime.
>>>>>>>
>>>>>>
>>>>>> This feels like the metrics spec, which specifies that the metric with
>>>>>> name/URN X has this type plus a bunch of other metadata (e.g. units, if
>>>>>> they're not implicit in the type? This gets into whether the type should 
>>>>>> be
>>>>>> Duration{Sum,Max,Distribution,...} vs. Int{Sum,Max,Distribution,...}
>>>>>> + units metadata).
>>>>>>
>>>>>
