Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics

Andrea Foegler Fri, 13 Apr 2018 17:26:26 -0700

On Fri, Apr 13, 2018 at 5:00 PM Robert Bradshaw <[email protected]> wrote:


> On Fri, Apr 13, 2018 at 3:28 PM Andrea Foegler <[email protected]> wrote:
>
>> Thanks, Robert!
>>
>> I think my lack of clarity is around the MetricSpec.  Maybe what's in my
>> head and what's being proposed are the same thing.  When I read that the
>> MetricSpec describes the proto structure, that sound kind of complicated to
>> me.  But I may be misinterpreting it.  What I picture is something like a
>> MetricSpec that looks like (note: my picture looks a lot like Stackdriver
>> :):
>>
>> {
>> name: "my_timer"
>>
>
> name: "beam:metric:user:my_namespace:my_timer" (assuming we want to keep
> requiring namespaces). Or "beam:metric:[some non-user designation]"
>

Sure. Looks good.


>
> labels: { "ptransform" }
>>
>
> How does an SDK act on this information?
>

The SDK is obligated to submit any metric values for that spec with a
"ptransform" -> "transformName" in the labels field.  Autogenerating code
from the spec to avoid typos should be easy.


>
>
>> type: GAUGE
>> value_type: int64
>>
>
> I was lumping type and value_type into the same field, as a urn for
> possibly extensibility, as they're tightly coupled (e.g. quantiles,
> distributions).
>

My inclination is that keeping this set relatively small and fixed to a set
that can be readily exported to external monitoring systems is more useful
than the added indirection to support extensibility.  Lumping together
seems reasonable.


>
>
>> units: SECONDS
>> description: "Times my stuff"
>>
>
> Are both of these optional metadata, in the form of key-value field, for
> flattened into the field itself (along with every other kind of metadata
> you may want to attach)?
>

Optional metadata in the form of fixed fields.  Is there a use case for
arbitrary metadata?  What would you do with it when exporting?


>
>
>> }
>>
>> Then metrics submitted would look like:
>> {
>> name: "my_timer"
>> labels: {"ptransform": "MyTransform"}
>> int_value: 100
>> }
>>
>
> Yes, or value could be a bytes field that is encoded according to
> [value_]type above, if we want that extensibility (e.g. if we want to
> bundle the pardo sub-timings together, we'd need a proto for the value, but
> that seems to specific to hard code into the basic structure).
>
>
The simplicity coming from the fact that there's only one proto format for
>> the spec and for the value.  The only thing that varies are the entries in
>> the map and the value field set.  It's pretty easy to establish contracts
>> around this type of spec and even generate protos for use the in SDK that
>> make the expectations explicit.
>>
>>
>> On Fri, Apr 13, 2018 at 2:23 PM Robert Bradshaw <[email protected]>
>> wrote:
>>
>>> On Fri, Apr 13, 2018 at 1:32 PM Kenneth Knowles <[email protected]> wrote:
>>>
>>>>
>>>> Or just "beam:counter:<namespace>:<name>" or even
>>>> "beam:metric:<namespace>:<name>" since metrics have a type separate from
>>>> their name.
>>>>
>>>
>>> I proposed keeping the "user" in there to avoid possible clashes with
>>> the system namespaces. (No preference on counter vs. metric, I wasn't
>>> trying to imply counter = SumInts)
>>>
>>>
>>> On Fri, Apr 13, 2018 at 2:02 PM Andrea Foegler <[email protected]>
>>> wrote:
>>>
>>>> I like the generalization from entity -> labels.  I view the purpose of
>>>> those fields to provide context.  And labels feel like they supports a
>>>> richer set of contexts.
>>>>
>>>
>>> If we think such a generalization provides value, I'm fine with doing
>>> that now, as sets or key-value maps, if we have good enough examples to
>>> justify this.
>>>
>>>
>>>> The URN concept gets a little tricky.  I totally agree that the context
>>>> fields should not be embedded in the name.
>>>> There's a "name" which is the identifier that can be used to
>>>> communicate what context values are supported / allowed for metrics with
>>>> that name (for example, element_count expects a ptransform ID).  But then
>>>> there's the context.  In Stackdriver, this context is a map of key-value
>>>> pairs; the type is considered metadata associated with the name, but not
>>>> communicated with the value.
>>>>
>>>
>>> I'm not quite following you here. If context contains a ptransform id,
>>> then it cannot be associated with a single name.
>>>
>>>
>>>> Could the URN be "beam:namespace:name" and every metric have a map of
>>>> key-value pairs for context?
>>>>
>>>
>>> The URN is the name. Something like
>>> "beam:metric:ptransform_execution_times:v1."
>>>
>>>
>>>> Not sure where this fits in the discussion or if this is handled
>>>> somewhere, but allowing for a metric configuration that's provided
>>>> independently of the value allows for configuring "type", "units", etc in a
>>>> uniform way without having to encode them in the metric name / value.
>>>> Stackdriver expects each metric type has been configured ahead of time with
>>>> these annotations / metadata.  Then values are reported separately.  For
>>>> system metrics, the definitions can be packaged with the SDK.  For user
>>>> metrics, they'd be defined at runtime.
>>>>
>>>
>>> This feels like the metrics spec, that specifies that the metric with
>>> name/URN X has this type plus a bunch of other metadata (e.g. units, if
>>> they're not implicit in the type? This gets into whether the type should be
>>> Duration{Sum,Max,Distribution,...} vs. Int{Sum,Max,Distribution,...} +
>>> units metadata).
>>>
>>

Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics

Reply via email to