Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics

Robert Bradshaw Fri, 13 Apr 2018 17:00:49 -0700

On Fri, Apr 13, 2018 at 3:28 PM Andrea Foegler <[email protected]> wrote:


> Thanks, Robert!
>
> I think my lack of clarity is around the MetricSpec.  Maybe what's in my
> head and what's being proposed are the same thing.  When I read that the
> MetricSpec describes the proto structure, that sounds kind of complicated to
> me.  But I may be misinterpreting it.  What I picture is something like a
> MetricSpec that looks like (note: my picture looks a lot like Stackdriver
> :):
>
> {
> name: "my_timer"
>

name: "beam:metric:user:my_namespace:my_timer" (assuming we want to keep
requiring namespaces). Or "beam:metric:[some non-user designation]"
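
To make the naming suggestion concrete, here is a minimal sketch of building and parsing such a user-metric URN. The helper names and the rule that namespaces and names may not contain ":" are assumptions for illustration only; nothing in this thread settles them.

```python
# Hypothetical helpers; the "beam:metric:user:" prefix comes from the
# suggestion above, everything else is an illustrative assumption.
USER_PREFIX = "beam:metric:user:"


def user_metric_urn(namespace, name):
    """Build "beam:metric:user:<namespace>:<name>"."""
    for part in (namespace, name):
        # Assumption: ":" is reserved as the URN separator.
        if not part or ":" in part:
            raise ValueError("invalid URN component: %r" % (part,))
    return USER_PREFIX + namespace + ":" + name


def parse_user_metric_urn(urn):
    """Return (namespace, name) for a user metric URN, else None."""
    if not urn.startswith(USER_PREFIX):
        return None  # a system (non-user) metric, or not a metric URN
    parts = urn[len(USER_PREFIX):].split(":")
    if len(parts) != 2:
        return None
    return parts[0], parts[1]
```

Keeping "user" in the prefix means a user-chosen namespace can never collide with a system URN such as "beam:metric:ptransform_execution_times:v1".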

labels: { "ptransform" }
>

How does an SDK act on this information?


> type: GAUGE
> value_type: int64
>

I was lumping type and value_type into the same field, as a URN for
possible extensibility, as they're tightly coupled (e.g. quantiles,
distributions).


> units: SECONDS
> description: "Times my stuff"
>

Are both of these optional metadata, in the form of a key-value field, or
flattened into the field itself (along with every other kind of metadata
you may want to attach)?


> }
>
> Then metrics submitted would look like:
> {
> name: "my_timer"
> labels: {"ptransform": "MyTransform"}
> int_value: 100
> }
>

Yes, or value could be a bytes field that is encoded according to
[value_]type above, if we want that extensibility (e.g. if we want to
bundle the pardo sub-timings together, we'd need a proto for the value, but
that seems too specific to hard-code into the basic structure).
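
A rough sketch of the spec/value split discussed here, with the value carried as bytes and interpreted through a single type URN. The class names, the type URN string, and the big-endian int64 encoding are all hypothetical stand-ins rather than part of any agreed proto.

```python
import struct
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MetricSpec:
    name: str                 # the metric URN
    type_urn: str             # type and value_type lumped into one URN
    label_keys: tuple = ()    # e.g. ("ptransform",)
    metadata: dict = field(default_factory=dict)  # units, description, ...


@dataclass(frozen=True)
class MetricUpdate:
    name: str
    labels: dict
    value: bytes              # encoded according to the spec's type_urn


# Hypothetical registry: type URN -> (encode, decode).
CODERS = {
    "beam:metric_type:gauge_int64:v1": (
        lambda v: struct.pack(">q", v),
        lambda b: struct.unpack(">q", b)[0],
    ),
}


def report(spec, labels, value):
    """Encode a value for the given spec, checking the label contract."""
    if set(labels) != set(spec.label_keys):
        raise ValueError("labels do not match spec %s" % spec.name)
    encode, _ = CODERS[spec.type_urn]
    return MetricUpdate(spec.name, dict(labels), encode(value))


def read(spec, update):
    """Decode an update's bytes using the type declared in the spec."""
    _, decode = CODERS[spec.type_urn]
    return decode(update.value)
```

With this shape, a more structured value (say, bundled pardo sub-timings) would only need a new entry in the coder registry, not a change to the basic message structure.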


> The simplicity coming from the fact that there's only one proto format for
> the spec and for the value.  The only thing that varies are the entries in
> the map and the value field set.  It's pretty easy to establish contracts
> around this type of spec and even generate protos for use in the SDK that
> make the expectations explicit.
>
>
> On Fri, Apr 13, 2018 at 2:23 PM Robert Bradshaw <[email protected]>
> wrote:
>
>> On Fri, Apr 13, 2018 at 1:32 PM Kenneth Knowles <[email protected]> wrote:
>>
>>>
>>> Or just "beam:counter:<namespace>:<name>" or even
>>> "beam:metric:<namespace>:<name>" since metrics have a type separate from
>>> their name.
>>>
>>
>> I proposed keeping the "user" in there to avoid possible clashes with the
>> system namespaces. (No preference on counter vs. metric, I wasn't trying to
>> imply counter = SumInts)
>>
>>
>> On Fri, Apr 13, 2018 at 2:02 PM Andrea Foegler <[email protected]>
>> wrote:
>>
>>> I like the generalization from entity -> labels.  I view the purpose of
>>> those fields to provide context.  And labels feel like they support a
>>> richer set of contexts.
>>>
>>
>> If we think such a generalization provides value, I'm fine with doing
>> that now, as sets or key-value maps, if we have good enough examples to
>> justify this.
>>
>>
>>> The URN concept gets a little tricky.  I totally agree that the context
>>> fields should not be embedded in the name.
>>> There's a "name" which is the identifier that can be used to communicate
>>> what context values are supported / allowed for metrics with that name (for
>>> example, element_count expects a ptransform ID).  But then there's the
>>> context.  In Stackdriver, this context is a map of key-value pairs; the
>>> type is considered metadata associated with the name, but not communicated
>>> with the value.
>>>
>>
>> I'm not quite following you here. If context contains a ptransform id,
>> then it cannot be associated with a single name.
>>
>>
>>> Could the URN be "beam:namespace:name" and every metric have a map of
>>> key-value pairs for context?
>>>
>>
>> The URN is the name. Something like
>> "beam:metric:ptransform_execution_times:v1".
>>
>>
>>> Not sure where this fits in the discussion or if this is handled
>>> somewhere, but allowing for a metric configuration that's provided
>>> independently of the value allows for configuring "type", "units", etc in a
>>> uniform way without having to encode them in the metric name / value.
>>> Stackdriver expects each metric type has been configured ahead of time with
>>> these annotations / metadata.  Then values are reported separately.  For
>>> system metrics, the definitions can be packaged with the SDK.  For user
>>> metrics, they'd be defined at runtime.
>>>
>>
>> This feels like the metrics spec, which specifies that the metric with
>> name/URN X has this type plus a bunch of other metadata (e.g. units, if
>> they're not implicit in the type). This gets into whether the type should be
>> Duration{Sum,Max,Distribution,...} vs. Int{Sum,Max,Distribution,...} +
>> units metadata.
>>
>
