Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics

Kenneth Knowles Fri, 13 Apr 2018 13:32:50 -0700

On Fri, Apr 13, 2018 at 1:27 PM Robert Bradshaw <[email protected]> wrote:


> On Fri, Apr 13, 2018 at 1:19 PM Kenneth Knowles <[email protected]> wrote:
>
>> On Fri, Apr 13, 2018 at 1:07 PM Robert Bradshaw <[email protected]>
>> wrote:
>>
>>> Also, the only use for payloads is because "User Counter" is currently a
>>> single URN, rather than using the namespacing characteristics of URNs to
>>> map user names onto distinct metric names.
>>>
>>
>> Can they be URNs? I don't see value in having a "user metric" URN where
>> you then have to look elsewhere for what the real name is.
>>
>
> Yes, that was my point with the parenthetical statement. I would rather
> have "beam:counter:user:use_provide_namespace:user_provide_name" than use
> the payload field for this. So if we're going to keep the payload field, we
> need more compelling usecases.
>

Or just "beam:counter:<namespace>:<name>" or even
"beam:metric:<namespace>:<name>" since metrics have a type separate from
their name.

Kenn


> A payload avoids the messiness of having to pack (and parse) arbitrary
>> parameters into a name though.) If we're going to choose names that the
>> system and sdks agree to have specific meanings, and to avoid accidental
>> collisions, making them full-fledged documented URNs has value.
>>
>
>> Value is the "payload". Likely worth changing the name to avoid confusion
>> with the payload above. It's bytes because it depends on the type. I would
>> try to avoid nesting it too deeply (e.g. a payload within a payload). If we
>> thing the types are generally limited, another option would be a oneof
>> field (with a bytes option just in case) for transparency. There are pros
>> and cons going this route.
>>
>> Type is what I proposed we add, instead of it being implicit in the name
>> (and unknowable if one does not recognize the name). This makes things more
>> open-ended and easier to evolve and work with.
>>
>> Entity could be generalized to Label, or LabelSet if desired. But as
>> mentioned I think it makes sense to pull this out as a separate field,
>> especially when it makes sense to aggregate a single named counter across
>> labels as well as for a single label (e.g. execution time of composite
>> transforms).
>>
>> - Robert
>>
>>
>>
>> On Fri, Apr 13, 2018 at 12:36 PM Andrea Foegler <[email protected]>
>> wrote:
>>
>>> Hi folks -
>>>
>>> Before we totally go down the path of highly structured metric protos,
>>> I'd like to propose considering a simple metrics interface between the SDK
>>> and the runner.  Something more generic and closer to what most monitoring
>>> systems would use.
>>>
>>> To use Spark as an example, the Metric system uses a simple metric
>>> format of name, value and type to report all metrics in a single structure,
>>> regardless of the source or context of the metric.
>>>
>>> https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-MetricsSystem.html
>>>
>>> The subsystems have contracts for what metrics they will expose and how
>>> they are calculated:
>>>
>>> https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-taskmetrics-ShuffleWriteMetrics.html
>>>
>>> Codifying the system metrics in the SDK seems perfectly reasonable - no
>>> reason to make the notion of metric generic at that level.  But at the
>>> point the metric is leaving the SDK and going to the runner, a simpler,
>>> generic encoding of the metrics might make it easier to adapt and maintain
>>> system.  The generic format can include information about downstream
>>> consumers, if that's useful.
>>>
>>> Spark supports a number of Metric Sinks - external monitoring systems.
>>> If runners receive a simple list of metrics, implementing any number of
>>> Sinks for Beam would be straightforward and would generally be a one time
>>> implementation.  If instead all system metrics are sent embedded in a
>>> highly structured, semantically meaningful structure, runner code would
>>> need to be updated to support exporting the new metric. We seem to be
>>> heading in the direction of "if you don't understand this metric, you can't
>>> use it / export it".  But most systems seem to assume metrics are really
>>> simple named values that can be handled a priori.
>>>
>>> So I guess my primary question is:  Is it necessary for Beam to treat
>>> metrics as highly semantic, arbitrarily complex data?  Or could they
>>> possibly be the sort of simple named values as they are in most monitoring
>>> systems and in Spark?  With the SDK potentially providing scaffolding to
>>> add meaning and structure, but simplifying that out before leaving SDK
>>> code.  Is the coupling to a semantically meaningful structure between the
>>> SDK and runner and necessary complexity?
>>>
>>> Andrea
>>>
>>>
>>>
>>> On Fri, Apr 13, 2018 at 10:20 AM Robert Bradshaw <[email protected]>
>>> wrote:
>>>
>>>> On Fri, Apr 13, 2018 at 10:10 AM Alex Amato <[email protected]> wrote:
>>>>
>>>>>
>>>>> *Thank you for this clarification. I think the table of files fits
>>>>> into the model as one of type string-set (with union as aggregation). *
>>>>> Its not a list of files, its a list of metadata for each file, several
>>>>> pieces of data per file.
>>>>>
>>>>> Are you proposing that there would be separate URNs as well for each
>>>>> entity being measured then, so the the URN defines the type of entity 
>>>>> being
>>>>> measured.
>>>>> "urn.beam.metrics.PCollectionByteCount" is a URN for always for
>>>>> PCollection entities
>>>>> "urn.beam.metrics.PTransformExecutionTime" is a URN is always for
>>>>> PTransform entities
>>>>>
>>>>
>>>> Yes. FWIW, it may not even be needed to put this in the name, e.g.
>>>> execution times are never for PCollections, and even if they were it'd be
>>>> semantically a very different beast (which should not re-use the same URN).
>>>>
>>>> *message MetricSpec {*
>>>>> *  // (Required) A URN that describes the accompanying payload.*
>>>>> *  // For any URN that is not recognized (by whomever is inspecting*
>>>>> *  // it) the parameter payload should be treated as opaque and*
>>>>> *  // passed as-is.*
>>>>> *  string urn = 1;*
>>>>>
>>>>> *  // (Optional) The data specifying any parameters to the URN. If*
>>>>> *  // the URN does not require any arguments, this may be omitted.*
>>>>> *  bytes parameters_payload = 2;*
>>>>>
>>>>> *  // (Required) A URN that describes the type of values this metric*
>>>>> *  // records (e.g. durations that should be summed).*
>>>>> *}*
>>>>>
>>>>> *message Metric[Values] {*
>>>>> * // (Required) The original requesting MetricSpec.*
>>>>> * MetricSpec metric_spec = 1;*
>>>>>
>>>>> * // A mapping of entities to (encoded) values.*
>>>>> * map<string, bytes> values;*
>>>>> This ignores the non-unqiueness of entity identifiers. This is why in
>>>>> my doc, I have specified the entity type and its string identifier
>>>>> @Ken, I believe you have pointed this out in the past, that uniqueness
>>>>> is only guaranteed within a type of entity (all PCollections), but not
>>>>> between entities (A Pcollection and PTransform may have the same
>>>>> identifier).
>>>>>
>>>>
>>>> See above for why this is not an issue. The extra complexity (in protos
>>>> and code), the inability to use them as map keys, and the fact that they'll
>>>> be 100% redundant for all entities for a given metric convinces me that
>>>> it's not worth creating and tracking an enum for the type alongside the id.
>>>>
>>>>
>>>>> *}*
>>>>>
>>>>> On Fri, Apr 13, 2018 at 9:14 AM Robert Bradshaw <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> On Fri, Apr 13, 2018 at 8:31 AM Kenneth Knowles <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>>
>>>>>>> To Robert's proto:
>>>>>>>
>>>>>>>  // A mapping of entities to (encoded) values.
>>>>>>>>  map<string, bytes> values;
>>>>>>>>
>>>>>>>
>>>>>>> Are the keys here the names of the metrics, aka what is used for
>>>>>>> URNs in the doc?
>>>>>>>
>>>>>>>>
>>>>>> They're the entities to which a metric is attached, e.g. a
>>>>>> PTransform, a PCollection, or perhaps a process/worker.
>>>>>>
>>>>>>
>>>>>>> }
>>>>>>>>
>>>>>>>> On Thu, Apr 12, 2018 at 9:25 AM Kenneth Knowles <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Agree with all of this. It echoes a thread on the doc that I was
>>>>>>>>>> going to bring here. Let's keep it simple and use concrete use cases 
>>>>>>>>>> to
>>>>>>>>>> drive additional abstraction if/when it becomes compelling.
>>>>>>>>>>
>>>>>>>>>> Kenn
>>>>>>>>>>
>>>>>>>>>> On Thu, Apr 12, 2018 at 9:21 AM Ben Chambers <
>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Sounds perfect. Just wanted to make sure that "custom metrics of
>>>>>>>>>>> supported type" didn't include new ways of aggregating ints. As 
>>>>>>>>>>> long as
>>>>>>>>>>> that means we have a fixed set of aggregations (that align with 
>>>>>>>>>>> what what
>>>>>>>>>>> users want and metrics back end support) it seems like we are doing 
>>>>>>>>>>> user
>>>>>>>>>>> metrics right.
>>>>>>>>>>>
>>>>>>>>>>> - Ben
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Apr 11, 2018, 11:30 PM Romain Manni-Bucau <
>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Maybe leave it out until proven it is needed. ATM counters are
>>>>>>>>>>>> used a lot but others are less mainstream so being too fine from 
>>>>>>>>>>>> the start
>>>>>>>>>>>> can just add complexity and bugs in impls IMHO.
>>>>>>>>>>>>
>>>>>>>>>>>> Le 12 avr. 2018 08:06, "Robert Bradshaw" <[email protected]>
>>>>>>>>>>>> a écrit :
>>>>>>>>>>>>
>>>>>>>>>>>>> By "type" of metric, I mean both the data types (including
>>>>>>>>>>>>> their encoding) and accumulator strategy. So sumint would be a 
>>>>>>>>>>>>> type, as
>>>>>>>>>>>>> would double-distribution.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Apr 11, 2018 at 10:39 PM Ben Chambers <
>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> When you say type do you mean accumulator type, result type,
>>>>>>>>>>>>>> or accumulator strategy? Specifically, what is the "type" of 
>>>>>>>>>>>>>> sumint,
>>>>>>>>>>>>>> sumlong, meanlong, etc?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wed, Apr 11, 2018, 9:38 PM Robert Bradshaw <
>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Fully custom metric types is the "more speculative and
>>>>>>>>>>>>>>> difficult" feature that I was proposing we kick down the road 
>>>>>>>>>>>>>>> (and may
>>>>>>>>>>>>>>> never get to). What I'm suggesting is that we support custom 
>>>>>>>>>>>>>>> metrics of
>>>>>>>>>>>>>>> standard type.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Wed, Apr 11, 2018 at 5:52 PM Ben Chambers <
>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The metric api is designed to prevent user defined metric
>>>>>>>>>>>>>>>> types based on the fact they just weren't used enough to 
>>>>>>>>>>>>>>>> justify support.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Is there a reason we are bringing that complexity back?
>>>>>>>>>>>>>>>> Shouldn't we just need the ability for the standard set plus 
>>>>>>>>>>>>>>>> any special
>>>>>>>>>>>>>>>> system metrivs?
>>>>>>>>>>>>>>>> On Wed, Apr 11, 2018, 5:43 PM Robert Bradshaw <
>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks. I think this has simplified things.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> One thing that has occurred to me is that we're conflating
>>>>>>>>>>>>>>>>> the idea of custom metrics and custom metric types. I would 
>>>>>>>>>>>>>>>>> propose
>>>>>>>>>>>>>>>>> the MetricSpec field be augmented with an additional field 
>>>>>>>>>>>>>>>>> "type" which is
>>>>>>>>>>>>>>>>> a urn specifying the type of metric it is (i.e. the contents 
>>>>>>>>>>>>>>>>> of its
>>>>>>>>>>>>>>>>> payload, as well as the form of aggregation). Summing or 
>>>>>>>>>>>>>>>>> maxing over ints
>>>>>>>>>>>>>>>>> would be a typical example. Though we could pursue making 
>>>>>>>>>>>>>>>>> this opaque to
>>>>>>>>>>>>>>>>> the runner in the long run, that's a more speculative (and 
>>>>>>>>>>>>>>>>> difficult)
>>>>>>>>>>>>>>>>> feature to tackle. This would allow the runner to at least 
>>>>>>>>>>>>>>>>> aggregate and
>>>>>>>>>>>>>>>>> report/return to the SDK metrics that it did not itself 
>>>>>>>>>>>>>>>>> understand the
>>>>>>>>>>>>>>>>> semantic meaning of. (It would probably simplify much of the 
>>>>>>>>>>>>>>>>> specialization
>>>>>>>>>>>>>>>>> in the runner itself for metrics that it *did* understand as 
>>>>>>>>>>>>>>>>> well.)
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> In addition, rather than having UserMetricOfTypeX for
>>>>>>>>>>>>>>>>> every type X one would have a single URN for UserMetric and 
>>>>>>>>>>>>>>>>> it spec would
>>>>>>>>>>>>>>>>> designate the type and payload designate the (qualified) name.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> - Robert
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Wed, Apr 11, 2018 at 5:12 PM Alex Amato <
>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thank you everyone for your feedback so far.
>>>>>>>>>>>>>>>>>> I have made a revision today which is to make all metrics
>>>>>>>>>>>>>>>>>> refer to a primary entity, so I have restructured some of 
>>>>>>>>>>>>>>>>>> the protos a
>>>>>>>>>>>>>>>>>> little bit.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> The point of this change was to futureproof the
>>>>>>>>>>>>>>>>>> possibility of allowing custom user metrics, with custom 
>>>>>>>>>>>>>>>>>> aggregation
>>>>>>>>>>>>>>>>>> functions for its metric updates.
>>>>>>>>>>>>>>>>>> Now that each metric has an aggregation_entity associated
>>>>>>>>>>>>>>>>>> with it (e.g. PCollection, PTransform), we can design an 
>>>>>>>>>>>>>>>>>> approach which
>>>>>>>>>>>>>>>>>> forwards the opaque bytes metric updates, without 
>>>>>>>>>>>>>>>>>> deserializing them. These
>>>>>>>>>>>>>>>>>> are forwarded to user provided code which then would 
>>>>>>>>>>>>>>>>>> deserialize the metric
>>>>>>>>>>>>>>>>>> update payloads and perform the custom aggregations.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I think it has also simplified some of the URN metric
>>>>>>>>>>>>>>>>>> protos, as they do not need to keep track of ptransform 
>>>>>>>>>>>>>>>>>> names inside
>>>>>>>>>>>>>>>>>> themselves now. The result is simpler structures, for the 
>>>>>>>>>>>>>>>>>> metrics as the
>>>>>>>>>>>>>>>>>> entities are pulled outside of the metric.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I have mentioned this in the doc now, and wanted to draw
>>>>>>>>>>>>>>>>>> attention to this particular revision.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Tue, Apr 10, 2018 at 9:53 AM Alex Amato <
>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I've gathered a lot of feedback so far and want to make
>>>>>>>>>>>>>>>>>>> a decision by Friday, and begin working on related PRs next 
>>>>>>>>>>>>>>>>>>> week.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Please make sure that you provide your feedback before
>>>>>>>>>>>>>>>>>>> then and I will post the final decisions made to this 
>>>>>>>>>>>>>>>>>>> thread Friday
>>>>>>>>>>>>>>>>>>> afternoon.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Thu, Apr 5, 2018 at 1:38 AM Ismaël Mejía <
>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Nice, I created a short link so people can refer to it
>>>>>>>>>>>>>>>>>>>> easily in
>>>>>>>>>>>>>>>>>>>> future discussions, website, etc.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> https://s.apache.org/beam-fn-api-metrics
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Thanks for sharing.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Wed, Apr 4, 2018 at 11:28 PM, Robert Bradshaw <
>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>> > Thanks for the nice writeup. I added some comments.
>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>> > On Wed, Apr 4, 2018 at 1:53 PM Alex Amato <
>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>>> >> Hello beam community,
>>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>>> >> Thank you everyone for your initial feedback on this
>>>>>>>>>>>>>>>>>>>> proposal so far. I
>>>>>>>>>>>>>>>>>>>> >> have made some revisions based on the feedback.
>>>>>>>>>>>>>>>>>>>> There were some larger
>>>>>>>>>>>>>>>>>>>> >> questions asking about alternatives. For each of
>>>>>>>>>>>>>>>>>>>> these I have added a
>>>>>>>>>>>>>>>>>>>> >> section tagged with [Alternatives] and discussed my
>>>>>>>>>>>>>>>>>>>> recommendation as well
>>>>>>>>>>>>>>>>>>>> >> as as few other choices we considered.
>>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>>> >> I would appreciate more feedback on the revised
>>>>>>>>>>>>>>>>>>>> proposal. Please take
>>>>>>>>>>>>>>>>>>>> >> another look and let me know
>>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>>> https://docs.google.com/document/d/1MtBZYV7NAcfbwyy9Op8STeFNBxtljxgy69FkHMvhTMA/edit
>>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>>> >> Etienne, I would appreciate it if you could please
>>>>>>>>>>>>>>>>>>>> take another look after
>>>>>>>>>>>>>>>>>>>> >> the revisions I have made as well.
>>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>>> >> Thanks again,
>>>>>>>>>>>>>>>>>>>> >> Alex
>>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>

Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics

Reply via email to