Okay. That makes sense. Using runtime validation and protos is what I was
thinking as well.
I'll include you as a reviewer in my PRs.

As for the choice of a builder/constructor/factory. If we go with factory
methods/constructor then we will need a method for each metric type
(intCounter, latestInt64, ...). Plus, then I don't think we can have any
constructor parameters for labels, we will just need to accept a dictionary
for label keys+values like you say. This is because we cannot offer a
method for each URN and its combination of labels, and we should avoid such
static detection, as you say.

The builder however, allows us to add a method for setting each label.
Since there are a small number of labels I think this should be fine. A
specific metric URN will have a specific set of labels which we can
validate being set. Any variant of this should use a different label (or a
new version in the label). Thus, the builder can give an advantage, making
it easier to set label values without needing to provide the correct key
for the label, when its set. You just need to call the correct method.

Perhaps it might be best to leave this open to each SDK based on its
language style (Builder, Factory, etc.) , but make sure we use the
protos+runtime validation.

On Wed, Oct 24, 2018 at 7:02 AM Robert Bradshaw <[email protected]> wrote:

> Thanks for bringing this to the list; it's a good question.
>
> I think the difficulty comes from trying to statically define a lists
> of possibilities that should instead be runtime values. E.g. we
> currently we're up to about a dozen distinct types, and having a
> setter for each is both verbose and not very extensible (especially
> replicated cross language). I'm not sure the set of possible labels is
> fixed either. Generally in the FnAPI we've been using shared constants
> for this kind of thing instead. (I was wary about the protos for the
> same reasons; it would be good to avoid leaking this through to each
> of the various SDKs.) The amount of static typing/validation one gets
> depends on how much logic you build into each of these methods (e.g.
> does it (almost?) always "metric" which is assumed to already be
> encoded correctly, or a specific type that has tradeoffs with the
> amount you can do generically (e.g. we have code that first loops over
> counters, then over distributions, then over gauges, and I don't think
> we want continue that pattern all M places for all N types)).
>
> I would probably lean towards mostly doing runtime validation here.
> Specifically, one would have a data file that defines, for each known
> URN, its type along with the set of permitted/expected/required
> labels. On construction a MonitoringInfo data point, one would provide
> a URN + value + labelMap, and optionally a type. On construction, the
> constructor (method, factory, whatever) would look up the URN to
> determine the type (or throw an error if it's both not known and not
> provided), which could be then used to fetch an encoder of sorts (that
> can go from value <-> proto encoding, possibly with some validation).
> If labels and/or annotations are provided and the URN is known, we can
> validate these as well.
>
> As for proto/enums vs. yaml, the former is nice because code
> generation comes for free, but has turned out to be much more verbose
> (both the definition and the use) and I'm still on the fence whether
> it's a net win.
>
> (Unfortunately AutoValue won't help much here, as the ultimate goal is
> to wrap a very nested proto OneOf, ideally with some validation.
> (Also, in Python, generally, having optional, named arguments makes
> this kind of builder pattern less needed.))
>
> On Wed, Oct 24, 2018 at 4:12 AM Kenneth Knowles <[email protected]> wrote:
> >
> > FWIW AutoValue will build most of that class for you, if it is as you
> say.
> >
> > Kenn
> >
> > On Tue, Oct 23, 2018 at 6:04 PM Alex Amato <[email protected]> wrote:
> >>
> >> Hi Robert + beam dev list,
> >>
> >> I was thinking about your feedback in PR#6205, and agree that this
> monitoring_infos.py became a bit big.
> >>
> >> I'm working on the Java Implementation of this now, and want to
> incorporate some of these ideas and improve on this.
> >>
> >> I that that I should make something like a MonitoringInfoBuilder class.
> With a few methods
> >>
> >> setUrn
> >> setTimestamp
> >> setting the value (One method for each Type we support. Setting this
> will also set the type string)
> >>
> >> setInt64CounterValue
> >> setDoubleCounterValue
> >> setLatestInt64
> >> setTopNInt64
> >> setMonitoringDataTable
> >> setDistributionInt64
> >> ...
> >>
> >> setting labels (will set the key and value properly for the label)
> >>
> >> setPTransform(value)
> >> setPcollection(value)
> >> ...
> >>
> >>
> >> I think this will make building a metric much easier, you would just
> call the 4 methods and the .build(). These builders are common in Java. (I
> guess there is a similar thing way we could do in python? I'd like to go
> back and refactor that as well when I am done)
> >>
> >> -------------
> >>
> >> As for your other suggestion to define metrics in the proto/enum file
> instead of the yaml file. I am not too sure about the best strategy for
> this. My initial thoughts are:
> >>
> >> Make a proto extension allowing you to describe/define a
> MonitoringInfo's (the same info as the metric_definitions.yaml file):
> >>
> >> URN
> >> Type
> >> Labels required
> >> Annotations: Description, Units, etc.
> >>
> >> Make the builder read in that MonitoringInfo definision/description
> assert everything is set properly? I think this would be a decent data
> driven approach.
> >>
> >> I was wondering if you had something else in mind?
> >>
> >> Thanks
> >> Alex
> >>
> >>
>

Reply via email to