Hi Robert and community, :) I was starting to code this up, but I wasn't sure exactly how to do some of the proto syntax. Would you mind taking a look at this doc <https://docs.google.com/document/d/1SB59MMVZXO0Aa6w0gf4m0qM4oYt4SiofDq3QxnpQaK4/edit?usp=sharing> and let me know if you know how to resolve any of these issues:
- Using a map in an option. - Referring to string "constants" from other enum options in .proto files. Please see the comments I have listed in the doc <https://docs.google.com/document/d/1SB59MMVZXO0Aa6w0gf4m0qM4oYt4SiofDq3QxnpQaK4/edit?usp=sharing>, and a few alternative suggestions. Thanks On Wed, Oct 24, 2018 at 10:08 AM Alex Amato <[email protected]> wrote: > Okay. That makes sense. Using runtime validation and protos is what I was > thinking as well. > I'll include you as a reviewer in my PRs. > > As for the choice of a builder/constructor/factory. If we go with factory > methods/constructor then we will need a method for each metric type > (intCounter, latestInt64, ...). Plus, then I don't think we can have any > constructor parameters for labels, we will just need to accept a dictionary > for label keys+values like you say. This is because we cannot offer a > method for each URN and its combination of labels, and we should avoid such > static detection, as you say. > > The builder however, allows us to add a method for setting each label. > Since there are a small number of labels I think this should be fine. A > specific metric URN will have a specific set of labels which we can > validate being set. Any variant of this should use a different label (or a > new version in the label). Thus, the builder can give an advantage, making > it easier to set label values without needing to provide the correct key > for the label, when its set. You just need to call the correct method. > > Perhaps it might be best to leave this open to each SDK based on its > language style (Builder, Factory, etc.) , but make sure we use the > protos+runtime validation. > > On Wed, Oct 24, 2018 at 7:02 AM Robert Bradshaw <[email protected]> > wrote: > >> Thanks for bringing this to the list; it's a good question. >> >> I think the difficulty comes from trying to statically define a lists >> of possibilities that should instead be runtime values. E.g. we >> currently we're up to about a dozen distinct types, and having a >> setter for each is both verbose and not very extensible (especially >> replicated cross language). I'm not sure the set of possible labels is >> fixed either. Generally in the FnAPI we've been using shared constants >> for this kind of thing instead. (I was wary about the protos for the >> same reasons; it would be good to avoid leaking this through to each >> of the various SDKs.) The amount of static typing/validation one gets >> depends on how much logic you build into each of these methods (e.g. >> does it (almost?) always "metric" which is assumed to already be >> encoded correctly, or a specific type that has tradeoffs with the >> amount you can do generically (e.g. we have code that first loops over >> counters, then over distributions, then over gauges, and I don't think >> we want continue that pattern all M places for all N types)). >> >> I would probably lean towards mostly doing runtime validation here. >> Specifically, one would have a data file that defines, for each known >> URN, its type along with the set of permitted/expected/required >> labels. On construction a MonitoringInfo data point, one would provide >> a URN + value + labelMap, and optionally a type. On construction, the >> constructor (method, factory, whatever) would look up the URN to >> determine the type (or throw an error if it's both not known and not >> provided), which could be then used to fetch an encoder of sorts (that >> can go from value <-> proto encoding, possibly with some validation). >> If labels and/or annotations are provided and the URN is known, we can >> validate these as well. >> >> As for proto/enums vs. yaml, the former is nice because code >> generation comes for free, but has turned out to be much more verbose >> (both the definition and the use) and I'm still on the fence whether >> it's a net win. >> >> (Unfortunately AutoValue won't help much here, as the ultimate goal is >> to wrap a very nested proto OneOf, ideally with some validation. >> (Also, in Python, generally, having optional, named arguments makes >> this kind of builder pattern less needed.)) >> >> On Wed, Oct 24, 2018 at 4:12 AM Kenneth Knowles <[email protected]> wrote: >> > >> > FWIW AutoValue will build most of that class for you, if it is as you >> say. >> > >> > Kenn >> > >> > On Tue, Oct 23, 2018 at 6:04 PM Alex Amato <[email protected]> wrote: >> >> >> >> Hi Robert + beam dev list, >> >> >> >> I was thinking about your feedback in PR#6205, and agree that this >> monitoring_infos.py became a bit big. >> >> >> >> I'm working on the Java Implementation of this now, and want to >> incorporate some of these ideas and improve on this. >> >> >> >> I that that I should make something like a MonitoringInfoBuilder >> class. With a few methods >> >> >> >> setUrn >> >> setTimestamp >> >> setting the value (One method for each Type we support. Setting this >> will also set the type string) >> >> >> >> setInt64CounterValue >> >> setDoubleCounterValue >> >> setLatestInt64 >> >> setTopNInt64 >> >> setMonitoringDataTable >> >> setDistributionInt64 >> >> ... >> >> >> >> setting labels (will set the key and value properly for the label) >> >> >> >> setPTransform(value) >> >> setPcollection(value) >> >> ... >> >> >> >> >> >> I think this will make building a metric much easier, you would just >> call the 4 methods and the .build(). These builders are common in Java. (I >> guess there is a similar thing way we could do in python? I'd like to go >> back and refactor that as well when I am done) >> >> >> >> ------------- >> >> >> >> As for your other suggestion to define metrics in the proto/enum file >> instead of the yaml file. I am not too sure about the best strategy for >> this. My initial thoughts are: >> >> >> >> Make a proto extension allowing you to describe/define a >> MonitoringInfo's (the same info as the metric_definitions.yaml file): >> >> >> >> URN >> >> Type >> >> Labels required >> >> Annotations: Description, Units, etc. >> >> >> >> Make the builder read in that MonitoringInfo definision/description >> assert everything is set properly? I think this would be a decent data >> driven approach. >> >> >> >> I was wondering if you had something else in mind? >> >> >> >> Thanks >> >> Alex >> >> >> >> >> >
