I see and don't know how to help you beyond what your already
suggesting. From what I remember, maps were added as syntactic sugar of
lists of key value pairs.
On Tue, Oct 30, 2018 at 5:37 PM Alex Amato <[email protected]
<mailto:[email protected]>> wrote:
I am not sure on the correct syntax to populate the instances of my
MonitoringInfoSpec messages
message MonitoringInfoSpec {
string urn = 1;
string type_urn = 2;
repeated string required_labels = 3;
*map<string, string> annotations = 4;*
}
Notice how the annotations field is not used anywhere. I was unable
to get this to compile and could find no examples of this on the
proto github. Perhaps I'll have to reach out to them. I was
wondering if anyone here was familiar first.
message MonitoringInfoSpecs {
enum MonitoringInfoSpecsEnum {
USER_COUNTER = 0 [(monitoring_info_spec) = {
urn: "beam:metric:user",
type_urn: "beam:metrics:sum_int_64",
}];
ELEMENT_COUNT = 1 [(monitoring_info_spec) = {
urn: "beam:metric:element_count:v1",
type_urn: "beam:metrics:sum_int_64",
required_labels: ["PTRANSFORM"],
}];
START_BUNDLE_MSECS = 2 [(monitoring_info_spec) = {
urn: "beam:metric:pardo_execution_time:start_bundle_msecs:v1",
type_urn: "beam:metrics:sum_int_64",
required_labels: ["PTRANSFORM"],
}];
PROCESS_BUNDLE_MSECS = 3 [(monitoring_info_spec) = {
urn: "beam:metric:pardo_execution_time:process_bundle_msecs:v1",
type_urn: "beam:metrics:sum_int_64",
required_labels: ["PTRANSFORM"],
}];
FINISH_BUNDLE_MSECS = 4 [(monitoring_info_spec) = {
urn: "beam:metric:pardo_execution_time:finish_bundle_msecs:v1",
type_urn: "beam:metrics:sum_int_64",
required_labels: ["PTRANSFORM"],
}];
TOTAL_MSECS = 5 [(monitoring_info_spec) = {
urn: "beam:metric:ptransform_execution_time:total_msecs:v1",
type_urn: "beam:metrics:sum_int_64",
required_labels: ["PTRANSFORM"],
}];
}
}
On Tue, Oct 30, 2018 at 2:01 PM Lukasz Cwik <[email protected]
<mailto:[email protected]>> wrote:
I'm not sure what you mean by "Using a map in an option."
For your second issue, the google docs around this show[1]:
Note that if you want to use a custom option in a package other
than the one in which it was defined, you must prefix the option
name with the package name, just as you would for type names.
For example:
// foo.proto
import "google/protobuf/descriptor.proto";
package foo;
extend google.protobuf.MessageOptions {
optional string my_option = 51234;
}
// bar.proto
import "foo.proto";
package bar;
message MyMessage {
option (foo.my_option) = "Hello world!";
}
1:
https://developers.google.com/protocol-buffers/docs/proto#customoptions
On Mon, Oct 29, 2018 at 5:19 PM Alex Amato <[email protected]
<mailto:[email protected]>> wrote:
Hi Robert and community, :)
I was starting to code this up, but I wasn't sure exactly
how to do some of the proto syntax. Would you mind taking a
look at this doc
<https://docs.google.com/document/d/1SB59MMVZXO0Aa6w0gf4m0qM4oYt4SiofDq3QxnpQaK4/edit?usp=sharing>
and let me know if you know how to resolve any of these issues:
* Using a map in an option.
* Referring to string "constants" from other enum options
in .proto files.
Please see the comments I have listed in the doc
<https://docs.google.com/document/d/1SB59MMVZXO0Aa6w0gf4m0qM4oYt4SiofDq3QxnpQaK4/edit?usp=sharing>,
and a few alternative suggestions.
Thanks
On Wed, Oct 24, 2018 at 10:08 AM Alex Amato
<[email protected] <mailto:[email protected]>> wrote:
Okay. That makes sense. Using runtime validation and
protos is what I was thinking as well.
I'll include you as a reviewer in my PRs.
As for the choice of a builder/constructor/factory. If
we go with factory methods/constructor then we will need
a method for each metric type (intCounter, latestInt64,
...). Plus, then I don't think we can have any
constructor parameters for labels, we will just need to
accept a dictionary for label keys+values like you say.
This is because we cannot offer a method for each URN
and its combination of labels, and we should avoid such
static detection, as you say.
The builder however, allows us to add a method for
setting each label. Since there are a small number of
labels I think this should be fine. A specific metric
URN will have a specific set of labels which we can
validate being set. Any variant of this should use a
different label (or a new version in the label). Thus,
the builder can give an advantage, making it easier to
set label values without needing to provide the correct
key for the label, when its set. You just need to call
the correct method.
Perhaps it might be best to leave this open to each SDK
based on its language style (Builder, Factory, etc.) ,
but make sure we use the protos+runtime validation.
On Wed, Oct 24, 2018 at 7:02 AM Robert Bradshaw
<[email protected] <mailto:[email protected]>> wrote:
Thanks for bringing this to the list; it's a good
question.
I think the difficulty comes from trying to
statically define a lists
of possibilities that should instead be runtime
values. E.g. we
currently we're up to about a dozen distinct types,
and having a
setter for each is both verbose and not very
extensible (especially
replicated cross language). I'm not sure the set of
possible labels is
fixed either. Generally in the FnAPI we've been
using shared constants
for this kind of thing instead. (I was wary about
the protos for the
same reasons; it would be good to avoid leaking this
through to each
of the various SDKs.) The amount of static
typing/validation one gets
depends on how much logic you build into each of
these methods (e.g.
does it (almost?) always "metric" which is assumed
to already be
encoded correctly, or a specific type that has
tradeoffs with the
amount you can do generically (e.g. we have code
that first loops over
counters, then over distributions, then over gauges,
and I don't think
we want continue that pattern all M places for all N
types)).
I would probably lean towards mostly doing runtime
validation here.
Specifically, one would have a data file that
defines, for each known
URN, its type along with the set of
permitted/expected/required
labels. On construction a MonitoringInfo data point,
one would provide
a URN + value + labelMap, and optionally a type. On
construction, the
constructor (method, factory, whatever) would look
up the URN to
determine the type (or throw an error if it's both
not known and not
provided), which could be then used to fetch an
encoder of sorts (that
can go from value <-> proto encoding, possibly with
some validation).
If labels and/or annotations are provided and the
URN is known, we can
validate these as well.
As for proto/enums vs. yaml, the former is nice
because code
generation comes for free, but has turned out to be
much more verbose
(both the definition and the use) and I'm still on
the fence whether
it's a net win.
(Unfortunately AutoValue won't help much here, as
the ultimate goal is
to wrap a very nested proto OneOf, ideally with some
validation.
(Also, in Python, generally, having optional, named
arguments makes
this kind of builder pattern less needed.))
On Wed, Oct 24, 2018 at 4:12 AM Kenneth Knowles
<[email protected] <mailto:[email protected]>> wrote:
>
> FWIW AutoValue will build most of that class for
you, if it is as you say.
>
> Kenn
>
> On Tue, Oct 23, 2018 at 6:04 PM Alex Amato
<[email protected] <mailto:[email protected]>> wrote:
>>
>> Hi Robert + beam dev list,
>>
>> I was thinking about your feedback in PR#6205,
and agree that this monitoring_infos.py became a bit
big.
>>
>> I'm working on the Java Implementation of this
now, and want to incorporate some of these ideas and
improve on this.
>>
>> I that that I should make something like a
MonitoringInfoBuilder class. With a few methods
>>
>> setUrn
>> setTimestamp
>> setting the value (One method for each Type we
support. Setting this will also set the type string)
>>
>> setInt64CounterValue
>> setDoubleCounterValue
>> setLatestInt64
>> setTopNInt64
>> setMonitoringDataTable
>> setDistributionInt64
>> ...
>>
>> setting labels (will set the key and value
properly for the label)
>>
>> setPTransform(value)
>> setPcollection(value)
>> ...
>>
>>
>> I think this will make building a metric much
easier, you would just call the 4 methods and the
.build(). These builders are common in Java. (I
guess there is a similar thing way we could do in
python? I'd like to go back and refactor that as
well when I am done)
>>
>> -------------
>>
>> As for your other suggestion to define metrics
in the proto/enum file instead of the yaml file. I
am not too sure about the best strategy for this. My
initial thoughts are:
>>
>> Make a proto extension allowing you to
describe/define a MonitoringInfo's (the same info as
the metric_definitions.yaml file):
>>
>> URN
>> Type
>> Labels required
>> Annotations: Description, Units, etc.
>>
>> Make the builder read in that MonitoringInfo
definision/description assert everything is set
properly? I think this would be a decent data driven
approach.
>>
>> I was wondering if you had something else in mind?
>>
>> Thanks
>> Alex
>>
>>