Hi all,

We are using Grafana/InfluxDB for metrics, and Samza's current metrics
model does not fit it particularly well.
InfluxDB recently introduced so-called "tags", and the Grafana UI offers
great value when using them. The idea is to keep the metric name very
simple, for example cpu.use, and supply the measurement with tags, for
example {datacenter: vegas, environment: staging, machine: vm-003,
application: myApp}.
From what I can read, OpenTSDB uses tags too.
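To illustrate what such a tagged measurement looks like on the wire, here is
a minimal Java sketch that renders the example above in InfluxDB line
protocol (measurement,tag=value,... field=value). The class and method names
are made up for illustration; only simple escaping of commas, equals signs
and spaces in tags is handled, and the timestamp is omitted so the server
assigns one.

```java
import java.util.Map;
import java.util.TreeMap;

public class LineProtocolExample {
    // Escape characters that are significant in line-protocol tag keys and
    // values: commas, equals signs and spaces.
    static String escapeTag(String s) {
        return s.replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ");
    }

    // Builds "measurement,tag1=v1,tag2=v2 value=<number>".
    // Assumption: the measurement name itself needs no escaping.
    // Tags are sorted by key (via TreeMap), as InfluxDB suggests for writes.
    static String toLine(String measurement, Map<String, String> tags, double value) {
        StringBuilder sb = new StringBuilder(measurement);
        for (Map.Entry<String, String> e : new TreeMap<>(tags).entrySet()) {
            sb.append(',').append(escapeTag(e.getKey()))
              .append('=').append(escapeTag(e.getValue()));
        }
        sb.append(" value=").append(value);
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> tags = Map.of(
            "datacenter", "vegas", "environment", "staging",
            "machine", "vm-003", "application", "myApp");
        // prints: cpu.use,application=myApp,datacenter=vegas,environment=staging,machine=vm-003 value=0.42
        System.out.println(toLine("cpu.use", tags, 0.42));
    }
}
```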

Having tags instead of long metric names is much more convenient, and in
some cases it is the only way to perform certain operations. For example, I
want an alert on the throughput of a Samza job. With tags encoded in the
metric name this is impossible: I would need a list of all machine names
and Samza job names in the InfluxDB SELECT statement, and even then there
is no way to group them properly. With tags it is as simple as
SELECT ... GROUP BY [[tag_host]],[[tag_samza_job_name]]. You can add new
machines to the cluster and new jobs to YARN, and they will appear in your
metrics with zero configuration effort.
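The GROUP BY point can be illustrated without InfluxDB at all. The following
hypothetical Java sketch sums tagged points per (host, job) pair; the tag
keys "host" and "samza_job_name" are assumptions mirroring the Grafana
template variables above, not names Samza emits today.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class TagGroupingExample {
    // One measurement: a simple metric name, its tags and a value.
    public static final class Point {
        final String name;
        final Map<String, String> tags;
        final double value;

        public Point(String name, Map<String, String> tags, double value) {
            this.name = name;
            this.tags = tags;
            this.value = value;
        }
    }

    // Sums values of the given metric per (host, samza_job_name) pair,
    // analogous to GROUP BY on those two (assumed) tag keys. Missing tags
    // map to "" so List.of never sees a null.
    static Map<List<String>, Double> sumByHostAndJob(List<Point> points, String metric) {
        return points.stream()
            .filter(p -> p.name.equals(metric))
            .collect(Collectors.groupingBy(
                p -> List.of(p.tags.getOrDefault("host", ""),
                             p.tags.getOrDefault("samza_job_name", "")),
                Collectors.summingDouble(p -> p.value)));
    }

    public static void main(String[] args) {
        List<Point> pts = List.of(
            new Point("throughput", Map.of("host", "vm-003", "samza_job_name", "myJob"), 10),
            new Point("throughput", Map.of("host", "vm-003", "samza_job_name", "myJob"), 5),
            new Point("throughput", Map.of("host", "vm-004", "samza_job_name", "myJob"), 7));
        System.out.println(sumByHostAndJob(pts, "throughput"));
    }
}
```

Nothing in the grouping code enumerates hosts or jobs; a point from a new
machine or job simply forms a new group.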

Currently, I have partially mitigated the issue by stripping the first
dot-separated part of the metric name and turning it into a "samza-src"
tag, on the assumption that it is the container name. But many metrics also
encode the partition number in the metric name. Its position is not
consistent and not all metrics have it, so I cannot build an alerting
system on top of Samza metrics.
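For reference, the workaround amounts to something like the following Java
sketch. The class name and the example metric name are hypothetical; the
point is that only a prefix can be recovered this way, while a partition
number embedded elsewhere in the name stays buried.

```java
import java.util.HashMap;
import java.util.Map;

public class SourceTagExtractor {
    // A metric name with whatever tags could be recovered from it.
    public static final class Tagged {
        final String name;
        final Map<String, String> tags;

        Tagged(String name, Map<String, String> tags) {
            this.name = name;
            this.tags = tags;
        }
    }

    // Moves the first dot-separated segment into a "samza-src" tag,
    // assuming (as in the workaround) that it is the container name.
    // Names with no usable dot are returned unchanged, with no tags.
    static Tagged extract(String metricName) {
        Map<String, String> tags = new HashMap<>();
        int dot = metricName.indexOf('.');
        if (dot <= 0 || dot == metricName.length() - 1) {
            return new Tagged(metricName, tags);
        }
        tags.put("samza-src", metricName.substring(0, dot));
        return new Tagged(metricName.substring(dot + 1), tags);
    }

    public static void main(String[] args) {
        // Hypothetical name: the partition number remains inside the name.
        Tagged t = extract("container-1.partition-3-messages-read");
        System.out.println(t.name + " " + t.tags);
    }
}
```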

Proposal:
Change Samza's internal metrics to use tags (string key-value pairs) and
leave the job of constructing the metric name to the output metrics plugin.
This would preserve backward compatibility: the JMX reporter would
construct metric names the same way it does today, while the InfluxDB
plugin would leave the name unmodified and attach the list of tags to the
measurement.
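A rough sketch of the proposed split, assuming a hypothetical MetricId type
(not an existing Samza class) that carries a short name plus tags, with
each reporter deciding how to render it:

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class TaggedMetricsSketch {
    // Hypothetical metric identifier: a short name plus string tags.
    // Tag insertion order is preserved via LinkedHashMap.
    public static final class MetricId {
        final String name;
        final Map<String, String> tags;

        public MetricId(String name, Map<String, String> tags) {
            this.name = name;
            this.tags = Collections.unmodifiableMap(new LinkedHashMap<>(tags));
        }
    }

    // Each output plugin decides how a MetricId becomes a name.
    public interface NameStrategy {
        String render(MetricId id);
    }

    // JMX-style: fold tag values back into a dot-separated name so that
    // existing names (and dashboards) can be kept.
    public static final NameStrategy FLAT = id -> {
        StringJoiner j = new StringJoiner(".");
        id.tags.values().forEach(j::add);
        j.add(id.name);
        return j.toString();
    };

    // InfluxDB-style: keep the name as-is; tags are sent separately.
    public static final NameStrategy AS_IS = id -> id.name;

    public static void main(String[] args) {
        Map<String, String> tags = new LinkedHashMap<>();
        tags.put("container", "container-1");
        tags.put("partition", "3");
        MetricId id = new MetricId("messages-processed", tags);
        System.out.println(FLAT.render(id));   // container-1.3.messages-processed
        System.out.println(AS_IS.render(id) + " " + id.tags);
    }
}
```

Here the flat name depends on tag insertion order; a real JMX reporter
would need a fixed ordering that reproduces today's names exactly.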

If this approach seems reasonable to the core team, I could work on the
patch.

Thanks,
Vadym.


-- 
From RFC 2631: In ASN.1, EXPLICIT tagging is implicit unless IMPLICIT is
explicitly specified
