In these scenarios it's fairly standard to report the metrics either
directly or through accumulators
(http://spark.apache.org/docs/latest/programming-guide.html#accumulators-a-nameaccumlinka)
to a time series database such as Graphite (http://graphite.wikidot.com/)
or OpenTSDB (http://opentsdb.ne
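For illustration, here is a minimal Python sketch of the "report directly" route. It assumes Graphite's Carbon plaintext protocol (one `name value timestamp` line per metric over TCP, port 2003 by default). The metric names, host and port are placeholders. If you go through accumulators instead, you would read `acc.value` on the driver after an action has run and pass that number in.

```python
import socket
import time


def graphite_lines(metrics, timestamp=None):
    """Format a dict of metric name -> value as Carbon plaintext lines."""
    ts = int(time.time() if timestamp is None else timestamp)
    # One "name value timestamp\n" line per metric, sorted for stable output.
    return "".join(f"{name} {value} {ts}\n" for name, value in sorted(metrics.items()))


def send_to_graphite(metrics, host="localhost", port=2003, timestamp=None):
    """Push metrics to a Carbon plaintext listener (host/port are placeholders)."""
    payload = graphite_lines(metrics, timestamp).encode("ascii")
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload)


# Hypothetical driver-side usage with a PySpark accumulator:
#   acc = sc.accumulator(0)
#   rdd.foreach(lambda _: acc.add(1))
#   send_to_graphite({"spark.myjob.records_processed": acc.value})
```

Sending from the driver keeps one connection per job rather than one per task; pushing from inside tasks also works but multiplies connections.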
Hi,
My limited understanding of Spark tells me that a task is the smallest
unit of work, and Spark itself won't give you much below that. I
wouldn't expect it to, since "account" is a business entity, not a
Spark one.
What about using mapPartitions* to get at the details of each partition
and do whatever you want there?
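A rough sketch of that idea, assuming (hypothetically) that each record is an `(account_id, amount)` pair. The function takes a partition index and an iterator over that partition's records, which is the shape PySpark's `RDD.mapPartitionsWithIndex` passes in, and yields one summary per partition. The field layout and summary contents are made up for illustration.

```python
from collections import Counter


def summarize_partition(index, records):
    """Yield one (index, record_count, total_amount, per_account_counts) tuple.

    Records are assumed to be hypothetical (account_id, amount) pairs.
    """
    counts = Counter()
    total = 0
    n = 0
    for account_id, amount in records:
        counts[account_id] += 1
        total += amount
        n += 1
    # Exactly one summary per partition, even if the partition is empty.
    yield (index, n, total, dict(counts))


# Local stand-in for two partitions; with Spark you would instead call
#   rdd.mapPartitionsWithIndex(summarize_partition).collect()
partitions = [[("a", 10), ("b", 5)], [("a", 1)]]
summaries = [s for i, p in enumerate(partitions) for s in summarize_partition(i, iter(p))]
print(summaries)
```

Because it only walks the iterator once, it works on partitions that don't fit in memory; the per-partition summaries are small enough to collect on the driver and report from there.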