Re: Need suggestions on monitor Spark progress

2015-11-30 Thread Alex Rovner
In these scenarios, it's fairly standard to report the metrics, either directly or through accumulators (http://spark.apache.org/docs/latest/programming-guide.html#accumulators-a-nameaccumlinka), to a time-series database such as Graphite (http://graphite.wikidot.com/) or OpenTSDB (http://opentsdb.ne
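As a rough illustration of the "report directly" route, here is a minimal Python sketch that pushes a value to Graphite using Carbon's plaintext protocol: one `<path> <value> <timestamp>` line per metric, conventionally sent over TCP port 2003. The function names, the metric path, and the `localhost` default are illustrative assumptions, not part of any Spark or Graphite API.

```python
import socket
import time


def graphite_line(path, value, timestamp=None):
    """Format one metric in Carbon's plaintext protocol: '<path> <value> <timestamp>\\n'.

    Hypothetical helper; defaults the timestamp to the current Unix time.
    """
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"


def send_to_graphite(lines, host="localhost", port=2003):
    """Push pre-formatted plaintext lines to a Carbon listener over TCP.

    Host and port are assumptions; 2003 is Carbon's usual plaintext port.
    """
    payload = "".join(lines).encode("ascii")
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload)
```

In a PySpark job you might create a counter with `sc.accumulator(0)`, call `.add(1)` on it inside an action such as `foreach`, and afterwards read `.value` on the driver and pass it to `graphite_line`. Accumulator values can only be read on the driver, so this reports progress after (or between) actions rather than from inside running tasks.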

2015-11-30 Thread Jacek Laskowski
Hi, my limited understanding of Spark tells me that a task is the smallest unit of work, and Spark itself won't give you much beyond that. I wouldn't expect it to, since "account" is a business entity, not a Spark one. What about using mapPartitions* to know the details of partitions and do whatever you want
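One way to read the mapPartitions* suggestion is to wrap each partition's iterator so it counts records as they stream through. The sketch below builds a function for PySpark's `RDD.mapPartitionsWithIndex`, which calls `f(partition_index, iterator)`. The `with_progress` name, the `report` callback and the `every` threshold are hypothetical; `report` runs on the executors, so in practice it would log or send to something like Graphite rather than print to the driver. It tracks records per partition, not per business entity such as an account.

```python
def with_progress(report, every=1000):
    """Build a function for RDD.mapPartitionsWithIndex (hypothetical helper).

    Records pass through unchanged; report(partition_index, count) is called
    every `every` records and once more when the partition is exhausted.
    """
    def wrap(index, iterator):
        count = 0
        for record in iterator:
            count += 1
            if count % every == 0:
                # Periodic progress update for this partition.
                report(index, count)
            yield record
        # Final total for this partition.
        report(index, count)
    return wrap
```

Usage would look like `rdd.mapPartitionsWithIndex(with_progress(my_report, every=10000))`, where `my_report` is your own (hypothetical) reporting function. Because the wrapper is a lazy generator, nothing is reported until an action actually consumes the partition.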