Hi Robert,
A lot of metrics are already available for individual tasks. You can
get these programmatically by registering a SparkListener, and you can also
view them in the UI. E.g., for each task, you can see runtime,
serialization time, amount of shuffle data read, etc. I'm working on als
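(Not from the original message, but for concreteness, here is a minimal sketch of
the listener approach. The TaskMetrics field names below are the commonly exposed
ones; exact accessors vary a bit across Spark versions, so treat this as a sketch.)

import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

// Print a couple of per-task metrics as each task finishes.
class TaskMetricsListener extends SparkListener {
  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    val m = taskEnd.taskMetrics
    if (m != null) {   // metrics can be missing for failed tasks
      println(s"stage ${taskEnd.stageId}: run time ${m.executorRunTime} ms, " +
        s"result serialization ${m.resultSerializationTime} ms")
    }
  }
}

// Register it on the SparkContext before running any jobs:
// sc.addSparkListener(new TaskMetricsListener)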
Guys,
Do you have any thoughts on this?
Thanks,
Robert
On Sunday, April 12, 2015 5:35 PM, Grandl Robert wrote:
Hi guys,
I was trying to figure out some counters in Spark related to the amount of CPU
or memory used (in some metric) by a task/stage/job, but I could not find
This is more-or-less the best you can do now, but as has been pointed out,
accumulators don't quite fit the bill for counters. There is an open issue
to do something better, but no progress on that so far:
https://issues.apache.org/jira/browse/SPARK-603
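For reference, a minimal sketch of the accumulator-style counter being discussed
(names are illustrative, and the Spark 1.x sc.accumulator API is assumed):

// Count records with an accumulator, updating it inside an action (foreach)
// rather than a transformation, to sidestep the retry/speculation caveat below.
val recordsSeen = sc.accumulator(0L, "records seen")
inputRDD.foreach { _ => recordsSeen += 1L }
println(s"records seen: ${recordsSeen.value}")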
On Fri, Feb 13, 2015 at 11:12 AM, Mark Hamstra wrote:
Except that transformations don't have an exactly-once guarantee, so this
way of doing counters may produce different answers across various forms of
failures and speculative execution.
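(A small illustrative sketch of that caveat, assuming the same inputRDD and
accumulator as in the snippet below; this is not from the original message.)

// An accumulator bumped inside a transformation is re-applied whenever the
// map stage is recomputed (a second action, a lost partition, a speculative
// copy), so the final value can overshoot the true record count.
val acc = sc.accumulator(0L)
val mapped = inputRDD.map { x => acc += 1L; x }
mapped.count()
mapped.count()   // the map runs again; acc now holds roughly 2x the record count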
On Fri, Feb 13, 2015 at 8:56 AM, Sean McNamara wrote:
.map is just a transformation, so no work will actually be performed until
something takes action against it. Try adding a .count(), like so:
inputRDD.map { x =>
  counter += 1
}.count()
In case it is helpful, here are the docs on what exactly the transformations
and actions are:
htt