I've had good success troubleshooting this kind of problem with the Spark
Web UI, which shows a breakdown of every stage and its tasks, along with
the RDDs involved and any cached data. More information about the UI is
available at
http://spark.apache.org/docs/latest/monitor
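If event logging is enabled (spark.eventLog.enabled=true), Spark also writes
the per-stage RDD information to the event log as JSON, one event per line.
That is the same data the History Server uses to rebuild the UI. Below is a
rough heuristic, not a Spark feature: it flags RDDs that show up in more than
one completed stage. An RDD that appears in several stages and has no cached
partitions is a likely candidate for recomputation. The function name is
hypothetical. The JSON field names ("Stage Info", "RDD Info", "RDD ID",
"Number of Cached Partitions") are assumptions based on Spark's event log
format, so check them against your Spark version. The sketch also assumes an
uncompressed log.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, Set


def rdds_in_multiple_stages(event_log_lines: Iterable[str]) -> Dict[int, dict]:
    """Hypothetical helper: report RDDs listed in more than one completed stage.

    Assumes Spark's JSON event log field names; verify them for your version.
    """
    stages_per_rdd: Dict[int, Set[int]] = defaultdict(set)  # set ignores stage retries
    names: Dict[int, str] = {}
    cached_partitions: Dict[int, int] = {}

    for line in event_log_lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        # Only completed stages tell us which RDDs were actually part of a stage.
        if event.get("Event") != "SparkListenerStageCompleted":
            continue
        stage = event["Stage Info"]
        for rdd in stage.get("RDD Info", []):
            rdd_id = rdd["RDD ID"]
            stages_per_rdd[rdd_id].add(stage["Stage ID"])
            names[rdd_id] = rdd.get("Name", "")
            # Keep the largest cached-partition count seen for this RDD.
            cached_partitions[rdd_id] = max(
                cached_partitions.get(rdd_id, 0),
                rdd.get("Number of Cached Partitions", 0),
            )

    # Keep only RDDs that appear in two or more distinct stages.
    return {
        rdd_id: {
            "name": names[rdd_id],
            "stages": sorted(stages),
            "cached_partitions": cached_partitions[rdd_id],
        }
        for rdd_id, stages in stages_per_rdd.items()
        if len(stages) > 1
    }
```

For example, `with open(path) as f: print(rdds_in_multiple_stages(f))`.
Treat the output as a list of RDDs to inspect (for example with
`rdd.toDebugString()` or on the UI's Storage tab), not as proof that they
were recomputed.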
Hi,
I am currently trying to optimize my Spark application, and as part of
that I want to find out whether any stage of my code recomputes a large
RDD (so that I can avoid the recomputation by persisting or checkpointing
that RDD).
Is there any indication in the event logs that tells us about an RDD bein