Take a look at the SparkListener API included in Spark — you can use it to capture various events. There's also this pull request: https://github.com/apache/spark/pull/42, which will persist application logs and let you rebuild the web UI after the app runs. It uses the same API to log events.
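For reference, a minimal sketch of what a SparkListener looks like. The callback names below (onStageCompleted, onUnpersistRDD) are from the SparkListener trait in org.apache.spark.scheduler, but the exact set of events and their field names vary across Spark versions, so treat this as an illustration rather than a drop-in implementation:

```scala
import org.apache.spark.scheduler._

// Sketch of a listener that prints cache-related events.
class CacheEventListener extends SparkListener {
  // Fired when an RDD is unpersisted (useful for tracking cache contents).
  override def onUnpersistRDD(unpersist: SparkListenerUnpersistRDD): Unit = {
    println(s"RDD ${unpersist.rddId} was unpersisted")
  }

  // Fired when a stage finishes; stage info includes RDD lineage details.
  override def onStageCompleted(stage: SparkListenerStageCompleted): Unit = {
    println(s"Stage ${stage.stageInfo.stageId} completed")
  }
}

// Registration, assuming an existing SparkContext `sc`:
// sc.addSparkListener(new CacheEventListener)
```

A tool like the one described below could either register such a listener in a running application, or replay the persisted event log offline once the pull request above is merged.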
Matei

On Mar 17, 2014, at 7:35 AM, Roman Pastukhov <metaignat...@gmail.com> wrote:

> Hi.
>
> We're thinking about writing a tool that would read Spark logs and output
> cache contents at some point in time (e.g. if you want to see what data fills
> the cache and whether some of it may be unpersisted to improve performance).
>
> Are there similar projects that already exist? Is there a list of
> Spark-related tools? There is Spark debugger/SRD
> (https://github.com/mesos/spark/wiki/Spark-Debugger,
> http://spark-replay-debugger-overview.readthedocs.org/en/latest/) but I
> couldn't find any links to them on the Spark project site.