Thanks. That seems to work great, except EMR doesn't always copy the logs
to S3. The behavior seems inconsistent and I am debugging it now.
On Fri, Mar 31, 2017 at 7:46 AM, Vadim Semenov wrote:
> You can provide your own log directory, where Spark log will be saved, and
> that you could replay afterwards.
If you modify spark.eventLog.dir to point to an S3 path, you will encounter the
following exception in the Spark history server log at
/var/log/spark/spark-history-server.out:
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException:
Class com.amazon.ws.emr.hadoop.fs.EmrFileSystem not found
You can provide your own log directory, where the Spark event log will be
saved and from which you can replay it afterwards.
Set `spark.eventLog.dir=s3://bucket/some/directory` in your job and run it.
Note! The path `s3://bucket/some/directory` must exist before you run your
job; it will not be created automatically.
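As a sketch of the two steps above (the bucket name, prefix, and `my_job.py` are placeholders, not from this thread): pre-create the S3 prefix, then point the event log at it when submitting.

```shell
# Pre-create the prefix with an empty directory marker;
# Spark will not create the path for you.
aws s3api put-object --bucket bucket --key some/directory/

# Enable event logging and point it at the S3 path for this job.
spark-submit \
  --conf spark.eventLog.enabled=true \
  --conf spark.eventLog.dir=s3://bucket/some/directory \
  my_job.py
```

Per-job `--conf` flags keep the change scoped to one submission; the same two settings could instead go in spark-defaults.conf to apply cluster-wide.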
I am looking for tips on evaluating my Spark job after it has run.
I know that right now I can look at the history of jobs through the web UI,
and I also know how to look at the resources currently in use through a
similar web UI.
However, I would like to look at the logs after the job is finished to
evaluate it.
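For offline evaluation, the event logs Spark writes to spark.eventLog.dir are newline-delimited JSON, one listener event per line with an "Event" field. A minimal sketch of summarizing one after the fact (the sample lines here are synthetic, shaped like a real log, not taken from this thread):

```python
import json
from collections import Counter

def summarize_event_log(lines):
    """Count Spark listener event types in a newline-delimited JSON event log."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        counts[event.get("Event", "unknown")] += 1
    return counts

# Synthetic sample in the shape of a real Spark event log (one JSON object per line).
sample = [
    '{"Event": "SparkListenerApplicationStart", "App Name": "demo"}',
    '{"Event": "SparkListenerTaskEnd", "Task Info": {"Launch Time": 100, "Finish Time": 350}}',
    '{"Event": "SparkListenerTaskEnd", "Task Info": {"Launch Time": 120, "Finish Time": 300}}',
    '{"Event": "SparkListenerApplicationEnd", "Timestamp": 1000}',
]
counts = summarize_event_log(sample)
print(counts["SparkListenerTaskEnd"])  # 2
```

In practice you would read the lines from the downloaded event log file instead of a hard-coded list; the same loop also works as a starting point for pulling task durations out of the "Task Info" fields.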