On Thu, Feb 25, 2016 at 5:23 PM, Vasiliki Kalavri
<vasilikikala...@gmail.com> wrote:
> - HA: tested on a 6-node cluster with 2 masters.
> Issues:
> 1. After new leader election, the job history is cleaned up (at least in
> the WebUI). Is this on purpose?

Yes. The job history is part of the job manager, so it is not carried
over when a new leader takes over.

> 2. After cluster restart, the jobmanager remembers and tries to re-submit
> previously failed resubmissions.
> This one is a bit tricky:
> I had a batch job running and killed the master. After the new master took
> over, job resubmission failed because the HDFS output directory already
> existed. After re-starting the whole cluster and removing the HDFS
> directory, the new jobmanager re-submitted the previously failed batch job.

I think for this you currently have to set the write mode to overwrite,
so that the resubmitted job can write to the existing output directory.

> 3. Upon starting the cluster I get the following warning message "[WARNING]
> 1 instance(s) of jobmanager are already running", when jps shows no
> existing jobmanager process.

This is part of the bash script. It currently checks a PID file to
determine the running processes, but it does not actually check
whether the PIDs are valid or not. I think it's a good idea to
actually check this. Let me open an issue for this...
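
Roughly what I have in mind, as a sketch only and not the actual script:
it assumes the PID file holds one PID per line, and the function name
count_live_pids is made up for this example. It uses "kill -0", which
sends no signal and only tests whether the process can be signalled.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print how many PIDs listed in a PID file
# belong to processes that are still alive.
count_live_pids() {
    local pidfile="$1"
    local count=0
    local pid

    # A missing PID file means nothing is running.
    if [ ! -f "$pidfile" ]; then
        echo 0
        return 0
    fi

    # The "|| [ -n ... ]" part handles a last line without a newline.
    while read -r pid || [ -n "$pid" ]; do
        # Skip blank or malformed lines, then probe the process.
        if [[ "$pid" =~ ^[0-9]+$ ]] && kill -0 "$pid" 2>/dev/null; then
            count=$((count + 1))
        fi
    done < "$pidfile"

    echo "$count"
}
```

The warning would then only be printed when the count is greater than
zero. One caveat: "kill -0" also fails for a live process owned by
another user, so this only works when the daemons are started and
checked by the same user.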
