On Thu, Feb 25, 2016 at 5:23 PM, Vasiliki Kalavri <vasilikikala...@gmail.com> wrote:
> - HA: tested on a 6-node cluster with 2 masters.
> Issues:
> 1. After new leader election, the job history is cleaned up (at least in
> the WebUI). Is this on purpose?

Yes, the job history is part of the job manager, so it does not carry over to the new leader.

> 2. After cluster restart, the jobmanager remembers and tries to re-submit
> previously failed resubmissions.
> This one is a bit tricky:
> I had a batch job running and killed the master. After the new master took
> over, job resubmission failed because the HDFS output directory already
> existed. After re-starting the whole cluster and removing the HDFS
> directory, the new jobmanager re-submitted the previously failed batch job.

I think for now you have to set the write mode to overwrite to work around this.

> 3. Upon starting the cluster I get the following warning message "[WARNING]
> 1 instance(s) of jobmanager are already running", when jps shows no
> existing jobmanager process.

This is part of the bash script. It currently checks a PID file to determine the running processes, but it does not verify that the PIDs in that file still belong to live processes. I think it's a good idea to actually check this. Let me open an issue for this...
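
To illustrate the PID check discussed above, here is a minimal sketch of how a start-up script could ignore stale entries. It is not the actual Flink script: the function name `count_live_pids` and the one-PID-per-line file layout are assumptions made for this example. It relies only on `kill -0`, which sends no signal and merely tests whether the process exists.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of the real script): print how many PIDs
# listed in a PID file still belong to running processes.
# Assumption: the file contains one PID per line.
count_live_pids() {
    local pidfile="$1" count=0 pid

    # A missing PID file means nothing is recorded as running.
    if [ ! -f "$pidfile" ]; then
        echo 0
        return 0
    fi

    # The "|| [ -n ... ]" part also handles a last line without a newline.
    while read -r pid || [ -n "$pid" ]; do
        # Skip blank or non-numeric lines.
        case "$pid" in
            ''|*[!0-9]*) continue ;;
        esac
        # kill -0 succeeds only if the process exists and we may signal it.
        # A process owned by another user would be counted as not running.
        if kill -0 "$pid" 2>/dev/null; then
            count=$((count + 1))
        fi
    done < "$pidfile"

    echo "$count"
}
```

A script could then print the "already running" warning only when `count_live_pids "$pidfile"` returns a value greater than zero, and could rewrite the PID file without the stale entries.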