The masters/slaves files are only relevant to the start/stop scripts, which SSH to the machines to start/stop the processes. They do not really have an impact on how the processes find each other.
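For illustration (hostnames below are made up), the two files are just plain host lists that the scripts loop over via SSH, roughly:

conf/masters (JobManager host, optionally with the web UI port):

    master-node:8081

conf/slaves (one TaskManager host per line):

    worker-1
    worker-2
    worker-3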
Calling the scripts repeatedly and editing the files in between can start additional processes, or fail to stop all processes. In that case, you can try calling stop-cluster.sh repeatedly to stop the remaining processes, or SSH to the nodes and kill the processes manually (a rough sketch is at the bottom of this mail).

Also: the files are only relevant on the machine where you execute the shell scripts. Editing them on other machines has no effect.

On Mon, Aug 14, 2017 at 11:46 AM, Nico Kruber <n...@data-artisans.com> wrote:
> Hi Marc,
> the master, i.e. JobManager, does not need to know which clients, i.e.
> TaskManagers, are supposed to connect to it. Indeed, only the task managers
> need to know where to connect to, and they will try to establish that
> connection and re-connect when losing it.
>
> Nico
>
> On Friday, 11 August 2017 22:24:29 CEST Kaepke, Marc wrote:
> > Hi Greg,
> >
> > I guess I restarted the cluster too fast, combined with high CPU load inside
> > the cluster. I tested it again a few minutes ago and there was no issue!
> > With „$ jps“ I checked whether there was any Java process -> there wasn't.
> > But if the master doesn't know slave5, how can slave5 reconnect to the
> > JobManager? That means the JobManager will „adopt a child“.
> >
> > Marc
> >
> > > On 11.08.2017 at 20:27, Greg Hogan <c...@greghogan.com> wrote:
> > >
> > > Hi Marc,
> > >
> > > By chance did you edit the slaves file before shutting down the cluster?
> > > If so, then the removed worker would not be stopped and would reconnect
> > > to the restarted JobManager.
> > >
> > > Greg
> > >
> > > > On Aug 11, 2017, at 11:25 AM, Kaepke, Marc <marc.kae...@haw-hamburg.de> wrote:
> > > >
> > > > Hi,
> > > >
> > > > I have a cluster of 4 dedicated machines (no VMs). My previous config
> > > > was: 1 master and 3 slaves. Each machine provides a task- or
> > > > jobmanager.
> > > > Now I want to reduce my cluster and have 1 master and 3 slaves, but one
> > > > machine provides a jobmanager and a task manager in parallel. I
> > > > changed all conf/slaves files. When I start my cluster everything seems
> > > > fine for 2 seconds -> one JM and 3 TM with 8 cores/slots each. Two
> > > > seconds later I see 4 taskmanagers and one JM. I can also run a job with
> > > > 32 slots (4 TM * 8 slots) without any errors.
> > > > Why does my cluster have 4 task managers?! All slaves files are cleaned
> > > > and contain 3 entries.
> > > >
> > > > Thanks!
> > > >
> > > > Marc
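In case it is useful, here is a rough sketch of the manual cleanup (hostname and PID are made up; the exact class name printed by jps depends on the Flink version):

    # on the machine you run the scripts from: retry the stop script
    ./bin/stop-cluster.sh

    # per node: list remaining Java processes and kill stray TaskManagers by PID
    ssh worker-3 jps
        12345 TaskManager
    ssh worker-3 kill 12345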