Hi Marc,
The master, i.e. the JobManager, does not need to know which clients, i.e.
the TaskManagers, are supposed to connect to it. Only the TaskManagers
need to know where to connect; they will try to establish that connection
and reconnect if they lose it.
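
A rough way to picture this is the toy Python model below. It is only a
sketch of worker-initiated registration, not Flink's actual code:
ToyJobManager, ToyTaskManager and the names slave1 to slave3 are made up
for illustration, and the 8 slots per worker are taken from the setup
described in this thread.

```python
# Toy model of worker-initiated registration; NOT Flink's implementation.
# ToyJobManager and ToyTaskManager are hypothetical names for illustration.


class ToyJobManager:
    """Master that holds no list of expected workers."""

    def __init__(self):
        self.registered = {}  # worker name -> number of slots

    def register(self, name, slots):
        # Any worker that knows the master's address may register.
        self.registered[name] = slots

    def total_slots(self):
        return sum(self.registered.values())


class ToyTaskManager:
    """Worker that knows where the master is and connects on its own."""

    def __init__(self, name, slots):
        self.name = name
        self.slots = slots

    def connect(self, job_manager):
        # Also used to reconnect after the master has been restarted.
        job_manager.register(self.name, self.slots)


# Three workers are started from the (edited) slaves file; a fourth one was
# never stopped and is still running from the previous cluster start.
listed = [ToyTaskManager(f"slave{i}", 8) for i in (1, 2, 3)]
leftover = ToyTaskManager("slave5", 8)

jm = ToyJobManager()  # freshly restarted master with an empty registry
for tm in listed + [leftover]:
    tm.connect(jm)

print(len(jm.registered), jm.total_slots())  # prints: 4 32
```

In this model the master ends up with four workers and 32 slots even though
only three were listed, because nothing on the master side checks a worker
against a list.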


Nico

On Friday, 11 August 2017 22:24:29 CEST Kaepke, Marc wrote:
> Hi Greg,
> 
> I guess I restarted the cluster too quickly, combined with high CPU load
> inside the cluster.
> I tested it again a few minutes ago and there was no issue!
> With „$ jps“ I checked whether there was any Java process -> there wasn’t.
> But if the master doesn’t know slave5, how can slave5 reconnect to the
> JobManager? That means the JobManager will „adopt a child“.
> 
> Marc
> 
> 
> > On 11.08.2017 at 20:27, Greg Hogan <c...@greghogan.com> wrote:
> > 
> > Hi Marc,
> > 
> > By chance did you edit the slaves file before shutting down the cluster?
> > If so, then the removed worker would not be stopped and would reconnect
> > to the restarted JobManager.
> > 
> > Greg
> > 
> > 
> > 
> >> On Aug 11, 2017, at 11:25 AM, Kaepke, Marc <marc.kae...@haw-hamburg.de>
> >> wrote:
> >> 
> >> Hi,
> >> 
> >> I have a cluster of 4 dedicated machines (no VMs). My previous config
> >> was: 1 master and 3 slaves. Each machine runs either a TaskManager or
> >> the JobManager.
> >> 
> >> Now I want to reduce my cluster and have 1 master and 3 slaves, but one
> >> machine runs the JobManager and one TaskManager in parallel. I
> >> changed all conf/slaves files. When I start my cluster, everything looks
> >> fine for 2 seconds -> one JM and 3 TMs with 8 cores/slots each. Two
> >> seconds later I see 4 TaskManagers and one JM. I can also run a job with
> >> 32 slots (4 TMs * 8 slots) without any errors.
> >> 
> >> Why does my cluster have 4 TaskManagers?! All slaves files are cleaned
> >> up and contain 3 entries.
> >> 
> >> Thanks!
> >> 
> >> Marc
> 
> 
