I guess it should work if you installed a systemd service which simply calls `jobmanager.sh start` or `taskmanager.sh start`.
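A minimal, untested sketch of what such a unit might look like for the JobManager. The unit file name, the `flink` user, and the `/opt/flink` install path are assumptions for illustration and do not come from this thread. `Type=forking` is used because `jobmanager.sh start` launches the process in the background and then returns.

```ini
# /etc/systemd/system/flink-jobmanager.service  (hypothetical file name)
[Unit]
Description=Apache Flink JobManager (standalone)
After=network-online.target
Wants=network-online.target

[Service]
# "jobmanager.sh start" daemonizes the JVM and exits, so the service forks.
Type=forking
# Assumed: a dedicated "flink" user and an installation under /opt/flink.
User=flink
ExecStart=/opt/flink/bin/jobmanager.sh start
ExecStop=/opt/flink/bin/jobmanager.sh stop
# Bring the process back if it dies unexpectedly.
Restart=on-failure
RestartSec=10

[Install]
# Start at boot once the system reaches the normal multi-user state.
WantedBy=multi-user.target
```

A TaskManager unit would presumably look the same with `taskmanager.sh start` and `taskmanager.sh stop` in place of the JobManager script. After `systemctl daemon-reload`, `systemctl enable --now flink-jobmanager` should start it immediately and on every boot.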
Cheers,
Till

On Tue, Jun 18, 2019 at 4:29 PM John Smith <java.dev....@gmail.com> wrote:

> Yes, that is understood. But I don't see why we cannot call jobmanager.sh
> and taskmanager.sh to build the cluster and have them run as systemd units.
>
> I looked at start-cluster.sh and all it does is SSH and call jobmanager.sh,
> which then cascades to taskmanager.sh. I just have to pinpoint what's
> missing to have the systemd service working. In fact, calling jobmanager.sh
> as a systemd service actually sees the shared masters, slaves and
> flink-conf.yaml. But it binds to localhost.
>
> Maybe one way to do it would be to bootstrap the cluster with
> ./start-cluster.sh and then install systemd services for jobmanager.sh and
> taskmanager.sh.
>
> Like I said, I don't want to have some process in place to remind admins
> they need to manually start a node every time they patch or a host goes
> down for whatever reason.
>
> On Tue, 18 Jun 2019 at 04:31, Till Rohrmann <trohrm...@apache.org> wrote:
>
>> When a single machine fails you should rather call
>> `taskmanager.sh start`/`jobmanager.sh start` to start a single process.
>> `start-cluster.sh` will start multiple processes on different machines.
>>
>> Cheers,
>> Till
>>
>> On Mon, Jun 17, 2019 at 4:30 PM John Smith <java.dev....@gmail.com>
>> wrote:
>>
>>> Well, some reasons: machine reboots/maintenance etc... Host/VM crashes
>>> and restarts. And the same goes for the job manager. I don't want/need
>>> to have to document/remember some start process for sysadmins/devops.
>>>
>>> So far I have looked at ./start-cluster.sh and all it seems to do is SSH
>>> into all the specified nodes and start the processes using the jobmanager
>>> and taskmanager scripts. I don't see anything special in any of the sh
>>> scripts.
>>> I configured passwordless SSH through Terraform and all that works great
>>> only when trying to do the manual start through systemd. I may have
>>> something missing...
>>>
>>> On Mon, 17 Jun 2019 at 09:41, Till Rohrmann <trohrm...@apache.org>
>>> wrote:
>>>
>>>> Hi John,
>>>>
>>>> I don't have much experience with setting Flink up via systemd services.
>>>> Why do you want to do it like that?
>>>>
>>>> 1. In standalone mode, Flink won't automatically restart TaskManagers.
>>>> This only works on Yarn and Mesos atm.
>>>> 2. In case of a lost TaskManager, you should run `taskmanager.sh start`.
>>>> This script simply starts a new TaskManager process.
>>>> 3. I guess you could use systemd to bring up a Flink TaskManager
>>>> process on start up.
>>>>
>>>> Cheers,
>>>> Till
>>>>
>>>> On Fri, Jun 14, 2019 at 5:56 PM John Smith <java.dev....@gmail.com>
>>>> wrote:
>>>>
>>>>> I looked into start-cluster.sh and I don't see anything special.
>>>>> So technically it should be as easy as installing systemd services to
>>>>> run jobmanager.sh and taskmanager.sh respectively?
>>>>>
>>>>> On Wed, 12 Jun 2019 at 13:02, John Smith <java.dev....@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> The installation instructions do not indicate how to create systemd
>>>>>> services.
>>>>>>
>>>>>> 1- When task nodes fail, will the job leader detect this and SSH in
>>>>>> and restart the task node? From my testing it doesn't seem like it.
>>>>>> 2- How do we recover a lost node? Do we simply go back to the master
>>>>>> node and run start-cluster.sh, and the script is smart enough to
>>>>>> figure out what is missing?
>>>>>> 3- Or do we need to create systemd services, and if so, which
>>>>>> command do we start the service with?