OK, I tried it and it works! I can set up my cluster with Terraform and enable the systemd services. I think I got confused when I looked and it was doing leader election, because all the services came up so quickly.
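For anyone finding this thread later, here is a minimal sketch of what such a unit could look like. It is not the exact file from my setup. The install path `/opt/flink`, the `flink` user, the unit file name and the host name `jobmanager-1.example.internal` are all placeholders. Using `start-foreground` instead of `start` is also an assumption on my part, so check the usage line printed by `jobmanager.sh` for your Flink version.

```ini
# /etc/systemd/system/flink-jobmanager.service (hypothetical file name)
[Unit]
Description=Apache Flink JobManager (standalone)
After=network-online.target
Wants=network-online.target

[Service]
# Hypothetical service account and install path; adjust to your hosts.
User=flink
Group=flink
# start-foreground is assumed here so the JVM stays attached and systemd can supervise it.
# The argument after it is the host to bind to (placeholder name).
# Without it the script falls back to the value in flink-conf.yaml,
# which is localhost by default.
ExecStart=/opt/flink/bin/jobmanager.sh start-foreground jobmanager-1.example.internal
Restart=on-failure
RestartSec=10

[Install]
WantedBy=multi-user.target
```

A TaskManager unit would presumably look the same with `taskmanager.sh` in `ExecStart`. After copying the file into place, `systemctl daemon-reload` followed by `systemctl enable --now flink-jobmanager` should bring the service up on every boot.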
On Tue, 18 Jun 2019 at 22:35, John Smith <java.dev....@gmail.com> wrote:

> Ah, OK, we need to pass --host. The command-line help says jobmanager.sh
> <host>?!?! If I recall. I have to go check tomorrow...
>
> On Tue., Jun. 18, 2019, 10:05 p.m. PoolakkalMukkath, Shakir <
> shakir_poolakkalmukk...@comcast.com> wrote:
>
>> Hi Nick,
>>
>> It works that way by explicitly setting the --host. I got misled by the
>> *"only"* word in the doc and did not try. Thanks for the help.
>>
>> Thanks,
>> Shakir
>>
>> *From:* "Martin, Nick" <nick.mar...@ngc.com>
>> *Date:* Tuesday, June 18, 2019 at 6:31 PM
>> *To:* "PoolakkalMukkath, Shakir" <shakir_poolakkalmukk...@comcast.com>,
>> Till Rohrmann <trohrm...@apache.org>, John Smith <java.dev....@gmail.com>
>> *Cc:* user <user@flink.apache.org>
>> *Subject:* RE: [EXTERNAL] Re: How to restart/recover on reboot?
>>
>> jobmanager.sh takes an optional argument for the hostname to bind to, and
>> start-cluster.sh uses it. If you leave it blank, the script will use
>> whatever is in flink-conf.yaml (localhost is the default value that ships
>> with Flink).
>>
>> The dockerized version of Flink runs pretty much the way you're trying to
>> operate (i.e. each node starts itself), so the entrypoint script from
>> that is probably a good source of information about how to set it up.
>>
>> *From:* PoolakkalMukkath, Shakir [mailto:
>> shakir_poolakkalmukk...@comcast.com]
>> *Sent:* Tuesday, June 18, 2019 2:15 PM
>> *To:* Till Rohrmann <trohrm...@apache.org>; John Smith <
>> java.dev....@gmail.com>
>> *Cc:* user <user@flink.apache.org>
>> *Subject:* EXT :Re: [EXTERNAL] Re: How to restart/recover on reboot?
>>
>> Hi Till, John,
>>
>> I agree with the issue John mentioned and have the same problem.
>>
>> We can only *start* a standalone HA cluster with the ./start-cluster.sh
>> script. Then, when there are failures, we can *restart* those
>> components individually by calling jobmanager.sh/taskmanager.sh. This
>> works great.
>>
>> But, like John mentioned, if we want to start the cluster initially
>> by running jobmanager.sh on each JobManager node, it does not
>> work. It binds to localhost and does not form the HA cluster.
>>
>> Thanks,
>> Shakir
>>
>> *From:* Till Rohrmann <trohrm...@apache.org>
>> *Date:* Tuesday, June 18, 2019 at 4:23 PM
>> *To:* John Smith <java.dev....@gmail.com>
>> *Cc:* user <user@flink.apache.org>
>> *Subject:* [EXTERNAL] Re: How to restart/recover on reboot?
>>
>> I guess it should work if you installed a systemd service which simply
>> calls `jobmanager.sh start` or `taskmanager.sh start`.
>>
>> Cheers,
>> Till
>>
>> On Tue, Jun 18, 2019 at 4:29 PM John Smith <java.dev....@gmail.com>
>> wrote:
>>
>> Yes, that is understood. But I don't see why we cannot call jobmanager.sh
>> and taskmanager.sh to build the cluster and have them run as systemd units.
>>
>> I looked at start-cluster.sh and all it does is SSH and call
>> jobmanager.sh, which then cascades to taskmanager.sh. I just have to
>> pinpoint what's missing to have the systemd service working. In fact, calling
>> jobmanager.sh as a systemd service actually sees the shared masters, slaves
>> and flink-conf.yaml. But it binds to localhost.
>>
>> Maybe one way to do it would be to bootstrap the cluster with
>> ./start-cluster.sh and then install systemd services for jobmanager.sh and
>> taskmanager.sh.
>>
>> Like I said, I don't want to have some process in place to remind admins
>> that they need to manually start a node every time they patch or a host goes
>> down for whatever reason.
>>
>> On Tue, 18 Jun 2019 at 04:31, Till Rohrmann <trohrm...@apache.org> wrote:
>>
>> When a single machine fails you should rather call `taskmanager.sh
>> start`/`jobmanager.sh start` to start a single process. `start-cluster.sh`
>> will start multiple processes on different machines.
>>
>> Cheers,
>> Till
>>
>> On Mon, Jun 17, 2019 at 4:30 PM John Smith <java.dev....@gmail.com>
>> wrote:
>>
>> Well, some reasons: machine reboots, maintenance, etc. Host/VM crashes and
>> restarts. And the same goes for the job manager. I don't want/need to have to
>> document/remember some start process for sysadmins/devops.
>>
>> So far I have looked at ./start-cluster.sh and all it seems to do is SSH
>> into all the specified nodes and start the processes using the jobmanager
>> and taskmanager scripts. I don't see anything special in any of the sh
>> scripts.
>> I configured passwordless SSH through Terraform and all that works great;
>> it is only when trying to do the manual start through systemd that I may
>> have something missing...
>>
>> On Mon, 17 Jun 2019 at 09:41, Till Rohrmann <trohrm...@apache.org> wrote:
>>
>> Hi John,
>>
>> I don't have much experience with setting Flink up via systemd services. Why
>> do you want to do it like that?
>>
>> 1. In standalone mode, Flink won't automatically restart TaskManagers.
>> This only works on Yarn and Mesos atm.
>>
>> 2. In case of a lost TaskManager, you should run `taskmanager.sh start`.
>> This script simply starts a new TaskManager process.
>>
>> 3. I guess you could use systemd to bring up a Flink TaskManager process
>> on startup.
>>
>> Cheers,
>> Till
>>
>> On Fri, Jun 14, 2019 at 5:56 PM John Smith <java.dev....@gmail.com>
>> wrote:
>>
>> I looked into start-cluster.sh and I don't see anything special. So
>> technically it should be as easy as installing systemd services to run
>> jobmanager.sh and taskmanager.sh respectively?
>>
>> On Wed, 12 Jun 2019 at 13:02, John Smith <java.dev....@gmail.com> wrote:
>>
>> The installation instructions do not indicate how to create systemd
>> services.
>>
>> 1- When task nodes fail, will the job leader detect this and SSH in and
>> restart the task node? From my testing it doesn't seem like it.
>>
>> 2- How do we recover a lost node? Do we simply go back to the master node
>> and run start-cluster.sh, and the script is smart enough to figure out what
>> is missing?
>>
>> 3- Or do we need to create systemd services, and if so, which command do
>> we start the service with?