Ah ok, so we need to pass --host. If I recall, the command line help says jobmanager.sh <host>?! I'll have to go check tomorrow...
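
For the record, a minimal sketch of what a per-node start wrapper could look like, going by Nick's description below that jobmanager.sh takes the hostname to bind to as an optional argument. The /opt/flink install path, the function name, and the use of `uname -n` for the node name are assumptions for illustration, not something confirmed in this thread. The sketch only prints the command rather than running it.

```shell
#!/bin/sh
# Sketch of a wrapper a systemd unit could call on each JobManager node.
# Assumptions (not from the thread): Flink is installed under /opt/flink,
# and the name printed by `uname -n` is the one listed for this node in
# the shared masters file.
FLINK_HOME="${FLINK_HOME:-/opt/flink}"

# Print the start command for a given host instead of running it, so it
# can be inspected before being wired into a unit file.
jobmanager_start_cmd() {
    host="$1"
    # The host goes in as the optional argument Nick described; if it is
    # left out, the script uses whatever is in flink-conf.yaml.
    printf '%s/bin/jobmanager.sh start %s\n' "$FLINK_HOME" "$host"
}

# Show the command this node would run.
jobmanager_start_cmd "$(uname -n)"
```

A systemd unit on each JobManager node could then run the printed command, provided the name it passes matches that node's entry in the masters file.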
On Tue., Jun. 18, 2019, 10:05 p.m. PoolakkalMukkath, Shakir, <shakir_poolakkalmukk...@comcast.com> wrote:

> Hi Nick,
>
> It works that way by explicitly setting the --host. I got misled by the
> *“only”* word in the doc and did not try. Thanks for the help.
>
> Thanks,
> Shakir
>
> *From: *"Martin, Nick" <nick.mar...@ngc.com>
> *Date: *Tuesday, June 18, 2019 at 6:31 PM
> *To: *"PoolakkalMukkath, Shakir" <shakir_poolakkalmukk...@comcast.com>,
> Till Rohrmann <trohrm...@apache.org>, John Smith <java.dev....@gmail.com>
> *Cc: *user <user@flink.apache.org>
> *Subject: *RE: [EXTERNAL] Re: How to restart/recover on reboot?
>
> Jobmanager.sh takes an optional argument for the hostname to bind to, and
> start-cluster uses it. If you leave it blank, the script will use
> whatever is in flink-conf.yaml (localhost is the default value that ships
> with flink).
>
> The dockerized version of flink runs pretty much the way you’re trying to
> operate (i.e. each node starts itself), so the entrypoint script out of
> that is probably a good source of information about how to set it up.
>
> *From:* PoolakkalMukkath, Shakir [mailto:shakir_poolakkalmukk...@comcast.com]
> *Sent:* Tuesday, June 18, 2019 2:15 PM
> *To:* Till Rohrmann <trohrm...@apache.org>; John Smith <java.dev....@gmail.com>
> *Cc:* user <user@flink.apache.org>
> *Subject:* EXT :Re: [EXTERNAL] Re: How to restart/recover on reboot?
>
> Hi Till, John,
>
> I do agree with the issue John mentioned and have the same problem.
>
> We can only *start* a standalone HA cluster with the ./start-cluster.sh
> script. And then when there are failures, we can *restart* those
> components individually by calling jobmanager.sh/taskmanager.sh. This
> works great.
>
> But, like John mentioned, if we want to start the cluster initially
> by running jobmanager.sh on each JobManager node, it does not
> work. It binds to local and does not form the HA cluster.
>
> Thanks,
> Shakir
>
> *From: *Till Rohrmann <trohrm...@apache.org>
> *Date: *Tuesday, June 18, 2019 at 4:23 PM
> *To: *John Smith <java.dev....@gmail.com>
> *Cc: *user <user@flink.apache.org>
> *Subject: *[EXTERNAL] Re: How to restart/recover on reboot?
>
> I guess it should work if you installed a systemd service which simply
> calls `jobmanager.sh start` or `taskmanager.sh start`.
>
> Cheers,
> Till
>
> On Tue, Jun 18, 2019 at 4:29 PM John Smith <java.dev....@gmail.com> wrote:
>
> Yes, that is understood. But I don't see why we cannot call jobmanager.sh
> and taskmanager.sh to build the cluster and have them run as systemd units.
>
> I looked at start-cluster.sh and all it does is SSH and call jobmanager.sh,
> which then cascades to taskmanager.sh. I just have to pinpoint what's
> missing to have the systemd service working. In fact, calling jobmanager.sh
> as a systemd service actually sees the shared masters, slaves and
> flink-conf.yaml. But it binds to localhost.
>
> Maybe one way to do it would be to bootstrap the cluster with
> ./start-cluster.sh and then install systemd services for jobmanager.sh and
> taskmanager.sh.
>
> Like I said, I don't want to have some process in place to remind admins
> they need to manually start a node every time they patch or a host goes
> down for whatever reason.
>
> On Tue, 18 Jun 2019 at 04:31, Till Rohrmann <trohrm...@apache.org> wrote:
>
> When a single machine fails you should rather call
> `taskmanager.sh start`/`jobmanager.sh start` to start a single process.
> `start-cluster.sh` will start multiple processes on different machines.
>
> Cheers,
> Till
>
> On Mon, Jun 17, 2019 at 4:30 PM John Smith <java.dev....@gmail.com> wrote:
>
> Well some reasons, machine reboots/maintenance etc... Host/VM crashes and
> restarts. And same goes for the job manager. I don't want/need to have to
> document/remember some start process for sys admins/devops.
>
> So far I have looked at ./start-cluster.sh and all it seems to do is SSH
> into all the specified nodes and start the processes using the jobmanager
> and taskmanager scripts. I don't see anything special in any of the sh
> scripts.
> I configured passwordless ssh through terraform and all that works great
> only when trying to do the manual start through systemd. I may have
> something missing...
>
> On Mon, 17 Jun 2019 at 09:41, Till Rohrmann <trohrm...@apache.org> wrote:
>
> Hi John,
>
> I have not much experience wrt setting Flink up via systemd services. Why
> do you want to do it like that?
>
> 1. In standalone mode, Flink won't automatically restart TaskManagers.
> This only works on Yarn and Mesos atm.
>
> 2. In case of a lost TaskManager, you should run `taskmanager.sh start`.
> This script simply starts a new TaskManager process.
>
> 3. I guess you could use systemd to bring up a Flink TaskManager process
> on start up.
>
> Cheers,
> Till
>
> On Fri, Jun 14, 2019 at 5:56 PM John Smith <java.dev....@gmail.com> wrote:
>
> I looked into the start-cluster.sh and I don't see anything special. So
> technically it should be as easy as installing systemd services to run
> jobmanager.sh and taskmanager.sh respectively?
>
> On Wed, 12 Jun 2019 at 13:02, John Smith <java.dev....@gmail.com> wrote:
>
> The installation instructions do not indicate how to create systemd
> services.
>
> 1- When task nodes fail, will the job leader detect this and ssh and
> restart the task node? From my testing it doesn't seem like it.
>
> 2- How do we recover a lost node? Do we simply go back to the master node
> and run start-cluster.sh and the script is smart enough to figure out what
> is missing?
>
> 3- Or do we need to create systemd services and if so on which command do
> we start the service on?