Re: How to restart/recover on reboot?

John Smith Mon, 17 Jun 2019 07:31:01 -0700

Well, a few reasons: machine reboots, maintenance, host/VM crashes and
restarts, etc. The same goes for the job manager. I don't want to have to
document or remember a manual start procedure for sysadmins/devops.


So far I have looked at ./start-cluster.sh, and all it seems to do is SSH
into each of the specified nodes and start the processes using the
jobmanager and taskmanager scripts. I don't see anything special in any of
the sh scripts.
I configured passwordless SSH through Terraform and all that works great
only when trying to do the manual start through systemd. I may be
missing something...
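
For illustration, a minimal sketch of what a systemd unit for a TaskManager
could look like, written out by a small shell script. The /opt/flink install
path, the "flink" user, and the flink-taskmanager.service name are
assumptions made up for this example, not something from this thread or the
Flink docs. It uses the script's start-foreground mode so that systemd
supervises the JVM directly rather than a daemonized process.

```shell
#!/bin/sh
# Sketch only: writes a systemd unit for a Flink TaskManager.
# Assumptions (hypothetical): Flink lives in /opt/flink and runs as a
# dedicated "flink" user; adjust both to your install.
set -eu

# Write to a scratch directory by default so the file can be reviewed first;
# set UNIT_DIR=/etc/systemd/system (as root) to install it for real.
UNIT_DIR="${UNIT_DIR:-$(mktemp -d)}"
UNIT_FILE="$UNIT_DIR/flink-taskmanager.service"

cat > "$UNIT_FILE" <<'EOF'
[Unit]
Description=Apache Flink TaskManager
After=network-online.target
Wants=network-online.target

[Service]
# start-foreground keeps the JVM attached, so systemd can supervise it
# directly instead of tracking a daemonized process.
Type=simple
User=flink
ExecStart=/opt/flink/bin/taskmanager.sh start-foreground
Restart=on-failure
RestartSec=10

[Install]
WantedBy=multi-user.target
EOF

echo "wrote $UNIT_FILE"
```

Once the file is in /etc/systemd/system, "systemctl daemon-reload" followed
by "systemctl enable --now flink-taskmanager.service" should start it and
bring it back after a reboot. A JobManager unit would presumably look the
same with jobmanager.sh start-foreground as the command.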



On Mon, 17 Jun 2019 at 09:41, Till Rohrmann <trohrm...@apache.org> wrote:

> Hi John,
>
> I don't have much experience with setting Flink up via systemd services.
> Why do you want to do it like that?
>
> 1. In standalone mode, Flink won't automatically restart TaskManagers.
> This currently only works on YARN and Mesos.
> 2. In case of a lost TaskManager, you should run `taskmanager.sh start`.
> This script simply starts a new TaskManager process.
> 3. I guess you could use systemd to bring up a Flink TaskManager process
> on start up.
>
> Cheers,
> Till
>
> On Fri, Jun 14, 2019 at 5:56 PM John Smith <java.dev....@gmail.com> wrote:
>
>> I looked into start-cluster.sh and I don't see anything special. So
>> technically it should be as easy as installing systemd services to run
>> jobmanager.sh and taskmanager.sh respectively?
>>
>> On Wed, 12 Jun 2019 at 13:02, John Smith <java.dev....@gmail.com> wrote:
>>
>>> The installation instructions do not indicate how to create systemd
>>> services.
>>>
>>> 1- When task nodes fail, will the job leader detect this and ssh and
>>> restart the task node? From my testing it doesn't seem like it.
>>> 2- How do we recover a lost node? Do we simply go back to the master
>>> node and run start-cluster.sh and the script is smart enough to figure out
>>> what is missing?
>>> 3- Or do we need to create systemd services, and if so, which command
>>> should the service run?
>>>
>>
