Oh, I have figured out the problem. It has to do with my ~/.profile: at some point (I cannot remember when) I added a line to ~/.profile that sources my .zshrc, so every login shell ends up in zsh.
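
For anyone who runs into the same thing, here is a minimal sketch of a guard for ~/.profile. It assumes the hand-over to zsh is done with an `exec`, as the trace quoted below suggests, and that zsh lives at /bin/zsh; `should_exec_zsh` is a hypothetical helper name, not something from Flink or bash. The idea is to switch to zsh only when the login shell is interactive, so that a non-interactive `bash -l script.sh` keeps running the script.

```shell
# Hypothetical helper (not part of Flink or bash): succeed only if the
# shell is interactive and the given zsh binary is executable.
#   $1: the shell's option flags, normally "$-"
#   $2: path to the zsh binary (assumed to be /bin/zsh here)
should_exec_zsh() {
    case "$1" in
        *i*) [ -x "$2" ] ;;   # interactive: hand over if zsh is usable
        *)   return 1 ;;      # non-interactive (e.g. bash -l script.sh): stay in bash
    esac
}

# Sketch of the line in ~/.profile: a non-interactive login shell
# skips the exec and goes on to run the script it was given.
if should_exec_zsh "$-" /bin/zsh; then
    exec /bin/zsh -l
fi
```

With a guard like this, `nohup /bin/bash -l .../jobmanager.sh ...` should no longer be replaced by zsh, because `$-` does not contain `i` in that case.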

On Wed, Mar 7, 2018 at 2:13 AM, Yesheng Ma <kimi.y...@gmail.com> wrote:

> Related source code:
> https://github.com/apache/flink/blob/master/flink-dist/src/main/flink-bin/bin/start-cluster.sh#L40
>
> On Wed, Mar 7, 2018 at 2:11 AM, Yesheng Ma <kimi.y...@gmail.com> wrote:
>
>> Hi Nico,
>>
>> Thanks for your reply. My major concern is actually the `-l` argument.
>> The command I executed is: `nohup /bin/bash -x -l
>> "/state/partition1/ysma/flink-1.4.1/bin/jobmanager.sh" start cluster
>> dell-01.epcc 8091`, with and without the `-l` argument (the script in
>> Flink's bin directory uses the `-l` argument).
>>
>> 1) with the `-l` argument: the log is quite messy, but there are some
>> clues; the last executed command starts a zsh shell:
>> ```
>> + . /home/ysma/.bashrc
>> ++ case $- in
>> ++ return
>> + PATH=/home/ysma/bin:/home/ysma/.local/bin:/state/partition1/ysma/redis-4.0.8/../bin:/home/ysma/env/jdk1.8.0_151/bin:/home/ysma/env/maven/bin:/home/ysma/bin:/home/ysma/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
>> + '[' -f /bin/zsh ']'
>> + exec /bin/zsh -l
>> ```
>> I guess the `bash -l` argument detects the user's login shell and then
>> logs in to a zsh shell (which I'm currently using) and never returns.
>>
>> 2) without the `-l` argument, everything just goes fine.
>>
>> Therefore I suspect there might be something wrong with the `-l`
>> argument, or something wrong with my bash config? Any ideas? Thanks!
>>
>>
>> On Wed, Mar 7, 2018 at 12:20 AM, Nico Kruber <n...@data-artisans.com>
>> wrote:
>>
>>> Hi Yesheng,
>>> `nohup /bin/bash -l bin/jobmanager.sh start cluster ...` looks a bit
>>> strange since it should (imho) be an absolute path towards flink.
>>>
>>> What you could do to diagnose further, is to try to run the ssh command
>>> manually, i.e. figure out what is being executed by calling
>>> bash -x ./bin/start-cluster.sh
>>> and then run the ssh command without "-n" and not in background "&".
>>> Then you should also see the JobManager stdout to diagnose further.
>>>
>>> If that does not help yet, please log into the master manually and
>>> execute the "nohup /bin/bash..." command there to see what is going on.
>>>
>>> Depending on where the failure was, there may even be logs on the master
>>> machine.
>>>
>>>
>>> Nico
>>>
>>> On 04/03/18 15:52, Yesheng Ma wrote:
>>> > Hi all,
>>> >
>>> > When I execute bin/start-cluster.sh on the master machine, actually
>>> > the command `nohup /bin/bash -l bin/jobmanager.sh start cluster ...` is
>>> > executed, which does not open the job manager properly.
>>> >
>>> > I think there might be something wrong with the `-l` argument, since
>>> > when I use the `bin/jobmanager.sh start` command, everything is fine.
>>> > Kindly point out if I've done any configuration wrong. Thanks!
>>> >
>>> > Best,
>>> > Yesheng