I had to read through your email a few times to fully understand it. You
provided lots of useful information; thank you!

I tried changing the code in my .bash_profile to what you suggested; after
logging out and logging back in, zsh was my shell in interactive mode. I
then submitted a job via jsub and that also seemed to work correctly. In
short, it seems like what you suggested takes care of my problem. I will
let you know if I find any evidence otherwise.

On Tue, Nov 23, 2021 at 12:06 PM YiFei Zhu <zhuyifei1...@gmail.com> wrote:

> On Wed, Nov 17, 2021 at 1:04 AM YiFei Zhu <zhuyifei1...@gmail.com> wrote:
> > On Tue, Nov 16, 2021 at 6:38 PM Huji Lee <huji.h...@gmail.com> wrote:
> > >
> > > I went back and reactivated the line in .bash_profile which enabled
> zsh ("exec zsh" as the last line of .bash_profile)
> > >
> > > Then I submitted the job to the grid, using a command like this:
> > >
> > > jsub -N "n"  -once -o ~/err/nightly.out -e ~/err/nightly.err
> ~/grid/jobs/nightly.sh
> > >
> > > I did it three ways. First, I used the nightly.sh file as is (see
> source). Second, I replaced "source" with "." and third I replaced "source"
> with "bash". In all three cases, it failed, without even producing an
> output or error. The nightly.out and nightly.err files were created of
> course, but were empty.
> > >
> > > Next, I added a "#!/bin/bash" shebang and ran it again all three ways.
> The result was the same.
> > >
> > > Running qstat many times shows that the job gets into a queued state
> ("qw") and after a few seconds, it goes into the run state ("r") and
> immediately stops.
> > >
> > > Removing the "exec zsh" command from .bash_profile will make things
> work again.
> > >
> > > Finally, I decided maybe the problem is that zsh is available for me,
> but not on the grid. So I changed the .bash_profile ending from a single
> "exec zsh" command to this:
> > >
> > > if [ -f /usr/bin/zsh ]; then
> > >     zsh
> > > fi
> > >
> > > Under this config, jobs on the grid worked, and when I used "become"
> to login as my tool, I ended with zsh. Obviously, I am happy with this
> workaround. But I am still curious as to the root cause.
> > >
> > > Is it really that zsh is not available on the grid, and the grid tries
> to replicate my environment first and reaches the "exec zsh" command and
> falls apart somehow?
> > >
> >
> > This is consistent with what I described earlier:
> >
> > > Since you have "exec zsh" in your
> > > .bash_profile, bash will run it at startup as a login shell, which in
> > > theory would immediately replace itself with zsh with no arguments.
> > > zsh will then see it has no arguments, attempt to read a script from
> > > stdin, get nothing, and immediately exit, stopping the job on the grid.
> >
> > However, now that you have "zsh" instead of "exec zsh", the "replace"
> > is not done. bash as the login shell executes zsh as a subshell, and
> > zsh, having no inputs, immediately exits. The execution continues as
> > if nothing had ever happened.
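
The difference between the two forms can be reproduced without the grid. This is only a rough sketch, not something from the thread: `sh` is used as a stand-in for zsh (an assumption, so that it runs where zsh is not installed), and stdin is redirected from /dev/null to mimic a job with no input.

```shell
# "sh" stands in for zsh here (assumption); /dev/null mimics the empty stdin.

# With exec, bash is replaced by the new shell, which reads EOF and exits.
# The echo after it never runs.
with_exec=$(bash -c 'exec sh </dev/null; echo "still in bash"')

# Without exec, the new shell runs as a child, exits at EOF, and bash goes on.
without_exec=$(bash -c 'sh </dev/null; echo "still in bash"')

echo "with exec:    [$with_exec]"
echo "without exec: [$without_exec]"
```

The first variable should come back empty, which matches the empty nightly.out described earlier in the thread.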
> >
> > I just tested how bash invokes .bash_profile by adding a sleep 60 to
> > .bash_profile and giving my test.sh a shebang. I submitted a job both
> > with an explicit 'bash' and without, and it looks like .bash_profile
> > is executed in both cases:
> >
> >   USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME
> COMMAND
> >   sgeadmin   762  0.4  0.1 111020 16056 ?        Sl   Mar25 1383:08
> > /usr/lib/gridengine/sge_execd
> >   [...]
> >   sgeadmin 20388  0.0  0.1  51468  8540 ?        S    07:57   0:00  \_
> > /usr/lib/gridengine/sge_shepherd -bg
> >   tools.z+ 20390  0.0  0.0  23580  3196 ?        Ss   07:57   0:00
> >  \_ -bash -c /data/project/zhuyifei1999-test/test.sh
> >   tools.z+ 20393  0.0  0.0   5796   672 ?        S    07:57   0:00
> >      \_ sleep 60
> >
> >   USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME
> COMMAND
> >   sgeadmin   752  0.3  0.1 115112 16100 ?        Sl   Mar25 1313:16
> > /usr/lib/gridengine/sge_execd
> >   [...]
> >   sgeadmin  8715  0.0  0.1  51468  8688 ?        S    07:57   0:00  \_
> > /usr/lib/gridengine/sge_shepherd -bg
> >   tools.z+  8717  0.0  0.0  23580  3324 ?        Ss   07:57   0:00
> >  \_ -bash -c /bin/bash /data/project/zhuyifei1999-test/test.sh
> >   tools.z+  8720  0.0  0.0   5796   656 ?        S    07:57   0:00
> >      \_ sleep 60
> >
> > It did take me by surprise that it's still bash that invokes the given
> > command, because bash was not in the process tree for a usual "jsub
> > [...] python script.sh". For example, a non-continuous job typically
> > looks like this:
> >
> >   sgeadmin 28386  0.0  0.1  51468  8588 ?        S    Nov15   0:00  \_
> > /usr/lib/gridengine/sge_shepherd -bg
> >   tools.f+ 28388  7.2  3.5 427144 293024 ?       Ss   Nov15 210:55  |
> >  \_ /usr/bin/python pycore/pwb.py pycore/fawikibot/rade.py -newcat:10
> >
> > And a continuous one:
> >
> >   sgeadmin  3699  0.0  0.0  51464  4540 ?        S    Apr19   0:00  \_
> > /usr/lib/gridengine/sge_shepherd -bg
> >   tools.b+  3701  0.0  0.0   4280    68 ?        SNs  Apr19   0:00  |
> >  \_ /bin/sh
> /var/spool/gridengine/execd/tools-sgeexec-0942/job_scripts/1302451
> >   tools.b+  3702  0.2  2.8 505104 231092 ?       SNl  Apr19 674:45  |
> >      \_ /usr/bin/python bot2.py
> >
> > There is no `-bash -c "python script.sh"`
> >
> > However, if you trace what's going on, for a non-interactive bash that
> > only receives a single command, it will directly execve that command:
> >
> >   $ strace -e clone,execve bash -c '/bin/true'
> >   execve("/bin/bash", ["bash", "-c", "/bin/true"], [/* 26 vars */]) = 0
> >   execve("/bin/true", ["/bin/true"], [/* 25 vars */]) = 0
> >   +++ exited with 0 +++
> >
> > It does not involve child processes from the fork-exec model you'd
> > expect. Therefore, we can say that no matter what you do with the job
> > submission, a bash non-interactive login shell will be executed to run
> > the command you specified to jsub. And the mess of "bash replaces
> > itself with zsh, which immediately exits because stdin is empty" will
> > apply.
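
The same direct exec can be observed without strace by comparing process IDs. This is a hedged sketch, assuming a bash that applies the single-command behaviour seen in the strace output: the outer bash expands `$$` to its own PID, while the escaped `\$\$` is expanded later by the inner `sh`.

```shell
# Single command: bash execs sh directly, so both PIDs should be the same.
single=$(bash -c 'sh -c "echo $$ \$\$"')

# Two commands: bash must stay around to run "true", so sh is forked
# and gets a PID of its own.
multi=$(bash -c 'sh -c "echo $$ \$\$"; true')

echo "single command: $single"
echo "two commands:   $multi"
```

If the two numbers on the first line match and the two on the second line differ, bash replaced itself with the command in the single-command case.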
> >
> > I think it is important to clarify that a shell like bash has 4 modes
> > of execution, defined by whether it is an interactive shell, and
> > whether it is a login shell. The details for the modes in the case of
> > bash you can find in its man page [1]. But tl;dr:
> >
> > Login shells:
> > - Upon startup, sources /etc/profile, then the first one among
> > ~/.bash_profile, ~/.bash_login, and ~/.profile, that exists.
> > - `bash -l` and `-bash` (note the dash sign at the front) make bash a
> > login shell
> >
> > Non-login shells:
> > - If also interactive, upon startup, sources ~/.bashrc
> >
> > Interactive shells:
> > - Displays a prompt for each command
> >
> > Non-interactive shells:
> > - Upon startup, sources $BASH_ENV if it exists
> > - As we saw above, if the command is given in the command string in -c
> > and there is only one command, bash does not fork-exec the command but
> > execs the command directly.
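
To check which of these modes a given bash is in, something like the following can be used. It is a sketch rather than part of the thread: `shopt -q login_shell` and the `i` flag in `$-` are standard bash features, while the `mode_probe` variable name is made up for illustration. `--noprofile` and `--norc` keep the startup files out of the way so the output is predictable.

```shell
# Probe script (hypothetical name) that reports the mode of the bash running it.
mode_probe='
shopt -q login_shell && login=login || login=non-login
case $- in *i*) inter=interactive ;; *) inter=non-interactive ;; esac
echo "$inter $login"
'

# Plain "bash -c": non-interactive, non-login.
plain=$(bash --noprofile --norc -c "$mode_probe")

# "bash -l -c": non-interactive login shell, like the "-bash -c" in the
# process trees above.
login=$(bash --noprofile --norc -l -c "$mode_probe")

echo "bash -c:    $plain"
echo "bash -l -c: $login"
```

Pasting the three lines of the probe into a terminal after `become` reports the mode of that session.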
> >
> > So you might wonder why login shells (profile) and non-login shells
> > (rc) are separated. The reason is that some parts of the environment
> > are inherited by subshells while others are not. Environment
> > variables are inherited:
> >
> >   $ export FOO=bar
> >   $ echo $FOO
> >   bar
> >   $ bash
> >   $ echo $FOO
> >   bar
> >
> > While things like aliases are not:
> >
> >   $ alias foo='echo bar'
> >   $ foo
> >   bar
> >   $ bash
> >   $ foo
> >   bash: foo: command not found
> >
> > There are environment setups that get inherited but that you do not
> > want executed over and over by subshells. For example, appending to
> > $PATH (`export PATH="$PATH:/path/to/bin"`). If it is in rc instead of
> > profile, every time you run an interactive bash subshell PATH gets
> > longer and more redundant; hence $PATH setups normally go to profile
> > instead of rc. Non-inheritable setups like aliases go to rc. And the
> > separation between .bash_profile and .profile is just so that you can
> > have a .bash_profile that uses bash-specific syntax. I never needed
> > any so I always use .profile.
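
The PATH growth is easy to reproduce. In this sketch, which is an illustration and not from the thread, `$BASH_ENV` stands in for an rc file so that the demo works non-interactively; `/path/to/bin` is the placeholder from the paragraph above, and `count_entries` is a made-up helper.

```shell
# Temporary startup file that appends to PATH every time it is sourced.
rc_file=$(mktemp)
printf '%s\n' 'export PATH="$PATH:/path/to/bin"' > "$rc_file"

# Hypothetical helper: count PATH entries that are exactly /path/to/bin.
count_entries() {
    printf '%s\n' "$1" | tr ':' '\n' | grep -cx '/path/to/bin'
}

# One shell level: the file is sourced once.
one_level=$(BASH_ENV="$rc_file" bash -c 'echo "$PATH"')

# Two levels: the inner bash inherits the extended PATH and appends again.
two_levels=$(BASH_ENV="$rc_file" bash -c 'bash -c "echo \$PATH"')

echo "one level:  $(count_entries "$one_level") entries"
echo "two levels: $(count_entries "$two_levels") entries"
rm -f "$rc_file"
```

Each extra level of nesting adds one more copy of the same directory, which is the redundancy that keeping PATH setup in the profile avoids.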
> >
> > And to have bash login shells also get the initialization from rc,
> > .profile usually has a header like this:
> >
> >   # if running bash
> >   if [ -n "$BASH_VERSION" ]; then
> >       # include .bashrc if it exists
> >       if [ -f "$HOME/.bashrc" ]; then
> >           . "$HOME/.bashrc"
> >       fi
> >   fi
> >
> > And .bashrc:
> >
> >   # Test for an interactive shell
> >   if [[ $- != *i* ]] ; then
> >           # Shell is non-interactive.  Be done now!
> >           return
> >   fi
> >
> > I hope this makes sense. Let me know if not.
> >
> > Back to your question, let's see in what scenarios you would want to
> invoke zsh:
> > - Non-interactive shells: No, you don't want `bash command.sh` randomly
> exec zsh
> > - Interactive non-login shells: No, if you explicitly run `bash`, you
> > want bash not zsh.
> > - Interactive login shells: Yes, this is what `become tool` runs
> > initially and you want zsh here.
> >
> > Hence, to run in a login shell environment you'd want .profile or
> > .bash_profile. The interactive guard is simply [[ $- = *i* ]] in bash
> > syntax, so what you want, expressed in code, is this in .bash_profile:
> >
> >   if [[ $- = *i* ]]; then
> >           exec zsh
> >   fi
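
A quick way to convince yourself that this guard leaves grid jobs alone is to run a non-interactive login shell against a throwaway home directory. This is a sketch under assumptions: the `echo` replaces `exec zsh` so it runs without zsh installed, `fake_home` is a made-up name, and it assumes /etc/profile prints nothing on your system.

```shell
# Throwaway HOME whose .bash_profile contains the guard, with echo in place
# of "exec zsh" (assumption, so the sketch does not depend on zsh).
fake_home=$(mktemp -d)
cat > "$fake_home/.bash_profile" <<'EOF'
if [[ $- = *i* ]]; then
        echo "interactive: would exec zsh"
fi
EOF

# Non-interactive login shell, similar to the "-bash -c ..." the grid runs.
job_output=$(HOME="$fake_home" bash -l -c 'echo "job ran"')

echo "$job_output"
rm -rf "$fake_home"
```

The guard message should not appear in the output, because `$-` has no `i` flag in a shell started this way.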
> >
> > As a side note, yes zsh exists on the grid hosts:
> >
> >   zhuyifei1999@tools-sgeexec-0901: ~$ ls -l {/usr,}/bin/zsh
> >   -rwxr-xr-x 1 root root 819744 Dec  1  2020 /bin/zsh
> >   lrwxrwxrwx 1 root root      8 Nov 22  2018 /usr/bin/zsh -> /bin/zsh
> >
> > [1] https://man7.org/linux/man-pages/man1/bash.1.html#INVOCATION
> >
> > YiFei Zhu
>
> Have you had a chance to take a look at it yet?
>
> YiFei Zhu
> _______________________________________________
> Cloud mailing list -- cloud@lists.wikimedia.org
> List information:
> https://lists.wikimedia.org/postorius/lists/cloud.lists.wikimedia.org/
>