I had to read through your email a few times to fully understand it. You provided lots of useful information; thank you!
I tried changing the code in my .bash_profile to what you suggested; after logging out and logging back in, zsh was my shell in interactive mode. I then submitted a job via jsub and that also seemed to work correctly.

In short, it seems like what you suggested takes care of my problem. I will let you know if I find any evidence otherwise.

On Tue, Nov 23, 2021 at 12:06 PM YiFei Zhu <zhuyifei1...@gmail.com> wrote:
> On Wed, Nov 17, 2021 at 1:04 AM YiFei Zhu <zhuyifei1...@gmail.com> wrote:
> > On Tue, Nov 16, 2021 at 6:38 PM Huji Lee <huji.h...@gmail.com> wrote:
> > >
> > > I went back and reactivated the line in .bash_profile which enabled zsh ("exec zsh" as the last line of .bash_profile)
> > >
> > > Then I submitted the job to the grid, using a command like this:
> > >
> > > jsub -N "n" -once -o ~/err/nightly.out -e ~/err/nightly.err ~/grid/jobs/nightly.sh
> > >
> > > I did it three ways. First, I used the nightly.sh file as is (see source). Second, I replaced "source" with "." and third I replaced "source" with "bash". In all three cases, it failed, without even producing an output or error. The nightly.out and nightly.err files were created of course, but were empty.
> > >
> > > Next, I added a "#!/bin/bash" shebang and ran it again all three ways. Result was the same.
> > >
> > > Running qstat many times shows that the job gets into a queued state ("qw") and after a few seconds, it goes into the run state ("r") and immediately stops.
> > >
> > > Removing the "exec zsh" command from .bash_profile will make things work again.
> > >
> > > Finally, I decided maybe the problem is that zsh is available for me, but not on the grid. So I changed the .bash_profile ending from a single "exec zsh" command to this:
> > >
> > > if [ -f /usr/bin/zsh ]; then
> > >     zsh
> > > fi
> > >
> > > Under this config, jobs on the grid worked, and when I used "become" to login as my tool, I ended with zsh. Obviously, I am happy with this workaround.
> > > But I am still curious as to the root cause.
> > >
> > > Is it really that zsh is not available on the grid, and the grid tries to replicate my environment first and reaches the "exec zsh" command and falls apart somehow?
> > >
> >
> > This is consistent with what I described earlier:
> >
> > > Since you have "exec zsh" in your .bash_profile, bash will run it at startup as a login shell, which in theory would immediately replace itself with zsh with no arguments. zsh will then see it has no arguments, attempt to read a script from stdin, get nothing, and immediately exit, stopping the job in grid.
> >
> > However, now that you have "zsh" instead of "exec zsh", the "replace" is not done. bash as the login shell executes zsh as a subshell, and zsh, having no inputs, immediately exits. The execution continues as if nothing had ever happened.
> >
> > I just tested how bash invokes .bash_profile by adding a sleep 60 to .bash_profile and giving my test.sh a shebang. I submitted one job with an explicit 'bash' and one without, and it looks like .bash_profile is executed in both cases:
> >
> > USER       PID %CPU %MEM    VSZ   RSS TTY STAT START    TIME COMMAND
> > sgeadmin   762  0.4  0.1 111020 16056 ?   Sl   Mar25 1383:08 /usr/lib/gridengine/sge_execd
> > [...]
> > sgeadmin 20388  0.0  0.1  51468  8540 ?   S    07:57    0:00  \_ /usr/lib/gridengine/sge_shepherd -bg
> > tools.z+ 20390  0.0  0.0  23580  3196 ?   Ss   07:57    0:00      \_ -bash -c /data/project/zhuyifei1999-test/test.sh
> > tools.z+ 20393  0.0  0.0   5796   672 ?   S    07:57    0:00          \_ sleep 60
> >
> > USER       PID %CPU %MEM    VSZ   RSS TTY STAT START    TIME COMMAND
> > sgeadmin   752  0.3  0.1 115112 16100 ?   Sl   Mar25 1313:16 /usr/lib/gridengine/sge_execd
> > [...]
> > sgeadmin  8715  0.0  0.1  51468  8688 ?   S    07:57    0:00  \_ /usr/lib/gridengine/sge_shepherd -bg
> > tools.z+  8717  0.0  0.0  23580  3324 ?   Ss   07:57    0:00      \_ -bash -c /bin/bash /data/project/zhuyifei1999-test/test.sh
> > tools.z+  8720  0.0  0.0   5796   656 ?   S    07:57    0:00          \_ sleep 60
> >
> > It did take me by surprise that it's still bash that invokes the given command, because bash was not in the process tree for a usual "jsub [...] python script.sh". For example, a non-continuous job typically looks like this:
> >
> > sgeadmin 28386  0.0  0.1  51468   8588 ?   S    Nov15    0:00  \_ /usr/lib/gridengine/sge_shepherd -bg
> > tools.f+ 28388  7.2  3.5 427144 293024 ?   Ss   Nov15  210:55  |   \_ /usr/bin/python pycore/pwb.py pycore/fawikibot/rade.py -newcat:10
> >
> > And a continuous one:
> >
> > sgeadmin  3699  0.0  0.0  51464   4540 ?   S    Apr19    0:00  \_ /usr/lib/gridengine/sge_shepherd -bg
> > tools.b+  3701  0.0  0.0   4280     68 ?   SNs  Apr19    0:00  |   \_ /bin/sh /var/spool/gridengine/execd/tools-sgeexec-0942/job_scripts/1302451
> > tools.b+  3702  0.2  2.8 505104 231092 ?   SNl  Apr19  674:45  |       \_ /usr/bin/python bot2.py
> >
> > There is no `-bash -c "python script.sh"`
> >
> > However, if you trace what's going on, a non-interactive bash that only receives a single command will directly execve that command:
> >
> > $ strace -e clone,execve bash -c '/bin/true'
> > execve("/bin/bash", ["bash", "-c", "/bin/true"], [/* 26 vars */]) = 0
> > execve("/bin/true", ["/bin/true"], [/* 25 vars */]) = 0
> > +++ exited with 0 +++
> >
> > It does not involve child processes from the fork-exec model you'd expect. Therefore, we can say that no matter what you do with the job submission, a bash non-interactive login shell will be executed to run the command you specified to jsub. And the mess of "bash replaces itself with zsh, which immediately exits because stdin is empty" will apply.
> >
> > I think it is important to clarify that a shell like bash has 4 modes of execution, defined by whether it is an interactive shell, and whether it is a login shell.
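A quick way to see which of these four modes a given bash is actually running in might look like the sketch below. It is only an illustration: `shell_mode` is a hypothetical helper name that does not appear in this thread. It relies on bash's `login_shell` shopt option and on the `i` flag in `$-`.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not from the thread): report which of the four
# bash modes the current shell is running in.
shell_mode() {
    local login interactive

    # The read-only shopt option "login_shell" is set when bash was
    # started as a login shell (`bash -l`, or argv[0] starting with "-").
    if shopt -q login_shell; then
        login="login"
    else
        login="non-login"
    fi

    # $- holds the current option flags; "i" is present only when the
    # shell is interactive.
    case $- in
        *i*) interactive="interactive" ;;
        *)   interactive="non-interactive" ;;
    esac

    printf '%s %s\n' "$login" "$interactive"
}

shell_mode
```

Run as a plain script, I would expect this to report a non-login, non-interactive shell; pasted into the shell that `become` starts, it should report a login, interactive one.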
> > You can find the details of these modes for bash in its man page [1]. But tl;dr:
> >
> > Login shells:
> > - Upon startup, sources /etc/profile, then the first of ~/.bash_profile, ~/.bash_login, and ~/.profile that exists.
> > - `bash -l` and `-bash` (note the dash sign at the front) make bash a login shell
> >
> > Non-login shells:
> > - If also interactive, upon startup, sources ~/.bashrc
> >
> > Interactive shells:
> > - Displays a prompt for each command
> >
> > Non-interactive shells:
> > - Upon startup, sources the file named by $BASH_ENV, if it is set
> > - As we saw above, if the command is given in the command string in -c and there is only one command, bash does not fork-exec the command but execs the command directly.
> >
> > So you might wonder why the separation of login shells (profile) vs non-login shells (rc). The reason is that some parts of the environment are inherited by subshells while others are not. Environment variables are inherited:
> >
> > $ export FOO=bar
> > $ echo $FOO
> > bar
> > $ bash
> > $ echo $FOO
> > bar
> >
> > While things like aliases are not:
> >
> > $ alias foo='echo bar'
> > $ foo
> > bar
> > $ bash
> > $ foo
> > bash: foo: command not found
> >
> > There are environment setups that get inherited but that you do not want executed over and over by subshells. For example, appending to $PATH (`export PATH="$PATH:/path/to/bin"`). If it is in rc instead of profile, every time you run an interactive bash subshell PATH gets longer and more redundant; hence $PATH setups normally go to profile instead of rc. Non-inheritable setups like aliases go to rc. And the separation between .bash_profile and .profile is just so that you can have a .bash_profile that uses bash-specific syntax. I never needed any so I always use .profile.
> >
> > And to have bash login shells also get the initialization from rc, .profile usually has a header like this:
> >
> > # if running bash
> > if [ -n "$BASH_VERSION" ]; then
> >     # include .bashrc if it exists
> >     if [ -f "$HOME/.bashrc" ]; then
> >         . "$HOME/.bashrc"
> >     fi
> > fi
> >
> > And .bashrc:
> >
> > # Test for an interactive shell
> > if [[ $- != *i* ]] ; then
> >     # Shell is non-interactive. Be done now!
> >     return
> > fi
> >
> > I hope this makes sense. Let me know if not.
> >
> > Back to your question, let's see in what scenarios you would want to invoke zsh:
> > - Non-interactive shells: No, you don't want `bash command.sh` to randomly exec zsh.
> > - Interactive non-login shells: No, if you explicitly run `bash`, you want bash not zsh.
> > - Interactive login shells: Yes, this is what `become tool` runs initially and you want zsh here.
> >
> > Hence, to run in a login shell environment you'd want the .profile or .bash_profile. And the interactive guard is simply [[ $- = *i* ]] in bash syntax, so what you want, expressed in code, is in .bash_profile:
> >
> > if [[ $- = *i* ]]; then
> >     exec zsh
> > fi
> >
> > As a side note, yes zsh exists on the grid hosts:
> >
> > zhuyifei1999@tools-sgeexec-0901: ~$ ls -l {/usr,}/bin/zsh
> > -rwxr-xr-x 1 root root 819744 Dec  1  2020 /bin/zsh
> > lrwxrwxrwx 1 root root      8 Nov 22  2018 /usr/bin/zsh -> /bin/zsh
> >
> > [1] https://man7.org/linux/man-pages/man1/bash.1.html#INVOCATION
> >
> > YiFei Zhu
>
> Have you had a chance to take a look at it yet?
>
> YiFei Zhu
> _______________________________________________
> Cloud mailing list -- cloud@lists.wikimedia.org
> List information: https://lists.wikimedia.org/postorius/lists/cloud.lists.wikimedia.org/
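P.S. For the archive, here is a minimal sketch of the kind of guard discussed in this thread, written so that the decision can be checked without actually switching shells. The helper name `wants_zsh` and the extra `command -v zsh` check are my own assumptions for illustration and are not part of the suggestion; the version from the thread is just the `[[ $- = *i* ]]` test followed by `exec zsh`.

```bash
# Hypothetical helper (not from the thread): decide whether this shell
# should hand over to zsh. $1 is the shell's option flags, normally "$-".
wants_zsh() {
    case $1 in
        # Interactive shell: switch only if a zsh binary is on PATH.
        *i*) command -v zsh >/dev/null 2>&1 ;;
        # Non-interactive shell (for example a grid job): stay in bash.
        *)   return 1 ;;
    esac
}

# At the end of .bash_profile:
if wants_zsh "$-"; then
    exec zsh
fi
```

With this shape, a job started through jsub never reaches the `exec zsh` line, because its flags contain no `i`, while an interactive login still ends up in zsh when it is installed.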