Doh; yes we did. This was a minor glitch in porting the 1.2 series fix to the trunk/v1.3 (i.e., the fix in v1.2.8 is ok -- whew!).
Fixed on the trunk in r19758; thanks for noticing. I'll file a CMR for v1.3.
On Oct 16, 2008, at 7:05 PM, Mostyn Lewis wrote:
Jeff,
You broke my ksh (and I expect something else)
Today's SVN 1.4a1r19757
orte/mca/plm/rsh/plm_rsh_module.c
line 471:
tmp = opal_argv_split("( test ! -r ./.profile || . ./.profile;", ' ');
                       ^
ARGHH -- that "(" should not be there. Dropping it:
tmp = opal_argv_split(" test ! -r ./.profile || . ./.profile;", ' ');
and all is well again :)
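For what it's worth, the breakage seems to come from that unbalanced "(" in the remote command that gets assembled; a stripped-down illustration (the echo is just a placeholder for the real launch command, and the exact error wording depends on the shell):
# With the stray "(", the fragment is unbalanced and Bourne-family
# shells (ksh included) refuse to parse it at all:
sh -c '( test ! -r ./.profile || . ./.profile; echo launched'
# -> syntax error (unexpected end of file / missing ")")
# Without the "(", the same fragment parses and runs:
sh -c 'test ! -r ./.profile || . ./.profile; echo launched'
# -> launched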
Regards,
Mostyn
On Thu, 9 Oct 2008, Jeff Squyres wrote:
FWIW, the fix has been pushed into the trunk, 1.2.8, and 1.3 SVN branches. So I'll probably take down the hg tree (we use those as temporary branches).
On Oct 9, 2008, at 2:32 PM, Hahn Kim wrote:
Hi,
Thanks for providing a fix, and sorry for the delay in responding. Once I found out about -x, I've been busy working on the rest of our code, so I haven't had the time to try out the fix. I'll take a look at it as soon as I can and will let you know how it works out.
Hahn
On Oct 7, 2008, at 5:41 PM, Jeff Squyres wrote:
On Oct 7, 2008, at 4:19 PM, Hahn Kim wrote:
you probably want to set the LD_LIBRARY_PATH (and PATH, likely, and possibly others, such as that LICENSE key, etc.) regardless of whether it's an interactive or non-interactive login.
Right, that's exactly what I want to do. I was hoping that mpirun would run .profile as the FAQ page stated, but the -x fix works for now.
If you're using Bash, it should be running .bashrc. But it looks like you did identify a bug that we're *not* running .profile. I have a Mercurial branch up with a fix if you want to give it a spin:
http://www.open-mpi.org/hg/hgwebdir.cgi/jsquyres/sh-profile-fixes/
I just realized that I'm using .bash_profile on the x86 and need to move its contents into .bashrc and call .bashrc from .bash_profile, since eventually I will also be launching MPI jobs onto other x86 processors.
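In case it's useful to anyone else, here's a minimal sketch of that arrangement. The paths shown are just the ones mentioned in this thread; substitute whatever your environment actually needs.
# ~/.bash_profile -- read by Bash for login shells.
# Just delegate to .bashrc so login and non-login shells match.
if [ -f "$HOME/.bashrc" ]; then
    . "$HOME/.bashrc"
fi

# ~/.bashrc -- read by Bash for non-login interactive shells (and, on
# most Linux builds of Bash, by non-interactive shells started over ssh).
# Keep the environment exports here so every shell type gets them.
export PATH=/tools/openmpi-1.2.5/bin:$PATH
export LD_LIBRARY_PATH=/opt/cell/toolchain/lib/gcc/ppu/4.1.1/32:$LD_LIBRARY_PATH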
Thanks to everyone for their help.
Hahn
On Oct 7, 2008, at 2:16 PM, Jeff Squyres wrote:
On Oct 7, 2008, at 12:48 PM, Hahn Kim wrote:
Regarding 1., we're actually using 1.2.5. We started using Open MPI last winter and just stuck with it. For now, using the -x flag with mpirun works. If this really is a bug in 1.2.7, then I think we'll stick with 1.2.5 for now, then upgrade later when it's fixed.
It looks like this behavior has been the same throughout the entire 1.2 series.
Regarding 2., are you saying I should run the commands you suggest from the x86 node running bash, so that ssh logs into the Cell node running Bourne?
I'm saying that if "ssh othernode env" gives different answers than "ssh othernode"/"env", then your .bashrc or .profile or whatever is dumping out early depending on whether you have an interactive login or not. This is the real cause of the error -- you probably want to set the LD_LIBRARY_PATH (and PATH, likely, and possibly others, such as that LICENSE key, etc.) regardless of whether it's an interactive or non-interactive login.
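One common way this happens is a guard near the top of the dot file that bails out for non-interactive shells, with all the exports below it. Something along these lines (purely illustrative -- your file will differ, and the exports are just the ones from your env output):
# Typical distro-supplied guard at the top of ~/.bashrc (or ~/.profile):
case $- in
    *i*) ;;        # interactive shell: keep reading the file
    *)   return ;; # non-interactive (e.g. "ssh othernode env"): stop here
esac

# Anything below the guard is never seen by non-interactive logins,
# which is why "ssh othernode env" shows a vanilla environment.
# Exports that must be visible everywhere need to go *above* the guard:
export LD_LIBRARY_PATH=/opt/cell/toolchain/lib/gcc/ppu/4.1.1/32
export MCS_LICENSE_PATH=/opt/MultiCorePlus/mcf.key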
When I run "ssh othernode env" from the x86 node, I get the
following vanilla environment:
USER=ha17646
HOME=/home/ha17646
LOGNAME=ha17646
SHELL=/bin/sh
PWD=/home/ha17646
When I run "ssh othernode" from the x86 node, then run "env"
on the
Cell, I get the following:
USER=ha17646
LD_LIBRARY_PATH=/opt/cell/toolchain/lib/gcc/ppu/4.1.1/32
HOME=/home/ha17646
MCS_LICENSE_PATH=/opt/MultiCorePlus/mcf.key
LOGNAME=ha17646
TERM=xterm-color
PATH=/usr/local/bin:/usr/bin:/sbin:/bin:/tools/openmpi-1.2.5/bin:/tools/cmake-2.4.7/bin:/tools
SHELL=/bin/sh
PWD=/home/ha17646
TZ=EST5EDT
Hahn
On Oct 7, 2008, at 12:07 PM, Jeff Squyres wrote:
Ralph and I just talked about this a bit:
1. In all released versions of OMPI, we *do* source the .profile file on the target node if it exists (because vanilla Bourne shells do not source anything on remote nodes -- Bash does, though, per the FAQ). However, looking in 1.2.7, it looks like it might not be executing that code -- there *may* be a bug in this area. We're checking into it.
2. You might want to check your configuration to see if your .bashrc is dumping out early because it's a non-interactive shell. Check the output of:
ssh othernode env
vs.
ssh othernode
env
(i.e., a non-interactive running of "env" vs. an interactive login and then running "env")
On Oct 7, 2008, at 8:53 AM, Ralph Castain wrote:
I am unaware of anything in the code that would "source .profile" for you. I believe the FAQ page is in error here.
Ralph
On Oct 6, 2008, at 7:47 PM, Hahn Kim wrote:
Great, that worked, thanks! However, it still concerns me that the FAQ page says that mpirun will execute .profile, which doesn't seem to work for me. Are there any configuration issues that could possibly be preventing mpirun from doing this? It would certainly be more convenient if I could maintain my environment in a single .profile file instead of adding what could potentially be a lot of -x arguments to my mpirun command.
Hahn
On Oct 6, 2008, at 5:44 PM, Aurélien Bouteiller wrote:
You can forward your local env with mpirun -x LD_LIBRARY_PATH. As an alternative you can set specific values with mpirun -x LD_LIBRARY_PATH=/some/where:/some/where/else. More information with mpirun --help (or man mpirun).
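For example, something like the following should export the value to the launched processes (the host name is the one from your mpirun command below; ./my_mpi_app is just a placeholder for your executable):
mpirun -np 1 --host cab0 \
    -x LD_LIBRARY_PATH=/opt/cell/toolchain/lib/gcc/ppu/4.1.1/32 \
    ./my_mpi_app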
Aurelien
On Oct 6, 2008, at 4:06 PM, Hahn Kim wrote:
Hi,
I'm having difficulty launching an Open MPI job onto a machine that is running the Bourne shell.
Here's my basic setup. I have two machines: one is an x86-based machine running bash and the other is a Cell-based machine running the Bourne shell. I'm running mpirun from the x86 machine, which launches a C++ MPI application onto the Cell machine. I get the following error:
error while loading shared libraries: libstdc++.so.6: cannot open shared object file: No such file or directory
The basic problem is that LD_LIBRARY_PATH needs to be set to the directory that contains libstdc++.so.6 for the Cell. I set the following line in .profile:
export LD_LIBRARY_PATH=/opt/cell/toolchain/lib/gcc/ppu/4.1.1/32
which is the path to the PPC libraries for Cell.
Now if I log directly into the Cell machine and run the program directly from the command line, I don't get the above error. But mpirun still fails, even after setting LD_LIBRARY_PATH in .profile.
As a sanity check, I did the following. I ran the following command from the x86 machine:
mpirun -np 1 --host cab0 env
which, among other things, shows me the following value:
LD_LIBRARY_PATH=/tools/openmpi-1.2.5/lib:
If I log into the Cell machine and run env directly from the command line, I get the following value:
LD_LIBRARY_PATH=/opt/cell/toolchain/lib/gcc/ppu/4.1.1/32
So it appears that .profile gets sourced when I log in but not when mpirun runs.
However, according to the Open MPI FAQ (http://www.open-mpi.org/faq/?category=running#adding-ompi-to-path), mpirun is supposed to directly call .profile, since the Bourne shell doesn't automatically call it for non-interactive shells.
Does anyone have any insight as to why my environment isn't being set properly? Thanks!
Hahn
--
Hahn Kim, h...@ll.mit.edu
MIT Lincoln Laboratory
244 Wood St., Lexington, MA 02420
Tel: 781-981-0940, Fax: 781-981-5255
--
* Dr. Aurélien Bouteiller
* Sr. Research Associate at Innovative Computing Laboratory
* University of Tennessee
* 1122 Volunteer Boulevard, suite 350
* Knoxville, TN 37996
* 865 974 6321
--
Hahn Kim
MIT Lincoln Laboratory Phone: (781) 981-0940
244 Wood Street, S2-252 Fax: (781) 981-5255
Lexington, MA 02420 E-mail: h...@ll.mit.edu
--
Jeff Squyres
Cisco Systems
--
Hahn Kim, h...@ll.mit.edu
MIT Lincoln Laboratory
244 Wood St., Lexington, MA 02420
Tel: 781-981-0940, Fax: 781-981-5255
--
Jeff Squyres
Cisco Systems
--
Hahn Kim, h...@ll.mit.edu
MIT Lincoln Laboratory
244 Wood St., Lexington, MA 02420
Tel: 781-981-0940, Fax: 781-981-5255
--
Jeff Squyres
Cisco Systems
--
Hahn Kim, h...@ll.mit.edu
MIT Lincoln Laboratory
244 Wood St., Lexington, MA 02420
Tel: 781-981-0940, Fax: 781-981-5255
--
Jeff Squyres
Cisco Systems
--
Jeff Squyres
Cisco Systems