Hi Francesco

See answers inline.

Francesco Pietra wrote:
Hi Gus:
Partial quick answers below. I have reestablished the ssh connection
so that tomorrow I'll run the tests. Everything that relates to
running amber is on the "parallel computer", where I have access to
everything.

On Mon, Apr 6, 2009 at 7:53 PM, Gus Correa <g...@ldeo.columbia.edu> wrote:
Hi Francesco, list

Francesco Pietra wrote:
On Mon, Apr 6, 2009 at 5:21 PM, Gus Correa <g...@ldeo.columbia.edu> wrote:
Hi Francesco

Did you try to run examples/connectivity_c.c,
or examples/hello_c.c before trying amber?
They are in the directory where you untarred the OpenMPI tarball.
It is easier to troubleshoot
possible network and host problems
with these simpler programs.
I have found the "examples". Should they be compiled? how? This is my
only question here.
cd examples/
/full/path/to/openmpi/bin/mpicc -o connectivity_c connectivity_c.c

Then run it with, say:

/full/path/to/openmpi/bin/mpirun -host {whatever_hosts_you_want} \
  -n {as_many_processes_you_want} connectivity_c

Likewise for hello_c.c

What's below is info. Although amber parallel
would not have compiled with a faulty openmpi, I'll run the openmpi tests as
soon as I understand how.

Also, to avoid confusion,
you may use a full path name to mpirun,
in case you have other MPI flavors in your system.
Often times the mpirun your path is pointing to is not what you
may think it is.
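
One way to see every mpirun your PATH can reach is sketched below. The helper name list_on_path is only illustrative (it is not part of OpenMPI); the first line it prints is the one the shell would actually run.

```shell
#!/bin/sh
# list_on_path NAME: print every executable called NAME found in $PATH,
# in search order. The first hit is what a bare "NAME" would execute.
list_on_path() {
    name=$1
    old_ifs=$IFS
    IFS=:
    for dir in $PATH; do
        # An empty PATH entry means the current directory.
        [ -z "$dir" ] && dir=.
        if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
        fi
    done
    IFS=$old_ifs
}

list_on_path mpirun
```

If more than one line comes out, there is more than one MPI launcher installed, and full path names are the safest way to pick the right one.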

which mpirun
/usr/local/bin/mpirun
Did you install OpenMPI on /usr/local ?
When you do "mpirun -help", do you see "mpirun (Open MPI) 1.3"?

mpirun -help
mpirun (Open MPI) 1.3.1
on the 1st line, then follow the options

Ok, it looks like you installed OpenMPI 1.3.1 with the default
"--prefix" which is /usr/local.



How about the output of "orte_info" ?
orte_info was not installed. See below what has been installed.


Sorry, my fault.
I meant ompi_info (not orte_info).
Please try ompi_info or "ompi_info --config".
It will tell you the compilers used to build OpenMPI, etc.
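
A minimal sketch of that check follows. The function name show_mpi_compilers is hypothetical, and the /usr/local/bin location is an assumption taken from the default prefix discussed in this thread; it simply filters the ompi_info output for the compiler lines.

```shell
#!/bin/sh
# show_mpi_compilers [PATH_TO_OMPI_INFO]
# Print only the compiler-related lines reported by ompi_info.
show_mpi_compilers() {
    ompi_info_bin=${1:-ompi_info}
    if ! command -v "$ompi_info_bin" >/dev/null 2>&1; then
        echo "ompi_info not found: $ompi_info_bin" >&2
        return 1
    fi
    "$ompi_info_bin" | grep -i 'compiler'
}

# Use a full path so there is no doubt about which installation answers.
show_mpi_compilers /usr/local/bin/ompi_info || true
```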

I presume all of this is being done in the "parallel computer",
i.e., in one of the AMD64 Debian systems, right?


Does it show your Intel compilers, etc?

I guess so, otherwise amber would not have been compiled, but I don't
know the commands to prove it. The intel compilers are on the path:
/opt/intel/cce/10.1.015/bin:/opt/intel/fce/10.1.015/bin and the mkl
are sourced in .bashrc.


Again, all in the AMD64 system, right?

I ask because many Linux distributions come with one or more flavors
of MPI (OpenMPI, MPICH, LAM, etc), some compilers also do (PGI for
instance), some tools (Intel MKL?) may also have their MPI,
and you end up with a bunch of MPI commands
on your path that may produce a big mixup.
This is a pretty common problem that affects new users on this list,
on the MPICH list, on clustering lists, etc.
The error messages often don't help find the source of the problem,
and people spend a lot of time trying to troubleshoot the network,
etc., when it is often just a path problem.

So, this is why when you begin, you may want to use full path
names, to avoid confusion.
After the basic MPI functionality is working,
then you can go and fix your path chain,
and rely on your path chain.

There is no other accessible MPI. (One application, DOT2, has mpich, but
it is a static compilation. DOT2 parallelization requires that the
computer knows itself, i.e., "ssh hostname date" should return the date
without a password.) The reported issues in testing amber have destroyed this
situation: now deb64 has port 22 closed, even to itself.

Have you tried to reboot the master node, to see if it comes back
to the original ssh setup?
You need ssh to be functional to run OpenMPI code,
including the tests above.
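
A quick way to check this, sketched under the assumption that the node should reach itself by its own hostname (the helper name check_ssh is illustrative only). BatchMode makes ssh fail instead of prompting, so a password prompt or an unknown host key shows up as a failure rather than a hang.

```shell
#!/bin/sh
# check_ssh HOST: succeed only if "ssh HOST date" works without any prompt.
check_ssh() {
    host=$1
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" date >/dev/null 2>&1; then
        echo "ssh to $host: OK"
    else
        echo "ssh to $host: FAILED (password prompt, unknown host key, or port closed)"
        return 1
    fi
}

# The node must be able to reach itself for mpirun to start local processes via ssh.
check_ssh "$(hostname)" || true
```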

I don't know if you want to run on amd64 alone (master node?)
or on a cluster.
In any case, you may use a list of hosts
or a hostfile on the mpirun command line,
to specify where you want to run.
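
The two forms can be sketched as follows. The host names node1 and node2, the hostfile location, and the mpirun path are all placeholders to be replaced with your own.

```shell
#!/bin/sh
# Placeholders (assumptions): node1, node2, the hostfile location, and MPIRUN.
HOSTFILE=${TMPDIR:-/tmp}/my_hosts
MPIRUN=/full/path/to/openmpi/bin/mpirun

# One host per line; "slots" is how many processes may be placed on that host.
cat > "$HOSTFILE" <<'EOF'
node1 slots=2
node2 slots=2
EOF

if [ -x "$MPIRUN" ]; then
    # Form 1: hosts named directly on the command line.
    "$MPIRUN" -host node1,node2 -n 4 ./connectivity_c
    # Form 2: hosts read from the hostfile.
    "$MPIRUN" -hostfile "$HOSTFILE" -n 4 ./connectivity_c
else
    echo "mpirun not found at $MPIRUN; edit MPIRUN first" >&2
fi
```

For a single machine, a hostfile with one line naming that machine is enough.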
With amber I use the parallel computer directly and the amber
installation is chown to me. The ssh connection, in this case, only
serves to get files from, or send files to, my desktop.

It is unclear to me what you mean by "the parallel computer directly".
Can you explain better which computers are in this game?
Your desktop and a cluster perhaps?
Are they both Debian 64 Linux?
Where do you compile the programs?
Where do you want to run the programs?

In my .bashrc:

(for amber)
MPI_HOME=/usr/local
export MPI_HOME

(for openmpi)
if [ "$LD_LIBRARY_PATH" ] ; then
 export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/lib"
else
 export LD_LIBRARY_PATH="/usr/local/lib"
fi
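
A slightly more defensive variant of the .bashrc fragment above, sketched as a helper (the name append_ld_path is hypothetical): it appends a directory only when it is not already listed, so sourcing .bashrc repeatedly does not keep growing the variable.

```shell
#!/bin/sh
# append_ld_path DIR: add DIR to LD_LIBRARY_PATH unless it is already there.
append_ld_path() {
    case ":${LD_LIBRARY_PATH}:" in
        *":$1:"*)
            ;;  # already present, nothing to do
        *)
            # Insert the ":" separator only when the variable is non-empty.
            LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}$1"
            ;;
    esac
    export LD_LIBRARY_PATH
}

append_ld_path /usr/local/lib
```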

Is this on your desktop or on the "parallel computer"?


On both "parallel computers". (There is my desktop, from which I ssh to two UMA-type
dual-Opteron "parallel computers".) Only one was active when the "test"
problems arose. While the (ten-year-old) desktop is i386, both other
machines are amd64; all run Debian lenny. I prepare the input files
on the i386 and also use it as storage for backups.

So, you only use your i386 desktop to ssh to the AMD64 machine,
and to prepare input files, etc, right?
The OpenMPI installation, the compilations you do, and the job runs
all happen in the AMD64 system, right?

BTW, do you use each of these systems separately on your
MPI program runs,
or do you use them together?
If you use them together, are they connected through a network,
and did you set up passwordless ssh connections between them?

The "parallel
computer" has only the X server and a minimal window for a
two-dimensional graphics of amber.

I don't know how amber works, so please tell me.
Do you somehow interact with amber while it is running in parallel mode,
using this "minimal window for a two dimensional graphics"?
Or is this only a data post-processing activity that happens after the
parallel run of amber finishes?

The other parallel computer has a
GeForce 6600 card with GLSL support, which I use to elaborate
graphically the outputs from the numerical computations (using VMD,
Chimera and other 64 bit graphical programs).


There is also

MPICH_HOME=/usr/local
export MPICH_HOME

This is for DOCK, which, with this env variable, accepts openmpi (at
least it was so with v 1.2.6).

Oh, well, it looks like there is MPICH already installed on /usr/local.
So, this may be part of the confusion, the path confusion I referred to.

No, there is no MPICH installed. With the above export, DOCK (a
docking program from the same developers of Amber) is so kind to use
the executables of openmpi. The export was suggested by the DOCK
developers, and it worked. Unable to explain why.


OK, this may be a way the DOCK developers found to trick their own
software (DOCK) to think MPICH is installed in /usr/local,
and actually use the OpenMPI libraries instead of MPICH.
They may have hardwired on their build scripts the "MPICH_HOME"
environment variable as the location where the MPI libraries reside.
But which MPI libraries are there may not matter much, I would guess.
Just a guess anyway.
(I have no idea of what the heck DOCK is or how it works.)

As far as the parallel support is concerned, /usr/local/bin only
contains what openmpi 1.3.1 has installed (resulting from ./configure
cc=/path/icc cxx=/path/icpc F77=path/ifort FC=path/ifort
--with-libnuma=/usr/lib):
mpic++ mpicc mpiCC mpicc-vt mpiCC-vt mpic++-vt mpicxx mpicxx-vt
mpiexec mpif77 mpif77-vt mpif90 mpif90-vt mpirun ompi-clean ompi-info
ompi-ps ompi-server opal-wapper opari orte-clean orted orte-iof
orte-ps orterun otfaux otfcompress otfconfig otfdecompress otfdump
otfmerge vtcc vtcxx vtf77 vtf90 vtfilter vtunify. There is no
orte_info.


Of course not.
Doh!  I misspelled the name ... :(
It is ompi_info for sure.


I would suggest installing OpenMPI on a different directory,
using the --prefix option of the OpenMPI configure script.
Do configure --help for details about all configuration options.


The Intel compilers (ifort and icc) are sourced in both my
.bashrc and root's home .bashrc.

Thanks, and apologies for my low level in these affairs. It is the
first time I have faced such problems; with amd64, the same Intel
compilers, and openmpi 1.2.6, everything was in order.

To me it doesn't look like the problem is related to the new version
of OpenMPI.

I asked about that because I am using the same commands, .bashrc, etc
that worked with version 1.2.6. The computers are the same, the only
(non minor) difference is upgrading from amd64 etch to amd64 lenny (or
I am making mistakes that I have not yet detected).

Yes, but I still don't think it is some problem in OpenMPI 1.3.1 that is
causing trouble here.
If it were, the program would at least start running, but mpirun is having trouble even starting the programs, right?

Since you seem to have also upgraded the Debian release,
another part of the system changed as well.
But still, it may not be related to Debian either.
It may be just some confusion on paths, etc.

I really encourage you to try to compile and run the programs in the examples directory.
They are very clear and simple (as opposed to amber, which hides behind
a few layers of software), and even if they fail, the failure will help
clarify the nature of the problem, and find a fix.

Oh, well, I am afraid I am asking more questions than helping out,
but I am trying to understand what is going on.

Gus Correa

Try the test programs with full path names first.
It may not solve the problem, but it may clarify things a bit.

Gus Correa
---------------------------------------------------------------------
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA
---------------------------------------------------------------------

francesco



Do "/full/path/to/openmpi/bin/mpirun --help" for details.

I am not familiar with amber, but how does it find your openmpi
libraries and compiler wrappers?
Don't you need to give it the paths during configuration,
say,
./configure_amber -openmpi=/full/path/to/openmpi
or similar?

I hope this helps.
Gus Correa


Francesco Pietra wrote:
I have compiled openmpi 1.3.1 on debian amd64 lenny with icc/ifort
(10.1.015) and libnuma. Tests passed:

ompi_info | grep libnuma
 MCA affinity: libnuma (MCA v 2.0, API 2.0)

ompi_info | grep maffinity
 MCA affinity: first use (MCA as above)
 MCA affinity: libnuma as above.

Then I compiled a parallel molecular dynamics package, amber10,
without errors, but I am having problems testing the amber
parallel installation.

amber10 configure was set as:

./configure_amber -openmpi -nobintray ifort

just as I used before with openmpi 1.2.6. Could you say if the
-openmpi should be changed?

cd tests

export DO_PARALLEL='mpirun -np 4'

make test.parallel.MM  < /dev/null

cd cytosine && ./Run.cytosine
The authenticity of host deb64 (which is the hostname) (127.0.1.1)
can't be established.
RSA fingerprint .....
connecting ?

I stopped the ssh daemon, whereupon the tests were interrupted because deb64
(i.e., itself) could no longer be accessed. Further attempts under these
conditions failed for the same reason. Now, sshing to deb64 is no longer
possible: port 22 is closed. In contrast, sshing from deb64 to other
computers works without a password. No such problems arose at the time of
amd64 etch with the same
configuration of ssh, same compilers, and openmpi 1.2.6.

I am here because the warning from the amber site is that I should
learn how to use my installation of MPI. Therefore, if there is any
clue ...

thanks
francesco pietra
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users