[OMPI users] mpirun hangs

2007-08-14 Thread jody
Hi
I installed openmpi 1.2.2 on a quad core intel machine running fedora 6
(hostname plankton)
I set PATH and LD_LIBRARY_PATH in the .zshrc file:
$ echo $PATH
/opt/openmpi/bin:/usr/kerberos/bin:/usr/local/bin:/usr/bin:/bin:/usr/X11R6/bin:/home/jody/bin
$ echo $LD_LIBRARY_PATH
/opt/openmpi/lib:

When i run
$ mpirun -np 2 ./MPITest2
i get the message
./MPI2Test2: error while loading shared libraries: libmpi_cxx.so.0: cannot
open shared object file: No such file or directory
./MPI2Test2: error while loading shared libraries: libmpi_cxx.so.0: cannot
open shared object file: No such file or directory

However
$ mpirun -np 2 --prefix /opt/openmpi ./MPI2Test2
works.  Any explanation?

Second problem:
I have also  installed openmpi 1.2.2 on an AMD machine running gentoo linux
(hostname nano_02).
Here as well PATH and LD_LIBRARY_PATH are set correctly,
and
$ mpirun -np 2 ./MPITest2
works locally on nano_02.

If, however, from plankton i call
$ mpirun -np 2 --prefix /opt/openmpi --host nano_02 ./MPI2Test2
the call hangs with no output whatsoever.
Any pointers on how to solve this problem?

Thank You
  Jody


[OMPI users] libmpi.so.0 problem

2007-08-14 Thread Rodrigo Faccioli
Hi,

I need to know how I can resolve my problem. I'm just starting to study MPI,
more specifically Open MPI.

But, when I execute mpirun a.out, the message I received is: a.out: error
while loading shared libraries: libmpi.so.0: cannot open shared object file:
No such file or directory

The a.out file was obtained through mpicc hello.c

Thanks.


Re: [OMPI users] libmpi.so.0 problem

2007-08-14 Thread Tim Prins

You need to set your LD_LIBRARY_PATH. See these FAQ entries:
http://www.open-mpi.org/faq/?category=running#run-prereqs
http://www.open-mpi.org/faq/?category=running#adding-ompi-to-path
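
As a rough sketch (assuming Open MPI was installed under its default
prefix of /usr/local; adjust the paths if you configured a different
--prefix), adding something like this to your ~/.bashrc and starting a
new shell is usually enough:

# make the Open MPI binaries and libraries visible to every new shell
export PATH=/usr/local/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH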

Tim

Rodrigo Faccioli wrote:

Hi,

I need to know what I can resolve my problem. I'm starting my study on 
mpi, more specificaly open-mpi.


But, when I execute mpirun a.out, the message I received is: a.out: 
error while loading shared libraries: libmpi.so.0: cannot open shared 
object file: No such file or directory


The a.out file was obtained through mpicc hello.c

Thanks.





___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users




Re: [OMPI users] mpirun hangs

2007-08-14 Thread Tim Prins

Hi Jody,

jody wrote:

Hi
I installed openmpi 1.2.2 on a quad core intel machine running fedora 6 
(hostname plankton)

I set PATH and LD_LIBRARY in the .zshrc file:
Note that .zshrc is only used for interactive logins. You need to set up 
your system so that LD_LIBRARY_PATH and PATH are also set for 
non-interactive logins. See this zsh FAQ entry for which files you need 
to modify:

http://zsh.sourceforge.net/FAQ/zshfaq03.html#l19

(BTW: I do not use zsh, but my assumption is that the file you want to 
set the PATH and LD_LIBRARY_PATH in is .zshenv)
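
For example, a minimal ~/.zshenv along these lines (assuming your install
prefix really is /opt/openmpi) should make both variables visible to
non-interactive shells as well:

# ~/.zshenv -- read by every zsh invocation, interactive or not
export PATH=/opt/openmpi/bin:$PATH
export LD_LIBRARY_PATH=/opt/openmpi/lib:$LD_LIBRARY_PATH
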
$ echo $PATH 
/opt/openmpi/bin:/usr/kerberos/bin:/usr/local/bin:/usr/bin:/bin:/usr/X11R6/bin:/home/jody/bin 


$ echo $LD_LIBRARY_PATH
/opt/openmpi/lib:

When i run
$ mpirun -np 2 ./MPITest2
i get the message
./MPI2Test2: error while loading shared libraries: libmpi_cxx.so.0: 
cannot open shared object file: No such file or directory
./MPI2Test2: error while loading shared libraries: libmpi_cxx.so.0: 
cannot open shared object file: No such file or directory


However
$ mpirun -np 2 --prefix /opt/openmpi ./MPI2Test2
works.  Any explanation?

Yes, the LD_LIBRARY_PATH is probably not set correctly. Try running:
mpirun -np 2 ldd ./MPITest2

This should show what libraries your executable is using. Make sure all 
of the libraries are resolved.
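
With a broken library path you would typically see a line like this in
the ldd output (illustrative only):

        libmpi_cxx.so.0 => not found

whereas with a correctly set LD_LIBRARY_PATH it resolves to something
like /opt/openmpi/lib/libmpi_cxx.so.0.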


Also, try running:
mpirun -np 1 printenv |grep LD_LIBRARY_PATH
to see what the LD_LIBRARY_PATH is for your executables. Note that you 
can NOT simply run mpirun echo $LD_LIBRARY_PATH, as the variable will be 
expanded by your local shell before mpirun even runs.
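
In other words (illustrative; the quoting is what matters here):

$ mpirun -np 1 echo $LD_LIBRARY_PATH            # expanded by your local shell
$ mpirun -np 1 sh -c 'echo $LD_LIBRARY_PATH'    # expanded where the command runs
$ mpirun -np 1 printenv | grep LD_LIBRARY_PATH  # environment of the launched process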




Second problem:
I have also  installed openmpi 1.2.2 on an AMD machine running gentoo 
linux (hostname nano_02).

Here as well PATH and LD_LIBRARY_PATH are set correctly,
and
$ mpirun -np 2 ./MPITest2
works locally on nano_02.

If, however, from plankton i call
$ mpirun -np 2 --prefix /opt/openmpi --host nano_02 ./MPI2Test2
the call hangs with no output whatsoever.
Any pointers on how to solve this problem?

Try running:
mpirun --debug-daemons -np 2 --prefix /opt/openmpi --host nano_02 
./MPI2Test2


This should give some more output as to what is happening.
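
You might also check directly what the non-interactive environment on
nano_02 looks like from plankton, e.g. (illustrative):

$ ssh nano_02 printenv | grep -E 'PATH|LD_LIBRARY_PATH'

since that is roughly the environment the remote Open MPI daemon will see.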

Hope this helps,

Tim



Thank You
  Jody





___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users




Re: [OMPI users] Help : Need some tuning, or is it a bug ?

2007-08-14 Thread Tim Prins

Guillaume THOMAS-COLLIGNON wrote:

Hi,

I wrote an application which works fine on a small number of nodes  
(eg. 4), but it crashes on a large number of CPUs.


In this application, all the slaves send many small messages to the  
master. I use the regular MPI_Send, and since the messages are  
relatively small (1 int, then many times 3296 ints), OpenMPI does a  
very good job at sending them asynchronously, and it maxes out the  
gigabit link on the master node. I'm very happy with this behaviour,  
it gives me the same performance as if I was doing all the  
asynchronous stuff myself, and the code remains simple.


But it crashes when there are too many slaves. 
How many is too many? I successfully ran your code on 96 nodes, with 4 
processes per node and it seemed to work fine. Also, what network are 
you using?


So it looks like at  
some point the master node runs out of buffers and the job crashes  
brutally. 

What do you mean by crashing? Is there a segfault or an error message?

Tim


That's my understanding but I may be wrong.
If I use explicit synchronous sends (MPI_Ssend), it does not crash  
anymore but the performance is a lot lower.


I have 2 questions regarding this :

1) What kind of tuning would help handling more messages and keep the  
master from crashing ?


2) Is this the expected behaviour ? I don't think my code is doing  
anything wrong, so I would not expect a brutal crash.



The workaround I've found so far is to do an MPI_Ssend for the  
request, then use MPI_Send for the data blocks. So all the slaves are  
blocked on the request, it keeps the master from being flooded, and  
the performance is still good. But nothing tells me it won't crash at  
some point if I have more data blocks in my real code, so I'd like to  
know more about what's happening here.


Thanks,

-Guillaume


Here is the code, so you get a better idea of the communication  
scheme, or in case someone wants to reproduce the problem.



#include <stdio.h>
#include <stdlib.h>

#include <mpi.h>

#define BLOCKSIZE 3296
#define MAXBLOCKS 1000
#define NLOOP 4

int main (int argc, char **argv) {
   int i, j, ier, rank, npes, slave, request;
   int *data;
   MPI_Status status;

   MPI_Init (&argc, &argv);
   MPI_Comm_rank (MPI_COMM_WORLD, &rank);
   MPI_Comm_size (MPI_COMM_WORLD, &npes);

   if ((data = (int *) calloc (BLOCKSIZE, sizeof (int))) == NULL)
     return -10;

   // Master
   if (rank == 0) {
     // Expect (NLOOP * number of slaves) requests
     for (i = 0; i < (npes - 1) * NLOOP; i++) {
       /* Wait for a request from any slave. Request contains the
          number of data blocks */
       ier = MPI_Recv (&request, 1, MPI_INT, MPI_ANY_SOURCE, 964,
                       MPI_COMM_WORLD, &status);
       if (ier != MPI_SUCCESS)
         return -1;
       slave = status.MPI_SOURCE;
       printf ("Master : request for %d blocks from slave %d\n",
               request, slave);

       /* Receive the data blocks from this slave */
       for (j = 0; j < request; j++) {
         ier = MPI_Recv (data, BLOCKSIZE, MPI_INT, slave, 993,
                         MPI_COMM_WORLD, &status);
         if (ier != MPI_SUCCESS)
           return -2;
       }
     }
   }
   // Slaves
   else {
     for (i = 0; i < NLOOP; i++) {
       /* Send the request = number of blocks we want to send to the
          master */
       request = MAXBLOCKS;
       /* Changing this MPI_Send to MPI_Ssend is enough to keep the
          master from being flooded */
       ier = MPI_Send (&request, 1, MPI_INT, 0, 964, MPI_COMM_WORLD);
       if (ier != MPI_SUCCESS)
         return -3;
       /* Send the data blocks */
       for (j = 0; j < request; j++) {
         ier = MPI_Send (data, BLOCKSIZE, MPI_INT, 0, 993, MPI_COMM_WORLD);
         if (ier != MPI_SUCCESS)
           return -4;
       }
     }
   }

   free (data);
   MPI_Finalize ();
   return 0;
}

Re: [OMPI users] libmpi.so.0 problem

2007-08-14 Thread Rodrigo Faccioli
Thanks, Tim Prins for your email.

However, it didn't resolve my problem.

I set the environment variables on my Kubuntu Linux machine:

faccioli@faccioli-desktop:/usr/local/lib$
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/bin
faccioli@faccioli-desktop:/usr/local/lib$ LD_LIBRARY_PATH=/usr/local/lib/

Therefore, the set command displays:

BASH=/bin/bash
BASH_ARGC=()
BASH_ARGV=()
BASH_COMPLETION=/etc/bash_completion
BASH_COMPLETION_DIR=/etc/bash_completion.d
BASH_LINENO=()
BASH_SOURCE=()
BASH_VERSINFO=([0]="3" [1]="2" [2]="13" [3]="1" [4]="release"
[5]="x86_64-pc-linux-gnu")
BASH_VERSION='3.2.13(1)-release'
COLORTERM=
COLUMNS=83
DBUS_SESSION_BUS_ADDRESS=unix:abstract=/tmp/dbus-C83Ve0QbQz,guid=e07c2bd483a99b50932d080046c199e9
DESKTOP_SESSION=default
DIRSTACK=()
DISPLAY=:0.0
DM_CONTROL=/var/run/xdmctl
EUID=1000
GROUPS=()
GS_LIB=/home/faccioli/.fonts
GTK2_RC_FILES=/home/faccioli/.gtkrc-
2.0-kde:/home/faccioli/.kde/share/config/gtkrc-2.0
GTK_RC_FILES=/etc/gtk/gtkrc:/home/faccioli/.gtkrc:/home/faccioli/.kde/share/config/gtkrc
HISTCONTROL=ignoreboth
HISTFILE=/home/faccioli/.bash_history
HISTFILESIZE=500
HISTSIZE=500
HOME=/home/faccioli
HOSTNAME=faccioli-desktop
HOSTTYPE=x86_64
IFS=$' \t\n'
KDE_FULL_SESSION=true
KDE_MULTIHEAD=false
KONSOLE_DCOP='DCOPRef(konsole-5587,konsole)'
KONSOLE_DCOP_SESSION='DCOPRef(konsole-5587,session-2)'
LANG=en_US.UTF-8
LD_LIBRARY_PATH=/usr/local/lib/
LESSCLOSE='/usr/bin/lesspipe %s %s'
LESSOPEN='| /usr/bin/lesspipe %s'
LINES=33
LOGNAME=faccioli
LS_COLORS='no=00:fi=00:di=01;34:ln=01;36:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:su=37;41:sg=30;43:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arj=01;31:*.taz=01;31:*.lzh=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.gz=01;31:*.bz2=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.avi=01;35:*.fli=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.flac=01;35:*.mp3=01;35:*.mpc=01;35:*.ogg=01;35:*.wav=01;35:'
MACHTYPE=x86_64-pc-linux-gnu
MAILCHECK=60
OLDPWD=/home/faccioli
OPTERR=1
OPTIND=1
OSTYPE=linux-gnu
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/bin
PIPESTATUS=([0]="0")
PPID=5587

Unfortunately,  when I execute mpirun a.out, the message I received is:
a.out:  error while loading shared libraries: libmpi.so.0: cannot open
shared object file: No such file or directory

Thanks,


On 8/14/07, Tim Prins  wrote:
>
> You need to set your LD_LIBRARY_PATH. See these FAQ entries:
> http://www.open-mpi.org/faq/?category=running#run-prereqs
> http://www.open-mpi.org/faq/?category=running#adding-ompi-to-path
>
> Tim
>
> Rodrigo Faccioli wrote:
> > Hi,
> >
> > I need to know what I can resolve my problem. I'm starting my study on
> > mpi, more specificaly open-mpi.
> >
> > But, when I execute mpirun a.out, the message I received is: a.out:
> > error while loading shared libraries: libmpi.so.0: cannot open shared
> > object file: No such file or directory
> >
> > The a.out file was obtained through mpicc hello.c
> >
> > Thanks.
> >
> >
> >
> > 
> >
> > ___
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>


Re: [OMPI users] mpirun hangs

2007-08-14 Thread jody
Hi Tim
thanks for the suggestions.

I now set both paths in .zshenv, but it seems that LD_LIBRARY_PATH
still does not get set.
The ldd experiment shows that none of the Open MPI libraries are found,
and indeed printenv shows that PATH is there but LD_LIBRARY_PATH is not.

It is rather unclear why this happens...

As to the second problem:
$ mpirun --debug-daemons -np 2 --prefix /opt/openmpi --host nano_02
./MPI2Test2
[aim-nano_02:05455] [0,0,1]-[0,0,0] mca_oob_tcp_peer_try_connect: connect to
130.60.49.134:40618 failed: Software caused connection abort (103)
[aim-nano_02:05455] [0,0,1]-[0,0,0] mca_oob_tcp_peer_try_connect: connect to
130.60.49.134:40618 failed, connecting over all interfaces failed!
[aim-nano_02:05455] OOB: Connection to HNP lost
[aim-plankton.unizh.ch:24222] [0,0,0] ORTE_ERROR_LOG: Timeout in file
base/pls_base_orted_cmds.c at line 275
[aim-plankton.unizh.ch:24222] [0,0,0] ORTE_ERROR_LOG: Timeout in file
pls_rsh_module.c at line 1164
[aim-plankton.unizh.ch:24222] [0,0,0] ORTE_ERROR_LOG: Timeout in file
errmgr_hnp.c at line 90
[aim-plankton.unizh.ch:24222] ERROR: A daemon on node nano_02 failed to
start as expected.
[aim-plankton.unizh.ch:24222] ERROR: There may be more information available
from
[aim-plankton.unizh.ch:24222] ERROR: the remote shell (see above).
[aim-plankton.unizh.ch:24222] ERROR: The daemon exited unexpectedly with
status 1.
[aim-plankton.unizh.ch:24222] [0,0,0] ORTE_ERROR_LOG: Timeout in file
base/pls_base_orted_cmds.c at line 188
[aim-plankton.unizh.ch:24222] [0,0,0] ORTE_ERROR_LOG: Timeout in file
pls_rsh_module.c at line 1196

The strange thing is that nano_02's address is 130.60.49.130 and plankton's
(the caller) is 130.60.49.134.
I also made sure that nano_02 can ssh to plankton without a password, but
that didn't change the output.

Does this message give any hints as to the problem?

Jody


On 8/14/07, Tim Prins  wrote:
>
> Hi Jody,
>
> jody wrote:
> > Hi
> > I installed openmpi 1.2.2 on a quad core intel machine running fedora 6
> > (hostname plankton)
> > I set PATH and LD_LIBRARY in the .zshrc file:
> Note that .zshrc is only used for interactive logins. You need to setup
> your system so the LD_LIBRARY_PATH and PATH is also set for
> non-interactive logins. See this zsh FAQ entry for what files you need
> to modify:
> http://zsh.sourceforge.net/FAQ/zshfaq03.html#l19
>
> (BTW: I do not use zsh, but my assumption is that the file you want to
> set the PATH and LD_LIBRARY_PATH in is .zshenv)
> > $ echo $PATH
> >
> /opt/openmpi/bin:/usr/kerberos/bin:/usr/local/bin:/usr/bin:/bin:/usr/X11R6/bin:/home/jody/bin
> >
> > $ echo $LD_LIBRARY_PATH
> > /opt/openmpi/lib:
> >
> > When i run
> > $ mpirun -np 2 ./MPITest2
> > i get the message
> > ./MPI2Test2: error while loading shared libraries: libmpi_cxx.so.0:
> > cannot open shared object file: No such file or directory
> > ./MPI2Test2: error while loading shared libraries: libmpi_cxx.so.0:
> > cannot open shared object file: No such file or directory
> >
> > However
> > $ mpirun -np 2 --prefix /opt/openmpi ./MPI2Test2
> > works.  Any explanation?
> Yes, the LD_LIBRARY_PATH is probably not set correctly. Try running:
> mpirun -np 2 ldd ./MPITest2
>
> This should show what libraries your executable is using. Make sure all
> of the libraries are resolved.
>
> Also, try running:
> mpirun -np 1 printenv |grep LD_LIBRARY_PATH
> to see what the LD_LIBRARY_PATH is for you executables. Note that you
> can NOT simply run mpirun echo $LD_LIBRARY_PATH, as the variable will be
> interpreted in the executing shell.
>
> >
> > Second problem:
> > I have also  installed openmpi 1.2.2 on an AMD machine running gentoo
> > linux (hostname nano_02).
> > Here as well PATH and LD_LIBRARY_PATH are set correctly,
> > and
> > $ mpirun -np 2 ./MPITest2
> > works locally on nano_02.
> >
> > If, however, from plankton i call
> > $ mpirun -np 2 --prefix /opt/openmpi --host nano_02 ./MPI2Test2
> > the call hangs with no output whatsoever.
> > Any pointers on how to solve this problem?
> Try running:
> mpirun --debug-daemons -np 2 --prefix /opt/openmpi --host nano_02
> ./MPI2Test2
>
> This should give some more output as to what is happening.
>
> Hope this helps,
>
> Tim
>
> >
> > Thank You
> >   Jody
> >
> >
> >
> > 
> >
> > ___
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>


Re: [OMPI users] libmpi.so.0 problem

2007-08-14 Thread Durga Choudhury
Did you export your variables? Otherwise the child shell that forks the MPI
process will not inherit them.
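
A quick way to see the difference, with a dummy variable (just an
illustration):

$ FOO=bar
$ sh -c 'echo x${FOO}x'     # prints "xx"    -- the child shell never sees FOO
$ export FOO=bar
$ sh -c 'echo x${FOO}x'     # prints "xbarx" -- exported variables are inherited

The same applies to LD_LIBRARY_PATH and the processes mpirun starts.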



On 8/14/07, Rodrigo Faccioli  wrote:
>
> Thanks, Tim Prins for your email.
>
> However It did't resolve my problem.
>
> I set the enviroment variable on my Kubuntu Linux:
>
> faccioli@faccioli-desktop:/usr/local/lib$
> PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/bin
>
> faccioli@faccioli-desktop:/usr/local/lib$ LD_LIBRARY_PATH=/usr/local/lib/
>
>
> Therefore, set command will display:
>
> BASH=/bin/bash
> BASH_ARGC=()
> BASH_ARGV=()
> BASH_COMPLETION=/etc/bash_completion
> BASH_COMPLETION_DIR=/etc/bash_completion.d
> BASH_LINENO=()
> BASH_SOURCE=()
> BASH_VERSINFO=([0]="3" [1]="2" [2]="13" [3]="1" [4]="release"
> [5]="x86_64-pc-linux-gnu")
> BASH_VERSION='3.2.13(1)-release'
> COLORTERM=
> COLUMNS=83
>
> DBUS_SESSION_BUS_ADDRESS=unix:abstract=/tmp/dbus-C83Ve0QbQz,guid=e07c2bd483a99b50932d080046c199e9
> DESKTOP_SESSION=default
> DIRSTACK=()
> DISPLAY=: 0.0
> DM_CONTROL=/var/run/xdmctl
> EUID=1000
> GROUPS=()
> GS_LIB=/home/faccioli/.fonts
> GTK2_RC_FILES=/home/faccioli/.gtkrc-
> 2.0-kde:/home/faccioli/.kde/share/config/gtkrc-2.0
> GTK_RC_FILES=/etc/gtk/gtkrc:/home/faccioli/.gtkrc:/home/faccioli/.kde/share/config/gtkrc
>
> HISTCONTROL=ignoreboth
> HISTFILE=/home/faccioli/.bash_history
> HISTFILESIZE=500
> HISTSIZE=500
> HOME=/home/faccioli
> HOSTNAME=faccioli-desktop
> HOSTTYPE=x86_64
> IFS=$' \t\n'
> KDE_FULL_SESSION=true
> KDE_MULTIHEAD=false
> KONSOLE_DCOP='DCOPRef(konsole-5587,konsole)'
> KONSOLE_DCOP_SESSION='DCOPRef(konsole-5587,session-2)'
> LANG=en_US.UTF-8
> LD_LIBRARY_PATH=/usr/local/lib/
> LESSCLOSE='/usr/bin/lesspipe %s %s'
> LESSOPEN='| /usr/bin/lesspipe %s'
> LINES=33
> LOGNAME=faccioli
> LS_COLORS='no=00:fi=00:di=01;34:ln=01;36:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:su=37;41:sg=30;43:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arj=01;31:*.taz=01;31:*.lzh=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.gz=01;31:*.bz2=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.avi=01;35:*.fli=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.flac=01;35:*.mp3=01;35:*.mpc=01;35:*.ogg=01;35:*.wav=01;35:'
>
> MACHTYPE=x86_64-pc-linux-gnu
> MAILCHECK=60
> OLDPWD=/home/faccioli
> OPTERR=1
> OPTIND=1
> OSTYPE=linux-gnu
> PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/bin
>
> PIPESTATUS=([0]="0")
> PPID=5587
>
> Unfortunately,  when I execute mpirun a.out, the message I received is:
> a.out:  error while loading shared libraries: libmpi.so.0 : cannot open
> shared object file: No such file or directory
>
> Thanks,
>
>
> On 8/14/07, Tim Prins < tpr...@open-mpi.org> wrote:
> >
> > You need to set your LD_LIBRARY_PATH. See these FAQ entries:
> > http://www.open-mpi.org/faq/?category=running#run-prereqs
> > http://www.open-mpi.org/faq/?category=running#adding-ompi-to-path
> >
> > Tim
> >
> > Rodrigo Faccioli wrote:
> > > Hi,
> > >
> > > I need to know what I can resolve my problem. I'm starting my study on
> > > mpi, more specificaly open-mpi.
> > >
> > > But, when I execute mpirun a.out, the message I received is: a.out:
> > > error while loading shared libraries: libmpi.so.0: cannot open shared
> > > object file: No such file or directory
> > >
> > > The a.out file was obtained through mpicc hello.c
> > >
> > > Thanks.
> > >
> > >
> > >
> > >
> > 
> > >
> > > ___
> > > users mailing list
> > > us...@open-mpi.org
> > > http://www.open-mpi.org/mailman/listinfo.cgi/users
> >
> > ___
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
> >
>
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>



-- 
Its a battle between humans and communists;
Which side are you in?
. 


Re: [OMPI users] segmentation faults

2007-08-14 Thread Adams, Samuel D Contr AFRL/HEDR
So I ran valgrind on my code and it came up with a few thousand memory
errors, but none of them had anything to do with the code I wrote.  It
gave a few errors for the LDAP authentication stuff at the beginning,
but most of the errors came from orte*.  The only part that made
reference to my code was in the main file on line 13 where I include
mpi.h.  It seems suspect to me to have so many "errors" in well-used
and well-tested code.  Also, the stack trace errors that I previously
posted showed errors in places in my code that have been stable and
unchanged for about a year.

It seems like this may be some kind of problem with the system
configuration or something like that.  It just seems too odd for these
memory faults to appear out of nowhere.

Sam Adams
General Dynamics Information Technology
Phone: 210.536.5945

-Original Message-
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On
Behalf Of Jeff Squyres
Sent: Monday, August 13, 2007 4:13 PM
To: Open MPI Users
Subject: Re: [OMPI users] segmentation faults

It *looks* like a run-of-the-mill memory-badness kind of error, but  
it's impossible to say without more information.

Are you able to run this through valgrind or some other memory- 
checking debugger?  It looks like the single process case may be the  
simplest to check...?
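
For example, reusing the command line from the mail below with a single
process (illustrative only; valgrind will make the run much slower):

$ mpirun -np 1 --prefix /usr/local/profiles/gcc-openmpi \
    valgrind --num-callers=20 \
    /home/sam/code/fdtd/fdtd_0.3/fdtd \
    -t /home/sam/code/fdtd/fdtd_0.3/test_files/tissue.txt \
    -r /home/sam/code/fdtd/fdtd_0.3/test_files/tester_x002y002z004.raw \
    -v -f 3000 --pw 90,0,1,0 -l test_log.out -a 1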


On Aug 13, 2007, at 5:03 PM, Adams, Samuel D Contr AFRL/HEDR wrote:

> I tried to run a code that I have running for a while now this  
> morning,
> but for some reason it is causing segmentation faults.  I can't really
> think of anything that I have done recently that would be causing  
> these
> errors.  Does anyone have any idea?
>
> I get this running it on more than one processor..
> [sam@prodnode1 all]$ mpirun -np 2 --prefix
> /usr/local/profiles/gcc-openmpi/ /home/sam/code/fdtd/fdtd_0.3/fdtd -t
> /home/sam/code/fdtd/fdtd_0.3/test_files/tissue.txt -r
> /home/sam/code/fdtd/fdtd_0.3/test_files/tester_x002y002z004.raw -v -f
> 3000 --pw 90,0,1,0 -l test_log.out -a 1
> [prodnode1:04400] *** Process received signal ***
> [prodnode1:04400] Signal: Segmentation fault (11)
> [prodnode1:04400] Signal code: Invalid permissions (2)
> [prodnode1:04400] Failing at address: 0x2b48
> [prodnode1:04399] *** Process received signal ***
> [prodnode1:04399] Signal: Segmentation fault (11)
> [prodnode1:04399] Signal code: Invalid permissions (2)
> [prodnode1:04399] Failing at address: 0x2b0a0a48
> [prodnode1:04400] [ 0] /lib64/libpthread.so.0 [0x3aa840dd40]
> [prodnode1:04400] [ 1]
> /usr/local/profiles/gcc-openmpi/lib/libopen-pal.so.0(_int_malloc 
> +0x2a5)
> [0x2afda345]
> [prodnode1:04400] [ 2]
> /usr/local/profiles/gcc-openmpi/lib/libopen-pal.so.0(calloc+0xaa)
> [0x2afdbd8a]
> [prodnode1:04400] [ 3]
> /home/sam/code/fdtd/fdtd_0.3/fdtd(parseTissues+0x23) [0x40c9d3]
> [prodnode1:04400] [ 4]
> /home/sam/code/fdtd/fdtd_0.3/fdtd(parseArgs+0x489) [0x404b09]
> [prodnode1:04400] [ 5] /home/sam/code/fdtd/fdtd_0.3/fdtd(main+0x41)
> [0x404eb1]
> [prodnode1:04400] [ 6] /lib64/libc.so.6(__libc_start_main+0xf4)
> [0x3aa781d8a4]
> [prodnode1:04400] [ 7] /home/sam/code/fdtd/fdtd_0.3/fdtd [0x4034b9]
> [prodnode1:04400] *** End of error message ***
> [prodnode1:04399] [ 0] /lib64/libpthread.so.0 [0x3aa840dd40]
> [prodnode1:04399] [ 1]
> /usr/local/profiles/gcc-openmpi/lib/libopen-pal.so.0(_int_malloc 
> +0x2a5)
> [0x2afda345]
> [prodnode1:04399] [ 2]
> /usr/local/profiles/gcc-openmpi/lib/libopen-pal.so.0(calloc+0xaa)
> [0x2afdbd8a]
> [prodnode1:04399] [ 3]
> /home/sam/code/fdtd/fdtd_0.3/fdtd(parseTissues+0x23) [0x40c9d3]
> [prodnode1:04399] [ 4]
> /home/sam/code/fdtd/fdtd_0.3/fdtd(parseArgs+0x489) [0x404b09]
> [prodnode1:04399] [ 5] /home/sam/code/fdtd/fdtd_0.3/fdtd(main+0x41)
> [0x404eb1]
> [prodnode1:04399] [ 6] /lib64/libc.so.6(__libc_start_main+0xf4)
> [0x3aa781d8a4]
> [prodnode1:04399] [ 7] /home/sam/code/fdtd/fdtd_0.3/fdtd [0x4034b9]
> [prodnode1:04399] *** End of error message ***
> mpirun noticed that job rank 0 with PID 4399 on node
> prodnode1.brooks.af.mil exited on signal 11 (Segmentation fault).
> 1 additional process aborted (not shown)
>
> --Or I get this if I run it on just one processor.
> [sam@prodnode1 all]$ ./script2.sh [prodnode1:04405] *** Process  
> received
> signal ***
> [prodnode1:04405] Signal: Segmentation fault (11)
> [prodnode1:04405] Signal code: Address not mapped (1)
> [prodnode1:04405] Failing at address: 0x18
> [prodnode1:04405] [ 0] /lib64/libpthread.so.0 [0x3aa840dd40]
> [prodnode1:04405] [ 1] /home/sam/code/fdtd/fdtd_0.3/fdtd(calcMass 
> +0xac)
> [0x40443c]
> [prodnode1:04405] [ 2]
> /home/sam/code/fdtd/fdtd_0.3/fdtd(parseArgs+0x5a1) [0x404c21]
> [prodnode1:04405] [ 3] /home/sam/code/fdtd/fdtd_0.3/fdtd(main+0x41)
> [0x404eb1]
> [prodnode1:04405] [ 4] /lib64/libc.so.6(__libc_start_main+0xf4)
> [0x3aa781d8a4]
> [prodnode1:04405] [ 5] /home/sam/code/fdtd/fdtd_0.3/fdtd [0x4034b9]
> [prodnode1:04405] *** End of error message ***
> mpirun noticed that job rank 0 wit

Re: [OMPI users] libmpi.so.0 problem

2007-08-14 Thread Tim Prins
In general, exporting the variables is good enough. You really should be 
setting the variables in the appropriate shell (non-interactive) login 
scripts, such as .bashrc (I again point you to the same FAQ entries for 
more information: 
http://www.open-mpi.org/faq/?category=running#run-prereqs and 
http://www.open-mpi.org/faq/?category=running#adding-ompi-to-path )


Try running:
mpirun -np 1 printenv
to see what variables are set.

Also,
mpirun -np 1 ldd a.out
will show the libraries your executable is trying to use.

Tim

Durga Choudhury wrote:
Did you export your variables? Otherwise the child shell that forks the 
MPI process will not inherit it.



 
On 8/14/07, *Rodrigo Faccioli* > wrote:


Thanks, Tim Prins for your email.

However It did't resolve my problem.

I set the enviroment variable on my Kubuntu Linux:

faccioli@faccioli-desktop:/usr/local/lib$

PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/bin

faccioli@faccioli-desktop:/usr/local/lib$
LD_LIBRARY_PATH=/usr/local/lib/
 


Therefore, set command will display:

BASH=/bin/bash
BASH_ARGC=()
BASH_ARGV=()
BASH_COMPLETION=/etc/bash_completion
BASH_COMPLETION_DIR=/etc/bash_completion.d
BASH_LINENO=()
BASH_SOURCE=()
BASH_VERSINFO=([0]="3" [1]="2" [2]="13" [3]="1" [4]="release"
[5]="x86_64-pc-linux-gnu")
BASH_VERSION='3.2.13(1)-release'
COLORTERM=
COLUMNS=83

DBUS_SESSION_BUS_ADDRESS=unix:abstract=/tmp/dbus-C83Ve0QbQz,guid=e07c2bd483a99b50932d080046c199e9
DESKTOP_SESSION=default
DIRSTACK=()
DISPLAY=: 0.0
DM_CONTROL=/var/run/xdmctl
EUID=1000
GROUPS=()
GS_LIB=/home/faccioli/.fonts

GTK2_RC_FILES=/home/faccioli/.gtkrc-2.0-kde:/home/faccioli/.kde/share/config/gtkrc-2.0

GTK_RC_FILES=/etc/gtk/gtkrc:/home/faccioli/.gtkrc:/home/faccioli/.kde/share/config/gtkrc

HISTCONTROL=ignoreboth
HISTFILE=/home/faccioli/.bash_history
HISTFILESIZE=500
HISTSIZE=500
HOME=/home/faccioli
HOSTNAME=faccioli-desktop
HOSTTYPE=x86_64
IFS=$' \t\n'
KDE_FULL_SESSION=true
KDE_MULTIHEAD=false
KONSOLE_DCOP='DCOPRef(konsole-5587,konsole)'
KONSOLE_DCOP_SESSION='DCOPRef(konsole-5587,session-2)'
LANG=en_US.UTF-8
LD_LIBRARY_PATH=/usr/local/lib/
LESSCLOSE='/usr/bin/lesspipe %s %s'
LESSOPEN='| /usr/bin/lesspipe %s'
LINES=33
LOGNAME=faccioli

LS_COLORS='no=00:fi=00:di=01;34:ln=01;36:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:su=37;41:sg=30;43:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arj=01;31:*.taz=01;31:*.lzh=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.gz=01;31:*.bz2=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.avi=01;35:*.fli=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.flac=01;35:*.mp3=01;35:*.mpc=01;35:*.ogg=01;35:*.wav=01;35:'

MACHTYPE=x86_64-pc-linux-gnu
MAILCHECK=60
OLDPWD=/home/faccioli
OPTERR=1
OPTIND=1
OSTYPE=linux-gnu

PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/bin

PIPESTATUS=([0]="0")
PPID=5587

Unfortunately,  when I execute mpirun a.out, the message I received
is: a.out:  error while loading shared libraries: libmpi.so.0 :
cannot open shared object file: No such file or directory

Thanks,


On 8/14/07, *Tim Prins* < tpr...@open-mpi.org
 > wrote:

You need to set your LD_LIBRARY_PATH. See these FAQ entries:
http://www.open-mpi.org/faq/?category=running#run-prereqs
http://www.open-mpi.org/faq/?category=running#adding-ompi-to-path


Tim

Rodrigo Faccioli wrote:
 > Hi,
 >
 > I need to know what I can resolve my problem. I'm starting my
study on
 > mpi, more specificaly open-mpi.
 >
 > But, when I execute mpirun a.out, the message I received is:
a.out:
 > error while loading shared libraries: libmpi.so.0: cannot
open shared
 > object file: No such file or directory
 >
 > The a.out file was obtained through mpicc hello.c
 >
 > Thanks.
 >
 >
 >
 >

 >
 > ___
 > users mailing list
 > us...@open-mpi.org 
 > http://www.open-mpi.org/mailman/listinfo.cgi/users

___
users mailing list
us...@open-mpi.org 
  

Re: [OMPI users] libmpi.so.0 problem

2007-08-14 Thread Tim Prins

I meant to say, "exporting the variables is *not* good enough".

Tim

Tim Prins wrote:
In general, exporting the variables is good enough. You really should be 
setting the variables in the appropriate shell (non-interactive) login 
scripts, such as .bashrc (I again point you to the same FAQ entries for 
more information: 
http://www.open-mpi.org/faq/?category=running#run-prereqs and 
http://www.open-mpi.org/faq/?category=running#adding-ompi-to-path )


Try running:
mpirun -np 1 printenv
to see what variables are set.

Also,
mpirun -np 1 ldd a.out
will show the libraries your executable is trying to use.

Tim

Durga Choudhury wrote:
Did you export your variables? Otherwise the child shell that forks the 
MPI process will not inherit it.



 
On 8/14/07, *Rodrigo Faccioli* > wrote:


Thanks, Tim Prins for your email.

However It did't resolve my problem.

I set the enviroment variable on my Kubuntu Linux:

faccioli@faccioli-desktop:/usr/local/lib$

PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/bin

faccioli@faccioli-desktop:/usr/local/lib$
LD_LIBRARY_PATH=/usr/local/lib/
 


Therefore, set command will display:

BASH=/bin/bash
BASH_ARGC=()
BASH_ARGV=()
BASH_COMPLETION=/etc/bash_completion
BASH_COMPLETION_DIR=/etc/bash_completion.d
BASH_LINENO=()
BASH_SOURCE=()
BASH_VERSINFO=([0]="3" [1]="2" [2]="13" [3]="1" [4]="release"
[5]="x86_64-pc-linux-gnu")
BASH_VERSION='3.2.13(1)-release'
COLORTERM=
COLUMNS=83

DBUS_SESSION_BUS_ADDRESS=unix:abstract=/tmp/dbus-C83Ve0QbQz,guid=e07c2bd483a99b50932d080046c199e9
DESKTOP_SESSION=default
DIRSTACK=()
DISPLAY=: 0.0
DM_CONTROL=/var/run/xdmctl
EUID=1000
GROUPS=()
GS_LIB=/home/faccioli/.fonts

GTK2_RC_FILES=/home/faccioli/.gtkrc-2.0-kde:/home/faccioli/.kde/share/config/gtkrc-2.0

GTK_RC_FILES=/etc/gtk/gtkrc:/home/faccioli/.gtkrc:/home/faccioli/.kde/share/config/gtkrc

HISTCONTROL=ignoreboth
HISTFILE=/home/faccioli/.bash_history
HISTFILESIZE=500
HISTSIZE=500
HOME=/home/faccioli
HOSTNAME=faccioli-desktop
HOSTTYPE=x86_64
IFS=$' \t\n'
KDE_FULL_SESSION=true
KDE_MULTIHEAD=false
KONSOLE_DCOP='DCOPRef(konsole-5587,konsole)'
KONSOLE_DCOP_SESSION='DCOPRef(konsole-5587,session-2)'
LANG=en_US.UTF-8
LD_LIBRARY_PATH=/usr/local/lib/
LESSCLOSE='/usr/bin/lesspipe %s %s'
LESSOPEN='| /usr/bin/lesspipe %s'
LINES=33
LOGNAME=faccioli

LS_COLORS='no=00:fi=00:di=01;34:ln=01;36:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:su=37;41:sg=30;43:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arj=01;31:*.taz=01;31:*.lzh=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.gz=01;31:*.bz2=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.avi=01;35:*.fli=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.flac=01;35:*.mp3=01;35:*.mpc=01;35:*.ogg=01;35:*.wav=01;35:'

MACHTYPE=x86_64-pc-linux-gnu
MAILCHECK=60
OLDPWD=/home/faccioli
OPTERR=1
OPTIND=1
OSTYPE=linux-gnu

PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/bin

PIPESTATUS=([0]="0")
PPID=5587

Unfortunately,  when I execute mpirun a.out, the message I received
is: a.out:  error while loading shared libraries: libmpi.so.0 :
cannot open shared object file: No such file or directory

Thanks,


On 8/14/07, *Tim Prins* < tpr...@open-mpi.org
 > wrote:

You need to set your LD_LIBRARY_PATH. See these FAQ entries:
http://www.open-mpi.org/faq/?category=running#run-prereqs
http://www.open-mpi.org/faq/?category=running#adding-ompi-to-path


Tim

Rodrigo Faccioli wrote:
 > Hi,
 >
 > I need to know what I can resolve my problem. I'm starting my
study on
 > mpi, more specificaly open-mpi.
 >
 > But, when I execute mpirun a.out, the message I received is:
a.out:
 > error while loading shared libraries: libmpi.so.0: cannot
open shared
 > object file: No such file or directory
 >
 > The a.out file was obtained through mpicc hello.c
 >
 > Thanks.
 >
 >
 >
 >

 >
 > ___
 > users mailing list
 > us...@open-mpi.org 
 > http://www.open-mpi.org/mailman/listinfo.cgi/users


Re: [OMPI users] mpirun hangs

2007-08-14 Thread Tim Prins

Jody,

jody wrote:

Hi TIm
thanks for the suggestions.

I now set both paths  in .zshenv but it seems that LD_LIBRARY_PATH
still does not get set.
The ldd experment shows that all openmpi libraries are not found,
and indeed the printenv shows that PATH is there but LD_LIBRARY_PATH is 
not.
Are you setting LD_LIBRARY_PATH anywhere else in your scripts? I have, 
on more than one occasion, forgotten that I needed to do:

export LD_LIBRARY_PATH="/foo:$LD_LIBRARY_PATH"

Instead of just:
export LD_LIBRARY_PATH="/foo"



It is rather unclear why this happens...

As to thew second problem:
$ mpirun --debug-daemons -np 2 --prefix /opt/openmpi --host nano_02 
./MPI2Test2
[aim-nano_02:05455] [0,0,1]-[0,0,0] mca_oob_tcp_peer_try_connect: 
connect to 130.60.49.134:40618 failed: Software caused connection abort (103)
[aim-nano_02:05455] [0,0,1]-[0,0,0] mca_oob_tcp_peer_try_connect: 
connect to 130.60.49.134:40618 failed, connecting over all interfaces failed!

[aim-nano_02:05455] OOB: Connection to HNP lost
[aim-plankton.unizh.ch:24222] [0,0,0] ORTE_ERROR_LOG: Timeout in file 
base/pls_base_orted_cmds.c at line 275
[aim-plankton.unizh.ch:24222] [0,0,0] ORTE_ERROR_LOG: Timeout in file 
pls_rsh_module.c at line 1164
[aim-plankton.unizh.ch:24222] [0,0,0] ORTE_ERROR_LOG: Timeout in file 
errmgr_hnp.c at line 90
[aim-plankton.unizh.ch:24222] ERROR: A daemon on node nano_02 failed to 
start as expected.
[aim-plankton.unizh.ch:24222] ERROR: There may be more information 
available from
[aim-plankton.unizh.ch:24222] ERROR: the remote shell (see above).
[aim-plankton.unizh.ch:24222] ERROR: The daemon exited unexpectedly with 
status 1.
[aim-plankton.unizh.ch:24222] [0,0,0] ORTE_ERROR_LOG: Timeout in file 
base/pls_base_orted_cmds.c at line 188
[aim-plankton.unizh.ch:24222] [0,0,0] ORTE_ERROR_LOG: Timeout in file 
pls_rsh_module.c at line 1196


The strange thing is that nano_02's address is 130.60.49.130 
and plankton's (the caller) is 130.60.49.134.
I also made sure that nano_02 can ssh to plankton without a password, but 
that didn't change the output.


What is happening here is that the daemon launched on nano_02 is trying 
to contact mpirun on plankton, and is failing for some reason.


Do you have any firewalls/port filtering enabled on nano_02? Open MPI 
generally cannot be run when there are any firewalls on the machines 
being used.
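
For example, on the Fedora box you can list the active filter rules with
(run as root; whether iptables is in use at all depends on how the
machines were set up):

# /sbin/iptables -L -n

and, purely as a test, temporarily stop the firewall on both machines
(e.g. "service iptables stop" on Fedora, or "/etc/init.d/iptables stop"
on Gentoo, if that is how it is managed) before re-running mpirun.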


Hope this helps,

Tim



Does this message give any hints as to the problem?

Jody


On 8/14/07, *Tim Prins* > wrote:


Hi Jody,

jody wrote:
 > Hi
 > I installed openmpi 1.2.2 on a quad core intel machine running
fedora 6
 > (hostname plankton)
 > I set PATH and LD_LIBRARY in the .zshrc file:
Note that .zshrc is only used for interactive logins. You need to setup
your system so the LD_LIBRARY_PATH and PATH is also set for
non-interactive logins. See this zsh FAQ entry for what files you need
to modify:
http://zsh.sourceforge.net/FAQ/zshfaq03.html#l19


(BTW: I do not use zsh, but my assumption is that the file you want to
set the PATH and LD_LIBRARY_PATH in is .zshenv)
 > $ echo $PATH
 >

/opt/openmpi/bin:/usr/kerberos/bin:/usr/local/bin:/usr/bin:/bin:/usr/X11R6/bin:/home/jody/bin

 >
 > $ echo $LD_LIBRARY_PATH
 > /opt/openmpi/lib:
 >
 > When i run
 > $ mpirun -np 2 ./MPITest2
 > i get the message
 > ./MPI2Test2: error while loading shared libraries: libmpi_cxx.so.0:
 > cannot open shared object file: No such file or directory
 > ./MPI2Test2: error while loading shared libraries: libmpi_cxx.so.0:
 > cannot open shared object file: No such file or directory
 >
 > However
 > $ mpirun -np 2 --prefix /opt/openmpi ./MPI2Test2
 > works.  Any explanation?
Yes, the LD_LIBRARY_PATH is probably not set correctly. Try running:
mpirun -np 2 ldd ./MPITest2

This should show what libraries your executable is using. Make sure all
of the libraries are resolved.

Also, try running:
mpirun -np 1 printenv |grep LD_LIBRARY_PATH
to see what the LD_LIBRARY_PATH is for you executables. Note that you
can NOT simply run mpirun echo $LD_LIBRARY_PATH, as the variable
will be
interpreted in the executing shell.

 >
 > Second problem:
 > I have also  installed openmpi 1.2.2 on an AMD machine running gentoo
 > linux (hostname nano_02).
 > Here as well PATH and LD_LIBRARY_PATH are set correctly,
 > and
 > $ mpirun -np 2 ./MPITest2
 > works locally on nano_02.
  

Re: [OMPI users] MPI_AllReduce design for shared-memory.

2007-08-14 Thread smairal
Can anyone help on this?

-Thanks,
Sarang.

Quoting smai...@ksu.edu:

> Hi,
> I am doing a research on parallel techniques for shared-memory
> systems(NUMA). I understand that OpenMPI is intelligent to utilize
> shared-memory system and it uses processor-affinity. Is the OpenMPI
> design of MPI_AllReduce "same" for shared-memory (NUMA) as well as
> distributed system? Can someone please tell me MPI_AllReduce design,
> in
> brief, in terms of processes and their interaction on shared-memory?
> Else please suggest me a good reference for this.
>
> -Thanks,
> Sarang.
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>




Re: [OMPI users] MPI_AllReduce design for shared-memory.

2007-08-14 Thread smairal
Thanks, I understand what you are saying. But my query is regarding the
design of MPI_AllReduce for shared-memory systems. I mean, is there any
different logic/design in MPI_AllReduce when Open MPI is used on
shared-memory systems?
The standard MPI_AllReduce says:
1. Each MPI process sends its value (and WAITs for the others to send)
2. The values from all the processes are combined
3. The computed result is sent back to all processes (all LEAVE)
Does Open MPI implement the same logic/design for shared-memory systems, or
does it have some other way of doing it for shared memory?

-Thanks,
Sarang.

Quoting "Yuan,  Huapeng" :

> HI,
>
> I think it has nothing to do with shared memory. It just has
> something
> to do with process or thread.
> So, with interprocess, you can use mpi in shared memory (multicore or
> distributed shared memory). But for multiple threads in the same
> process, it cannot be used.
>
>
> Hope this helps.
>
>
> Quoting smai...@ksu.edu:
>
> > Can anyone help on this?
> >
> > -Thanks,
> > Sarang.
> >
> > Quoting smai...@ksu.edu:
> >
> >> Hi,
> >> I am doing a research on parallel techniques for shared-memory
> >> systems(NUMA). I understand that OpenMPI is intelligent to utilize
> >> shared-memory system and it uses processor-affinity. Is the
> OpenMPI
> >> design of MPI_AllReduce "same" for shared-memory (NUMA) as well as
> >> distributed system? Can someone please tell me MPI_AllReduce
> design,
> >> in
> >> brief, in terms of processes and their interaction on
> shared-memory?
> >> Else please suggest me a good reference for this.
> >>
> >> -Thanks,
> >> Sarang.
> >>
> >> ___
> >> users mailing list
> >> us...@open-mpi.org
> >> http://www.open-mpi.org/mailman/listinfo.cgi/users
> >>
> >>
> >
> >
> > ___
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
> >
>
>
>
>




Re: [OMPI users] MPI_AllReduce design for shared-memory.

2007-08-14 Thread Jeff Squyres
The primary person you need to talk to is turning in her dissertation  
within the next few days.  So I think she's kinda busy at the  
moment...  :-)


Sorry for the delay -- I'll take a shot at answers below...


On Aug 14, 2007, at 4:39 PM, smai...@ksu.edu wrote:


Can anyone help on this?

-Thanks,
Sarang.

Quoting smai...@ksu.edu:


Hi,
I am doing a research on parallel techniques for shared-memory
systems(NUMA). I understand that OpenMPI is intelligent to utilize
shared-memory system and it uses processor-affinity.


Open MPI has coarse-grained processor-affinity control, see:

http://www.open-mpi.org/faq/?category=tuning#using-paffinity

Expect to see more functionality / flexibility here in the future...
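
For what it is worth, in the 1.2 series this is typically turned on with
an MCA parameter along these lines (check ompi_info on your installation
for the exact parameter name):

$ mpirun --mca mpi_paffinity_alone 1 -np 4 ./a.out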

Is the OpenMPI design of MPI_AllReduce "same" for shared-memory  
(NUMA) as well as distributed system? Can someone please tell me   
MPI_AllReduce design, in brief, in terms of processes and their  
interaction on shared-memory?


Open MPI is fundamentally based on plugins.  We have plugins for  
various flavors of collective algorithms (see the code base:  
ompi/mca/coll/), one of which is "sm" (shared memory).  The shared memory  
collectives are currently quite limited but are being expanded and  
improved by Indiana University (e.g., IIRC, allreduce uses the shared  
memory reduce followed by a shared memory bcast).
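
A quick way to see which collective components a given build actually
contains is something like:

$ ompi_info | grep coll

which should list entries such as basic, self, sm, and tuned, depending
on how Open MPI was configured.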


The "tuned" collective plugin has its own implementation(s) of  
Allreduce -- Jelena or George will have to comment here.  They do not  
assume shared memory; they use well-known algorithms for allreduce.   
The "tuned" component basically implements a wide variety of  
algorithms for each MPI collective and attempts to choose which one  
will be best to use at run-time.  U. Tennessee has done a lot of work  
in this area and I think they have several published papers on it.


The "basic" plugin is the dirt-simple correct-but-not-optimized  
component that does simple linear and logarithmic algorithms for all  
the MPI collectives.  If we don't have a usable algorithm anywhere  
else, we fall back to the basic plugin (e.g., allreduce is a reduce  
followed by a bcast).



Else please suggest me a good reference for this.


Our basic philosophy / infrastructure for MPI collectives is based on  
this paper:


http://www.open-mpi.org/papers/ics-2004/

Although work that happened literally last week is just about to hit  
the development trunk (within a week or so -- still doing some  
debugging) that brings Goodness to allowing a first-level of mixing-n- 
matching between collective components that do not provide all the  
MPI algorithms.  I can explain more if you care.


Hope this helps...

--
Jeff Squyres
Cisco Systems