Re: [OMPI users] Debugging memory use of Open MPI

2009-04-14 Thread Chris Gottbrath
Eugene, On Apr 14, 2009, at 7:10 PM, Eugene Loh wrote: > Shaun Jackman wrote: > >> Wow. Thanks, Eugene. I definitely have to look into the Sun HPC >> ClusterTools. It looks as though it could be very informative. > > Great. And, I didn't mean to slight TotalView. I'm just not > familiar wit

Re: [OMPI users] Problem with MPI_File_read() (2)

2009-04-14 Thread Jeff Squyres
In general, files written by MPI_File_write (and friends) are only guaranteed to be readable by MPI_File_read (and friends). So if you have an ASCII input file, or even a binary input file, you might need to read it in with traditional/unix file read functions and then write it out with MP

Re: [OMPI users] Debugging memory use of Open MPI

2009-04-14 Thread Eugene Loh
Shaun Jackman wrote: Wow. Thanks, Eugene. I definitely have to look into the Sun HPC ClusterTools. It looks as though it could be very informative. Great. And, I didn't mean to slight TotalView. I'm just not familiar with it. What's the purpose of the 400 MB that MPI_Init has allocated?

[OMPI users] Problem with MPI_File_read() (2)

2009-04-14 Thread Jovana Knezevic
> > Hi Jovana, > > 825307441 is 0x31313131 in base 16 (hexadecimal), which is the string > `1111' in ASCII. MPI_File_read reads in binary values (not ASCII) just > as the standard functions read(2) and fread(3) do. > > So, your program is fine; however, your data file (first.dat) is not. > >

Re: [OMPI users] Debugging memory use of Open MPI

2009-04-14 Thread Chris Gottbrath
Shaun, These all look like fine suggestions. Another tool you should consider using for this problem or others like it in the future is TotalView. It seems like there are two related questions in your current troubleshooting scenario: 1. is the memory being used where you think it is? 2. is t

Re: [OMPI users] Debugging memory use of Open MPI

2009-04-14 Thread Shaun Jackman
Eugene Loh wrote: Okay. Attached is a "little" note I wrote up illustrating memory profiling with Sun tools. (It's "big" because I ended up including a few screenshots.) The program has a bunch of one-way message traffic and some user-code memory allocation. I then rerun with the receiver

Re: [OMPI users] Debugging memory use of Open MPI

2009-04-14 Thread Eugene Loh
Shaun Jackman wrote: Eugene Loh wrote: On the other hand, I assume the memory imbalance we're talking about is rather severe. Much more than 2500 bytes to be noticeable, I would think. Is that really the situation you're imagining? The memory imbalance is drastic. I'm expecting 2 GB of m

Re: [OMPI users] Problem with MPI_File_read()

2009-04-14 Thread Shaun Jackman
Hi Jovana, 825307441 is 0x31313131 in base 16 (hexadecimal), which is the string `1111' in ASCII. MPI_File_read reads in binary values (not ASCII) just as the standard functions read(2) and fread(3) do. So, your program is fine; however, your data file (first.dat) is not. Cheers, Shaun Jovan

[OMPI users] Problem with MPI_File_read()

2009-04-14 Thread Jovana Knezevic
Hello everyone! I have a problem using MPI_File_read() in C. The simple code below, trying to read an integer, prints the wrong result to standard output (825307441 instead of 1). I tried this function with the 'MPI_CHAR' datatype and it works. Probably I'm not using it properly for MPI_INT, but I

Re: [OMPI users] Debugging memory use of Open MPI

2009-04-14 Thread Shaun Jackman
Eugene Loh wrote: ompi_info -a | grep eager depends on the BTL. E.g., sm=4K but tcp is 64K. self is 128K. Thanks, Eugene. On the other hand, I assume the memory imbalance we're talking about is rather severe. Much more than 2500 bytes to be noticeable, I would think. Is that really the s

Re: [OMPI users] shared libraries issue compiling 1.3.1/intel 10.1.022

2009-04-14 Thread Ralph Castain
The -x option only applies to your application processes - it is never applied to the OMPI processes such as the OMPI daemons (orteds). If you built OMPI with the Intel library, then trying to pass the path to libimf via -x will fail - your application processes will get that library path,

Re: [OMPI users] XLF and 1.3.1

2009-04-14 Thread Jean-Michel Beuken
ok! thank you Nysal. Can you try adding --disable-dlopen to the configure command line --Nysal On Tue, 2009-04-14 at 10:19 +0200, Jean-Michel Beuken wrote: there is a problem of "multiple definition"... any advice? It resolved the problem of "multiple definition"... regards jmb

Re: [OMPI users] shared libraries issue compiling 1.3.1/intel 10.1.022

2009-04-14 Thread Francesco Pietra
mpirun -x LD_LIBRARY_PATH -host tya64 connectivity_c complained about libimf.so (not found), just the same as without "-x LD_LIBRARY_PATH" (I tried giving the full path to the PATH, with the same error), while # dpkg --search libimf.so /opt/intel/fce/10.1.022/lib/libimf.so /opt/intel/fce/10.1.022/lib/l

Re: [OMPI users] Debugging memory use of Open MPI

2009-04-14 Thread Eugene Loh
On Apr 14, 2009, at 12:02 PM, Shaun Jackman wrote: Assuming the problem is congestion and that messages are backing up, ... I'd check this assumption first before going too far down that path. You might be able to instrument your code to spit out sends and receives. VampirTrace (and PERU

Re: [OMPI users] Debugging memory use of Open MPI

2009-04-14 Thread Ralph Castain
On Apr 14, 2009, at 12:02 PM, Shaun Jackman wrote: Hi Eugene, Eugene Loh wrote: At 2500 bytes, all messages will presumably be sent "eagerly" -- without waiting for the receiver to indicate that it's ready to receive that particular message. This would suggest congestion, if any, is on

Re: [OMPI users] Debugging memory use of Open MPI

2009-04-14 Thread Eugene Loh
Shaun Jackman wrote: Eugene Loh wrote: At 2500 bytes, all messages will presumably be sent "eagerly" -- without waiting for the receiver to indicate that it's ready to receive that particular message. This would suggest congestion, if any, is on the receiver side. Some kind of congestion c

Re: [OMPI users] Debugging memory use of Open MPI

2009-04-14 Thread Shaun Jackman
Hi Eugene, Eugene Loh wrote: At 2500 bytes, all messages will presumably be sent "eagerly" -- without waiting for the receiver to indicate that it's ready to receive that particular message. This would suggest congestion, if any, is on the receiver side. Some kind of congestion could, I supp

Re: [OMPI users] all2all algorithms

2009-04-14 Thread Jeff Squyres
George can speak more definitively about this. In general, our "tuned" coll component (plugin) does exactly these kinds of determinations to figure out which algorithm to use at runtime. Not only are communicator process counts involved, but also size of message is considered. I count 5 d

Re: [OMPI users] PGI Fortran pthread support

2009-04-14 Thread Gus Correa
Orion Poplawski wrote: Gus Correa wrote: Hi Orion, Prentice, list I had a related problem recently, building OpenMPI with gcc, g++ and pgf90 8.0-4 on CentOS 5.2. Configure would complete, but not make. Easier solution is to set FC to "pgf90 -noswitcherror". Does not appear to interfere wit

Re: [OMPI users] PGI Fortran pthread support

2009-04-14 Thread Orion Poplawski
Gus Correa wrote: Hi Orion, Prentice, list I had a related problem recently, building OpenMPI with gcc, g++ and pgf90 8.0-4 on CentOS 5.2. Configure would complete, but not make. Easier solution is to set FC to "pgf90 -noswitcherror". Does not appear to interfere with any configure tests.

Re: [OMPI users] PGI Fortran pthread support

2009-04-14 Thread Jeff Squyres
On Apr 14, 2009, at 11:28 AM, Orion Poplawski wrote: Sorry, it inherits it from libmpi.la. I hate libtool. To be fair, Libtool actually does a pretty darn good job at a very complex job. :-) These corner cases are pretty obscure (mixing one vendor's fortran compiler with another vend

Re: [OMPI users] PGI Fortran pthread support

2009-04-14 Thread Orion Poplawski
Gus Correa wrote: Hi Orion, Prentice, list I had a related problem recently, building OpenMPI with gcc, g++ and pgf90 8.0-4 on CentOS 5.2. Configure would complete, but not make. See this thread for a workaround: http://www.open-mpi.org/community/lists/users/2009/04/8724.php Gus Correa Than

Re: [OMPI users] PGI Fortran pthread support

2009-04-14 Thread Gus Correa
Hi Orion, That's exactly what happened to me. Configured OK, failed on make because of "-pthread". See my message from a minute ago, and this thread, for a workaround suggested by Jeff Squyres, of stripping off "-pthread" from the pgf90 flags: http://www.open-mpi.org/community/lists/users/2009/04

Re: [OMPI users] PGI Fortran pthread support

2009-04-14 Thread Orion Poplawski
Orion Poplawski wrote: Looks like libtool is adding -pthread because it sees that you use -pthread to link C programs and assumes that all linkers use it. Sorry, it inherits it from libmpi.la. I hate libtool. -- Orion Poplawski Technical Manager 303-415-9701 x222 NWRA/Co

Re: [OMPI users] PGI Fortran pthread support

2009-04-14 Thread Orion Poplawski
Orion Poplawski wrote: ./configure LIBS=-lgcc_eh ... did the trick. Spoke too soon. This leads to: /bin/sh ../../../libtool --mode=link pgf90 -I../../../ompi/include -I../../../ompi/include -I. -I. -I../../../ompi/mpi/f90 -fastsse -fPIC -export-dynamic -Wl,-z,noexecstack -o libmpi_f90

Re: [OMPI users] PGI Fortran pthread support

2009-04-14 Thread Gus Correa
Hi Orion, Prentice, list I had a related problem recently, building OpenMPI with gcc, g++ and pgf90 8.0-4 on CentOS 5.2. Configure would complete, but not make. See this thread for a workaround: http://www.open-mpi.org/community/lists/users/2009/04/8724.php Gus Correa -

Re: [OMPI users] PGI Fortran pthread support

2009-04-14 Thread Orion Poplawski
Orion Poplawski wrote: Looks like I need link to -lgcc_eh some how. ./configure LIBS=-lgcc_eh ... did the trick. checking if F77 compiler and POSIX threads work as is... yes checking if C compiler and POSIX threads work with -Kthread... no checking if C compiler and POSIX threads work with

Re: [OMPI users] PGI Fortran pthread support

2009-04-14 Thread Orion Poplawski
Prentice Bisbal wrote: Orion, I have no trouble getting thread support during configure with PGI 8.0-3 I'm mixing the pgf and gcc compilers which causes the trouble. Here is the config.log entry for the F77 test: configure:65969: checking if F77 compiler and POSIX threads work as is configur

Re: [OMPI users] help: seg fault when freeing communicator

2009-04-14 Thread Jeff Squyres
In this case, I think we would need a little more information such as your application itself. Is there any chance you can make a small reproducer of the application that we can easily study and reproduce the problem? Have you tried running your application through a memory-checking debu

Re: [OMPI users] shared libraries issue compiling1.3.1/intel10.1.022

2009-04-14 Thread Jeff Squyres
On Apr 13, 2009, at 12:07 PM, Francesco Pietra wrote: I knew that but have considered it again. I wonder whether the info at the end of this mail suggests how to operate from the viewpoint of openmpi in compiling a code. In trying to compile openmpi-1.3.1 on debian amd64 lenny, Intel's 10.1.022

Re: [OMPI users] openmpi 1.3.1 : mpirun status is 0 after receivingTERM signal

2009-04-14 Thread Jeff Squyres
I believe that this is fixed in 1.3.2. On Apr 14, 2009, at 10:32 AM, Geoffroy Pignot wrote: Hi, I am not sure it's a bug but I think we wait for something else when we kill a process - by the way, the signal propagation works well. I read an explanation on a previous thread - ( http://www.

[OMPI users] openmpi 1.3.1 : mpirun status is 0 after receiving TERM signal

2009-04-14 Thread Geoffroy Pignot
Hi, I am not sure it's a bug but I think we wait for something else when we kill a process - by the way, the signal propagation works well. I read an explanation on a previous thread (http://www.open-mpi.org/community/lists/users/2009/03/8514.php). It's not important but it could contrib

Re: [OMPI users] PGI Fortran pthread support

2009-04-14 Thread Prentice Bisbal
Orion, I have no trouble getting thread support during configure with PGI 8.0-3 Are there any other compilers in your path before the PGI compilers? Even if the PGI compilers come first, try specifying the PGI compilers explicitly with these environment variables (bash syntax shown): export CC=p

Re: [OMPI users] 1.3.1 -rf rankfile behaviour ??

2009-04-14 Thread Ralph Castain
Ah now, I didn't say it -worked-, did I? :-) Clearly a bug exists in the program. I'll try to take a look at it (if Lenny doesn't get to it first), but it won't be until later in the week. On Apr 14, 2009, at 7:18 AM, Geoffroy Pignot wrote: I agree with you Ralph , and that 's what I expect

Re: [OMPI users] Problem with running openMPI program

2009-04-14 Thread Eugene Loh
Ankush Kaul wrote: Finally, after mentioning the hostfiles, the cluster is working fine. We downloaded a few benchmarking programs, but I would like to know if there is any GUI-based benchmarking software, so that it's easier to demonstrate the working of our cluster while displaying it.

Re: [OMPI users] 1.3.1 -rf rankfile behaviour ??

2009-04-14 Thread Geoffroy Pignot
I agree with you Ralph, and that's what I expect from openmpi, but my second example shows that it's not working: cat hostfile.0 r011n002 slots=4 r011n003 slots=4 cat rankfile.0 rank 0=r011n002 slot=0 rank 1=r011n003 slot=1 mpirun --hostfile hostfile.0 -rf rankfile.0 -n 1 hostname

Re: [OMPI users] 1.3.1 -rf rankfile behaviour ??

2009-04-14 Thread Ralph Castain
The rankfile cuts across the entire job - it isn't applied on an app_context basis. So the ranks in your rankfile must correspond to the eventual rank of each process in the cmd line. Unfortunately, that means you have to count ranks. In your case, you only have four, so that makes life eas
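Ralph's point, that rank numbers in the rankfile span the whole job rather than restarting per app_context, can be sketched for the hostnames used earlier in the thread. The mapping below is illustrative only (master.x/slave.x and the slot choices are placeholders, not a tested configuration):

```
# hostfile
r011n002 slots=4
r011n003 slots=4

# rankfile: ranks are numbered across the WHOLE job, i.e. across
# both app contexts of the colon-separated command below
rank 0=r011n002 slot=0    # first app context:  master.x
rank 1=r011n003 slot=1    # second app context: slave.x

# mpirun --hostfile hostfile -rf rankfile -n 1 master.x : -n 1 slave.x
```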

Re: [OMPI users] Problem with running openMPI program

2009-04-14 Thread Jeff Squyres
On Apr 14, 2009, at 2:57 AM, Ankush Kaul wrote: Finally, after mentioning the hostfiles, the cluster is working fine. We downloaded a few benchmarking programs, but I would like to know if there is any GUI-based benchmarking software, so that it's easier to demonstrate the working of our cluster

Re: [OMPI users] XLF and 1.3.1

2009-04-14 Thread Nysal Jan
Can you try adding --disable-dlopen to the configure command line --Nysal On Tue, 2009-04-14 at 10:19 +0200, Jean-Michel Beuken wrote: > Hello, > > I'm trying to build 1.3.1 under IBM Power5 + SLES 9.1 + XLF 9.1... > > after some searches on FAQ and Google, my configure : > > export CC="/opt/

[OMPI users] XLF and 1.3.1

2009-04-14 Thread Jean-Michel Beuken
Hello, I'm trying to build 1.3.1 under IBM Power5 + SLES 9.1 + XLF 9.1... after some searches on FAQ and Google, my configure : export CC="/opt/ibmcmp/vac/7.0/bin/xlc" export CXX="/opt/ibmcmp/vacpp/7.0/bin/xlc++" export CFLAGS="-O2 -q64 -qmaxmem=-1" # export F77="/opt/ibmcmp/xlf/9.1/bin/xlf" e

Re: [OMPI users] Problem with running openMPI program

2009-04-14 Thread Ankush Kaul
Finally, after mentioning the hostfiles, the cluster is working fine. We downloaded a few benchmarking programs, but I would like to know if there is any GUI-based benchmarking software, so that it's easier to demonstrate the working of our cluster while displaying it. Regards Ankush

Re: [OMPI users] PGI Fortran pthread support

2009-04-14 Thread Åke Sandgren
On Mon, 2009-04-13 at 16:48 -0600, Orion Poplawski wrote: > Seeing the following building openmpi 1.3.1 on CentOS 5.3 with PGI pgf90 > 8.0-5 fortran compiler: > checking for PTHREAD_MUTEX_ERRORCHECK_NP... yes > checking for PTHREAD_MUTEX_ERRORCHECK... yes > checking for working POSIX threads packa

Re: [OMPI users] 1.3.1 -rf rankfile behaviour ??

2009-04-14 Thread Geoffroy Pignot
Hi, I agree that my examples are not very clear. What I want to do is to launch a multi-executable application (masters/slaves) and benefit from processor affinity. Could you show me how to convert this command, using the -rf option (whatever the affinity is): mpirun -n 1 -host r001n001 master.x options