[OMPI users] configure is too smart !
Dear developers,

I recently switched from LAM/MPI to Open MPI. I am using Mac OS X Server on small clusters, previously with XLF/XLC on G5s, now gfortran/gcc on Intel machines. Since our users are used to Unix file systems, and since most application/library builds are not aware of the case insensitivity of HFS+, I installed a UFS-formatted disk on our new cluster.

Being a careful administrator, I configured and compiled Open MPI as a user on the UFS partition, then installed it as root on an HFS+ system partition. When I tried to install ScaLAPACK, the BLACS compilation failed miserably:

BI_EmergencyBuff.c: In function 'void BI_EmergencyBuff(int)':
BI_EmergencyBuff.c:34: error: invalid conversion from 'void*' to 'char*'
make[2]: *** [BI_EmergencyBuff.o] Error 1
make[1]: *** [INTERN] Error 2
make: *** [MPI] Error 2

This is, I guess, due to confusion between the wrappers:

$ /usr/local/openmpi-1.1.4_32bits/bin/mpic++
i686-apple-darwin8-g++-4.0.1: no input files

seems OK, but:

$ /usr/local/openmpi-1.1.4_32bits/bin/mpicc
i686-apple-darwin8-g++-4.0.1: no input files

is wrong -- mpicc is invoking the C++ compiler. Re-compiling Open MPI on an HFS+ filesystem, I get:

$ /usr/local/openmpi-1.1.4_32bits_hfs/bin/mpic++
i686-apple-darwin8-g++-4.0.1: no input files

and

$ /usr/local/openmpi-1.1.4_32bits_hfs/bin/mpicc
i686-apple-darwin8-gcc-4.0.1: no input files

which is correct. Then BLACS/ScaLAPACK and the others compile without trouble. (I have not tested execution yet!)

Is my explanation right? If yes, then although the documentation is excellent and the FAQ already well detailed, could you please add a caveat somewhere: Open MPI's configure is smarter than the average -- it is aware of filesystem case sensitivity.

Anyway, many thanks for your great job!

--
Dr. Christian SIMON, Maitre de Conferences
Laboratoire LI2C-UMR7612, Bat. F74, piece 757
Universite Pierre et Marie Curie, Tel: +33.1.44.27.32.65
Case 51, Fax: +33.1.44.27.32.28
4 Place Jussieu, 75252 Paris Cedex 05, France/Europe
Re: [OMPI users] Current working directory issue
OMPI uses the getcwd() library call to determine the pwd, whereas the shell's $PWD variable contains the shell's point of view of what the pwd is (I *suspect* that the pwd(1) shell command also uses getcwd(), but I don't know that for sure).

From the OS X getcwd(3) man page:

    The getcwd() function copies the absolute pathname of the current working directory into the memory referenced by buf and returns a pointer to buf. The size argument is the size, in bytes, of the array referenced by buf.

From the Linux getcwd(3) man page:

    The getcwd() function shall place an absolute pathname of the current working directory in the array pointed to by buf, and return buf. The pathname copied to the array shall contain no components that are symbolic links. ...

So this at least explains why you're seeing that behavior. I'm trying to think of a good reason why we're not checking PWD -- I think the reasons are as follows:

1. LAM/MPI has used getcwd() for about 10 years (I can't speak for the other MPIs, though).
2. You're the first guy to ask in that time (or the frequency of asking is so low that I've forgotten).

But these are pretty wimpy reasons. :-) I'll have to check with the other developers to see if there are any "gotchas" to using PWD if it's defined and contains a valid alias for the current directory.

On Mar 2, 2007, at 1:12 PM, Grismer, Matthew J Civ AFRL/VAAC wrote:

I'm using Open MPI on an Xserve cluster running OS X Server 10.4.8. The user directories exist on an Xserve RAID connected to the master node via Fibre Channel. So, on the master node the full pathname for the user directories is /Volumes/RAID1/users1. The compute nodes of the cluster automount the user directories via NFS, so the full path to the user directories appears on the nodes as /xhome/users1.

I created a hostfile listing all the compute nodes on the cluster, not including the master node. When I attempt to run a program in my home directory matt from the master node with

mpirun -hostfile nodes -np 4 program

it fails because it cannot find program. If I add the -wdir option and specify the directory as /xhome/users1/matt, everything is fine.

My question is this: how does Open MPI determine your working directory, and is there a way to fix this without the -wdir option? For example, if you look at the PWD environment variable, it is always /xhome/users1/matt, even on the master. If you use the pwd command, however, you get two different results on the master and the nodes. Thanks.

Matthew Grismer

--
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems
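To make the distinction concrete, here is a minimal C sketch (not Open MPI's actual code) contrasting what getcwd() returns with what the shell-maintained PWD environment variable holds; on an automounted home directory the two can legitimately differ:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <limits.h>

int main(void)
{
    char buf[PATH_MAX];

    /* Physically resolved path, as the C library / kernel sees it. */
    if (getcwd(buf, sizeof(buf)) != NULL) {
        printf("getcwd():      %s\n", buf);
    }

    /* Logical path maintained by the shell; it may preserve symlink or
       automount components (e.g. /xhome/users1/matt) and may be unset
       or stale if the program was not started from a shell. */
    const char *pwd = getenv("PWD");
    printf("getenv(\"PWD\"): %s\n", pwd ? pwd : "(not set)");

    return 0;
}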
Re: [OMPI users] Fortran90 interfaces--problem?
On Mar 5, 2007, at 9:50 AM, Michael wrote:

I have discovered a problem with the Fortran90 interfaces for all types of communication when one uses derived datatypes (I'm currently using openmpi-1.3a1r13918 [for testing] and openmpi-1.1.2 [for compatibility with an HPC system]), for example

call MPI_RECV(tsk, 1, MPI_TASKSTATE, src, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE, ier)

where tsk is a Fortran 90 structure and MPI_TASKSTATE has been created by MPI_TYPE_CREATE_STRUCT. At the moment I can't imagine a way to modify the Open MPI interface generation to work around this besides switching to --with-mpi-f90-size=small.

This is unfortunately a known problem -- not just with Open MPI, but with the F90 bindings specification in MPI. :-( Since there's no F90 equivalent of C's (void*), there's no way to pass a variable of arbitrary type through the MPI F90 bindings. Hence, all we can do is define bindings for all the known types (i.e., various dimension sizes of the MPI types).

--
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems
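For readers less familiar with derived datatypes, here is a small illustrative C sketch of MPI_Type_create_struct; the struct and its field names are invented for illustration, not Michael's actual MPI_TASKSTATE. It also shows why the C bindings avoid the problem Jeff describes: the buffer argument of MPI_Recv is a void*, so any user type can be passed.

#include <stddef.h>
#include <mpi.h>

/* Hypothetical C analogue of a derived type; not the poster's actual struct. */
struct task_state {
    int    id;
    int    flags;
    double weight;
};

static MPI_Datatype build_task_state_type(void)
{
    struct task_state dummy;
    int          blocklens[3] = { 1, 1, 1 };
    MPI_Datatype types[3]     = { MPI_INT, MPI_INT, MPI_DOUBLE };
    MPI_Aint     base, disps[3];
    MPI_Datatype newtype;

    /* Displacements are measured from the start of the structure. */
    MPI_Get_address(&dummy,        &base);
    MPI_Get_address(&dummy.id,     &disps[0]);
    MPI_Get_address(&dummy.flags,  &disps[1]);
    MPI_Get_address(&dummy.weight, &disps[2]);
    for (int i = 0; i < 3; i++) disps[i] -= base;

    MPI_Type_create_struct(3, blocklens, disps, types, &newtype);
    MPI_Type_commit(&newtype);
    return newtype;
}

/* Usage: because the receive buffer is a void*, any type works in C:
 *   struct task_state tsk;
 *   MPI_Recv(&tsk, 1, build_task_state_type(), src, 1,
 *            MPI_COMM_WORLD, MPI_STATUS_IGNORE);
 */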
Re: [OMPI users] BLACS tests fails on IPF
Sorry for the delay in replying -- we've been quite busy trying to get OMPI v1.2 out the door!

Are you sure that you built BLACS properly with Open MPI? Check this FAQ item:

http://www.open-mpi.org/faq/?category=mpi-apps#blacs

In particular, note that there are items in Bmake.inc that you need to set properly or BLACS won't work properly with Open MPI.

On Feb 20, 2007, at 4:25 AM, Kobotov, Alexander V wrote:

Hello all,

I built BLACS on Itanium using Intel compilers under Linux (2.6.9-34.EL), but it fails the default BLACS Fortran tests (xFbtest); the C tests (xCbtest) are OK. I've tried different configurations combining OpenMPI-1.1.2 or OpenMPI-1.1.4, ICC 9.1.038 or ICC 8.1.38, IFORT 9.1.33 or IFORT 8.1.34, but all results were the same. Open MPI is built using the 9.1 compilers. I've also tried the same using the em64t compiler on an Intel Xeon -- all tests passed. MPICH2 on IPF also works fine.

Is that an Open MPI bug? Maybe some workaround exists? Bmake.inc is attached. Below is the output I've got (don't pay attention to the BLACS warnings, they are normal for MPI):

===[ begin of: xFbtest output ]=
-bash-3.00$ mpirun -np 4 xFbtest_MPI-LINUX-0
BLACS WARNING 'No need to set message ID range due to MPI communicator.' from {-1,-1}, pnum=1, Contxt=-1, on line 18 of file 'blacs_set_.c'.
BLACS WARNING 'No need to set message ID range due to MPI communicator.' from {-1,-1}, pnum=3, Contxt=-1, on line 18 of file 'blacs_set_.c'.
BLACS WARNING 'No need to set message ID range due to MPI communicator.' from {-1,-1}, pnum=0, Contxt=-1, on line 18 of file 'blacs_set_.c'.
BLACS WARNING 'No need to set message ID range due to MPI communicator.' from {-1,-1}, pnum=2, Contxt=-1, on line 18 of file 'blacs_set_.c'.
[comp-pvfs-0-7.local:30119] *** An error occurred in MPI_Comm_group
[comp-pvfs-0-7.local:30118] *** An error occurred in MPI_Comm_group
[comp-pvfs-0-7.local:30118] *** on communicator MPI_COMM_WORLD
[comp-pvfs-0-7.local:30118] *** MPI_ERR_COMM: invalid communicator
[comp-pvfs-0-7.local:30119] *** on communicator MPI_COMM_WORLD
[comp-pvfs-0-7.local:30119] *** MPI_ERR_COMM: invalid communicator
[comp-pvfs-0-7.local:30119] *** MPI_ERRORS_ARE_FATAL (goodbye)
[comp-pvfs-0-7.local:30116] *** An error occurred in MPI_Comm_group
[comp-pvfs-0-7.local:30116] *** on communicator MPI_COMM_WORLD
[comp-pvfs-0-7.local:30118] *** MPI_ERRORS_ARE_FATAL (goodbye)
[comp-pvfs-0-7.local:30116] *** MPI_ERR_COMM: invalid communicator
[comp-pvfs-0-7.local:30116] *** MPI_ERRORS_ARE_FATAL (goodbye)
[comp-pvfs-0-7.local:30117] *** An error occurred in MPI_Comm_group
[comp-pvfs-0-7.local:30117] *** on communicator MPI_COMM_WORLD
[comp-pvfs-0-7.local:30117] *** MPI_ERR_COMM: invalid communicator
[comp-pvfs-0-7.local:30117] *** MPI_ERRORS_ARE_FATAL (goodbye)
forrtl: error (78): process killed (SIGTERM)
forrtl: error (78): process killed (SIGTERM)
forrtl: error (78): process killed (SIGTERM)
forrtl: error (78): process killed (SIGTERM)
===[ end of: xFbtest output ]=

W.B.R.,
Kobotov Alexander

--
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems
Re: [OMPI users] Fortran90 interfaces--problem?
On Tue, 2007-03-06 at 09:51 -0500, Jeff Squyres wrote:
> This is unfortunately a known problem -- not just with Open MPI, but
> with the F90 bindings specification in MPI. :-( Since there's no
> F90 equivalent of C's (void*), there's no way to pass a variable of
> arbitrary type through the MPI F90 bindings. Hence, all we can do is
> define bindings for all the known types (i.e., various dimension
> sizes of the MPI types).

What about the "Fortran 2003 ISO_C_BINDING" -- couldn't a C_LOC be used here? (I probably don't know what I'm talking about, but I just saw a reference to it.)

--
Ake Sandgren, HPC2N, Umea University, S-90187 Umea, Sweden
Internet: a...@hpc2n.umu.se
Phone: +46 90 7866134  Fax: +46 90 7866126
Mobile: +46 70 7716134  WWW: http://www.hpc2n.umu.se
Re: [OMPI users] performance question
On Feb 19, 2007, at 1:53 PM, Mark Kosmowski wrote:

> [snipped good description of cluster]

Sorry for the delay in replying -- traveling for a week-long OMPI developer meeting and trying to get v1.2 out the door has sucked up all of our time recently. :-(

> For just the one system with two processors: CPU time: 32:43, Elapsed time: 36:52, Peak memory: 373 MB.
> For just the cluster: CPU time: 12:23, Elapsed time: 20:30, Peak memory: 131 MB.
> Is this a typical scaling or should I be thinking about doing some sort of tweaking to the [network / ompi] system at some point?

Unfortunately, there is no "typical" scaling -- every application is different. I'm unfortunately unfamiliar with the application you mentioned (CPMD), so I don't know how it runs (memory footprint, communication pattern, etc.).

> The CPU time is scaling about right, but elapsed time is getting hammered -- with the low memory overhead it has to be a communications issue rather than a swap issue, right?

Possibly. But even with low memory usage, there can be other factors that create low CPU utilization (e.g., other I/O, such as disk), processor/memory hierarchy issues (are your motherboards NUMA?), etc.

> Would it be helpful to see a serial time point using the same executable (if so, I'd probably repeat all the runs with a smaller job -- I don't know that I want to spend half a week just for benchmarking)?

I'm not sure what you mean -- see *what* at a serial point in time?

> I have included the appropriate btl_tcp_if_include configuration so that OMPI only uses the gigabit ports (and not the internet connections that some of the machines have).

Gotcha. OMPI's TCP support is "ok" -- it's not great (we've spent much more time optimizing the low latency / high bandwidth interconnects). We do intend to go back and optimize TCP, but it's one of those time-and-monkeys issues (don't have enough time or monkeys to do it...). But it shouldn't be a major slowdown, particularly over a 12 or 32 hour run. Do you have any idea what the communication pattern is for CPMD? Does it send a little data, or a lot? How often does it communicate between the MPI processes, and how big are the messages? Etc.

> I am already planning on doing some benchmark comparisons to determine the effect of compiler / math library on speed.

Depending on the app, this can have a big impact.

--
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems
Re: [OMPI users] Fortran90 interfaces--problem?
On Mar 6, 2007, at 10:23 AM, Åke Sandgren wrote:

> What about the "Fortran 2003 ISO_C_BINDING" -- couldn't a C_LOC be used here? (I probably don't know what I'm talking about, but I just saw a reference to it.)

FWIW, we wrote a paper about proposed Fortran 2003 bindings that use the ISO_C_BINDING stuff:

http://www.open-mpi.org/papers/euro-pvmmpi-2005-fortran/

We haven't spent many cycles implementing it, but it's on the long-term to-do list. Contributions would be great! ;-)

--
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems
Re: [OMPI users] configure is too smart !
Sure, we can add a FAQ entry on that :).

At present, configure decides whether Open MPI will be installed on a case-sensitive file system or not based on what the build file system does. Which is far from perfect, but covers 99.9% of the cases. You happen to be the 0.1%, but we do have an option for you. You can specify --with-cs-fs or --without-cs-fs to state whether the installation filesystem is case sensitive or not (overriding the auto-detection).

Of course, I suppose I could add a sanity check during "make install" to ensure that the installation filesystem really is case sensitive if we expect it to be. mmm... I'll add that to the long-term to-do list. For now, I think a FAQ entry will do.

Brian

On Mar 6, 2007, at 2:24 AM, Christian Simon wrote:

[original message snipped -- see above]
Re: [OMPI users] MPI_Comm_Spawn
Hi Tim, getting back to you:

"What kind of system is it?"
=> The system is a Debian Sarge.

"How many nodes are you running on?"
=> There is no cluster configured, so I guess I work with no node environment.

"Have you been able to try a more recent version of Open MPI?"
=> Today I tried with version 1.1.4, but the results are not better. I tested 2 cases:

Test 1: with the same configuration options (./configure --enable-mpi-threads --enable-progress-threads --with-threads=posix --enable-smp-locks). The program stopped in MPI_Init_thread, in __lll_mutex_lock_wait () from /lib/tls/libpthread.so.0.

Test 2: with the default configuration options (./configure --prefix=/usr/local/Mpi/openmpi-1.1.4-noBproc-noThread). The program stopped on the "node allocation" after spawn n°31. Maybe the problem comes from the lack of node definition?

Thanks for your help. Below are the log files for the 2 tests.

/**TEST 1***/
GNU gdb 6.3-debian
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-linux"...Using host libthread_db library "/lib/tls/libthread_db.so.1".

(gdb) run
Starting program: /home/workspace/test_spaw1/src/spawn
[Thread debugging using libthread_db enabled]
[New Thread 1076646560 (LWP 5178)]
main***
main : Lancement MPI*
[New Thread 1085225904 (LWP 5181)]
[New Thread 1094495152 (LWP 5182)]

Program received signal SIGINT, Interrupt.
[Switching to Thread 1076646560 (LWP 5178)]
0x4018a436 in __lll_mutex_lock_wait () from /lib/tls/libpthread.so.0
(gdb) where
#0  0x4018a436 in __lll_mutex_lock_wait () from /lib/tls/libpthread.so.0
#1  0x40187893 in _L_mutex_lock_26 () from /lib/tls/libpthread.so.0
#2  0xb508 in ?? ()
#3  0x4000bcd0 in _dl_map_object_deps () from /lib/ld-linux.so.2
#4  0x40b9f8cb in mca_btl_tcp_component_create_listen () from /usr/local/Mpi/openmpi-1.1.4-noBproc/lib/openmpi/mca_btl_tcp.so
#5  0x40b9f8cb in mca_btl_tcp_component_create_listen () from /usr/local/Mpi/openmpi-1.1.4-noBproc/lib/openmpi/mca_btl_tcp.so
#6  0x40b9eef4 in mca_btl_tcp_component_init () from /usr/local/Mpi/openmpi-1.1.4-noBproc/lib/openmpi/mca_btl_tcp.so
#7  0x4008c652 in mca_btl_base_select () from /usr/local/Mpi/CURRENT_MPI/lib/libmpi.so.0
#8  0x40b8dd28 in mca_bml_r2_component_init () from /usr/local/Mpi/openmpi-1.1.4-noBproc/lib/openmpi/mca_bml_r2.so
#9  0x4008bf54 in mca_bml_base_init () from /usr/local/Mpi/CURRENT_MPI/lib/libmpi.so.0
#10 0x40b7e5c9 in mca_pml_ob1_component_init () from /usr/local/Mpi/openmpi-1.1.4-noBproc/lib/openmpi/mca_pml_ob1.so
#11 0x40094192 in mca_pml_base_select () from /usr/local/Mpi/CURRENT_MPI/lib/libmpi.so.0
#12 0x4005742c in ompi_mpi_init () from /usr/local/Mpi/CURRENT_MPI/lib/libmpi.so.0
#13 0x4007c182 in PMPI_Init_thread () from /usr/local/Mpi/CURRENT_MPI/lib/libmpi.so.0
#14 0x080489f3 in main (argc=1, argv=0xb8a4) at spawn6.c:33

/**TEST 2***/
GNU gdb 6.3-debian
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-linux"...Using host libthread_db library "/lib/tls/libthread_db.so.1".

(gdb) run -np 1 --host myhost spawn6
Starting program: /usr/local/Mpi/openmpi-1.1.4-noBproc-noThread/bin/mpirun -np 1 --host myhost spawn6
[Thread debugging using libthread_db enabled]
[New Thread 1076121728 (LWP 4022)]
main***
main : Lancement MPI*
Exe : Lance
Exe: lRankExe = 1 lRankMain = 0
1 main***MPI_Comm_spawn return : 0
1 main***Rang main : 0 Rang exe : 1
Exe : Lance
Exe: Fin.
Exe: lRankExe = 1 lRankMain = 0
2 main***MPI_Comm_spawn return : 0
2 main***Rang main : 0 Rang exe : 1
Exe : Lance
Exe: Fin.
...
Exe: lRankExe = 1 lRankMain = 0
30 main***MPI_Comm_spawn return : 0
30 main***Rang main : 0 Rang exe : 1
Exe : Lance
Exe: Fin.
Exe: lRankExe = 1 lRankMain = 0
31 main***MPI_Comm_spawn return : 0
31 main***Rang main : 0 Rang exe : 1

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 1076121728 (LWP 4022)]
0x4018833b in strlen () from /lib/tls/libc.so.6
(gdb) where
#0  0x4018833b in strlen () from /lib/tls/libc.so.6
#1  0x40297c5e in orte_gpr_replica_create_itag () from /usr/local/Mpi/openmpi-1.1.4-noBproc-noThread/lib/openmpi/mca_gpr_replica.so
#2  0x4029d2df in orte_gpr_replica_put_fn () from /usr/local/Mpi/openmpi-1.1.4-noBproc-noThread/lib/openmpi
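For reference, a hedged C sketch of the kind of repeated-spawn loop the test program appears to exercise (the executable name, loop count, and output format are illustrative guesses, not the poster's actual spawn6.c):

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    /* Spawn a child executable repeatedly; each child runs, reports
       its rank across the intercommunicator, and exits.  The poster's
       program hangs or crashes around the 31st iteration. */
    for (int i = 1; i <= 40; i++) {
        MPI_Comm intercomm;
        int rc = MPI_Comm_spawn("exe", MPI_ARGV_NULL, 1, MPI_INFO_NULL,
                                0, MPI_COMM_SELF, &intercomm,
                                MPI_ERRCODES_IGNORE);
        printf("%d main: MPI_Comm_spawn returned %d\n", i, rc);

        /* Release the intercommunicator so the child can finish. */
        MPI_Comm_disconnect(&intercomm);
    }

    MPI_Finalize();
    return 0;
}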
[OMPI users] MPI_PACK very slow?
I have a section of code where I need to send 8 separate integers via BCAST. Initially I was just putting the 8 integers into an array and then sending that array. I just tried using MPI_PACK on those 8 integers and I'm seeing a massive slowdown in the code; I have a lot of other communication and this section is used only 5 times.

I went from 140 seconds to 277 seconds on 16 processors using TCP via a dual gigabit ethernet setup (I'm the only user working on this system today). This was run with OpenMPI 1.1.2 to maintain compatibility with a major HPC site.

Is there a known problem with MPI_PACK/UNPACK in OpenMPI?

Michael
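For context, a hedged C sketch of the two approaches being compared -- broadcasting the 8 integers as a plain array versus packing them with MPI_Pack and unpacking with MPI_Unpack. The poster's code is Fortran; this only illustrates the calls involved, not his program:

#include <mpi.h>

/* Approach 1: put the 8 integers in a contiguous array and broadcast it. */
static void bcast_plain(int vals[8], int root, MPI_Comm comm)
{
    MPI_Bcast(vals, 8, MPI_INT, root, comm);
}

/* Approach 2: MPI_Pack the 8 integers into a byte buffer, broadcast the
   buffer, and MPI_Unpack on the receivers.  Functionally equivalent,
   but adds pack/unpack overhead for such a small message. */
static void bcast_packed(int vals[8], int root, MPI_Comm comm)
{
    char buf[256];
    int  pos = 0, rank;

    MPI_Comm_rank(comm, &rank);
    if (rank == root) {
        MPI_Pack(vals, 8, MPI_INT, buf, sizeof(buf), &pos, comm);
    }
    MPI_Bcast(buf, sizeof(buf), MPI_PACKED, root, comm);
    if (rank != root) {
        pos = 0;
        MPI_Unpack(buf, sizeof(buf), &pos, vals, 8, MPI_INT, comm);
    }
}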
Re: [OMPI users] MPI_PACK very slow?
I doubt this comes from MPI_Pack/MPI_Unpack. The difference is 137 seconds for 5 calls. That's basically 27 seconds per call to MPI_Pack, for packing 8 integers. I know the code and I'm positive there is no way to spend 27 seconds over there.

Can you run your application under valgrind with the callgrind tool? This will give you some basic information about where the time is spent, which will give us additional information about where to look.

Thanks,
george.

On Mar 6, 2007, at 11:26 AM, Michael wrote:

[original message snipped -- see above]

"Half of what I say is meaningless; but I say it so that the other half may reach you" Kahlil Gibran
Re: [OMPI users] configure is too smart !
Brian Barrett wrote:
> specify --with-cs-fs or --without-cs-fs

Unbelievable! Thanks again.

--
Christian SIMON
Re: [OMPI users] MPI_Comm_Spawn
I believe I know what is happening here. My availability in the next week is pretty limited due to a family emergency, but I'll take a look when I get back. In brief, this is a resource starvation issue where the system thinks your node is unable to support any further processes, and so it blocks.

On a separate note, I never use threaded configurations due to the lack of any real thread-safety review or testing on Open MPI to date (per Tim's earlier comment). My "standard" configuration for development and testing is with --disable-progress-threads --without-threads.

I'll post something back to the list when I get it resolved.

Thanks
Ralph

On 3/6/07 9:00 AM, "rozzen.vinc...@fr.thalesgroup.com" wrote:

> [original message and gdb output snipped -- see above]
Re: [OMPI users] MPI_PACK very slow?
I discovered I made a minor change that cost me dearly (I had thought I had tested this single change, but perhaps I didn't track the timing data closely).

MPI_Type_create_struct performs well only when all the data is contiguous in memory (at least for OpenMPI 1.1.2). Is this normal or expected?

In my case the program has an f90 structure with 11 integers, 2 logicals, and five 50-element integer arrays, but at the first stage of the program only the first element of those arrays is used. Yet using MPI_Type_create_struct it is more efficient to send the entire 263 words of contiguous memory (58 seconds) than to try to send only the 18 words of non-contiguous memory (64 seconds). At the second stage it's 33 words, and at that stage it becomes 47 seconds vs. 163 seconds, an extra 116 seconds, which dominates the push of my overall wall clock time from 130 to 278 seconds. The third stage increases from 13 seconds to 37 seconds.

Because I need to send this block of data back and forth a lot, I was hoping to find a way to speed up the transfer of this odd block of data and a couple of other variables. I may try PACK and UNPACK on the structure, but calling those lots of times can't be more efficient. Previously I was equivalencing the structure to an integer array and sending the integer array as a quick and dirty solution to get started, and it worked -- not completely portable, no doubt.

Michael

ps. I don't currently have valgrind installed on this cluster and valgrind is not part of the Debian Linux 3.1r3 distribution. Without any experience with valgrind I'm not sure how useful it will be with an MPI program of 500+ subroutines and 50K+ lines running on 16 processes. It took us a bit to get profiling working for the OpenMP version of this code.

On Mar 6, 2007, at 11:28 AM, George Bosilca wrote:

[earlier messages in the thread snipped -- see above]
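A hedged C sketch of the trade-off described above: sending the whole structure as one contiguous run of integers versus building a struct datatype that picks out only the first element of each 50-element array. The C layout is invented for illustration (it mirrors the 263-word / 18-word counts mentioned), not the actual Fortran structure:

#include <stddef.h>
#include <mpi.h>

/* Invented C analogue: a few scalars plus fixed-size arrays, of which
   only the first element is needed early in the run. */
struct block {
    int scalars[11];
    int flags[2];      /* stands in for the two logicals */
    int arrays[5][50];
};

/* Option A: treat the whole thing as one contiguous run of ints
   (263 words).  More data on the wire, but a single contiguous copy. */
static void send_whole(struct block *b, int dest, MPI_Comm comm)
{
    MPI_Send(b, (int)(sizeof(*b) / sizeof(int)), MPI_INT, dest, 0, comm);
}

/* Option B: a struct datatype that sends the scalars, the flags, and
   only arrays[i][0] for each i (18 words) -- fewer words, but seven
   non-contiguous pieces that the datatype engine must gather. */
static MPI_Datatype build_sparse_type(void)
{
    int          blocklens[7] = { 11, 2, 1, 1, 1, 1, 1 };
    MPI_Aint     disps[7];
    MPI_Datatype types[7];
    MPI_Datatype newtype;

    disps[0] = offsetof(struct block, scalars);
    disps[1] = offsetof(struct block, flags);
    for (int i = 0; i < 5; i++) {
        disps[2 + i] = offsetof(struct block, arrays)
                     + (MPI_Aint)i * 50 * sizeof(int);
    }
    for (int i = 0; i < 7; i++) types[i] = MPI_INT;

    MPI_Type_create_struct(7, blocklens, disps, types, &newtype);
    MPI_Type_commit(&newtype);
    return newtype;   /* use: MPI_Send(b, 1, type, dest, 0, comm); */
}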
Re: [OMPI users] MPI_PACK very slow?
On Mar 6, 2007, at 4:51 PM, Michael wrote:

> MPI_Type_create_struct performs well only when all the data is contiguous in memory (at least for OpenMPI 1.1.2).

There are always benefits to sending contiguous data, especially when the message is small. Packing and unpacking are costly operations; even a highly optimized version cannot beat a user's hand-packing routine when the data is small. Increase the size of your message to over 64K and you will see another story.

> In my case the program has an f90 structure with 11 integers, 2 logicals, and five 50-element integer arrays. [...] I may try PACK and UNPACK on the structure, but calling those lots of times can't be more efficient.

Is there any way I can get access to your software? Or at least the datatype-related code?

> ps. I don't currently have valgrind installed on this cluster [...] Without any experience with valgrind I'm not sure how useful it will be with an MPI program of 500+ subroutines and 50K+ lines running on 16 processes.

It will be seamless. What I'm doing is the following: instead of

  mpirun -np 16 my_program my_args

I'm using

  mpirun -np 16 valgrind --tool=callgrind my_program my_args

Once the execution is completed (which will usually take about 20 times longer than without valgrind), I gather all the resulting files in a common location (if not already over NFS) and analyze them with kcachegrind (which comes by default with KDE).

george.

"Half of what I say is meaningless; but I say it so that the other half may reach you" Kahlil Gibran