Re: [OMPI users] OMPI seg fault by a class with weird address.

2011-03-16 Thread Jeff Squyres (jsquyres)
Did you run with a memory checking debugger like Valgrind?

Sent from my phone. No type good. 

On Mar 15, 2011, at 8:30 PM, "Jack Bryan"  wrote:

> Hi, 
> 
> I have installed a new Open MPI 1.3.4. 
> 
> But I got more weird errors: 
> 
> *** glibc detected *** /lustre/nsga2b: malloc(): memory corruption (fast): 0x1cafc450 ***
> === Backtrace: =
> /lib64/libc.so.6[0x3c50272aeb]
> /lib64/libc.so.6(__libc_malloc+0x7a)[0x3c5027402a]
> /usr/lib64/libstdc++.so.6(_Znwm+0x1d)[0x3c590bd17d]
> /lustre/jxding/netplan49/nsga2b[0x445bc6]
> /lustre/jxding/netplan49/nsga2b[0x44f43b]
> /lib64/libc.so.6(__libc_start_main+0xf4)[0x3c5021d974]
> /lustre/jxding/netplan49/nsga2b(__gxx_personality_v0+0x499)[0x443909]
> === Memory map: 
> 0040-00f33000 r-xp  6ac:e3210 685016360  
> /lustre/netplan49/nsga2b
> 01132000-0117e000 rwxp 00b32000 6ac:e3210 685016360  
> /lustre/netplan49/nsga2b
> 0117e000-01188000 rwxp 0117e000 00:00 0
> 1ca11000-1ca78000 rwxp 1ca11000 00:00 0
> 1ca78000-1ca79000 rwxp 1ca78000 00:00 0
> 1ca79000-1ca7a000 rwxp 1ca79000 00:00 0
> 1ca7a000-1cab8000 rwxp 1ca7a000 00:00 0
> 1cab8000-1cac7000 rwxp 1cab8000 00:00 0
> 1cac7000-1cacf000 rwxp 1cac7000 00:00 0
> 1cacf000-1cad rwxp 1cacf000 00:00 0
> 1cad-1cad1000 rwxp 1cad 00:00 0
> 1cad1000-1cad2000 rwxp 1cad1000 00:00 0
> 1cad2000-1cada000 rwxp 1cad2000 00:00 0
> 1cada000-1cadc000 rwxp 1cada000 00:00 0
> 1cadc000-1cae rwxp 1cadc000 00:00 0
> 
> .
> 51260-3512605000 r-xp  00:11 12043  
> /usr/lib64/librdmacm.so.1
> 3512605000-3512804000 ---p 5000 00:11 12043  
> /usr/lib64/librdmacm.so.1
> 3512804000-3512805000 rwxp 4000 00:11 12043  
> /usr/lib64/librdmacm.so.1
> 3512e0-3512e0c000 r-xp  00:11 5545   
> /usr/lib64/libibverbs.so.1
> 3512e0c000-351300b000 ---p c000 00:11 5545   
> /usr/lib64/libibverbs.so.1
> 351300b000-351300c000 rwxp b000 00:11 5545   
> /usr/lib64/libibverbs.so.1
> 3c4f20-3c4f21c000 r-xp  00:11 2853   
> /lib64/ld-2.5.so
> 3c4f41b000-3c4f41c000 r-xp 0001b000 00:11 2853   
> /lib64/ld-2.5.so
> 3c4f41c000-3c4f41d000 rwxp 0001c000 00:11 2853   
> /lib64/ld-2.5.so
> 3c5020-3c5034c000 r-xp  00:11 897
> /lib64/libc.so.6
> 3c5034c000-3c5054c000 ---p 0014c000 00:11 897
> /lib64/libc.so.6
> 3c5054c000-3c5055 r-xp 0014c000 00:11 897
> /lib64/libc.so.6
> 3c5055-3c50551000 rwxp 0015 00:11 897
> /lib64/libc.so.6
> 3c50551000-3c50556000 rwxp 3c50551000 00:00 0
> 3c5060-3c50682000 r-xp  00:11 2924   
> /lib64/libm.so.6
> 3c50682000-3c50881000 ---p 00082000 00:11 2924   
> /lib64/libm.so.6
> 3c50881000-3c50882000 r-xp 00081000 00:11 2924   
> /lib64/libm.so.6
> 3c50882000-3c50883000 rwxp 00082000 00:11 2924   
> /lib64/libm.so.6
> 3c50a0-3c50a02000 r-xp  00:11 923
> /lib64/libdl.so.2
> 3c50a02000-3c50c02000 ---p 2000 00:11 923
> /lib64/libdl.so.2
> 3c50c02000-3c50c03000 r-xp 2000 00:11 923
> /lib64/libdl.so.2
> 3c50c03000-3c50c04000 rwxp 3000 00:11 923
> /lib64/libdl.so.2
> 3c50e0-3c50e16000 r-xp  00:11 1011   
> /lib64/libpthread.so.0
> 
> .
> 2ae87b05e000-2ae87b075000 r-xp  6ac:e3210 686492235  
> /lustre/mpi_protocol_091117/openmpi134/lib/libmpi_cxx.so.0.0.0
> 2ae87b075000-2ae87b274000 ---p 00017000 6ac:e3210 686492235  
> /lustre/mpi_protocol_091117/openmpi134/lib/libmpi_cxx.so.0.0.0
> 2ae87b274000-2ae87b277000 rwxp 00016000 6ac:e3210 686492235  
> /lustre/mpi_protocol_091117/openmpi134/lib/libmpi_cxx.so.0.0.0
>  
> 
> 
> fff2fa38000-7fff2fa4e000 rwxp 7ffe9000 00:00 0  
> [stack]
> ff60-ffe0 ---p  00:00 0  
> [vdso]
> [n332:82320] *** Process received signal ***
> [n332:82320] Signal: Aborted (6)
> [n332:82320] Signal code:  (-6)
> [n332:82320] [ 0] /lib64/libpthread.so.0 [0x3c50e0e4c0]
> [n332:82320] [ 1] /lib64/libc.so.6(gsignal+0x35) [0x3c50230215]
> [n332:82320] [ 2] /lib64/libc.so.6(abort+0x110) [0x3c50231cc0]
> [n332:82320] [ 3] /lib64/libc.so.6 [0x3c5026a7fb]
> [n332:82320] [ 4] /lib64/libc.so.6 [0x3c50272aeb]
> [n332:82320] [ 5] /lib64/libc.so.6(__libc_malloc+0x7a) [0x3c5027402a]
> [n332:82320] [ 6] /usr/lib64/libstdc++.so.6(_Znwm+0x1d) [0x3c590bd17d]
> [n332:82320] [ 7] /lustre/jxding/netplan49/nsga2b [0x445bc6]
> [n33

[OMPI users] Building OpenMPI on Windows 7

2011-03-16 Thread hi
Greetings!!!



I am trying to build openmpi-1.3.4 and openmpi-1.4.3 on Windows 7 (64-bit
OS), but I am running into some difficulty.



My build environment:

OS : Windows 7 (64-bit)

C/C++ compiler : Visual Studio 2008 and Visual Studio 2010

Fortran compiler: Intel "ifort"



Approach: followed the "First Approach" described in README.WINDOWS file.



*1) Using openmpi-1.3.4:*

I observed a build-time error at version.cc(136). The error is related to
retrieving SVN version information, as described in
http://www.open-mpi.org/community/lists/users/2010/01/11860.php. Since we
already use the stable openmpi-1.3.4 release on Linux, is there a fix for
this compile-time error?



*2) Using openmpi-1.4.3:*

It builds properly without F77/F90 support (i.e. "Skipping MPI F77
interface").

To get the "mpif*.exe" wrappers for Fortran programs, I provided the proper
"ifort" path and enabled the OMPI_WANT_F77_BINDINGS=ON and/or
OMPI_WANT_F90_BINDINGS=ON flags, but I get the following errors.

*2.a) "ifort" with OMPI_WANT_F77_BINDINGS=ON gave following errors...*

Check ifort external symbol convention...
Check ifort external symbol convention...single underscore
Check if Fortran 77 compiler supports LOGICAL...
Check if Fortran 77 compiler supports LOGICAL...done
Check size of Fortran 77 LOGICAL...
CMake Error at contrib/platform/win32/CMakeModules/f77_get_sizeof.cmake:76 (MESSAGE):
  Could not determine size of LOGICAL.
Call Stack (most recent call first):
  contrib/platform/win32/CMakeModules/f77_check.cmake:82 (OMPI_F77_GET_SIZEOF)
  contrib/platform/win32/CMakeModules/ompi_configure.cmake:1123 (OMPI_F77_CHECK)
  CMakeLists.txt:87 (INCLUDE)
Configuring incomplete, errors occurred!



*2.b) "ifort" with OMPI_WANT_F90_BINDINGS=ON gave following errors...*

Skipping MPI F77 interface
CMake Error: File C:/openmpi-1.4.3/contrib/platform/win32/ConfigFiles/mpif90-wrapper-data.txt.cmake does not exist.
CMake Error at ompi/tools/CMakeLists.txt:93 (CONFIGURE_FILE):
  configure_file Problem configuring file
CMake Error: File C:/openmpi-1.4.3/contrib/platform/win32/ConfigFiles/mpif90-wrapper-data.txt.cmake does not exist.
CMake Error at ompi/tools/CMakeLists.txt:97 (CONFIGURE_FILE):
  configure_file Problem configuring file
looking for ccp...
looking for ccp...not found.
looking for ccp...
looking for ccp...not found.
Configuring incomplete, errors occurred!



*2.c) "ifort" with OMPI_WANT_F77_BINDINGS=ON and OMPI_WANT_F90_BINDINGS=ON gave following errors...*

Check ifort external symbol convention...
Check ifort external symbol convention...single underscore
Check if Fortran 77 compiler supports LOGICAL...
Check if Fortran 77 compiler supports LOGICAL...done
Check size of Fortran 77 LOGICAL...
CMake Error at contrib/platform/win32/CMakeModules/f77_get_sizeof.cmake:76 (MESSAGE):
  Could not determine size of LOGICAL.
Call Stack (most recent call first):
  contrib/platform/win32/CMakeModules/f77_check.cmake:82 (OMPI_F77_GET_SIZEOF)
  contrib/platform/win32/CMakeModules/ompi_configure.cmake:1123 (OMPI_F77_CHECK)
  CMakeLists.txt:87 (INCLUDE)
Configuring incomplete, errors occurred!



Any ideas on how to resolve the above errors so that mpif*.exe is generated
on Windows using "ifort"?

Please let me know if more information is required.
Thank you in advance.

-Hiral


Re: [OMPI users] Building OpenMPI on Windows 7

2011-03-16 Thread Damien

Hi Hiral,

The 1.4 series doesn't have Fortran support on Windows.  You need to use 
1.5.


Damien


Re: [OMPI users] Building OpenMPI on Windows 7

2011-03-16 Thread Shiqing Fan

Hi Hiral,

1.3.4 is quite old; please use the latest version. As Damien noted, full 
Fortran support is in the 1.5 series; it is only experimental in the 1.4 
series. Also, only F77 bindings are available on Windows, not F90 bindings. 
Another option is to use the released binary installers, which avoids 
compiling everything yourself.



Best Regards,
Shiqing



--
---
Shiqing Fan
High Performance Computing Center Stuttgart (HLRS)
Tel: ++49(0)711-685-87234  Nobelstrasse 19
Fax: ++49(0)711-685-65832  70569 Stuttgart
http://www.hlrs.de/organization/people/shiqing-fan/
email: f...@hlrs.de



Re: [OMPI users] Building OpenMPI on Windows 7

2011-03-16 Thread hi
Hi Shiqing / Damien,

Thanks for the quick reply.

> it's only experimental in 1.4 series. And there is only F77 bindings on
> Windows, no F90 bindings.

Can you please provide the steps to build 1.4.3 with the experimental F77
bindings on Windows?

BTW, do you have any idea when the next stable release with full Fortran
support on Windows will be available?

Thank you.
-Hiral


Re: [OMPI users] Building OpenMPI on Windows 7

2011-03-16 Thread Shiqing Fan

Hi Hiral,

> > it's only experimental in 1.4 series. And there is only F77 
> > bindings on Windows, no F90 bindings.
> Can you please provide steps to build 1.4.3 with experimental f77 
> bindings on Windows?

Well, I highly recommend using the 1.5 series, but I can also take a look 
and probably provide you with a patch for 1.4.

> BTW: Do you have any idea on: when next stable release with full 
> fortran support on Windows would be available?

There is no plan yet.


Regards,
Shiqing



Re: [OMPI users] Building OpenMPI on Windows 7

2011-03-16 Thread Damien

Hiral,

To add to Shiqing's comments, 1.5 has been running great for me on 
Windows for over 6 months since it was in beta.  You should give it a try.


Damien


[OMPI users] MPI_AllReduce() deadlock on IB

2011-03-16 Thread Brock Palen
I have a user whose code performs fine when run over Ethernet. When run over 
verbs-based IB, the code deadlocks in an MPI_Allreduce() call.

We are using openmpi/1.4.3 with the Intel compilers.

I poked at the running code with padb and I get the following:

[padb output omitted: the per-rank layout did not survive archiving]


Across multiple runs, the set of ranks stuck in MPI_Allreduce() changes. 
Are there any open bugs?  I found one, but it only affects shared memory, and 
our version should be new enough (from what I could tell) to avoid it.

Thanks. What should I look for to diagnose the issue?

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985






Re: [OMPI users] OMPI seg fault by a class with weird address.

2011-03-16 Thread Jack Bryan

Although I do not think that Valgrind supports mpic++, I have tried to run it. 
This is what I got. Thanks.

==18729== Memcheck, a memory error detector
==18729== Copyright (C) 2002-2009, and GNU GPL'd, by Julian Seward et al.
==18729== Using Valgrind-3.5.0 and LibVEX; rerun with -h for copyright info
==18729== Command: ./nsga2b
==18729== Parent PID: 18726
==18729==
--18729--
--18729-- Valgrind options:
--18729--    --tool=memcheck
--18729--    --error-limit=no
--18729--    --leak-check=full
--18729--    --log-file=nsga2b_valg.log
--18729--    -v
--18729-- Contents of /proc/version:
--18729--   Linux version 2.6.18-128.1.1.el5.530g (root@kalamata) (gcc version 4.1.2 20071124 (Red Hat 4.1.2-42)) #1 SMP Tue Mar 17 21:49:24 EDT 2009
--18729-- Arch and hwcaps: AMD64, amd64-sse3-cx16
--18729-- Page sizes: currently 4096, max supported 4096
--18729-- Valgrind library directory: /usr/lib64/valgrind
--18729-- Reading syms from /lustre/nsga2b (0x40)
--18729-- warning: DiCfSI 0x0 .. 0x0 outside segment 0x4438f0 .. 0xd81e77
--18729-- warning: DiCfSI 0x1 .. 0x3 outside segment 0x4438f0 .. 0xd81e77
--18729-- warning: DiCfSI 0x4 .. 0x2a outside segment 0x4438f0 .. 0xd81e77
--18729-- warning: DiCfSI 0x0 .. 0x0 outside segment 0x4438f0 .. 0xd81e77
--18729-- warning: DiCfSI 0x1 .. 0x3 outside segment 0x4438f0 .. 0xd81e77
--18729-- warning: DiCfSI 0x4 .. 0x2a outside segment 0x4438f0 .. 0xd81e77
--18729-- warning: DiCfSI 0x0 .. 0x0 outside segment 0x4438f0 .. 0xd81e77
--18729-- warning: DiCfSI 0x1 .. 0x3 outside segment 0x4438f0 .. 0xd81e77
--18729-- warning: DiCfSI 0x4 .. 0xb outside segment 0x4438f0 .. 0xd81e77
--18729-- warning: DiCfSI 0xc .. 0xaa outside segment 0x4438f0 .. 0xd81e77
--18729-- Reading syms from /usr/lib64/valgrind/memcheck-amd64-linux (0x3800)
--18729--    object doesn't have a dynamic symbol table
--18729-- Reading syms from /lib64/ld-2.5.so (0x3f75c0)
--18729-- Reading suppressions file: /usr/lib64/valgrind/default.supp
--18729-- REDIR: 0x3f75c145d0 (strlen) redirected to 0x3803e767 (vgPlain_amd64_linux_REDIR_FOR_strlen)
--18729-- Reading syms from /usr/lib64/valgrind/vgpreload_core-amd64-linux.so (0x4802000)
--18729-- Reading syms from /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so (0x4a03000)
==18729== WARNING: new redirection conflicts with existing -- ignoring it
--18729--     new: 0x3f75c145d0 (strlen  ) R-> 0x04a06dc0 strlen
--18729-- REDIR: 0x3f75c143f0 (index) redirected to 0x4a06c30 (index)
--18729-- REDIR: 0x3f75c145a0 (strcmp) redirected to 0x4a06e90 (strcmp)
--18729-- Reading syms from /opt/openmpi-1.3.4-gnu/lib/libmpi_cxx.so.0.0.0 (0x4c0a000)
--18729-- Reading syms from /opt/openmpi-1.3.4-gnu/lib/libmpi.so.0.0.1 (0x4e26000)
--18729-- Reading syms from /opt/openmpi-1.3.4-gnu/lib/libopen-rte.so.0.0.0 (0x5258000)
--18729-- Reading syms from /opt/openmpi-1.3.4-gnu/lib/libopen-pal.so.0.0.0 (0x54db000)
--18729-- Reading syms from /usr/lib64/librdmacm.so.1.0.0 (0x3f7700)
--18729--    object doesn't have a symbol table
--18729-- Reading syms from /usr/lib64/libibverbs.so.1.0.0 (0x3f7640)
--18729--    object doesn't have a symbol table
--18729-- Reading syms from /usr/lib64/libdat.so.1.0.2 (0x5778000)
--18729--    object doesn't have a symbol table
--18729-- Reading syms from /scratch/torque-2.4.2/lib/libtorque.so.2.0.0 (0x5982000)
--18729-- Reading syms from /lib64/libdl-2.5.so (0x3f7680)
--18729-- Reading syms from /lib64/libnsl-2.5.so (0x3f7fe0)
--18729-- Reading syms from /lib64/libutil-2.5.so (0x3f84e0)
--18729-- Reading syms from /lib64/libm-2.5.so (0x5c97000)
--18729-- Reading syms from /usr/lib64/libstdc++.so.6.0.8 (0x3f7c80)
--18729--    object doesn't have a symbol table
--18729-- Reading syms from /lib64/libgcc_s-4.1.2-20080825.so.1 (0x3f7b80)
--18729--    object doesn't have a symbol table
--18729-- Reading syms from /lib64/libpthread-2.5.so (0x3f76c0)
--18729-- Reading syms from /lib64/libc-2.5.so (0x3f7600)
--18729-- REDIR: 0x3f7607ae00 (memset) redirected to 0x4a07030 (memset)
--18729-- REDIR: 0x3f7607c240 (memcpy) redirected to 0x4a08030 (memcpy)
--18729-- REDIR: 0x3f76079f40 (rindex) redirected to 0x4a06ae0 (rindex)
--18729-- REDIR: 0x3f76079b50 (strlen) redirected to 0x4a06d80 (strlen)
--18729-- REDIR: 0x3f76074dc0 (malloc) redirected to 0x4a05d9a (malloc)
--18729-- REDIR: 0x3f76072870 (free) redirected to 0x4a059aa (free)
--18729-- REDIR: 0x3f76079e90 (strncpy) redirected to 0x4a081a0 (strncpy)
--18729-- REDIR: 0x3f76079dd0 (strncmp) redirected to 0x4a06de0 (strncmp)
--18729-- REDIR: 0x3f760749e0 (calloc) redirected to 0x4a05092 (calloc)
--18729-- REDIR: 0x3f7c8bd1c0 (operator new(unsigned long)) redirected to 0x4a065ea (operator new(unsigned long))
--18729-- REDIR: 0x3f7607b930 (mempcpy) redirected to 0x4a07870 (mempcpy)
--18729-- REDIR: 0xff600400 (???) redirected to 0

[OMPI users] Issues in 1.4.3 version and system cpu usage

2011-03-16 Thread Claudio Baeza Retamal

Dear folks,

I recently upgraded Open MPI from 1.4.2 to 1.4.3, and system CPU usage is 
now very high. For example, the NAMD application shows 40% system CPU and 
the HPL benchmark shows 80%; with the previous version, system CPU usage 
was never more than 5%.


I read the user list archives and found threads that discuss this. My 
question is: does version 1.4.3 have known issues? Is the downloadable 
version not patched?


Regards

--
Claudio Baeza Retamal
CDO
National Laboratory for High Performance Computing (NLHPC)
Center for Mathematical Modeling (CMM)
School of Engineering and Sciences
Universidad de Chile





Re: [OMPI users] OMPI seg fault by a class with weird address.

2011-03-16 Thread Jeff Squyres
Make sure you have the latest version of valgrind.

But it definitely does highlight what could be real problems if you read down 
far enough in the output.

> ==18729== Invalid write of size 8
> ==18729==at 0x443BEF: initPopPara(population*, 
> std::vector std::allocator >&, initParaType&, int, int, 
> std::vector >&) (main-parallel2.cpp:552)
> ==18729==by 0x44F12E: main (main-parallel2.cpp:204)
> ==18729==  Address 0x62c9da0 is 0 bytes after a block of size 0 alloc'd
> ==18729==at 0x4A0666E: operator new(unsigned long) 
> (vg_replace_malloc.c:220)
> ==18729==by 0x4573E4: void 
> std::__uninitialized_fill_n_aux message_para_to_workersT>(message_para_to_workersT*, unsigned long, 
> message_para_to_workersT const&, __false_type) (new_allocator.h:88)
> ==18729==by 0x4576CF: void 
> std::__uninitialized_fill_n_a message_para_to_workersT, 
> message_para_to_workersT>(message_para_to_workersT*, unsigned long, 
> message_para_to_workersT const&, std::allocator) 
> (stl_uninitialized.h:218)
> ==18729==by 0x44EE2E: main (stl_vector.h:218)

The above is an invalid write of size 8 -- you're essentially writing 
outside of an array. 

Valgrind is showing you the call stack of how it got there.  Looks like you 
new'ed or malloc'ed a block of size 0 and then tried to write something to it.  
Writing to memory that you don't own is a no-no; it can cause Very Bad Things 
to happen.

You should probably investigate this, and the other issues that it is reporting 
(e.g., the next invalid read of size 8).
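
As a rough illustration of the pattern Valgrind is describing (not the poster's actual code): the trace suggests a std::vector constructed with a count of 0 and then indexed.  The sketch below assumes a hypothetical WorkerMessage type standing in for message_para_to_workersT, and hypothetical helpers tryWrite() and makeMessages().  It shows two ways to guard against this: use the checked at() accessor, and size the vector from the real element count before indexing.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for the poster's message_para_to_workersT.
struct WorkerMessage {
    double value;
    WorkerMessage() : value(0.0) {}
};

// Checked write: std::vector::at() throws std::out_of_range when the index
// is past the end, instead of silently writing outside the allocation the
// way operator[] would.
bool tryWrite(std::vector<WorkerMessage>& msgs, std::size_t i, double v) {
    try {
        msgs.at(i).value = v;
        return true;
    } catch (const std::out_of_range&) {
        return false;
    }
}

// Size the vector from the actual worker count before anything indexes it.
// A count of 0 yields an empty vector, so every index is out of range.
std::vector<WorkerMessage> makeMessages(std::size_t workerCount) {
    return std::vector<WorkerMessage>(workerCount);
}
```

With a vector built from a count of 0, tryWrite(msgs, 0, 1.0) returns false rather than corrupting the heap; with makeMessages(4), indices 0 through 3 are writable.  Checking what value the size variable holds at main-parallel2.cpp:204 would be the first thing to look at.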

> ==18729==
> ==18729== Invalid read of size 8
> ==18729==at 0x44F13A: main (main-parallel2.cpp:208)
> ==18729==  Address 0x62c9d60 is 0 bytes after a block of size 0 alloc'd
> ==18729==at 0x4A0666E: operator new(unsigned long) 
> (vg_replace_malloc.c:220)
> ==18729==by 0x45733D: void 
> std::__uninitialized_fill_n_aux message_para_to_workersT>(message_para_to_workersT*, unsigned long, 
> message_para_to_workersT const&, __false_type) (new_allocator.h:88)
> ==18729==by 0x4576CF: void 
> std::__uninitialized_fill_n_a message_para_to_workersT, 
> message_para_to_workersT>(message_para_to_workersT*, unsigned long, 
> message_para_to_workersT const&, std::allocator) 
> (stl_uninitialized.h:218)
> ==18729==by 0x44EE2E: main (stl_vector.h:218)
> ==18729==
> 
> valgrind: m_mallocfree.c:225 (mk_plain_bszB): Assertion 'bszB != 0' failed.
> valgrind: This is probably caused by your program erroneously writing past the
> end of a heap block and corrupting heap metadata.  If you fix any
> invalid writes reported by Memcheck, this assertion failure will
> probably go away.  Please try that before reporting this as a bug.
> 
> ==18729==at 0x38029D5C: report_and_quit (m_libcassert.c:145)
> ==18729==by 0x3802A032: vgPlain_assert_fail (m_libcassert.c:217)
> ==18729==by 0x38035645: vgPlain_arena_malloc (m_mallocfree.c:225)
> ==18729==by 0x38002BB5: vgMemCheck_new_block (mc_malloc_wrappers.c:199)
> ==18729==by 0x38002F6B: vgMemCheck___builtin_new 
> (mc_malloc_wrappers.c:246)
> ==18729==by 0x3806070C: do_client_request (scheduler.c:1362)
> ==18729==by 0x38061D30: vgPlain_scheduler (scheduler.c:1061)
> ==18729==by 0x38085E6E: run_a_thread_NORETURN (syswrap-linux.c:91)
> 
> sched status:
>   running_tid=1
> 
> Thread 1: status = VgTs_Runnable
> ==18729==at 0x4A0666E: operator new(unsigned long) 
> (vg_replace_malloc.c:220)
> ==18729==by 0x464506: __gnu_cxx::new_allocator::allocate(unsigned 
> long, void const*) (new_allocator.h:88)
> ==18729==by 0x46452E: std::_Vector_base 
> >::_M_allocate(unsigned long) (stl_vector.h:127)
> ==18729==by 0x464560: std::_Vector_base 
> >::_Vector_base(unsigned long, std::allocator const&) (stl_vector.h:113)
> ==18729==by 0x464B6A: std::vector 
> >::vector(unsigned long, int const&, std::allocator const&) 
> (stl_vector.h:216)
> ==18729==by 0x488F62: Index::Index() (index.cpp:20)
> ==18729==by 0x489147: ReadFile(char const*) (index.cpp:86)
> ==18729==by 0x48941C: ImportIndices() (index.cpp:121)
> ==18729==by 0x445D00: myNeplanTaskScheduler(CNSGA2*, int, int, int, 
> population*, char, int, std::vector std::allocator >&, ompi_datatype_t*, int&, int&, 
> std::vector >, 
> std::allocator > > >&, 
> std::vector >, 
> std::allocator > > >&, 
> std::vector >&, int, 
> std::vector >, 
> std::allocator > > >&, 
> ompi_datatype_t*, int, ompi_datatype_t*, int) (myNetplanScheduler.cpp:109)
> ==18729==by 0x44F2DF: main (main-parallel2.cpp:216)

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI users] MPI_AllReduce() deadlock on IB

2011-03-16 Thread Jeff Squyres
This could be related to https://svn.open-mpi.org/trac/ompi/ticket/2714 and/or 
https://svn.open-mpi.org/trac/ompi/ticket/2722.

There isn't much info in the ticket, but we've been talking about it a bunch 
offline.  IBM and Mellanox have had reports of the error, but haven't been able 
to reproduce it reliably.  It *seems* to be a race condition in the "oob" 
connection model of the openib BTL.

If you run with --mca btl_openib_cpc_include rdmacm, does the problem go away?


On Mar 16, 2011, at 11:27 AM, Brock Palen wrote:

> I have a user whos code when ran on ethernet performs fine. When ran on verbs 
> based IB the code deadlocks in an MPI_AllReduce() call.
> 
> We are using openmpi/1.4.3  with the intel compilers.
> 
> I poked at the running code with padb and I get the following:
> 
> [padb per-rank output: the rank ruler and state lines were garbled in the 
> archive and are not recoverable]
> 
> 
> For multiple runs which ranks are stuck in AllReduce() changes, 
> Is there any open bugs?  I found one but only on shared memory and our 
> version should be new enough (from what I could tell) to avoid it.
> 
> Thanks,  what should I look for to diagnose the issue?
> 
> Brock Palen
> www.umich.edu/~brockp
> Center for Advanced Computing
> bro...@umich.edu
> (734)936-1985
> 
> 
> 
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI users] Issues in 1.4.3 version and system cpu usage

2011-03-16 Thread Jeff Squyres
CPU time is fairly meaningless.

Is the overall run time of your application significantly different?
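
One way to answer that is to time the same run under both versions and compare elapsed wall-clock time rather than CPU percentages.  A minimal sketch follows, assuming a C++11 compiler; Timing, timeIt() and busyWork() are hypothetical names used only for illustration.  Inside an MPI program, MPI_Wtime() would serve the same purpose for the wall-clock part.

```cpp
#include <chrono>
#include <ctime>

// Wall-clock and CPU time for one piece of work, both in seconds.
struct Timing {
    double wallSeconds;
    double cpuSeconds;
};

// Runs work() once and records elapsed wall-clock time (steady_clock) and
// processor time consumed by this process (std::clock).
template <typename Work>
Timing timeIt(Work work) {
    const std::chrono::steady_clock::time_point wallStart =
        std::chrono::steady_clock::now();
    const std::clock_t cpuStart = std::clock();

    work();

    const std::clock_t cpuEnd = std::clock();
    const std::chrono::duration<double> wall =
        std::chrono::steady_clock::now() - wallStart;

    Timing t;
    t.wallSeconds = wall.count();
    t.cpuSeconds = static_cast<double>(cpuEnd - cpuStart) / CLOCKS_PER_SEC;
    return t;
}

// Hypothetical workload used only to exercise the timer.
void busyWork() {
    volatile unsigned long sink = 0;
    for (unsigned long i = 0; i < 1000000UL; ++i) {
        sink = sink + i;
    }
}
```

Calling timeIt(busyWork) returns both numbers; if wallSeconds for the real application is about the same under 1.4.2 and 1.4.3, the higher CPU percentage alone may not indicate a slowdown.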


On Mar 16, 2011, at 1:45 PM, Claudio Baeza Retamal wrote:

> Dears folks,
> 
> Recently,  I was upgrading openmpi from 1.4.2 to 1.4.3, now, I can see that 
> system cpu usage is too high, for example, namd application has 40% system 
> cpu and HPL benchmark has 80%! of system cpu usage, in the previous version, 
> the system cpu  usage was never more than  5%.
> 
> I was reading user list archives and I found threads that speak about this, 
> my question is, version 1.4.3 has issues ?  Downloadable version is not 
> patched?
> 
> Regards
> 
> -- 
> Claudio Baeza Retamal
> CDO
> National Laboratory for High Performance Computing (NLHPC)
> Center for Mathematical Modeling (CMM)
> School of Engineering and Sciences
> Universidad de Chile


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/