Re: [OMPI users] Building OpenMPI on Windows 7

2011-03-17 Thread hi
Hi,

I tried building openmpi-1.5.2 on Windows 7 (in the environment described below)
with OMPI_WANT_F77_BINDINGS=ON and
OMPI_WANT_F90_BINDINGS=ON using "ifort".
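
For reference, the command-line equivalent of that CMake configuration is roughly
as follows (the generator name, build directory and install prefix are placeholders
rather than my exact settings):

  cd C:\openmpi-1.5.2\build
  cmake -G "Visual Studio 9 2008" ^
        -DCMAKE_Fortran_COMPILER=ifort ^
        -DOMPI_WANT_F77_BINDINGS=ON ^
        -DOMPI_WANT_F90_BINDINGS=ON ^
        -DCMAKE_INSTALL_PREFIX=C:\openmpi-1.5.2\installed ^
        C:\openmpi-1.5.2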

I observed that it generated mpif77.exe but did not generate mpif90.exe;
any idea?

BTW: using the generated mpif77.exe to compile hello_f77.f produced the
following errors...

c:\openmpi-1.5.2\examples> mpif77 hello_f77.f
Intel(R) Visual Fortran Compiler Professional for applications running on
IA-32,
 Version 11.1Build 20100414 Package ID: w_cprof_p_11.1.065
Copyright (C) 1985-2010 Intel Corporation.  All rights reserved.
C:/openmpi-1.5.2/installed/include\mpif-config.h(91): error #5082: Syntax
error,
 found ')' when expecting one of: (  
   ...
  parameter (MPI_STATUS_SIZE=)
-^
compilation aborted for hello_f77.f (code 1)

Thank you.
-Hiral


On Wed, Mar 16, 2011 at 8:11 PM, Damien  wrote:


> Hiral,
>
> To add to Shiqing's comments, 1.5 has been running great for me on Windows
> for over 6 months since it was in beta.  You should give it a try.
>
> Damien
>
> On 16/03/2011 8:34 AM, Shiqing Fan wrote:
>
> Hi Hiral,
>
>
>
> it's only experimental in 1.4 series. And there are only F77 bindings on
> Windows, no F90 bindings.
> Can you please provide steps to build 1.4.3 with experimental f77 bindings
> on Windows?
>
> Well, I highly recommend using the 1.5 series, but I can also take a look and
> probably provide you with a patch for 1.4.
>
>
>
> BTW: Do you have any idea when the next stable release with full Fortran
> support on Windows would be available?
>
> There is no plan yet.
>
>
> Regards,
> Shiqing
>
>
>
>
> Thank you.
> -Hiral
>
> On Wed, Mar 16, 2011 at 6:59 PM, Shiqing Fan  wrote:
>
>
>> Hi Hiral,
>>
>> 1.3.4 is quite old, please use the latest version. As Damien noted, the
>> full fortran support is in 1.5 series, it's only experimental in 1.4 series.
>> And there is only F77 bingdings on Windows, no F90 bindings. Another choice
>> is to use the released binary installers to avoid compiling everything by
>> yourself.
>>
>>
>> Best Regards,
>> Shiqing
>>
>> On 3/16/2011 11:47 AM, hi wrote:
>>
>>  Greetings!!!
>>
>>
>>
>> I am trying to build openmpi-1.3.4 and openmpi-1.4.3 on Windows 7 (64-bit
>> OS), but am running into some difficulty...
>>
>>
>>
>> My build environment:
>>
>> OS : Windows 7 (64-bit)
>>
>> C/C++ compiler : Visual Studio 2008 and Visual Studio 2010
>>
>> Fortran compiler: Intel "ifort"
>>
>>
>>
>> Approach: followed the "First Approach" described in README.WINDOWS file.
>>
>>
>>
>> 1) Using openmpi-1.3.4:
>>
>> Observed build time error in version.cc(136). This error is related to
>> getting SVN version information as described in
>> http://www.open-mpi.org/community/lists/users/2010/01/11860.php. As we
>> are using this openmpi-1.3.4 stable version on Linux platform, is there any
>> fix to this compile time error?
>>
>>
>>
>> 2) Using openmpi-1.4.3:
>>
>> Builds properly without F77/F90 support (i.e. "Skipping MPI F77
>> interface").
>>
>> Now to get the "mpif*.exe" for fortran programs, I provided proper
>> "ifort" path and enabled "OMPI_WANT_F77_BINDINGS=ON" and/or
>> OMPI_WANT_F90_BINDINGS=ON flag; but getting following errors...
>>
>> *   2.a) "ifort" with OMPI_WANT_F77_BINDINGS=ON gave following errors...
>> *
>>
>> Check ifort external symbol convention...
>>
>> Check ifort external symbol convention...single underscore
>>
>> Check if Fortran 77 compiler supports LOGICAL...
>>
>> Check if Fortran 77 compiler supports LOGICAL...done
>>
>> Check size of Fortran 77 LOGICAL...
>>
>> CMake Error at contrib/platform/win32/CMakeModules/f77_get_sizeof.cmake:76
>> (MESSAGE):
>>
>> Could not determine size of LOGICAL.
>>
>> Call Stack (most recent call first):
>>
>> contrib/platform/win32/CMakeModules/f77_check.cmake:82
>> (OMPI_F77_GET_SIZEOF)
>>
>> contrib/platform/win32/CMakeModules/ompi_configure.cmake:1123
>> (OMPI_F77_CHECK)
>>
>> CMakeLists.txt:87 (INCLUDE)
>>
>> Configuring incomplete, errors occurred!
>>
>>
>>
>> *   2.b) "ifort" with OMPI_WANT_F90_BINDINGS=ON gave following errors...
>> *
>>
>> Skipping MPI F77 interface
>>
>> CMake Error: File
>> C:/openmpi-1.4.3/contrib/platform/win32/ConfigFiles/mpif90-wrapper-data.txt.cmake
>> does not exist.
>>
>> CMake Error at ompi/tools/CMakeLists.txt:93 (CONFIGURE_FILE):
>>
>> configure_file Problem configuring file
>>
>> CMake Error: File
>> C:/openmpi-1.4.3/contrib/platform/win32/ConfigFiles/mpif90-wrapper-data.txt.cmake
>> does not exist.
>>
>> CMake Error at ompi/tools/CMakeLists.txt:97 (CONFIGURE_FILE):
>>
>> configure_file Problem configuring file
>>
>> looking for ccp...
>>
>> looking for ccp...not found.
>>
>> looking for ccp...
>>
>> looking for ccp...not found.
>>
>> Configuring incomplete, errors occurred!
>>
>>
>>
>> *   2.c) "ifort" with OMPI_WANT_F77_BINDINGS=ON and
>> OMPI_WANT_F90_BINDINGS=ON gave following errors... *
>>
>> Check ifort external symbol convention...
>>
>> Check ifor

Re: [OMPI users] Building OpenMPI on Windows 7

2011-03-17 Thread Shiqing Fan


I tried building openmpi-1.5.2 on Windows 7 (in the environment described
below) with OMPI_WANT_F77_BINDINGS=ON and
OMPI_WANT_F90_BINDINGS=ON using "ifort".
I observed that it generated mpif77.exe but did not generate
mpif90.exe; any idea?


There are no F90 bindings at the moment for Windows.

BTW: using the generated mpif77.exe to compile hello_f77.f produced the
following errors...


c:\openmpi-1.5.2\examples> mpif77 hello_f77.f
Intel(R) Visual Fortran Compiler Professional for applications
running on IA-32,
 Version 11.1Build 20100414 Package ID: w_cprof_p_11.1.065
Copyright (C) 1985-2010 Intel Corporation.  All rights reserved.
C:/openmpi-1.5.2/installed/include\mpif-config.h(91): error #5082:
Syntax error,
 found ')' when expecting one of: ( 
...
  parameter (MPI_STATUS_SIZE=)
-^
compilation aborted for hello_f77.f (code 1)

It seems MPI_STATUS_SIZE is not set. Could you please send your
CMakeCache.txt to me off the mailing list, so that I can check what is
going wrong? A quick workaround would be to just set it to 0.
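
That is, as a stopgap you could hand-edit line 91 of the installed
mpif-config.h so that the empty parameter gets a value, e.g.:

      parameter (MPI_STATUS_SIZE=0)

with 0 only as the temporary value suggested above; the proper value should
come out of the CMake configuration once the underlying check is fixed.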


Regards,
Shiqing


Thank you.
-Hiral
On Wed, Mar 16, 2011 at 8:11 PM, Damien wrote:


Hiral,
To add to Shiqing's comments, 1.5 has been running great for me on
Windows for over 6 months since it was in beta.  You should give
it a try.
Damien
On 16/03/2011 8:34 AM, Shiqing Fan wrote:

Hi Hiral,

> it's only experimental in 1.4 series. And there is only F77
bingdings on Windows, no F90 bindings.
Can you please provide steps to build 1.4.3 with experimental
f77 bindings on Windows?

Well, I highly recommend to use 1.5 series, but I can also take a
look and probably provide you a patch for 1.4 .

BTW: Do you have any idea on: when next stable release with full
fortran support on Windows would be available?

There is no plan yet.
Regards,
Shiqing

Thank you.
-Hiral
On Wed, Mar 16, 2011 at 6:59 PM, Shiqing Fan <f...@hlrs.de> wrote:

Hi Hiral,
1.3.4 is quite old, please use the latest version. As Damien
noted, the full fortran support is in 1.5 series, it's only
experimental in 1.4 series. And there is only F77 bingdings
on Windows, no F90 bindings. Another choice is to use the
released binary installers to avoid compiling everything by
yourself.
Best Regards,
Shiqing
On 3/16/2011 11:47 AM, hi wrote:


Greetings!!!

I am trying to build openmpi-1.3.4 and openmpi-1.4.3 on
Windows 7 (64-bit OS), but getting some difficuty...

My build environment:

OS : Windows 7 (64-bit)

C/C++ compiler : Visual Studio 2008 and Visual Studio 2010

Fortran compiler: Intel "ifort"

Approach: followed the "First Approach" described in
README.WINDOWS file.

*1) Using openmpi-1.3.4:***

Observed build time error in version.cc(136). This
error is related to getting SVN version information as
described in
http://www.open-mpi.org/community/lists/users/2010/01/11860.php.
As we are using this openmpi-1.3.4 stable version on Linux
platform, is there any fix to this compile time error?

*2) Using openmpi-1.4.3:***

Builds properly without F77/F90 support (i.e. i.e.
Skipping MPI F77 interface).

Now to get the "mpif*.exe" for fortran programs, I
provided proper "ifort" path and enabled
"OMPI_WANT_F77_BINDINGS=ON" and/or
OMPI_WANT_F90_BINDINGS=ON flag; but getting following errors...

*   2.a) "ifort" with OMPI_WANT_F77_BINDINGS=ON gave
following errors... *

Check ifort external symbol convention...

Check ifort external symbol convention...single underscore

Check if Fortran 77 compiler supports LOGICAL...

Check if Fortran 77 compiler supports LOGICAL...done

Check size of Fortran 77 LOGICAL...

CMake Error at
contrib/platform/win32/CMakeModules/f77_get_sizeof.cmake:76
(MESSAGE):

Could not determine size of LOGICAL.

Call Stack (most recent call first):

contrib/platform/win32/CMakeModules/f77_check.cmake:82
(OMPI_F77_GET_SIZEOF)

contrib/platform/win32/CMakeModules/ompi_configure.cmake:1123
(OMPI_F77_CHECK)

CMakeLists.txt:87 (INCLUDE)

Configuring incomplete, errors occurred!

*2.b) "ifort" with OMPI_WANT_F90_BINDINGS=ON gave following
errors... *

Skipping MPI F77 interface

CMake Error: File

C:/openmpi-1.4.3/contrib/platform/win32/ConfigFiles/mpif90-wrapper-data.txt.cmake
does not exist.

CMake Error at ompi/tools/CMakeLists.txt:93 (CONFIGURE_FILE):

configure_file Problem configuring file

CMake Error: File

C:/op

Re: [OMPI users] Building OpenMPI on Windows 7

2011-03-17 Thread hi
Hi Shiqing,

Yes, it was parameter (MPI_STATUS_SIZE=) in the mpif-config.h file.

BTW: see the attached CMakeCache.txt.

> There are no F90 bindings at the moment for Windows.
Any idea when this will be available?

Thank you.
-Hiral

On Thu, Mar 17, 2011 at 5:21 PM, Shiqing Fan  wrote:

>
>  I tried building openmpi-1.5.2 on Windows 7 (in the environment described
> below) with OMPI_WANT_F77_BINDINGS=ON and
> OMPI_WANT_F90_BINDINGS=ON using "ifort".
>
> I observed that it generated mpif77.exe but did not generate
> mpif90.exe; any idea?
>
>
> There are no F90 bindings at the moment for Windows.
>
>
>  BTW: using the generated mpif77.exe to compile hello_f77.f produced the
> following errors...
>
> c:\openmpi-1.5.2\examples> mpif77 hello_f77.f
> Intel(R) Visual Fortran Compiler Professional for applications running on
> IA-32,
>  Version 11.1Build 20100414 Package ID: w_cprof_p_11.1.065
> Copyright (C) 1985-2010 Intel Corporation.  All rights reserved.
> C:/openmpi-1.5.2/installed/include\mpif-config.h(91): error #5082: Syntax
> error,
>  found ')' when expecting one of: (  
>  _KIND_PARAM>   ...
>   parameter (MPI_STATUS_SIZE=)
> -^
> compilation aborted for hello_f77.f (code 1)
>
> It seems MPI_STATUS_SIZE is not set. Could you please send your
> CMakeCache.txt to me off the mailing list, so that I can check what is going
> wrong? A quick workaround would be to just set it to 0.
>
>
> Regards,
> Shiqing
>
>  Thank you.
> -Hiral
>
>
> On Wed, Mar 16, 2011 at 8:11 PM, Damien  wrote:
>
>
>> Hiral,
>>
>> To add to Shiqing's comments, 1.5 has been running great for me on Windows
>> for over 6 months since it was in beta.  You should give it a try.
>>
>> Damien
>>
>> On 16/03/2011 8:34 AM, Shiqing Fan wrote:
>>
>> Hi Hiral,
>>
>>
>>
>> > it's only experimental in 1.4 series. And there is only F77 bingdings on
>> Windows, no F90 bindings.
>> Can you please provide steps to build 1.4.3 with experimental f77 bindings
>> on Windows?
>>
>> Well, I highly recommend to use 1.5 series, but I can also take a look and
>> probably provide you a patch for 1.4 .
>>
>>
>>
>> BTW: Do you have any idea on: when next stable release with full fortran
>> support on Windows would be available?
>>
>> There is no plan yet.
>>
>>
>> Regards,
>> Shiqing
>>
>>
>>
>>
>> Thank you.
>> -Hiral
>>
>> On Wed, Mar 16, 2011 at 6:59 PM, Shiqing Fan  wrote:
>>
>>
>>> Hi Hiral,
>>>
>>> 1.3.4 is quite old, please use the latest version. As Damien noted, the
>>> full fortran support is in 1.5 series, it's only experimental in 1.4 series.
>>> And there is only F77 bingdings on Windows, no F90 bindings. Another choice
>>> is to use the released binary installers to avoid compiling everything by
>>> yourself.
>>>
>>>
>>> Best Regards,
>>> Shiqing
>>>
>>> On 3/16/2011 11:47 AM, hi wrote:
>>>
>>>  Greetings!!!
>>>
>>>
>>>
>>> I am trying to build openmpi-1.3.4 and openmpi-1.4.3 on Windows 7 (64-bit
>>> OS), but getting some difficuty...
>>>
>>>
>>>
>>> My build environment:
>>>
>>> OS : Windows 7 (64-bit)
>>>
>>> C/C++ compiler : Visual Studio 2008 and Visual Studio 2010
>>>
>>> Fortran compiler: Intel "ifort"
>>>
>>>
>>>
>>> Approach: followed the "First Approach" described in README.WINDOWS file.
>>>
>>>
>>>
>>> *1) Using openmpi-1.3.4:***
>>>
>>> Observed build time error in version.cc(136). This error is related
>>> to getting SVN version information as described in
>>> http://www.open-mpi.org/community/lists/users/2010/01/11860.php. As we
>>> are using this openmpi-1.3.4 stable version on Linux platform, is there any
>>> fix to this compile time error?
>>>
>>>
>>>
>>> *2) Using openmpi-1.4.3:***
>>>
>>> Builds properly without F77/F90 support (i.e. i.e. Skipping MPI F77
>>> interface).
>>>
>>> Now to get the "mpif*.exe" for fortran programs, I provided proper
>>> "ifort" path and enabled "OMPI_WANT_F77_BINDINGS=ON" and/or
>>> OMPI_WANT_F90_BINDINGS=ON flag; but getting following errors...
>>>
>>> *   2.a) "ifort" with OMPI_WANT_F77_BINDINGS=ON gave following
>>> errors... *
>>>
>>> Check ifort external symbol convention...
>>>
>>> Check ifort external symbol convention...single underscore
>>>
>>> Check if Fortran 77 compiler supports LOGICAL...
>>>
>>> Check if Fortran 77 compiler supports LOGICAL...done
>>>
>>> Check size of Fortran 77 LOGICAL...
>>>
>>> CMake Error at
>>> contrib/platform/win32/CMakeModules/f77_get_sizeof.cmake:76 (MESSAGE):
>>>
>>> Could not determine size of LOGICAL.
>>>
>>> Call Stack (most recent call first):
>>>
>>> contrib/platform/win32/CMakeModules/f77_check.cmake:82
>>> (OMPI_F77_GET_SIZEOF)
>>>
>>> contrib/platform/win32/CMakeModules/ompi_configure.cmake:1123
>>> (OMPI_F77_CHECK)
>>>
>>> CMakeLists.txt:87 (INCLUDE)
>>>
>>> Configuring incomplete, errors occurred!
>>>
>>>
>>>
>>> *   2.b) "ifort" with OMPI_WANT_F90_BINDINGS=ON gave following
>>> errors... *
>>>
>>> Skipping MPI F77 interface
>>>
>>> CMake Error: File
>>> C:/openmpi-1.4.3/contrib/platform/win32/ConfigFiles/

[OMPI users] Comparison among OpenMPI, MPICH2 and MSPICH on Windows

2011-03-17 Thread hi
Hi,

Does anybody have an idea about...
 Where can I get a comparison of OpenMPI, MPICH2 and MSPICH? What do they support?
 Which one performs better on the Windows platform?

Thank you in advance.
-Hiral


[OMPI users] gadget2 infiniband openmpi hang

2011-03-17 Thread Craig West
Hi,
I'm a system administrator trying to help users resolve gadget 2 code hangs
doing MPI_Sendrecv (similar to
http://www.open-mpi.org/community/lists/users/2010/05/13057.php).
I'm trying to determine appropriate values for mpool_rdma_rcache_size_limit
for our hardware, and to make sure RDMA settings are appropriate and do not
lead to data corruption (
http://www.open-mpi.org/faq/?category=openfabrics#setting-mpi-leave-pinned-1.3.2
).
The gadget code was running fine under openmpi 1.2.9 and the hangs showed up
in 1.4.3 (actually also 1.3.2).

code runs using tcp (-mca btl tcp,self,sm)

code hangs using infiniband

code runs using infiniband with "-mca btl_openib_flags 1" and "-mca
mpool_rdma_rcache_size_limit 209715200" (suggestion from poster from the
referenced link above)

Any suggestions would be appreciated.
Regards,
Gretchen
0. openmpi 1.4.3 (ompi_info attached, config.log is missing but may not be
needed as this is a more general usage/settings question)
1. OFED 1.4.2 from git.openfabrics.org
2. Debian 5.0, kernel 2.6.26-2-amd64
3. opensm-3.2.6
4. ibv_devinfo
hca_id:mlx4_0
fw_ver:2.6.000
node_guid:0002:c903:0002:848c
sys_image_guid:0002:c903:0002:848f
vendor_id:0x02c9
vendor_part_id:25408
hw_ver:0xA0
board_id:MT_04A0130005
phys_port_cnt:2
port:1
state:PORT_ACTIVE (4)
max_mtu:2048 (4)
active_mtu:2048 (4)
sm_lid:30
port_lid:99
port_lmc:0x00

5. ifconfig
ib0   Link encap:UNSPEC  HWaddr
80-00-00-48-FE-80-00-00-00-00-00-00-00-00-00-00
  inet addr:10.16.10.20  Bcast:10.16.10.255  Mask:255.255.255.0
  inet6 addr: fe80::202:c903:2:848d/64 Scope:Link
  UP BROADCAST RUNNING MULTICAST  MTU:65520  Metric:1
  RX packets:1936 errors:0 dropped:0 overruns:0 frame:0
  TX packets:0 errors:0 dropped:5 overruns:0 carrier:0
  collisions:0 txqueuelen:256
  RX bytes:189055 (184.6 KiB)  TX bytes:0 (0.0 B)
6. unlimited
 Package: Open MPI xxx@xxx Distribution
Open MPI: 1.4.3
   Open MPI SVN revision: r23834
   Open MPI release date: Oct 05, 2010
Open RTE: 1.4.3
   Open RTE SVN revision: r23834
   Open RTE release date: Oct 05, 2010
OPAL: 1.4.3
   OPAL SVN revision: r23834
   OPAL release date: Oct 05, 2010
Ident string: 1.4.3
  Prefix: /usr/local/openmpi-1.4.3
 Configured architecture: x86_64-unknown-linux-gnu
  Configure host: xxx
   Configured by: xxx
   Configured on: Tue Nov 30 16:24:27 EST 2010
  Configure host: xxx
Built by: xxx
Built on: Tue Nov 30 16:31:33 EST 2010
  Built host: xxx
  C bindings: yes
C++ bindings: yes
  Fortran77 bindings: yes (all)
  Fortran90 bindings: yes
 Fortran90 bindings size: small
  C compiler: gcc
 C compiler absolute: /usr/bin/gcc
C++ compiler: g++
   C++ compiler absolute: /usr/bin/g++
  Fortran77 compiler: gfortran
  Fortran77 compiler abs: /usr/bin/gfortran
  Fortran90 compiler: gfortran
  Fortran90 compiler abs: /usr/bin/gfortran
 C profiling: yes
   C++ profiling: yes
 Fortran77 profiling: yes
 Fortran90 profiling: yes
  C++ exceptions: yes
  Thread support: posix (mpi: no, progress: no)
   Sparse Groups: no
  Internal debug support: no
 MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
 libltdl support: yes
   Heterogeneous support: no
 mpirun default --prefix: no
 MPI I/O support: yes
   MPI_WTIME support: gettimeofday
Symbol visibility support: yes
   FT Checkpoint support: no  (checkpoint thread: no)
   MCA backtrace: execinfo (MCA v2.0, API v2.0, Component v1.4.3)
  MCA memory: ptmalloc2 (MCA v2.0, API v2.0, Component v1.4.3)
   MCA paffinity: linux (MCA v2.0, API v2.0, Component v1.4.3)
   MCA carto: auto_detect (MCA v2.0, API v2.0, Component v1.4.3)
   MCA carto: file (MCA v2.0, API v2.0, Component v1.4.3)
   MCA maffinity: first_use (MCA v2.0, API v2.0, Component v1.4.3)
   MCA maffinity: libnuma (MCA v2.0, API v2.0, Component v1.4.3)
   MCA timer: linux (MCA v2.0, API v2.0, Component v1.4.3)
 MCA installdirs: env (MCA v2.0, API v2.0, Component v1.4.3)
 MCA installdirs: config (MCA v2.0, API v2.0, Component v1.4.3)
 MCA dpm: orte (MCA v2.0, API v2.0, Component v1.4.3)
  MCA pubsub: orte (MCA v2.0, API v2.0, Component v1.4.3)
   MCA allocator: basic (MCA v2.0, API v2.0, Component v1.4.3)
   MCA allocator: bucket (MCA v2.0, API v2.0, Component v1.4.3)
  

Re: [OMPI users] Segmentation faults

2011-03-17 Thread Jeff Squyres
Sorry for the delay in replying.

You might want to run your code through a debugger, particularly a 
memory-checking debugger.  Your stack trace shows that it's segv'ing in main(), 
so it might not be that difficult to find.
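
For example, something along these lines (the Valgrind flags and the rebuild step
are illustrative, not a prescription) will usually point at the first bad access
with a file and line number:

  mpicc -g -O0 -o exmpi_2 exmpi_2.c
  mpirun -np 10 valgrind --track-origins=yes ./exmpi_2 2.pgm out.pgm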


On Mar 8, 2011, at 12:48 AM, arep isa wrote:

> Hi,
> I need to use Open MPI to distribute 2d-array in the PGM file among 10
> working computers. Then I need to manipulate each value of the array
> to get a negative image (255-i) and then print the output back. I'm
> thinking of using mpi_scatterv and mpi_gatherv to distribute the data.
> After I compile and run the program, it gets segmentation faults. I don't know
> whether the problem is in my code or the compiler. I integrated the
> code to read/write the PGM from pgm_RW_1.c and the MPI code in exmpi_2.c.
> 
> --I install OPEN MPI version 1.4.1-2 via Synaptic Package Manager on
> UBUNTU 10.04.
> 
> --I compile with:
>   mpicc -o exmpi_2 exmpi_2.c
> --I run for testing (segmentation faults):
>   mpirun -np 10 ./exmpi_2 2.pgm out.pgm
> --Then I run with hostfile:
>   mpirun -np 10 --hostfile .mpi_hostfile ./exmpi_2 2.pgm out.pgm
> 
> 
> Here is the error:
> 
> arep@ubuntu:~/Desktop/fyp$ mpirun -np 10 ./exmpi_2 2.pgm out.pgm
> [ubuntu:02948] *** Process received signal ***
> [ubuntu:02948] Signal: Segmentation fault (11)
> [ubuntu:02948] Signal code: Address not mapped (1)
> [ubuntu:02948] Failing at address: (nil)
> [ubuntu:02948] [ 0] [0x792410]
> [ubuntu:02948] [ 1] ./exmpi_2(main+0x1f6) [0x8048d2a]
> [ubuntu:02948] [ 2]
> /lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xe6) [0x126bd6]
> [ubuntu:02948] [ 3] ./exmpi_2() [0x8048aa1]
> [ubuntu:02948] *** End of error message ***
> --
> mpirun noticed that process rank 0 with PID 2948 on node ubuntu exited
> on signal 11 (Segmentation fault).
> --
> 
> 
> Here is the input 2.pgm image :
> http://orion.math.iastate.edu/burkardt/data/pgm/balloons.pgm
> 
> TQ for your help.
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
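
For reference, a minimal sketch (not the poster's exmpi_2.c, which is not included
here) of the Scatterv/Gatherv negative-image approach described above; the image
dimensions are placeholders and the PGM I/O is omitted:

/* Split an image's rows across ranks, invert each pixel (255 - i),
 * and gather the pieces back on rank 0. */
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size;
    const int width = 640, height = 480;         /* placeholder dimensions */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int *counts = malloc(size * sizeof(int));    /* pixels handled by each rank */
    int *displs = malloc(size * sizeof(int));    /* offsets into the flat image */
    int offset = 0;
    for (int r = 0; r < size; r++) {
        int rows = height / size + (r < height % size ? 1 : 0);
        counts[r] = rows * width;
        displs[r] = offset;
        offset += counts[r];
    }

    unsigned char *image = NULL;
    if (rank == 0) {
        image = malloc((size_t)width * height);  /* read the PGM into this buffer */
        for (int i = 0; i < width * height; i++)
            image[i] = (unsigned char)(i % 256); /* dummy data for the sketch */
    }

    unsigned char *chunk = malloc(counts[rank]);
    MPI_Scatterv(image, counts, displs, MPI_UNSIGNED_CHAR,
                 chunk, counts[rank], MPI_UNSIGNED_CHAR, 0, MPI_COMM_WORLD);

    for (int i = 0; i < counts[rank]; i++)       /* negative image */
        chunk[i] = (unsigned char)(255 - chunk[i]);

    MPI_Gatherv(chunk, counts[rank], MPI_UNSIGNED_CHAR,
                image, counts, displs, MPI_UNSIGNED_CHAR, 0, MPI_COMM_WORLD);

    if (rank == 0) {
        /* write the PGM from image[] here */
        free(image);
    }
    free(chunk);
    free(counts);
    free(displs);
    MPI_Finalize();
    return 0;
}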


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI users] OpenMPI 1.2.x segfault as regular user

2011-03-17 Thread Jeff Squyres
Sorry for the delayed reply.

I'm afraid I haven't done much with SE Linux -- I don't know if there are any 
"gotchas" that would show up there.  SE Linux support is not something we've 
gotten a lot of requests for.  I doubt that anyone in the community has done 
much testing in this area.  :-\

I suspect that Open MPI is trying to access something that your user (under SE 
Linux) doesn't have permission to.  

So I'm afraid I don't have much of an answer for you -- sorry!  If you do 
figure it out, though, if a fix is not too intrusive, we can probably 
incorporate it upstream.


On Mar 4, 2011, at 7:31 AM, Youri LACAN-BARTLEY wrote:

> Hi,
>  
> This is my first post to this mailing-list so I apologize for maybe being a 
> little rough on the edges.
> I’ve been digging into OpenMPI for a little while now and have come across 
> one issue that I just can’t explain and I’m sincerely hoping someone can put 
> me on the right track here.
>  
> I’m using a fresh install of openmpi-1.2.7 and I systematically get a 
> segmentation fault at the end of my mpirun calls if I’m logged in as a 
> regular user.
> However, as soon as I switch to the root account, the segfault does not 
> appear.
> The jobs actually run to their term but I just can’t find a good reason for 
> this to be happening and I haven’t been able to reproduce the problem on 
> another machine.
>  
> Any help or tips would be greatly appreciated.
>  
> Thanks,
>  
> Youri LACAN-BARTLEY
>  
> Here’s an example running osu_latency locally (I’ve “blacklisted” openib to 
> make sure it’s not to blame):
>  
> [user@server ~]$ mpirun --mca btl ^openib  -np 2 
> /opt/scripts/osu_latency-openmpi-1.2.7
> # OSU MPI Latency Test v3.3
> # Size        Latency (us)
> 0                    0.76
> 1                    0.89
> 2                    0.89
> 4                    0.89
> 8                    0.89
> 16                   0.91
> 32                   0.91
> 64                   0.92
> 128                  0.96
> 256                  1.13
> 512                  1.31
> 1024                 1.69
> 2048                 2.51
> 4096                 5.34
> 8192                 9.16
> 16384               17.47
> 32768               31.79
> 65536               51.10
> 131072              92.41
> 262144             181.74
> 524288             512.26
> 1048576           1238.21
> 2097152           2280.28
> 4194304           4616.67
> [server:15586] *** Process received signal ***
> [server:15586] Signal: Segmentation fault (11)
> [server:15586] Signal code: Address not mapped (1)
> [server:15586] Failing at address: (nil)
> [server:15586] [ 0] /lib64/libpthread.so.0 [0x3cd1e0eb10]
> [server:15586] [ 1] /lib64/libc.so.6 [0x3cd166fdc9]
> [server:15586] [ 2] /lib64/libc.so.6(__libc_malloc+0x167) [0x3cd1674dd7]
> [server:15586] [ 3] /lib64/ld-linux-x86-64.so.2(__tls_get_addr+0xb1) 
> [0x3cd120fe61]
> [server:15586] [ 4] /lib64/libselinux.so.1 [0x3cd320f5cc]
> [server:15586] [ 5] /lib64/libselinux.so.1 [0x3cd32045df]
> [server:15586] *** End of error message ***
> [server:15587] *** Process received signal ***
> [server:15587] Signal: Segmentation fault (11)
> [server:15587] Signal code: Address not mapped (1)
> [server:15587] Failing at address: (nil)
> [server:15587] [ 0] /lib64/libpthread.so.0 [0x3cd1e0eb10]
> [server:15587] [ 1] /lib64/libc.so.6 [0x3cd166fdc9]
> [server:15587] [ 2] /lib64/libc.so.6(__libc_malloc+0x167) [0x3cd1674dd7]
> [server:15587] [ 3] /lib64/ld-linux-x86-64.so.2(__tls_get_addr+0xb1) 
> [0x3cd120fe61]
> [server:15587] [ 4] /lib64/libselinux.so.1 [0x3cd320f5cc]
> [server:15587] [ 5] /lib64/libselinux.so.1 [0x3cd32045df]
> [server:15587] *** End of error message ***
> mpirun noticed that job rank 0 with PID 15586 on node server exited on signal 
> 11 (Segmentation fault).
> 1 additional process aborted (not shown)
> [server:15583] *** Process received signal ***
> [server:15583] Signal: Segmentation fault (11)
> [server:15583] Signal code: Address not mapped (1)
> [server:15583] Failing at address: (nil)
> [server:15583] [ 0] /lib64/libpthread.so.0 [0x3cd1e0eb10]
> [server:15583] [ 1] /lib64/libc.so.6 [0x3cd166fdc9]
> [server:15583] [ 2] /lib64/libc.so.6(__libc_malloc+0x167) [0x3cd1674dd7]
> [server:15583] [ 3] /lib64/ld-linux-x86-64.so.2(__tls_get_addr+0xb1) 
> [0x3cd120fe61]
> [server:15583] [ 4] /lib64/libselinux.so.1 [0x3cd320f5cc]
> [server:15583] [ 5] /lib64/libselinux.so.1 [0x3cd32045df]
> [server:15583] *** End of error message ***
> Segmentation fault
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI users] Connection Errors: Socket is not connected (57) but works for a one messages to each place at first. Works on machine order.

2011-03-17 Thread Jeff Squyres
Sorry for the delayed reply.

Is there any chance you can upgrade to the latest version of Open MPI?

Also, I'm not an IPv6 expert -- could you try disabling IPv6?  (I can't tell 
offhand from your output whether it's enabled or disabled)

I say this because we *did* have a whacko problem on OS X regarding IPv6 (see 
http://blogs.cisco.com/performance/why_mpi_is_good_for_you/ and the linked Open 
MPI commit message for some details, if you care).  This fix was included in 
Open MPI 1.4.2 and the entire 1.5.x series.  If you can upgrade to 1.4.2, you 
may not need to change your IPv6 settings.


On Mar 5, 2011, at 12:43 AM,  
 wrote:

> Dear Open-mpi users,
> Currently we are running on 4 imacs 10.5.8 all identical and all on the same 
> network using MPI version 1.4.1.
> We get an error that we cannot seem to find any help on. 
> Sometimes we get the error Socket Connection (79)
> [30451,1],1][btl_tcp_endpoint.c:298:mca_btl_tcp_endpoint_send_blocking] 
> send() failed: Socket is not connected (57)
> The strangest thing is the error only happens when we run with certain 
> machines in a certain order.
> 
> 
> ECHO $Path 
> /usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/X11/bin:/usr/texbin
> 
> mpicc -m64 -lpthread -w -lm -std="c99" inc/*.h lib/*.c -o dispatcher
> 
> The strange thing is that all dispatchers are able to send one small message to
> each other before this error occurs.
> Does not work:
> mpirun -H juhu,hama -n 2 dispatcher
> mpirun -H hama,juhu -n 2 dispatcher
> mpirun -H hama,tuvalu -n 2 dispatcher
> mpirun -H juhu,tuvalu -n 2 dispatcher
> Works: 
> mpirun -H tuvalu,juhu -n 2 dispatcher
> mpirun -H tuvalu,hama -n 2 dispatcher
> 
> Dispatcher is a multithreaded application that sends messages to other 
> dispatchers.
> 
> 
> ifconfig output for machine 1 with the problem
> 
> lo0: flags=8049 mtu 16384
> inet6 fe80::1%lo0 prefixlen 64 scopeid 0x1 
> inet 127.0.0.1 netmask 0xff00 
> inet6 ::1 prefixlen 128 
> gif0: flags=8010 mtu 1280
> stf0: flags=0<> mtu 1280
> fw0: flags=8863 mtu 4078
> lladdr 00:1f:f3:ff:fe:6e:5d:26 
> media: autoselect  status: inactive
> supported media: autoselect 
> en1: flags=8823 mtu 1500
> ether 00:1f:5b:c9:3b:8f 
> media: autoselect () status: inactive
> supported media: autoselect
> en0: flags=8863 mtu 1500
> inet 131.179.224.186 netmask 0xff00 broadcast 131.179.224.255
> ether 00:1f:f3:59:d2:3d 
> media: autoselect (100baseTX ) status: active
> supported media: autoselect 10baseT/UTP  10baseT/UTP 
>  10baseT/UTP  10baseT/UTP 
>  100baseTX  100baseTX  
> 100baseTX  100baseTX  
> 1000baseT  1000baseT  1000baseT 
>  none
> vmnet8: flags=8863 mtu 1500
> inet 172.16.181.1 netmask 0xff00 broadcast 172.16.181.255
> ether 00:50:56:c0:00:08 
> vmnet1: flags=8863 mtu 1500
> inet 172.16.32.1 netmask 0xff00 broadcast 172.16.32.255
> ether 00:50:56:c0:00:01 
> 
> ifconfig output for machine 2 with the problem
> 
> 
> lo0: flags=8049 mtu 16384
> inet6 fe80::1%lo0 prefixlen 64 scopeid 0x1 
> inet 127.0.0.1 netmask 0xff00 
> inet6 ::1 prefixlen 128 
> gif0: flags=8010 mtu 1280
> stf0: flags=0<> mtu 1280
> fw0: flags=8863 mtu 4078
> lladdr 00:1f:5b:ff:fe:20:ae:1e 
> media: autoselect  status: inactive
> supported media: autoselect 
> en1: flags=8823 mtu 1500
> ether 00:1f:5b:c9:10:1d 
> media: autoselect () status: inactive
> supported media: autoselect
> en0: flags=8863 mtu 1500
> inet6 fe80::21e:c2ff:fe1a:c673%en0 prefixlen 64 scopeid 0x6 
> inet 131.179.224.185 netmask 0xff00 broadcast 131.179.224.255
> ether 00:1e:c2:1a:c6:73 
> media: autoselect (100baseTX ) status: active
> supported media: autoselect 10baseT/UTP  10baseT/UTP 
>  10baseT/UTP  10baseT/UTP 
>  100baseTX  100baseTX  
> 100baseTX  100baseTX  
> 1000baseT  1000baseT  1000baseT 
>  none
> vboxnet0: flags=8842 mtu 1500
> ether 0a:00:27:00:00:00 
> vmnet1: flags=8863 mtu 1500
> inet 192.168.138.1 netmask 0xff00 broadcast 192.168.138.255
> ether 00:50:56:c0:00:01 
> vmnet8: flags=8863 mtu 1500
> inet 192.168.56.1 netmask 0xff00 broadcast 192.168.56.255
> ether 00:50:56:c0:00:08 
> 
> 
> Thanks!
> Oren
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI users] Potential bug in creating MPI_GROUP_EMPTY handling

2011-03-17 Thread Jeff Squyres
Sorry for the late reply, but many thanks for the bug report and reliable 
reproducer.

I've confirmed the problem and filed a bug about this:

 https://svn.open-mpi.org/trac/ompi/ticket/2752
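
For context, a minimal sketch along the lines described in the report below (the
original test.c was attached to the report and is not reproduced here):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Comm dummy_comm;
    MPI_Init(&argc, &argv);

    /* Legal per the MPI standard: every process passes MPI_GROUP_EMPTY,
     * so every process should simply get MPI_COMM_NULL back. */
    MPI_Comm_create(MPI_COMM_WORLD, MPI_GROUP_EMPTY, &dummy_comm);

    if (dummy_comm == MPI_COMM_NULL)
        printf("got MPI_COMM_NULL, as expected\n");

    MPI_Finalize();
    return 0;
}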


On Mar 6, 2011, at 6:12 PM, Dominik Goeddeke wrote:

> The attached example code (stripped down from a bigger app) demonstrates a 
> way to trigger a severe crash in all recent ompi releases but not in a bunch 
> of latest MPICH2 releases. The code is minimalistic and boils down to the call
> 
> MPI_Comm_create(MPI_COMM_WORLD, MPI_GROUP_EMPTY, &dummy_comm);
> 
> which isn't supposed to be illegal. Please refer to the (well-documented) 
> code for details on the high-dimensional cross product I tested (on ubuntu 
> 10.04 LTS), a potential workaround (which isn't supposed to be necessary I 
> think) and an exemplary stack trace.
> 
> Instructions: mpicc test.c -Wall -O0 && mpirun -np 2 ./a.out
> 
> Thanks!
> 
> dom
> 
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI users] gadget2 infiniband openmpi hang

2011-03-17 Thread Jeff Squyres
Are you able to run if you use --mca btl_openib_cpc_include rdmacm ?
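
For example (the process count, btl list and executable name below are placeholders):

  mpirun -np 16 --mca btl openib,self,sm --mca btl_openib_cpc_include rdmacm ./Gadget2 param.txt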


On Mar 17, 2011, at 10:57 AM, Craig West wrote:

> Hi,
> I'm a system administrator trying to help users resolve gadget 2 code hangs 
> doing MPI_Sendrecv (similar to 
> http://www.open-mpi.org/community/lists/users/2010/05/13057.php).
> I'm trying to determine appropriate values for mpool_rdma_rcache_size_limit 
> for our hardware, and to make sure RDMA settings are appropriate and do not 
> lead to data corruption 
> (http://www.open-mpi.org/faq/?category=openfabrics#setting-mpi-leave-pinned-1.3.2).
> The gadget code was running fine under openmpi 1.2.9 and the hangs showed up 
> in 1.4.3 (actually also 1.3.2). 
> 
> code runs using tcp (-mca btl tcp,self,sm)
> 
> code hangs using infiniband 
> 
> code runs using infiniband with "-mca btl_openib_flags 1" and "-mca 
> mpool_rdma_rcache_size_limit 209715200" (suggestion from poster from the 
> referenced link above)
> 
> Any suggestions would be appreciated.
> Regards,
> Gretchen
> 0. openmpi 1.4.3 (ompi_info attached, config.log is missing but may not be 
> needed as this is a more general usage/settings question)
> 1. OFED 1.4.2 from git.openfabrics.org
> 2. Debian 5.0, kernel 2.6.26-2-amd64
> 3. opensm-3.2.6
> 4. ibv_devinfo
> hca_id:mlx4_0
> fw_ver:2.6.000
> node_guid:0002:c903:0002:848c
> sys_image_guid:0002:c903:0002:848f
> vendor_id:0x02c9
> vendor_part_id:25408
> hw_ver:0xA0
> board_id:MT_04A0130005
> phys_port_cnt:2
> port:1
> state:PORT_ACTIVE (4)
> max_mtu:2048 (4)
> active_mtu:2048 (4)
> sm_lid:30
> port_lid:99
> port_lmc:0x00
> 
> 5. ifconfig
> ib0   Link encap:UNSPEC  HWaddr 
> 80-00-00-48-FE-80-00-00-00-00-00-00-00-00-00-00  
>   inet addr:10.16.10.20  Bcast:10.16.10.255  Mask:255.255.255.0
>   inet6 addr: fe80::202:c903:2:848d/64 Scope:Link
>   UP BROADCAST RUNNING MULTICAST  MTU:65520  Metric:1
>   RX packets:1936 errors:0 dropped:0 overruns:0 frame:0
>   TX packets:0 errors:0 dropped:5 overruns:0 carrier:0
>   collisions:0 txqueuelen:256 
>   RX bytes:189055 (184.6 KiB)  TX bytes:0 (0.0 B)
> 6. unlimited
> 
> 
> 
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




[OMPI users] Error in Binding MPI Process to a socket

2011-03-17 Thread vaibhav dutt
Hi,

I am trying to perform an experiment in which I spawn 2 MPI processes,
one on each socket of a 4-core node
with 2 dual-core sockets. I used the "bind to socket" option of mpirun for
that, but I am getting an error like:

An attempt was made to bind a process to a specific hardware topology
mapping (e.g., binding to a socket) but the operating system does not
support such topology-aware actions.  Talk to your local system
administrator to find out if your system can support topology-aware
functionality (e.g., Linux Kernels newer than v2.6.18).

Systems that do not support processor topology-aware functionality cannot
use "bind to socket" and other related functionality.


Can anybody please tell me what this error is about? Is there any other
option than "bind to socket"
that I can use?

Thanks.


Re: [OMPI users] Error in Binding MPI Process to a socket

2011-03-17 Thread Ralph Castain
The error is telling you that your OS doesn't support queries telling us what 
cores are on which sockets, so we can't perform a "bind to socket" operation. 
You can probably still "bind to core", so if you know what cores are in which 
sockets, then you could use the rank_file mapper to assign processes to groups 
of cores in a socket.

It's just that we can't do it automatically because the OS won't give us the 
required info.

See "mpirun -h" for more info on slot lists.

On Mar 17, 2011, at 11:26 AM, vaibhav dutt wrote:

> Hi,
> 
> I am trying to perform an experiment in which I can spawn 2 MPI processes, 
> one on each socket in a 4 core node
> having 2 dual cores. I used the option  "bind to socket" which mpirun for 
> that but I am getting an error like:
> 
> An attempt was made to bind a process to a specific hardware topology
> mapping (e.g., binding to a socket) but the operating system does not
> support such topology-aware actions.  Talk to your local system
> administrator to find out if your system can support topology-aware
> functionality (e.g., Linux Kernels newer than v2.6.18).
> 
> Systems that do not support processor topology-aware functionality cannot
> use "bind to socket" and other related functionality.
> 
> 
> Can anybody please tell me what is this error about. Is there any other 
> option than "bind to socket"
> that I can use.
> 
> Thanks.
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users




Re: [OMPI users] Error in Binding MPI Process to a socket

2011-03-17 Thread vaibhav dutt
Hi,

Thanks for your reply. I first tried to execute a process using

mpirun -machinefile hostfile.txt  --slot-list 0:1   -np 1

but it gives the same error as mentioned previously.

Then, I created a rankfile with the contents:

rank 0=t1.tools.xxx  slot=0:0
rank 1=t1.tools.xxx  slot=1:0

and then used the command

mpirun -machinefile hostfile.txt --rankfile my_rankfile.txt   -np 2

but ended up getting the same error. Is there any patch that I can install on
my system to make it
topology-aware?

Thanks


On Thu, Mar 17, 2011 at 2:05 PM, Ralph Castain  wrote:

> The error is telling you that your OS doesn't support queries telling us
> what cores are on which sockets, so we can't perform a "bind to socket"
> operation. You can probably still "bind to core", so if you know what cores
> are in which sockets, then you could use the rank_file mapper to assign
> processes to groups of cores in a socket.
>
> It's just that we can't do it automatically because the OS won't give us
> the required info.
>
> See "mpirun -h" for more info on slot lists.
>
> On Mar 17, 2011, at 11:26 AM, vaibhav dutt wrote:
>
> > Hi,
> >
> > I am trying to perform an experiment in which I can spawn 2 MPI
> processes, one on each socket in a 4 core node
> > having 2 dual cores. I used the option  "bind to socket" which mpirun for
> that but I am getting an error like:
> >
> > An attempt was made to bind a process to a specific hardware topology
> > mapping (e.g., binding to a socket) but the operating system does not
> > support such topology-aware actions.  Talk to your local system
> > administrator to find out if your system can support topology-aware
> > functionality (e.g., Linux Kernels newer than v2.6.18).
> >
> > Systems that do not support processor topology-aware functionality cannot
> > use "bind to socket" and other related functionality.
> >
> >
> > Can anybody please tell me what is this error about. Is there any other
> option than "bind to socket"
> > that I can use.
> >
> > Thanks.
> > ___
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>


Re: [OMPI users] Error in Binding MPI Process to a socket

2011-03-17 Thread Ralph Castain
What OS version is it?

uname -a

will tell you, if you are on linux.

On Mar 17, 2011, at 1:31 PM, vaibhav dutt wrote:

> Hi,
> 
> Thanks for your reply. I tried to execute first a process by using
> 
> mpirun -machinefile hostfile.txt  --slot-list 0:1   -np 1
> 
> but it gives the same as error as mentioned previously.
> 
> Then, I created a rankfile with contents"
> 
> rank 0=t1.tools.xxx  slot=0:0
> rank 1=t1.tools.xxx  slot=1:0.
> 
> and the  used command
> 
> mpirun -machinefile hostfile.txt --rankfile my_rankfile.txt   -np 2 
> 
> but ended  up getting same error. Is there any patch that I can install in my 
> system to make it
> topology aware?
> 
> Thanks
> 
> 
> On Thu, Mar 17, 2011 at 2:05 PM, Ralph Castain  wrote:
> The error is telling you that your OS doesn't support queries telling us what 
> cores are on which sockets, so we can't perform a "bind to socket" operation. 
> You can probably still "bind to core", so if you know what cores are in which 
> sockets, then you could use the rank_file mapper to assign processes to 
> groups of cores in a socket.
> 
> It's just that we can't do it automatically because the OS won't give us the 
> required info.
> 
> See "mpirun -h" for more info on slot lists.
> 
> On Mar 17, 2011, at 11:26 AM, vaibhav dutt wrote:
> 
> > Hi,
> >
> > I am trying to perform an experiment in which I can spawn 2 MPI processes, 
> > one on each socket in a 4 core node
> > having 2 dual cores. I used the option  "bind to socket" which mpirun for 
> > that but I am getting an error like:
> >
> > An attempt was made to bind a process to a specific hardware topology
> > mapping (e.g., binding to a socket) but the operating system does not
> > support such topology-aware actions.  Talk to your local system
> > administrator to find out if your system can support topology-aware
> > functionality (e.g., Linux Kernels newer than v2.6.18).
> >
> > Systems that do not support processor topology-aware functionality cannot
> > use "bind to socket" and other related functionality.
> >
> >
> > Can anybody please tell me what is this error about. Is there any other 
> > option than "bind to socket"
> > that I can use.
> >
> > Thanks.
> > ___
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users



Re: [OMPI users] Error in Binding MPI Process to a socket

2011-03-17 Thread vaibhav dutt
2.6.9-78.0.17.ELpapismp #1 SMP Tue Apr 7 13:14:04 CDT 2009 x86_64 x86_64
x86_64 GNU/Linux


On Thu, Mar 17, 2011 at 2:55 PM, Ralph Castain  wrote:

> What OS version is it?
>
> uname -a
>
> will tell you, if you are on linux.
>
> On Mar 17, 2011, at 1:31 PM, vaibhav dutt wrote:
>
> Hi,
>
> Thanks for your reply. I tried to execute first a process by using
>
> mpirun -machinefile hostfile.txt  --slot-list 0:1   -np 1
>
> but it gives the same as error as mentioned previously.
>
> Then, I created a rankfile with contents"
>
> rank 0=t1.tools.xxx  slot=0:0
> rank 1=t1.tools.xxx  slot=1:0.
>
> and the  used command
>
> mpirun -machinefile hostfile.txt --rankfile my_rankfile.txt   -np 2
>
> but ended  up getting same error. Is there any patch that I can install in
> my system to make it
> topology aware?
>
> Thanks
>
>
> On Thu, Mar 17, 2011 at 2:05 PM, Ralph Castain  wrote:
>
>> The error is telling you that your OS doesn't support queries telling us
>> what cores are on which sockets, so we can't perform a "bind to socket"
>> operation. You can probably still "bind to core", so if you know what cores
>> are in which sockets, then you could use the rank_file mapper to assign
>> processes to groups of cores in a socket.
>>
>> It's just that we can't do it automatically because the OS won't give us
>> the required info.
>>
>> See "mpirun -h" for more info on slot lists.
>>
>> On Mar 17, 2011, at 11:26 AM, vaibhav dutt wrote:
>>
>> > Hi,
>> >
>> > I am trying to perform an experiment in which I can spawn 2 MPI
>> processes, one on each socket in a 4 core node
>> > having 2 dual cores. I used the option  "bind to socket" which mpirun
>> for that but I am getting an error like:
>> >
>> > An attempt was made to bind a process to a specific hardware topology
>> > mapping (e.g., binding to a socket) but the operating system does not
>> > support such topology-aware actions.  Talk to your local system
>> > administrator to find out if your system can support topology-aware
>> > functionality (e.g., Linux Kernels newer than v2.6.18).
>> >
>> > Systems that do not support processor topology-aware functionality
>> cannot
>> > use "bind to socket" and other related functionality.
>> >
>> >
>> > Can anybody please tell me what is this error about. Is there any other
>> option than "bind to socket"
>> > that I can use.
>> >
>> > Thanks.
>> > ___
>> > users mailing list
>> > us...@open-mpi.org
>> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>>
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>


Re: [OMPI users] Error in Binding MPI Process to a socket

2011-03-17 Thread Ralph Castain
That is an awfully old kernel, so it may indeed not have support for binding. 
I'd have to defer to someone more knowledgeable to say for sure, but you might 
consider updating to something newer.


On Mar 17, 2011, at 1:57 PM, vaibhav dutt wrote:

> 2.6.9-78.0.17.ELpapismp #1 SMP Tue Apr 7 13:14:04 CDT 2009 x86_64 x86_64 
> x86_64 GNU/Linux
> 
> 
> On Thu, Mar 17, 2011 at 2:55 PM, Ralph Castain  wrote:
> What OS version is it?
> 
> uname -a
> 
> will tell you, if you are on linux.
> 
> On Mar 17, 2011, at 1:31 PM, vaibhav dutt wrote:
> 
>> Hi,
>> 
>> Thanks for your reply. I tried to execute first a process by using
>> 
>> mpirun -machinefile hostfile.txt  --slot-list 0:1   -np 1
>> 
>> but it gives the same as error as mentioned previously.
>> 
>> Then, I created a rankfile with contents"
>> 
>> rank 0=t1.tools.xxx  slot=0:0
>> rank 1=t1.tools.xxx  slot=1:0.
>> 
>> and the  used command
>> 
>> mpirun -machinefile hostfile.txt --rankfile my_rankfile.txt   -np 2 
>> 
>> but ended  up getting same error. Is there any patch that I can install in 
>> my system to make it
>> topology aware?
>> 
>> Thanks
>> 
>> 
>> On Thu, Mar 17, 2011 at 2:05 PM, Ralph Castain  wrote:
>> The error is telling you that your OS doesn't support queries telling us 
>> what cores are on which sockets, so we can't perform a "bind to socket" 
>> operation. You can probably still "bind to core", so if you know what cores 
>> are in which sockets, then you could use the rank_file mapper to assign 
>> processes to groups of cores in a socket.
>> 
>> It's just that we can't do it automatically because the OS won't give us the 
>> required info.
>> 
>> See "mpirun -h" for more info on slot lists.
>> 
>> On Mar 17, 2011, at 11:26 AM, vaibhav dutt wrote:
>> 
>> > Hi,
>> >
>> > I am trying to perform an experiment in which I can spawn 2 MPI processes, 
>> > one on each socket in a 4 core node
>> > having 2 dual cores. I used the option  "bind to socket" which mpirun for 
>> > that but I am getting an error like:
>> >
>> > An attempt was made to bind a process to a specific hardware topology
>> > mapping (e.g., binding to a socket) but the operating system does not
>> > support such topology-aware actions.  Talk to your local system
>> > administrator to find out if your system can support topology-aware
>> > functionality (e.g., Linux Kernels newer than v2.6.18).
>> >
>> > Systems that do not support processor topology-aware functionality cannot
>> > use "bind to socket" and other related functionality.
>> >
>> >
>> > Can anybody please tell me what is this error about. Is there any other 
>> > option than "bind to socket"
>> > that I can use.
>> >
>> > Thanks.
>> > ___
>> > users mailing list
>> > us...@open-mpi.org
>> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>> 
>> 
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>> 
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users



Re: [OMPI users] Building OpenMPI on Windows 7

2011-03-17 Thread Shiqing Fan

Hi Hiral,


> There are no F90 bindings at the moment for Windows.
Any idea when this will be available?

At the moment, no. It will only happen if there are strong requirements for it.


Regards,
Shiqing


Thank you.
-Hiral
On Thu, Mar 17, 2011 at 5:21 PM, Shiqing Fan wrote:




I tried building openmpi-1.5.2 on Windows 7 (in the environment described
below) with OMPI_WANT_F77_BINDINGS=ON and
OMPI_WANT_F90_BINDINGS=ON using "ifort".
I observed that it generated mpif77.exe but did not generate
mpif90.exe; any idea?


There are no F90 bindings at the moment for Windows.



BTW: using the generated mpif77.exe to compile
hello_f77.f produced the following errors...

c:\openmpi-1.5.2\examples> mpif77 hello_f77.f
Intel(R) Visual Fortran Compiler Professional for
applications running on IA-32,
 Version 11.1Build 20100414 Package ID: w_cprof_p_11.1.065
Copyright (C) 1985-2010 Intel Corporation.  All rights reserved.
C:/openmpi-1.5.2/installed/include\mpif-config.h(91): error
#5082: Syntax error,
 found ')' when expecting one of: ( 
...
  parameter (MPI_STATUS_SIZE=)
-^
compilation aborted for hello_f77.f (code 1)


It seems MPI_STATUS_SIZE is not set. Could you please send me your
CMakeCache.txt to me off the mailing list, so that I can check
what is going wrong? A quick solution would be just set it to 0.


Regards,
Shiqing


Thank you.
-Hiral
On Wed, Mar 16, 2011 at 8:11 PM, Damien <dam...@khubla.com> wrote:

Hiral,
To add to Shiqing's comments, 1.5 has been running great for
me on Windows for over 6 months since it was in beta.  You
should give it a try.
Damien
On 16/03/2011 8:34 AM, Shiqing Fan wrote:

Hi Hiral,

> it's only experimental in 1.4 series. And there is only
F77 bingdings on Windows, no F90 bindings.
Can you please provide steps to build 1.4.3 with
experimental f77 bindings on Windows?

Well, I highly recommend to use 1.5 series, but I can also
take a look and probably provide you a patch for 1.4 .

BTW: Do you have any idea on: when next stable release with
full fortran support on Windows would be available?

There is no plan yet.
Regards,
Shiqing

Thank you.
-Hiral
On Wed, Mar 16, 2011 at 6:59 PM, Shiqing Fan <f...@hlrs.de> wrote:

Hi Hiral,
1.3.4 is quite old, please use the latest version. As
Damien noted, the full fortran support is in 1.5
series, it's only experimental in 1.4 series. And there
is only F77 bingdings on Windows, no F90 bindings.
Another choice is to use the released binary installers
to avoid compiling everything by yourself.
Best Regards,
Shiqing
On 3/16/2011 11:47 AM, hi wrote:


Greetings!!!

I am trying to build openmpi-1.3.4 and openmpi-1.4.3
on Windows 7 (64-bit OS), but getting some difficuty...

My build environment:

OS : Windows 7 (64-bit)

C/C++ compiler : Visual Studio 2008 and Visual Studio 2010

Fortran compiler: Intel "ifort"

Approach: followed the "First Approach" described in
README.WINDOWS file.

*1) Using openmpi-1.3.4:***

Observed build time error in version.cc(136). This
error is related to getting SVN version information as
described in
http://www.open-mpi.org/community/lists/users/2010/01/11860.php.
As we are using this openmpi-1.3.4 stable version on
Linux platform, is there any fix to this compile time
error?

*2) Using openmpi-1.4.3:***

Builds properly without F77/F90 support (i.e. i.e.
Skipping MPI F77 interface).

Now to get the "mpif*.exe" for fortran programs, I
provided proper "ifort" path and enabled
"OMPI_WANT_F77_BINDINGS=ON" and/or
OMPI_WANT_F90_BINDINGS=ON flag; but getting following
errors...

*   2.a) "ifort" with OMPI_WANT_F77_BINDINGS=ON gave
following errors... *

Check ifort external symbol convention...

Check ifort external symbol convention...single underscore

Check if Fortran 77 compiler supports LOGICAL...

Check if Fortran 77 compiler supports LOGICAL...done

Check size of Fortran 77 LOGICAL...

CMake Error at
contrib/platform/win32/CMakeModules/f77_get_sizeof.cmake:76
(MESSAGE):

Could not determine size of LOGICAL.

Call Stack (most recent call first):

contrib/platform/win32/CMakeMod

Re: [OMPI users] Potential bug in creating MPI_GROUP_EMPTY handling

2011-03-17 Thread Dominik Goeddeke
Glad we could help, and that the two hours of stripping things down were 
effectively not wasted. Also good to hear (implicitly) that we were not 
too stupid to understand the MPI standard...


Since to the best of my understanding, our workaround is practically 
overhead-free, we went ahead and coded everything up analogously to the 
workaround, i.e. we don't rely on / wait for an immediate fix.


Please let us know if further information is needed.

Thanks,

dom

On 03/17/2011 05:10 PM, Jeff Squyres wrote:

Sorry for the late reply, but many thanks for the bug report and reliable 
reproducer.

I've confirmed the problem and filed a bug about this:

  https://svn.open-mpi.org/trac/ompi/ticket/2752


On Mar 6, 2011, at 6:12 PM, Dominik Goeddeke wrote:


The attached example code (stripped down from a bigger app) demonstrates a way 
to trigger a severe crash in all recent ompi releases but not in a bunch of 
latest MPICH2 releases. The code is minimalistic and boils down to the call

MPI_Comm_create(MPI_COMM_WORLD, MPI_GROUP_EMPTY, &dummy_comm);

which isn't supposed to be illegal. Please refer to the (well-documented) code 
for details on the high-dimensional cross product I tested (on ubuntu 10.04 
LTS), a potential workaround (which isn't supposed to be necessary I think) and 
an exemplary stack trace.

Instructions: mpicc test.c -Wall -O0 && mpirun -np 2 ./a.out

Thanks!

dom


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users





--
Dr. Dominik Göddeke
Institut für Angewandte Mathematik
Technische Universität Dortmund
http://www.mathematik.tu-dortmund.de/~goeddeke
Tel. +49-(0)231-755-7218  Fax +49-(0)231-755-5933