[OMPI users] Windows CMake build problems ... (cont.)

2010-01-14 Thread cjohnson
The OpenMPI build problem I'm having occurs in both OpenMPI 1.4 and 1.3.4.

I am on a Windows 7 (US) Enterprise (x86) OS on an HP system with an Intel Core 2 Extreme X9000 (4GB RAM), using the 2005 Visual Studio for S/W Architects (release 8.0.50727.867). [That release has everything the platform SDK would have.]

I'm using CMake 2.8 to generate code, and I used it correctly, pointing at the root directory where the makelists are located for the source side and at an empty directory for the build side: did configure, I did not click debug this time as suggested by Shiqing, configure again, generate, and opened the OpenMPI.sln file created by CMake. Then I right-clicked on the "ALL_BUILD" project and selected "build". Then did one "rebuild", just in case build order might get one more success (which it seemed to, but I could not find).

2 projects built, 12 did not. I have the build listing. [I'm afraid of what the mailing list server would do if I attached it to this email.]

All the compiles were successful (warnings at most). All the errors were from linking the VC projects:

1>libopen-pal - 0 error(s), 9 warning(s)
3>opal-restart - 32 error(s), 0 warning(s)
4>opal-wrapper - 21 error(s), 0 warning(s)
2>libopen-rte - 749 error(s), 7 warning(s)
5>orte-checkpoint - 32 error(s), 0 warning(s)
7>orte-ps - 28 error(s), 0 warning(s)
8>orted - 2 error(s), 0 warning(s)
9>orte-clean - 13 error(s), 0 warning(s)
10>orterun - 100 error(s), 3 warning(s)
6>libmpi - 2133 error(s), 42 warning(s)
12>ompi-server - 27 error(s), 0 warning(s)
11>ompi_info - 146 error(s), 0 warning(s)
13>libmpi_cxx - 456 error(s), 61 warning(s)
== Rebuild All: 2 succeeded, 12 failed, 0 skipped ==

It said that 2 succeeded; I could not find the second build success in the listing.

However, everything did compile, and thank you Shiqing!

Here is the listing for the first failed link, on "opal-restart":

3>-- Rebuild All started: Project: opal-restart, Configuration: Debug Win32 --
3>Deleting intermediate and output files for project 'opal-restart', configuration 'Debug|Win32'
3>Compiling...
3>opal-restart.c
2>Compiling...
2>snapc_base_select.c
3>Compiling manifest to resources...
3>Linking...
2>snapc_base_open.c
3>opal-restart.obj : error LNK2001: unresolved external symbol __imp__opal_crs
3>opal-restart.obj : error LNK2001: unresolved external symbol __imp__opal_crs_base_snapshot_t_class
3>opal-restart.obj : error LNK2001: unresolved external symbol __imp__opal_crs_base_selected_component
3>opal-restart.obj : error LNK2019: unresolved external symbol __imp__opal_crs_base_select referenced in function _main
3>opal-restart.obj : error LNK2019: unresolved external symbol __imp__opal_crs_base_open referenced in function _main
3>opal-restart.obj : error LNK2019: unresolved external symbol __imp__opal_output_verbose referenced in function _main
3>opal-restart.obj : error LNK2019: unresolved external symbol __imp__opal_crs_base_extract_expected_component referenced in function _main
3>opal-restart.obj : error LNK2019: unresolved external symbol __imp__opal_crs_base_get_snapshot_directory referenced in function _main
3>opal-restart.obj : error LNK2019: unresolved external symbol __imp__opal_setenv referenced in function _main
3>opal-restart.obj : error LNK2019: unresolved external symbol __imp__mca_base_param_env_var referenced in function _main
3>opal-restart.obj : error LNK2019: unresolved external symbol __imp__opal_show_help referenced in function _main
3>opal-restart.obj : error LNK2019: unresolved external symbol __imp__opal_class_initialize referenced in function "struct opal_object_t * __cdecl opal_obj_new(struct opal_class_t *)" (?opal_obj_new@@YAPAUopal_object_t@@PAUopal_class_t@@@Z)
3>opal-restart.obj : error LNK2001: unresolved external symbol __imp__opal_cr_is_tool
3>opal-restart.obj : error LNK2019: unresolved external symbol __imp__opal_init referenced in function "int __cdecl initialize(int,char * * const)" (?initialize@@YAHHQAPAD@Z)
3>opal-restart.obj : error LNK2019: unresolved external symbol __imp__opal_output_set_verbosity referenced in function "int __cdecl initialize(int,char * * const)" (?initialize@@YAHHQAPAD@Z)
3>opal-restart.obj : error LNK2019: unresolved external symbol __imp__opal_output_open referenced in function "int __cdecl initialize(int,char * * const)" (?initialize@@YAHHQAPAD@Z)
3>opal-restart.obj : error LNK2019: unresolved external symbol __imp__opal_init_util referenced in function "int __cdecl initialize(int,char * * const)" (?initialize@@YAHHQAPAD@Z)
3>opal-restart.obj : error LNK2019: unresolved external symbol __imp__opal_finalize referenced in function "int __cdecl finalize(void)" (?finalize@@YAHXZ)
3>opal-restart.obj : error LNK2019: unresolved external symbol __imp__opal_argv_join referenced in function "int __cdecl parse_args(int,char * * const)" (?parse_args@@YAHHQAPAD@Z)
3>opal-restart.obj : error LNK2019: unresolved external symbol __imp__opal_cmd_line_get_tail referenced in function "int __cdecl parse_args(int,char * * const)" (?parse_arg

[OMPI users] OpenMPI checkpoint/restart

2010-01-14 Thread Andreea Costea
Hei there

I have some questions regarding checkpoint/restart:

1. Until recently I thought that ompi-checkpoint and ompi-restart are used to
checkpoint a process inside an MPI application. Now I reread this and I
realized that actually what it does is to checkpoint the mpirun
process. Does this mean that if I run my application with multiple processes
and on multiple nodes in my network the checkpoint file will contain the
states of all the processes of my MPI application?

2. Can I restart the application on a different node?

Thanks a lot,
Andreea


Re: [OMPI users] OpenMPI less fast than MPICH

2010-01-14 Thread Mathieu Gontier





Thank you very much for responding.
stdout/stderr (or the Fortran equivalents) are indeed used to follow
the progression, but during my bench they are redirected into a file (2>&1
| tee log). But I do not understand how this can influence OpenMPI?
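For reference, the redirection in question (note it is "2>&1", not "2>$1") can be sketched with a self-contained stand-in for the solver; run_app below is a placeholder, not part of the benchmark:

```shell
# Stand-in for an application that writes progress to stdout and stderr:
run_app() { echo "iteration 1 done"; echo "residual warning" >&2; }

# "2>&1" duplicates stderr onto stdout *before* the pipe, so tee logs both
# streams while still echoing them to the terminal:
run_app 2>&1 | tee bench.log

# Count the lines that landed in the log (both streams were captured):
grep -c "" bench.log
```

Frequent small writes through a pipe like this add buffering and synchronization overhead per write, which is why heavy stdout/stderr traffic can affect an MPI run's wall time.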

Aaron Knister wrote:

  Does your application do a lot of printing to stdout/stderr?
  
  
  On Jan 11, 2010, at 8:00 AM, Mathieu Gontier wrote:
  
  
Hi all

I want to migrate my CFD application from MPICH-1.2.4 (ch_p4 device) to
OpenMPI-1.4. Hence, I compared the two libraries compiled with my
application and I noted OpenMPI is less efficient than MPICH on
Ethernet (170min with MPICH against 200min with OpenMPI). So, I wonder
if someone has more information/explanation.

Here are the configure options of OpenMPI:

export FC=gfortran
export F77=$FC
export CC=gcc
export PREFIX=/usr/local/bin/openmpi-1.4
./configure --prefix=$PREFIX --enable-cxx-exceptions --enable-mpi-f77
--enable-mpi-f90 --enable-mpi-cxx --enable-mpi-cxx-seek --enable-dist
--enable-mpi-profile --enable-binaries --enable-cxx-exceptions
--enable-mpi-threads --enable-memchecker --with-pic --with-threads
--with-valgrind --with-libnuma --with-openib

Although my OpenMPI build supports OpenIB, I did not specify any
mca/btl options because the machine does not have access to an
InfiniBand interconnect. So, I guess tcp, sm and self are used (or at
least something close).
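That guess can be checked rather than assumed. A hedged sketch (my_cfd_app is a placeholder for the solver binary; the MCA parameter shown is a tuning example, not a required setting):

```shell
# List the BTL components this Open MPI build actually provides:
ompi_info | grep "MCA btl"

# Pin the transport set explicitly so the benchmark is unambiguous:
mpirun --mca btl tcp,sm,self -np 8 ./my_cfd_app

# TCP tuning knobs can then be compared against MPICH, e.g. socket buffers:
# mpirun --mca btl tcp,sm,self --mca btl_tcp_sndbuf 524288 -np 8 ./my_cfd_app
```

Pinning the BTLs rules out the possibility that a slower or partially configured component is being selected during the comparison.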

Thank you for your help.
Mathieu.

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
  
  
  

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users






Re: [OMPI users] Windows CMake build problems ... (cont.)

2010-01-14 Thread Shiqing Fan


Hi Charlie,

It seems that the dependencies among the projects are corrupted. I don't 
know how this could happen; the CMake build system should set them 
automatically.


Take opal-restart as an example: if you right-click on that project 
and select "Project Dependencies", you will see a list of the projects 
it depends on; in this case opal-restart depends on the libopen-pal project. 
When I deselect that dependency for opal-restart, I get 
exactly the same linking errors that you got.
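For reference, the dependency being described is the kind a CMakeLists.txt normally records. This is a minimal illustrative sketch, not Open MPI's actual CMake code (target and file names are made up):

```cmake
# Library target that holds the opal_* symbols:
add_library(libopen-pal SHARED opal_init.c opal_output.c)

# Executable that uses those symbols:
add_executable(opal-restart opal-restart.c)

# Linking the library target also records a project dependency in the
# generated .sln, so Visual Studio builds libopen-pal first and puts its
# import library on opal-restart's link line. If this edge is lost, the
# unresolved __imp__* symbols seen above are exactly what results.
target_link_libraries(opal-restart libopen-pal)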


I've updated my CMake to 2.8, and I didn't see the problem either. So 
you should probably do a fresh build and check whether the dependencies are 
correct. If they are not, something else might be wrong, and I'd be glad to help 
you. Thanks.



Best Regards,
Shiqing


cjohn...@valverdecomputing.com wrote:

The OpenMPI build problem I'm having occurs in both OpenMPI 1.4 and 1.3.4.

I am on a Windows 7 (US) Enterprise (x86) OS on an HP system with 
Intel core 2 extreme x9000 (4GB RAM), using the 2005 Visual Studio for 
S/W Architects (release 8.0.50727.867).


[That release has everything the platform SDK would have.]

I'm using CMake 2.8 to generate code, I used it correctly, pointing at 
the root directory where the makelists are located for the source side 
and to an empty directory for the build side: did configure, _*I did 
not click debug this time as suggested by Shiqing*_, configure again, 
generate and opened the OpenMPI.sln file created by CMake. Then I 
right-clicked on the "ALL_BUILD" project and selected "build". Then 
did one "rebuild", just in case build order might get one more success 
(which it seemed to, but I could not find).


2 projects built, 12 did not. I have the build listing. [I'm afraid of 
what the mailing list server would do if I attached it to this email.]


All the compiles were successful (warnings at most). All the errors 
were from linking the VC projects:


*1>libopen-pal - 0 error(s), 9 warning(s)*
3>opal-restart - 32 error(s), 0 warning(s)
4>opal-wrapper - 21 error(s), 0 warning(s)
2>libopen-rte - 749 error(s), 7 warning(s)
5>orte-checkpoint - 32 error(s), 0 warning(s)
7>orte-ps - 28 error(s), 0 warning(s)
8>orted - 2 error(s), 0 warning(s)
9>orte-clean - 13 error(s), 0 warning(s)
10>orterun - 100 error(s), 3 warning(s)
6>libmpi - 2133 error(s), 42 warning(s)
12>ompi-server - 27 error(s), 0 warning(s)
11>ompi_info - 146 error(s), 0 warning(s)
13>libmpi_cxx - 456 error(s), 61 warning(s)
== Rebuild All: 2 succeeded, 12 failed, 0 skipped ==

It said that 2 succeeded, I could not find the second build success in 
the listing.


*However, everything did compile, and thank you Shiqing !*

Here is the listing for the first failed link, on "opal-restart":

3>-- Rebuild All started: Project: opal-restart, Configuration: 
Debug Win32 --
3>Deleting intermediate and output files for project 'opal-restart', 
configuration 'Debug|Win32'

3>Compiling...
3>opal-restart.c
2>Compiling...
2>snapc_base_select.c
3>Compiling manifest to resources...
3>Linking...
2>snapc_base_open.c
3>opal-restart.obj : error LNK2001: unresolved external symbol 
__imp__opal_crs
3>opal-restart.obj : error LNK2001: unresolved external symbol 
__imp__opal_crs_base_snapshot_t_class
3>opal-restart.obj : error LNK2001: unresolved external symbol 
__imp__opal_crs_base_selected_component
3>opal-restart.obj : error LNK2019: unresolved external symbol 
__imp__opal_crs_base_select referenced in function _main
3>opal-restart.obj : error LNK2019: unresolved external symbol 
__imp__opal_crs_base_open referenced in function _main
3>opal-restart.obj : error LNK2019: unresolved external symbol 
__imp__opal_output_verbose referenced in function _main
3>opal-restart.obj : error LNK2019: unresolved external symbol 
__imp__opal_crs_base_extract_expected_component referenced in function 
_main
3>opal-restart.obj : error LNK2019: unresolved external symbol 
__imp__opal_crs_base_get_snapshot_directory referenced in function _main
3>opal-restart.obj : error LNK2019: unresolved external symbol 
__imp__opal_setenv referenced in function _main
3>opal-restart.obj : error LNK2019: unresolved external symbol 
__imp__mca_base_param_env_var referenced in function _main
3>opal-restart.obj : error LNK2019: unresolved external symbol 
__imp__opal_show_help referenced in function _main
3>opal-restart.obj : error LNK2019: unresolved external symbol 
__imp__opal_class_initialize referenced in function "struct 
opal_object_t * __cdecl opal_obj_new(struct opal_class_t *)" 
(?opal_obj_new@@YAPAUopal_object_t@@PAUopal_class_t@@@Z)
3>opal-restart.obj : error LNK2001: unresolved external symbol 
__imp__opal_cr_is_tool
3>opal-restart.obj : error LNK2019: unresolved external symbol 
__imp__opal_init referenced in function "int __cdecl 
initialize(int,char * * const)" (?initialize@@YAHHQAPAD@Z)
3>opal-restart.obj : error LNK2019: unresolved external symbol 
__imp__opal_output_set_verbosity referenced in function "int __cdecl 
initialize(int,char

Re: [OMPI users] Windows CMake build problems ... (cont.)

2010-01-14 Thread Shiqing Fan

Hi Charlie,

Could you also try to use CMake 2.6.4 and see what happens? I found 
there might be some incompatibility between the CMake 2.8 and 2.6 series.



Thanks,
Shiqing


cjohn...@valverdecomputing.com wrote:

[Quoted message trimmed; it is identical to the one quoted in full in the previous reply.]

[OMPI users] Checkpoint/Restart error

2010-01-14 Thread Andreea Costea
Hi,

I wanted to try the C/R feature in OpenMPI version 1.4.1 that I have
downloaded today. When I want to checkpoint I am having the following error
message:
[[65192,0],0] ORTE_ERROR_LOG: Not found in file orte-checkpoint.c at line
399
HNP with PID 2337 Not found!

I tried the same thing with version 1.3.3 and it works perfectly.

Any idea why?

thanks,
Andreea


Re: [OMPI users] OpenMPI checkpoint/restart

2010-01-14 Thread Joshua Hursey

On Jan 14, 2010, at 2:50 AM, Andreea Costea wrote:

> Hei there
> 
> I have some questions regarding checkpoint/restart:
> 
> 1. Until recently I thought that ompi-checkpoint and ompi-restart are used to 
> checkpoint a process inside an MPI application. Now I reread this and I 
> realized that actually what it does is to checkpoint the mpirun process. Does 
> this mean that if I run my application with multiple processes and on 
> multiple nodes in my network the checkpoint file will contain the states of 
> all the processes of my MPI application?

I think you slightly misread the entry. ompi-checkpoint checkpoints the entire 
MPI application, across node boundaries. It requires that the user pass the PID 
of mpirun to serve as a reference point for the command. This way a user can 
run multiple mpiruns from the same machine and checkpoint only a subset of 
those.
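For concreteness, the flow described above can be sketched as follows (assuming an Open MPI 1.3/1.4-series build with checkpoint/restart support enabled; my_app is a placeholder):

```shell
# Launch the job with checkpoint/restart support and note mpirun's PID:
mpirun -np 8 -am ft-enable-cr ./my_app &
MPIRUN_PID=$!

# Checkpoint *all* ranks, on all nodes, by naming that single mpirun process:
ompi-checkpoint $MPIRUN_PID

# Later, restart the entire job from the global snapshot it wrote:
ompi-restart ompi_global_snapshot_${MPIRUN_PID}.ckpt
```

Because the PID identifies one particular mpirun, several jobs started from the same node can be checkpointed independently.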

> 2. Can I restart the application on a different node? 

Yes. If you have trouble doing this, then I would suggest following the 
directions in the BLCR FAQ entry below (it usually addresses 99% of the 
problems people have doing this):
  https://upc-bugs.lbl.gov//blcr/doc/html/FAQ.html#prelink

-- Josh

> 
> Thanks a lot,
> Andreea
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users




Re: [OMPI users] Checkpoint/Restart error

2010-01-14 Thread Joshua Hursey
On Jan 14, 2010, at 8:20 AM, Andreea Costea wrote:

> Hi,
> 
> I wanted to try the C/R feature in OpenMPI version 1.4.1 that I have 
> downloaded today. When I want to checkpoint I am having the following error 
> message:
> [[65192,0],0] ORTE_ERROR_LOG: Not found in file orte-checkpoint.c at line 399
> HNP with PID 2337 Not found! 

This looks like an error coming from the 1.3.3 install. In 1.4.1 there is no 
error at line 399; in 1.3.3 there is. Check your installation of Open MPI: I 
bet you are mixing 1.4.1 and 1.3.3, which can cause unexpected problems.

Try a clean installation of 1.4.1 and double check that 1.3.3 is not in your 
path/lib_path any longer.
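A quick, hedged way to look for exactly this kind of mixing (the grep pattern is illustrative; adjust to your install prefixes):

```shell
# Every mpirun visible on the PATH; more than one hit suggests a mixed install:
type -a mpirun

# The version the shell actually resolves to:
mpirun --version 2>&1 | head -n 1

# Which runtime libraries the checkpoint tool will load at run time:
ldd "$(command -v ompi-checkpoint)" | grep -i -E "open-pal|open-rte|mpi"
```

If the libraries resolve into the old 1.3.3 prefix, removing that prefix from PATH and LD_LIBRARY_PATH (or reinstalling cleanly) is the fix.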

-- Josh

> 
> I tried the same thing with version 1.3.3 and it works perfectly.
> 
> Any idea why?
> 
> thanks,
> Andreea
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users




[OMPI users] configure script fails - fixed?

2010-01-14 Thread von Tycowicz, Christoph

Hi,

I just compiled the most recent version of the GNU compilers (4.4.2).
It turned out that running the configure script succeeds when I provide:
./configure --prefix=/opt/openmpi F77=/opt/gcc/lib/gcc4.4/bin/gfortran 
FC=/opt/gcc/lib/gcc4.4/bin/gfortran

Note that I did not change the C/C++ compilers - as soon as I changed them to 
the new /opt/gcc/lib/gcc4.4/bin/gcc, the script would declare them broken, 
just as it did with gfortran.
(This is also true if I provide the shipped 4.2.1 version of gcc.)

So far everything seems to work, but I think this solution is rather kludgy.

Hope this helps fixing this problem.

Cheers
Christoph

Begin forwarded message:

From: "von Tycowicz, Christoph" <christoph.vontycow...@fu-berlin.de>
List-Post: users@lists.open-mpi.org
Date: 14 January 2010, 01:09:10 CET
To: "us...@open-mpi.org" <us...@open-mpi.org>
Subject: configure script fails

Hi,

when running the configure script it breaks with:
configure: error: Could not run a simple Fortran 77 program.  Aborting.
(logs with details attached)

I don't know how to interpret this error, since I have already successfully compiled 
Fortran code using these compilers (gcc/gfortran 4.5).
I would be really grateful for any clues on this.

best regards
Christoph



logs.tar.gz
Description: logs.tar.gz


Re: [OMPI users] configure script fails - fixed?

2010-01-14 Thread Jeff Squyres
On Jan 14, 2010, at 11:22 AM, von Tycowicz, Christoph wrote:

> I just compiled the most recent version of the GNU compilers (4.4.2).
> It turned out that running the configure script succeeds providing:
> ./configure --prefix=/opt/openmpi F77=/opt/gcc/lib/gcc4.4/bin/gfortran 
> FC=/opt/gcc/lib/gcc4.4/bin/gfortran
> 
> Note that I did not changed the c/c++ compilers - as soon as I changed them 
> to the new /opt/gcc/lib/gcc4.4/bin/gcc the script would declare them as 
> broked, just as it did with gfortran.
> (This is also true if I provide the shipped 4.2.1 version of gcc.)
> 
> So far everything seams to work but i think this solution is rather kludgy.

I'm not sure what to tell you -- OMPI's configure script was simply trying to 
compile a trivial Fortran program just to verify that the compiler works (we 
had so many users with borked Fortran compilers that we put this test in 
configure).  Here's the detailed part from config.log:

-
configure:35493: checking if Fortran 77 compiler works
configure:35553: gfortran -o conftest   conftest.f  >&5
Undefined symbols:
  "__gfortran_set_options", referenced from:
  _main in cccpfkqB.o
ld: symbol(s) not found
collect2: ld returned 1 exit status
configure:35557: $? = 1
configure: program exited with status 1
configure: failed program was:
|   program main
| 
|   end
configure:35590: result: no
configure:35604: error: Could not run a simple Fortran 77 program.  Aborting.
-

That is, the contents of conftest.f were

   program main

   end

And trying to compile that with your previous gfortran didn't work (see the 
linker error above).  I'm not sure why your other programs compiled fine, but 
something caused OMPI's configure script to fail compiling/linking this trivial 
Fortran program.  That was the cause of the problem.

-- 
Jeff Squyres
jsquy...@cisco.com




[OMPI users] More NetBSD fixes

2010-01-14 Thread Aleksej Saushev
  Hello!

Flawed logic: Linux isn't the only system with procfs.
NetBSD has procfs too, and may have /proc/cpuinfo as well,
but it isn't Linux.
I didn't check whether FreeBSD has "cpuinfo" as well;
FreeBSD 6.3 doesn't, but it's being desupported soon.

Difference against openmpi-1.5a1r22193 snapshot.

--- opal/mca/pstat/linux/configure.m4.orig  2009-11-04 17:57:36.0 
+0300
+++ opal/mca/pstat/linux/configure.m4   2010-01-14 02:16:08.0 +0300
@@ -23,7 +23,7 @@
 AC_DEFUN([MCA_pstat_linux_CONFIG],[

case "${host}" in
-   i?86-*|x86_64*|ia64-*|powerpc-*|powerpc64-*|sparc*-*)
+   i?86-*linux*|x86_64*linux*|ia64-*linux*|powerpc-*linux*|powerpc64-*linux*|sparc*-*linux*)
   AS_IF([test -r "/proc/cpuinfo"],
  [pstat_linux_happy="yes"],
  [pstat_linux_happy="no"])


-- 
HE CE3OH...


Re: [OMPI users] Problems Using PVFS2 with OpenMPI

2010-01-14 Thread Evan Smyth
I had been using an older variant of the needed flag for building ROMIO 
(because the newer one was failing as the preceding suggests). I made this 
change and built with the correct ROMIO flag. I next needed to fix the way pvfs2 
builds so that it uses -fPIC. Interestingly, about 95% of pvfs2 builds with this 
flag by default, but the final 5% does not. It needs to. With that fixed, built 
and installed, I was able to rebuild openmpi correctly. My test program now 
works like a charm. I will give the *precise* steps I needed to build pvfs2 
2.8.1 with openmpi 1.4 here for the record...


1. Determine where openmpi will be installed. I'm not certain that it needs to 
actually be installed there for this to work. If so, you will need to install 
openmpi twice. The first time, it clearly need not be built entirely correctly 
for pvfs2 (it can't be, because step 2 is a prerequisite for that), but probably 
building something without the "--with-io-romio-flags=..." should do, if this 
actually must be installed at all. I'm betting it is not required, but as I say, 
I have not verified this. It certainly works if it has been pre-installed as I 
just indicated.


2. Build pvfs2 correctly (I get conflicting info on whether the 
"--with-mpi=..." is needed, but FWIW, this is how I built it, and it installs 
into /usr/local, which is its default location)...


cd 
setenv CFLAGS -fPIC
./configure --with-mpi=/work/rd/evan/archives/openmpi/openmpi/1.4/enable_pvfs \ 
--enable-verbose-build

make all

make install
exit

3. Build openmpi correctly. This is straightforward at this point. Also, 
--enable-mpi-threads is not required for pvfs2 to work, but I happen to also 
want this flag:


cd 

./configure --prefix=/work/rd/evan/archives/openmpi/openmpi/1.4/enable_pvfs \ 
--enable-mpi-threads --with-io-romio-flags="--with-file-system=pvfs2+ufs+nfs"

make all

make install
exit

... and that's it. Hopefully, the next person who needs to figure this out will 
be helped by these instructions.


Evan

This seems to have done the trick.

Edgar Gabriel wrote:
I don't know whether its relevant for this problem or not, but a couple 
of weeks ago we also found that we had to apply the following patch to 
to compile ROMIO with OpenMPI over pvfs2. There is an additional header 
pvfs2-compat.h included in the ROMIO version of MPICH, but is somehow 
missing in the OpenMPI version


ompi/mca/io/romio/romio/adio/ad_pvfs2/ad_pvfs2.h
--- a/ompi/mca/io/romio/romio/adio/ad_pvfs2/ad_pvfs2.h  Thu Sep 03
11:55:51 2009 -0500
+++ b/ompi/mca/io/romio/romio/adio/ad_pvfs2/ad_pvfs2.h  Mon Sep 21
10:16:27 2009 -0500
@@ -11,6 +11,10 @@
  #include "adio.h"
  #ifdef HAVE_PVFS2_H
  #include "pvfs2.h"
+#endif
+
+#ifdef PVFS2_VERSION_MAJOR
+#include "pvfs2-compat.h"
  #endif


Thanks
Edgar


Rob Latham wrote:

On Tue, Jan 12, 2010 at 02:15:54PM -0800, Evan Smyth wrote:

OpenMPI 1.4 (had same issue with 1.3.3) is configured with
./configure --prefix=/work/rd/evan/archives/openmpi/openmpi/1.4/enable_pvfs \
--enable-mpi-threads --with-io-romio-flags="--with-filesystems=pvfs2+ufs+nfs"
PVFS 2.8.1 is configured to install in the default location (/usr/local) with
./configure --with-mpi=/work/rd/evan/archives/openmpi/openmpi/1.4/enable_pvfs

In addition to Jeff's request for the build logs, do you have
'pvfs2-config' in your path?   
 

I build and install these (in this order) and setup my PVFS2 space using
instructions at pvfs.org. I am able to use this space using the
/usr/local/bin/pvfs2-ls types of commands. I am simply running a 2-server
config (2 data servers and the same 2 hosts are metadata servers). As I say,
manually, this all seems fine (even when I'm not root). It may be
relevant that I am *not* using the kernel interface for PVFS2 as I
am just trying to get a
better understanding of how this works.

That's a good piece of information.  I run in that configuration
often, so we should be able to make this work.


It is perhaps relevant that I have not had to explicitly tell
OpenMPI where I installed PVFS. I have told PVFS where I installed
OpenMPI, though. This does seem slightly odd but there does not
appear to be a way of telling OpenMPI this information. Perhaps it
is not needed.

PVFS needs an MPI library only to build MPI-based testcases.  The
servers, client libraries, and utilities do not use MPI.


In any event, I then build my test program against this OpenMPI and
in that program I have the following call sequence (i is 0 and where
mntPoint is the path to my pvfs2 mount point -- I also tried
prefixing a "pvfs2:" in the front of this as I read somewhere that
that was optional).

In this case, since you do not have the PVFS file system mounted, the
'pvfs2:' prefix is mandatory.  Otherwise, the MPI-IO library will try
to look for a directory that does not exist.


Which will only execute on one of my ranks (the way I'm running it).
No matter what I try, the MPI_File_open call fails with an
MPI_ERR_ACCESS error code.  This suggests a permission problem

Re: [OMPI users] Windows CMake build problems ... (cont.)

2010-01-14 Thread Shiqing Fan


Hi Charlie,

The problem turns out to be the different behavior of one CMake macro in 
different versions of CMake. It's fixed in the Open MPI trunk with 
r22405. I also created a ticket to move the fix over to the 1.4 branch, see 
#2169: https://svn.open-mpi.org/trac/ompi/ticket/2169 .


So you could either switch to the OMPI trunk or use CMake 2.6 to solve 
the problem. Thanks a lot.



Best Regards,
Shiqing


cjohn...@valverdecomputing.com wrote:

The OpenMPI build problem I'm having occurs in both OpenMPI 1.4 and 1.3.4.

I am on a Windows 7 (US) Enterprise (x86) OS on an HP system with 
Intel core 2 extreme x9000 (4GB RAM), using the 2005 Visual Studio for 
S/W Architects (release 8.0.50727.867).


[That release has everything the platform SDK would have.]

I'm using CMake 2.8 to generate code, I used it correctly, pointing at 
the root directory where the makelists are located for the source side 
and to an empty directory for the build side: did configure, _*I did 
not click debug this time as suggested by Shiqing*_, configure again, 
generate and opened the OpenMPI.sln file created by CMake. Then I 
right-clicked on the "ALL_BUILD" project and selected "build". Then 
did one "rebuild", just in case build order might get one more success 
(which it seemed to, but I could not find).


2 projects built, 12 did not. I have the build listing. [I'm afraid of 
what the mailing list server would do if I attached it to this email.]


All the compiles were successful (warnings at most.) All the errors 
were were from linking the VC projects:


*1>libopen-pal - 0 error(s), 9 warning(s)*
3>opal-restart - 32 error(s), 0 warning(s)
4>opal-wrapper - 21 error(s), 0 warning(s)
2>libopen-rte - 749 error(s), 7 warning(s)
5>orte-checkpoint - 32 error(s), 0 warning(s)
7>orte-ps - 28 error(s), 0 warning(s)
8>orted - 2 error(s), 0 warning(s)
9>orte-clean - 13 error(s), 0 warning(s)
10>orterun - 100 error(s), 3 warning(s)
6>libmpi - 2133 error(s), 42 warning(s)
12>ompi-server - 27 error(s), 0 warning(s)
11>ompi_info - 146 error(s), 0 warning(s)
13>libmpi_cxx - 456 error(s), 61 warning(s)
== Rebuild All: 2 succeeded, 12 failed, 0 skipped ==

It said that 2 succeeded, I could not find the second build success in 
the listing.


*However, everything did compile, and thank you Shiqing !*

Here is the listing for the first failed link, on "opal-restart":

3>-- Rebuild All started: Project: opal-restart, Configuration: 
Debug Win32 --
3>Deleting intermediate and output files for project 'opal-restart', 
configuration 'Debug|Win32'

3>Compiling...
3>opal-restart.c
2>Compiling...
2>snapc_base_select.c
3>Compiling manifest to resources...
3>Linking...
2>snapc_base_open.c
3>opal-restart.obj : error LNK2001: unresolved external symbol 
__imp__opal_crs
3>opal-restart.obj : error LNK2001: unresolved external symbol 
__imp__opal_crs_base_snapshot_t_class
3>opal-restart.obj : error LNK2001: unresolved external symbol 
__imp__opal_crs_base_selected_component
3>opal-restart.obj : error LNK2019: unresolved external symbol 
__imp__opal_crs_base_select referenced in function _main
3>opal-restart.obj : error LNK2019: unresolved external symbol 
__imp__opal_crs_base_open referenced in function _main
3>opal-restart.obj : error LNK2019: unresolved external symbol 
__imp__opal_output_verbose referenced in function _main
3>opal-restart.obj : error LNK2019: unresolved external symbol 
__imp__opal_crs_base_extract_expected_component referenced in function 
_main
3>opal-restart.obj : error LNK2019: unresolved external symbol 
__imp__opal_crs_base_get_snapshot_directory referenced in function _main
3>opal-restart.obj : error LNK2019: unresolved external symbol 
__imp__opal_setenv referenced in function _main
3>opal-restart.obj : error LNK2019: unresolved external symbol 
__imp__mca_base_param_env_var referenced in function _main
3>opal-restart.obj : error LNK2019: unresolved external symbol 
__imp__opal_show_help referenced in function _main
3>opal-restart.obj : error LNK2019: unresolved external symbol 
__imp__opal_class_initialize referenced in function "struct 
opal_object_t * __cdecl opal_obj_new(struct opal_class_t *)" 
(?opal_obj_new@@YAPAUopal_object_t@@PAUopal_class_t@@@Z)
3>opal-restart.obj : error LNK2001: unresolved external symbol 
__imp__opal_cr_is_tool
3>opal-restart.obj : error LNK2019: unresolved external symbol 
__imp__opal_init referenced in function "int __cdecl 
initialize(int,char * * const)" (?initialize@@YAHHQAPAD@Z)
3>opal-restart.obj : error LNK2019: unresolved external symbol 
__imp__opal_output_set_verbosity referenced in function "int __cdecl 
initialize(int,char * * const)" (?initialize@@YAHHQAPAD@Z)
3>opal-restart.obj : error LNK2019: unresolved external symbol 
__imp__opal_output_open referenced in function "int __cdecl 
initialize(int,char * * const)" (?initialize@@YAHHQAPAD@Z)
3>opal-restart.obj : error LNK2019: unresolved external symbol 
__imp__opal_init_util referenced in function "int __cdecl 
i
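All of these failures share the `__imp__` prefix, which typically means the objects were compiled expecting those symbols to be imported from a DLL (declared `__declspec(dllimport)`) while the linker could not resolve them against an import library — that interpretation is mine, not something stated in the log. With thousands of errors, a small filter that reduces the log to the distinct missing symbols makes triage easier; the `sed` pattern below is an assumption based on the VS 2005 message format shown above:

```shell
# Collapse a Visual Studio link log to its distinct unresolved symbols.
unresolved_symbols() {
  sed -n 's/.*unresolved external symbol \(__imp__[A-Za-z0-9_]*\).*/\1/p' | sort -u
}

# A small sample in the shape of the log quoted above:
sample_log='3>opal-restart.obj : error LNK2001: unresolved external symbol __imp__opal_crs
3>opal-restart.obj : error LNK2019: unresolved external symbol __imp__opal_init referenced in function _main
3>opal-restart.obj : error LNK2019: unresolved external symbol __imp__opal_init referenced in function _initialize'

# Prints the two distinct symbols: __imp__opal_crs and __imp__opal_init
printf '%s\n' "$sample_log" | unresolved_symbols
```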

[OMPI users] Rapid I/O support

2010-01-14 Thread TONY BASIL
Hi,
I am doing a project with an HPC setup on a multicore PowerPC. Nodes will be
connected using Rapid I/O instead of Gigabit Ethernet. I would like to know
whether Open MPI supports Rapid I/O; if not, is there any alternative other
than Gigabit Ethernet? The network traffic will be huge, as the data comes
from a sensor, so a faster alternative is preferred.
Thank you
Tony Basil


[OMPI users] Setting MCA parameter from within program

2010-01-14 Thread Roland Schulz
Hi all,

is it possible to set MCA parameters from within the MPI program itself? The
FAQ only mentions how to set them through files or environment variables.

I would like to set coll_tuned_use_dynamic_rules and
coll_tuned_alltoall_algorithm.

I assume there is a function to do this in include/opal/mca, but I'm not
sure which one, and I couldn't find any documentation explaining it.
I'm aware that this will only work with Open MPI and is probably not part of
the public API, so the interface might change between versions.

My plan is to benchmark all alltoall algorithms at startup and then use
the best one for all later Alltoall calls. I have found that manually
choosing the algorithm can make a large difference. Also, all my
alltoall communication is of the same data size, so tuning is easy.

Thanks

> Roland
>

-- 
ORNL/UT Center for Molecular Biophysics cmb.ornl.gov
865-241-1537, ORNL PO BOX 2008 MS6309


Re: [OMPI users] Rapid I/O support

2010-01-14 Thread Jeff Squyres
On Jan 14, 2010, at 1:59 PM, TONY BASIL wrote:

> I am doing a project with an HPC set up on multicore Power PC..Nodes will be 
> connected
> using Rapid I/O instead for Gigabit Ethernet...I would like to know if 
> OpenMPI supports
> Rapid I/O...

I'm afraid not.  Before your post, I had never heard of Rapid IO.

That being said, Open MPI is based on plugins, so someone could write a plugin 
for Rapid IO support, if they were so inclined.

> If not is there any alternative other than Gigabit ethernet...The network 
> traffic will be
> huge as data comes from a sensor...so a faster alternative is preferred...

Open MPI supports a wide variety of networks:

- OpenFabrics: InfiniBand and iWARP
- Myrinet: GM and MX
- Qlogic PSM
- Portals
- Quadrics Elan
- Shared memory
- TCP
- SCTP
- uDAPL
- Loopback (send-to-self)

-- 
Jeff Squyres
jsquy...@cisco.com




Re: [OMPI users] More NetBSD fixes

2010-01-14 Thread Jeff Squyres
Thanks!  I have this queued up to commit in a few hours (we try not to commit 
autogen/configure-worthy changes during the US workday).


On Jan 14, 2010, at 11:54 AM, Aleksej Saushev wrote:

> 
>   Hello!
> 
> Flawed logic: Linux isn't the only system with procfs.
> NetBSD has procfs too and may have /proc/cpuinfo as well,
> but it isn't Linux.
> I didn't check if FreeBSD has "cpuinfo" as well,
> FreeBSD 6.3 doesn't but it's being desupported soon.
> 
> Difference against openmpi-1.5a1r22193 snapshot.
> 
> --- opal/mca/pstat/linux/configure.m4.orig  2009-11-04 17:57:36.0 
> +0300
> +++ opal/mca/pstat/linux/configure.m4   2010-01-14 02:16:08.0 +0300
> @@ -23,7 +23,7 @@
>  AC_DEFUN([MCA_pstat_linux_CONFIG],[
> 
> case "${host}" in
> -   i?86-*|x86_64*|ia64-*|powerpc-*|powerpc64-*|sparc*-*)
> +   
> i?86-*linux*|x86_64*linux*|ia64-*linux*|powerpc-*linux*|powerpc64-*linux*|sparc*-*linux*)
>AS_IF([test -r "/proc/cpuinfo"],
>   [pstat_linux_happy="yes"],
>   [pstat_linux_happy="no"])
> 
> 
> --
> HE CE3OH...
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 


-- 
Jeff Squyres
jsquy...@cisco.com
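The flaw Aleksej points out is easy to reproduce with plain shell `case` patterns, since that is all configure's `case "${host}"` does. The host triplets below are illustrative:

```shell
# Demonstrates the logic error the patch fixes: the unsuffixed patterns
# match any OS with a matching CPU, not just Linux.
matches_old() {
  case "$1" in
    i?86-*|x86_64*|ia64-*|powerpc-*|powerpc64-*|sparc*-*) echo yes ;;
    *) echo no ;;
  esac
}
matches_new() {
  case "$1" in
    i?86-*linux*|x86_64*linux*|ia64-*linux*|powerpc-*linux*|powerpc64-*linux*|sparc*-*linux*) echo yes ;;
    *) echo no ;;
  esac
}

matches_old i386-unknown-netbsd5.0    # yes -- wrongly enables the Linux pstat component
matches_new i386-unknown-netbsd5.0    # no  -- patched pattern requires "linux"
matches_new x86_64-unknown-linux-gnu  # yes -- Linux hosts still match
```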




Re: [OMPI users] Setting MCA parameter from within program

2010-01-14 Thread Jeff Squyres
On Jan 14, 2010, at 3:08 PM, Roland Schulz wrote:

> is it possible to set MCA parameters from within the MPI program itself? The 
> FAQ only mentions how to set them through files or environment variables.

Not really (other than setenv()).  Most MCA parameters are read during MPI_INIT 
and not examined again afterwards.

> I would like to set coll_tuned_use_dynamic_rules and 
> coll_tuned_alltoall_algorithm.

I'm not sure offhand if the coll tuned module re-examines these values after 
MPI_INIT?  It *might* look at them as each communicator is created, but I don't 
know.

George?  (if George doesn't answer here, go knock on his door -- I assume 
you're close by ;-) )

> I assume there is a function to do this  in include/opal/mca but I'm not sure 
> which one. And I couldn't find any documentation explaining it.
> I'm aware that this will only work with OpenMPI and is probably not part of 
> the public api thus the interface might change between version.

Yep -- we do have some internal functions to do this, but they are not public 
functions.

> My plan is to benchmark all alltoall algorithms at start up and then use this 
> algorithm for all later AllToAll calls. I have benchmarked that manually 
> choosing the algorithm can make a large  difference. Also all my alltoall 
> communication is of the same data size thus tuning is easy.

It might actually be easier to write up a shell/perl/whatever script to iterate 
over all the values that you want to run -- setenv the values you want and then 
mpirun (or set the appropriate mpirun command line params, etc.).  I have done 
this kind of thing in the past and it's worked out easier than I thought it 
would.

-- 
Jeff Squyres
jsquy...@cisco.com
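A minimal sketch of the sweep Jeff describes, assuming the standard `OMPI_MCA_<param>` environment-variable convention. The benchmark binary name and the algorithm value range are placeholders, and `mpirun` is stubbed out so the loop only prints what it would launch:

```shell
# Sweep the tuned-collective alltoall algorithms by exporting the MCA
# parameters as OMPI_MCA_-prefixed environment variables, one run each.
# "./my_alltoall_bench" is a hypothetical benchmark binary, and the
# 0..5 range is an assumed set of algorithm IDs.
mpirun() { echo "mpirun $*"; }   # dry-run stub; delete to launch for real

export OMPI_MCA_coll_tuned_use_dynamic_rules=1
for alg in 0 1 2 3 4 5; do
  export OMPI_MCA_coll_tuned_alltoall_algorithm=$alg
  mpirun -np 4 ./my_alltoall_bench
done
```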




Re: [OMPI users] mca_btl_tcp_frag_recv: readv failed: Connection reset by peer (104)

2010-01-14 Thread Jeff Squyres
On Jan 13, 2010, at 9:58 PM, SpiduS Okami wrote:

> I would like to know if someone could help me with the following error:
> 
> [fenrir][[9567,1],1][../../../../../../ompi/mca/btl/tcp/btl_tcp_frag.c:216:mca_btl_tcp_frag_recv]
>  mca_btl_tcp_frag_recv: readv failed: Connection reset by peer (104)
> 
> I am trying to run the hpcc program in a beowulf type cluster with 2,3 and 4 
> machines. When I use 10.000 problems and up it gives me this error. Any one 
> know what could be this? and how can I solve this problem.

This *usually* means that an MPI process has died unexpectedly; one of its 
peers noticed that it died by the fact that a socket closed.

You might want to poke around and see if there are corefiles or somesuch that 
explain why an MPI process died...?

-- 
Jeff Squyres
jsquy...@cisco.com
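Poking around for corefiles can be sketched like this; core-dump limits and filenames vary by system, so treat the commands as a starting point rather than a recipe:

```shell
# Enable core dumps in the launching shell, then look for cores after a
# failed run. "core*" naming and the working-directory location are
# typical Linux defaults, not universal; the mpirun line is commented
# out because it needs your actual cluster setup.
ulimit -c unlimited || echo "could not raise core limit (hard limit in effect)"
echo "core limit now: $(ulimit -c)"

# mpirun -np 4 ./hpcc        # the real run would go here

# Afterwards, look for core files in the current directory:
find . -maxdepth 1 -name 'core*' -print
```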




Re: [OMPI users] 1.4 OpenMPI build not working well with TotalView on Darwin

2010-01-14 Thread Jeff Squyres
On Jan 8, 2010, at 5:03 PM, Peter Thompson wrote:

> I've tried a few builds of 1.4 on Snow Leopard, and trying to start up 
> TotalView
> gets some of the more 'standard' problems.  

I don't quite know what you mean by "standard" problems...?

> Either the typedef for MPIR_PROCDESC
> can't be found, or MPIR_PROCTABLE is missing.  You can get things to work if 
> you
> start up TotalView first and then pick your program and go to the Parallel tab
> and pick OpenMPI.  But it would be nice to get the classic launch working as 
> well.

I'm unclear on how you could find these symbols if you start TV first, etc., 
but it won't work automatically.

Do you have deeper knowledge (given your email address) on exactly what is 
going wrong?

-- 
Jeff Squyres
jsquy...@cisco.com




Re: [OMPI users] More NetBSD fixes

2010-01-14 Thread Aleksej Saushev
Jeff Squyres  writes:

> Thanks!  I have this queued up to commit in a few hours (we try not to
> commit autogen/configure-worthy changes during the US workday). 

While we're on the autogen stuff, could you drop the trailing space on
the interpreter line? NetBSD doesn't like it.
I'm not sure whether that's a NetBSD bug, but it's better style to have
no trailing whitespace anyway.


-- 
HE CE3OH...


Re: [OMPI users] More NetBSD fixes

2010-01-14 Thread Jeff Squyres
On Jan 14, 2010, at 7:00 PM, Aleksej Saushev wrote:

> > Thanks!  I have this queued up to commit in a few hours (we try not to
> > commit autogen/configure-worthy changes during the US workday).
> 
> While we're at autogen stuff, could you drop trailing space on
> interpreter line? NetBSD doesn't like it.
> While I'm not sure it isn't bug, it is better to have no trailing
> whitespace for style matters.

What line, specifically, are you talking about?

(can we move future patch conversations to the devel list?)

-- 
Jeff Squyres
jsquy...@cisco.com



Re: [OMPI users] More NetBSD fixes

2010-01-14 Thread Aleksej Saushev
Jeff Squyres  writes:

> On Jan 14, 2010, at 7:00 PM, Aleksej Saushev wrote:
>
>> > Thanks!  I have this queued up to commit in a few hours (we try not to
>> > commit autogen/configure-worthy changes during the US workday).
>> 
>> While we're at autogen stuff, could you drop trailing space on
>> interpreter line? NetBSD doesn't like it.
>> While I'm not sure it isn't bug, it is better to have no trailing
>> whitespace for style matters.
>
> What line, specifically, are you talking about?

The first one, "#! /usr/bin/env bash " should be "#! /usr/bin/env bash"
at the very least.

> (can we move future patch conversations to the devel list?)

Can I post there unsubscribed? I don't object to registering, but
receiving more and more mail just to filter it out isn't nice.
Too many mailing lists...


-- 
HE CE3OH...
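A quick way to find (and verify the fix for) an interpreter line like that one — the check itself is generic shell, not NetBSD-specific:

```shell
# Reports whether a script's first line ("#!" interpreter line) ends in
# whitespace, which is the problem described above.
has_trailing_ws_shebang() {
  head -n 1 "$1" | grep -q '^#!.*[[:space:]]$'
}

bad=$(mktemp); good=$(mktemp)
printf '#! /usr/bin/env bash \necho hi\n' > "$bad"   # note trailing space
printf '#! /usr/bin/env bash\necho hi\n'  > "$good"

has_trailing_ws_shebang "$bad"  && echo "bad: trailing whitespace found"
has_trailing_ws_shebang "$good" || echo "good: clean interpreter line"
```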


Re: [OMPI users] More NetBSD fixes

2010-01-14 Thread Jeff Squyres
On Jan 14, 2010, at 7:38 PM, Aleksej Saushev wrote:

> The first one, "#! /usr/bin/env bash " should be "#! /usr/bin/env bash"
> at the very least.

Ah, in autogen.sh -- I thought you were referring to configure.

Strange that NetBSD doesn't like it; I wonder why...  

Regardless, the extra space is now gone: 
https://svn.open-mpi.org/trac/ompi/changeset/22418.

> > (can we move future patch conversations to the devel list?)
> 
> Can I post there unsubscribed? I don't object registration, but
> receiving more and more mail just to filter it out isn't nice.
> Too many mailing lists...

Unfortunately, no -- we only allow posting to subscribed members as an 
anti-spam measure, sorry.  :-(

That being said, you could sign up on it and then set your membership to 
receive no mail...?

-- 
Jeff Squyres
jsquy...@cisco.com