[OMPI users] Compiler error with PGI: pgcc-Error-Unknown switch: -pthread

2017-04-03 Thread Prentice Bisbal
Greeting Open MPI users! After being off this list for several years, 
I'm back! And I need help:


I'm trying to compile OpenMPI 1.10.3 with the PGI compilers, version 
17.3. I'm using the following configure options:


./configure \
  --prefix=/usr/pppl/pgi/17.3-pkgs/openmpi-1.10.3 \
  --disable-silent-rules \
  --enable-shared \
  --enable-static \
  --enable-mpi-thread-multiple \
  --with-pmi=/usr/pppl/slurm/15.08.8 \
  --with-hwloc \
  --with-verbs \
  --with-slurm \
  --with-psm \
  CC=pgcc \
  CFLAGS="-tp x64 -fast" \
  CXX=pgc++ \
  CXXFLAGS="-tp x64 -fast" \
  FC=pgfortran \
  FCFLAGS="-tp x64 -fast" \
  2>&1 | tee configure.log

Which leads to this error from libtool during make:

pgcc-Error-Unknown switch: -pthread

I've searched the archives, which ultimately led to this workaround 
from 2009:


https://www.open-mpi.org/community/lists/users/2009/04/8724.php

Interestingly, I participated in the discussion that led to that 
workaround, stating that I had no problem compiling Open MPI with PGI 
v9. I'm assuming the problem now is that I'm specifying 
--enable-mpi-thread-multiple, which I'm doing because a user requested 
that feature.


It's been exactly 8 years and 2 days since that workaround was posted to 
the list. Please tell me a better way of dealing with this issue than 
writing a 'fakepgf90' script. Any suggestions?



--
Prentice

___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users


Re: [OMPI users] Compiler error with PGI: pgcc-Error-Unknown switch: -pthread

2017-04-03 Thread Prentice Bisbal

This is the second suggestion to rebuild Slurm.

The other was from Åke Sandgren, who recommended this:


This usually comes from slurm, so we always do

perl -pi -e 's/-pthread//' /lap/slurm/${version}/lib/libpmi.la
/lap/slurm/${version}/lib/libslurm.la

when installing a new slurm version. Thus no need for a fakepg wrapper.
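Åke's one-liner can be wrapped in a small guarded script. This is only a sketch under the assumption that the /lap/slurm path and version from the example above apply (adjust both for your site); perl's -i.bak keeps a backup of each file it edits:

```shell
#!/bin/sh
# Strip -pthread from Slurm's libtool archive files, keeping .bak backups.
# Path and version below come from Åke's example and are assumptions.
version=15.08.8
libdir="/lap/slurm/${version}/lib"

for la in "$libdir/libpmi.la" "$libdir/libslurm.la"; do
    [ -f "$la" ] || continue            # skip files not present on this host
    perl -pi.bak -e 's/-pthread//g' "$la"
done
```

Note this leaves -lpthread entries untouched, since the pattern requires the leading dash directly before "pthread".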


I don't really have the luxury of rebuilding Slurm at the moment. How 
would I rebuild Slurm to change this behavior? Is rebuilding Slurm with 
PGI the only way to fix this in Slurm, or could I just use Åke's 
suggestion above?


If I did use Åke's suggestion above, how would that affect the operation 
of Slurm, or future builds of OpenMPI and any other software that might 
rely on Slurm, particularly with regard to building those apps with 
non-PGI compilers?


Prentice

On 04/03/2017 10:31 AM, Gilles Gouaillardet wrote:

Hi,

The -pthread flag is likely pulled in by libtool from the slurm 
libpmi.la and/or libslurm.la files

Workarounds are
- rebuild slurm with PGI
- remove the .la files (*.so and/or *.a are enough)
- wrap the PGI compiler to ignore the -pthread option

Hope this helps

Gilles


Re: [OMPI users] Compiler error with PGI: pgcc-Error-Unknown switch: -pthread

2017-04-03 Thread Prentice Bisbal
I've decided to work around this problem by creating a wrapper script 
for pgcc that strips away the -pthread argument, but my sed expression 
works on the command line, not in the script. I'm essentially 
reproducing the workaround from 
https://www.open-mpi.org/community/lists/users/2009/04/8724.php.


Can anyone see what's wrong with my implementation of the workaround? 
It's a very simple sed expression. Here's my script:


#!/bin/bash

realcmd=/path/to/pgcc
echo "original args: $@"
newargs=$(echo "$@" | sed s/-pthread//)
echo "new args: $newargs"
#$realcmd $newargs
exit

And here's what happens when I run it:

$ /path/to/pgcc -E conftest.c
original args: -E conftest.c
new args: conftest.c

As you can see, the -E argument is getting lost in translation. If I add 
more arguments, it works fine:


$ /path/to/pgcc -A -B -C -D -E conftest.c
original args: -A -B -C -D -E conftest.c
new args: -A -B -C -D -E conftest.c

It only seems to be a problem when -E is the first argument:

$ /path/to/pgcc -E -D -C -B -A conftest.c
original args: -E -D -C -B -A conftest.c
new args: -D -C -B -A conftest.c

Prentice

On 04/03/2017 02:24 PM, Aaron Knister wrote:
To be thorough couldn't one replace -pthread in the slurm .la files 
with -lpthread? I ran into this last week and this was the solution I 
was thinking about implementing. Having said that, I can't think of a 
situation in which the -pthread/-lpthread argument would be required 
other than linking against statically compiled SLURM libraries and 
even then I'm not so sure about that.


-Aaron
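Aaron's substitution variant would be a one-line change to Åke's command. A sketch, with the same illustrative /lap/slurm path and version (assumptions, not from this thread):

```shell
#!/bin/sh
# Swap -pthread for -lpthread in Slurm's .la files instead of deleting it.
# Path and version are illustrative; adjust for your site.
version=15.08.8
libdir="/lap/slurm/${version}/lib"

for la in "$libdir/libpmi.la" "$libdir/libslurm.la"; do
    [ -f "$la" ] || continue                 # skip if not installed here
    perl -pi -e 's/-pthread\b/-lpthread/g' "$la"
done
```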

On 4/3/17 1:46 PM, Åke Sandgren wrote:

We build slurm with GCC, drop the -pthread arg in the .la files, and
have never seen any problems related to that. And we do build quite a
lot of code. And lots of versions of OpenMPI with multiple different
compilers (and versions).


Re: [OMPI users] Compiler error with PGI: pgcc-Error-Unknown switch: -pthread

2017-04-03 Thread Prentice Bisbal
Nevermind. A coworker helped me figure this one out. Echo is treating 
the '-E' as an argument to echo and interpreting it instead of passing 
it to sed. Since that's used by the configure tests, that's a bit of a 
problem. Just adding another -E before $@ should fix the problem.
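The mix-up is easy to reproduce outside the wrapper. A minimal illustration (not from the original thread): bash's builtin echo treats a leading -n, -e, or -E as one of its own options, while printf has no such options.

```shell
#!/bin/bash
# Simulate the wrapper's argument list with -E in front.
set -- -E conftest.c

echo "$@"                 # -E is consumed as an echo option
echo "" "$@"              # a leading empty word protects -E
printf '%s ' "$@"; echo   # printf passes every argument through
```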


Prentice


Re: [OMPI users] Compiler error with PGI: pgcc-Error-Unknown switch: -pthread

2017-04-03 Thread Prentice Bisbal

Okay, the additional -E doesn't work, either. :(

Prentice Bisbal
Lead Software Engineer
Princeton Plasma Physics Laboratory
http://www.pppl.gov


Re: [OMPI users] Compiler error with PGI: pgcc-Error-Unknown switch: -pthread

2017-04-03 Thread Prentice Bisbal

A coworker came up with another idea that works, too:

newargs=$(sed s/-pthread//g <<EOF
$@
EOF
)

That should work, too, but I haven't tested it.

Prentice

On 04/03/2017 04:11 PM, Andy Riebs wrote:

Try
$ printf -- "-E" ...


Re: [OMPI users] Compiler error with PGI: pgcc-Error-Unknown switch: -pthread

2017-04-03 Thread Prentice Bisbal
FYI - the proposed 'here-doc' solution didn't work for me; it produced 
an error. Neither did printf. When I used printf, only the first arg 
was passed along:


#!/bin/bash

realcmd=/usr/pppl/pgi/17.3/linux86-64/17.3/bin/pgcc.real
echo "original args: $@"
newargs=$(printf -- "$@" | sed s/-pthread//g)
echo "new args: $newargs"
#$realcmd $newargs
exit

$ pgcc -tp=x64 -fast conftest.c
original args: -tp=x64 -fast conftest.c
new args: -tp=x64

Any ideas what I might be doing wrong here?

So, my original echo "" "$@" solution works, and another colleague also 
suggested this expression, which appears to work, too:


newargs=${@/-pthread/}

Although I don't know how portable that is. I'm guessing that's very 
bash-specific syntax.


Prentice


Re: [OMPI users] Compiler error with PGI: pgcc-Error-Unknown switch: -pthread

2017-04-04 Thread Prentice Bisbal

Matt,

Thank you so much! I think you might have cracked the case for me. Yes, 
I'm on Linux, and I just looked up siterc and userrc files in the PGI 
userguide. I think I'm going to start with a userrc file, since I prefer 
to minimize customization as much as possible, and to test without 
affecting other users. I have run into other issues with PGI, too, after 
fixing the -pthread issue, which I'll bring up in a separate email.


Prentice

On 04/03/2017 06:24 PM, Matt Thompson wrote:
Coming in near the end here. I've had "fun" with PGI + Open MPI + 
macOS (and still haven't quite solved it, see: 
https://www.mail-archive.com/users@lists.open-mpi.org//msg30865.html, 
still unanswered!) The solution that PGI gave me, and which seems the 
magic sauce on macOS is to use a siterc file 
(http://www.pgroup.com/userforum/viewtopic.php?p=21105#21105):


=
siterc for gcc commands PGI does not support
=
switch -ffast-math is hide;

switch -pipe is hide;

switch -fexpensive-optimizations is hide;

switch -pthread is
append(LDLIB1= -lpthread);

switch -qversion is
early
help(Display compiler version)
helpgroup(overall)
set(VERSION=YES);

switch -Wno-deprecated-declarations is hide;

switch -flat_namespace is hide;


If you use that, -pthread is "rerouted" to append -lpthread. You might 
try that and see if that helps. Since you are on Linux (I assume?), 
then you should be able to proceed as you shouldn't encounter the 
libtool bug/issue/*shrug* that is breaking macOS use.


On Mon, Apr 3, 2017 at 5:14 PM, Reuti <re...@staff.uni-marburg.de> wrote:




On 03.04.2017 at 23:07, Prentice Bisbal wrote:

> FYI - the proposed 'here-doc' solution below didn't work for me, it
> produced an error. Neither did printf. When I used printf, only the
> first arg was passed along:
>
> #!/bin/bash
>
> realcmd=/usr/pppl/pgi/17.3/linux86-64/17.3/bin/pgcc.real
> echo "original args: $@"
> newargs=$(printf -- "$@" | sed s/-pthread//g)

The format string is missing:

printf "%s " "$@"


> echo "new args: $newargs"
> #$realcmd $newargs
> exit
>
> $ pgcc -tp=x64 -fast conftest.c
> original args: -tp=x64 -fast conftest.c
> new args: -tp=x64
>
> Any ideas what I might be doing wrong here?
>
> So, my original echo "" "$@" solution works, and another
colleague also suggested this expressions, which appears to work, too:
    >
    > newargs=${@/-pthread/}
>
> Although I don't know how portable that is. I'm guessing that's
very bash-specific syntax.
>
> Prentice
>
> On 04/03/2017 04:26 PM, Prentice Bisbal wrote:
>> A coworker came up with another idea that works, too:
>>
>> newargs=sed s/-pthread//g <> $@
>> EOF
>>
>> That should work, too, but I haven't test it.
>>
>> Prentice
>>
>> On 04/03/2017 04:11 PM, Andy Riebs wrote:
    >>> Try
>>> $ printf -- "-E" ...
>>>
>>> On 04/03/2017 04:03 PM, Prentice Bisbal wrote:
>>>> Okay. the additional -E doesn't work,either. :(
>>>>
>>>> Prentice Bisbal Lead Software Engineer Princeton Plasma
Physics Laboratory http://www.pppl.gov
>>>> On 04/03/2017 04:01 PM, Prentice Bisbal wrote:
>>>>> Nevermind. A coworker helped me figure this one out. Echo is
treating the '-E' as an argument to echo and interpreting it
instead of passing it to sed. Since that's used by the configure
tests, that's a bit of a problem. Just adding another -E before
$@ should fix the problem.
>>>>>
>>>>> Prentice
>>>>>
>>>>> On 04/03/2017 03:54 PM, Prentice Bisbal wrote:
>>>>>> I've decided to work around this problem by creating a
wrapper script for pgcc that strips away the -pthread argument,
but my sed expression works on the command-line, but not in the
script. I'm essentially reproducing the workaround from
https://www.open-mpi.org/community/lists/users/2009/04/8724.php
<https://www.open-mpi.org/community/lists/users/2009/04/8724.php>.
>>>>>>
>>>>>> Can anyone see what's wrong with my implementation of the
workaround? It's a very simple sed expression. Here's my script:
>>>>>>
>>>>>> #!/bin/bash
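[Editor's note] Pulling the thread's working fixes together, here is a self-contained sketch of the wrapper. The pgcc.real path is illustrative (it assumes the real compiler was renamed first), and the `${@/-pthread/}` expansion is bash-specific:

```shell
#!/bin/bash
# Sketch of the pgcc wrapper discussed above; the pgcc.real path is an
# assumption for illustration. We simulate one compiler invocation so
# the argument stripping is visible.
realcmd=/usr/pppl/pgi/17.3/linux86-64/17.3/bin/pgcc.real
set -- -tp=x64 -fast -pthread conftest.c   # simulated compiler args

# Reuti's fix: printf needs a format string; without one it consumes
# the first argument ("-tp=x64") as the format.
newargs=$(printf '%s ' "$@" | sed 's/-pthread//g')
echo "new args: $newargs"

# bash-only alternative, no external commands:
echo "new args: ${@/-pthread/}"

# The real wrapper would end with:  exec "$realcmd" $newargs
```

Leaving `$newargs` unquoted in the final exec is deliberate: the word splitting rebuilds the argument list, which is fine for compiler flags but would break on arguments containing spaces.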

Re: [OMPI users] Compiler error with PGI: pgcc-Error-Unknown switch: -pthread

2017-04-26 Thread Prentice Bisbal

Everyone,

I just wanted to follow up on this, to help others, or possibly even a 
future me, having problems compiling OpenMPI with the PGI compilers. I 
did get it to work a few weeks ago, but I've been too busy to share my 
solution here. I need to give a shout-out to Matt Thompson for 
providing the missing link: the siterc file (see 
https://www.mail-archive.com/users@lists.open-mpi.org/msg30918.html).


Here's what I did, step by step. This is on a CentOS 6.8 system:

1. Create a siterc file with the following contents in the bin directory 
where your pgcc, pgfortran, etc. live. For me, I installed PGI 17.3 in 
/usr/pppl/pgi/17.3, so this file is located at 
/usr/pppl/pgi/17.3/linux86-64/17.3/bin/siterc:


$ cat  /usr/pppl/pgi/17.3/linux86-64/17.3/bin/siterc
#
# siterc for gcc commands PGI does not support
#

switch -pthread is
 append(LDLIB1=-lpthread);




This will prevent this error:

pgcc-Error-Unknown switch: -pthread


2. During the configure step, specify the -fPIC explicitly in CFLAGS, 
CXXFLAGS, and FCFLAGS. For some reason, this isn't added automatically 
for PGI, which leads to a linking failure deep into the build process. 
Here's my configure command:


./configure \
  --prefix=/usr/pppl/pgi/17.3-pkgs/openmpi-1.10.3 \
  --disable-silent-rules \
  --enable-shared \
  --enable-static \
  --enable-mpi-thread-multiple \
  --with-pmi=/usr/pppl/slurm/15.08.8 \
  --with-hwloc \
  --with-verbs \
  --with-slurm \
  --with-psm \
  CC=pgcc \
  CFLAGS="-fPIC -tp=x64 -fast" \
  CXX=pgc++ \
  CXXFLAGS="-fPIC -tp=x64 -fast" \
  FC=pgfortran \
  FCFLAGS="-fPIC -tp=x64 -fast" \
  2>&1 | tee configure.log

Obviously, you probably won't be specifying all the same options. The 
'-tp=x64' tells PGI to create a 'unified binary' that will run optimally 
on all the 64-bit x86 processors (according to PGI). Technically, I 
should be specifying '-fpic' instead of '-fPIC', but PGI accepts '-fPIC' 
for compatibility with other compilers, and I typed '-fPIC' out of habit.


That's it! Those two changes allowed me to build and install OpenMPI 
1.10.3 with PGI 17.3.



Prentice

On 04/03/2017 10:20 AM, Prentice Bisbal wrote:
Greeting Open MPI users! After being off this list for several years, 
I'm back! And I need help:


I'm trying to compile OpenMPI 1.10.3 with the PGI compilers, version 
17.3. I'm using the following configure options:


./configure \
  --prefix=/usr/pppl/pgi/17.3-pkgs/openmpi-1.10.3 \
  --disable-silent-rules \
  --enable-shared \
  --enable-static \
  --enable-mpi-thread-multiple \
  --with-pmi=/usr/pppl/slurm/15.08.8 \
  --with-hwloc \
  --with-verbs \
  --with-slurm \
  --with-psm \
  CC=pgcc \
  CFLAGS="-tp x64 -fast" \
  CXX=pgc++ \
  CXXFLAGS="-tp x64 -fast" \
  FC=pgfortran \
  FCFLAGS="-tp x64 -fast" \
  2>&1 | tee configure.log

Which leads to this error from libtool during make:

pgcc-Error-Unknown switch: -pthread

I've searched the archives, which ultimately led to this workaround 
from 2009:


https://www.open-mpi.org/community/lists/users/2009/04/8724.php

Interestingly, I participated in the discussion that led to that 
workaround, stating that I had no problem compiling Open MPI with PGI 
v9. I'm assuming the problem now is that I'm specifying 
--enable-mpi-thread-multiple, which I'm doing because a user requested 
that feature.


It's been exactly 8 years and 2 days since that workaround was posted 
to the list. Please tell me a better way of dealing with this issue 
than writing a 'fakepgf90' script. Any suggestions?





___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users


[OMPI users] OpenMPI 2.1.0 build error: yes/lib: No such file or directory

2017-04-26 Thread Prentice Bisbal

I'm getting the following error when I build OpenMPI 2.1.0 with GCC 5.4.0:

/bin/sh ../../../../libtool  --tag=CC   --mode=link gcc  -O3 -DNDEBUG 
-finline-functions -fno-strict-aliasing -pthread -module -avoid-version 
-Lyes/lib  -o libmca_fs_lustre.la  fs_lustre.lo fs_lustre_component.lo 
fs_lustre_file_open.lo fs_lustre_file_close.lo fs_lustre_file_delete.lo 
fs_lustre_file_sync.lo fs_lustre_file_set_size.lo 
fs_lustre_file_get_size.lo -llustreapi  -lrt -lm -lutil

../../../../libtool: line 7489: cd: yes/lib: No such file or directory
libtool:   error: cannot determine absolute directory name of 'yes/lib'
make[2]: *** [libmca_fs_lustre.la] Error 1
make[2]: Leaving directory `/local/pbisbal/openmpi-2.1.0/ompi/mca/fs/lustre'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/local/pbisbal/openmpi-2.1.0/ompi'
make: *** [all-recursive] Error 1

Obviously, the problem is this argument to libtool in the above command:

-Lyes/lib

I've worked around this by going into ompi/mca/fs/lustre, running that 
same libtool command with "-Lyes/lib" changed to "-L/lib", and then 
resuming my build from the top level. I'm reporting this error here to 
see whether it's a problem caused by me or a bug in the configure 
script.
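The literal "yes" in -Lyes/lib is the tell: configure sets $with_lustre to the string "yes" when --with-lustre is passed without a path, and that value is then spliced into -L${with_lustre}/lib without a guard. A minimal sketch of the failure and the missing guard (variable names are illustrative, not Open MPI's actual ones):

```shell
# How "-Lyes/lib" is born: a bare --with-lustre leaves the variable set
# to the literal string "yes", which then gets spliced into a -L flag.
with_lustre=yes
LDFLAGS="-L${with_lustre}/lib"
echo "buggy: $LDFLAGS"

# The guard configure should apply: only treat the value as a path when
# it is neither "yes" nor "no".
if [ "$with_lustre" != yes ] && [ "$with_lustre" != no ]; then
    LDFLAGS="-L${with_lustre}/lib"
else
    LDFLAGS=""
fi
echo "guarded: '$LDFLAGS'"
```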


When I do 'make check', I get another error caused by the same bad argument:

/bin/sh ../../libtool  --tag=CC   --mode=link gcc  -O3 -DNDEBUG 
-finline-functions -fno-strict-aliasing -pthread 
-L/usr/pppl/slurm/15.08.8/lib -Lyes/lib -Wl,-rpath 
-Wl,/usr/pppl/slurm/15.08.8/lib -Wl,-rpath -Wl,yes/lib -Wl,-rpath 
-Wl,/usr/pppl/gcc/5.4-pkgs/openmpi-2.1.0/lib -Wl,--enable-new-dtags  -o 
external32 external32.o ../../ompi/libmpi.la ../../opal/libopen-pal.la 
-lrt -lm -lutil

../../libtool: line 7489: cd: yes/lib: No such file or directory
libtool:   error: cannot determine absolute directory name of 'yes/lib'
make[3]: *** [external32] Error 1
make[3]: Leaving directory `/local/pbisbal/openmpi-2.1.0/test/datatype'
make[2]: *** [check-am] Error 2
make[2]: Leaving directory `/local/pbisbal/openmpi-2.1.0/test/datatype'
make[1]: *** [check-recursive] Error 1
make[1]: Leaving directory `/local/pbisbal/openmpi-2.1.0/test'
make: *** [check-recursive] Error 1

For reference, here is my configure command:

./configure \
  --prefix=/usr/pppl/gcc/5.4-pkgs/openmpi-2.1.0 \
  --disable-silent-rules \
  --enable-mpi-fortran \
  --enable-mpi-cxx \
  --enable-shared \
  --enable-static \
  --enable-mpi-thread-multiple \
  --with-cuda=/usr/pppl/cuda/cudatoolkit/6.5.14 \
  --with-pmix \
  --with-verbs \
  --with-hwloc \
  --with-pmi=/usr/pppl/slurm/15.08.8 \
  --with-slurm \
  --with-lustre \
  --with-psm \
  CC=gcc \
  CXX=g++ \
  FC=gfortran \
  2>&1 | tee configure.log

--
Prentice

___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users


[OMPI users] OpenMPI 2.1.0: FAIL: opal_path_nfs

2017-04-26 Thread Prentice Bisbal
I'm trying to build OpenMPI 2.1.0 with GCC 5.4.0 on CentOS 6.8. After 
working around the '-Lyes/lib' errors I reported in my previous post, 
opal_path_nfs fails during 'make check' (see below). Is this failure 
critical, or is it something I can ignore and continue with my install? 
Googling only returned links to discussions of similar problems from 4-5 
years ago with earlier versions of OpenMPI.


STDOUT and STDERR from 'make check':

make  check-TESTS
make[3]: Entering directory `/local/pbisbal/openmpi-2.1.0/test/util'
make[4]: Entering directory `/local/pbisbal/openmpi-2.1.0/test/util'
PASS: opal_bit_ops
FAIL: opal_path_nfs

Testsuite summary for Open MPI 2.1.0

# TOTAL: 2
# PASS:  1
# SKIP:  0
# XFAIL: 0
# FAIL:  1
# XPASS: 0
# ERROR: 0

See test/util/test-suite.log
Please report to http://www.open-mpi.org/community/help/


Contents of test/util/test-suite.log:

cat test/util/test-suite.log
==
   Open MPI 2.1.0: test/util/test-suite.log
==

# TOTAL: 2
# PASS:  1
# SKIP:  0
# XFAIL: 0
# FAIL:  1
# XPASS: 0
# ERROR: 0

.. contents:: :depth: 2

FAIL: opal_path_nfs
===

Test usage: ./opal_path_nfs [DIR]
On Linux interprets output from mount(8) to check for nfs and verify 
opal_path_nfs()
Additionally, you may specify multiple DIR on the cmd-line, for which 
you get the output

get_mounts: dirs[0]:/ fs:rootfs nfs:No
get_mounts: dirs[1]:/proc fs:proc nfs:No
get_mounts: dirs[2]:/sys fs:sysfs nfs:No
get_mounts: dirs[3]:/dev fs:devtmpfs nfs:No
get_mounts: dirs[4]:/dev/pts fs:devpts nfs:No
get_mounts: dirs[5]:/dev/shm fs:tmpfs nfs:No
get_mounts: already know dir[0]:/
get_mounts: dirs[0]:/ fs:nfs nfs:Yes
get_mounts: dirs[6]:/proc/bus/usb fs:usbfs nfs:No
get_mounts: dirs[7]:/var/lib/stateless/writable fs:tmpfs nfs:No
get_mounts: dirs[8]:/var/cache/man fs:tmpfs nfs:No
get_mounts: dirs[9]:/var/lock fs:tmpfs nfs:No
get_mounts: dirs[10]:/var/log fs:tmpfs nfs:No
get_mounts: dirs[11]:/var/run fs:tmpfs nfs:No
get_mounts: dirs[12]:/var/lib/dbus fs:tmpfs nfs:No
get_mounts: dirs[13]:/var/lib/nfs fs:tmpfs nfs:No
get_mounts: dirs[14]:/tmp fs:tmpfs nfs:No
get_mounts: dirs[15]:/var/cache/foomatic fs:tmpfs nfs:No
get_mounts: dirs[16]:/var/cache/hald fs:tmpfs nfs:No
get_mounts: dirs[17]:/var/cache/logwatch fs:tmpfs nfs:No
get_mounts: dirs[18]:/var/lib/dhclient fs:tmpfs nfs:No
get_mounts: dirs[19]:/var/tmp fs:tmpfs nfs:No
get_mounts: dirs[20]:/media fs:tmpfs nfs:No
get_mounts: dirs[21]:/etc/adjtime fs:tmpfs nfs:No
get_mounts: dirs[22]:/etc/ntp.conf fs:tmpfs nfs:No
get_mounts: dirs[23]:/etc/resolv.conf fs:tmpfs nfs:No
get_mounts: dirs[24]:/etc/lvm/archive fs:tmpfs nfs:No
get_mounts: dirs[25]:/etc/lvm/backup fs:tmpfs nfs:No
get_mounts: dirs[26]:/var/account fs:tmpfs nfs:No
get_mounts: dirs[27]:/var/lib/iscsi fs:tmpfs nfs:No
get_mounts: dirs[28]:/var/lib/logrotate.status fs:tmpfs nfs:No
get_mounts: dirs[29]:/var/lib/ntp fs:tmpfs nfs:No
get_mounts: dirs[30]:/var/spool fs:tmpfs nfs:No
get_mounts: dirs[31]:/var/lib/sss fs:tmpfs nfs:No
get_mounts: dirs[32]:/etc/sysconfig/network-scripts fs:tmpfs nfs:No
get_mounts: dirs[33]:/var fs:ext4 nfs:No
get_mounts: already know dir[14]:/tmp
get_mounts: dirs[14]:/tmp fs:ext4 nfs:No
get_mounts: dirs[34]:/local fs:ext4 nfs:No
get_mounts: dirs[35]:/proc/sys/fs/binfmt_misc fs:binfmt_misc nfs:No
get_mounts: dirs[36]:/local/cgroup/cpuset fs:cgroup nfs:No
get_mounts: dirs[37]:/local/cgroup/cpu fs:cgroup nfs:No
get_mounts: dirs[38]:/local/cgroup/cpuacct fs:cgroup nfs:No
get_mounts: dirs[39]:/local/cgroup/memory fs:cgroup nfs:No
get_mounts: dirs[40]:/local/cgroup/devices fs:cgroup nfs:No
get_mounts: dirs[41]:/local/cgroup/freezer fs:cgroup nfs:No
get_mounts: dirs[42]:/local/cgroup/net_cls fs:cgroup nfs:No
get_mounts: dirs[43]:/local/cgroup/blkio fs:cgroup nfs:No
get_mounts: dirs[44]:/usr/pppl fs:nfs nfs:Yes
get_mounts: dirs[45]:/misc fs:autofs nfs:No
get_mounts: dirs[46]:/net fs:autofs nfs:No
get_mounts: dirs[47]:/v fs:autofs nfs:No
get_mounts: dirs[48]:/u fs:autofs nfs:No
get_mounts: dirs[49]:/w fs:autofs nfs:No
get_mounts: dirs[50]:/l fs:autofs nfs:No
get_mounts: dirs[51]:/p fs:autofs nfs:No
get_mounts: dirs[52]:/pfs fs:autofs nfs:No
get_mounts: dirs[53]:/proc/fs/nfsd fs:nfsd nfs:No
get_mounts: dirs[54]:/u/gtchilin fs:nfs nfs:Yes
get_mounts: dirs[55]:/u/ldelgado fs:nfs nfs:Yes
get_mounts: dirs[56]:/p/incoherent fs:nfs nfs:Yes
get_mounts: dirs[57]:/u/bgriers fs:nfs nfs:Yes
get_mounts: dirs[58]:/p/beam fs:nfs nfs:Yes
get_mounts: dirs[59]:/u/ghao fs:nfs nfs:Yes
get_mounts: dirs[60]:/u/slazerso fs:nfs nfs:Yes
get_mounts: dirs[61]:/p/tsc fs:nfs nfs:Yes
get_mounts: dirs[62]:/p/stellopt fs:nfs nfs:Yes
get_mounts: d

Re: [OMPI users] OpenMPI 2.1.0 build error: yes/lib: No such file or directory

2017-04-26 Thread Prentice Bisbal

Edgar,

Thank you for the suggestion. That fixed this problem.

Prentice

On 04/26/2017 05:25 PM, Edgar Gabriel wrote:
Can you try to just skip the --with-lustre option? The option really 
is there to provide an alternative path, if the lustre libraries are 
not installed in the default directories (e.g. 
--with-lustre=/opt/lustre/). There is obviously a bug in that the 
configure script did not recognize the missing argument. However, if 
the lustre libraries and headers are installed in the default location 
(i.e. /usr/), the configure logic will pick them up and compile the 
component even if you do not provide the --with-lustre argument.


Thanks

Edgar


On 4/26/2017 4:18 PM, Prentice Bisbal wrote:
I'm getting the following error when I build OpenMPI 2.1.0 with GCC 
5.4.0:


/bin/sh ../../../../libtool  --tag=CC   --mode=link gcc  -O3 -DNDEBUG
-finline-functions -fno-strict-aliasing -pthread -module -avoid-version
-Lyes/lib  -o libmca_fs_lustre.la  fs_lustre.lo fs_lustre_component.lo
fs_lustre_file_open.lo fs_lustre_file_close.lo fs_lustre_file_delete.lo
fs_lustre_file_sync.lo fs_lustre_file_set_size.lo
fs_lustre_file_get_size.lo -llustreapi  -lrt -lm -lutil
../../../../libtool: line 7489: cd: yes/lib: No such file or directory
libtool:   error: cannot determine absolute directory name of 'yes/lib'
make[2]: *** [libmca_fs_lustre.la] Error 1
make[2]: Leaving directory 
`/local/pbisbal/openmpi-2.1.0/ompi/mca/fs/lustre'

make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/local/pbisbal/openmpi-2.1.0/ompi'
make: *** [all-recursive] Error 1

Obviously, the problem is this argument to libtool in the above command:

-Lyes/lib

I've worked around this by going into ompi/mca/fs/lustre, running that
same libtool command but changing "-Lyes/lib" to "-L/lib", and then
resuming my build from the top level, I figured I'd report this error
here, to see if this a problem caused by me, or a bug in the configure
script.

When I do 'make check', I get another error caused by the same bad 
argument:


/bin/sh ../../libtool  --tag=CC   --mode=link gcc  -O3 -DNDEBUG
-finline-functions -fno-strict-aliasing -pthread
-L/usr/pppl/slurm/15.08.8/lib -Lyes/lib -Wl,-rpath
-Wl,/usr/pppl/slurm/15.08.8/lib -Wl,-rpath -Wl,yes/lib -Wl,-rpath
-Wl,/usr/pppl/gcc/5.4-pkgs/openmpi-2.1.0/lib -Wl,--enable-new-dtags  -o
external32 external32.o ../../ompi/libmpi.la ../../opal/libopen-pal.la
-lrt -lm -lutil
../../libtool: line 7489: cd: yes/lib: No such file or directory
libtool:   error: cannot determine absolute directory name of 'yes/lib'
make[3]: *** [external32] Error 1
make[3]: Leaving directory `/local/pbisbal/openmpi-2.1.0/test/datatype'
make[2]: *** [check-am] Error 2
make[2]: Leaving directory `/local/pbisbal/openmpi-2.1.0/test/datatype'
make[1]: *** [check-recursive] Error 1
make[1]: Leaving directory `/local/pbisbal/openmpi-2.1.0/test'
make: *** [check-recursive] Error 1

For reference, here is my configure command:

./configure \
--prefix=/usr/pppl/gcc/5.4-pkgs/openmpi-2.1.0 \
--disable-silent-rules \
--enable-mpi-fortran \
--enable-mpi-cxx \
--enable-shared \
--enable-static \
--enable-mpi-thread-multiple \
--with-cuda=/usr/pppl/cuda/cudatoolkit/6.5.14 \
--with-pmix \
--with-verbs \
--with-hwloc \
--with-pmi=/usr/pppl/slurm/15.08.8 \
--with-slurm \
--with-lustre \
--with-psm \
CC=gcc \
CXX=g++ \
FC=gfortran \
2>&1 | tee configure.log



___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users


___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users


Re: [OMPI users] OpenMPI 2.1.0: FAIL: opal_path_nfs

2017-04-26 Thread Prentice Bisbal
That's what I figured, but I wanted to check first. Any idea of exactly 
what it's trying to check?


Prentice
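For what it's worth, the test's own usage text (quoted in the earlier post) says it interprets mount(8) output and verifies opal_path_nfs() against the mounted filesystem types. A rough shell approximation of that check, reading /proc/mounts instead (a sketch only, not the test's actual C code):

```shell
# Approximate what opal_path_nfs checks: look up a directory's
# filesystem type in the mount table and flag network filesystems.
is_network_fs() {
    fstype=$(awk -v d="$1" '$2 == d { t = $3 } END { print t }' /proc/mounts)
    case "$fstype" in
        nfs*|lustre|panfs|gpfs) return 0 ;;
        *)                      return 1 ;;
    esac
}

if is_network_fs /; then
    echo "/ is a network filesystem"
else
    echo "/ is local"
fi
```

The failure in the log above ("dirs[0]:/ fs:nfs nfs:Yes" after "/ fs:rootfs nfs:No") suggests the test tripped over / appearing twice in the mount table with different filesystem types, which is plausible on a stateless/NFS-root node like this one.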

On 04/26/2017 05:54 PM, r...@open-mpi.org wrote:

You can probably safely ignore it.


On Apr 26, 2017, at 2:29 PM, Prentice Bisbal  wrote:

I'm trying to build OpenMPI 2.1.0 with GCC 5.4.0 on CentOS 6.8. After working 
around the '-Lyes/lib' errors I reported in my previous post, opal_path_nfs 
fails during 'make check' (see below). Is this failure critical, or is it 
something I can ignore and continue with my install? Googling only returned 
links to discussions of similar problems from 4-5 years ago with earlier 
versions of OpenMPI.

STDOUT and STDERR from 'make check':

make  check-TESTS
make[3]: Entering directory `/local/pbisbal/openmpi-2.1.0/test/util'
make[4]: Entering directory `/local/pbisbal/openmpi-2.1.0/test/util'
PASS: opal_bit_ops
FAIL: opal_path_nfs

Testsuite summary for Open MPI 2.1.0

# TOTAL: 2
# PASS:  1
# SKIP:  0
# XFAIL: 0
# FAIL:  1
# XPASS: 0
# ERROR: 0

See test/util/test-suite.log
Please report to http://www.open-mpi.org/community/help/


Contents of test/util/test-suite.log:

cat test/util/test-suite.log
==
   Open MPI 2.1.0: test/util/test-suite.log
==

# TOTAL: 2
# PASS:  1
# SKIP:  0
# XFAIL: 0
# FAIL:  1
# XPASS: 0
# ERROR: 0

.. contents:: :depth: 2

FAIL: opal_path_nfs
===

Test usage: ./opal_path_nfs [DIR]
On Linux interprets output from mount(8) to check for nfs and verify 
opal_path_nfs()
Additionally, you may specify multiple DIR on the cmd-line, for which you get the 
output
get_mounts: dirs[0]:/ fs:rootfs nfs:No
get_mounts: dirs[1]:/proc fs:proc nfs:No
get_mounts: dirs[2]:/sys fs:sysfs nfs:No
get_mounts: dirs[3]:/dev fs:devtmpfs nfs:No
get_mounts: dirs[4]:/dev/pts fs:devpts nfs:No
get_mounts: dirs[5]:/dev/shm fs:tmpfs nfs:No
get_mounts: already know dir[0]:/
get_mounts: dirs[0]:/ fs:nfs nfs:Yes
get_mounts: dirs[6]:/proc/bus/usb fs:usbfs nfs:No
get_mounts: dirs[7]:/var/lib/stateless/writable fs:tmpfs nfs:No
get_mounts: dirs[8]:/var/cache/man fs:tmpfs nfs:No
get_mounts: dirs[9]:/var/lock fs:tmpfs nfs:No
get_mounts: dirs[10]:/var/log fs:tmpfs nfs:No
get_mounts: dirs[11]:/var/run fs:tmpfs nfs:No
get_mounts: dirs[12]:/var/lib/dbus fs:tmpfs nfs:No
get_mounts: dirs[13]:/var/lib/nfs fs:tmpfs nfs:No
get_mounts: dirs[14]:/tmp fs:tmpfs nfs:No
get_mounts: dirs[15]:/var/cache/foomatic fs:tmpfs nfs:No
get_mounts: dirs[16]:/var/cache/hald fs:tmpfs nfs:No
get_mounts: dirs[17]:/var/cache/logwatch fs:tmpfs nfs:No
get_mounts: dirs[18]:/var/lib/dhclient fs:tmpfs nfs:No
get_mounts: dirs[19]:/var/tmp fs:tmpfs nfs:No
get_mounts: dirs[20]:/media fs:tmpfs nfs:No
get_mounts: dirs[21]:/etc/adjtime fs:tmpfs nfs:No
get_mounts: dirs[22]:/etc/ntp.conf fs:tmpfs nfs:No
get_mounts: dirs[23]:/etc/resolv.conf fs:tmpfs nfs:No
get_mounts: dirs[24]:/etc/lvm/archive fs:tmpfs nfs:No
get_mounts: dirs[25]:/etc/lvm/backup fs:tmpfs nfs:No
get_mounts: dirs[26]:/var/account fs:tmpfs nfs:No
get_mounts: dirs[27]:/var/lib/iscsi fs:tmpfs nfs:No
get_mounts: dirs[28]:/var/lib/logrotate.status fs:tmpfs nfs:No
get_mounts: dirs[29]:/var/lib/ntp fs:tmpfs nfs:No
get_mounts: dirs[30]:/var/spool fs:tmpfs nfs:No
get_mounts: dirs[31]:/var/lib/sss fs:tmpfs nfs:No
get_mounts: dirs[32]:/etc/sysconfig/network-scripts fs:tmpfs nfs:No
get_mounts: dirs[33]:/var fs:ext4 nfs:No
get_mounts: already know dir[14]:/tmp
get_mounts: dirs[14]:/tmp fs:ext4 nfs:No
get_mounts: dirs[34]:/local fs:ext4 nfs:No
get_mounts: dirs[35]:/proc/sys/fs/binfmt_misc fs:binfmt_misc nfs:No
get_mounts: dirs[36]:/local/cgroup/cpuset fs:cgroup nfs:No
get_mounts: dirs[37]:/local/cgroup/cpu fs:cgroup nfs:No
get_mounts: dirs[38]:/local/cgroup/cpuacct fs:cgroup nfs:No
get_mounts: dirs[39]:/local/cgroup/memory fs:cgroup nfs:No
get_mounts: dirs[40]:/local/cgroup/devices fs:cgroup nfs:No
get_mounts: dirs[41]:/local/cgroup/freezer fs:cgroup nfs:No
get_mounts: dirs[42]:/local/cgroup/net_cls fs:cgroup nfs:No
get_mounts: dirs[43]:/local/cgroup/blkio fs:cgroup nfs:No
get_mounts: dirs[44]:/usr/pppl fs:nfs nfs:Yes
get_mounts: dirs[45]:/misc fs:autofs nfs:No
get_mounts: dirs[46]:/net fs:autofs nfs:No
get_mounts: dirs[47]:/v fs:autofs nfs:No
get_mounts: dirs[48]:/u fs:autofs nfs:No
get_mounts: dirs[49]:/w fs:autofs nfs:No
get_mounts: dirs[50]:/l fs:autofs nfs:No
get_mounts: dirs[51]:/p fs:autofs nfs:No
get_mounts: dirs[52]:/pfs fs:autofs nfs:No
get_mounts: dirs[53]:/proc/fs/nfsd fs:nfsd nfs:No
get_mounts: dirs[54]:/u/gtchilin fs:nfs nfs:Yes
get_mounts: dirs[55]:/u/ldelgado fs:nfs nfs:Yes
get_mounts: dirs[56]:/p/in

[OMPI users] OpenMPI 2.1.0 + PGI 17.3 = asm test failures

2017-04-27 Thread Prentice Bisbal
I'm building Open MPI 2.1.0 with PGI 17.3, and now I'm getting 'illegal 
instruction' errors during 'make check':


../../config/test-driver: line 107: 65169 Illegal instruction "$@" > 
$log_file 2>&1

FAIL: atomic_math
- 1 threads: Passed

That's just one example of the error output. See all relevant error 
output below.


Usually, I see these errors when trying to run an executable on a 
processor that doesn't support the instruction set of the executable. I 
used to see this all the time when I supported an IBM Blue Gene/P 
system. I don't think I've ever seen it on an x86 system.


 I'm passing the argument '-tp=x64' to pgcc to build a unified binary, 
so that might be part of the problem, but I've used this exact same 
process to build 2.1.0 with PGI 16.5 just a couple hours ago. I also 
built 1.10.3 with the same compiler flags with PGI 16.5 and 17.3 without 
this error.


Any ideas?

The relevant output from 'make check':

make  check-TESTS
make[3]: Entering directory `/local/pbisbal/openmpi-2.1.0/test/asm'
make[4]: Entering directory `/local/pbisbal/openmpi-2.1.0/test/asm'
basename: extra operand `--test-name'
Try `basename --help' for more information.
--> Testing
PASS: atomic_barrier
- 1 threads: Passed
PASS: atomic_barrier
- 2 threads: Passed
PASS: atomic_barrier
- 4 threads: Passed
PASS: atomic_barrier
- 5 threads: Passed
PASS: atomic_barrier
- 8 threads: Passed
basename: extra operand `--test-name'
Try `basename --help' for more information.
--> Testing
PASS: atomic_barrier_noinline
- 1 threads: Passed
PASS: atomic_barrier_noinline
- 2 threads: Passed
PASS: atomic_barrier_noinline
- 4 threads: Passed
PASS: atomic_barrier_noinline
- 5 threads: Passed
PASS: atomic_barrier_noinline
- 8 threads: Passed
basename: extra operand `--test-name'
Try `basename --help' for more information.
--> Testing
PASS: atomic_spinlock
- 1 threads: Passed
PASS: atomic_spinlock
- 2 threads: Passed
PASS: atomic_spinlock
- 4 threads: Passed
PASS: atomic_spinlock
- 5 threads: Passed
PASS: atomic_spinlock
- 8 threads: Passed
basename: extra operand `--test-name'
Try `basename --help' for more information.
--> Testing
PASS: atomic_spinlock_noinline
- 1 threads: Passed
PASS: atomic_spinlock_noinline
- 2 threads: Passed
PASS: atomic_spinlock_noinline
- 4 threads: Passed
PASS: atomic_spinlock_noinline
- 5 threads: Passed
PASS: atomic_spinlock_noinline
- 8 threads: Passed
basename: extra operand `--test-name'
Try `basename --help' for more information.
--> Testing
../../config/test-driver: line 107: 65169 Illegal instruction "$@" > 
$log_file 2>&1

FAIL: atomic_math
- 1 threads: Passed
../../config/test-driver: line 107: 65172 Illegal instruction "$@" > 
$log_file 2>&1

FAIL: atomic_math
- 2 threads: Passed
../../config/test-driver: line 107: 65176 Illegal instruction "$@" > 
$log_file 2>&1

FAIL: atomic_math
- 4 threads: Passed
../../config/test-driver: line 107: 65180 Illegal instruction "$@" > 
$log_file 2>&1

FAIL: atomic_math
- 5 threads: Passed
../../config/test-driver: line 107: 65185 Illegal instruction "$@" > 
$log_file 2>&1

FAIL: atomic_math
- 8 threads: Passed
basename: extra operand `--test-name'
Try `basename --help' for more information.
--> Testing
../../config/test-driver: line 107: 65195 Illegal instruction "$@" > 
$log_file 2>&1

FAIL: atomic_math_noinline
- 1 threads: Passed
../../config/test-driver: line 107: 65198 Illegal instruction "$@" > 
$log_file 2>&1

FAIL: atomic_math_noinline
- 2 threads: Passed
../../config/test-driver: line 107: 65202 Illegal instruction "$@" > 
$log_file 2>&1

FAIL: atomic_math_noinline
- 4 threads: Passed
../../config/test-driver: line 107: 65206 Illegal instruction "$@" > 
$log_file 2>&1

FAIL: atomic_math_noinline
- 5 threads: Passed
../../config/test-driver: line 107: 65210 Illegal instruction "$@" > 
$log_file 2>&1

FAIL: atomic_math_noinline
- 8 threads: Passed
basename: extra operand `--test-name'
Try `basename --help' for more information.
--> Testing
../../config/test-driver: line 107: 65220 Illegal instruction "$@" > 
$log_file 2>&1

FAIL: atomic_cmpset
- 1 threads: Passed
../../config/test-driver: line 107: 65223 Illegal instruction "$@" > 
$log_file 2>&1

FAIL: atomic_cmpset
- 2 threads: Passed
../../config/test-driver: line 107: 65227 Illegal instruction "$@" > 
$log_file 2>&1

FAIL: atomic_cmpset
- 4 threads: Passed
../../config/test-driver: line 107: 65231 Illegal instruction "$@" > 
$log_file 2>&1

FAIL: atomic_cmpset
- 5 threads: Passed
../../config/test-driver: line 107: 65235 Illegal instruction "$@" > 
$log_file 2>&1

FAIL: atomic_cmpset
- 8 threads: Passed
basename: extra operand `--test-name'
Try `basename --help' for more information.
--> Testing
../../config/test-driver: line 107: 65245 Illegal instruction "$@" > 
$log_file 2>&1

FAIL: atomic_cmpset_noinline
- 1 threads: Passed
../../config/test-driver: line 

Re: [OMPI users] OpenMPI 2.1.0 + PGI 17.3 = asm test failures

2017-04-28 Thread Prentice Bisbal

Update: removing the -fast switch caused this error to go away.

Prentice
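For context: -fast enables aggressive optimization, and the resulting "Illegal instruction" typically means the compiler emitted instructions the build host's CPU doesn't implement. On Linux you can list the SIMD extensions the host actually advertises (a sketch; which flags appear depends on the machine):

```shell
# List the x86 SIMD extension flags the CPU advertises; a SIGILL during
# 'make check' suggests the compiler targeted an extension not listed.
flags=$(grep -m1 '^flags' /proc/cpuinfo | tr ' ' '\n' | grep -E '^(sse|avx)' | sort -u)
if [ -n "$flags" ]; then
    echo "$flags"
else
    echo "no x86 SIMD flags found (non-x86 host?)"
fi
```

Comparing that list against what the compiler targeted narrows down whether dropping -fast (as done here) or pinning the target processor flag is the right fix.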

On 04/27/2017 06:00 PM, Prentice Bisbal wrote:
I'm building Open MPI 2.1.0 with PGI 17.3, and now I'm getting 
'illegal instruction' errors during 'make check':


../../config/test-driver: line 107: 65169 Illegal instruction "$@" > 
$log_file 2>&1

FAIL: atomic_math
- 1 threads: Passed

That's just one example of the error output. See all relevant error 
output below.


Usually, I see these errors when trying to run an executable on a 
processor that doesn't support the instruction set of the executable. 
I used to see this all the time when I supported an IBM Blue Gene/P 
system. I don't think I've ever seen it on an x86 system.


 I'm passing the argument '-tp=x64' to pgcc to build a unified binary, 
so that might be part of the problem, but I've used this exact same 
process to build 2.1.0 with PGI 16.5 just a couple hours ago. I also 
built 1.10.3 with the same compiler flags with PGI 16.5 and 17.3 
without this error.


Any ideas?

The relevant output from 'make check':

make  check-TESTS
make[3]: Entering directory `/local/pbisbal/openmpi-2.1.0/test/asm'
make[4]: Entering directory `/local/pbisbal/openmpi-2.1.0/test/asm'
basename: extra operand `--test-name'
Try `basename --help' for more information.
--> Testing
PASS: atomic_barrier
- 1 threads: Passed
PASS: atomic_barrier
- 2 threads: Passed
PASS: atomic_barrier
- 4 threads: Passed
PASS: atomic_barrier
- 5 threads: Passed
PASS: atomic_barrier
- 8 threads: Passed
basename: extra operand `--test-name'
Try `basename --help' for more information.
--> Testing
PASS: atomic_barrier_noinline
- 1 threads: Passed
PASS: atomic_barrier_noinline
- 2 threads: Passed
PASS: atomic_barrier_noinline
- 4 threads: Passed
PASS: atomic_barrier_noinline
- 5 threads: Passed
PASS: atomic_barrier_noinline
- 8 threads: Passed
basename: extra operand `--test-name'
Try `basename --help' for more information.
--> Testing
PASS: atomic_spinlock
- 1 threads: Passed
PASS: atomic_spinlock
- 2 threads: Passed
PASS: atomic_spinlock
- 4 threads: Passed
PASS: atomic_spinlock
- 5 threads: Passed
PASS: atomic_spinlock
- 8 threads: Passed
basename: extra operand `--test-name'
Try `basename --help' for more information.
--> Testing
PASS: atomic_spinlock_noinline
- 1 threads: Passed
PASS: atomic_spinlock_noinline
- 2 threads: Passed
PASS: atomic_spinlock_noinline
- 4 threads: Passed
PASS: atomic_spinlock_noinline
- 5 threads: Passed
PASS: atomic_spinlock_noinline
- 8 threads: Passed
basename: extra operand `--test-name'
Try `basename --help' for more information.
--> Testing
../../config/test-driver: line 107: 65169 Illegal instruction "$@" > 
$log_file 2>&1

FAIL: atomic_math
- 1 threads: Passed
../../config/test-driver: line 107: 65172 Illegal instruction "$@" > 
$log_file 2>&1

FAIL: atomic_math
- 2 threads: Passed
../../config/test-driver: line 107: 65176 Illegal instruction "$@" > 
$log_file 2>&1

FAIL: atomic_math
- 4 threads: Passed
../../config/test-driver: line 107: 65180 Illegal instruction "$@" > 
$log_file 2>&1

FAIL: atomic_math
- 5 threads: Passed
../../config/test-driver: line 107: 65185 Illegal instruction "$@" > 
$log_file 2>&1

FAIL: atomic_math
- 8 threads: Passed
basename: extra operand `--test-name'
Try `basename --help' for more information.
--> Testing
../../config/test-driver: line 107: 65195 Illegal instruction "$@" > 
$log_file 2>&1

FAIL: atomic_math_noinline
- 1 threads: Passed
../../config/test-driver: line 107: 65198 Illegal instruction "$@" > 
$log_file 2>&1

FAIL: atomic_math_noinline
- 2 threads: Passed
../../config/test-driver: line 107: 65202 Illegal instruction "$@" > 
$log_file 2>&1

FAIL: atomic_math_noinline
- 4 threads: Passed
../../config/test-driver: line 107: 65206 Illegal instruction "$@" > 
$log_file 2>&1

FAIL: atomic_math_noinline
- 5 threads: Passed
../../config/test-driver: line 107: 65210 Illegal instruction "$@" > 
$log_file 2>&1

FAIL: atomic_math_noinline
- 8 threads: Passed
basename: extra operand `--test-name'
Try `basename --help' for more information.
--> Testing
../../config/test-driver: line 107: 65220 Illegal instruction "$@" > 
$log_file 2>&1

FAIL: atomic_cmpset
- 1 threads: Passed
../../config/test-driver: line 107: 65223 Illegal instruction "$@" > 
$log_file 2>&1

FAIL: atomic_cmpset
- 2 threads: Passed
../../config/test-driver: line 107: 65227 Illegal instruction "$@" > 
$

Re: [OMPI users] OpenMPI 2.1.0 + PGI 17.3 = asm test failures

2017-05-01 Thread Prentice Bisbal

Jeff,

Why IBM? This problem is caused by the PGI compilers, so shouldn't this 
be directed towards NVidia, which now owns PGI?


Prentice

On 04/29/2017 07:37 AM, Jeff Squyres (jsquyres) wrote:

IBM: can someone check to see if this is a compiler error?



On Apr 28, 2017, at 5:09 PM, Prentice Bisbal  wrote:

Update: removing the -fast switch caused this error to go away.

Prentice

On 04/27/2017 06:00 PM, Prentice Bisbal wrote:

I'm building Open MPI 2.1.0 with PGI 17.3, and now I'm getting 'illegal 
instruction' errors during 'make check':

../../config/test-driver: line 107: 65169 Illegal instruction "$@" > $log_file 
2>&1
FAIL: atomic_math
- 1 threads: Passed

That's just one example of the error output. See all relevant error output 
below.

Usually, I see these errors when trying to run an executable on a processor 
that doesn't support the instruction set of the executable. I used to see this 
all the time when I supported an IBM Blue Gene/P system. I don't think I've 
ever seen it on an x86 system.

I'm passing the argument '-tp=x64' to pgcc to build a unified binary, so that 
might be part of the problem, but I've used this exact same process to build 
2.1.0 with PGI 16.5 just a couple hours ago. I also built 1.10.3 with the same 
compiler flags with PGI 16.5 and 17.3 without this error.

Any ideas?

The relevant output from 'make check':

make  check-TESTS
make[3]: Entering directory `/local/pbisbal/openmpi-2.1.0/test/asm'
make[4]: Entering directory `/local/pbisbal/openmpi-2.1.0/test/asm'
basename: extra operand `--test-name'
Try `basename --help' for more information.
--> Testing
PASS: atomic_barrier
- 1 threads: Passed
PASS: atomic_barrier
- 2 threads: Passed
PASS: atomic_barrier
- 4 threads: Passed
PASS: atomic_barrier
- 5 threads: Passed
PASS: atomic_barrier
- 8 threads: Passed
basename: extra operand `--test-name'
Try `basename --help' for more information.
--> Testing
PASS: atomic_barrier_noinline
- 1 threads: Passed
PASS: atomic_barrier_noinline
- 2 threads: Passed
PASS: atomic_barrier_noinline
- 4 threads: Passed
PASS: atomic_barrier_noinline
- 5 threads: Passed
PASS: atomic_barrier_noinline
- 8 threads: Passed
basename: extra operand `--test-name'
Try `basename --help' for more information.
--> Testing
PASS: atomic_spinlock
- 1 threads: Passed
PASS: atomic_spinlock
- 2 threads: Passed
PASS: atomic_spinlock
- 4 threads: Passed
PASS: atomic_spinlock
- 5 threads: Passed
PASS: atomic_spinlock
- 8 threads: Passed
basename: extra operand `--test-name'
Try `basename --help' for more information.
--> Testing
PASS: atomic_spinlock_noinline
- 1 threads: Passed
PASS: atomic_spinlock_noinline
- 2 threads: Passed
PASS: atomic_spinlock_noinline
- 4 threads: Passed
PASS: atomic_spinlock_noinline
- 5 threads: Passed
PASS: atomic_spinlock_noinline
- 8 threads: Passed
basename: extra operand `--test-name'
Try `basename --help' for more information.
--> Testing
../../config/test-driver: line 107: 65169 Illegal instruction "$@" > $log_file 
2>&1
FAIL: atomic_math
- 1 threads: Passed
../../config/test-driver: line 107: 65172 Illegal instruction "$@" > $log_file 
2>&1
FAIL: atomic_math
- 2 threads: Passed
../../config/test-driver: line 107: 65176 Illegal instruction "$@" > $log_file 
2>&1
FAIL: atomic_math
- 4 threads: Passed
../../config/test-driver: line 107: 65180 Illegal instruction "$@" > $log_file 
2>&1
FAIL: atomic_math
- 5 threads: Passed
../../config/test-driver: line 107: 65185 Illegal instruction "$@" > $log_file 
2>&1
FAIL: atomic_math
- 8 threads: Passed
basename: extra operand `--test-name'
Try `basename --help' for more information.
--> Testing
../../config/test-driver: line 107: 65195 Illegal instruction "$@" > $log_file 
2>&1
FAIL: atomic_math_noinline
- 1 threads: Passed
../../config/test-driver: line 107: 65198 Illegal instruction "$@" > $log_file 
2>&1
FAIL: atomic_math_noinline
- 2 threads: Passed
../../config/test-driver: line 107: 65202 Illegal instruction "$@" > $log_file 
2>&1
FAIL: atomic_math_noinline
- 4 threads: Passed
../../config/test-driver: line 107: 65206 Illegal instruction "$@" > $log_file 
2>&1
FAIL: atomic_math_noinline
- 5 threads: Passed
../../config/test-driver: line 107: 65210 Illegal instruction "$@" > $log_file 
2>&1
FAIL: atomic_math_noinline
- 8 threads: Passed
basename: extra operand `--test-name'
Try `basename --help' for more information.
--> Testing
../../config/test-driver: line 107: 65220 Illegal instruction "$@" > $lo

Re: [OMPI users] OpenMPI 2.1.0 + PGI 17.3 = asm test failures

2017-05-01 Thread Prentice Bisbal

Jeff,

You probably were thrown off when I said I've only really seen this 
problem when people didn't cross-compile correctly on the Blue Gene/P I 
used to support. Also, PGI and IBM both have 3 letters... ;)


Prentice

On 05/01/2017 02:20 PM, Jeff Squyres (jsquyres) wrote:

Er... right.  Duh.



On May 1, 2017, at 11:21 AM, Prentice Bisbal  wrote:

Jeff,

Why IBM? This problem is caused by the PGI compilers, so shouldn't this be 
directed towards NVidia, which now owns PGI?

Prentice

On 04/29/2017 07:37 AM, Jeff Squyres (jsquyres) wrote:

IBM: can someone check to see if this is a compiler error?



On Apr 28, 2017, at 5:09 PM, Prentice Bisbal  wrote:

Update: removing the -fast switch caused this error to go away.

Prentice

On 04/27/2017 06:00 PM, Prentice Bisbal wrote:

I'm building Open MPI 2.1.0 with PGI 17.3, and now I'm getting 'illegal 
instruction' errors during 'make check':

../../config/test-driver: line 107: 65169 Illegal instruction "$@" > $log_file 
2>&1
FAIL: atomic_math
- 1 threads: Passed

That's just one example of the error output. See all relevant error output 
below.

Usually, I see these errors when trying to run an executable on a processor 
that doesn't support the instruction set of the executable. I used to see this 
all the time when I supported an IBM Blue Gene/P system. I don't think I've 
ever seen it on an x86 system.

I'm passing the argument '-tp=x64' to pgcc to build a unified binary, so that 
might be part of the problem, but I've used this exact same process to build 
2.1.0 with PGI 16.5 just a couple hours ago. I also built 1.10.3 with the same 
compiler flags with PGI 16.5 and 17.3 without this error.

Any ideas?


Re: [OMPI users] Strange OpenMPI errors on building Caffe 1.0

2017-05-05 Thread Prentice Bisbal
This error should really be posted to the Caffe mailing list. This is an 
error with Caffe. Most likely, you are not specifying the location of 
your Open MPI installation properly. And Caffe definitely depends on 
Open MPI. Your errors:



.build_release/lib/libcaffe.so: undefined reference to 
`ompi_mpi_cxx_op_intercept'
.build_release/lib/libcaffe.so: undefined reference to 
`MPI::Datatype::Free()'

.build_release/lib/libcaffe.so: undefined reference to `MPI::Comm::Comm()'
.build_release/lib/libcaffe.so: undefined reference to `MPI::Win::Free()'


are basically saying that the libcaffe shared library (libcaffe.so) was 
compiled with references to MPI functions, but the linker now can't find 
the libraries that actually provide those functions. This means that when 
libcaffe.so was compiled, the compiler could find the Open MPI headers 
containing the function prototypes, but the linker can't find the actual 
libraries.



To fix it, your command needs a -L argument specifying the path to where 
the Open MPI libraries are located, followed by -l (lower-case L) 
arguments for each of the MPI libraries you need. -lmpi is probably one 
of them, but most MPI implementations require additional libraries; for 
Open MPI these include -lopen-rte, -lopen-pal, etc.



While I think I've answered your questions, it's best you ask this on a 
Caffe mailing list, because if the build process could find your MPI 
headers but not your MPI libraries, you either configured your build 
incorrectly, or something about the Caffe configure/build process is 
broken, so you either need to find out how to configure your build 
correctly, or report a bug in the Caffe build process.



Prentice

On 05/04/2017 05:35 PM, Lane, William wrote:


I know this could possibly be off-topic, but the errors are OpenMPI 
errors and if anyone could shed light on the nature of these errors I 
figure it would be this group:

CXX/LD -o .build_release/tools/upgrade_solver_proto_text.bin
g++ .build_release/tools/upgrade_solver_proto_text.o -o 
.build_release/tools/upgrade_solver_proto_text.bin -pthread -fPIC 
-DCAFFE_VERSION=1.0.0-rc5 -DNDEBUG -O2 -DUSE_OPENCV -DUSE_LEVELDB 
-DUSE_LMDB -DCPU_ONLY -DWITH_PYTHON_LAYER 
-I/hpc/apps/python27/include/python2.7 
-I/hpc/apps/python27/externals/numpy/1.9.2/lib/python2.7/site-packages/numpy/core/include 
-I/usr/local/include -I/hpc/apps/hdf5/1.8.17/include 
-I.build_release/src -I./src -I./include 
-I/hpc/apps/atlas/3.10.2/include -Wall -Wno-sign-compare -lcaffe 
-L/hpc/apps/gflags/lib -L/hpc/apps/python27/lib 
-L/hpc/apps/python27/lib/python2.7 -L/hpc/apps/atlas/3.10.2/lib 
-L.build_release/lib  -lglog -lgflags -lprotobuf -lboost_system 
-lboost_filesystem -lm -lhdf5_hl -lhdf5 -lleveldb -lsnappy -llmdb 
-lopencv_core -lopencv_highgui -lopencv_imgproc -lboost_thread 
-lstdc++ -lboost_python -lpython2.7 -lcblas -latlas \

-Wl,-rpath,\$ORIGIN/../lib
.build_release/lib/libcaffe.so: undefined reference to 
`ompi_mpi_cxx_op_intercept'
.build_release/lib/libcaffe.so: undefined reference to 
`MPI::Datatype::Free()'

.build_release/lib/libcaffe.so: undefined reference to `MPI::Comm::Comm()'
.build_release/lib/libcaffe.so: undefined reference to `MPI::Win::Free()'
collect2: error: ld returned 1 exit status
I've read this may be due to a dependency of Caffe that uses OpenMPI 
(since I've been told Caffe itself doesn't use OpenMPI).


Would adding -l directives to LIBRARIES line in the Makefile for Caffe 
that reference all OpenMPI libraries fix this problem?

For example, -l mpi.

Thank you in advance. Hopefully this isn't entirely OT.

William L.

IMPORTANT WARNING: This message is intended for the use of the person 
or entity to which it is addressed and may contain information that is 
privileged and confidential, the disclosure of which is governed by 
applicable law. If the reader of this message is not the intended 
recipient, or the employee or agent responsible for delivering it to 
the intended recipient, you are hereby notified that any 
dissemination, distribution or copying of this information is strictly 
prohibited. Thank you for your cooperation.



___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users



Re: [OMPI users] Strange OpenMPI errors showing up in Caffe rc5 build

2017-05-05 Thread Prentice Bisbal


On 05/04/2017 09:08 PM, gil...@rist.or.jp wrote:


William,

the link error clearly shows libcaffe.so does require C++ bindings.

did you build caffe from a fresh tree ?

what if you

ldd libcaffe.so

nm libcaffe.so | grep -i ompi

if libcaffe.so does require mpi c++ bindings, it should depend on it

(otherwise the way it was built is questionable)

you might want to link with mpic++ instead of g++



This is a great point I missed in my previous e-mail on this topic. When 
compiling a program that uses MPI, you want to specify the MPI compiler 
wrappers for your C, C++, and Fortran compilers, and not your chosen 
compilers directly. For example:


./configure --prefix=/usr/local/foo-1.2.3 CC=mpicc CXX=mpicxx FC=mpif90

or something similar. This guarantees that the actual compiler is 
called with all the right flags for the C preprocessor, linker, etc. 
This will almost always prevent those linking errors.


note MPI C++ bindings are no longer built by default since v2.0, so you 
likely have to


configure --enable-mpi-cxx

last but not least, make sure caffe and openmpi were built with the 
same c++ compiler


Cheers,

Gilles

- Original Message -

I know this could possibly be off-topic, but the errors are
OpenMPI errors and if anyone could shed light on the nature of
these errors I figure it would be this group:

CXX/LD -o .build_release/tools/upgrade_solver_proto_text.bin
g++ .build_release/tools/upgrade_solver_proto_text.o -o
.build_release/tools/upgrade_solver_proto_text.bin -pthread
-fPIC -DCAFFE_VERSION=1.0.0-rc5 -DNDEBUG -O2 -DUSE_OPENCV
-DUSE_LEVELDB -DUSE_LMDB -DCPU_ONLY -DWITH_PYTHON_LAYER
-I/hpc/apps/python27/include/python2.7

-I/hpc/apps/python27/externals/numpy/1.9.2/lib/python2.7/site-packages/numpy/core/include
-I/usr/local/include -I/hpc/apps/hdf5/1.8.17/include
-I.build_release/src -I./src -I./include
-I/hpc/apps/atlas/3.10.2/include -Wall -Wno-sign-compare
-lcaffe -L/hpc/apps/gflags/lib -L/hpc/apps/python27/lib
-L/hpc/apps/python27/lib/python2.7
-L/hpc/apps/atlas/3.10.2/lib -L.build_release/lib -lglog
-lgflags -lprotobuf -lboost_system -lboost_filesystem -lm
-lhdf5_hl -lhdf5 -lleveldb -lsnappy -llmdb -lopencv_core
-lopencv_highgui -lopencv_imgproc -lboost_thread -lstdc++
-lboost_python -lpython2.7 -lcblas -latlas \
-Wl,-rpath,\$ORIGIN/../lib
.build_release/lib/libcaffe.so: undefined reference to
`ompi_mpi_cxx_op_intercept'
.build_release/lib/libcaffe.so: undefined reference to
`MPI::Datatype::Free()'
.build_release/lib/libcaffe.so: undefined reference to
`MPI::Comm::Comm()'
.build_release/lib/libcaffe.so: undefined reference to
`MPI::Win::Free()'
collect2: error: ld returned 1 exit status

I've read this may be due to a dependency of Caffe that uses
OpenMPI (since I've been told Caffe itself doesn't use OpenMPI).

Would adding -l directives to LIBRARIES line in the Makefile for
Caffe that reference all OpenMPI libraries fix this problem?

For example, -l mpi.

Thank you in advance. Hopefully this isn't entirely OT.

William L.








Re: [OMPI users] Strange OpenMPI errors showing up in Caffe rc5 build

2017-05-08 Thread Prentice Bisbal


On 05/06/2017 03:28 AM, Lane, William wrote:


The strange thing is OpenMPI isn't mentioned anywhere as being a 
dependency for Caffe! I haven't read anything that suggests OpenMPI is 
supported  in Caffe either. This is why I figure it must be a 
dependency of Caffe (of which there are 15) that relies on OpenMPI.




Are you sure you didn't download Caffe MPI by accident? Both versions are 
open-source and available from Github:


http://www.inspursystems.com/dl/open-source-caffe-mpi-download/



I tried setting the compiler to mpic++ in the Makefile.config file and 
the result was:


Makefile:314: *** Cannot static link with the mpic++ compiler.  Stop.


I'm going to try explicitly enumerating all OpenMPI libraries in 
Makefile.config and see if that makes a difference.




I would not recommend that. It's always better to use the wrapper 
scripts (mpicc, mpic++, mpif90, etc.). If that's not working, it would 
be better for you to find out why and fix that problem. I would start 
with a simple MPI-enabled "Hello, world!" C++ program. See if you can 
compile that, and go from there. Start simple and work your way up from 
there.


I've seen errors similar to yours in the past that were caused by the 
wrong switches being passed to the compiler. I've also seen similar 
errors when the compiler command itself was screwed up (a typo, etc.). Also, 
check to make sure that the static OpenMPI libraries exist. I think 
they're built by default, but I could be wrong. I always explicitly 
specify building both static and dynamic libraries for all my software 
at configure time.


Also, when you post error messages, ALWAYS include the command that 
caused the error. An error message by itself, like the one above, 
doesn't give us much information to help you diagnose the problem. If 
you provided the command, it's possible that someone on the list could 
immediately see a problem with it and quickly pinpoint the cause.


Prentice



Thanks for your help, the Caffe listserve group doesn't have any 
answers for this issue (except use the Docker image).



-William L.

--------
*From:* users  on behalf of Prentice 
Bisbal 

*Sent:* Friday, May 5, 2017 7:47:39 AM
*To:* users@lists.open-mpi.org
*Subject:* Re: [OMPI users] Strange OpenMPI errors showing up in Caffe 
rc5 build


On 05/04/2017 09:08 PM, gil...@rist.or.jp wrote:


William,

the link error clearly shows libcaffe.so does require C++ bindings.

did you build caffe from a fresh tree ?

what if you

ldd libcaffe.so

nm libcaffe.so | grep -i ompi

if libcaffe.so does require mpi c++ bindings, it should depend on it

(otherwise the way it was built is questionable)

you might want to link with mpic++ instead of g++



This is a great point I missed in my previous e-mail on this topic. 
When compiling a program that uses MPI, you want to specify the MPI 
compiler wrappers for your C, C++, and Fortran compilers, and not your 
chosen compilers directly. For example:


./configure --prefix=/usr/local/foo-1.2.3 CC=mpicc CXX=mpicxx FC=mpif90

or something similar. This guarantees that the actual compiler is 
called with all the right flags for the C preprocessor, linker, etc. 
This will almost always prevent those linking errors.


note MPI C++ bindings are no longer built by default since v2.0, so you 
likely have to


configure --enable-mpi-cxx

last but not least, make sure caffe and openmpi were built with the 
same c++ compiler


Cheers,

Gilles

- Original Message -

I know this could possibly be off-topic, but the errors are
OpenMPI errors and if anyone could shed light on the nature of
these errors I figure it would be this group:

CXX/LD -o .build_release/tools/upgrade_solver_proto_text.bin
g++ .build_release/tools/upgrade_solver_proto_text.o -o
.build_release/tools/upgrade_solver_proto_text.bin -pthread
-fPIC -DCAFFE_VERSION=1.0.0-rc5 -DNDEBUG -O2 -DUSE_OPENCV
-DUSE_LEVELDB -DUSE_LMDB -DCPU_ONLY -DWITH_PYTHON_LAYER
-I/hpc/apps/python27/include/python2.7

-I/hpc/apps/python27/externals/numpy/1.9.2/lib/python2.7/site-packages/numpy/core/include
-I/usr/local/include -I/hpc/apps/hdf5/1.8.17/include
-I.build_release/src -I./src -I./include
-I/hpc/apps/atlas/3.10.2/include -Wall -Wno-sign-compare
-lcaffe -L/hpc/apps/gflags/lib -L/hpc/apps/python27/lib
-L/hpc/apps/python27/lib/python2.7
-L/hpc/apps/atlas/3.10.2/lib -L.build_release/lib -lglog
-lgflags -lprotobuf -lboost_system -lboost_filesystem -lm
-lhdf5_hl -lhdf5 -lleveldb -lsnappy -llmdb -lopencv_core
-lopencv_highgui -lopencv_imgproc -lboost_thread -lstdc++
-lboost_python -lpython2.7 -lcblas -latlas \
-Wl,-rpath,\$ORIGIN/../lib
.build_release/lib/libcaffe.so: undefined reference to

[OMPI users] bind-to-core with AMD CMT?

2017-08-24 Thread Prentice Bisbal

OpenMPI Users,

I am using AMD processors with CMT, where two cores constitute a 
module, and there is only one FPU per module, so each pair of cores has 
to share a single FPU. I want to use only one core per module so there 
is no contention between cores in the same module for the single FPU. Is 
this possible from the command line using mpirun with the correct 
binding specifications? If so, how would I do this?


I am using OpenMPI 1.10.3. I read the man page regarding the 
bind-to-core options, and I'm not sure that will do exactly what I want, 
so I figured I'd ask the experts here.


--
Prentice Bisbal
Lead Software Engineer
Princeton Plasma Physics Laboratory
http://www.pppl.gov

___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users


Re: [OMPI users] bind-to-core with AMD CMT?

2017-08-29 Thread Prentice Bisbal

I'd like to follow up to my own e-mail...

After playing around with the --bind-to options, it seems there is no 
way to do this with AMD CMT processors, since their paired cores are 
actual physical cores, not hardware threads that appear as "logical 
cores" as with Intel processors with hyperthreading. Which, in 
hindsight, makes perfect sense.


In the BIOS, you can reduce the number of cores to match the number 
of FPUs. On the SuperMicro systems I was testing on, the option is 
called "Downcore" (or something like that), and I set it to a value of 
"compute unit".


Prentice

On 08/24/2017 03:11 PM, Prentice Bisbal wrote:

OpenMPI Users,

I am using AMD processors with CMT, where two cores constitute a 
module, and there is only one FPU per module, so each pair of cores 
has to share a single FPU.  I want to use only one core per module so 
there is no contention between cores in the same module for the single 
FPU. Is this possible from the command-line using mpirun with the 
correct binding specifications? If so, how would I do this?


I am using OpenMPI 1.10.3. I read the man page regarding the 
bind-to-core options, and I'm not sure that will do exactly what I 
want, so I figured I'd ask the experts here.






Re: [OMPI users] Building OpenMPI 10.4 with PGI fortran 10.8 and gcc

2010-09-15 Thread Prentice Bisbal
How good are you with reading/editing Makefiles? I find problems like
this are usually solved by searching the Makefiles for the offending
line(s) and removing the offending switch.

In a well-designed make environment, you should only have to edit the
top-level Makefile. In the worst case, you'll have to edit every
Makefile. Fortunately, you can usually speed this up with some shell
kung-fu, if necessary.

This of course doesn't work if the developers were "clever" enough to
have a build environment that overwrites the Makefiles with new ones
every time you try to build. I don't think this applies to Open MPI.

Prentice


Axel Schweiger wrote:
>  Trying to build a hybrid OpenMPI with PGI fortran and gcc to support
> WRF model
> The problem appears to be due to a -pthread switch passed to pgfortran.
> 
> 
> 
> libtool: link: pgfortran -shared  -fpic -Mnomain  .libs/mpi.o
> .libs/mpi_sizeof.o .libs/mpi_comm_spawn_multiple_f90.o
> .libs/mpi_testall_f90.o .libs/mpi_testsome_f90.o .libs/mpi_waitall_f90.o
> .libs/mpi_waitsome_f90.o .libs/mpi_wtick_f90.o .libs/mpi_wtime_f90.o  
> -Wl,-rpath -Wl,/home/axel/AxboxInstall/openmpi-1.4.2/ompi/.libs
> -Wl,-rpath -Wl,/home/axel/AxboxInstall/openmpi-1.4.2/orte/.libs
> -Wl,-rpath -Wl,/home/axel/AxboxInstall/openmpi-1.4.2/opal/.libs
> -Wl,-rpath -Wl,/opt/openmpi-pgi-gcc-1.42/lib
> -L/home/axel/AxboxInstall/openmpi-1.4.2/orte/.libs
> -L/home/axel/AxboxInstall/openmpi-1.4.2/opal/.libs
> ../../../ompi/.libs/libmpi.so
> /home/axel/AxboxInstall/openmpi-1.4.2/orte/.libs/libopen-rte.so
> /home/axel/AxboxInstall/openmpi-1.4.2/opal/.libs/libopen-pal.so -ldl
> -lnsl -lutil -lm-pthread -Wl,-soname -Wl,libmpi_f90.so.0 -o
> .libs/libmpi_f90.so.0.0.0
> pgfortran-Error-Unknown switch: -pthread
> make[4]: *** [libmpi_f90.la] Error 1
> 
> 
> There has been discussion on this issue and the below solution
> suggested. This doesn't appear to work for the 10.8
> release.
> 
> http://www.open-mpi.org/community/lists/users/2009/04/8911.php
> 
> There was a previous thread:
> http://www.open-mpi.org/community/lists/users/2009/03/8687.php
> 
> suggesting other solutions.
> 
> Wondering if there is a better solution right now? Building 1.4.2
> 
> Thanks
> Axel





Re: [OMPI users] Test Program works on 1, 2 or 3 nodes. Hangs on 4 or more nodes.

2010-09-21 Thread Prentice Bisbal
Ethan Deneault wrote:
> All,
> 
> I am running Scientific Linux 5.5, with OpenMPI 1.4 installed into the
> /usr/lib/openmpi/1.4-gcc/ directory. I know this is typically
> /opt/openmpi, but Red Hat does things differently. I have my PATH and
> LD_LIBRARY_PATH set correctly; because the test program does compile and
> run.
> 
> The cluster consists of 10 Intel Pentium 4 diskless nodes. The master is
> an AMD x86_64 machine which serves the diskless node images and /home as
> an NFS mount. I compile all of my programs as 32-bit.
> 
> My code is a simple hello world:
> $ more test.f
>   program test
> 
>   include 'mpif.h'
>   integer rank, size, ierror, tag, status(MPI_STATUS_SIZE)
> 
>   call MPI_INIT(ierror)
>   call MPI_COMM_SIZE(MPI_COMM_WORLD, size, ierror)
>   call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierror)
>   print*, 'node', rank, ': Hello world'
>   call MPI_FINALIZE(ierror)
>   end
> 
> If I run this program with:
> 
> $ mpirun --machinefile testfile ./test.out
>  node   0 : Hello world
>  node   2 : Hello world
>  node   1 : Hello world
> 
> This is the expected output. Here, testfile contains the master node:
> 'pleiades', and two slave nodes: 'taygeta' and 'm43'
> 
> If I add another machine to testfile, say 'asterope', it hangs until I
> ctrl-c it. I have tried every machine, and as long as I do not include
> more than 3 hosts, the program will not hang.
> 
> I have run the debug-daemons flag with it as well, and I don't see what
> is wrong specifically.
> 

I'm assuming you already tested ssh connectivity and verified everything
is working as it should. (You did test all that, right?)

This sounds like a configuration problem on one of the nodes, or a problem
with ssh. I suspect it's not a problem with the number of processes, but
that whichever node is the 4th in your machinefile has a connectivity or
configuration issue.

I would try the following:

1. reorder the list of hosts in your machine file.

2. Run the mpirun command from a different host. I'd try running it from
several different hosts.

3. Change your machinefile to include 4 completely different hosts.

I think someone else recommended that you should be specifying the
number of processes with -np. I second that.

If the above fails, you might want to post the machinefile you're using.
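Suggestion 1 above can be scripted. A throwaway sketch using the host 
names from this thread — rotate the file so a different host lands in 
the 4th slot (the file path is made up):

```shell
# Rotate the host order in a scratch machinefile: move the last host
# to the front, so the previously-4th node is exercised in a new slot.
set -e
tmp=$(mktemp -d)
printf 'pleiades\ntaygeta\nm43\nasterope\n' > "$tmp/machinefile"
{ tail -n 1 "$tmp/machinefile"; head -n 3 "$tmp/machinefile"; } \
    > "$tmp/machinefile.rotated"
cat "$tmp/machinefile.rotated"
```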

-- 
Prentice


Re: [OMPI users] Test Program works on 1, 2 or 3 nodes. Hangs on 4 or more nodes.

2010-09-21 Thread Prentice Bisbal
Ashley Pittman wrote:
> This smacks of a firewall issue. I thought you'd said you weren't using one, 
> but now I read back your emails I can't see anywhere where you say that.  Are 
> you running a firewall or any iptables rules on any of the nodes?  It looks 
> to me like you may have some firewall setup on the worker nodes.
> 
> Ashley.
> 

I agree with Ashley. To make sure it's not an iptables or SELinux
problem on one of the nodes, run these two commands on all the nodes and
then try again:

service iptables stop
setenforce 0


-- 
Prentice


Re: [OMPI users] open MPI please recommend a debugger for open MPI

2010-11-03 Thread Prentice Bisbal
Jack Bryan wrote:
> Hi,
> 
> Would you please recommend a debugger, which can do debugging for
> parallel processes 
> on Open MPI systems ? 
> 
> I hope that it can be installed without root right because I am not a
> root user for our
> MPI cluster. 
> 
> Any help is appreciated. 
> 

Well-placed printf statements are tough to beat.

-- 
Prentice


Re: [OMPI users] open MPI please recommend a debugger for open MPI

2010-11-03 Thread Prentice Bisbal
Jeff Squyres wrote:
> On Nov 3, 2010, at 10:58 AM, Prentice Bisbal wrote:
> 
>> Well-placed printf statements are tough to beat.
> 
> Ouch.
> 
> Please read:
> 
> - http://blogs.cisco.com/ciscotalk-performance/parallel_debugging/.
> - http://cw.squyres.com/columns/2004-12-CW-MPI-Mechanic.pdf
> - http://cw.squyres.com/columns/2005-01-CW-MPI-Mechanic.pdf
> 
> :-)
> 

So then to paraphrase Jon Stewart, you'd like to politely disagree with me?

:-)

-- 
Prentice


Re: [OMPI users] open MPI please recommend a debugger for open MPI

2010-11-03 Thread Prentice Bisbal
Jeff Squyres wrote:
> On Nov 3, 2010, at 3:59 PM, Prentice Bisbal wrote:
> 
>> So then to paraphrase Jon Stewart, you'd like to politely disagree with me?
> 
> I think that would be quite reasonable.
> 
> ;-)

Perhaps we could discuss our differences over a spot of tea? [1]

> 
> This is not to say that I'm above printf debugging -- I do it all the time.  
> But... I know what I'm doing.  Ahem.  But seriously, I try to use printf's 
> for only truly trivial things; I use various kinds of debugging tools for 
> everything else (e.g., I'll frequently use gdb to examine corefiles and/or 
> attach to individual processes in parallel jobs to do heavy-lifting 
> debugging).
> 

I use the gdb technique, too, but didn't want to suggest that and then
be on the hook for writing detailed instructions here on how to do it.  :-)

[1] I was actually at the Sanity/Fear Rally on Saturday, with about 215k
of my closest friends.

-- 
Prentice


Re: [OMPI users] Open MPI data transfer error

2010-11-05 Thread Prentice Bisbal
Jack Bryan wrote:
> 
> Hi, 
> 
> In my Open MPI program, one master sends data to 3 workers.
> 
> Two workers can receive their data. 
> 
> But, the third worker cannot get its data. 
> 
> Before sending data, the master sends a head information to each worker
> receiver 
> so that each worker knows what the following data package is. (such as
> length, package tag).
>  
> The third worker can get its head information message from master but
> cannot get its correct 
> data package. 
> 
> It got the data that should be received by the first worker, which gets
> its correct data. 
> 


Jack,

Providing the relevant sections of code here would be very helpful.


I would tell you to add some printf statements to your code to see what
data is stored in your variables on the master before it sends them to
each node, but Jeff Squyres and I agreed to disagree in a civil manner
on that debugging technique earlier this week, and I'd hate to re-open
those old wounds by suggesting that technique here. ;)


-- 
Prentice


Re: [OMPI users] Open MPI data transfer error

2010-11-05 Thread Prentice Bisbal
We can't help you with your coding problem without seeing your code.


Jack Bryan wrote:
> Thanks, 
> I have used "cout" in c++ to print the values of data. 
> 
> The sender sends correct data to correct receiver. 
> 
> But, receiver gets wrong data from correct sender. 
> 
> why ? 
> 
> thanks 
> 
> Nov. 5 2010
> 
>> Date: Fri, 5 Nov 2010 08:54:22 -0400
>> From: prent...@ias.edu
>> To: us...@open-mpi.org
>> Subject: Re: [OMPI users] Open MPI data transfer error
>>
>> Jack Bryan wrote:
>> >
>> > Hi,
>> >
>> > In my Open MPI program, one master sends data to 3 workers.
>> >
>> > Two workers can receive their data.
>> >
>> > But, the third worker can not get their data.
>> >
>> > Before sending data, the master sends a head information to each worker
>> > receiver
>> > so that each worker knows what the following data package is. (such as
>> > length, package tag).
>> >
>> > The third worker can get its head information message from master but
>> > cannot get its correct
>> > data package.
>> >
>> > It got the data that should be received by first worker, which get its
>> > correct data.
>> >
>>
>>
>> Jack,
>>
>> Providing the relevant sections of code here would be very helpful.
>>
>> 
>> I would tell you to add some printf statements to your code to see what
>> data is stored in your variables on the master before it sends them to
>> each node, but Jeff Squyres and I agreed to disagree in a civil manner
>> on that debugging technique earlier this week, and I'd hate to re-open
>> those old wounds by suggesting that technique here. ;)
>> 
>>
>> --
>> Prentice


Re: [OMPI users] Open MPI data transfer error

2010-11-05 Thread Prentice Bisbal
Choose one:

A) Post only the relevant sections of the code. If you have a syntax
error, it should be in the Send and Receive calls, or in one of the lines
where the data is copied to or read from the array/buffer/whatever that
you're sending or receiving.

B) Try reproducing your problem in a toy program that has only enough
code to reproduce your problem. For example, create an array, populate
it with data, send it, and then on the receiving end, receive it, and
print it out. Something simple like that. I find when I do that, I
usually find the error in my code.
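Option (B) can even be tried without MPI installed: the sketch below uses a hypothetical in-process Mailbox class in place of the MPI runtime to show how a wildcard-tag receive can return the header message when the payload was expected — a common shape for the mismatch described in this thread. All names here are invented for illustration.

```python
from collections import deque

class Mailbox:
    """Hypothetical in-process stand-in for an MPI message queue."""
    def __init__(self):
        self.msgs = deque()

    def send(self, dest, tag, data):
        self.msgs.append((dest, tag, data))

    def recv(self, dest, tag=None):
        # tag=None mimics MPI_ANY_TAG: the first message addressed
        # to dest wins, whatever its tag.
        for i, (d, t, data) in enumerate(self.msgs):
            if d == dest and (tag is None or t == tag):
                del self.msgs[i]
                return t, data
        raise LookupError("no matching message")

box = Mailbox()
box.send(3, 0, "header")     # master sends the header first...
box.send(3, 99, "payload")   # ...then the data package

t, data = box.recv(3)        # wildcard receive: grabs the header
print(data)                  # -> header
t, data = box.recv(3, 99)    # explicit tag: grabs the right message
print(data)                  # -> payload
```

In real MPI code the analogous discipline is to match each MPI_Recv against an explicit source and tag rather than wildcards.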

Prentice


Jack Bryan wrote:
> Thanks,
> 
> But, my code is too long to be posted. 
> 
> dozens of files, thousands of lines. 
> 
> Do you have better ideas ? 
> 
> Any help is appreciated. 
> 
> Jack
> 
> Nov. 5 2010
> 
> From: solarbik...@gmail.com
> Date: Fri, 5 Nov 2010 11:20:57 -0700
> To: us...@open-mpi.org
> Subject: Re: [OMPI users] Open MPI data transfer error
> 
> As Prentice said, we can't help you without seeing your code.  openMPI
> has stood many trials from many programmers, with many bugs ironed out.
> So typically it is unlikely openMPI is the source of your error. 
> Without seeing your code the only logical conclusion is that something
> is wrong with your programming.
> 
> On Fri, Nov 5, 2010 at 10:52 AM, Prentice Bisbal <prent...@ias.edu> wrote:
> 
> We can't help you with your coding problem without seeing your code.
> 
> 
> Jack Bryan wrote:
> > Thanks,
> > I have used "cout" in c++ to print the values of data.
> >
> > The sender sends correct data to correct receiver.
> >
> > But, receiver gets wrong data from correct sender.
> >
> > why ?
> >
> > thanks
> >
> > Nov. 5 2010
> >
> >> Date: Fri, 5 Nov 2010 08:54:22 -0400
> >> From: prent...@ias.edu
> >> To: us...@open-mpi.org
> >> Subject: Re: [OMPI users] Open MPI data transfer error
> >>
> >> Jack Bryan wrote:
> >> >
> >> > Hi,
> >> >
> >> > In my Open MPI program, one master sends data to 3 workers.
> >> >
> >> > Two workers can receive their data.
> >> >
> >> > But, the third worker can not get their data.
> >> >
> >> > Before sending data, the master sends a head information to
> each worker
> >> > receiver
> >> > so that each worker knows what the following data package is.
> (such as
> >> > length, package tag).
> >> >
> >> > The third worker can get its head information message from
> master but
> >> > cannot get its correct
> >> > data package.
> >> >
> >> > It got the data that should be received by first worker, which
> get its
> >> > correct data.
> >> >
> >>
> >>
> >> Jack,
> >>
> >> Providing the relevant sections of code here would be very helpful.
> >>
> >> 
> >> I would tell you to add some printf statements to your code to
> see what
> >> data is stored in your variables on the master before it sends
> them to
> >> each node, but Jeff Squyres and I agreed to disagree in a civil
> manner
> >> on that debugging technique earlier this week, and I'd hate to
> re-open
> >> those old wounds by suggesting that technique here. ;)
> >> 
>     >>
> >> --
> >> Prentice
> 
> 
> 
> 
> -- 
> David Zhang
> University of California, San Diego
> 
> 
> 
> 
> 

-- 
Prentice Bisbal
Linux Software Support Specialist/System Administrator
School of Natural Sciences
Institute for Advanced Study
Princeton, NJ


Re: [OMPI users] Creating 64-bit objects?

2010-11-09 Thread Prentice Bisbal
How are you specifying these flags? Are you setting them as environment
variables, or are you adding them to the configure command line?

Can you show us the exact commands you used?

Prentice


Price, Brian M (N-KCI) wrote:
> OpenMPI version: 1.3.3 & 1.4.3
> 
> Platform: IBM P5
> 
> Issue:  I want OpenMPI to support some existing 64-bit FORTRAN software,
> but I can’t seem to get 64-bit objects from OpenMPI without some
> modification to the Makefile in ompi/mpi/f90.
> 
> I can configure, build, and install just fine with the following compilers:
> 
> -  CC = xlC_r
> 
> -  CXX = xlC_r
> 
> -  F77 = xlf95_r
> 
> -  FC = xlf95_r
> 
> But, this configuration produces 32-bit objects for all languages.
> 
> So, to produce 64-bit objects for all languages, I supply the following
> flags:
> 
> -  CFLAGS = -q64
> 
> -  CXXFLAGS = -q64
> 
> -  FFLAGS = -q64
> 
> -  FCFLAGS = -q64
> 
> This configuration results in the following error during the build (more
> specifically, link) phase:
> 
> -  When creating libmpi_f90.la in ompi/mpi/f90
> 
> -  COMMANDS:
> 
> o   /bin/sh ../../../libtool  --mode=link xlf95_r
> -I../../../ompi/include -I../../../ompi/include -I. -I.
> -I../../../ompi/mpi/f90  -q64 -version-info 0:1:0  -export-dynamic  -o
> libmpi_f90.la -rpath /lib mpi.lo mpi_sizeof.lo
> mpi_comm_spawn_multiple_f90.lo mpi_testall_f90.lo mpi_testsome_f90.lo
> mpi_waitall_f90.lo mpi_waitsome_f90.lo mpi_wtick_f90.lo
> mpi_wtime_f90.lo  ../../../ompi/libmpi.la -lnsl -lutil
> 
> o   libtool: link: /usr/bin/ld -m elf64ppc -shared  .libs/mpi.o
> .libs/mpi_sizeof.o .libs/mpi_comm_spawn_multiple_f90.o
> .libs/mpi_testall_f90.o .libs/mpi_testsome_f90.o .libs/mpi_waitall_f90.o
> .libs/mpi_waitsome_f90.o .libs/mpi_wtick_f90.o .libs/mpi_wtime_f90.lo 
> -L/orte/.libs -L/opal/.libs
> ../../../ompi/.libs/libmpi.so /orte/.libs/libopen-rte.so
> /opal/.libs/libopen-pal.so -ldl -lnsl -lutil  -q64  -soname
> libmpi_f90.so.0 -o .libs/libmpi_f90.so.0.0.1
> 
> -  OUTPUT:
> 
> /usr/bin/ld: unrecognized option ‘-q64’
> 
> /usr/bin/ld: use the --help option for usage information
> 
> make[4]: *** [libmpi_f90.la] Error 1
> 
> make[4]: Leaving directory `/ompi/mpi/f90`
> 
> make[3]: *** [all-recursive] Error 1
> 
> make[3]: Leaving directory `/ompi/mpi/f90`
> 
> make[2]: *** [all] Error 2
> 
> make[2]: Leaving directory `/ompi/mpi/f90`
> 
> make[1]: *** [all-recursive] Error 1
> 
> make[1]: Leaving directory `/ompi`
> 
> make: *** [all-recursive] Error 1
> 
>  
> 
> The -q64 option, while valid for the xlf95_r compiler, is not a valid
> option for /usr/bin/ld.  So, I’m wondering why this option got passed to
> /usr/bin/ld.  After looking at /ompi/mpi/f90/Makefile, I see
> that FCFLAGS shows up in link lines (“libmpi_f90_la_LINK” and
> “FCLINK”).  This direction seems to come from Makefile.in.
> 
> If I remove these FCFLAGS references from the Makefile, I am able to
> complete the build and install of OpenMPI, and it seems to correctly
> support my existing software.
> 
> So,  now for my question:
> 
> Should FCFLAGS show up on these links lines and, if so, how would I get
> 64-bit objects?
> 
> Thanks,
> 
> Brian Price
> 
>  
> 
> 
> 
> 



Re: [OMPI users] [Rocks-Discuss] compiling Openmpi on solaris studio express

2010-11-29 Thread Prentice Bisbal
No, it looks like ld is being called with the option -path, and your
linker doesn't use that switch. Grep your Makefile(s) for the string
"-path". It's probably in a statement defining LDFLAGS somewhere.

When you find it, replace it with the equivalent switch for your
compiler. You may be able to override its value on the configure
command-line, which is usually easiest/best:

./configure LDFLAGS="-notpath ... ... ..."

--
Prentice


Nehemiah Dacres wrote:
> it may have been that  I didn't set ld_library_path
> 
> On Mon, Nov 29, 2010 at 2:36 PM, Nehemiah Dacres <dacre...@slu.edu> wrote:
> 
> thank you, you have been doubly helpful, but I am having linking
> errors and I do not know what the solaris studio compiler's
> preferred linker is. The
> 
> the configure statement was
> 
> ./configure --prefix=/state/partition1/apps/sunmpi/
> --enable-mpi-threads --with-sge --enable-static
> --enable-sparse-groups CC=/opt/oracle/solstudio12.2/bin/suncc
> CXX=/opt/oracle/solstudio12.2/bin/sunCC
> F77=/opt/oracle/solstudio12.2/bin/sunf77
> FC=/opt/oracle/solstudio12.2/bin/sunf90
> 
>compile statement was
> 
> make all install 2>errors
> 
> 
> error below is
> 
> f90: Warning: Option -path passed to ld, if ld is invoked, ignored
> otherwise
> f90: Warning: Option -path passed to ld, if ld is invoked, ignored
> otherwise
> f90: Warning: Option -path passed to ld, if ld is invoked, ignored
> otherwise
> f90: Warning: Option -path passed to ld, if ld is invoked, ignored
> otherwise
> f90: Warning: Option -soname passed to ld, if ld is invoked, ignored
> otherwise
> /usr/bin/ld: unrecognized option '-path'
> /usr/bin/ld: use the --help option for usage information
> make[4]: *** [libmpi_f90.la] Error 2
> make[3]: *** [all-recursive] Error 1
> make[2]: *** [all] Error 2
> make[1]: *** [all-recursive] Error 1
> make: *** [all-recursive] Error 1
> 
> am I doing this wrong? are any of those configure flags unnecessary
> or inappropriate
> 
> 
> 
> On Mon, Nov 29, 2010 at 2:06 PM, Gus Correa <g...@ldeo.columbia.edu> wrote:
> 
> Nehemiah Dacres wrote:
> 
> I want to compile openmpi to work with the solaris studio
> express  or
> solaris studio. This is a different version than is installed on
> rockscluster 5.2  and would like to know if there any
> gotchas or configure
> flags I should use to get it working or portable to nodes on
> the cluster.
> Software-wise,  it is a fairly homogeneous environment with
> only slight
> variations on the hardware side which could be isolated
> (machinefile flag
> and what-not)
> Please advise
> 
> 
> Hi Nehemiah
> I just answered your email to the OpenMPI list.
> I want to add that if you build OpenMPI with Torque support,
> the machine file for each is not needed, it is provided by Torque.
> I believe the same is true for SGE (but I don't use SGE).
> Gus Correa
> 
> 
> 
> 
> -- 
> Nehemiah I. Dacres
> System Administrator 
> Advanced Technology Group Saint Louis University
> 
> 
> 
> 
> -- 
> Nehemiah I. Dacres
> System Administrator 
> Advanced Technology Group Saint Louis University
> 
> 
> 
> 



Re: [OMPI users] [Rocks-Discuss] compiling Openmpi on solaris studio express

2010-11-30 Thread Prentice Bisbal
Nehemiah Dacres wrote:
> that looks about right. So the suggestion:
> 
> ./configure LDFLAGS="-notpath ... ... ..."
> 
> -notpath should be replaced by whatever the proper flag should be, in my case 
> -L ? 

Yes, that's exactly what I meant. I should have chosen something better
than "-notpath" to say "put a value there that was not '-path'".

Not sure if my suggestion will help, given the bug report below. If
you're really determined, you can always try editing all the makefiles
after configure. Something like this might work:

find . -name Makefile -exec sed -i.bak s/-path/-L/g \{\} \;

Use that at your own risk. You might change instances of the string
'-path' that are actually correct.
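The same cleanup can be sketched with a dry-run mode, so each '-path' occurrence can be inspected before the blind '-path' → '-L' substitution is applied. This carries the same risk noted above; rewrite_makefiles is a hypothetical helper, not part of any build tool.

```python
import os

def rewrite_makefiles(root, old="-path", new="-L", dry_run=True):
    """Print every Makefile line containing `old`; optionally rewrite it.

    When dry_run is False, a backup copy is kept next to each
    rewritten file, like sed -i.bak.
    """
    changed = []
    for dirpath, _, files in os.walk(root):
        for name in files:
            if name != "Makefile":
                continue
            path = os.path.join(dirpath, name)
            with open(path) as f:
                text = f.read()
            if old not in text:
                continue
            for line in text.splitlines():
                if old in line:
                    print("%s: %s" % (path, line.strip()))
            if not dry_run:
                with open(path + ".bak", "w") as f:
                    f.write(text)
                with open(path, "w") as f:
                    f.write(text.replace(old, new))
            changed.append(path)
    return changed
```

Run it once with dry_run=True to review the matches, then again with dry_run=False to apply the substitution.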

Prentice

> 
> 
> On Mon, Nov 29, 2010 at 3:16 PM, Rolf vandeVaart
> <rolf.vandeva...@oracle.com> wrote:
> 
> This problem looks a lot like a thread from earlier today.  Can you
> look at this
> ticket and see if it helps?  It has a workaround documented in it.
> 
> https://svn.open-mpi.org/trac/ompi/ticket/2632
> 
> Rolf
> 
> 
> On 11/29/10 16:13, Prentice Bisbal wrote:
>> No, it looks like ld is being called with the option -path, and your
>> linker doesn't use that switch. Grep your Makefile(s) for the string
>> "-path". It's probably in a statement defining LDFLAGS somewhere.
>>
>> When you find it, replace it with the equivalent switch for your
>> compiler. You may be able to override its value on the configure
>> command-line, which is usually easiest/best:
>>
>> ./configure LDFLAGS="-notpath ... ... ..."
>>
>> --
>> Prentice
>>
>>
>> Nehemiah Dacres wrote:
>>   
>>> it may have been that  I didn't set ld_library_path
>>>
>>> On Mon, Nov 29, 2010 at 2:36 PM, Nehemiah Dacres <dacre...@slu.edu> wrote:
>>>
>>> thank you, you have been doubly helpful, but I am having linking
>>> errors and I do not know what the solaris studio compiler's
>>> preferred linker is. The
>>>
>>> the configure statement was
>>>
>>> ./configure --prefix=/state/partition1/apps/sunmpi/
>>> --enable-mpi-threads --with-sge --enable-static
>>> --enable-sparse-groups CC=/opt/oracle/solstudio12.2/bin/suncc
>>> CXX=/opt/oracle/solstudio12.2/bin/sunCC
>>> F77=/opt/oracle/solstudio12.2/bin/sunf77
>>> FC=/opt/oracle/solstudio12.2/bin/sunf90
>>>
>>>compile statement was
>>>
>>> make all install 2>errors
>>>
>>>
>>> error below is
>>>
>>> f90: Warning: Option -path passed to ld, if ld is invoked, ignored
>>> otherwise
>>> f90: Warning: Option -path passed to ld, if ld is invoked, ignored
>>> otherwise
>>> f90: Warning: Option -path passed to ld, if ld is invoked, ignored
>>> otherwise
>>> f90: Warning: Option -path passed to ld, if ld is invoked, ignored
>>> otherwise
>>> f90: Warning: Option -soname passed to ld, if ld is invoked, ignored
>>> otherwise
>>> /usr/bin/ld: unrecognized option '-path'
>>> /usr/bin/ld: use the --help option for usage information
>>> make[4]: *** [libmpi_f90.la] Error 2
>>> make[3]: *** [all-recursive] Error 1
>>> make[2]: *** [all] Error 2
>>> make[1]: *** [all-recursive] Error 1
>>> make: *** [all-recursive] Error 1
>>>
>>> am I doing this wrong? are any of those configure flags unnecessary
>>> or inappropriate
>>>
>>>
>>>
>>> On Mon, Nov 29, 2010 at 2:06 PM, Gus Correa <g...@ldeo.columbia.edu> wrote:
>>>
>>> Nehemiah Dacres wrote:
>>>
>>> I want to compile openmpi to work with the solaris studio
>>> express  or
>>> solaris studio. This is a different version than is 
>>> installed on
>>> rockscluster 5.2  and would like to know if there any
>>> gotchas or co

Re: [OMPI users] Help!!!!!!!!!!!!Openmpi instal for ubuntu 64 bits

2010-11-30 Thread Prentice Bisbal
Jeff Squyres wrote:
> Please note that this is an English-speaking list.  I don't know if Tim
> speaks "Spanish", but I unfortunately don't.  :-)
> 

s/Spanish/Portuguese/

-- 
Prentice


Re: [OMPI users] Method for worker to determine its "rank" on a single machine?

2010-12-10 Thread Prentice Bisbal



On 12/10/2010 07:55 AM, Ralph Castain wrote:

Ick - I agree that's portable, but truly ugly.

Would it make sense to implement this as an MPI extension, and then
perhaps propose something to the Forum for this purpose?


I think that makes sense. As core and socket counts go up, I imagine the 
need for this information will become more common as programmers try to 
explicitly keep codes on a single socket or node.
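Terry's recipe below — gather each rank's MPI_Get_processor_name string, have the root number the distinct names, scatter the numbers back as colors for MPI_Comm_split — is mostly bookkeeping, and the root-side step can be sketched without MPI. The gathered name list here is invented for illustration.

```python
def names_to_colors(names):
    """Root-side step: number each distinct processor name in order of
    first appearance; ranks use these as colors for MPI_Comm_split."""
    colors = {}
    return [colors.setdefault(n, len(colors)) for n in names]

# hypothetical MPI_Gather result for 6 ranks spread over 3 nodes
gathered = ["node1", "node2", "node1", "node3", "node2", "node1"]
print(names_to_colors(gathered))   # -> [0, 1, 0, 2, 1, 0]
```

After the colors are scattered back, each rank calls MPI_Comm_split on MPI_COMM_WORLD with its color (and its world rank as the key); its rank within the resulting communicator is exactly the "local rank" being asked about, and the communicator's size gives the per-node process count.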


Prentice



Just hate to see such a complex, time-consuming method when the info is
already available on every process.

On Dec 10, 2010, at 3:36 AM, Terry Dontje wrote:


A more portable way of doing what you want below is to gather each
processes processor_name given by MPI_Get_processor_name, have the
root who gets this data assign unique numbers to each name and then
scatter that info to the processes and have them use that as the color
to a MPI_Comm_split call. Once you've done that you can do a
MPI_Comm_size to find how many are on the node and be able to send to
all the other processes on that node using the new communicator.

Good luck,

--td
On 12/09/2010 08:18 PM, Ralph Castain wrote:

The answer is yes - sort of...

In OpenMPI, every process has information about not only its own local rank, 
but the local rank of all its peers regardless of what node they are on. We use 
that info internally for a variety of things.

Now the "sort of". That info isn't exposed via an MPI API at this time. If that 
doesn't matter, then I can tell you how to get it - it's pretty trivial to do.


On Dec 9, 2010, at 6:14 PM, David Mathog wrote:


Is it possible through MPI for a worker to determine:

  1. how many MPI processes are running on the local machine
  2. within that set its own "local rank"

?

For instance, a quad core with 4 processes might be hosting ranks 10,
14, 15, 20, in which case the "local ranks" would be 1,2,3,4.  The idea
being to use this information so that a program could selectively access
different local resources.  Simple example: on this 4 worker machine
reside telephone directories for Los Angeles, San Diego, San Jose, and
Sacramento.  Each worker is to open one database and search it when the
master sends a request.  With the "local rank" number this would be as
easy as naming the databases file1, file2, file3, and file4.  Without it
the 4 processes would have to communicate with each other somehow to
sort out which is to use which database.  And that could get ugly fast,
especially if they don't all start at the same time.

Thanks,

David Mathog
mat...@caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech





Re: [OMPI users] How closely tied is a specific release of OpenMPI to the host operating system and other system software?

2011-02-02 Thread Prentice Bisbal
Jeffrey A Cummings wrote:
> I use OpenMPI on a variety of platforms:  stand-alone servers running
> Solaris on sparc boxes and Linux (mostly CentOS) on AMD/Intel boxes,
> also Linux (again CentOS) on large clusters of AMD/Intel boxes.  These
> platforms all have some version of the 1.3 OpenMPI stream.  I recently
> requested an upgrade on all systems to 1.4.3 (for production work) and
> 1.5.1 (for experimentation).  I'm getting a lot of push back from the
> SysAdmin folks claiming that OpenMPI is closely intertwined with the
> specific version of the operating system and/or other system software
> (i.e., Rocks on the clusters).  I need to know if they are telling me
> the truth or if they're just making excuses to avoid the work.  To state
> my question another way:  Apparently each release of Linux and/or Rocks
> comes with some version of OpenMPI bundled in.  Is it dangerous in some
> way to upgrade to a newer version of OpenMPI?  Thanks in advance for any
> insight anyone can provide.
> 
> - Jeff
> 

Jeff,

OpenMPI is more or less a user-space program, and isn't that tightly
coupled to the OS at all. As long as the OS has the correct network
drivers (ethernet, IB, or other), that's all OpenMPI needs to do its
job. In fact, you can install it yourself in your own home directory (if
your home directory is shared amongst the cluster nodes you want to
use), and run it from there - no special privileges needed.

I have many different versions of OpenMPI installed on my systems,
without a problem.

As a system administrator responsible for maintaining OpenMPI on several
clusters, it sounds like one of two things:

1. Your system administrators really don't know what they're talking
about, or,

2. They're lying to you to avoid doing work.

--
Prentice


Re: [OMPI users] OpenMPI version syntax?

2011-02-03 Thread Prentice Bisbal
rpm -qi  might give you more detailed information.

If not, as a last resort, you can download and install the SRPM and
then look at the name of the tarball in /usr/src/redhat/SOURCES.
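On the version syntax itself: an RPM version-release string splits at its last hyphen, so "1.4-4" reads as upstream version 1.4 with packaging release 4 — consistent with the worry quoted below that this is a 1.4.0-era build, not 1.4.3. A quick sketch (hypothetical helper):

```python
def split_version_release(vr):
    """Split an RPM VERSION-RELEASE string at its last hyphen."""
    version, release = vr.rsplit("-", 1)
    return version, release

print(split_version_release("1.4-4"))    # -> ('1.4', '4')
print(split_version_release("1.4.3-1"))  # -> ('1.4.3', '1')
```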

Prentice

Jeffrey A Cummings wrote:
> The context was wrt the OpenMPI version that is bundled with a specific
> version of CentOS Linux which my IT folks are about to install on one of
> our servers.  Since the most recent 1.4 stream version is 1.4.3, I'm
> afraid that 1.4-4 is really some variant of 1.4 (i.e., 1.4.0) and hence
> not that new.
> 
> 
> 
> 
> From: Jeff Squyres
> To: Open MPI Users
> Date: 02/02/2011 07:38 PM
> Subject: Re: [OMPI users] OpenMPI version syntax?
> Sent by: users-boun...@open-mpi.org
> 
> 
> 
> 
> On Feb 2, 2011, at 1:44 PM, Jeffrey A Cummings wrote:
> 
>> I've encountered a supposed OpenMPI version of 1.4-4.  Is the hyphen a
> typo or is this syntax correct and if so what does it mean?
> 
> Is this an RPM version number?  It's fairly common for RPMs to add "-X"
> at the end of the version number.  The "X" indicates the RPM version
> number (i.e., the version number of the packaging -- not the package
> itself).
> 
> Open MPI's version number scheme is explained here:
> 
>http://www.open-mpi.org/software/ompi/versions/
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 


Re: [OMPI users] libmpi.so.0 not found during gdb debugging

2011-02-11 Thread Prentice Bisbal
swagat mishra wrote:
> hello everyone,
> i have a network of systems connected over lan with each computer
> running ubuntu. openmpi 1.4.x is installed on 1 machine and the
> installation is mounted on other nodes through Networking File
> System(NFS). the source program and compiled file(a.out) are present in
> the mounted directory
> i run my programs by the following command:
> /opt/project/bin/mpirun -np 4 --prefix  /opt/project/ --hostfile
> hostfile a.out
> i have not set LD_LIBRARY_PATH but as i use --prefix mpirun works
> successfully
>  
> however as per the open mpi debugging faq:
> http://www.open-mpi.org/faq/?category=debugging
> when i run
> /opt/project/bin/mpirun -np 4 --prefix  /opt/project/ --hostfile
> hostfile -x DISPLAY=10.0.0.1:0.0 xterm -e gdb a.out
>  
> 4 xterm windows are opened with gdb running as expected. however when i
> give the command start to gdb in the windows corresponding to remote
> nodes, i get the error:
> libmpi.so.0 not found: no such file/directory
>  
> as mentioned other mpi jobs run fine with mpirun
>  
> when i execute
> /opt/project/bin/mpirun -np 4 --prefix  /opt/project/ -x
> DISPLAY=10.0.0.1:0.0 xterm -e gdb a.out ,the debugging continues succesfully
>  
> please help
> 

You need to set LD_LIBRARY_PATH to include the path to the Open MPI
libraries. The --prefix option works for Open MPI itself only; it has no
effect on other programs, such as gdb. You also need to make sure that
the LD_LIBRARY_PATH variable is correctly passed along to the processes
on the other hosts. This is usually done by editing your shell's rc file
for non-interactive shells (.bashrc for bash).
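As a sketch of that first step — prepending the library directory before launching anything (the /opt/project prefix is taken from the commands above; the exact lib subdirectory is an assumption):

```python
import os

# Assumed location: <prefix>/lib under the --prefix used with mpirun.
libdir = "/opt/project/lib"
old = os.environ.get("LD_LIBRARY_PATH", "")
os.environ["LD_LIBRARY_PATH"] = libdir + (":" + old if old else "")
print(os.environ["LD_LIBRARY_PATH"])
```

Note also that mpirun's -x flag, already used above for DISPLAY, can forward an environment variable such as LD_LIBRARY_PATH to the launched processes.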

-- 
Prentice



[OMPI users] What's wrong with this code?

2011-02-22 Thread Prentice Bisbal
One of the researchers I support is writing some Fortran code that uses
Open MPI. The code is being compiled with the Intel Fortran compiler.
This one line of code:

integer ierr,istatus(MPI_STATUS_SIZE)

leads to these errors:

$ mpif90 -o simplex simplexmain579m.for simplexsubs579
/usr/local/openmpi-1.2.8/intel-11/x86_64/include/mpif-config.h(88):
error #6406: Conflicting attributes or multiple declaration of name.
[MPI_STATUS_SIZE]
  parameter (MPI_STATUS_SIZE=5)
-^
simplexmain579m.for(147): error #6591: An automatic object is invalid in
a main program.   [ISTATUS]
integer ierr,istatus(MPI_STATUS_SIZE)
-^
simplexmain579m.for(147): error #6219: A specification expression object
must be a dummy argument, a COMMON block object, or an object accessible
through host or use association   [MPI_STATUS_SIZE]
integer ierr,istatus(MPI_STATUS_SIZE)
-^
/usr/local/openmpi-1.2.8/intel-11/x86_64/include/mpif-common.h(211):
error #6756: A COMMON block data object must not be an automatic object.
  [MPI_STATUS_IGNORE]
  integer MPI_STATUS_IGNORE(MPI_STATUS_SIZE)
--^
/usr/local/openmpi-1.2.8/intel-11/x86_64/include/mpif-common.h(211):
error #6591: An automatic object is invalid in a main program.
[MPI_STATUS_IGNORE]
  integer MPI_STATUS_IGNORE(MPI_STATUS_SIZE)


Any idea how to fix this? Is this a bug in the Intel compiler, or the code?

Some additional information:

$ mpif90 --showme
ifort -I/usr/local/openmpi-1.2.8/intel-11/x86_64/include
-I/usr/local/openmpi-1.2.8/intel-11/x86_64/lib
-L/usr/local/openmpi-1.2.8/intel-11/x86_64/lib -lmpi_f90 -lmpi_f77 -lmpi
-lopen-rte -lopen-pal -libverbs -lrt -lnuma -ldl -Wl,--export-dynamic
-lnsl -lutil

-- 
Prentice


Re: [OMPI users] What's wrong with this code?

2011-02-23 Thread Prentice Bisbal


Tim Prince wrote:
> On 2/22/2011 1:41 PM, Prentice Bisbal wrote:
>> One of the researchers I support is writing some Fortran code that uses
>> Open MPI. The code is being compiled with the Intel Fortran compiler.
>> This one line of code:
>>
>> integer ierr,istatus(MPI_STATUS_SIZE)
>>
>> leads to these errors:
>>
>> $ mpif90 -o simplex simplexmain579m.for simplexsubs579
>> /usr/local/openmpi-1.2.8/intel-11/x86_64/include/mpif-config.h(88):
>> error #6406: Conflicting attributes or multiple declaration of name.
>> [MPI_STATUS_SIZE]
>>parameter (MPI_STATUS_SIZE=5)
>> -^
>> simplexmain579m.for(147): error #6591: An automatic object is invalid in
>> a main program.   [ISTATUS]
>>  integer ierr,istatus(MPI_STATUS_SIZE)
>> -^
>> simplexmain579m.for(147): error #6219: A specification expression object
>> must be a dummy argument, a COMMON block object, or an object accessible
>> through host or use association   [MPI_STATUS_SIZE]
>>  integer ierr,istatus(MPI_STATUS_SIZE)
>> -^
>> /usr/local/openmpi-1.2.8/intel-11/x86_64/include/mpif-common.h(211):
>> error #6756: A COMMON block data object must not be an automatic object.
>>[MPI_STATUS_IGNORE]
>>integer MPI_STATUS_IGNORE(MPI_STATUS_SIZE)
>> --^
>> /usr/local/openmpi-1.2.8/intel-11/x86_64/include/mpif-common.h(211):
>> error #6591: An automatic object is invalid in a main program.
>> [MPI_STATUS_IGNORE]
>>integer MPI_STATUS_IGNORE(MPI_STATUS_SIZE)
>>
>>
>> Any idea how to fix this? Is this a bug in the Intel compiler, or the
>> code?
>>
> 
> I can't see the code from here.  The first failure to recognize the
> PARAMETER definition apparently gives rise to the others.  According to
> the message, you already used the name MPI_STATUS_SIZE in mpif-config.h
> and now you are trying to give it another usage (not case sensitive) in
> the same scope.  If so, it seems good that the compiler catches it.

I agree with your logic, but the problem is where the code containing
the error is coming from - it's coming from a header file that's part
of Open MPI, which makes me think this is a compiler error, since I'm
sure there are plenty of people using the same header file in their
code.


-- 
Prentice


Re: [OMPI users] What's wrong with this code?

2011-02-23 Thread Prentice Bisbal
Jeff Squyres wrote:
> On Feb 22, 2011, at 4:41 PM, Prentice Bisbal wrote:
> 
>> One of the researchers I support is writing some Fortran code that uses
>> Open MPI. The code is being compiled with the Intel Fortran compiler.
>> This one line of code:
>>
>> integer ierr,istatus(MPI_STATUS_SIZE)
>>
>> leads to these errors:
>>
>> $ mpif90 -o simplex simplexmain579m.for simplexsubs579
>> /usr/local/openmpi-1.2.8/intel-11/x86_64/include/mpif-config.h(88):
>> error #6406: Conflicting attributes or multiple declaration of name.
>> [MPI_STATUS_SIZE]
>>  parameter (MPI_STATUS_SIZE=5)
>> -^
> 
> It's hard to say without seeing the rest of the subroutine in question.
> 
> Can you send the source the the entire subroutine that is failing to compile?
> 

Unfortunately, I don't think I can provide the rest of the code. I'll
have to talk to the author first. I believe it's considered proprietary.

-- 
Prentice


Re: [OMPI users] What's wrong with this code?

2011-02-23 Thread Prentice Bisbal
Jeff Squyres wrote:
> On Feb 23, 2011, at 9:48 AM, Tim Prince wrote:
> 
>>> I agree with your logic, but the problem is where the code containing
>>>>> the error is coming from - it's coming from a header file that's
>>>>> part of Open MPI, which makes me think this is a compiler error, since
>>>>> I'm sure there are plenty of people using the same header file in their
>>> code.
>>>
>> Are you certain that they all find it necessary to re-define identifiers 
>> from that header file, rather than picking parameter names which don't 
>> conflict?
> 
> Without seeing the code, it sounds like Tim might be right: someone is trying 
> to re-define the MPI_STATUS_SIZE parameter that is being defined by OMPI's 
> mpif-config.h header file.  Regardless of include file/initialization 
> ordering (i.e., regardless of whether mpif-config.h is the first or Nth 
> entity to try to set this parameter), user code should never set this 
> parameter value.  
> 
> Or any symbol that begins with MPI_, for that matter.  The entire "MPI_" 
> namespace is reserved for MPI.
> 

I understand that, and I checked the code to make sure the programmer
didn't do anything stupid like that.

The entire code is only a few hundred lines in two different files. In
the entire program, there is only 1 include statement:

include 'mpif.h'

and MPI_STATUS_SIZE appears only once:

integer ierr,istatus(MPI_STATUS_SIZE)

I have limited knowledge of Fortran programming, but based on this, I
don't see how MPI_STATUS_SIZE could be getting overwritten.


-- 
Prentice


Re: [OMPI users] What's wrong with this code?

2011-02-23 Thread Prentice Bisbal


Tim Prince wrote:
> On 2/23/2011 8:27 AM, Prentice Bisbal wrote:
>> Jeff Squyres wrote:
>>> On Feb 23, 2011, at 9:48 AM, Tim Prince wrote:
>>>
>>>>> I agree with your logic, but the problem is where the code containing
>>>>>> the error is coming from - it's coming from a header file that's
>>>>>> part of Open MPI, which makes me think this is a compiler error, since
>>>>>> I'm sure there are plenty of people using the same header file in
>>>>> their
>>>>> code.
>>>>>
>>>> Are you certain that they all find it necessary to re-define
>>>> identifiers from that header file, rather than picking parameter
>>>> names which don't conflict?
>>>
>>> Without seeing the code, it sounds like Tim might be right: someone
>>> is trying to re-define the MPI_STATUS_SIZE parameter that is being
>>> defined by OMPI's mpif-config.h header file.  Regardless of include
>>> file/initialization ordering (i.e., regardless of whether
>>> mpif-config.h is the first or Nth entity to try to set this
>>> parameter), user code should never set this parameter value.
>>>
>>> Or any symbol that begins with MPI_, for that matter.  The entire
>>> "MPI_" namespace is reserved for MPI.
>>>
>>
>> I understand that, and I checked the code to make sure the programmer
>> didn't do anything stupid like that.
>>
>> The entire code is only a few hundred lines in two different files. In
>> the entire program, there is only 1 include statement:
>>
>> include 'mpif.h'
>>
>> and MPI_STATUS_SIZE appears only once:
>>
>> integer ierr,istatus(MPI_STATUS_SIZE)
>>
>> I have limited knowledge of Fortran programming, but based on this, I
>> don't see how MPI_STATUS_SIZE could be getting overwritten.
>>
>>
> Earlier, you showed a preceding PARAMETER declaration setting a new
> value for that name, which would be required to make use of it in this
> context.  Apparently, you intend to support only compilers which violate
> the Fortran standard by supporting a separate name space for PARAMETER
> identifiers, so that you can violate the MPI standard by using MPI_
> identifiers in a manner which I believe is called shadowing in C.
> 

Tim,

Check the original post again - that PARAMETER line you are referring to
comes from the mpif-config.h file - not from my own code.

-- 
Prentice
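As an aside, the C-style "shadowing" Tim mentions can be illustrated with a small self-contained sketch. The names below are invented for illustration and have nothing to do with the actual MPI headers: an inner declaration that re-uses a name hides the outer one for the rest of its scope.

```c
/* A file-scope constant, standing in for a header-provided value
 * (analogous to a PARAMETER supplied by an MPI Fortran header). */
enum { STATUS_SIZE = 5 };

/* The parameter re-uses the name, shadowing the enum constant
 * everywhere inside this function body. */
int shadowed(int STATUS_SIZE)
{
    return STATUS_SIZE;   /* refers to the parameter, not the enum */
}

int unshadowed(void)
{
    return STATUS_SIZE;   /* still sees the file-scope constant, 5 */
}
```

Fortran's PARAMETER conflict reported by the compiler above is stricter than this: rather than silently hiding the outer name, ifort rejects the redefinition outright.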


Re: [OMPI users] What's wrong with this code?

2011-02-23 Thread Prentice Bisbal


Jeff Squyres wrote:
> I thought the error was this:
> 
> $ mpif90 -o simplex simplexmain579m.for simplexsubs579
> /usr/local/openmpi-1.2.8/intel-11/x86_64/include/mpif-config.h(88):
> error #6406: Conflicting attributes or multiple declaration of name.
> [MPI_STATUS_SIZE]
>  parameter (MPI_STATUS_SIZE=5)
> -^
> simplexmain579m.for(147): error #6591: An automatic object is invalid in
> a main program.   [ISTATUS]
>integer ierr,istatus(MPI_STATUS_SIZE)
> -^
> 
> which seems to only show the definition in mpif-config.h (which is an 
> internal OMPI file).  I could be mis-interpreting those compiler messages, 
> though...
> 
> Off-the-wall guess here: is the program doing both "use mpi" *and* "include 
> mpif.h" in the same subroutine...?

Jeff,

I suspected that and checked for it earlier. I just double-checked, and
that is not the problem. Out of the two source files, 'include mpif.h'
appears once, and 'use mpi' does not appear at all. I'm beginning to
suspect it is the compiler that is the problem. I'm using ifort 11.1.
It's not the latest version, but it's only about 1 year old.

$ ifort --version
ifort (IFORT) 11.1 20100203
Copyright (C) 1985-2010 Intel Corporation.  All rights reserved.

--
Prentice



> 
> 
> On Feb 23, 2011, at 11:51 AM, Tim Prince wrote:
> 
>> On 2/23/2011 8:27 AM, Prentice Bisbal wrote:
>>> Jeff Squyres wrote:
>>>> On Feb 23, 2011, at 9:48 AM, Tim Prince wrote:
>>>>
>>>>>> I agree with your logic, but the problem is where the code containing
>>>>>> the error is coming from - it's coming from a header file that's
>>>>>> part of Open MPI, which makes me think this is a compiler error, since
>>>>>> I'm sure there are plenty of people using the same header file in their
>>>>>> code.
>>>>>>
>>>>> Are you certain that they all find it necessary to re-define identifiers 
>>>>> from that header file, rather than picking parameter names which don't 
>>>>> conflict?
>>>> Without seeing the code, it sounds like Tim might be right: someone is 
>>>> trying to re-define the MPI_STATUS_SIZE parameter that is being defined by 
>>>> OMPI's mpif-config.h header file.  Regardless of include 
>>>> file/initialization ordering (i.e., regardless of whether mpif-config.h is 
>>>> the first or Nth entity to try to set this parameter), user code should 
>>>> never set this parameter value.
>>>>
>>>> Or any symbol that begins with MPI_, for that matter.  The entire "MPI_" 
>>>> namespace is reserved for MPI.
>>>>
>>> I understand that, and I checked the code to make sure the programmer
>>> didn't do anything stupid like that.
>>>
>>> The entire code is only a few hundred lines in two different files. In
>>> the entire program, there is only 1 include statement:
>>>
>>> include 'mpif.h'
>>>
>>> and MPI_STATUS_SIZE appears only once:
>>>
>>> integer ierr,istatus(MPI_STATUS_SIZE)
>>>
>>> I have limited knowledge of Fortran programming, but based on this, I
>>> don't see how MPI_STATUS_SIZE could be getting overwritten.
>>>
>>>
>> Earlier, you showed a preceding PARAMETER declaration setting a new value 
>> for that name, which would be required to make use of it in this context.  
>> Apparently, you intend to support only compilers which violate the Fortran 
>> standard by supporting a separate name space for PARAMETER identifiers, so 
>> that you can violate the MPI standard by using MPI_ identifiers in a manner 
>> which I believe is called shadowing in C.
>>
>> -- 
>> Tim Prince
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> 



Re: [OMPI users] What's wrong with this code?

2011-02-23 Thread Prentice Bisbal
Jeff Squyres wrote:
> On Feb 23, 2011, at 2:20 PM, Prentice Bisbal wrote:
> 
>> I suspected that and checked for it earlier. I just double-checked, and
>> that is not the problem. Out of the two source files, 'include mpif.h'
>> appears once, and 'use mpi' does not appear at all. I'm beginning to
>> suspect it is the compiler that is the problem. I'm using ifort 11.1.
>> It's not the latest version, but it's only about 1 year old.
> 
> 11.1 should be fine - I test with that regularly.
> 
> Can you put together a small example that shows the problem and isn't 
> proprietary?
> 

Jeff,

Thanks for requesting that. As I was looking at the original code to
write a small test program, I found the source of the error. Doesn't it
always work that way?

The code I'm debugging looked like this:

c main program
implicit integer(i-m)
integer ierr,istatus(MPI_STATUS_SIZE)
include 'mpif.h'
call MPI_Init(ierr)
call MPI_Comm_rank(MPI_COMM_WORLD,imy_rank,ierr)
call MPI_Comm_size(MPI_COMM_WORLD,iprocess,ierr)
call MPI_FINALIZE(ierr)
stop
end


Can you see the error?  Scroll down for answer ;)









It's using MPI_STATUS_SIZE to dimension istatus before mpif.h is even
read! Correcting the order of the include and declaration statements
fixed the problem. D'oh!
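For completeness, here is a minimal sketch of the corrected ordering, based on the fragment above rather than the poster's full program:

```fortran
c main program
implicit integer(i-m)
c the include must come first, so MPI_STATUS_SIZE is defined
c before it is used to dimension istatus
include 'mpif.h'
integer ierr,istatus(MPI_STATUS_SIZE)
call MPI_Init(ierr)
call MPI_Comm_rank(MPI_COMM_WORLD,imy_rank,ierr)
call MPI_Comm_size(MPI_COMM_WORLD,iprocess,ierr)
call MPI_FINALIZE(ierr)
stop
end
```

With `implicit integer(i-m)` in effect, the original ordering made `MPI_STATUS_SIZE` an implicitly typed variable in the declaration, which then collided with the PARAMETER in mpif.h.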


-- 
Prentice


Re: [OMPI users] What's wrong with this code?

2011-02-23 Thread Prentice Bisbal
Jeff Squyres wrote:
> On Feb 23, 2011, at 3:36 PM, Prentice Bisbal wrote:
> 
>> It's using MPI_STATUS_SIZE to dimension istatus before mpif.h is even
>> read! Correcting the order of the include and declaration statements
>> fixed the problem. D'oh!
> 
> A pox on old fortran for letting you use symbols before they are declared...
> 

I second that emotion.

The error message could have been a tad more helpful.

-- 
Prentice


Re: [OMPI users] OpenMPI 1.2.x segfault as regular user

2011-03-18 Thread Prentice Bisbal
It's not hard to test whether or not SELinux is the problem. You can
turn SELinux off on the command-line with this command:

setenforce 0

Of course, you need to be root in order to do this.

After turning SELinux off, you can try reproducing the error. If it
still occurs, the problem is elsewhere; if it doesn't, SELinux was the
cause. When you're done, you can re-enable SELinux with

setenforce 1

If you're running your job across multiple nodes, you should disable
SELinux on all of them for testing.

Did you compile/install Open MPI yourself? If so, I suspect that the
SELinux context labels on your MPI binaries are incorrect.

If you use the method above to determine that SELinux is the problem,
please post your results here and I may be able to help you set things
right. I have some experience with SELinux problems like this, but I'm
not exactly an expert.

--
Prentice


On 03/17/2011 11:01 AM, Jeff Squyres wrote:
> Sorry for the delayed reply.
> 
> I'm afraid I haven't done much with SE Linux -- I don't know if there are any 
> "gotchas" that would show up there.  SE Linux support is not something we've 
> gotten a lot of request for.  I doubt that anyone in the community has done 
> much testing in this area.  :-\
> 
> I suspect that Open MPI is trying to access something that your user (under 
> SE Linux) doesn't have permission to.  
> 
> So I'm afraid I don't have much of an answer for you -- sorry!  If you do 
> figure it out, though, if a fix is not too intrusive, we can probably 
> incorporate it upstream.
> 
> 
> On Mar 4, 2011, at 7:31 AM, Youri LACAN-BARTLEY wrote:
> 
>> Hi,
>>  
>> This is my first post to this mailing-list so I apologize for maybe being a 
>> little rough on the edges.
>> I’ve been digging into OpenMPI for a little while now and have come across 
>> one issue that I just can’t explain and I’m sincerely hoping someone can put 
>> me on the right track here.
>>  
>> I’m using a fresh install of openmpi-1.2.7 and I systematically get a 
>> segmentation fault at the end of my mpirun calls if I’m logged in as a 
>> regular user.
>> However, as soon as I switch to the root account, the segfault does not 
>> appear.
>> The jobs actually run to their term but I just can’t find a good reason for 
>> this to be happening and I haven’t been able to reproduce the problem on 
>> another machine.
>>  
>> Any help or tips would be greatly appreciated.
>>  
>> Thanks,
>>  
>> Youri LACAN-BARTLEY
>>  
>> Here’s an example running osu_latency locally (I’ve “blacklisted” openib to 
>> make sure it’s not to blame):
>>  
>> [user@server ~]$ mpirun --mca btl ^openib  -np 2 
>> /opt/scripts/osu_latency-openmpi-1.2.7
>> # OSU MPI Latency Test v3.3
>> # Size          Latency (us)
>> 0               0.76
>> 1               0.89
>> 2               0.89
>> 4               0.89
>> 8               0.89
>> 16              0.91
>> 32              0.91
>> 64              0.92
>> 128             0.96
>> 256             1.13
>> 512             1.31
>> 1024            1.69
>> 2048            2.51
>> 4096            5.34
>> 8192            9.16
>> 16384           17.47
>> 32768           31.79
>> 65536           51.10
>> 131072          92.41
>> 262144          181.74
>> 524288          512.26
>> 1048576         1238.21
>> 2097152         2280.28
>> 4194304         4616.67
>> [server:15586] *** Process received signal ***
>> [server:15586] Signal: Segmentation fault (11)
>> [server:15586] Signal code: Address not mapped (1)
>> [server:15586] Failing at address: (nil)
>> [server:15586] [ 0] /lib64/libpthread.so.0 [0x3cd1e0eb10]
>> [server:15586] [ 1] /lib64/libc.so.6 [0x3cd166fdc9]
>> [server:15586] [ 2] /lib64/libc.so.6(__libc_malloc+0x167) [0x3cd1674dd7]
>> [server:15586] [ 3] /lib64/ld-linux-x86-64.so.2(__tls_get_addr+0xb1) 
>> [0x3cd120fe61]
>> [server:15586] [ 4] /lib64/libselinux.so.1 [0x3cd320f5cc]
>> [server:15586] [ 5] /lib64/libselinux.so.1 [0x3cd32045df]
>> [server:15586] *** End of error message ***
>> [server:15587] *** Process received signal ***
>> [server:15587] Signal: Segmentation fault (11)
>> [server:15587] Signal code: Address not mapped (1)
>> [server:15587] Failing at address: (nil)
>> [server:15587] [ 0] /lib64/libpthread.so.0 [0x3cd1e0eb10]
>> [server:15587] [ 1] /lib64/libc.so.6 [0x3cd166fdc9]
>> [server:15587] [ 2] /lib64/libc.so.6(__libc_malloc+0x167) [0x3cd1674dd7]
>> [server:15587] [ 3] /lib64/ld-linux-x86-64.so.2(__tls_get_addr+0xb1) 
>> [0x3cd120fe61]
>> [server:15587] [ 4] /lib64/libselinux.so.1 [0x3cd320f5cc]
>> [server:15587] [ 5] /lib64/libselinux.so.1 [0x3cd32045df]
>> [server:15587] *** End of error message ***
>> mpirun noticed that job rank 0 with PID 15586 on node server exited on 
>> signal 11 (Segmentation fault).
>> 1 additional process aborted (not s

Re: [OMPI users] OpenMPI 1.2.x segfault as regular user

2011-03-21 Thread Prentice Bisbal
On 03/20/2011 06:22 PM, kevin.buck...@ecs.vuw.ac.nz wrote:
> 
>> It's not hard to test whether or not SELinux is the problem. You can
>> turn SELinux off on the command-line with this command:
>>
>> setenforce 0
>>
>> Of course, you need to be root in order to do this.
>>
>> After turning SELinux off, you can try reproducing the error. If it
>> still occurs, the problem is elsewhere; if it doesn't, SELinux was the
>> cause. When you're done, you can re-enable SELinux with
>>
>> setenforce 1
>>
>> If you're running your job across multiple nodes, you should disable
>> SELinux on all of them for testing.
> 
> You are not actually disabling SELinux with setenforce 0, just
> putting it into "permissive" mode: SELinux is still active.
> 

That's correct. Thanks for catching my inaccurate choice of words.

> Running SELinux in its permissive mode, as opposed to disabling it
> at boot time, sees SELinux making a log of things that would cause
> it to dive in, were it running in "enforcing" mode.

I forgot about that. Checking those logs will make debugging even easier
for the original poster.

> 
> There's then a tool you can run over that log that will suggest
> the ACL changes you need to make to fix the issue from an SELinux
> perspective.
> 

-- 
Prentice


Re: [OMPI users] Parallel Computation under WiFi for Beginners

2011-03-22 Thread Prentice Bisbal
I'd like to point out that nothing special needs to be done because
you're using a wireless network. As long as you're using TCP for your
message passing, it won't make a difference what you're using as long as
you have TCP/IP configured correctly.

On 03/22/2011 10:42 AM, Jeff Squyres wrote:
> There's lots of good MPI tutorials on the web.
> 
> My favorites are at the NCSA web site; if you get a free account, you can 
> login and see their course listings.
> 
> 
> On Mar 22, 2011, at 7:30 AM, Abdul Rahman Riza wrote:
> 
>> Dear All,
>>
>> I am newbie in parallel computing and would like to ask.
>>
>> I have switch and 2 laptops:
>>  • Dell inspiron 640, dual core 2 gb ram
>>  • Dell inspiron 1010 intel atom 1 gb ram
>>
>> Both laptops run Ubuntu 10.04 on a wireless network using a TP-LINK access
>> point.
>>
>> I am wondering if you have a tutorial and source code demonstrating simple
>> parallel computing, for 2 laptops to perform a simultaneous computation.
>>
>> Riza
> 
> 

-- 
Prentice Bisbal
Linux Software Support Specialist/System Administrator
School of Natural Sciences
Institute for Advanced Study
Princeton, NJ


Re: [OMPI users] printf and scanf problem of C code compiled with Open MPI

2011-03-29 Thread Prentice Bisbal
On 03/29/2011 01:29 PM, Meilin Bai wrote:
> Dear open-mpi users:
>  
> I come across a little problem when running a MPI C program compiled
> with Open MPI 1.4.3. A part of codes as follows:
>  
> MPI_Init(&argc, &argv);
> MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
> MPI_Comm_rank(MPI_COMM_WORLD, &myid);
> MPI_Get_processor_name(processor_name, &namelen);
> if (myid == 0) {
>  printf("Please give N= ");
>  //fflush(stdout);
>  scanf("%d", &n);
>  startwtime = MPI_Wtime();
>  }
>  
> If I comment out the "fflush(stdout);" statement, it doesn't print out
> the message until I input an integer n. And if I add the fflush call
> between them, it works as expected, though it obviously consumes time.
>  
> However, when I compiled it with MPICH2-1.3.2p1, it works correctly
> without any fflush call in the code.
>  
> Does anyone know what the matter is?
>  

The Open MPI Developers (Jeff, Ralph, etc) can confirm this:

The MPI standard doesn't have a lot of strict requirements for I/O
behavior like this, so implementations are allowed to buffer I/O if they
want. There is nothing wrong with requiring fflush(stdout) in order to
get the behavior you want. In fact, if you check some textbooks on MPI
programming, I'm pretty sure they recommend using fflush to minimize
this problem.

MPICH behaves differently because its developers made different design
choices.

Neither behavior is "wrong".
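To make the buffering behavior concrete outside of MPI, here is a minimal plain-C sketch of the recommended pattern; the helper name prompt_for_int is invented for illustration. The explicit fflush() forces the newline-less prompt out before the program blocks waiting for input:

```c
#include <stdio.h>

/* Print a prompt on 'out', then read an integer from 'in'.
 * The prompt has no trailing newline, so buffered stdio may hold it
 * back until after input is read; the explicit fflush() forces it out
 * before fscanf() blocks.  Returns 1 on success, 0 on failure. */
int prompt_for_int(FILE *out, FILE *in, const char *prompt, int *n)
{
    fputs(prompt, out);
    fflush(out);              /* guarantee the prompt appears first */
    return fscanf(in, "%d", n) == 1;
}
```

An MPI runtime may happen to forward unflushed output in a timely way, but relying on that is non-portable; the flush is the portable fix.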

-- 
Prentice


Re: [OMPI users] SGE and openmpi

2011-04-07 Thread Prentice Bisbal


On 04/06/2011 07:09 PM, Jason Palmer wrote:
> Hi,
> I am having trouble running a batch job in SGE using openmpi.  I have read
> the faq, which says that openmpi will automatically do the right thing, but
> something seems to be wrong.
> 
> Previously I used MPICH1 under SGE without any problems. I'm avoiding MPICH2
> because it doesn't seem to support static compilation, whereas I was able to
> get openmpi to compile with open64 and compile my program statically.
> 
> But I am having problems launching. According to the documentation, I should
> be able to have a script file, qsub.sh:
> 
> #!/bin/bash
> #$ -cwd
> #$ -j y
> #$ -S /bin/bash
> #$ -q all.q
> #$ -pe orte 18
> MPI_DIR=/home/jason/openmpi-1.4.3-install/bin
> /home/jason/openmpi-1.4.3-install/bin/mpirun -np $NSLOTS  myprog
> 

If you have SGE integration, you should not specify the number of slots
requested on the command line. Open MPI will speak directly to SGE (or
vice versa) to get this information.

Also, what is the significance of specifying MPI_DIR? I think you want
to add that to your PATH, and then export it to the rest of the nodes by
using the -V switch to qsub. If the correct mpirun isn't found first in
your PATH, your job will definitely fail when launched on the slave hosts.

You should also add the path to the MPI libraries to your
LD_LIBRARY_PATH, or else you'll end up with run-time linking problems.

For example, I would change your submission script to look like this:

#!/bin/bash
#$ -cwd
#$ -j y
#$ -S /bin/bash
#$ -q all.q
#$ -pe orte 18
#$ -V

MPI_DIR=/home/jason/openmpi-1.4.3-install
export PATH=$MPI_DIR/bin:$PATH
export LD_LIBRARY_PATH=$MPI_DIR/lib:$LD_LIBRARY_PATH

mpirun myprog

This may not fix all your problems, but will definitely fix some of them.


-- 
Prentice


[OMPI users] Anyone with Visual Studio + MPI Experience

2011-06-30 Thread Prentice Bisbal
Does anyone on this list have experience using MS Visual Studio for MPI
development? I'm supporting a Windows user who has been doing Fortran
programming on Windows using an ANCIENT version of Digital Visual
Fortran (I know, I know - using "ancient" and "Digital" in the same
sentence is redundant.)

Well, we are upgrading his equally ancient laptop to a new one with Windows
7, so we installed Intel Visual Fortran (direct descendant of DVF) and
Visual Studio 2010, and to be honest, I feel like a fish out of water
using VS 2010. It took me longer than I care to admit to figure out
how to specify the include and linker paths.

Right now, I'm working with the Intel MPI libraries, but plan on
installing OpenMPI, too, once I figure out VS 2010.

Can anyone tell me how to configure visual studio so that when you click
on the little "play" icon to build/run the code, it will call mpiexec
automatically? Right now, it compiles fine, but throws errors when the
program executes because it doesn't have the right environment setup
because it's not being executed by mpiexec. It runs fine when I execute
it with mpiexec or wmpiexec.

-- 
Prentice


Re: [OMPI users] Anyone with Visual Studio + MPI Experience

2011-06-30 Thread Prentice Bisbal
Thanks, Joe.

I did say that, but I meant that in a different way. For program 'foo',
I need to tell Visual Studio that when I click on the 'run' button, I
need it to execute

mpiexec -np X foo

instead of just

foo

I know what I *need* to do to the VS environment, I just don't know
*how* to do it. I've been going through all the settings, but can't find
the magical checkbox or textbox.

Windows is so disorienting. It's like someone went out of their way to
make life as hard as possible for us command-line guys.

Prentice

On 06/30/2011 04:46 PM, Joe Griffin wrote:
> Prentice,
> 
> It might or might not matter, but on your older system you
> may have used "LD_LIBRARY_PATH" but on windows you need "PATH"
> to contain the PATH.
> 
> I only mention this because you said it runs in one environment,
> but not the other.
> 
> Joe
> 
> -Original Message-
> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On
> Behalf Of Prentice Bisbal
> Sent: Thursday, June 30, 2011 1:42 PM
> To: Open MPI Users
> Subject: [OMPI users] Anyone with Visual Studio + MPI Experience
> 
> Does anyone on this list have experience using MS Visual Studio for MPI
> development? I'm supporting a Windows user who has been doing Fortran
> programming on Windows using an ANCIENT version of Digital Visual
> Fortran (I know, I know - using "ancient" and "Digital" in the same
> sentence is redundant.)
> 
> Well, we are upgrading his equally ancient laptop to a new one with Windows
> 7, so we installed Intel Visual Fortran (direct descendant of DVF) and
> Visual Studio 2010, and to be honest, I feel like a fish out of water
> using VS 2010. It took me longer than I care to admit to figure out
> how to specify the include and linker paths.
> 
> Right now, I'm working with the Intel MPI libraries, but plan on
> installing OpenMPI, too, once I figure out VS 2010.
> 
> Can anyone tell me how to configure visual studio so that when you click
> on the little "play" icon to build/run the code, it will call mpiexec
> automatically? Right now, it compiles fine, but throws errors when the
> program executes because it doesn't have the right environment setup
> because it's not being executed by mpiexec. It runs fine when I execute
> it with mpiexec or wmpiexec.
> 


Re: [OMPI users] mpi & mac

2011-07-06 Thread Prentice Bisbal
On 07/06/2011 10:42 AM, Constantinos Makassikis wrote:
> On Tue, Jul 5, 2011 at 9:48 PM, Robert Sacker  > wrote:
> 
> Hi all,
> 
> Hello !
> 
> I need some help. I'm trying to run C++ code in Xcode on a Mac Pro
> Desktop (OS 10.6) and utilize all 8 cores. My ultimate goal is to
> be able to run the code on the cluster here on campus. I'm in the
> process of converting into C++ the number crunching part of the
> stuff I previously wrote in Matlab. 
> Is there some documentation that explains how to get started?
> Thanks. Bob
> 
> 
> I am not sure whether this is the relevant mailing list for
> general parallelization questions ...

Well, general MPI questions not specific to OpenMPI are not uncommon here.

> 
> In any case, before converting your Matlab code to C++ try using
> parallelization features that come with Matlab.
> 
> Otherwise, after translating your Matlab code to C++, you should
> consider in the first place getting acquainted with OpenMP and
> use it to speed up your code on your 8-core machine.
> OpenMP can be rather straightforward to apply.
> 
> Afterwards, if necessary, you may look into parallelizing over multiple
> machines with OpenMPI.

Why not just use MPI for every step? Open MPI can detect when
communication partners are on the same host and use shared memory for
improved performance. Not sure how this measures up to OpenMP for
intra-node communications, but I imagine it can make the programming
simpler, since only one syntax needs to be learned/used.

As I said, I don't know the performance difference between MPI and
OpenMP, so if someone can shed some light...





Re: [OMPI users] Anyone with Visual Studio + MPI Experience

2011-07-06 Thread Prentice Bisbal
Miguel,

I'm using VS 2010 Professional + Intel Visual Fortran. I don't have the
"Debugger to Launch" option in my version (or I'm looking in the wrong
place), and don't see MPI options any where. Do you have any additional
software installed, like the HPC Pack 2008?

Prentice

On 07/04/2011 04:32 PM, Miguel Vargas Felix wrote:
> 
> Hi,
> 
> well, I don't have a lot of experience with VS+MPI, but these are the
> steps that I followed to make my projects run:
> 
> 1. Select your project from the Solution explorer, right-click and select
> "Properties"
> 
> 2. From the list on the left, select "Debugging"
> 
> 3. Set "Debugger to launch" to "MPI Cluster Debugger"
> 
> 4. Set "MPIRun Command" to the full path of your "mpiexec" (use quotes
> to enclose the path)
> 
> 5. Use "MPIRun Arguments" to set the number of processes to start, like
> "-n 4"
> 
> 6. Set "MPIRUN Working Directory" if you need.
> 
> 7. "Application Command" normally is "$(TargetPath)"
> 
> 8. "Application Arguments" if you need them.
> 
> 9. "MPIShim Location", this is a tricky one; for some reason, sometimes VS
> needs the full path for this VS tool. It is located at: "C:\Program
> Files\Microsoft Visual Studio 9.0\Common7\IDE\Remote
> Debugger\x64\mpishim.exe" or "C:\Program Files\Microsoft Visual Studio
> 9.0\Common7\IDE\Remote Debugger\x86\mpishim.exe" (use quotes to enclose
> the path).
> 
> I haven't played with the other options.
> 
> 10. Close the dialog box.
> 
> 11. Set some breakpoints in your program.
> 
> 12. Ready to run.
> 
> These instructions only work to debug MPI processes on the localhost, and
> I only have tested VS+MPI using MPICH2 for Windows.
> 
> To debug on several nodes you should install the Microsoft HPC SDK (I
> haven't used it).
> 
> Good luck.
> 
> -Miguel
> 
> PS. I use Visual Studio 2008 professional. Also, I know that MPI debugging
> is not available in VS Express editions.
> 
> 
>> Does anyone on this list have experience using MS Visual Studio for MPI
>> development? I'm supporting a Windows user who has been doing Fortran
>> programming on Windows using an ANCIENT version of Digital Visual
>> Fortran (I know, I know - using "ancient" and "Digital" in the same
>> sentence is redundant.)
>>
>> Well, we are upgrading his equally ancient laptop to a new one with Windows
>> 7, so we installed Intel Visual Fortran (direct descendant of DVF) and
>> Visual Studio 2010, and to be honest, I feel like a fish out of water
>> using VS 2010. It took me longer than I care to admit to figure out
>> how to specify the include and linker paths.
>>
>> Right now, I'm working with the Intel MPI libraries, but plan on
>> installing OpenMPI, too, once I figure out VS 2010.
>>
>> Can anyone tell me how to configure visual studio so that when you click
>> on the little "play" icon to build/run the code, it will call mpiexec
>> automatically? Right now, it compiles fine, but throws errors when the
>> program executes because it doesn't have the right environment setup
>> because it's not being executed by mpiexec. It runs fine when I execute
>> it with mpiexec or wmpiexec.
>>
>> --
>> Prentice
>>
>>
> 
> 
> 


Re: [OMPI users] Anyone with Visual Studio + MPI Experience

2011-07-07 Thread Prentice Bisbal
Miguel,

Thanks for the assistance. I don't have the MPI options you spoke of, so
I figured that might have been part of the HPC Pack. I found a couple of
web pages that helped me make progress. I'm not 100% there, but I'm much
closer, say 85% of the way there.

Now I can get an Fortran+MPI program to run with a single click, but
then I get an error that's OpenMPI-related. The same program runs from
the command-line, so I think it's just a matter of me making sure some
environment variables are set correctly. It turns out the user I'm doing
this for will be away for 6 weeks, so this is no longer the priority it
was a few days ago.

Prentice


On 07/07/2011 01:47 PM, Miguel Vargas Felix wrote:
> Prentice,
> 
> I didn't have to install the HPC Pack, as far as I know it is only needed
> when you want to develop/debug in a cluster. I'm sorry I can't help you
> with VS 2010 (I hated it, I switched back to VS 2008), but the
> instructions to configure VS 2010 seem to be similar; check the MPICH2
> guide for Windows developers.
> 
> http://www.mcs.anl.gov/research/projects/mpich2/documentation/files/mpich2-1.3.2-windevguide.pdf
> 
> May be this option is not available for Visual Fortran.
> 
> -Miguel
> 
>> Miguel,
>>
>> I'm using VS 2010 Professional + Intel Visual Fortran. I don't have the
>> "Debugger to Launch" option in my version (or I'm looking in the wrong
>> place), and don't see MPI options any where. Do you have any additional
>> software installed, like the HPC Pack 2008?
>>
>> Prentice
>>
>> On 07/04/2011 04:32 PM, Miguel Vargas Felix wrote:
>>>
>>> Hi,
>>>
>>> well, I don't have a lot of experience with VS+MPI, but these are the
>>> steps that I followed to make my projects run:
>>>
>>> 1. Select your project from the Solution explorer, right-click and
>>> select
>>> "Properties"
>>>
>>> 2. From the list on the left, select "Debugging"
>>>
>>> 3. Set "Debugger to launch" to "MPI Cluster Debugger"
>>>
>>> 4. Set "MPIRun Command" to the full path of your "mpiexec" (use quotes
>>> to enclose the path)
>>>
>>> 5. Use "MPIRun Arguments" to set the number of processes to start, like
>>> "-n 4"
>>>
>>> 6. Set "MPIRUN Working Directory" if you need.
>>>
>>> 7. "Application Command" normally is "$(TargetPath)"
>>>
>>> 8. "Application Arguments" if you need them.
>>>
>>> 9. "MPIShim Location", this is a tricky one; for some reason, sometimes
>>> VS
>>> needs the full path for this VS tool. It is located at: "C:\Program
>>> Files\Microsoft Visual Studio 9.0\Common7\IDE\Remote
>>> Debugger\x64\mpishim.exe" or "C:\Program Files\Microsoft Visual Studio
>>> 9.0\Common7\IDE\Remote Debugger\x86\mpishim.exe" (use quotes at to
>>> enclose
>>> the path).
>>>
>>> I haven't played with the other options.
>>>
>>> 10. Close the dialog box.
>>>
>>> 11. Set some breakpoints in your program.
>>>
>>> 12. Ready to run.
>>>
>>> These instructions only work to debug MPI processes on the localhost,
>>> and I only have tested VS+MPI using MPICH2 for Windows.
>>>
>>> To debug on several nodes you should install the Microsoft HPC SDK (I
>>> haven't used it).
>>>
>>> Good luck.
>>>
>>> -Miguel
>>>
>>> PS. I use Visual Studio 2008 professional. Also, I know that MPI
>>> debugging
>>> is not available in VS Express editions.
>>>
>>>
 Does anyone on this list have experience using MS Visual Studio for MPI
 development? I'm supporting a Windows user who has been doing Fortran
 programming on Windows using an ANCIENT version of Digital Visual
 Fortran (I know, I know - using "ancient" and "Digital" in the same
 sentence is redundant.)

 Well, we are upgrading his equally ancient laptop to a new one with Windows
 7, so we installed Intel Visual Fortran (direct descendant of DVF) and
 Visual Studio 2010, and to be honest, I feel like a fish out of water
 using VS 2010. It took me longer than I care to admit to figure out
 how to specify the include and linker paths.

 Right now, I'm working with the Intel MPI libraries, but plan on
 installing OpenMPI, too, once I figure out VS 2010.

 Can anyone tell me how to configure visual studio so that when you
 click
 on the little "play" icon to build/run the code, it will call mpiexec
 automatically? Right now, it compiles fine, but throws errors when the
 program executes because it doesn't have the right environment setup
 because it's not being executed by mpiexec. It runs fine when I execute
 it with mpiexec or wmpiexec.

 --
 Prentice


>>>
>>>
>>>
>>
>>
>>
>>


Re: [OMPI users] Role of ethernet interfaces of startup of openmpi job using IB

2011-09-27 Thread Prentice Bisbal

On 09/27/2011 07:50 AM, Jeff Squyres wrote:
> On Sep 27, 2011, at 6:35 AM, Salvatore Podda wrote:
> 
>> We would like to know if the ethernet interfaces play any role in the
>> startup phase of an Open MPI job using InfiniBand.
>> In this case, where can we find some literature on this topic?
> 
> Unfortunately, there's not a lot of docs about this other than people asking 
> questions on this list.
> 
> IP is used by default during Open MPI startup.  Specifically, it is used as 
> our "out of band" communication channel for things like stdin/stdout/stderr 
> redirection, launch command relaying, process control, etc.  The OOB channel 
> is also used by default for bootstrapping IB queue pairs.

To clarify, is IP/Ethernet required, or will IPoIB be used if it's
configured on the nodes? Would this make a difference?

Just curious,
Prentice


Re: [OMPI users] Role of ethernet interfaces of startup of openmpi job using IB

2011-09-28 Thread Prentice Bisbal
On 09/27/2011 05:30 PM, Jeff Squyres wrote:
> On Sep 27, 2011, at 5:03 PM, Prentice Bisbal wrote:
> 
>> To clarify, is IP/Ethernet required, or will IPoIB be used if it's
>> configured on the nodes? Would this make a difference?
> 
> IPoIB is fine, although I've heard concerns about its stability at scale.
> 
> The difference that it'll make is that it's generally faster than ethernet.  
> It never runs at wire IB speed because of the overheads involved, but it's 
> likely to be much faster than 1GB ethernet, for example.
> 
> You can specify which interfaces Open MPI's OOB channel uses with the 
> oob_tcp_if_include MCA parameter.  For example:
> 
>mpirun --mca oob_tcp_if_include ib0 ...
> 

Jeff,

Thanks for the clarification. I was just checking. Earlier in this
thread you specifically said "ethernet". I suspected you meant "IP", and
just wanted to be sure.


Re: [OMPI users] wiki and "man mpirun" odds, and a question

2011-11-10 Thread Prentice Bisbal
Paul,

I'm sure this isn't the response you want to hear, but I'll suggest it
anyway:

Queuing systems can forward the submitter's environment if desired. For
example, in SGE, the -V switch forwards all the environment variables to
the job's environment, so if there's one you can use to launch your job,
you might want to check its documentation.

--
Prentice 

On 11/10/2011 08:01 AM, Ralph Castain wrote:
> I'm not sure where the FAQ got its information, but it has always been one 
> param per -x option.
>
> I'm afraid there isn't any envar to support the setting of multiple -x 
> options. We didn't expect someone to forward very many, if any, so we didn't 
> create that capability. It wouldn't be too hard to convert it to an mca 
> param, though, so you could add such options to your mca param file, if that 
> would help.
>
>
> On Nov 10, 2011, at 4:02 AM, Paul Kapinos wrote:
>
>> Hi folks,
>> I.  looked for ways to tell to "mpiexec" to forward some environment 
>> variables, I saw a mismatch:
>>
>> ---
>> http://www.open-mpi.org/faq/?category=running#mpirun-options
>> ...
>> --x : A comma-delimited list of environment variables to 
>> export to the parallel application.
>> ---
>> (Open MPI/1.5.3)
>> $ man mpirun
>>   -x 
>>  Export  the  specified environment variables to the remote 
>> nodes before executing the program.  Only one environment variable can
>>^^^
>> be  specified per -x option.
>> ---
>>
>> So, either the info is outdated somewhre, or -x and --x have different 
>> meaning - but then there is a lack of info, too :o)
>>
>> Maybe you could update the Wiki and/or the man page?
>>
>> II. Now the question. Defaultly no non-OpenMPI environmet variables are 
>> exported to the parallel application, AFAIK.
>>
>> With -x option of mpiexec it is possible to export one (or a list of, see 
>> below) environment variable. But, it's a bit tedious to type a [long] list 
>> of variables.
>>
>> Is there someone envvar, by setting which to a list of names of other 
>> envvars the same effect could be achieved as by setting -x on command line 
>> of mpirun?
>>
>> Best wishes
>> Paul Kapinos
>>
>>
>> -- 
>> Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
>> RWTH Aachen University, Center for Computing and Communication
>> Seffenter Weg 23,  D 52074  Aachen (Germany)
>> Tel: +49 241/80-24915
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>


Re: [OMPI users] openmpi - gfortran and ifort conflict

2011-12-14 Thread Prentice Bisbal

On 12/14/2011 12:21 PM, Micah Sklut wrote:
> Hi Gustav,
>
> I did read Price's email:
>
> When I do "which mpif90", i get:
> /opt/openmpi/intel/bin/mpif90
> which is the desired directory/binary
>
> As I mentioned, the config log file indicated it was using ifort, and
> had no mention of gfortran.
> Below is the output from ompi_info. It shows reference to the correct
> ifort compiler. But yet the mpif90 compiler still yields a gfortran
> compiler.

Micah,

You are confusing the compilers used to build Open MPI itself with the
compilers Open MPI's wrappers use to compile other codes.

For example, your configure command,

./configure --prefix=/opt/openmpi/intel CC=gcc CXX=g++ F77=ifort FC=ifort

doesn't tell Open MPI to use ifort for mpif90 and mpif77. It tells the
build process to use ifort to compile the Fortran sections of the Open
MPI source code. To tell mpif90 and mpif77 which compilers you'd like to
use to compile Fortran programs that use Open MPI, you must set the
environment variables OMPI_F77 and OMPI_FC. To illustrate, when I want
to use the gnu compilers, I set the following in my .bashrc:

export OMPI_CC=gcc
export OMPI_CXX=g++
export OMPI_F77=gfortran
export OMPI_FC=gfortran

If I wanted to use the PGI compilers, I'd swap the above 4 lines for these:

export OMPI_CC=pgcc
export OMPI_CXX=pgCC
export OMPI_F77=pgf77
export OMPI_FC=pgf95

You can verify which compiler is set using the --showme switch to mpif90:

$ mpif90 --showme
pgf95 -I/usr/local/openmpi-1.2.8/pgi-8.0/x86_64/include
-I/usr/local/openmpi-1.2.8/pgi-8.0/x86_64/lib -L/usr/lib64
-L/usr/local/openmpi-1.2.8/pgi/x86_64/lib
-L/usr/local/openmpi-1.2.8/pgi-8.0/x86_64/lib -lmpi_f90 -lmpi_f77 -lmpi
-lopen-rte -lopen-pal -libverbs -lrt -lnuma -ldl -Wl,--export-dynamic
-lnsl -lutil -lpthread -ldl

I suspect if you run the command 'env | grep OMPI_FC', you'll see that
you have it set to gfortran. I can verify that mine is set to pgf95 this
way:

$ env | grep OMPI_FC
OMPI_FC=pgf95

Of course, a simple echo would work, too:

$ echo $OMPI_FC
pgf95

You can also change these setting by editing the file
mpif90-wrapper-data.txt in your Open MPI installation directory.

Full details on setting these variables (and others) can be found in the
FAQ:

http://www.open-mpi.org/faq/?category=mpi-apps#override-wrappers-after-v1.0

--
Prentice



> -->
> barells@ip-10-17-153-123:~> ompi_info
>  Package: Open MPI barells@ip-10-17-148-204 Distribution
> Open MPI: 1.4.4
>Open MPI SVN revision: r25188
>Open MPI release date: Sep 27, 2011
> Open RTE: 1.4.4
>Open RTE SVN revision: r25188
>Open RTE release date: Sep 27, 2011
> OPAL: 1.4.4
>OPAL SVN revision: r25188
>OPAL release date: Sep 27, 2011
> Ident string: 1.4.4
>   Prefix: /usr/lib64/mpi/gcc/openmpi
>  Configured architecture: x86_64-unknown-linux-gnu
>   Configure host: ip-10-17-148-204
>Configured by: barells
>Configured on: Wed Dec 14 14:22:43 UTC 2011
>   Configure host: ip-10-17-148-204
> Built by: barells
> Built on: Wed Dec 14 14:27:56 UTC 2011
>   Built host: ip-10-17-148-204
>   C bindings: yes
> C++ bindings: yes
>   Fortran77 bindings: yes (all)
>   Fortran90 bindings: yes
>  Fortran90 bindings size: small
>   C compiler: gcc
>  C compiler absolute: /usr/bin/gcc
> C++ compiler: g++
>C++ compiler absolute: /usr/bin/g++
>   Fortran77 compiler: ifort
>   Fortran77 compiler abs: /opt/intel/fce/9.1.040/bin/ifort
>   Fortran90 compiler: ifort
>   Fortran90 compiler abs: /opt/intel/fce/9.1.040/bin/ifort
>  C profiling: yes
>C++ profiling: yes
>  Fortran77 profiling: yes
>  Fortran90 profiling: yes
>   C++ exceptions: no
>   Thread support: posix (mpi: no, progress: no)
>Sparse Groups: no
>   Internal debug support: no
>  MPI parameter check: runtime
> Memory profiling support: no
> Memory debugging support: no
>  libltdl support: yes
>Heterogeneous support: no
>  mpirun default --prefix: no
>  MPI I/O support: yes
>MPI_WTIME support: gettimeofday
> Symbol visibility support: yes
>FT Checkpoint support: no  (checkpoint thread: no)
>MCA backtrace: execinfo (MCA v2.0, API v2.0, Component v1.4.2)
>   MCA memory: ptmalloc2 (MCA v2.0, API v2.0, Component v1.4.2)
>MCA paffinity: linux (MCA v2.0, API v2.0, Component v1.4.2)
>MCA carto: auto_detect (MCA v2.0, API v2.0, Component
> v1.4.2)
>MCA carto: file (MCA v2.0, API v2.0, Component v1.4.2)
>MCA maffinity: first_use (MCA v2.0, API v2.0, Component v1.4.2)
>MCA timer: linux (MCA v2.0, API v2.0, Component v1.4.2)
>  MCA installdirs: env (MCA v2.0, API v2.0, Com

Re: [OMPI users] openmpi - gfortran and ifort conflict

2011-12-14 Thread Prentice Bisbal

On 12/14/2011 01:20 PM, Fernanda Oliveira wrote:
> Hi Micah,
>
> I do not know if it is exactly what you need but I know that there are
> environment variables to use with intel mpi. They are: I_MPI_CC,
> I_MPI_CXX, I_MPI_F77, I_MPI_F90. So, you can set this using 'export'
> for bash, for instance or directly when you run.
>
> I use in my bashrc:
>
> export I_MPI_CC=icc
> export I_MPI_CXX=icpc
> export I_MPI_F77=ifort
> export I_MPI_F90=ifort

Those environment variables are for Intel MPI.  For OpenMPI, the
equivalent variables would be OMPI_CC, OMPI_CXX, OMPI_F77, and OMPI_FC,
respectively.
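For completeness, the Open MPI equivalents of the Intel MPI lines above would look like this (a sketch assuming the Intel compilers are installed and on your PATH):

```shell
# Open MPI wrapper-compiler overrides, mirroring the I_MPI_* settings above.
# The compiler names (icc, icpc, ifort) assume an Intel compiler install.
export OMPI_CC=icc
export OMPI_CXX=icpc
export OMPI_F77=ifort
export OMPI_FC=ifort
```

After setting these, `mpicc --showme` (and friends) should report the overridden compiler.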

--
Prentice


Re: [OMPI users] openmpi - gfortran and ifort conflict

2011-12-14 Thread Prentice Bisbal
On 12/14/2011 03:29 PM, Micah Sklut wrote:
> Okay thanks Prentice.
>
> I understand what you are saying about specifying the compilers during
> configure.
> Perhaps, that alone would have solved the problem, but removing the
> 1.4.2 ompi installation worked as well.
>
> Micah
>

Well, to clarify my earlier statement, the compilers used during
installation are used to set the defaults in the wrapper data files
(mpif90-wrapper-data.txt, etc.), but those defaults
can easily be changed, either by editing those files or by defining
environment variables.

Anyhow, we're all glad you were finally able to solve your problem.

--
Prentice




Re: [OMPI users] openmpi - gfortran and ifort conflict

2011-12-14 Thread Prentice Bisbal

On 12/14/2011 03:39 PM, Jeff Squyres wrote:
> On Dec 14, 2011, at 3:21 PM, Prentice Bisbal wrote:
>
>> For example, your configure command,
>>
>> ./configure --prefix=/opt/openmpi/intel CC=gcc CXX=g++ F77=ifort FC=ifort
>>
>> Doesn't tell Open MPI to use ifcort for mpif90 and mpif77.
> Actually, that's not correct.
>
> For Open MPI, our wrapper compilers will default to using the same compilers 
> that were used to build Open MPI.  So in the above case:
>
> mpicc will use gcc
> mpicxx will use g++
> mpif77 will use ifort
> mpif90 will use ifort
>
>

Jeff,

I realized this after I wrote that and clarified it in a subsequent
e-mail. Which you probably just read. ;-)

Prentice


Re: [OMPI users] Installation of openmpi-1.4.4

2011-12-21 Thread Prentice Bisbal
Is the path to your opempi libraries in your LD_LIBRARY_PATH?

--
Prentice


On 12/21/2011 01:56 PM, amosl...@gmail.com wrote:
> Dear OMPI Users,
>   I have just read the messages from Martin Rushton and Jeff
> Squyres and have been having the same problem trying to get
> openmp-1.4.4 to work.  My specs are below:
>Xeon(R) CPU 5335 2.00 GHz
>Linux  SUSE 11.4 (x86_64)
>Linux 2.6.371-1.2 desktop x86_64
> I go through the compilation process with the commands:
>   ./configure --prefix=/opt/openmpi CC=icc
> CXX=icpc F77=ifort F90=ifort "FCFLAGS=-O3 -i8" "FFLAGS=-O3 -i8" 2>&1 |
> tee config.out
>make -j 4 all 2>&1 | tee make.out
>make install 2>&1 | tee install.out.
> The entire process seems to go properly but when I try to use an
> example it doesn't work properly.
>mpicc hello_c.c -o hello_c
> compiles properly.  However,
>"./hello_c" gives an error message that it
> cannot find the file libmpi_so.0.There are at least 3 copies of
> the file present as found by the search command but none of these are
> found.  I have checked the permissions and they seem to be OK so I am
> at the same point as Martin Rushton.  I hope that somebody comes up
> with an anser soon.
>   
>
> Amos Leffler
>
>


Re: [OMPI users] ompi + bash + GE + modules

2012-01-13 Thread Prentice Bisbal

On 01/12/2012 08:40 AM, Dave Love wrote:
> Surely this should be on the gridengine list -- and it's in recent
> archives -- but there's some ob-openmpi below.  Can Notre Dame not get
> the support they've paid Univa for?

This is, in fact, in the recent gridengine archives. I brought up this
problem myself within the past couple of months.

> Reuti  writes:
>
>> SGE 6.2u5 can't handle multi line environment variables or functions,
>> it was fixed in 6.2u6 which isn't free.
> [It's not listed for 6.2u6.]  For what it's worth, my fix for Sun's fix
> is https://arc.liv.ac.uk/trac/SGE/changeset/3556/sge.
>
>> Do you use -V while submitting the job? Just ignore the error or look
>> into Son of Gridengine which fixed it too.
> Of course
> you can always avoid the issue by not using `export -f', which isn't in
> the modules version we have.  I default -V in sge_request and load
> the open-mpi module in the job submission session.  I don't
> find whatever problems it causes, and it works for binaries like
>   qsub -b y ... mpirun ...
> However, the folkloristic examples here typically load the module stuff
> in the job script.
>
>> If you can avoid -V, then it could be defined in any of the .profile
>> or alike if you use -l as suggested.  You could even define a
>> started_method in SGE to define it for all users by default and avoid
>> to use -V:
>>
>> #!/bin/sh
>> module() { ...command...here... }
>> export -f module
>> exec "${@}"
> That won't work for example if someone is tasteless enough to submit csh.
>
>


Re: [OMPI users] OpenMPI: How many connections?

2012-01-27 Thread Prentice Bisbal
I would like to nominate the quote below for the best explanation of how
a piece of software works  that I've ever read.

Kudos, Jeff.

On 01/26/2012 04:38 PM, Jeff Squyres wrote:
> You send a message, a miracle occurs, and the message is received on the 
> other side. 

--
Prentice


Re: [OMPI users] [Open MPI Announce] Open MPI v1.4.5 released

2012-02-16 Thread Prentice Bisbal

On 02/15/2012 07:44 AM, Reuti wrote:
> Hi,
>
> Am 15.02.2012 um 03:48 schrieb alexalex43210:
>
>>   But I am a novice for the parallel computation, I often use Fortran to 
>> compile my program, now I want to use the Parallel, can you give me some 
>> help how to begin?
>>   PS: I learned about OPEN MPI is the choice for my question solution. am I 
>> right?
> This depends on your application and how easy it can be adopted to split the 
> problem into smaller parts. It could also be the case, that you want to stay 
> on one node only to use additional cores and could parallelize it better by 
> using OpenMP, where all threads operate on the same memory area on a single 
> node.
>
> http://openmp.org/wp/
>
> It's built into many compilers by default nowadays.
>
> In addition to the online courses Jeff mentioned there are several books 
> available like Parallel Programming with MPI by Peter Pacheco (although it 
> covers only MPI-1 due to its age http://www.cs.usfca.edu/~peter/ppmpi/), 
> Parallel Programming in C with MPI and OpenMP by Michael Quinn.
>

Personally, I didn't like Peter Pacheco's book all that much, so I'd
like to add a couple more books to this list of recommendations:

Using MPI: Portable Parallel Programming with the Message-Passing
Interface, 2nd Edition
William Gropp, Ewing Lusk, and Anthony Skjellum
Copyright 1997, MIT Press
http://www.mcs.anl.gov/research/projects/mpi/usingmpi/

Using MPI-2: Advanced Features of the Message-Passing Interface
William Gropp, Ewing Lusk, Rajeev Thakur
Copyright 1997, MIT Press
http://www.mcs.anl.gov/research/projects/mpi/usingmpi2/index.html

The second book covers the more advanced features of MPI-2.  As a n00b
just learning MPI, you probably don't need to learn that stuff until
you've mastered the material in the first book I listed,  or Pacheco's
book. 

--
Prentice






Re: [OMPI users] [EXTERNAL] Re: Question regarding osu-benchamarks 3.1.1

2012-03-02 Thread Prentice Bisbal

On 02/29/2012 03:15 PM, Jeffrey Squyres wrote:
> On Feb 29, 2012, at 2:57 PM, Jingcha Joba wrote:
>
>> So if I understand correctly, if a message size is smaller than it will use 
>> the MPI way (non-RDMA, 2 way communication), if its larger, then it would 
>> use the Open Fabrics, by using the ibverbs (and ofed stack) instead of using 
>> the MPI's stack?
> Er... no.
>
> So let's talk MPI-over-OpenFabrics-verbs specifically.
>
> All MPI communication calls will use verbs under the covers.  They may use 
> verbs send/receive semantics in some cases, and RDMA semantics in other 
> cases.  "It depends" -- on a lot of things, actually.  It's hard to come up 
> with a good rule of thumb for when it uses one or the other; this is one of 
> the reasons that the openib BTL code is so complex.  :-)
>
> The main points here are:
>
> 1. you can trust the openib BTL to do the Best thing possible to get the 
> message to the other side.  Regardless of whether that message is an MPI_SEND 
> or an MPI_PUT (for example).
>
> 2. MPI_PUT does not necessarily == verbs RDMA write (and likewise, MPI_GET 
> does not necessarily == verbs RDMA read).
>
>> If so, could that be the reason why the MPI_Put "hangs" when sending a 
>> message more than 512KB (or may be 1MB)?
> No.  I'm guessing that there's some kind of bug in the MPI_PUT implementation.
>
>> Also is there a way to know if for a particular MPI call, OF uses send/recv 
>> or RDMA exchange?
> Not really.
>
> More specifically: all things being equal, you don't care which is used.  You 
> just want your message to get to the receiver/target as fast as possible.  
> One of the main ideas of MPI is to hide those kinds of details from the user. 
>  I.e., you call MPI_SEND.  A miracle occurs.  The message is received on the 
> other side.
>
> :-)
>

Nice use of the "A Miracle Occurs" meme. We really need t-shirts that
say this for the OpenMPI BoF at SC12.

--
Prentice


Re: [OMPI users] ssh between nodes

2012-03-02 Thread Prentice Bisbal

On 02/29/2012 04:51 PM, Martin Siegert wrote:
> Hi,
>
> On Wed, Feb 29, 2012 at 09:09:27PM +, Denver Smith wrote:
>>Hello,
>>On my cluster running moab and torque, I cannot ssh without a password
>>between compute nodes. I can however request multiple node jobs fine. I
>>was wondering if passwordless ssh keys need to be set up between
>>compute nodes in order for mpi applications to run correctly.
>>Thanks
> No. passwordless ssh keys are not needed. In fact, I strong advise
> against using those (teaching users how to generate passwordless
> ssh keys creates security problems: they start using those not just
> for connecting to compute nodes). There are several alternatives:
>
> 1) use openmpi's hooks into torque (use the --with-tm configure option);
> 2) use ssh hostbased authentication (and set IgnoreUserKnownHosts to yes);
> 3) use rsh (works if your cluster is sufficiently small).

What has been said for Torque also holds true for SGE - if you compile
Open MPI with the --with-sge switch, passwordless SSH is not needed
since Open MPI will work directly with SGE.

And as much as I agree passwordless SSH keys are not desirable, they can
be difficult to avoid, especially if you use commercial software on
your cluster. MATLAB, for example, requires passwordless SSH between
cluster nodes in order to work.

--
Prentice.


Re: [OMPI users] Simple question on GRID

2012-03-02 Thread Prentice Bisbal
On 03/01/2012 12:10 AM, Shaandar Nyamtulga wrote:
> Hi
> I have two Beowulf clusters (both Ubuntu 10.10, one is OpenMPI, one is
> MPICH2).
> They run separately in their local network environment.I know there is
> a way to integrate them through Internet, presumably by Grid software,
> I guess. Is there any tutorial to do this?
>  
>

This question is a little off-topic for this list, since this list is
for Open MPI-specific questions (and some general MPI questions). You
should really ask this question on the Beowulf mailing list, which
covers any and all topics related to HPC clustering. See www.beowulf.org
for more information.


Also, you need to be more specific as to what you really want to do:
"integrate" is a vague, overused term. Do you want the scheduler at one
site to be able to manage jobs on the cluster at the other site with no
message-passing traffic between sites? That might be possible.

Or, do you want the two remote clusters to send message-passing traffic
back-and-forth over the internet and behave as a single cluster? That
might be possible, too, but due to the latency and reduced bandwidth of
sending those messages over the internet, the performance would likely
be so poor as to not be worth it.

--
Prentice


Re: [OMPI users] redirecting output

2012-04-02 Thread Prentice Bisbal
On 03/30/2012 11:12 AM, Tim Prince wrote:
>  On 03/30/2012 10:41 AM, tyler.bal...@huskers.unl.edu wrote:
>>
>>
>> I am using the command mpirun -np nprocs -machinefile machines.arch
>> Pcrystal and my output scrolls across my terminal. I would like to
>> send this output to a file and I cannot figure out how to do so. I
>> have tried the general > FILENAME and > log &; these generate
>> files, however they are empty. Any help would be appreciated.

If you see the output on your screen, but it's not being redirected to a
file, it must be printing to STDERR and not STDOUT. The '>' by itself
redirects STDOUT only, so it doesn't redirect error messages. To
redirect STDERR, you can use '2>', which says redirect filehandle # 2,
which is stderr.

some_command 2> myerror.log

or

some_command >myoutput.log 2>myerror.log

 To redirect both STDOUT and STDERR to the same place, use the syntax
"2>&1" to tie STDERR to STDOUT:

some_command > myoutput.log 2>&1

I prefer to see the output on the screen at the same time I write it to a
file. That way, if the command hangs for some reason, I know it
immediately. I find the 'tee' command priceless for this:

some_command 2>&1 | tee myoutput.log

Google for 'bash output redirection' and you'll find many helpful pages
with better explanation and examples, like this one:

http://tldp.org/HOWTO/Bash-Prog-Intro-HOWTO-3.html

If you don't use bash, those results will be less helpful.

I hope that helps, or at least gets you pointed in the right direction.

--
Prentice

>
> If you run under screen your terminal output should be collected in
> screenlog.  Beats me why some sysadmins don't see fit to install screen.
>





Re: [OMPI users] regarding the problem occurred while running anmpi programs

2012-04-26 Thread Prentice Bisbal
Actually, he should leave the ":$LD_LIBRARY_PATH" on the end. That way
if LD_LIBRARY_PATH is already defined, the Open MPI directory is just
prepended to LD_LIBRARY_PATH. Omitting ":$LD_LIBRARY_PATH" from his
command could cause other needed elements of LD_LIBRARY_PATH to be lost,
causing other runtime errors.
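A defensive way to do that prepend, sketched here with a hypothetical install prefix, also avoids leaving a stray trailing colon when LD_LIBRARY_PATH starts out unset:

```shell
# Hypothetical Open MPI install prefix; adjust to your installation.
OMPI_LIB=/usr/local/openmpi-1.4.5/lib

# Prepend while keeping any existing entries. The ${VAR:+...} expansion
# adds the ':' separator only when LD_LIBRARY_PATH already has a value.
export LD_LIBRARY_PATH="$OMPI_LIB${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
```

A trailing empty element (from a bare trailing colon) is treated by the loader as the current directory, which is another reason to guard the separator.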

--
Prentice



On 04/25/2012 11:48 AM, tyler.bal...@huskers.unl.edu wrote:
> export LD_LIBRARY_PATH= [location of library] leave out
> the :$LD_LIBRARY_PATH 
> 
> *From:* users-boun...@open-mpi.org [users-boun...@open-mpi.org] on
> behalf of seshendra seshu [seshu...@gmail.com]
> *Sent:* Wednesday, April 25, 2012 10:43 AM
> *To:* Open MPI Users
> *Subject:* Re: [OMPI users] regarding the problem occurred while
> running anmpi programs
>
> Hi
> I have exported the library files as below
>
> [master@ip-10-80-106-70 ~]$ export
> LD_LIBRARY_PATH=/usr/local/openmpi-1.4.5/lib:$LD_LIBRARY_PATH 
>   
> [master@ip-10-80-106-70 ~]$ mpirun --prefix /usr/local/openmpi-1.4.5
> -n 1 --hostfile hostfile out
> out: error while loading shared libraries: libmpi_cxx.so.0: cannot
> open shared object file: No such file or directory
> [master@ip-10-80-106-70 ~]$ mpirun --prefix /usr/local/lib/ -n 1
> --hostfile hostfile
> out   
> 
> out: error while loading shared libraries: libmpi_cxx.so.0: cannot
> open shared object file: No such file or directory
>
> But still iam getting the same error.
>
>
>
>
>
> On Wed, Apr 25, 2012 at 5:36 PM, Jeff Squyres (jsquyres)
> mailto:jsquy...@cisco.com>> wrote:
>
> See the FAQ item I cited. 
>
> Sent from my phone. No type good. 
>
> On Apr 25, 2012, at 11:24 AM, "seshendra seshu"
> mailto:seshu...@gmail.com>> wrote:
>
>> Hi
>> now i have created an used and tried to run the program but i got
>> the following error
>>
>> [master@ip-10-80-106-70 ~]$ mpirun -n 1 --hostfile hostfile
>> out  
>>   
>> out: error while loading shared libraries: libmpi_cxx.so.0:
>> cannot open shared object file: No such file or directory
>>
>>
>> thanking you
>>
>>
>>
>> On Wed, Apr 25, 2012 at 5:12 PM, Jeff Squyres > > wrote:
>>
>> On Apr 25, 2012, at 11:06 AM, seshendra seshu wrote:
>>
>> > so should i need to create an user and run the mpi program.
>> or how can i run in cluster
>>
>> It is a "best practice" to not run real applications as root
>> (e.g., MPI applications).  Create a non-privlidged user to
>> run your applications.
>>
>> Then be sure to set your LD_LIBRARY_PATH if you installed
>> Open MPI into a non-system-default location.  See this FAQ item:
>>
>>  
>>  http://www.open-mpi.org/faq/?category=running#adding-ompi-to-path
>>
>> --
>> Jeff Squyres
>> jsquy...@cisco.com 
>> For corporate legal information go to:
>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>
>>
>>
>>
>>
>>
>> -- 
>>  WITH REGARDS
>> M.L.N.Seshendra
>
>
>
>
>
> -- 
>  WITH REGARDS
> M.L.N.Seshendra
>
>


Re: [OMPI users] Building openmpi from src rpm: rpmbuild --rebuild errors with 'cpio: MD5 sum mismatch' (since openmpi 1.4.5)

2012-06-06 Thread Prentice Bisbal
On 05/31/2012 02:04 AM, livelfs wrote:
> Hi
> Since 1.4.5 openmpi release, it is no longer possible to build openmpi
> binary with rpmbuild --rebuild if system rpm package version is 4.4.x,
> like in SLES10, SLES11, RHEL/CentOS 5.x.
>
> For instance, on CentOS 5.8 x86_64 with rpm 4.4.2.3-28.el5_8:
>
> [root@horizon _tmp]# rpmbuild --rebuild openmpi-1.4.5-1.src.rpm
> Installing openmpi-1.4.5-1.src.rpm
> warning: user jsquyres does not exist - using root
> error: unpacking of archive failed on file
> /usr/src/redhat/SPECS/openmpi-1.4.5.spec;4fc65c74: cpio: MD5 sum mismatch
> error: openmpi-1.4.5-1.src.rpm cannot be installed
>
> Apparently this problem is due to lack of support of SHA-256 in rpm 4.4.x
>
> Googling suggests
>   rpmbuild -bs \
>--define "_source_filedigest_algorithm md5" \
>--define "_binary_filedigest_algorithm md5" \
>package.spec
> should be used to produce openmpi src rpms and avoid the problem.
>
> Please note that
> - rpmbuild works OK on RHEL/CentOS 5.x with openmpi-1.4.4-1.src.rpm
> and all previous versions
> - rpmbuild works OK on with all openmpi versions with rpm 4.8.x from
> RHEL/CentOS 6.x
> - this is of course not blocking, since I successfully tested 2
> workarounds
> 1) install package with --nomd5, then rpmbuild -ba 
> 2) repackage with "old" rpm:
> rpm2cpio to extract spec file + sources tar
> rpmbuild -bs  to produce new src rpm
> Then rpmbuild --rebuild is OK
>
>

This is a known "problem" with RHEL 6 that burned me, too. I say
"problem" in quotes because in my case, it only appeared when I tried to
install RPMS built for RHEL 5 on a RHEL 6 system. That's a problem to
me, but some purists don't see this as a problem and just say "Well,
that's what you get for trying to install RHEL 5 RPMs on a RHEL 6
system." I don't agree with them.

As a work around, i think I did some magic with rpm2cpio, as documented
above, but I don't remember the details.

--
Prentice


Re: [OMPI users] Building openmpi from src rpm: rpmbuild --rebuild errors with 'cpio: MD5 sum mismatch' (since openmpi 1.4.5)

2012-06-06 Thread Prentice Bisbal
On 05/31/2012 07:26 AM, Jeff Squyres wrote:
> On May 31, 2012, at 2:04 AM, livelfs wrote:
>
>> Since 1.4.5 openmpi release, it is no longer possible to build openmpi 
>> binary with rpmbuild --rebuild if system rpm package version is 4.4.x, like 
>> in SLES10, SLES11, RHEL/CentOS 5.x.
>>
>> For instance, on CentOS 5.8 x86_64 with rpm 4.4.2.3-28.el5_8:
>>
>> [root@horizon _tmp]# rpmbuild --rebuild openmpi-1.4.5-1.src.rpm
>> Installing openmpi-1.4.5-1.src.rpm
>> warning: user jsquyres does not exist - using root
>> error: unpacking of archive failed on file 
>> /usr/src/redhat/SPECS/openmpi-1.4.5.spec;4fc65c74: cpio: MD5 sum mismatch
>> error: openmpi-1.4.5-1.src.rpm cannot be installed
>>
>> Apparently this problem is due to lack of support of SHA-256 in rpm 4.4.x
> Mmmm.  I wonder if this corresponds to me upgrading my cluster (where I make 
> the SRPM) from RHEL5 to RHEL6.  I'll bet it does.  :-\
>
> Just curious -- do you know if there's a way I can make an RHEL5-friendly 
> SRPM on my RHEL6 cluster?  I seem to have RPM 4.8.0 on my RHEL6 machines.
>
> Or, better yet, perhaps I should be producing the SRPM on the official OMPI 
> build machine (i.e., where we make our tarballs), which is still back at 
> RHEL4.  I'm not quite sure how it evolved that we make tarballs in tightly 
> controlled conditions, but the SRPM is just made by hand on my cluster (which 
> is subject to upgrades, etc.).  Hrm. :-\
>

Building on RHEL 4 shouldn't have any impact. If anything, it would make
things worse instead of better, but I think that's unlikely. This
problem has to do with changes in RPM itself from RHEL5 to RHEL 6.
Ideally, you should be using Mock to build your RPMs, and build a
separate set of RPMs for RHEL 3,4,5,6,... It's a PITA, I know, but it's
really the best way to build RPMs without any dependency gotchas.

--
Prentice



Re: [OMPI users] Infiniband requirements

2009-06-30 Thread Prentice Bisbal
Gus Correa wrote:
> Hi Jim, list
> 
> 1) Your first question:
> 
> I opened a thread on this list two months or so ago about a similar
> situation: when OpenMPI would use/not use libnuma.
> I asked a question very similar to your question about IB support,
> and how the configure script would provide it or not.
> Jeff answerer it, and I asked him to post the answer in the FAQ,
> which he kindly did (or an edited version of it):
> 
> http://www.open-mpi.org/faq/?category=building#default-build
> 
> The wisdom is that OpenMPI will search for IB on standard places,
> and will use it if it finds it.
> If you don't have IB on a standard place, then you can use the
> switch --with-openib=/dir to force IB to be part of your OpenMPI.
> If I understood it right, the bottom line is that you
> only don't get IB if it is hidden, or doesn't exist.
> 

I've found that on 64-bit RHEL systems, many configure scripts don't
consider /lib64 and /usr/lib64 "standard" locations to look for
libraries, so I often have to do something like

./configure --with-pkg=/usr/lib64

or ./configure --with-pkg-lib=/usr/lib64

depending on the package's configure script. I just checked my notes
form compiling OMPI 1.2.8 and 1.3, and all I needed was "--with-openib"
for my distro (a rebuild of RHEL 5.3), but you never know - you may need
just such a trick.

--
Prentice


Re: [OMPI users] Network connection check

2009-07-23 Thread Prentice Bisbal

Jeff Squyres wrote:
> On Jul 22, 2009, at 10:05 AM, vipin kumar wrote:
> 
>> Actually requirement is how a C/C++ program running in "master" node
>> should find out whether "slave" node is reachable (as we check this
>> using "ping" command) or not ? Because IP address may change at any
>> time, that's why I am trying to achieve this using "host name" of the
>> "slave" node. How this can be done?
> 
> 
> Are you asking to find out this information before issuing "mpirun"? 
> Open MPI does assume that the nodes you are trying to use are reachable.
> 


How about you start your MPI program from a shell script that does the
following:

1. Reads a text file containing the names of all the possible candidates
 for MPI nodes

2. Loops through the list of names from (1) and pings each machine to
see if it's alive. If the host is pingable, then write it's name to a
different text file which will be host as the machine file for the
mpirun command

3. Call mpirun using the machine file generated in (2).
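The three steps above might be sketched like this (the file names, the candidate list, and the application name are placeholders, not from the original post):

```shell
#!/bin/sh
# Step 1: candidate hosts, one per line (demo file created if absent).
CANDIDATES=candidates.txt
[ -f "$CANDIDATES" ] || printf '127.0.0.1\n' > "$CANDIDATES"

# Step 2: keep only the hosts that answer a single ping (2 s timeout,
# Linux ping flags; adjust for other ping implementations).
MACHINEFILE=machinefile.txt
: > "$MACHINEFILE"
while read -r host; do
    if ping -c 1 -W 2 "$host" > /dev/null 2>&1; then
        echo "$host" >> "$MACHINEFILE"
    fi
done < "$CANDIDATES"
```

Step 3 would then be something like `mpirun -np "$(wc -l < machinefile.txt)" --machinefile machinefile.txt ./your_mpi_app` (application name is a placeholder).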

--
Prentice


[OMPI users] RETRY EXCEEDED ERROR status number 12

2009-08-21 Thread Prentice Bisbal
Several jobs on my cluster just died with the error below.

Are there any IB/Open MPI diagnostics I should use to diagnose this,
should I just reboot the nodes, or should I have the user who submitted
these jobs increase the retry count/timeout parameters?


[0,1,6][../../../../../ompi/mca/btl/openib/btl_openib_component.c:1375:btl_openib_component_progress]
from node14.aurora to: node40.aurora error polling HP CQ with status
RETRY EXCEEDED ERROR status number 12 for wr_id 13606831800 opcode 9
--
The InfiniBand retry count between two MPI processes has been
exceeded. "Retry count" is defined in the InfiniBand spec 1.2
(section 12.7.38):

The total number of times that the sender wishes the receiver to
retry timeout, packet sequence, etc. errors before posting a
completion error.

This error typically means that there is something awry within the
InfiniBand fabric itself. You should note the hosts on which this
error has occurred; it has been observed that rebooting or removing a
particular host from the job can sometimes resolve this issue.

Two MCA parameters can be used to control Open MPI's behavior with
respect to the retry count:

* btl_openib_ib_retry_count - The number of times the sender will
attempt to retry (defaulted to 7, the maximum value).

* btl_openib_ib_timeout - The local ACK timeout parameter (defaulted
to 10). The actual timeout value used is calculated as:

4.096 microseconds * (2^btl_openib_ib_timeout)

See the InfiniBand spec 1.2 (section 12.7.34) for more details.
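For reference, both MCA parameters quoted above can be passed on the mpirun command line. The values below are illustrative assumptions, not recommendations from this thread; the awk line just evaluates the quoted timeout formula for a few settings.

```shell
# Illustrative only -- how the two knobs would be raised at run time:
#   mpirun --mca btl_openib_ib_retry_count 7 \
#          --mca btl_openib_ib_timeout 14 -np 16 ./app

# Evaluate the quoted formula, 4.096 us * (2^btl_openib_ib_timeout):
awk 'BEGIN { for (t = 10; t <= 14; t += 2)
                 printf "timeout=%2d -> %10.3f us\n", t, 4.096 * 2^t }'
```

The default of 10 gives roughly 4.2 ms per attempt; each increment doubles it, so small increases go a long way.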

-- 
Prentice Bisbal
Linux Software Support Specialist/System Administrator
School of Natural Sciences
Institute for Advanced Study
Princeton, NJ


[OMPI users] MPI_Comm_dup hangs

2010-02-03 Thread Prentice Bisbal
I have a problem with MPI_Comm_dup. When I call it in a function, it
causes my application to hang. Are there any common causes for a problem
like this? I'm using OpenMPI 1.2.8. Are there any known bugs that could
be causing this?

My program seems to hang when it gets to MPI_Comm_dup. Here's an example
of how I'm using it.

#include "foo.h"

/* MPI variables */
int my_rank;
int num_proc;
MPI_Comm euclid_comm;

void foo(long *arg1, long *arg2, MPI_Comm old_comm);

int main (int argc, char** argv) {

  long arg1, arg2;   /* declarations added so the example compiles */

  /* get options */

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
  MPI_Comm_size(MPI_COMM_WORLD, &num_proc);

 /* read in some data, yadda, yadda yadda */

 foo(&arg1, &arg2, MPI_COMM_WORLD);

 /* print results from foo */

 MPI_Finalize();

}

void foo(long *arg1, long *arg2, MPI_Comm old_comm)
{
  MPI_Comm foo_comm;
  int foo_my_rank;
  int foo_num_proc;

  MPI_Comm_dup(old_comm, &foo_comm);
  MPI_Comm_rank(foo_comm, &foo_my_rank);
  MPI_Comm_size(foo_comm, &foo_num_proc);

  /* do stuff */

}


-- 
Prentice


Re: [OMPI users] MPI_Comm_dup hangs

2010-02-04 Thread Prentice Bisbal
Nevermind... I figured this one out on my own. I was calling foo() from
inside an if (rank == 0) block. Since MPI_Comm_dup is a collective
operation, it was waiting for all the other nodes to also call
MPI_Comm_dup. Oops.

--
Prentice

Prentice Bisbal wrote:
> I have a problem with MPI_Comm_dup. When I call it in a function, it
> causes my application to hang. Are they any common causes for a problem
> like this? I'm using OpenMPI 1.2.8. Are there any known bugs that could
> be causing this?
> 
> My program seems to hang when it gets to MPI_Comm_dup. Here's an example
> of how I'm using it.
> 
> #include "foo.h"
> 
> /* MPI variables */
> int my_rank;
> int num_proc;
> MPI_Comm euclid_comm;
> 
> void foo(long *arg1, long *arg2, MPI_Comm old_comm);
> 
> int main (int argc, char** argv) {
> 
>   /* get options */
> 
>   MPI_Init(&argc, &argv);
>   MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
>   MPI_Comm_size(MPI_COMM_WORLD, &num_proc);
> 
>  /* read in some data, yadda, yadda yadda */
> 
>  foo(&arg1, &arg2, MPI_COMM_WORLD);
> 
>  / *print results from foo */
> 
>  MPI_Finalize();
> 
> }
> 
> void foo(long *arg1, long *arg2, MPI_Comm old_comm)
> {
>   MPI_Comm foo_comm;
>   int foo_my_rank;
>   int foo_num_proc;
> 
>   MPI_Comm_dup(old_comm, &foo_comm);
>   MPI_Comm_rank(foo_comm, &foo_my_rank);
>   MPI_Comm_size(foo_comm, &foo_num_proc);
> 
>   /* do stuff */
> 
> }
> 
> 


[OMPI users] Difficulty with MPI_Unpack

2010-02-07 Thread Prentice Bisbal
Hello, everyone. I'm having trouble packing/unpacking this structure:

typedef struct{
  int index;
  int* coords;
}point;

The size of the coords array is not known a priori, so it needs to be a
dynamic array. I'm trying to send it from one node to another using
MPI_Pack/MPI_Unpack as shown below. When I unpack it, I get this error
when unpacking the coords array:

[fatboy:07360] *** Process received signal ***
[fatboy:07360] Signal: Segmentation fault (11)
[fatboy:07360] Signal code: Address not mapped (1)
[fatboy:07360] Failing at address: (nil)

Any idea what I'm doing wrong here? Any help/advice will be greatly
appreciated. I've compared my code to Pacheco's book and a few other
examples online, and everything looks okay. I'm sure I'm overlooking
something minor and trivial.

--
Prentice

if (rank == 0) {
/* assign values to a_point */

position = 0;

buffer = malloc(4 * sizeof(int));
buff_size = (4 * sizeof(int));

MPI_Pack(&a_point.index, 1, MPI_INT, buffer, buff_size, &position,
MPI_COMM_WORLD);
MPI_Pack(a_point.coords, 3, MPI_INT, buffer, buff_size, &position,
MPI_COMM_WORLD);
MPI_Send(buffer, buff_size, MPI_PACKED, 1, 0, MPI_COMM_WORLD);
}

if (rank == 1) {
buffer = malloc(4 * sizeof(int));
buff_size = (4 * sizeof(int));

MPI_Recv(buffer, buff_size, MPI_PACKED, 0, 0, MPI_COMM_WORLD, &status);

position = 0;
MPI_Unpack(buffer, buff_size, &position, &b_point.index, 1, MPI_INT,
MPI_COMM_WORLD);
printf("b_point.index = %i\n", b_point.index);

/* everything works up to this point! */
MPI_Unpack(buffer, buff_size, &position, b_point.coords, 3, MPI_INT,
MPI_COMM_WORLD);
printf("b_point.coords = (%i, %i, %i)\n", b_point.coords[0],
b_point.coords[1], b_point.coords[2]);
  }






Re: [OMPI users] Difficulty with MPI_Unpack

2010-02-08 Thread Prentice Bisbal
Jed Brown wrote:
> On Sun, 07 Feb 2010 22:40:55 -0500, Prentice Bisbal  wrote:
>> Hello, everyone. I'm having trouble packing/unpacking this structure:
>>
>> typedef struct{
>>   int index;
>>   int* coords;
>> }point;
>>
>> The size of the coords array is not known a priori, so it needs to be a
>> dynamic array. I'm trying to send it from one node to another using
>> MPI_Pack/MPI_Unpack as shown below. When I unpack it, I get this error
>> when unpacking the coords array:
>>
>> [fatboy:07360] *** Process received signal ***
>> [fatboy:07360] Signal: Segmentation fault (11)
>> [fatboy:07360] Signal code: Address not mapped (1)
>> [fatboy:07360] Failing at address: (nil)
> 
> Looks like b_point.coords = NULL.  Has this been allocated on rank=1?

Yep, that was the problem. I left that out. I can't believe I overlooked
something so obvious. Thanks for the code review. Thanks to Brian
Austin, too,  who also found that mistake.

> 
> You might need to use MPI_Get_count to decide how much to allocate.
> Also, if you don't have a convenient upper bound on the size of the
> receive buffer, you can use MPI_Probe followed by MPI_Get_count to
> determine this before calling MPI_Recv.

Thanks for the tip. I'll take a look at those functions.

-- 
Prentice


[OMPI users] Similar question about MPI_Create_type

2010-02-08 Thread Prentice Bisbal
Hello again, MPI users:

This question is similar to my earlier one about MPI_Pack/Unpack.

I'm trying to send the following structure, which has a dynamically
allocated array in it, as an MPI derived type using MPI_Type_create_struct():

typedef struct{
   int index;
   int* coords;
}point;

I would think that this can't be done, since the coords array will not
be contiguous in memory with the rest of the structure, so calculating
the displacements between point.index and point.coords will be
meaningless. However, I'm pretty sure that Pacheco's book implies that
this can be done (I'd list the exact page(s), but I don't have that book
handy).

Am I wrong or right?

Below my signature is the code I'm using to test this, which fails as
I'd expect. Is my thinking right, or is my program wrong? When I run the
program I get this error:

 *** An error occurred in MPI_Address
 *** on communicator MPI_COMM_WORLD
 *** MPI_ERR_ARG: invalid argument of some other kind
 *** MPI_ERRORS_ARE_FATAL (goodbye)
mpirun noticed that job rank 0 with PID 28286 on node juno.sns.ias.edu
exited on signal 15 (Terminated).

-- 
Prentice

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int rank;
MPI_Status status;
int size;
int tag;

typedef struct{
  int index;
  int* coords;
}point;

int block_lengths[2];
MPI_Datatype type_list[2];
MPI_Aint displacements[2];
MPI_Aint start_address;
MPI_Aint address;
MPI_Datatype derived_point;
point a_point, b_point;

int main(int argc, char* argv[])
{
  MPI_Init(&argc, &argv);
  MPI_Comm_size(MPI_COMM_WORLD, &size);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  if (rank == 0) {
a_point.index = 1;
a_point.coords = malloc(3 * sizeof(int));
a_point.coords[0] = 3;
a_point.coords[1] = 6;
a_point.coords[2] = 9;
  }

  block_lengths[0] = 1;
  block_lengths[1] = 3;

  type_list[0] = MPI_INT;
  type_list[1] = MPI_INT;

  displacements[0] = 0;
  MPI_Address(&a_point.index, &start_address);
  MPI_Address(a_point.coords, &address);
  displacements[1] = address - start_address;

  MPI_Type_create_struct(2, block_lengths, displacements, type_list,
&derived_point);
  MPI_Type_commit(&derived_point);

  if (rank == 0) {
MPI_Send(&a_point, 1, derived_point, 1, 0, MPI_COMM_WORLD);
  }
  if (rank == 1) {
b_point.coords = malloc(3 *sizeof(int));
MPI_Recv(&b_point, 1, derived_point, 0, 0, MPI_COMM_WORLD, &status);
printf("b_point.index = %i\n", b_point.index);
printf("b_point.coords:(%i, %i, %i)\n", b_point.coords[0],
b_point.coords[1], b_point.coords[2]);

  }
  MPI_Finalize();
  exit(0);
}




Re: [OMPI users] Similar question about MPI_Create_type

2010-02-08 Thread Prentice Bisbal
I hit send too early on my last reply, please forgive me...

Jed Brown wrote:
> On Mon, 08 Feb 2010 13:54:10 -0500, Prentice Bisbal  wrote:
>> but I don't have that book handy
> 
> The standard has lots of examples.
> 
>   http://www.mpi-forum.org/docs/docs.html

Thanks, I'll check out those examples.
> 
> You can do this, but for small structures, you're better off just
> packing buffers.  For large structures containing variable-size fields,
> I think it is clearer to use MPI_BOTTOM instead of offsets from an
> arbitrary (instance-dependent) address.

I'll give that a try, too. IMHO, MPI_Pack/Unpack looks easier and less
error prone, but Pacheco advocates using derived types over
MPI_Pack/Unpack.

> 
> [...]
> 
>>   if (rank == 0) {
>> a_point.index = 1;
>> a_point.coords = malloc(3 * sizeof(int));
>> a_point.coords[0] = 3;
>> a_point.coords[1] = 6;
>> a_point.coords[2] = 9;
>>   }
>>
>>   block_lengths[0] = 1;
>>   block_lengths[1] = 3;
>>
>>   type_list[0] = MPI_INT;
>>   type_list[1] = MPI_INT;
>>
>>   displacements[0] = 0;
>>   MPI_Address(&a_point.index, &start_address);
>>   MPI_Address(a_point.coords, &address);
> ^^
> 
> Rank 1 has not allocated this yet.

I'm glad you brought that up. I wanted to ask about that:

In my situation, rank 0 is reading in a file containing all the coords.
So even if other ranks don't have the data, I still need to create the
structure on all the nodes, even if I don't populate it with data?

Thanks for the help.

-- 
Prentice


Re: [OMPI users] Similar question about MPI_Create_type

2010-02-08 Thread Prentice Bisbal


Prentice Bisbal wrote:
> I hit send to early on my last reply, please forgive me...
> 
> Jed Brown wrote:
>> On Mon, 08 Feb 2010 13:54:10 -0500, Prentice Bisbal  wrote:
>>> but I don't have that book handy
>> The standard has lots of examples.
>>
>>   http://www.mpi-forum.org/docs/docs.html
> 
> Thanks, I'll check out those examples.
>> You can do this, but for small structures, you're better off just
>> packing buffers.  For large structures containing variable-size fields,
>> I think it is clearer to use MPI_BOTTOM instead of offsets from an
>> arbitrary (instance-dependent) address.
> 
> I'll give that a try, too. IMHO, MPI_Pack/Unpack looks easier and less
> error prone, but Pacheco advocates using derived types over
> MPI_Pack/Unpack.
> 
>> [...]
>>
>>>   if (rank == 0) {
>>> a_point.index = 1;
>>> a_point.coords = malloc(3 * sizeof(int));
>>> a_point.coords[0] = 3;
>>> a_point.coords[1] = 6;
>>> a_point.coords[2] = 9;
>>>   }
>>>
>>>   block_lengths[0] = 1;
>>>   block_lengths[1] = 3;
>>>
>>>   type_list[0] = MPI_INT;
>>>   type_list[1] = MPI_INT;
>>>
>>>   displacements[0] = 0;
>>>   MPI_Address(&a_point.index, &start_address);
>>>   MPI_Address(a_point.coords, &address);
>> ^^
>>
>> Rank 1 has not allocated this yet.
> 
> I'm glad you brought that up. I wanted to ask about that:
> 
> In my situation, rank 0 is reading in a file containing all the coords.
> So even if other ranks don't have the data, I still need to create the
> structure on all the nodes, even if I don't populate it with data?

To clarify: I thought adding a similar structure, b_point in rank 1
would be adequate to receive the data from rank 0.

-- 
Prentice


Re: [OMPI users] Sending relatively large messages with high frequency

2010-02-25 Thread Prentice Bisbal
I was getting the same error a few weeks ago. In my case the error
message was spot on. I was trying to put too much data in a buffer using
MPI_Pack.

I was able to track down the problem using valgrind. Have you tried that
yet? You need to install valgrind first and then compile OpenMPI with
valgrind support. It takes some time, but is worth it.

http://www.open-mpi.org/faq/?category=debugging#memchecker_what

Amr Hassan wrote:
> Hi All,
> 
> I'm facing a strange problem with OpenMPI.
> 
> I'm developing an application which is required to send a message from
> each client  (1 MB each) to a server node for around 10 times per second
> (it's a distributed render application and I'm trying to reach a higher
> frame rate ). The problem is that OpenMPI crash in that case and only
> works if I partition this messages into a set of 20 k sub-messages with
> a sleep between each one of them for around 1 to 10 ms!! This solution
> is very expensive in term of time needed to send the data.  Is there any
> other solutions?
> 
> The error i got now is:
> Signal: Segmentation fault (11)
> Signal code:  Address not mapped (1)
> Failing at address: x
> 
> The OS is Linux CentOS.  I'm using the latest version of OpenMPI.
> 
> I appreciate any help regarding that.
> 
>  Regards,
> Amr
> 
> 
> 
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users

-- 
Prentice Bisbal
Linux Software Support Specialist/System Administrator
School of Natural Sciences
Institute for Advanced Study
Princeton, NJ


Re: [OMPI users] Sending relatively large messages with high frequency

2010-02-25 Thread Prentice Bisbal
Amr Hassan wrote:
> Thanks alot for your reply,
>  
> I'm using blocking Send and Receive. All the clients are sending data
> and the server is receive the messages from the clients with
> MPI_ANY_SOURCE as the sender. Do you think there is a race condition
> near this pattern? 
>  
> I searched a lot and used totalview but I couldn't detect such case. I
> really appreciate if you send me a link or give an example of a possible
> race condition in that scenario . 
>  
> Also, when I partition the message into smaller parts (send in sequence
> - all the other clients wait until the send finish) it works fine. is
> that exclude the race condition?
>  

It sounds like, when sending the large messages, you are putting more
data into a buffer than it can hold. When you break the messages up
into smaller sizes, you're not overflowing the buffer.

Are you using MPI_Pack, by any chance?

-- 
Prentice Bisbal
Linux Software Support Specialist/System Administrator
School of Natural Sciences
Institute for Advanced Study
Princeton, NJ


Re: [OMPI users] running external program on same processor (Fortran)

2010-03-03 Thread Prentice Bisbal
Terry Frankcombe wrote:
> Surely this is the problem of the scheduler that your system uses,
> rather than MPI?

That's not true. The scheduler only assigns the initial processes to
nodes and starts them. It can kill the processes it starts if they use
too much memory or run too long, but it doesn't prevent them from
spawning more processes, and unless those are spawned through the
scheduler, it has no control over them.
> 
> 
> On Wed, 2010-03-03 at 00:48 +, abc def wrote:
>> Hello,
>>
>> I wonder if someone can help.
>>
>> The situation is that I have an MPI-parallel fortran program. I run it
>> and it's distributed on N cores, and each of these processes must call
>> an external program.
>>
>> This external program is also an MPI program, however I want to run it
>> in serial, on the core that is calling it, as if it were part of the
>> fortran program. The fortran program waits until the external program
>> has completed, and then continues.
>>
>> The problem is that this external program seems to run on any core,
>> and not necessarily the (now idle) core that called it. This slows
>> things down a lot as you get one core doing multiple tasks.
>>
>> Can anyone tell me how I can call the program and ensure it runs only
>> on the core that's calling it? Note that there are several cores per
>> node. I can ID the node by running the hostname command (I don't know
>> a way to do this for individual cores).
>>
>> Thanks!
>>
>> 
>>
>> Extra information that might be helpful:
>>
>> If I simply run the external program from the command line (ie, type
>> "/path/myprogram.ex "), it runs fine. If I run it within the
>> fortran program by calling it via
>>
>> CALL SYSTEM("/path/myprogram.ex")
>>
>> it doesn't run at all (doesn't even start) and everything crashes. I
>> don't know why this is.
>>
>> If I call it using mpiexec:
>>
>> CALL SYSTEM("mpiexec -n 1 /path/myprogram.ex")
>>
>> then it does work, but I get the problem that it can go on any core. 
>>
> 
> 

-- 
Prentice Bisbal
Linux Software Support Specialist/System Administrator
School of Natural Sciences
Institute for Advanced Study
Princeton, NJ


[OMPI users] Limit to number of processes on one node?

2010-03-03 Thread Prentice Bisbal
Is there a limit on how many MPI processes can run on a single host?

I have a user trying to test his code on the command-line on a single
host before running it on our cluster like so:

mpirun -np X foo

When he tries to run it with a large number of processes (X = 256, 512), the
program fails, and I can reproduce this with a simple "Hello, World"
program:

$ mpirun -np 256 mpihello
mpirun noticed that job rank 0 with PID 0 on node juno.sns.ias.edu
exited on signal 15 (Terminated).
252 additional processes aborted (not shown)

I've done some testing and found that X must be less than 155 for this
program to work. Is this a bug, part of the standard, or a
design/implementation decision?


-- 
Prentice Bisbal
Linux Software Support Specialist/System Administrator
School of Natural Sciences
Institute for Advanced Study
Princeton, NJ


Re: [OMPI users] Limit to number of processes on one node?

2010-03-03 Thread Prentice Bisbal
Sorry. I meant to include that. I'm using version 1.2.8.

Ralph Castain wrote:
> It helps to have some idea what version you are talking about...
> 
> On Mar 3, 2010, at 9:51 AM, Prentice Bisbal wrote:
> 
>> Is there a limit on how many MPI processes can run on a single host?
>>
>> I have a user trying to test his code on the command-line on a single
>> host before running it on our cluster like so:
>>
>> mpirun -np X foo
>>
>> When he tries to run it on large number of process (X = 256, 512), the
>> program fails, and I can reproduce this with a simple "Hello, World"
>> program:
>>
>> $ mpirun -np 256 mpihello
>> mpirun noticed that job rank 0 with PID 0 on node juno.sns.ias.edu
>> exited on signal 15 (Terminated).
>> 252 additional processes aborted (not shown)
>>
>> I've done some testing and found that X <155 for this program to work.
>> Is this a bug, part of the standard, or design/implementation decision?
>>
>>
>> -- 
>> Prentice Bisbal
>> Linux Software Support Specialist/System Administrator
>> School of Natural Sciences
>> Institute for Advanced Study
>> Princeton, NJ
> 
> 

-- 
Prentice Bisbal
Linux Software Support Specialist/System Administrator
School of Natural Sciences
Institute for Advanced Study
Princeton, NJ


Re: [OMPI users] running external program on same processor (Fortran)

2010-03-03 Thread Prentice Bisbal
Reuti wrote:
> Are you speaking of the same?

Good point, Reuti. I was thinking of a cluster scheduler like SGE or
Torque.
> 
> Am 03.03.2010 um 17:32 schrieb Prentice Bisbal:
> 
>> Terry Frankcombe wrote:
>>> Surely this is the problem of the scheduler that your system uses,
> 
> This I would also state.
> 
> 
>>> rather than MPI?
> 
> Scheduler in the Linux kernel?
> 
> 
>> That's not true. The scheduler only assigns the initial processes to
>> nodes
> 
> Scheduler in MPI?
> 
> 
>> and starts them. It can kill the processes it starts if they use
>> too much memory or run too long, but doesn't prevent them from spawning
>> more processes, and once spawned,
> 
> When the processes are bound to one and the same core, these addititonal
> processes won't intefere with other jobs' processes on the same node
> which run on the other cores.
> 
> -- Reuti
> 
> 
>> unless they are spawned through the
>> scheduler, it has no control over them.
>>>
>>>
>>> On Wed, 2010-03-03 at 00:48 +, abc def wrote:
>>>> Hello,
>>>>
>>>> I wonder if someone can help.
>>>>
>>>> The situation is that I have an MPI-parallel fortran program. I run it
>>>> and it's distributed on N cores, and each of these processes must call
>>>> an external program.
>>>>
>>>> This external program is also an MPI program, however I want to run it
>>>> in serial, on the core that is calling it, as if it were part of the
>>>> fortran program. The fortran program waits until the external program
>>>> has completed, and then continues.
>>>>
>>>> The problem is that this external program seems to run on any core,
>>>> and not necessarily the (now idle) core that called it. This slows
>>>> things down a lot as you get one core doing multiple tasks.
>>>>
>>>> Can anyone tell me how I can call the program and ensure it runs only
>>>> on the core that's calling it? Note that there are several cores per
>>>> node. I can ID the node by running the hostname command (I don't know
>>>> a way to do this for individual cores).
>>>>
>>>> Thanks!
>>>>
>>>> 
>>>>
>>>> Extra information that might be helpful:
>>>>
>>>> If I simply run the external program from the command line (ie, type
>>>> "/path/myprogram.ex "), it runs fine. If I run it within the
>>>> fortran program by calling it via
>>>>
>>>> CALL SYSTEM("/path/myprogram.ex")
>>>>
>>>> it doesn't run at all (doesn't even start) and everything crashes. I
>>>> don't know why this is.
>>>>
>>>> If I call it using mpiexec:
>>>>
>>>> CALL SYSTEM("mpiexec -n 1 /path/myprogram.ex")
>>>>
>>>> then it does work, but I get the problem that it can go on any core.
>>>>
>>>
>>>
>>
>> -- 
>> Prentice Bisbal
>> Linux Software Support Specialist/System Administrator
>> School of Natural Sciences
>> Institute for Advanced Study
>> Princeton, NJ
> 
> 

-- 
Prentice Bisbal
Linux Software Support Specialist/System Administrator
School of Natural Sciences
Institute for Advanced Study
Princeton, NJ


Re: [OMPI users] Limit to number of processes on one node?

2010-03-03 Thread Prentice Bisbal
Eugene Loh wrote:
> Prentice Bisbal wrote:
> 
>> Is there a limit on how many MPI processes can run on a single host?
>>
>> I have a user trying to test his code on the command-line on a single
>> host before running it on our cluster like so:
>>
>> mpirun -np X foo
>>
>> When he tries to run it on large number of process (X = 256, 512), the
>> program fails, and I can reproduce this with a simple "Hello, World"
>> program:
>>
>> $ mpirun -np 256 mpihello
>> mpirun noticed that job rank 0 with PID 0 on node juno.sns.ias.edu
>> exited on signal 15 (Terminated).
>> 252 additional processes aborted (not shown)
>>
>> I've done some testing and found that X <155 for this program to work.
>> Is this a bug, part of the standard, or design/implementation decision?
>>  
>>
> One possible issue is the limit on the number of descriptors.  The error
> message should be pretty helpful and descriptive, but perhaps you're
> using an older version of OMPI.  If this is your problem, one workaround
> is something like this:
> 
> unlimit descriptors
> mpirun -np 256 mpihello

Looks like I'm not allowed to set that as a regular user:

$ ulimit -n 2048
-bash: ulimit: open files: cannot modify limit: Operation not permitted

Since I am the admin, I could change that elsewhere, but I'd rather not
do that system-wide unless absolutely necessary.

> 
> though I guess the syntax depends on what shell you're running.  Another
> is to set the MCA parameter opal_set_max_sys_limits to 1.

That didn't work either:

$ mpirun -mca opal_set_max_sys_limits 1 -np 256 mpihello
mpirun noticed that job rank 0 with PID 0 on node juno.sns.ias.edu
exited on signal 15 (Terminated).
252 additional processes aborted (not shown)

-- 
Prentice Bisbal
Linux Software Support Specialist/System Administrator
School of Natural Sciences
Institute for Advanced Study
Princeton, NJ


Re: [OMPI users] Limit to number of processes on one node?

2010-03-03 Thread Prentice Bisbal
Eugene Loh wrote:
> Prentice Bisbal wrote:
>> Eugene Loh wrote:
>>   
>>> Prentice Bisbal wrote:
>>> 
>>>> Is there a limit on how many MPI processes can run on a single host?
>>>>   
> Depending on which OMPI release you're using, I think you need something
> like 4*np up to 7*np (plus a few) descriptors.  So, with 256, you need
> 1000+ descriptors.  You're quite possibly up against your limit, though
> I don't know for sure that that's the problem here.
> 
> You say you're running 1.2.8.  That's "a while ago", so would you
> consider updating as a first step?  Among other things, newer OMPIs will
> generate a much clearer error message if the descriptor limit is the
> problem.

While 1.2.8 might be "a while ago", upgrading software just because it's
"old" is not a valid argument.

I can install the latest version of OpenMPI, but it will take a little
while.


>>>> I have a user trying to test his code on the command-line on a single
>>>> host before running it on our cluster like so:
>>>>
>>>> mpirun -np X foo
>>>>
>>>> When he tries to run it on large number of process (X = 256, 512), the
>>>> program fails, and I can reproduce this with a simple "Hello, World"
>>>> program:
>>>>
>>>> $ mpirun -np 256 mpihello
>>>> mpirun noticed that job rank 0 with PID 0 on node juno.sns.ias.edu
>>>> exited on signal 15 (Terminated).
>>>> 252 additional processes aborted (not shown)
>>>>
>>>> I've done some testing and found that X <155 for this program to work.
>>>> Is this a bug, part of the standard, or design/implementation decision?
>>>>  
>>>>
>>>>   
>>> One possible issue is the limit on the number of descriptors.  The error
>>> message should be pretty helpful and descriptive, but perhaps you're
>>> using an older version of OMPI.  If this is your problem, one workaround
>>> is something like this:
>>>
>>> unlimit descriptors
>>> mpirun -np 256 mpihello
>>> 
>>
>> Looks like I'm not allowed to set that as a regular user:
>>
>> $ ulimit -n 2048
>> -bash: ulimit: open files: cannot modify limit: Operation not permitted
>>
>> Since I am the admin, I could change that elsewhere, but I'd rather not
>> do that system-wide unless absolutely necessary.
>>   
>>> though I guess the syntax depends on what shell you're running.  Another
>>> is to set the MCA parameter opal_set_max_sys_limits to 1.
>>> 
>> That didn't work either:
>>
>> $ mpirun -mca opal_set_max_sys_limits 1 -np 256 mpihello
>> mpirun noticed that job rank 0 with PID 0 on node juno.sns.ias.edu
>> exited on signal 15 (Terminated).
>> 252 additional processes aborted (not shown)
>>
>>   
> 
> 
> 
> 

-- 
Prentice Bisbal
Linux Software Support Specialist/System Administrator
School of Natural Sciences
Institute for Advanced Study
Princeton, NJ


Re: [OMPI users] Limit to number of processes on one node?

2010-03-04 Thread Prentice Bisbal


Ralph Castain wrote:
> On Mar 3, 2010, at 12:16 PM, Prentice Bisbal wrote:
> 
>> Eugene Loh wrote:
>>> Prentice Bisbal wrote:
>>>> Eugene Loh wrote:
>>>>
>>>>> Prentice Bisbal wrote:
>>>>>
>>>>>> Is there a limit on how many MPI processes can run on a single host?
>>>>>>
>>> Depending on which OMPI release you're using, I think you need something
>>> like 4*np up to 7*np (plus a few) descriptors.  So, with 256, you need
>>> 1000+ descriptors.  You're quite possibly up against your limit, though
>>> I don't know for sure that that's the problem here.
>>>
>>> You say you're running 1.2.8.  That's "a while ago", so would you
>>> consider updating as a first step?  Among other things, newer OMPIs will
>>> generate a much clearer error message if the descriptor limit is the
>>> problem.
>> While 1.2.8 might be "a while ago", upgrading software just because it's
>> "old" is not a valid argument.
>>
>> I can install the lastest version of OpenMPI, but it will take a little
>> while.
> 
> Maybe not because it is "old", but Eugene is correct. The old versions of 
> OMPI required more file descriptors than the newer versions.
> 
> That said, you'll still need a minimum of 4x the number of procs on the node 
> even with the latest release. I suggest talking to your sys admin about 
> getting the limit increased. It sounds like it has been set unrealistically 
> low.
> 
> 
I *am* the system admin! ;)

The file descriptor limit is the RHEL default, 1024, so I would not
characterize it as "unrealistically low". I assume someone with much
more knowledge of OS design and administration than me came up with this
default, so I'm hesitant to change it without good reason. If there was
good reason, I'd have no problem changing it. I have read that setting
it to more than 8192 can lead to system instability.
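For reference, if a good reason did come up, the limit could be raised for just the affected users instead of system-wide. The snippet below is an illustrative sketch only - the group name and values are assumptions, not settings from this thread - using pam_limits entries plus a quick check of what a large single-node run would need under the 4x-7x descriptor estimate discussed here.

```shell
# Hypothetical /etc/security/limits.conf entries (illustrative group
# name and values; applied at next login via pam_limits):
#   @mpiusers  soft  nofile  4096
#   @mpiusers  hard  nofile  8192

# Rough descriptor demand for np ranks on one node (4x-7x estimate):
awk 'BEGIN { np = 256; printf "np=%d needs about %d-%d descriptors\n",
             np, 4*np, 7*np }'

# Current per-process soft limit for comparison:
ulimit -n
```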

This is admittedly unusual situation - in normal use, no one would ever
want to run that many processes on a single system - so I don't see any
justification for modifying that setting.

Yesterday I spoke to the researcher who originally asked me about this
limit - he just wanted to know what the limit was, and doesn't actually
plan to do any "real" work with that many processes on a single node,
rendering this whole discussion academic.

I did install OpenMPI 1.4.1 yesterday, but I haven't had a chance to
test it yet. I'll post the results of testing here.

>>
>>>>>> I have a user trying to test his code on the command-line on a single
>>>>>> host before running it on our cluster like so:
>>>>>>
>>>>>> mpirun -np X foo
>>>>>>
>>>>>> When he tries to run it on large number of process (X = 256, 512), the
>>>>>> program fails, and I can reproduce this with a simple "Hello, World"
>>>>>> program:
>>>>>>
>>>>>> $ mpirun -np 256 mpihello
>>>>>> mpirun noticed that job rank 0 with PID 0 on node juno.sns.ias.edu
>>>>>> exited on signal 15 (Terminated).
>>>>>> 252 additional processes aborted (not shown)
>>>>>>
>>>>>> I've done some testing and found that X < 155 for this program to work.
>>>>>> Is this a bug, part of the standard, or design/implementation decision?
>>>>>>
>>>>>>
>>>>>>
>>>>> One possible issue is the limit on the number of descriptors.  The error
>>>>> message should be pretty helpful and descriptive, but perhaps you're
>>>>> using an older version of OMPI.  If this is your problem, one workaround
>>>>> is something like this:
>>>>>
>>>>> unlimit descriptors
>>>>> mpirun -np 256 mpihello
>>>>>
>>>> Looks like I'm not allowed to set that as a regular user:
>>>>
>>>> $ ulimit -n 2048
>>>> -bash: ulimit: open files: cannot modify limit: Operation not permitted
>>>>
>>>> Since I am the admin, I could change that elsewhere, but I'd rather not
>>>> do that system-wide unless absolutely necessary.
>>>>
>>>>> though I guess the syntax depends on what shell you're running.  Another
>>>>> is to set the MCA parameter opal_set_max_sys_limits to 1.
>>>>>
>>>> That didn't work either:
>>>>
>>>> $ mpirun -mca opal_set_max_sys_limits 1 -np 256 mpihello
>>>> mpirun noticed that job rank 0 with PID 0 on node juno.sns.ias.edu
>>>> exited on signal 15 (Terminated).
>>>> 252 additional processes aborted (not shown)


-- 
Prentice Bisbal
Linux Software Support Specialist/System Administrator
School of Natural Sciences
Institute for Advanced Study
Princeton, NJ


Re: [OMPI users] Limit to number of processes on one node?

2010-03-04 Thread Prentice Bisbal


Ralph Castain wrote:
> On Mar 4, 2010, at 7:27 AM, Prentice Bisbal wrote:
> 
>>
>> Ralph Castain wrote:
>>> On Mar 3, 2010, at 12:16 PM, Prentice Bisbal wrote:
>>>
>>>> Eugene Loh wrote:
>>>>> Prentice Bisbal wrote:
>>>>>> Eugene Loh wrote:
>>>>>>
>>>>>>> Prentice Bisbal wrote:
>>>>>>>
>>>>>>>> Is there a limit on how many MPI processes can run on a single host?
>>>>>>>>
>>>>> Depending on which OMPI release you're using, I think you need something
>>>>> like 4*np up to 7*np (plus a few) descriptors.  So, with 256, you need
>>>>> 1000+ descriptors.  You're quite possibly up against your limit, though
>>>>> I don't know for sure that that's the problem here.
>>>>>
>>>>> You say you're running 1.2.8.  That's "a while ago", so would you
>>>>> consider updating as a first step?  Among other things, newer OMPIs will
>>>>> generate a much clearer error message if the descriptor limit is the
>>>>> problem.
>>>> While 1.2.8 might be "a while ago", upgrading software just because it's
>>>> "old" is not a valid argument.
>>>>
>>>> I can install the latest version of OpenMPI, but it will take a little
>>>> while.
>>> Maybe not because it is "old", but Eugene is correct. The old versions of 
>>> OMPI required more file descriptors than the newer versions.
>>>
>>> That said, you'll still need a minimum of 4x the number of procs on the 
>>> node even with the latest release. I suggest talking to your sys admin 
>>> about getting the limit increased. It sounds like it has been set 
>>> unrealistically low.
>>>
>>>
>> I *am* the system admin! ;)
>>
>> The file descriptor limit is the RHEL default of 1024, so I would not
>> characterize it as "unrealistically low".  I assume someone with much
>> more knowledge of OS design and administration than me came up with this
>> default, so I'm hesitant to change it without good reason. If there was
>> good reason, I'd have no problem changing it. I have read that setting
>> it to more than 8192 can lead to system instability.
> 
> Never heard that, and most HPC systems have it set a great deal higher 
> without trouble.

I just read that the other day. Not sure where, though. Probably a forum
posting somewhere. I'll take your word for it that it's safe to increase
if necessary.
> 
> However, the choice is yours. If you have a large SMP system, you'll 
> eventually be forced to change it or severely limit its usefulness for MPI. 
> RHEL sets it that low arbitrarily as a way of saving memory by keeping the fd 
> table small, not because the OS can't handle it.
> 
> Anyway, that is the problem. Nothing we (or any MPI) can do about it as the 
> fd's are required for socket-based communications and to forward I/O.

Thanks, Ralph, that's exactly the answer I was looking for - where this
limit was coming from.

I can see how on a large SMP system the fd limit would have to be
increased. In normal circumstances, my cluster nodes should never have
more than 8 MPI processes running at once (per node), so I shouldn't be
hitting that limit on my cluster.
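Eugene's 4*np-to-7*np rule of thumb is easy to sanity-check against the current limit. A sketch; the multipliers and the small slack for the daemon are the rough estimates quoted in this thread, not exact figures:

```python
import resource

def descriptors_needed(np, lo_mult=4, hi_mult=7, slack=8):
    """Rough range of file descriptors an old (1.2-series) OMPI job of
    `np` local procs might need, per the estimates in this thread."""
    return (lo_mult * np + slack, hi_mult * np + slack)

soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
lo, hi = descriptors_needed(256)
print(f"need roughly {lo}..{hi} descriptors; soft limit is {soft}")
if soft != resource.RLIM_INFINITY and lo > soft:
    print("256 procs would likely overrun the soft limit")
```

With the RHEL default of 1024, even the low end of the estimate for np=256 (about 1032) is already over the limit, which matches the observed failure.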

> 
> 
>> This is admittedly an unusual situation - in normal use, no one would ever
>> want to run that many processes on a single system - so I don't see any
>> justification for modifying that setting.
>>
>> Yesterday I spoke to the researcher who originally asked me about this limit -
>> he just wanted to know what the limit was, and doesn't actually plan to
>> do any "real" work with that many processes on a single node, rendering
>> this whole discussion academic.
>>
>> I did install OpenMPI 1.4.1 yesterday, but I haven't had a chance to
>> test it yet. I'll post the results of testing here.
>>
>>>>>>>> I have a user trying to test his code on the command-line on a single
>>>>>>>> host before running it on our cluster like so:
>>>>>>>>
>>>>>>>> mpirun -np X foo
>>>>>>>>
>>>>>>>> When he tries to run it on a large number of processes (X = 256, 512), the
>>>>>>>> program fails, and I can reproduce this with a simple "Hello, World"
>>>>>>>> program:
>>>>>

Re: [OMPI users] Limit to number of processes on one node?

2010-03-04 Thread Prentice Bisbal


Ralph Castain wrote:
> On Mar 4, 2010, at 7:51 AM, Prentice Bisbal wrote:
> 
>>
>> Ralph Castain wrote:
>>> On Mar 4, 2010, at 7:27 AM, Prentice Bisbal wrote:
>>>
>>>> Ralph Castain wrote:
>>>>> On Mar 3, 2010, at 12:16 PM, Prentice Bisbal wrote:
>>>>>
>>>>>> Eugene Loh wrote:
>>>>>>> Prentice Bisbal wrote:
>>>>>>>> Eugene Loh wrote:
>>>>>>>>
>>>>>>>>> Prentice Bisbal wrote:
>>>>>>>>>
>>>>>>>>>> Is there a limit on how many MPI processes can run on a single host?
>>>>>>>>>>
>>>>>>> Depending on which OMPI release you're using, I think you need something
>>>>>>> like 4*np up to 7*np (plus a few) descriptors.  So, with 256, you need
>>>>>>> 1000+ descriptors.  You're quite possibly up against your limit, though
>>>>>>> I don't know for sure that that's the problem here.
>>>>>>>
>>>>>>> You say you're running 1.2.8.  That's "a while ago", so would you
>>>>>>> consider updating as a first step?  Among other things, newer OMPIs will
>>>>>>> generate a much clearer error message if the descriptor limit is the
>>>>>>> problem.
>>>>>> While 1.2.8 might be "a while ago", upgrading software just because it's
>>>>>> "old" is not a valid argument.
>>>>>>
>>>>>> I can install the latest version of OpenMPI, but it will take a little
>>>>>> while.
>>>>> Maybe not because it is "old", but Eugene is correct. The old versions of 
>>>>> OMPI required more file descriptors than the newer versions.
>>>>>
>>>>> That said, you'll still need a minimum of 4x the number of procs on the 
>>>>> node even with the latest release. I suggest talking to your sys admin 
>>>>> about getting the limit increased. It sounds like it has been set 
>>>>> unrealistically low.
>>>>>
>>>>>
>>>> I *am* the system admin! ;)
>>>>
>>>> The file descriptor limit is the RHEL default of 1024, so I would not
>>>> characterize it as "unrealistically low".  I assume someone with much
>>>> more knowledge of OS design and administration than me came up with this
>>>> default, so I'm hesitant to change it without good reason. If there was
>>>> good reason, I'd have no problem changing it. I have read that setting
>>>> it to more than 8192 can lead to system instability.
>>> Never heard that, and most HPC systems have it set a great deal higher 
>>> without trouble.
>> I just read that the other day. Not sure where, though. Probably a forum
>> posting somewhere. I'll take your word for it that it's safe to increase
>> if necessary.
>>> However, the choice is yours. If you have a large SMP system, you'll 
>>> eventually be forced to change it or severely limit its usefulness for MPI. 
>>> RHEL sets it that low arbitrarily as a way of saving memory by keeping the 
>>> fd table small, not because the OS can't handle it.
>>>
>>> Anyway, that is the problem. Nothing we (or any MPI) can do about it as the 
>>> fd's are required for socket-based communications and to forward I/O.
>> Thanks, Ralph, that's exactly the answer I was looking for - where this
>> limit was coming from.
>>
>> I can see how on a large SMP system the fd limit would have to be
>> increased. In normal circumstances, my cluster nodes should never have
>> more than 8 MPI processes running at once (per node), so I shouldn't be
>> hitting that limit on my cluster.
> 
> Ah, okay! That helps a great deal in figuring out what to advise you. In your 
> earlier note, it sounded like you were running all 512 procs on one node, so 
> I assumed you had a large single-node SMP.
> 
> In this case, though, the problem is solely that you are using the 1.2 
> series. In that series, mpirun and each process opened many more sockets to 
> all processes in the job. That's why you are overrunning your limit.
> 
> Starting with 1.3, the number of sockets being opened on each is only 3 times 
> the number of procs on the node, plus a couple for the daemon. If you are 
> using TCP for MPI communications, then each MPI connecti

Re: [OMPI users] 3D domain decomposition with MPI

2010-03-12 Thread Prentice Bisbal


Gus Correa wrote:
> At each time step you exchange halo/ghost sections across
> neighbor subdomains, using MPI_Send/MPI_Recv,
> or MPI_SendRecv.
> Even better if you use non-blocking calls
> MPI_ISend/MPI_[I]Recv/MPI_Wait[all].
> Read about the advantages of non-blocking communication
> in the "MPI The Complete Reference, Vol 1" book that I suggested
> to you.

"Using MPI, 2nd Edition, by Gropp, et al, (the same people who wrote the
above book, I think), also has a good discussion of this.
> 
> You can do the bookkeeping of "which subdomain/process_rank is my
> left neighbor?" etc, yourself, if you create domain neighbor
> tables when the program initializes.
> Alternatively, and more elegantly, you can use the MPI
> Cartesian topology functions to take care of this for you.

Also described in Using MPI, 2nd Ed.
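The neighbor bookkeeping that MPI_Cart_create / MPI_Cart_coords / MPI_Cart_shift automates can be sketched as plain arithmetic; no MPI is needed to see the idea. The row-major ordering below matches MPI's default; the dims used in the comments are illustrative:

```python
def rank_to_coords(rank, dims):
    """Row-major rank -> Cartesian coordinates, as MPI_Cart_coords does."""
    coords = []
    for d in reversed(dims):
        coords.append(rank % d)
        rank //= d
    return tuple(reversed(coords))

def coords_to_rank(coords, dims):
    """Inverse mapping, as MPI_Cart_rank does."""
    rank = 0
    for c, d in zip(coords, dims):
        rank = rank * d + c
    return rank

def neighbor(rank, dims, axis, step, periodic=True):
    """Rank of the neighbor `step` away along `axis` (like MPI_Cart_shift);
    returns None at a non-periodic boundary (MPI would give MPI_PROC_NULL)."""
    c = list(rank_to_coords(rank, dims))
    c[axis] += step
    if periodic:
        c[axis] %= dims[axis]
    elif not 0 <= c[axis] < dims[axis]:
        return None
    return coords_to_rank(c, dims)

# e.g. on a 2 x 3 x 4 grid, the +z neighbor of rank 5:
print(neighbor(5, (2, 3, 4), axis=2, step=1))
```

In a real code you would let MPI_Cart_create build the communicator and hand each rank its neighbors directly, but this is all the bookkeeping it is doing for you.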

-- 
Prentice


Re: [OMPI users] Hide Abort output

2010-04-05 Thread Prentice Bisbal
I would suggest that MPI_Abort take a string as an argument, and print
out that string when MPI_Abort is called. If a programmer wanted to
NOT have an abort message, they could just omit the argument:

MPI_Abort("Could not open file foo");

or

MPI_Abort();

This would be similar to Perl's die command, where you can provide a
string to print out when die is called.
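To make the proposal concrete, here is a minimal sketch of the intended semantics. It is pure Python so it runs anywhere; the actual MPI_Abort call is stubbed out as a comment, and the `die_abort` name and banner text are illustrative, not a proposed API:

```python
import sys

def abort_banner(errcode, msg=None):
    """Build the message a die()-style abort would print, or None if
    the caller supplied no message (the 'silent' case proposed above)."""
    if msg is None:
        return None
    return f"aborted ({errcode}): {msg}"

def die_abort(errcode, msg=None):
    banner = abort_banner(errcode, msg)
    if banner is not None:
        print(banner, file=sys.stderr)
    # A real MPI program would now call
    # MPI_Abort(MPI_COMM_WORLD, errcode) -- stubbed out here.

die_abort(1, "Could not open file foo")
die_abort(1)  # no message: abort silently, like die with no string
```

As the follow-up message notes, the real MPI_Abort signature already takes a communicator and an error code, so an optional message would have to be a third argument (or a new function), which the standard does not currently allow.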


--
Prentice

David Singleton wrote:
> 
> Yes, Dick has isolated the issue - novice users often believe Open MPI
> (not their application) had a problem.  Anything along the lines he
> suggests
> can only help.
> 
> David
> 
> On 04/01/2010 01:12 AM, Richard Treumann wrote:
>>
>> I do not know what the OpenMPI message looks like or why people want to
>> hide it. It should be phrased to avoid any implication of a problem with
>> OpenMPI itself.
>>
>> How about something like this:
>>
>> "The application has called MPI_Abort. The application is terminated by
>> OpenMPI as the application demanded"
>>
>>
>> Dick Treumann  -  MPI Team
>> IBM Systems&  Technology Group
>> Dept X2ZA / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601
>> Tele (845) 433-7846 Fax (845) 433-8363
>>
>>
>>
>>
>>From:   "Jeff Squyres (jsquyres)"
>>
>>To:,
>>
>>Date:   03/31/2010 06:43 AM
>>
>>Subject:Re: [OMPI users] Hide Abort output
>>
>>Sent by:users-boun...@open-mpi.org
>>
>>
>>
>>
>>
>>
>> At present there is no such feature, but it should not be hard to add.
>>
>> Can you guys be a little more specific about exactly what you are seeing
>> and exactly what you want to see?  (And what version you're working
>> with -
>> I'll caveat my discussion that this may be a 1.5-and-forward thing)
>>
>> -jms
>> Sent from my PDA.  No type good.
>>
>> - Original Message -
>> From: users-boun...@open-mpi.org
>> To: Open MPI Users
>> Sent: Wed Mar 31 05:38:48 2010
>> Subject: Re: [OMPI users] Hide Abort output
>>
>>
>> I have to say this is a very common issue for our users.  They repeatedly
>> report the long Open MPI MPI_Abort() message in help queries and fail to
>> look for the application error message about the root cause.  A short
>> MPI_Abort() message that said "look elsewhere for the real error message"
>> would be useful.
>>
>> Cheers,
>> David
>>
>> On 03/31/2010 07:58 PM, Yves Caniou wrote:
>>> Dear all,
>>>
>>> I am using the MPI_Abort() command in a MPI program.
>>> I would like to not see the note explaining that the command caused Open
>> MPI
>>> to kill all the jobs and so on.
>>> I thought that I could find a --mca parameter, but couldn't grep it. The
>> only
>>> ones deal with the delay and printing more information (the stack).
>>>
>>> Is there a mean to avoid the printing of the note (except the
>>> 2>/dev/null
>>> tips)? Or to delay this printing?
>>>
>>> Thank you.
>>>
>>> .Yves.
>>>
>>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 




Re: [OMPI users] Hide Abort output

2010-04-05 Thread Prentice Bisbal
I sent that last message too quickly. I was forgetting that MPI_Abort
already takes two args. I've been doing a lot of Perl the past few weeks,
so I'm in a Perl state of mind.

I would say add a third arg to MPI_Abort for a programmer-defined error
string, but that would violate the standard, wouldn't it? Never mind...

Prentice

Prentice Bisbal wrote:
> I would suggest that MPI_Abort take a string as an argument, and print
> out that string when MPI_Abort is called. If a programmer wanted to
> NOT have an abort message, they could just omit the argument:
> 
> MPI_Abort("Could not open file foo");
> 
> or
> 
> MPI_Abort();
> 
> This would be similar to Perl's die command, where you can provide a
> string to print out when die is called.
> 
> 
> --
> Prentice
> 
> David Singleton wrote:
>> Yes, Dick has isolated the issue - novice users often believe Open MPI
>> (not their application) had a problem.  Anything along the lines he
>> suggests
>> can only help.
>>
>> David
>>
>> On 04/01/2010 01:12 AM, Richard Treumann wrote:
>>> I do not know what the OpenMPI message looks like or why people want to
>>> hide it. It should be phrased to avoid any implication of a problem with
>>> OpenMPI itself.
>>>
>>> How about something like this:
>>>
>>> "The application has called MPI_Abort. The application is terminated by
>>> OpenMPI as the application demanded"
>>>
>>>
>>> Dick Treumann  -  MPI Team
>>> IBM Systems&  Technology Group
>>> Dept X2ZA / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601
>>> Tele (845) 433-7846 Fax (845) 433-8363
>>>
>>>
>>>
>>>
>>>From:   "Jeff Squyres (jsquyres)"
>>>
>>>To:,
>>>
>>>Date:   03/31/2010 06:43 AM
>>>
>>>Subject:Re: [OMPI users] Hide Abort output
>>>
>>>Sent by:users-boun...@open-mpi.org
>>>
>>>
>>>
>>>
>>>
>>>
>>> At present there is no such feature, but it should not be hard to add.
>>>
>>> Can you guys be a little more specific about exactly what you are seeing
>>> and exactly what you want to see?  (And what version you're working
>>> with -
>>> I'll caveat my discussion that this may be a 1.5-and-forward thing)
>>>
>>> -jms
>>> Sent from my PDA.  No type good.
>>>
>>> - Original Message -
>>> From: users-boun...@open-mpi.org
>>> To: Open MPI Users
>>> Sent: Wed Mar 31 05:38:48 2010
>>> Subject: Re: [OMPI users] Hide Abort output
>>>
>>>
>>> I have to say this is a very common issue for our users.  They repeatedly
>>> report the long Open MPI MPI_Abort() message in help queries and fail to
>>> look for the application error message about the root cause.  A short
>>> MPI_Abort() message that said "look elsewhere for the real error message"
>>> would be useful.
>>>
>>> Cheers,
>>> David
>>>
>>> On 03/31/2010 07:58 PM, Yves Caniou wrote:
>>>> Dear all,
>>>>
>>>> I am using the MPI_Abort() command in a MPI program.
>>>> I would like to not see the note explaining that the command caused Open
>>> MPI
>>>> to kill all the jobs and so on.
>>>> I thought that I could find a --mca parameter, but couldn't grep it. The
>>> only
>>>> ones deal with the delay and printing more information (the stack).
>>>>
>>>> Is there a mean to avoid the printing of the note (except the
>>>> 2>/dev/null
>>>> tips)? Or to delay this printing?
>>>>
>>>> Thank you.
>>>>
>>>> .Yves.
>>>>

-- 
Prentice Bisbal
Linux Software Support Specialist/System Administrator
School of Natural Sciences
Institute for Advanced Study
Princeton, NJ


Re: [OMPI users] libmpi_f90.so.0 problem

2010-04-15 Thread Prentice Bisbal
Jeff Squyres wrote:
> On Apr 14, 2010, at 10:17 PM, max marconi wrote:
> 
>> I have just installed openmpi on my system and tried to run the example
>> Hello_f90. The following error was generated upon executing.
>>
>> : error while loading shared libraries: libmpi_f90.so.0: cannot open
>> shared object file: No such file or directory
>>
>> The library with libmpi_f90  is located in /usr/local/lib
> 
> The usual cause for this is that the shared library cannot be found at run 
> time.  Can you verify if /usr/local/lib is in your LD_LIBRARY_PATH on all 
> nodes, and/or /usr/local/lib is in the normal run-time linker search paths?
> 
> Note that you might need to check this in a non-interactive rsh/ssh login -- 
> the setups may be different than for interactive logins.
> 
> Also, can you verify that libmpi_f90.so.0 is in /usr/local/lib?  You 
> mentioned "the library with libmpi_f90" -- there should likely be one library 
> and one or more sym links.
> 

If everything above checks out, check the file permissions of the
library file and the directories above it, too.
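Jeff's search-path check and the permissions check can be run in one pass. A sketch; the library name and the extra search directory are the ones from this thread, and in practice you would run it over the same (possibly non-interactive) environment the failing job sees:

```python
import os

def find_readable_lib(libname, dirs):
    """Return (path, problem) for the first dir containing `libname`:
    problem is None if the file looks usable, otherwise a short reason.
    Mimics walking LD_LIBRARY_PATH entries (plus /usr/local/lib) by hand."""
    for d in dirs:
        path = os.path.join(d, libname)
        if os.path.exists(path):
            if not os.access(path, os.R_OK):
                return path, "file not readable"
            if not os.access(d, os.X_OK):
                return path, "directory not searchable"
            return path, None
    return None, "not found in any search dir"

search_dirs = os.environ.get("LD_LIBRARY_PATH", "").split(":") + ["/usr/local/lib"]
print(find_readable_lib("libmpi_f90.so.0", [d for d in search_dirs if d]))
```

Note this only covers LD_LIBRARY_PATH and one extra directory; the runtime linker also consults the ld.so cache (`ldconfig -p` shows its contents), so a clean result here plus a cache check covers both of Jeff's suggestions.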

-- 
Prentice

