[OMPI users] Compiler error with PGI: pgcc-Error-Unknown switch: -pthread

2017-04-03 Thread Prentice Bisbal
Greetings, Open MPI users! After being off this list for several years, 
I'm back! And I need help:


I'm trying to compile OpenMPI 1.10.3 with the PGI compilers, version 
17.3. I'm using the following configure options:


./configure \
  --prefix=/usr/pppl/pgi/17.3-pkgs/openmpi-1.10.3 \
  --disable-silent-rules \
  --enable-shared \
  --enable-static \
  --enable-mpi-thread-multiple \
  --with-pmi=/usr/pppl/slurm/15.08.8 \
  --with-hwloc \
  --with-verbs \
  --with-slurm \
  --with-psm \
  CC=pgcc \
  CFLAGS="-tp x64 -fast" \
  CXX=pgc++ \
  CXXFLAGS="-tp x64 -fast" \
  FC=pgfortran \
  FCFLAGS="-tp x64 -fast" \
  2>&1 | tee configure.log

This leads to the following error from libtool during make:

pgcc-Error-Unknown switch: -pthread

I've searched the archives, which ultimately led me to this workaround 
from 2009:


https://www.open-mpi.org/community/lists/users/2009/04/8724.php

Interestingly, I participated in the discussion that led to that 
workaround, stating that I had no problem compiling Open MPI with PGI 
v9. I'm assuming the problem now is that I'm specifying 
--enable-mpi-thread-multiple, which I'm doing because a user requested 
that feature.


It's been exactly 8 years and 2 days since that workaround was posted to 
the list. Please tell me a better way of dealing with this issue than 
writing a 'fakepgf90' script. Any suggestions?



--
Prentice

___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users


Re: [OMPI users] Compiler error with PGI: pgcc-Error-Unknown switch: -pthread

2017-04-03 Thread Åke Sandgren
This usually comes from slurm, so we always do

perl -pi -e 's/-pthread//' /lap/slurm/${version}/lib/libpmi.la \
    /lap/slurm/${version}/lib/libslurm.la

when installing a new slurm version. Thus no need for a fakepg wrapper.

On 04/03/2017 04:20 PM, Prentice Bisbal wrote:

-- 
Ake Sandgren, HPC2N, Umea University, S-90187 Umea, Sweden
Internet: a...@hpc2n.umu.se   Phone: +46 90 7866134 Fax: +46 90-580 14
Mobile: +46 70 7716134 WWW: http://www.hpc2n.umu.se


Re: [OMPI users] Compiler error with PGI: pgcc-Error-Unknown switch: -pthread

2017-04-03 Thread Gilles Gouaillardet
Hi,

The -pthread flag is likely pulled in by libtool from the Slurm libpmi.la
and/or libslurm.la files.
Workarounds are:
- rebuild Slurm with PGI
- remove the .la files (the *.so and/or *.a files are enough)
- wrap the PGI compiler so that it ignores the -pthread option
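Before picking one of these workarounds, it may help to confirm which libtool archives actually carry the flag. A minimal diagnostic sketch, assuming a POSIX shell and a grep that supports -E; the function name find_pthread_la is hypothetical:

```shell
#!/bin/sh
# Hypothetical diagnostic: print every libtool archive (.la) in a directory
# whose contents include -pthread as a whole token.
find_pthread_la() {
    dir=$1
    for la in "$dir"/*.la; do
        [ -f "$la" ] || continue   # the glob matched nothing
        # Whole-token match: -lpthread and -pthreads are not reported.
        if grep -Eq "(^|[[:space:]'\"])-pthread([[:space:]'\"]|$)" "$la"; then
            printf '%s\n' "$la"
        fi
    done
}
```

For example: find_pthread_la /usr/pppl/slurm/15.08.8/lib (assuming that is where the Slurm .la files live).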

Hope this helps

Gilles

On Monday, April 3, 2017, Prentice Bisbal  wrote:


[OMPI users] Passive target sync. support

2017-04-03 Thread Sebastian Rinke
Dear all,

I’m using passive target synchronization in my code and would like to
know how well it is supported in Open MPI.

In particular, the code is a kind of particle tree code that uses a
distributed tree: every rank fetches the non-local tree nodes it needs
for its own computation from other ranks on demand, i.e.:

Win_lock(target)

Get()
Get()
…
Get()

(up to 8 Gets)

Win_unlock(target)

After closing the access epoch with Win_unlock(target),
the rank looks at the nodes that it got and decides if it needs to get
more non-local nodes in the same fashion.

Unfortunately, this implementation blocks until the access epoch for one
particle has completed. As every rank needs to do the same for several
particles, it would be better to use Rget and start processing other
particles in the meantime. From time to time, the pending Rgets are then
checked for completion and the corresponding particle can progress.

My questions are:

1) Do Get and Rget use network hardware support on InfiniBand (IB) for 
contiguous data?

2) How is RMA progress achieved for IB? Is there a progress thread option 
available?

3) If there is no progress thread option, would it be useful to use
MPI_THREAD_MULTIPLE and have a separate pthread repeatedly test a request
that will never be satisfied? Would this be a reasonable way to ensure
progress in MPI?

E.g.:
while (1)
MPI_Test()

Thank you for your help,
Sebastian




Re: [OMPI users] Compiler error with PGI: pgcc-Error-Unknown switch: -pthread

2017-04-03 Thread Prentice Bisbal

This is the second suggestion to rebuild Slurm.

The other came from Åke Sandgren, who recommended this:


This usually comes from slurm, so we always do

perl -pi -e 's/-pthread//' /lap/slurm/${version}/lib/libpmi.la \
    /lap/slurm/${version}/lib/libslurm.la

when installing a new slurm version. Thus no need for a fakepg wrapper.


I don't really have the luxury of rebuilding Slurm at the moment. How would 
I rebuild Slurm to change this behavior? Apart from Åke's suggestion above, 
is rebuilding Slurm with PGI the only way to fix this on the Slurm side?


If I did use Åke's suggestion above, how would that affect the operation 
of Slurm, or future builds of Open MPI and any other software that might 
rely on Slurm, particularly with regard to building those apps with 
non-PGI compilers?


Prentice

On 04/03/2017 10:31 AM, Gilles Gouaillardet wrote:


Re: [OMPI users] Passive target sync. support

2017-04-03 Thread Nathan Hjelm



On Apr 03, 2017, at 08:36 AM, Sebastian Rinke  wrote:

Dear all,

I’m using passive target sync. in my code and would like to
know how well it is supported in Open MPI.

In particular, the code is some sort of particle tree code that uses a 
distributed tree and every rank
gets non-local tree nodes that are needed for its own computation from other 
ranks
on demand, i.e.:

Win_lock(target)

Get()
Get()
…
Get()

(up to 8 Gets)

Win_unlock(target)

After closing the access epoch with Win_unlock(target),
the rank looks at the nodes that it got and decides if it needs to get
more non-local nodes in the same fashion.

Unfortunately, this implementation blocks until the access epoch is completed 
for one particle.
As every rank needs to do the same for several particles, it would be better
to use Rget and start processing other particles in the meantime already.
From time to time the pending Rgets are then checked for completion and 
the corresponding particle can progress.


My questions are:

1) Does Get and Rget use network hardware support on Infiniband (IB) for 
contiguous data?

Only in Open MPI v2.0.0 and newer. Open MPI v1.10.x and older always use 
the two-sided implementation, which may or may not use the hardware put/get 
support.
 

2) How is RMA progress achieved for IB? Is there a progress thread option 
available?

Progress threads are generally not needed for progressing RMA with Open MPI 
v2.0.0+. The only exception is when we have to queue up the operation (which 
may be the case with get). You can get origin-side progress by making another 
RMA call or by waiting on an operation initiated with one of the 
request-based calls.

If you want to progress each get independently you should use Rget.
 

3) If there is no progress thread option, would it be useful to use 
MPI_THREAD_MULTIPLE
and have a pthread testing on a request that will not be satisfied? 
Would this be a reasonable option to ensure progress in MPI?


E.g.:
while (1)
MPI_Test()

This will get you progress, but it isn't possible with Open MPI v1.10.x and 
older. MPI_THREAD_MULTIPLE is only really supported starting with v2.0.0.
 
-Nathan

Re: [OMPI users] Compiler error with PGI: pgcc-Error-Unknown switch: -pthread

2017-04-03 Thread Åke Sandgren
We build Slurm with GCC, drop the -pthread argument from the .la files,
and have never seen any problems related to that. And we build quite a
lot of code, including lots of versions of Open MPI with multiple
different compilers (and compiler versions).

On 04/03/2017 04:51 PM, Prentice Bisbal wrote:

-- 
Ake Sandgren, HPC2N, Umea University, S-90187 Umea, Sweden
Internet: a...@hpc2n.umu.se   Phone: +46 90 7866134 Fax: +46 90-580 14
Mobile: +46 70 7716134 WWW: http://www.hpc2n.umu.se


Re: [OMPI users] Compiler error with PGI: pgcc-Error-Unknown switch: -pthread

2017-04-03 Thread Aaron Knister
To be thorough, couldn't one replace -pthread in the Slurm .la files with 
-lpthread? I ran into this last week, and that was the solution I was 
thinking about implementing. Having said that, I can't think of a 
situation in which the -pthread/-lpthread argument would be required 
other than linking against statically compiled Slurm libraries, and even 
then I'm not so sure about that.


-Aaron

On 4/3/17 1:46 PM, Åke Sandgren wrote:


--
Aaron Knister
NASA Center for Climate Simulation (Code 606.2)
Goddard Space Flight Center
(301) 286-2776


Re: [OMPI users] MPI_WAIT hangs after a call to MPI_CANCEL

2017-04-03 Thread George Bosilca
Kevin,

In Open MPI we only support cancelling not-yet-matched receives. So you
can cancel neither sends nor receive requests that have already been
matched. While the latter are supposed to complete (otherwise they would
not have been matched), the former are trickier to complete if the
corresponding receive is never posted.

To sum this up, the bad news is that there is no way to correctly cancel
MPI requests without hitting deadlock.

That being said, I find it hard to understand how Open MPI could drop a
message. There might be something else going on here that is more
difficult to spot. We do have an internal way to dump all pending (or
known) communications. Assuming you are using the OB1 PML, here is how to
dump all known communications: attach to a process, find the communicator
pointer (you will need to convert between the F90 communicator and the C
pointer), and then call mca_pml.pml_dump(commptr, 1).

Also, is it possible to check how one of the more recent versions of Open
MPI (> 2.1) behaves with your code?

  George.




On Sat, Apr 1, 2017 at 12:40 PM, McGrattan, Kevin B. Dr. (Fed) <
kevin.mcgrat...@nist.gov> wrote:

> I am running a large computational fluid dynamics code on a Linux cluster
> (CentOS 6.8, Open MPI 1.8.4). The code is written in Fortran and compiled
> with Intel Fortran 16.0.3. The cluster has 36 nodes; each node has two
> sockets, and each socket has six cores. I have noticed that the code hangs
> when the messages exchanged using persistent send and receive calls
> become large. I cannot say exactly how large, but generally on the order of
> 10 MB. Rather than let the code just hang, I implemented a timing loop
> using MPI_TESTALL. If MPI_TESTALL fails to return successfully after, say,
> 10 minutes, I attempt to MPI_CANCEL the unsuccessful request(s) and
> continue on with the calculation, even if the communication(s) did not
> succeed. It would not necessarily cripple the calculation if a few MPI
> communications were unsuccessful. This is a snippet of code that tests if
> the communications are successful and attempts to cancel them if not:
>
>    START_TIME = MPI_WTIME()
>    FLAG = .FALSE.
>    DO WHILE(.NOT.FLAG)
>       CALL MPI_TESTALL(NREQ,REQ(1:NREQ),FLAG,ARRAY_OF_STATUSES,IERR)
>       WAIT_TIME = MPI_WTIME() - START_TIME
>       IF (WAIT_TIME>TIMEOUT) THEN
>          WRITE(LU_ERR,'(A,A,I6,A,A)') 'Request timed out for MPI process ',MYID,' running on ',PNAME(1:PNAMELEN)
>          DO NNN=1,NREQ
>             IF (ARRAY_OF_STATUSES(1,NNN)==MPI_SUCCESS) CYCLE
>             CALL MPI_CANCEL(REQ(NNN),IERR)
>             write(LU_ERR,*) 'Request ',NNN,' returns from MPI_CANCEL'
>             CALL MPI_WAIT(REQ(NNN),STATUS,IERR)
>             write(LU_ERR,*) 'Request ',NNN,' returns from MPI_WAIT'
>             CALL MPI_TEST_CANCELLED(STATUS,FLAG2,IERR)
>             write(LU_ERR,*) 'Request ',NNN,' returns from MPI_TEST_CANCELLED'
>          ENDDO
>       ENDIF
>    ENDDO
>
> The job still hangs, and when I look at the error file, I see that on MPI
> process A, one of the sends has not completed, and on process B, one of the
> receives has not completed. The failed send and failed receive are
> consistent – that is they are matching. What I do not understand is that
> for both the uncompleted send and receive, the code hangs in MPI_WAIT. That
> is, I do not get the printout that says that the process has returned from
> MPI_WAIT. I interpret this to mean that either some of the large message
> has been sent or received, but not all. The MPI standard seems a bit vague
> on what is supposed to happen if part of the message simply disappears due
> to some network glitch. These errors occur after hundreds or thousands of
> successful exchanges. They never happen at the same point in the
> calculation. They are random, but they occur only when the messages are
> large (like MBs). When the messages are not large, the code can run for
> days or weeks without errors.
>
>
>
> So why does MPI_WAIT hang? The MPI standard says
>
>
>
> “If a communication is marked for cancellation, then an MPI_Wait call for
> that communication is guaranteed to return, irrespective of the activities
> of other processes (i.e., MPI_Wait behaves as a local function)”
> (https://www.open-mpi.org/doc/v2.0/man3/MPI_Cancel.3.php).
>
>
>
>
> Could the problem be with my cluster – in that the large message is broken
> up into smaller packets, and one of these packets disappears and there is
> no way to cancel it? That’s really what I am looking for – a way to cancel
> the failed communication but still continue the calculation.

Re: [OMPI users] Compiler error with PGI: pgcc-Error-Unknown switch: -pthread

2017-04-03 Thread Prentice Bisbal
I've decided to work around this problem by creating a wrapper script 
for pgcc that strips out the -pthread argument. However, my sed expression 
works on the command line but not in the script. I'm essentially 
reproducing the workaround from 
https://www.open-mpi.org/community/lists/users/2009/04/8724.php.


Can anyone see what's wrong with my implementation of the workaround? It's 
a very simple sed expression. Here's my script:


#!/bin/bash

realcmd=/path/to/pgcc
echo "original args: $@"
newargs=$(echo "$@" | sed s/-pthread//)
echo "new args: $newargs"
#$realcmd $newargs
exit

And here's what happens when I run it:

$ /path/to/pgcc -E conftest.c
original args: -E conftest.c
new args: conftest.c

As you can see, the -E argument is getting lost in translation. If I add 
more arguments, it works fine:


$ /path/to/pgcc -A -B -C -D -E conftest.c
original args: -A -B -C -D -E conftest.c
new args: -A -B -C -D -E conftest.c

It only seems to be a problem when -E is the first argument:

$ /path/to/pgcc -E -D -C -B -A conftest.c
original args: -E -D -C -B -A conftest.c
new args: -D -C -B -A conftest.c

Prentice

On 04/03/2017 02:24 PM, Aaron Knister wrote:

Re: [OMPI users] Compiler error with PGI: pgcc-Error-Unknown switch: -pthread

2017-04-03 Thread Prentice Bisbal
Never mind. A coworker helped me figure this one out. echo is treating 
the '-E' as an option to echo itself and interpreting it instead of 
passing it to sed. Since the configure tests use -E, that's a bit of a 
problem. Just adding another -E before $@ should fix the problem.
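A sketch of a wrapper that avoids echo and sed entirely, assuming bash: it filters the argument list with an array, so a leading -E survives and arguments containing spaces are not re-split. One caveat on the echo fix: bash's builtin echo consumes every leading argument made up only of -n/-e/-E letters, so echo -E "$@" with "$@" starting with -E still drops it (echo -E -E conftest.c prints just conftest.c); printf '%s\n' does not interpret its arguments that way. Also, sed s/-pthread// without the g flag removes only the first occurrence per line and would turn -pthreads into s. REALCMD, drop_pthread, and the wrapper path are hypothetical placeholders.

```shell
#!/bin/bash
# Hypothetical pgcc wrapper: drop -pthread, pass all other arguments through.
# REALCMD is a placeholder path; point it at the real pgcc.
REALCMD=${REALCMD:-/path/to/pgcc}

# Copy "$@" into the global array NEWARGS, skipping every exact "-pthread".
# Arguments never pass through echo, so a leading -E (or -n, -e) is kept,
# and arguments containing spaces stay intact.
drop_pthread() {
    NEWARGS=()
    local arg
    for arg in "$@"; do
        if [ "$arg" != "-pthread" ]; then
            NEWARGS+=("$arg")
        fi
    done
}

# With no arguments there is nothing to compile, so skip the exec
# (this also lets the file be sourced to test drop_pthread on its own).
if [ "$#" -gt 0 ]; then
    drop_pthread "$@"
    exec "$REALCMD" "${NEWARGS[@]}"
fi
```

It would then be used as, for example, CC=/path/to/pgcc-wrapper on the configure line, with the wrapper path again a placeholder.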


Prentice

On 04/03/2017 03:54 PM, Prentice Bisbal wrote:
I've decided to work around this problem by creating a wrapper script 
for pgcc that strips away the -pthread argument, but my sed expression 
works on the command-line, but not in the script. I'm essentially 
reproducing the workaround from 
https://www.open-mpi.org/community/lists/users/2009/04/8724.php.


Can anyone see what's wrong with my implementation the workaround? 
It's a very simple sed expression. Here's my script:


#!/bin/bash

realcmd=/path/to/pgcc
echo "original args: $@"
newargs=$(echo "$@" | sed s/-pthread//)
echo "new args: $newargs"
#$realcmd $newargs
exit

And here's what happens when I run it:

 /path/to/pgcc -E conftest.c
original args: -E conftest.c
new args: conftest.c

As you can see, the -E argument is getting lost in translation. If I 
add more arguments, it works fine:


/path/to/pgcc -A -B -C -D -E conftest.c
original args: -A -B -C -D -E conftest.c
new args: -A -B -C -D -E conftest.c

It only seems to be a problem when -E is the first argument:

$ /path/to/pgcc -E -D -C -B -A conftest.c
original args: -E -D -C -B -A conftest.c
new args: -D -C -B -A conftest.c

Prentice

On 04/03/2017 02:24 PM, Aaron Knister wrote:
To be thorough couldn't one replace -pthread in the slurm .la files 
with -lpthread? I ran into this last week and this was the solution I 
was thinking about implementing. Having said that, I can't think of a 
situation in which the -pthread/-lpthread argument would be required 
other than linking against statically compiled SLURM libraries and 
even then I'm not so sure about that.


-Aaron

On 4/3/17 1:46 PM, Åke Sandgren wrote:

We build slurm with GCC, drop the -pthread arg in the .la files, and
have never seen any problems related to that. And we do build quite a
lot of code. And lots of versions of OpenMPI with multiple different
compilers (and versions).

On 04/03/2017 04:51 PM, Prentice Bisbal wrote:

This is the second suggestion to rebuild Slurm.

The other came from Åke Sandgren, who recommended this:


This usually comes from slurm, so we always do

perl -pi -e 's/-pthread//' /lap/slurm/${version}/lib/libpmi.la \
    /lap/slurm/${version}/lib/libslurm.la

when installing a new slurm version. Thus no need for a fakepg 
wrapper.


I don't really have the luxury to rebuild Slurm at the moment. How would
I rebuild Slurm to change this behavior? Is rebuilding Slurm with PGI
the only option to fix this in Slurm, or should I use Åke's suggestion above?

If I did use Åke's suggestion above, how would that affect the operation
of Slurm, or future builds of OpenMPI and any other software that might
rely on Slurm, particularly with regard to building those apps with
non-PGI compilers?

Prentice

On 04/03/2017 10:31 AM, Gilles Gouaillardet wrote:

Hi,

The -pthread flag is likely pulled in by libtool from the Slurm libpmi.la
and/or libslurm.la files.
Workarounds are:
- rebuild Slurm with PGI
- remove the .la files (the *.so and/or *.a files are enough)
- wrap the PGI compiler to ignore the -pthread option
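For the third option, a wrapper that filters the argument list itself, instead of joining it into one string and piping it through echo and sed, avoids both the echo -E problem and the re-splitting of arguments that contain spaces. A minimal bash sketch; pgcc_wrapper, REALCMD and the pgcc.real path are assumptions to adapt to the local install:

```bash
#!/bin/bash
# Hypothetical pgcc wrapper: drop every argument that is exactly
# "-pthread" and pass all other arguments through unchanged.
pgcc_wrapper() {
    # Path to the real compiler (an assumption; adjust to your install).
    local realcmd=${REALCMD:-/path/to/pgcc.real}
    local -a args=()
    local a
    for a in "$@"; do
        [ "$a" = "-pthread" ] && continue   # the switch pgcc rejects
        args+=("$a")                        # keep everything else verbatim
    done
    # Quoted array expansion keeps "-E", "a b", etc. as separate, intact words.
    "$realcmd" "${args[@]}"
}

# Installed as the "pgcc" script found first in PATH, it would end with:
#   pgcc_wrapper "$@"
```

Because the arguments never pass through echo or a single string, a leading -E reaches the real compiler exactly as configure sent it.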

Hope this helps

Gilles

On Monday, April 3, 2017, Prentice Bisbal <pbis...@pppl.gov> wrote:

Greetings, Open MPI users! After being off this list for several
years, I'm back! And I need help:

I'm trying to compile OpenMPI 1.10.3 with the PGI compilers,
version 17.3. I'm using the following configure options:

./configure \
  --prefix=/usr/pppl/pgi/17.3-pkgs/openmpi-1.10.3 \
  --disable-silent-rules \
  --enable-shared \
  --enable-static \
  --enable-mpi-thread-multiple \
  --with-pmi=/usr/pppl/slurm/15.08.8 \
  --with-hwloc \
  --with-verbs \
  --with-slurm \
  --with-psm \
  CC=pgcc \
  CFLAGS="-tp x64 -fast" \
  CXX=pgc++ \
  CXXFLAGS="-tp x64 -fast" \
  FC=pgfortran \
  FCFLAGS="-tp x64 -fast" \
  2>&1 | tee configure.log

That leads to this error from libtool during make:

pgcc-Error-Unknown switch: -pthread

I've searched the archives, which ultimately led me to this
workaround from 2009:

https://www.open-mpi.org/community/lists/users/2009/04/8724.php


Interestingly, I participated in the discussion that led to that
workaround, stating that I had no problem compiling Open MPI with
PGI v9. I'm assuming the problem now is that I'm specifying
--enable-mpi-thread-multiple, which I'm doing because a user
requested that feature.

It's been exactly 8 years and 2 days since that workaround was
posted to the list. Please tell me a better way of dealing with
this issue than writing a 'fakepgf90' script. Any suggestions?

Re: [OMPI users] Compiler error with PGI: pgcc-Error-Unknown switch: -pthread

2017-04-03 Thread Prentice Bisbal

Okay, the additional -E doesn't work, either. :(

Prentice Bisbal
Lead Software Engineer
Princeton Plasma Physics Laboratory
http://www.pppl.gov

On 04/03/2017 04:01 PM, Prentice Bisbal wrote:
Never mind. A coworker helped me figure this one out. Echo is treating 
the '-E' as an argument to echo and interpreting it instead of passing 
it to sed. Since -E is used by the configure tests, that's a bit of a 
problem. Just adding another -E before $@ should fix the problem.


Prentice


Re: [OMPI users] Compiler error with PGI: pgcc-Error-Unknown switch: -pthread

2017-04-03 Thread Reuti


On 03.04.2017 at 22:01, Prentice Bisbal wrote:

> Never mind. A coworker helped me figure this one out. Echo is treating the 
> '-E' as an argument to echo and interpreting it instead of passing it to sed. 
> Since -E is used by the configure tests, that's a bit of a problem. Just 
> adding another -E before $@ should fix the problem.

It's often suggested to use printf instead of the non-portable echo.

-- Reuti



Re: [OMPI users] Compiler error with PGI: pgcc-Error-Unknown switch: -pthread

2017-04-03 Thread Andy Riebs

Try
$ printf -- "-E" ...

On 04/03/2017 04:03 PM, Prentice Bisbal wrote:

Okay, the additional -E doesn't work, either. :(

Prentice Bisbal
Lead Software Engineer
Princeton Plasma Physics Laboratory
http://www.pppl.gov


Re: [OMPI users] Compiler error with PGI: pgcc-Error-Unknown switch: -pthread

2017-04-03 Thread Prentice Bisbal

A coworker came up with another idea that works, too:

newargs=sed s/-pthread//g <
Try
$ printf -- "-E" ...


Re: [OMPI users] MPI_WAIT hangs after a call to MPI_CANCEL

2017-04-03 Thread McGrattan, Kevin B. Dr. (Fed)
Thanks, George.

Are persistent send/receives matched from the start of the calculation? If so, 
then I guess MPI_CANCEL won’t work.

I don’t think Open MPI is the problem. I think there is something wrong with 
our cluster in that it just seems to hang up on these big packages. The 
calculation successfully exchanges hundreds or thousands of them before just hanging.

I’m not sure I completely understand your recommendation for dumping 
diagnostics. Is this documented somewhere?

Thanks

Kevin



From: George Bosilca [mailto:bosi...@icl.utk.edu]
Sent: Monday, April 03, 2017 2:29 PM
To: Open MPI Users 
Cc: McGrattan, Kevin B. Dr. (Fed) 
Subject: Re: [OMPI users] MPI_WAIT hangs after a call to MPI_CANCEL

Kevin,

In Open MPI we only support cancelling not-yet-matched receives. So you can 
cancel neither sends nor receive requests that have already been matched. While the 
latter are supposed to complete (otherwise they would not have been matched), 
the former are trickier to complete if the corresponding receive is never 
posted.

To sum this up, the bad news is that there is no way to correctly cancel MPI 
requests without hitting deadlock.

That being said, I can hardly understand how Open MPI can drop a message. There 
might be something else going on here that is more difficult to spot. We do have an 
internal way to dump all pending (or known) communications. Assuming you are 
using the OB1 PML, here is how you dump all known communications: attach to a 
process, find the communicator pointer (you will need to convert between the 
F90 communicator and the C pointer), and then call mca_pml.pml_dump(commptr, 1).

Also, is it possible to check how one of the more recent versions of Open MPI 
(> 2.1) behaves with your code?

  George.




On Sat, Apr 1, 2017 at 12:40 PM, McGrattan, Kevin B. Dr. (Fed) 
<kevin.mcgrat...@nist.gov> wrote:
I am running a large computational fluid dynamics code on a linux cluster 
(Centos 6.8, Open MPI 1.8.4). The code is written in Fortran and compiled with 
Intel Fortran 16.0.3. The cluster has 36 nodes, each node has two sockets, each 
socket has six cores. I have noticed that the code hangs when the size of the 
packages exchanged using a persistent send and receive call become large. I 
cannot say exactly how large, but generally on the order of 10 MB. Rather than 
let the code just hang, I implemented a timing loop using MPI_TESTALL. If 
MPI_TESTALL fails to return successfully after, say, 10 minutes, I attempt to 
MPI_CANCEL the unsuccessful request(s) and continue on with the calculation, 
even if the communication(s) did not succeed. It would not necessarily cripple 
the calculation if a few MPI communications were unsuccessful. This is a 
snippet of code that tests if the communications are successful and attempts to 
cancel if not:

   START_TIME = MPI_WTIME()
   FLAG = .FALSE.
   DO WHILE(.NOT.FLAG)
      CALL MPI_TESTALL(NREQ,REQ(1:NREQ),FLAG,ARRAY_OF_STATUSES,IERR)
      WAIT_TIME = MPI_WTIME() - START_TIME
      IF (WAIT_TIME>TIMEOUT) THEN
         WRITE(LU_ERR,'(A,I6,A,A)') 'Request timed out for MPI process ',MYID,' running on ',PNAME(1:PNAMELEN)
         DO NNN=1,NREQ
            IF (ARRAY_OF_STATUSES(1,NNN)==MPI_SUCCESS) CYCLE
            CALL MPI_CANCEL(REQ(NNN),IERR)
            write(LU_ERR,*) 'Request ',NNN,' returns from MPI_CANCEL'
            CALL MPI_WAIT(REQ(NNN),STATUS,IERR)
            write(LU_ERR,*) 'Request ',NNN,' returns from MPI_WAIT'
            CALL MPI_TEST_CANCELLED(STATUS,FLAG2,IERR)
            write(LU_ERR,*) 'Request ',NNN,' returns from MPI_TEST_CANCELLED'
         ENDDO
      ENDIF
   ENDDO

The job still hangs, and when I look at the error file, I see that on MPI 
process A, one of the sends has not completed, and on process B, one of the 
receives has not completed. The failed send and failed receive are consistent – 
that is, they match. What I do not understand is that for both the 
uncompleted send and receive, the code hangs in MPI_WAIT. That is, I do not get 
the printout that says that the process has returned from MPI_WAIT. I interpret 
this to mean that either some of the large message has been sent or received, 
but not all. The MPI standard seems a bit vague on what is supposed to happen 
if part of the message simply disappears due to some network glitch. These 
errors occur after hundreds or thousands of successful exchanges. They never 
happen at the same point in the calculation. They are random, but they occur 
only when the messages are large (like MBs). When the messages are not large, 
the code can run for days or weeks without errors.

So why does MPI_WAIT hang? The MPI standard says

“If a communication is marked for cancellation, then an MPI_Wait call for that 
communication is guaranteed to return, irrespective of the activities of other 
processes (i.e., MPI_Wait behaves as a local function)” (http

Re: [OMPI users] Passive target sync. support

2017-04-03 Thread Sebastian Rinke
Thank you very much for the quick response!

Do I need to configure with certain flags to enable the 
hardware put/get support?

Sebastian

On 03 Apr 2017, at 18:02, Nathan Hjelm  wrote:

> 
> 
> On Apr 03, 2017, at 08:36 AM, Sebastian Rinke wrote:
> 
>> Dear all,
>> 
>> I’m using passive target sync. in my code and would like to
>> know how well it is supported in Open MPI.
>> 
>> In particular, the code is some sort of particle tree code that uses a 
>> distributed tree, and every rank gets the non-local tree nodes it needs 
>> for its own computation from other ranks on demand, i.e.:
>> 
>> Win_lock(target)
>> 
>> Get()
>> Get()
>> …
>> Get()
>> 
>> (up to 8 Gets)
>> 
>> Win_unlock(target)
>> 
>> After closing the access epoch with Win_unlock(target),
>> the rank looks at the nodes that it got and decides if it needs to get
>> more non-local nodes in the same fashion.
>> 
>> Unfortunately, this implementation blocks until the access epoch is 
>> completed for one particle. As every rank needs to do the same for 
>> several particles, it would be better to use Rget and start processing 
>> other particles in the meantime. From time to time the pending Rgets 
>> are then checked for completion and the corresponding particle can 
>> progress.
>> 
>> My questions are:
>> 
>> 1) Do Get and Rget use network hardware support on InfiniBand (IB) for 
>> contiguous data?
> 
> 
> In Open MPI v2.0.0 and newer only. Open MPI v1.10.x and older will always use 
> the two-sided implementation which may or may not use the hardware put/get 
> support.
>  
>> 
>> 
>> 2) How is RMA progress achieved for IB? Is there a progress thread option 
>> available?
> 
> 
> Progress threads are generally not needed for progressing RMA with Open MPI 
> v2.0.0+. The only exception is when we have to queue up the operation (which 
> may be the case with get). You can get origin-side progress by making another 
> RMA call or by waiting on an operation initiated with one of the request-based 
> calls.
> 
> If you want to progress each get independently you should use Rget.
>  
>> 
>> 
>> 3) If there is no progress thread option, would it be useful to use 
>> MPI_THREAD_MULTIPLE and have a pthread keep testing a request that will 
>> never be satisfied? Would this be a reasonable option to ensure progress in MPI?
>> 
>> E.g.:
>> while (1)
>> MPI_Test()
> 
> 
> This will get you progress but isn't possible with Open MPI v1.10.x and 
> older. MPI_THREAD_MULTIPLE is only really supported from v2.0.0.
>  
> 
> -Nathan


Re: [OMPI users] Compiler error with PGI: pgcc-Error-Unknown switch: -pthread

2017-04-03 Thread Prentice Bisbal
FYI: the proposed 'here-doc' solution below didn't work for me; it 
produced an error. Neither did printf. When I used printf, only the 
first arg was passed along:


#!/bin/bash

realcmd=/usr/pppl/pgi/17.3/linux86-64/17.3/bin/pgcc.real
echo "original args: $@"
newargs=$(printf -- "$@" | sed s/-pthread//g)
echo "new args: $newargs"
#$realcmd $newargs
exit

$ pgcc -tp=x64 -fast conftest.c
original args: -tp=x64 -fast conftest.c
new args: -tp=x64

Any ideas what I might be doing wrong here?
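What goes wrong in the printf version is that printf always treats its first operand as the format string. With printf -- "$@", the word -tp=x64 becomes the format; it contains no % conversions, so it is printed once and the remaining arguments are never used (an argument containing % or a backslash would also be interpreted rather than copied). A short bash sketch contrasting that with an explicit %s format (the function names are hypothetical):

```bash
#!/bin/bash
# printf's first operand is always the FORMAT string.
# Here "-tp=x64" becomes the format; it has no % conversions, so it is
# printed once and the remaining arguments are never used.
printf_as_format() { printf -- "$@"; }

# An explicit format prints the arguments as data. "$*" joins them with
# spaces, so a leading "-E" is kept as ordinary text.
printf_joined() { printf '%s\n' "$*"; }

printf_as_format -tp=x64 -fast conftest.c; echo   # prints "-tp=x64"
printf_joined -E -tp=x64 conftest.c               # prints "-E -tp=x64 conftest.c"
```

Piping printf_joined "$@" into sed still rebuilds the command line from a single string, so an argument containing spaces would be re-split when the result is expanded unquoted; filtering the positional parameters one by one avoids that.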

So, my original echo "" "$@" solution works, and another colleague also 
suggested this expression, which appears to work, too:


newargs=${@/-pthread/}

I don't know how portable that is, though. I'm guessing it's very 
bash-specific syntax.
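${@/-pthread/} is indeed a bash extension (it is not available in POSIX sh). It replaces the first match of the pattern inside each positional parameter separately, so the -pthread word becomes an empty string rather than disappearing, and a longer word that merely contains -pthread is altered too. A small sketch (the function name is hypothetical) that makes the per-argument result visible:

```bash
#!/bin/bash
# Apply ${@/pattern/replacement} and show each resulting word in <...>.
subst_args() {
    # The substitution runs on each positional parameter, not on one string.
    local -a out=("${@/-pthread/}")
    printf '<%s>' "${out[@]}"
    echo
}

subst_args -O2 -pthread conftest.c   # prints "<-O2><><conftest.c>"
subst_args -pthreads                 # prints "<s>"
```

Assigning the result to a plain string (newargs=${@/-pthread/}) joins the words with spaces; the later unquoted $realcmd $newargs then drops the empty word, which is why it appears to work, but it also re-splits any argument that contained spaces.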


Prentice

On 04/03/2017 04:26 PM, Prentice Bisbal wrote:

A coworker came up with another idea that works, too:

newargs=sed s/-pthread//g <
Try
$ printf -- "-E" ...

On 04/03/2017 04:03 PM, Prentice Bisbal wrote:

Okay. the additional -E doesn't work,either. :(

Prentice Bisbal Lead Software Engineer Princeton Plasma Physics 
Laboratory http://www.pppl.gov

On 04/03/2017 04:01 PM, Prentice Bisbal wrote:
Nevermind. A coworker helped me figure this one out. Echo is 
treating the '-E' as an argument to echo and interpreting it 
instead of passing it to sed. Since that's used by the configure 
tests, that's a bit of a problem, Just adding another -E before $@, 
should fix the problem.


Prentice

On 04/03/2017 03:54 PM, Prentice Bisbal wrote:
I've decided to work around this problem by creating a wrapper 
script for pgcc that strips away the -pthread argument, but my sed 
expression works on the command-line, but not in the script. I'm 
essentially reproducing the workaround from 
https://www.open-mpi.org/community/lists/users/2009/04/8724.php.


Can anyone see what's wrong with my implementation the workaround? 
It's a very simple sed expression. Here's my script:


#!/bin/bash

realcmd=/path/to/pgcc
echo "original args: $@"
newargs=$(echo "$@" | sed s/-pthread//)
echo "new args: $newargs"
#$realcmd $newargs
exit

And here's what happens when I run it:

$ /path/to/pgcc -E conftest.c
original args: -E conftest.c
new args: conftest.c

As you can see, the -E argument is getting lost in translation. If 
I add more arguments, it works fine:


$ /path/to/pgcc -A -B -C -D -E conftest.c
original args: -A -B -C -D -E conftest.c
new args: -A -B -C -D -E conftest.c

It only seems to be a problem when -E is the first argument:

$ /path/to/pgcc -E -D -C -B -A conftest.c
original args: -E -D -C -B -A conftest.c
new args: -D -C -B -A conftest.c

Prentice
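Here is a sketch of a more robust version of this wrapper. It assumes the real compiler has been renamed to pgcc.real; the default path is the one Prentice used in his later script and is only an assumption for any other site. The arguments are kept in a bash array instead of a string, so nothing passes through echo and arguments containing spaces survive. Only arguments that are exactly -pthread are dropped. REALCMD, filter_pthread and FILTERED are names made up for this example.

```shell
#!/bin/bash
# Hypothetical fake pgcc: strips -pthread, then runs the real compiler.
# The default path is an assumption; override it with REALCMD=... if needed.
REALCMD=${REALCMD:-/usr/pppl/pgi/17.3/linux86-64/17.3/bin/pgcc.real}

# Copy every argument except an exact "-pthread" into the FILTERED array.
filter_pthread() {
    local arg
    FILTERED=()
    for arg in "$@"; do
        [[ "$arg" == "-pthread" ]] && continue
        FILTERED+=("$arg")
    done
}

main() {
    filter_pthread "$@"
    # Replace this shell with the real compiler; "${FILTERED[@]}" preserves quoting.
    exec "$REALCMD" "${FILTERED[@]}"
}

# Hand off to the real compiler only when arguments were given.
if [[ $# -gt 0 ]]; then
    main "$@"
fi
```

Installed as pgcc ahead of the real compiler in PATH, a call such as pgcc -pthread -E conftest.c would then run pgcc.real -E conftest.c.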

On 04/03/2017 02:24 PM, Aaron Knister wrote:
To be thorough, couldn't one replace -pthread in the slurm .la 
files with -lpthread? I ran into this last week and this was the 
solution I was thinking about implementing. Having said that, I 
can't think of a situation in which the -pthread/-lpthread 
argument would be required other than linking against statically 
compiled SLURM libraries and even then I'm not so sure about that.


-Aaron
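Aaron's variant and Åke's perl one-liner differ only in the replacement text, so both can be sketched as one hypothetical helper, fix_la_files. It edits the libtool archives in place and keeps a .bak copy of each. Pass -lpthread to substitute, or an empty string to delete the flag. The .la lines in the test data are illustrative. Real files would live in the Slurm installation's lib directory, such as the /lap/slurm/${version}/lib path from Åke's message.

```shell
#!/bin/bash
# Hypothetical helper: rewrite "-pthread" in libtool .la files.
#   $1    replacement text ("-lpthread", or "" to delete the flag)
#   $2... .la files to edit; each original is kept as FILE.bak
# The replacement must not contain "/" or "&", which sed would interpret.
# "-lpthread" does not contain "-pthread", so running this twice is harmless.
fix_la_files() {
    local replacement=$1
    shift
    local f
    for f in "$@"; do
        # -i.bak edits in place and keeps a backup copy of the original file.
        sed -i.bak "s/-pthread/${replacement}/g" "$f" || return 1
    done
}
```

Usage, following Åke's paths: fix_la_files -lpthread /lap/slurm/${version}/lib/libpmi.la /lap/slurm/${version}/lib/libslurm.la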

On 4/3/17 1:46 PM, Åke Sandgren wrote:
We build slurm with GCC, drop the -pthread arg in the .la files, and
have never seen any problems related to that. And we do build quite a
lot of code. And lots of versions of OpenMPI with multiple different
compilers (and versions).

On 04/03/2017 04:51 PM, Prentice Bisbal wrote:

This is the second suggestion to rebuild Slurm.

The other was from Åke Sandgren, who recommended this:


This usually comes from slurm, so we always do

perl -pi -e 's/-pthread//' /lap/slurm/${version}/lib/libpmi.la \
    /lap/slurm/${version}/lib/libslurm.la

when installing a new slurm version. Thus no need for a fakepg 
wrapper.


I don't really have the luxury to rebuild Slurm at the moment. How would
I rebuild Slurm to change this behavior? Is rebuilding Slurm with PGI
the only option to fix this in Slurm, or should I use Åke's suggestion
above?


If I did use Åke's suggestion above, how would that affect the operation
of Slurm, or future builds of OpenMPI and any other software that might
rely on Slurm, particularly with regard to building those apps with
non-PGI compilers?

Prentice

On 04/03/2017 10:31 AM, Gilles Gouaillardet wrote:

Hi,

The -pthread flag is likely pulled in by libtool from the slurm libpmi.la
and/or libslurm.la.
Workarounds are:
- rebuild slurm with PGI
- remove the .la files (*.so and/or *.a are enough)
- wrap the PGI compiler to ignore the -pthread option
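Gilles' second workaround can be made reversible by moving the archives aside instead of deleting them. The sketch below assumes the two Slurm archives named in this thread, libpmi.la and libslurm.la, and a hypothetical la-backup subdirectory. That libtool then links cleanly against the .so/.a files is Gilles' claim, so it is worth confirming on a test build first.

```shell
#!/bin/bash
# Hypothetical helper: move Slurm's libtool archives into LIBDIR/la-backup,
# leaving the .so/.a libraries in place. Moving (not deleting) keeps it reversible.
stash_la_files() {
    local libdir=$1
    local backup="$libdir/la-backup"
    local f
    mkdir -p "$backup" || return 1
    for f in "$libdir/libpmi.la" "$libdir/libslurm.la"; do
        if [[ -e "$f" ]]; then
            mv "$f" "$backup/" || return 1
        fi
    done
}
```

The argument would be the lib directory under the Slurm prefix given to --with-pmi. To restore the archives, move them back from la-backup.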

Hope this helps

Gilles

On Monday, April 3, 2017, Prentice Bisbal <pbis...@pppl.gov> wrote:

Greetings Open MPI users! After being off this list for several
years, I'm back! And I need help:

I'm trying to compile OpenMPI 1.10.3 with the PGI compilers,
version 17.3. I'm using the following configure options:

./configure \
--prefix=/usr/pppl/pgi/17.3-pkgs/openmpi-1.10.3 \
  --disable-silent-rules \
  --enable-shared \
  --enable-static \
  -

Re: [OMPI users] Compiler error with PGI: pgcc-Error-Unknown switch: -pthread

2017-04-03 Thread Reuti


On 03.04.2017 at 23:07, Prentice Bisbal wrote:

> FYI - the proposed 'here-doc' solution below didn't work for me, it produced 
> an error. Neither did printf. When I used printf, only the first arg was 
> passed along:
> 
> #!/bin/bash
> 
> realcmd=/usr/pppl/pgi/17.3/linux86-64/17.3/bin/pgcc.real
> echo "original args: $@"
> newargs=$(printf -- "$@" | sed s/-pthread//g)

The format string is missing:

printf "%s " "$@"
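To illustrate Reuti's point: printf uses its first operand as the format, so in the original script -tp=x64 became the format. That format contains no conversion, so the remaining words were never consumed. With an explicit "%s " format, bash reuses the format until every argument has been printed, leaving a trailing space. The function names are hypothetical. Any % or backslash in the arguments would still be interpreted in the broken variant.

```shell
#!/bin/bash
# Without a format string, the first argument is used as the format and,
# having no %s to consume them, the rest are ignored.
join_broken() { printf -- "$@"; }
# With "%s " the format is reused until all arguments are printed.
join_args()   { printf '%s ' "$@"; }

join_broken -tp=x64 -fast conftest.c; echo   # prints: -tp=x64
join_args   -tp=x64 -fast conftest.c; echo   # prints: -tp=x64 -fast conftest.c
```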


> echo "new args: $newargs"
> #$realcmd $newargs
> exit
> 
> $ pgcc -tp=x64 -fast conftest.c
> original args: -tp=x64 -fast conftest.c
> new args: -tp=x64
> 
> Any ideas what I might be doing wrong here?
> 
> So, my original echo "" "$@" solution works, and another colleague also 
> suggested this expression, which appears to work, too:
> 
> newargs=${@/-pthread/}
> 
> Although I don't know how portable that is. I'm guessing that's very 
> bash-specific syntax.
> 
> Prentice
> 
> On 04/03/2017 04:26 PM, Prentice Bisbal wrote:
>> A coworker came up with another idea that works, too:
>> 
>> newargs=sed s/-pthread//g <<EOF
>> $@
>> EOF
>> 
>> That should work, too, but I haven't tested it.
>> 
>> Prentice
>> 
>> On 04/03/2017 04:11 PM, Andy Riebs wrote:
>>> Try
>>> $ printf -- "-E" ...
>>> 
>>> On 04/03/2017 04:03 PM, Prentice Bisbal wrote:
 Okay, the additional -E doesn't work, either. :(
 
 Prentice Bisbal Lead Software Engineer Princeton Plasma Physics Laboratory 
 http://www.pppl.gov
 On 04/03/2017 04:01 PM, Prentice Bisbal wrote:
> Never mind. A coworker helped me figure this one out. echo is treating the 
> '-E' as an option to echo itself and interpreting it instead of passing it to 
> sed. Since -E is used by the configure tests, that's a bit of a problem. 
> Just adding another -E before $@ should fix the problem.
> 
> Prentice
> 
> On 04/03/2017 03:54 PM, Prentice Bisbal wrote:
>> I've decided to work around this problem by creating a wrapper script 
>> for pgcc that strips away the -pthread argument, but my sed expression, 
>> which works on the command line, doesn't work in the script. I'm essentially 
>> reproducing the workaround from 
>> https://www.open-mpi.org/community/lists/users/2009/04/8724.php.
>> 
>> Can anyone see what's wrong with my implementation of the workaround? It's 
>> a very simple sed expression. Here's my script:
>> 
>> #!/bin/bash
>> 
>> realcmd=/path/to/pgcc
>> echo "original args: $@"
>> newargs=$(echo "$@" | sed s/-pthread//)
>> echo "new args: $newargs"
>> #$realcmd $newargs
>> exit
>> 
>> And here's what happens when I run it:
>> 
>> /path/to/pgcc -E conftest.c
>> original args: -E conftest.c
>> new args: conftest.c
>> 
>> As you can see, the -E argument is getting lost in translation. If I add 
>> more arguments, it works fine:
>> 
>> /path/to/pgcc -A -B -C -D -E conftest.c
>> original args: -A -B -C -D -E conftest.c
>> new args: -A -B -C -D -E conftest.c
>> 
>> It only seems to be a problem when -E is the first argument:
>> 
>> $ /path/to/pgcc -E -D -C -B -A conftest.c
>> original args: -E -D -C -B -A conftest.c
>> new args: -D -C -B -A conftest.c
>> 
>> Prentice
>> 
>> On 04/03/2017 02:24 PM, Aaron Knister wrote:
>>> To be thorough, couldn't one replace -pthread in the slurm .la files 
>>> with -lpthread? I ran into this last week and this was the solution I 
>>> was thinking about implementing. Having said that, I can't think of a 
>>> situation in which the -pthread/-lpthread argument would be required 
>>> other than linking against statically compiled SLURM libraries and even 
>>> then I'm not so sure about that.
>>> 
>>> -Aaron
>>> 
>>> On 4/3/17 1:46 PM, Åke Sandgren wrote:
 We build slurm with GCC, drop the -pthread arg in the .la files, and
 have never seen any problems related to that. And we do build quite a
 lot of code. And lots of versions of OpenMPI with multiple different
 compilers (and versions).
 
 On 04/03/2017 04:51 PM, Prentice Bisbal wrote:
> This is the second suggestion to rebuild Slurm.
> 
> The other was from Åke Sandgren, who recommended this:
> 
>> This usually comes from slurm, so we always do
>> 
>> perl -pi -e 's/-pthread//' /lap/slurm/${version}/lib/libpmi.la \
>>     /lap/slurm/${version}/lib/libslurm.la
>> 
>> when installing a new slurm version. Thus no need for a fakepg 
>> wrapper.
> 
> I don't really have the luxury to rebuild Slurm at the moment. How would
> I rebuild Slurm to change this behavior? Is rebuilding Slurm with PGI
> the only option to fix this in Slurm, or should I use Åke's suggestion above?
> 
> If I did use Åke's suggestion above, how would that affect t

Re: [OMPI users] Passive target sync. support

2017-04-03 Thread Nathan Hjelm

No, support is enabled by default. You can check whether it is working by 
running with --mca osc ^pt2pt . This will disable the two-sided implementation.
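The same check can also be expressed through the environment, because Open MPI reads MCA parameters from variables named OMPI_MCA_<parameter>. The leading ^ means "exclude these components". Below is a minimal sketch with a hypothetical wrapper function; ./rma_test stands in for your own program.

```shell
#!/bin/bash
# Hypothetical helper: run a command with the two-sided "pt2pt" osc component
# excluded, intended to match passing "--mca osc ^pt2pt" to mpirun.
run_without_pt2pt() {
    OMPI_MCA_osc='^pt2pt' "$@"
}

# Usage (./rma_test is a placeholder):
#   run_without_pt2pt mpirun -np 2 ./rma_test
#   mpirun --mca osc ^pt2pt -np 2 ./rma_test    # equivalent command-line form
```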

-Nathan

On Apr 03, 2017, at 03:02 PM, Sebastian Rinke  wrote:

Thank you very much for the quick response!

Do I need to configure with certain flags to enable the 
hardware put/get support?


Sebastian

On 03 Apr 2017, at 18:02, Nathan Hjelm  wrote:



On Apr 03, 2017, at 08:36 AM, Sebastian Rinke  wrote:

Dear all,

I’m using passive target sync. in my code and would like to
know how well it is supported in Open MPI.

In particular, the code is some sort of particle tree code that uses a 
distributed tree, and every rank gets non-local tree nodes that are needed 
for its own computation from other ranks on demand, i.e.:

Win_lock(target)

Get()
Get()
…
Get()

(up to 8 Gets)

Win_unlock(target)

After closing the access epoch with Win_unlock(target),
the rank looks at the nodes that it got and decides if it needs to get
more non-local nodes in the same fashion.

Unfortunately, this implementation blocks until the access epoch is 
completed for one particle. As every rank needs to do the same for several 
particles, it would be better to use Rget and start processing other 
particles in the meantime. From time to time the pending Rgets are then 
checked for completion and the corresponding particle can progress.

My questions are:

1) Do Get and Rget use network hardware support on InfiniBand (IB) for 
contiguous data?


In Open MPI v2.0.0 and newer only. Open MPI v1.10.x and older will always use 
the two-sided implementation which may or may not use the hardware put/get 
support.



2) How is RMA progress achieved for IB? Is there a progress thread option 
available?


Progress threads are generally not needed for progressing RMA with Open MPI 
v2.0.0+. The only exception is when we have to queue up the operation (which 
may be the case with get). You can get origin-side progress by making another 
RMA call or by waiting on an operation initiated with one of the request-based 
calls.

If you want to progress each get independently you should use Rget.



3) If there is no progress thread option, would it be useful to use 
MPI_THREAD_MULTIPLE and have a pthread call MPI_Test on a request that 
will never be satisfied? Would this be a reasonable way to ensure progress in MPI?

E.g.:
while (1)
MPI_Test()


This will get you progress but isn't possible with Open MPI v1.10.x and older. 
MPI_THREAD_MULTIPLE is only really supported from v2.0.0.


-Nathan
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users


Re: [OMPI users] MPI_WAIT hangs after a call to MPI_CANCEL

2017-04-03 Thread George Bosilca
On Mon, Apr 3, 2017 at 4:47 PM, McGrattan, Kevin B. Dr. (Fed) <
kevin.mcgrat...@nist.gov> wrote:

> Thanks, George.
>
>
>
> Are persistent send/receives matched from the start of the calculation? If
> so, then I guess MPI_CANCEL won’t work.
>

A persistent request is only matched when it is started. MPI_Cancel on
a persistent receive doesn't affect the persistent request itself; it
only cancels the started instance of the request.


>  I don’t think Open MPI is the problem. I think there is something wrong
> with our cluster in that it just seems to hang up on these big packages.
> The calculation successfully exchanges hundreds or thousands before just
> hanging.
>

While possible, it is highly unlikely that a message gets dropped by the
network without some kind of warning (in the system log at least). You might
want to take a look at the dmesg output to see if there is anything unexpected there.


>  I’m not sure I understand completely your recommendation for dumping
> diagnostics. Is this documented somewhere?
>

Unfortunately not; this is basically a developer trick to dump the state of
the MPI library. It goes a little like this. Once you have attached a
debugger to your process (let's assume gdb), you need to find the
communicator where you have posted your requests (I can't help here, as this is
not part of the code you sent). With <fortran comm> set to the Fortran handle
of that communicator,

gdb$ p ompi_comm_f_to_c_table.addr[<fortran comm>]

will give you the C pointer of the communicator.

gdb$ call mca_pml.pml_dump(ompi_comm_f_to_c_table.addr[<fortran comm>], 1)

should print all the messages known locally to the MPI library, including
pending sends and receives. This will also print additional information
(the status of the requests, the tag, the size, and so on) that can be
understood by the developers. If you post the info here, we might be able
to provide additional information on the issue.
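When the hang has to be inspected on several ranks, the two gdb steps above can be scripted. This sketch assumes gdb is allowed to attach to the process (ptrace permissions), that Open MPI's symbols are available, and that the OB1 PML is in use, as George notes. write_pml_dump_cmds is a hypothetical helper, and FHANDLE stands for the integer handle your Fortran code holds for the communicator. Depending on the gdb version, the pointer may need an explicit cast, and the dump output may appear in the application's own output rather than in gdb's.

```shell
#!/bin/bash
# Hypothetical helper: write a gdb command file that prints the C communicator
# pointer for a Fortran handle, dumps the PML state for it, then detaches.
#   $1  Fortran communicator handle (integer)
#   $2  output file
write_pml_dump_cmds() {
    local fhandle=$1 out=$2
    {
        printf 'p ompi_comm_f_to_c_table.addr[%s]\n' "$fhandle"
        printf 'call mca_pml.pml_dump(ompi_comm_f_to_c_table.addr[%s], 1)\n' "$fhandle"
        printf 'detach\n'
    } > "$out"
}

# Usage, with PID being one stuck rank and FHANDLE the Fortran handle:
#   write_pml_dump_cmds "$FHANDLE" /tmp/pml_dump.gdb
#   gdb -p "$PID" -batch -x /tmp/pml_dump.gdb
```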

George.



>
> Thanks
>
> Kevin
>
> *From:* George Bosilca [mailto:bosi...@icl.utk.edu]
> *Sent:* Monday, April 03, 2017 2:29 PM
> *To:* Open MPI Users 
> *Cc:* McGrattan, Kevin B. Dr. (Fed) 
> *Subject:* Re: [OMPI users] MPI_WAIT hangs after a call to MPI_CANCEL
>
> Kevin,
>
> In Open MPI we only support cancelling not-yet-matched receives. So, you
> cannot cancel sends or receive requests that have already been matched.
> While the latter are supposed to complete (otherwise they would not have
> been matched), the former are trickier to complete if the corresponding
> receive is never posted.
>
> To sum this up, the bad news is that there is no way to correctly cancel
> MPI requests without hitting deadlock.
>
> That being said, I can hardly understand how Open MPI can drop a message.
> There might be something else going on here that is more difficult to spot. We
> do have an internal way to dump all pending (or known) communication.
> Assuming you are using the OB1 PML, here is how you dump all known
> communications. Attach to a process and find the communicator pointer (you
> will need to convert between the F90 communicator and the C pointer) and
> then call mca_pml.pml_dump( commptr, 1).
>
> Also, is it possible to check how one of the more recent versions of Open
> MPI (> 2.1) behaves with your code?
>
>   George.
>
> On Sat, Apr 1, 2017 at 12:40 PM, McGrattan, Kevin B. Dr. (Fed) <
> kevin.mcgrat...@nist.gov> wrote:
>
> I am running a large computational fluid dynamics code on a Linux cluster
> (CentOS 6.8, Open MPI 1.8.4). The code is written in Fortran and compiled
> with Intel Fortran 16.0.3. The cluster has 36 nodes, each node has two
> sockets, each socket has six cores. I have noticed that the code hangs when
> the size of the packages exchanged using a persistent send and receive call
> becomes large. I cannot say exactly how large, but generally on the order of
> 10 MB. Rather than let the code just hang, I implemented a timing loop
> using MPI_TESTALL. If MPI_TESTALL fails to return successfully after, say,
> 10 minutes, I attempt to MPI_CANCEL the unsuccessful request(s) and
> continue on with the calculation, even if the communication(s) did not
> succeed. It would not necessarily cripple the calculation if a few MPI
> communications were unsuccessful. This is a snippet of code that tests if
> the communications are successful and attempts to cancel if not:
>
>
>
>    START_TIME = MPI_WTIME()
>    FLAG = .FALSE.
>    DO WHILE(.NOT.FLAG)
>       CALL MPI_TESTALL(NREQ,REQ(1:NREQ),FLAG,ARRAY_OF_STATUSES,IERR)
>       WAIT_TIME = MPI_WTIME() - START_TIME
>       IF (WAIT_TIME>TIMEOUT) THEN
>          WRITE(LU_ERR,'(A,I6,A,A)') 'Request timed out for MPI process ',MYID,' running on ',PNAME(1:PNAMELEN)
>          DO NNN=1,NREQ
>             IF (ARRAY_OF_STATUSES(1,NNN)==MPI_SUCCESS) CYCLE
>             CALL MPI_CANCEL(REQ(NNN),IERR)
>             write(LU_ERR,*) 'Request ',NNN,' returns from MPI_CANCEL'
>             CALL MPI_WAIT(REQ(NNN),STATUS,IERR)
>             write(LU_ERR,*)

Re: [OMPI users] Compiler error with PGI: pgcc-Error-Unknown switch: -pthread

2017-04-03 Thread Matt Thompson
Coming in near the end here. I've had "fun" with PGI + Open MPI + macOS
(and still haven't quite solved it, see:
https://www.mail-archive.com/users@lists.open-mpi.org//msg30865.html, still
unanswered!) The solution that PGI gave me, which seems to be the magic sauce
on macOS, is to use a siterc file (
http://www.pgroup.com/userforum/viewtopic.php?p=21105#21105):

=
siterc for gcc commands PGI does not support
=
switch -ffast-math is hide;

switch -pipe is hide;

switch -fexpensive-optimizations is hide;

switch -pthread is
append(LDLIB1= -lpthread);

switch -qversion is
early
help(Display compiler version)
helpgroup(overall)
set(VERSION=YES);

switch -Wno-deprecated-declarations is hide;

switch -flat_namespace is hide;


If you use that, -pthread is "rerouted" to append -lpthread. You might try
that and see if it helps. Since you are on Linux (I assume?), you
should be able to proceed, as you shouldn't encounter the libtool
bug/issue/*shrug* that is breaking macOS use.

On Mon, Apr 3, 2017 at 5:14 PM, Reuti  wrote:


Re: [OMPI users] Passive target sync. support

2017-04-03 Thread Sebastian Rinke
Thanks!
Sebastian

On 03 Apr 2017, at 23:23, Nathan Hjelm  wrote:
