[OMPI users] Bug in OpenMPI-1.8.1: missing routines mpi_win_allocate_shared, mpi_win_shared_query called from Ftn95-code

2014-06-05 Thread Michael.Rachner
Dear developers of OpenMPI,

I found that when building an executable from a Fortran 95 code on a Linux
cluster with OpenMPI-1.8.1 (and the Intel 14.0.2 Fortran compiler),
the following two MPI-3 routines do not exist:

/dat/KERNEL/mpi3_sharedmem.f90:176: undefined reference to 
`mpi_win_allocate_shared_'
/dat/KERNEL/mpi3_sharedmem.f90:198: undefined reference to 
`mpi_win_shared_query_'

These are exactly the two routines needed for the MPI-3 shared memory
access to the same Fortran array from different processes on the same node.
This is a breakthrough enabled by MPI-3, and for me the most important new
feature of MPI-3, because it saves a lot of storage in the Fortran code and avoids
much of the MPI data transfer that would otherwise be required.

Can you tell me when these two important MPI routines will be available?

Thank You
Michael Rachner



Details:

Version of MPI library used in this run:
[1,0]:  Open MPI v1.8.1, package: Open MPI hpcoft14@cl3fr4 
Distribution, ident: 1.8.1, Apr 22, 2014

d000 cl3fr1 230$mpif90 -show
ifort -I/opt/mpi/openmpi/1.8.1-intel-14.0.2/include 
-I/opt/mpi/openmpi/1.8.1-intel-14.0.2/lib -L/opt/system/torque/4.2.7/lib 
-Wl,-rpath -Wl,/opt/system/torque/4.2.7/lib -Wl,-rpath 
-Wl,/opt/system/torque/4.2.7/lib -Wl,-rpath -Wl,/opt/system/torque/4.2.7/lib 
-Wl,-rpath -Wl,/opt/system/torque/4.2.7/lib -Wl,-rpath 
-Wl,/opt/mpi/openmpi/1.8.1-intel-14.0.2/lib -Wl,--enable-new-dtags 
-L/opt/mpi/openmpi/1.8.1-intel-14.0.2/lib -lmpi_usempif08 
-lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi
d000 cl3fr1 231$

--end of the email ---


Re: [OMPI users] latest stable and win7/msvc2013

2014-07-17 Thread Michael.Rachner
Dear people,

As a follow-up to the hint from Damien, who suggested using MPICH on Windows 7:

MPICH stopped supporting Windows some time ago. The MPICH project recommends
using MS-MPI for Windows, which is a derivative of MPICH2.
You may download the binary (for free) from the MS-MPI landing page:
http://msdn.microsoft.com/en-us/library/bb524831(v=vs.85).aspx
The most recent version is 4.2.4400.0, published 1/14/2014.
Note, however, that MS-MPI conforms to MPI-2, not yet to MPI-3.
The installation is easy and it works well with my CFD code under Windows 7 (64-bit).

Greetings
Michael Rachner



From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Damien
Sent: Wednesday, 16 July 2014 18:15
To: us...@open-mpi.org
Subject: Re: [OMPI users] latest stable and win7/msvc2013

Guys,

Don't do it.  It doesn't work at all.  I couldn't pick up maintenance of it 
either, and the majority of the Windows support is removed as Ralph said.  Just 
use MPICH for Windows work and save yourself the pain.

Cheers,

Damien
On 2014-07-16 9:57 AM, Nathan Hjelm wrote:

It likely won't build, because last I checked the Microsoft toolchain does
not meet the minimum requirements (C99 or higher). You will have better
luck with either gcc or Intel's compiler.

-Nathan



On Wed, Jul 16, 2014 at 04:52:53PM +0100, MM wrote:

hello,

I'm about to try to build 1.8.1 with the win msvc2013 toolkit in 64-bit mode.
I know the win binaries were dropped after failure to find someone to
pick them up (following Shiqin's departure), and I'm afraid I wouldn't
volunteer due to lack of time, but is there any general advice before
I start?

rds,

MM




Re: [OMPI users] latest stable and win7/msvc2013 and shared memory feature

2014-07-18 Thread Michael.Rachner
Dear Mr. Tillier and other MPI developers,

I am glad to hear that MS-MPI development is still active and interested in user feature requests.

You want user feature requests for your further MS-MPI development?
Here is my request (I have been doing Fortran CFD code development for decades now under Windows and Linux):
-- Extend MS-MPI to support MPI-3 in Fortran 95 codes.
   Therein, the most important feature for me (and I think for many other users too,
   though they possibly don't even know that such a fine feature exists in MPI-3)
   is the MPI-3 shared memory feature.
   It requires these 3 MPI routines: MPI_WIN_ALLOCATE_SHARED, MPI_WIN_SHARED_QUERY and MPI_WIN_FREE,
   along with the Fortran 2003 routine C_F_POINTER and the Fortran 2003 intrinsic module ISO_C_BINDING
   (both already contained in the Intel Fortran compiler).
   This shared memory feature allows using the same Fortran array (with read and write access)
   from MPI processes running on the same node.
   A breakthrough in the Fortran world, enabled by MPI-3. The savings in
   storage and the reduction in the amount of MPI data transfer can be huge!
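
To illustrate how these pieces fit together, here is a minimal sketch in Fortran (this is not my actual test code; all names are illustrative and the window synchronization is kept deliberately simple):

  PROGRAM shm_recipe_sketch
    USE mpi
    USE, INTRINSIC :: ISO_C_BINDING, ONLY: C_PTR, C_F_POINTER
    IMPLICIT NONE
    INTEGER, PARAMETER             :: n = 100000      ! array length (example value)
    INTEGER                        :: ierr, myrank, win, disp_unit
    INTEGER(KIND=MPI_ADDRESS_KIND) :: winsize
    TYPE(C_PTR)                    :: baseptr
    INTEGER, POINTER               :: iarr(:)         ! Fortran view of the shared array

    CALL MPI_INIT(ierr)
    CALL MPI_COMM_RANK(MPI_COMM_WORLD, myrank, ierr)

    ! Only rank 0 provides the storage; the other ranks allocate a zero-size segment.
    disp_unit = 4
    winsize   = 0
    IF (myrank == 0) winsize = INT(n, MPI_ADDRESS_KIND) * 4_MPI_ADDRESS_KIND

    CALL MPI_WIN_ALLOCATE_SHARED(winsize, disp_unit, MPI_INFO_NULL, &
                                 MPI_COMM_WORLD, baseptr, win, ierr)

    ! Every rank queries rank 0's segment to obtain its base address ...
    CALL MPI_WIN_SHARED_QUERY(win, 0, winsize, disp_unit, baseptr, ierr)
    ! ... and maps it onto an ordinary Fortran pointer array.
    CALL C_F_POINTER(baseptr, iarr, [n])

    IF (myrank == 0) iarr(:) = 0              ! all ranks on the node now see the same array
    CALL MPI_BARRIER(MPI_COMM_WORLD, ierr)    ! a real code needs proper window synchronization

    CALL MPI_WIN_FREE(win, ierr)
    CALL MPI_FINALIZE(ierr)
  END PROGRAM shm_recipe_sketch

With this pattern, only rank 0's memory is actually allocated, while every process on the node reads and writes the same physical array iarr.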

My knowledge of the current state of support for that feature among MPI implementations:
   - The MPI-3 shared memory feature works fine with MPICH 3.0.4.
   - It does not yet work with Open MPI 1.8.1 (but should work in 1.8.2).
   - It still has a bug in MVAPICH2 2.0rc2 (at the beginning of June 2014 they answered that they would look at the problem).
   - It is not supported by Intel MPI 4.1.1 (only MPI-2 so far).
   - It is not supported by MS-MPI 4.2.4400.0 (only MPI-2 so far).

Maybe this encourages you and your MPI-teams to provide that feature soon.

Greetings to you all!
  Michael Rachner
 


-Original Message-
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Fab Tillier
Sent: Friday, 18 July 2014 00:30
To: Jed Brown; Damien; us...@open-mpi.org
Cc: MPI External Communications
Subject: Re: [OMPI users] latest stable and win7/msvc2013

[resending now that I've joined the Open MPI users list, sorry for the 
duplicate]

Hi Jed,

Thanks for looping me in on this mail thread.

Jed Brown wrote on Thu, 17 Jul 2014 at 11:19:42

> Damien  writes:
> 
>> Is this something that could be funded by Microsoft, and is it time 
>> to approach them perhaps?  MS MPI is based on MPICH, and if mainline 
>> MPICH isn't supporting Windows anymore, then there won't be a whole 
>> lot of development in an increasingly older Windows build. With the 
>> Open-MPI roadmap, there's a lot happening.

Open-MPI isn't supporting Windows anymore either, and I would think it fair to 
say that a lot is happening in both Open-MPI and MPICH (for non-Windows 
environments).

>> Would it be a
>> better business model for MS to piggy-back off of Open-MPI ongoing 
>> innovation, and put their resources into maintaining a Windows build 
>> of Open-MPI instead?

Microsoft doesn't simply maintain a Windows build of MPICH.  While MS-MPI is
derived from MPICH, at this point it is really more of a fork, given how much
Windows-specific work we've done that isn't applicable to the mainline MPICH
development.  We're continuing to invest in the development of MS-MPI, and our 
focus continues to be on user-requested features.  We strongly believe that 
users care more about feature content than which codebase we are derived from - 
after all, portability is one of the main goals of the MPI standard.

We've worked very hard to maintain ABI over the various versions of MS-MPI, and 
a fundamental shift to a different implementation would wreak havoc on users 
and our ISV partners.

> Maybe Fab can comment on Microsoft's intentions regarding MPI and
> C99/C11 (just dreaming now).

I can't really comment on the C99/C11 stuff, as that's a completely different 
organization within Microsoft.  Rob seems to have shed some light on this 
(thanks for finding that Rob!)

From an MPI perspective, we've been investing in making ourselves available to
our user and developer community, whether through email
(mailto:ask...@microsoft.com, CC'd), through our beta program on Microsoft
Connect (https://connect.microsoft.com/HPC/MS-MPI), where users can request
(and vote for) features (https://connect.microsoft.com/HPC/Feedback), or
through our web forums
(http://social.microsoft.com/Forums/en-US/home?forum=windowshpcmpi).  We'd
very much like to get input from our user community to help shape our feature
content going forward.

I'm not familiar with PETSc, but would be happy to develop a closer 
relationship with the developers to enable better integration of MS-MPI into 
the PETSc environment.  Conceptually, a --download-msmpi option would be great, 
and we already allow redistribution of our installer package with third party 
applications (to enable bundling) if that makes more sense.

-Fab

>> On 2014-07-17 11:42 AM, Jed Brown wrote:
>>> Rob Latham  writes:
 Well, I (and dgoodell and jsquyers a

[OMPI users] Bug in OpenMPI-1.8.3: storage limitation in shared memory allocation (MPI_WIN_ALLOCATE_SHARED) in Ftn-code

2014-10-24 Thread Michael.Rachner
Dear developers of OPENMPI,

I am running a small, downsized Fortran test program for shared memory allocation
(using MPI_WIN_ALLOCATE_SHARED and MPI_WIN_SHARED_QUERY)
on only 1 node of 2 different Linux clusters with OpenMPI-1.8.3 and
Intel-14.0.4 / Intel-13.0.1, respectively.

The program simply allocates a sequence of shared data windows, each consisting
of 1 integer*4 array.
None of the windows is freed, so the amount of data allocated in shared
windows grows during the course of the execution.

That worked well on the 1st cluster (Laki, having 8 procs per node), even when
allocating 1000 shared windows of 50000 integer*4 array elements each,
i.e. a total of 200 MBytes.
On the 2nd cluster (Cluster5, having 24 procs per node) it also worked on the
login node, but it did NOT work on a compute node.
In that error case, there appears to be something like an internal storage limit of ~
140 MB for the total storage allocated in all shared windows.
When that limit is reached, all later shared memory allocations fail (but
silently).
So the first attempt to use such a badly allocated shared data window results in a bus
error due to the bad storage address encountered.

That strange behavior could be observed with the small test program but also with
my large Fortran CFD code.
If the error occurs, then it occurs with both codes, and in both cases at a storage
limit of ~140 MB.
I found that this storage limit depends only weakly on the number of processes
(for np=2,4,8,16,24 it is: 144.4, 144.0, 141.0, 137.0, 132.2 MB).

Note that the shared memory storage available on both clusters was very large 
(many GB of free memory).
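
For reference, the structure of such a test is roughly the following (a sketch only, not the actual sharedmemtest.f90; the loop count, idim_1 and the write pattern are illustrative assumptions):

  PROGRAM growing_shared_windows
    USE mpi
    USE, INTRINSIC :: ISO_C_BINDING, ONLY: C_PTR, C_F_POINTER
    IMPLICIT NONE
    INTEGER, PARAMETER             :: nwin = 1000, idim_1 = 50000
    INTEGER                        :: ierr, myrank, iwin, disp_unit, win(nwin)
    INTEGER(KIND=MPI_ADDRESS_KIND) :: winsize
    TYPE(C_PTR)                    :: baseptr
    INTEGER, POINTER               :: iarr(:)

    CALL MPI_INIT(ierr)
    CALL MPI_COMM_RANK(MPI_COMM_WORLD, myrank, ierr)

    DO iwin = 1, nwin
       ! rank 0 (the "nodemaster") provides the storage of each window, the others pass size 0
       winsize = 0
       IF (myrank == 0) winsize = INT(idim_1, MPI_ADDRESS_KIND) * 4_MPI_ADDRESS_KIND
       CALL MPI_WIN_ALLOCATE_SHARED(winsize, 4, MPI_INFO_NULL, MPI_COMM_WORLD, &
                                    baseptr, win(iwin), ierr)
       CALL MPI_WIN_SHARED_QUERY(win(iwin), 0, winsize, disp_unit, baseptr, ierr)
       CALL C_F_POINTER(baseptr, iarr, [idim_1])
       IF (myrank == 0) iarr(:) = iwin    ! first write into the new window (where the bus error shows up)
       CALL MPI_BARRIER(MPI_COMM_WORLD, ierr)
       ! the windows are deliberately never freed, so the total shared allocation keeps growing
    END DO

    CALL MPI_FINALIZE(ierr)
  END PROGRAM growing_shared_windows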

Here is the error message when running with np=2 and an array dimension of
idim_1=50000 for the integer*4 array allocated per shared window
on the compute node of Cluster5.
In that case, the error occurred at the 723rd shared window, which is the first
badly allocated window in that case:
(722 successfully allocated shared windows * 50000 array elements * 4 Bytes/el.
= 144.4 MB)


[1,0]: on nodemaster: iwin= 722 :
[1,0]:  total storage [MByte] alloc. in shared windows so far:   
144.4000
[1,0]: === allocation of shared window no. iwin= 723
[1,0]:  starting now with idim_1=   5
[1,0]: on nodemaster for iwin= 723 : before writing on 
shared mem
[1,0]:[r5i5n13:12597] *** Process received signal ***
[1,0]:[r5i5n13:12597] Signal: Bus error (7)
[1,0]:[r5i5n13:12597] Signal code: Non-existant physical address (2)
[1,0]:[r5i5n13:12597] Failing at address: 0x7fffe08da000
[1,0]:[r5i5n13:12597] [ 0] 
[1,0]:/lib64/libpthread.so.0(+0xf800)[0x76d67800]
[1,0]:[r5i5n13:12597] [ 1] ./a.out[0x408a8b]
[1,0]:[r5i5n13:12597] [ 2] ./a.out[0x40800c]
[1,0]:[r5i5n13:12597] [ 3] 
[1,0]:/lib64/libc.so.6(__libc_start_main+0xe6)[0x769fec36]
[1,0]:[r5i5n13:12597] [ 4] [1,0]:./a.out[0x407f09]
[1,0]:[r5i5n13:12597] *** End of error message ***
[1,1]:forrtl: error (78): process killed (SIGTERM)
[1,1]:Image  PCRoutineLine  
  Source
[1,1]:libopen-pal.so.6   74B74580  Unknown   
Unknown  Unknown
[1,1]:libmpi.so.177267F3E  Unknown   
Unknown  Unknown
[1,1]:libmpi.so.17733B555  Unknown   
Unknown  Unknown
[1,1]:libmpi.so.17727DFFD  Unknown   
Unknown  Unknown
[1,1]:libmpi_mpifh.so.2  7779BA03  Unknown   
Unknown  Unknown
[1,1]:a.out  00408D15  Unknown   
Unknown  Unknown
[1,1]:a.out  0040800C  Unknown   
Unknown  Unknown
[1,1]:libc.so.6  769FEC36  Unknown   
Unknown  Unknown
[1,1]:a.out  00407F09  Unknown   
Unknown  Unknown
--
mpiexec noticed that process rank 0 with PID 12597 on node r5i5n13 exited on 
signal 7 (Bus error).
--


The small Fortran test program was built and run with:
  mpif90 sharedmemtest.f90
  mpiexec -np 2 -bind-to core -tag-output ./a.out

Why does it work on Laki (both on the login node and on a compute node) as
well as on the login node of Cluster5,
but fail on a compute node of Cluster5?

Greetings
   Michael Rachner





Re: [OMPI users] Bug in OpenMPI-1.8.3: storage limitation in shared memory allocation (MPI_WIN_ALLOCATE_SHARED) in Ftn-code

2014-10-27 Thread Michael.Rachner
Dear Mr. Squyres,

We will try to install your bug-fixed nightly tarball of 2014-10-24 on Cluster5
to see whether it works or not.
The installation, however, will take some time. I will get back to you when I know more.

Let me add the information that on Laki each node has 16 GB of shared
memory (there it worked),
the login node on Cluster5 has 64 GB (there it worked too), whereas the
compute nodes on Cluster5 have 128 GB (there it did not work).
So possibly the bug has something to do with the size of the physical
shared memory available on the node.

Greetings
Michael Rachner

-Original Message-
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Jeff Squyres (jsquyres)
Sent: Friday, 24 October 2014 22:45
To: Open MPI User's List
Subject: Re: [OMPI users] Bug in OpenMPI-1.8.3: storage limitation in shared memory allocation (MPI_WIN_ALLOCATE_SHARED) in Ftn-code

Nathan tells me that this may well be related to a fix that was literally just 
pulled into the v1.8 branch today:

https://github.com/open-mpi/ompi-release/pull/56

Would you mind testing any nightly tarball after tonight?  (i.e., the v1.8 
tarballs generated tonight will be the first ones to contain this fix)

http://www.open-mpi.org/nightly/master/



On Oct 24, 2014, at 11:46 AM,   
wrote:

> Dear developers of OPENMPI,
>  
> I am running a small downsized Fortran-testprogram for shared memory 
> allocation (using MPI_WIN_ALLOCATE_SHARED and  MPI_WIN_SHARED_QUERY) )
> on only 1 node   of 2 different Linux-clusters with OPENMPI-1.8.3 and 
> Intel-14.0.4 /Intel-13.0.1, respectively.
>  
> The program simply allocates a sequence of shared data windows, each 
> consisting of 1 integer*4-array.
> None of the windows is freed, so the amount of allocated data  in shared 
> windows raises during the course of the execution.
>  
> That worked well on the 1st cluster (Laki, having 8 procs per node))  
> when allocating even 1000 shared windows each having 5 integer*4 array 
> elements, i.e. a total of  200 MBytes.
> On the 2nd cluster (Cluster5, having 24 procs per node) it also worked on the 
> login node, but it did NOT work on a compute node.
> In that error case, there occurs something like an internal storage limit of 
> ~ 140 MB for the total storage allocated in all shared windows.
> When that limit is reached, all later shared memory allocations fail (but 
> silently).
> So the first attempt to use such a bad shared data window results in a bus 
> error due to the bad storage address encountered.
>  
> That strange behavior could be observed in the small testprogram but also 
> with my large Fortran CFD-code.
> If the error occurs, then it occurs with both codes, and both at a storage 
> limit of  ~140 MB.
> I found that this storage limit depends only weakly on  the number of 
> processes (for np=2,4,8,16,24  it is: 144.4 , 144.0, 141.0, 137.0, 
> 132.2 MB)
>  
> Note that the shared memory storage available on both clusters was very large 
> (many GB of free memory).
>  
> Here is the error message when running with np=2 and an  array 
> dimension of idim_1=5  for the integer*4 array allocated per shared 
> window on the compute node of Cluster5:
> In that case, the error occurred at the 723-th shared window, which is the 
> 1st badly allocated window in that case:
> (722 successfully allocated shared windows * 5 array elements * 4 
> Bytes/el. = 144.4 MB)
>  
>  
> [1,0]: on nodemaster: iwin= 722 :
> [1,0]:  total storage [MByte] alloc. in shared windows so far:   
> 144.4000
> [1,0]: === allocation of shared window no. iwin= 723
> [1,0]:  starting now with idim_1=   5
> [1,0]: on nodemaster for iwin= 723 : before writing 
> on shared mem
> [1,0]:[r5i5n13:12597] *** Process received signal *** 
> [1,0]:[r5i5n13:12597] Signal: Bus error (7) 
> [1,0]:[r5i5n13:12597] Signal code: Non-existant physical 
> address (2) [1,0]:[r5i5n13:12597] Failing at address: 
> 0x7fffe08da000 [1,0]:[r5i5n13:12597] [ 0] 
> [1,0]:/lib64/libpthread.so.0(+0xf800)[0x76d67800]
> [1,0]:[r5i5n13:12597] [ 1] ./a.out[0x408a8b] 
> [1,0]:[r5i5n13:12597] [ 2] ./a.out[0x40800c] 
> [1,0]:[r5i5n13:12597] [ 3] 
> [1,0]:/lib64/libc.so.6(__libc_start_main+0xe6)[0x769fec36]
> [1,0]:[r5i5n13:12597] [ 4] [1,0]:./a.out[0x407f09] 
> [1,0]:[r5i5n13:12597] *** End of error message ***
> [1,1]:forrtl: error (78): process killed (SIGTERM)
> [1,1]:Image  PCRoutineLine
> Source
> [1,1]:libopen-pal.so.6   74B74580  Unknown   
> Unknown  Unknown
> [1,1]:libmpi.so.177267F3E  Unknown   
> Unknown  Unknown
> [1,1]:libmpi.so.17733B555  Unknown   
> Unknown  Unknown
> [1,1]:libmpi.so.17727DFFD  Unknown   
> Unknown  Unknown
> [1,1]:libmpi_mpifh.so.2  7779BA03  Unknown   
> Unknown  Unknown
> [1,1]:a.

[OMPI users] FW: Bug in OpenMPI-1.8.3: storage limitation in shared memory allocation (MPI_WIN_ALLOCATE_SHARED) in Ftn-code

2014-10-27 Thread Michael.Rachner
Dear developers of OPENMPI,

We have now installed and tested the bugfixed OpenMPI nightly tarball of
2014-10-24 (openmpi-dev-176-g9334abc.tar.gz) on Cluster5.
As before (with the OpenMPI-1.8.3 release version), the small Fortran test program runs
correctly on the login node.
As before, the program aborts on the compute node, but now with a different
error message:

The following message appears when launching the program with 2 processes: 
mpiexec -np 2 -bind-to core -tag-output ./a.out

[1,0]: on nodemaster: iwin= 685 :
[1,0]:  total storage [MByte] alloc. in shared windows so far:   
137.
[ [1,0]: === allocation of shared window no. iwin= 686
[1,0]:  starting now with idim_1=   5
-
It appears as if there is not enough space for 
/tmp/openmpi-sessions-rachner@r5i5n13_0/48127/1/shared_window_688.r5i5n13 (the 
shared-memory backing
file). It is likely that your MPI job will now either abort or experience
performance degradation.

  Local host:  r5i5n13
  Space Requested: 204256 B
  Space Available: 208896 B
--
[r5i5n13:26917] *** An error occurred in MPI_Win_allocate_shared
[r5i5n13:26917] *** reported by process [3154051073,140733193388032]
[r5i5n13:26917] *** on communicator MPI_COMM_WORLD
[r5i5n13:26917] *** MPI_ERR_INTERN: internal error
[r5i5n13:26917] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will 
now abort,
[r5i5n13:26917] ***and potentially your MPI job)
rachner@r5i5n13:~/dat>



When I repeat the run using 24 processes (on the same compute node), the same kind
of abort message occurs, but earlier:

[1,0]: on nodemaster: iwin= 231 :
[1,0]:  total storage [MByte] alloc. in shared windows so far:   
46.2
 [1,0]: === allocation of shared window no. iwin= 232
[1,0]:  starting now with idim_1=   5
-
It appears as if there is not enough space for 
/tmp/openmpi-sessions-rachner@r5i5n13_0/48029/1/shared_window_234.r5i5n13 (the 
shared-memory backing
file). It is likely that your MPI job will now either abort or experience
performance degradation.

  Local host:  r5i5n13
  Space Requested: 204784 B
  Space Available: 131072 B
--
[r5i5n13:26947] *** An error occurred in MPI_Win_allocate_shared
[r5i5n13:26947] *** reported by process [3147628545,140733193388032]
[r5i5n13:26947] *** on communicator MPI_COMM_WORLD
[r5i5n13:26947] *** MPI_ERR_INTERN: internal error
[r5i5n13:26947] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will 
now abort,
[r5i5n13:26947] ***and potentially your MPI job)
rachner@r5i5n13:~/dat>


So the problem is not yet resolved.

Greetings
 Michael Rachner






-Original Message-
From: Rachner, Michael
Sent: Monday, 27 October 2014 11:49
To: 'Open MPI Users'
Subject: RE: [OMPI users] Bug in OpenMPI-1.8.3: storage limitation in shared memory allocation (MPI_WIN_ALLOCATE_SHARED) in Ftn-code

Dear Mr. Squyres.

We will try to install your bug-fixed nigthly tarball of 2014-10-24 on Cluster5 
to see whether it works or not.
The installation however will take some time. I get back to you, if I know more.

Let me add the information that on the Laki each nodes has 16 GB of shared 
memory (there it worked), the login-node on Cluster 5 has 64 GB (there it 
worked too), whereas the compute nodes on Cluster5 have 128 GB (there it did 
not work).
So possibly the bug might have something to do with the size of the physical 
shared memory available on the node.

Greetings
Michael Rachner

-Original Message-
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Jeff Squyres (jsquyres)
Sent: Friday, 24 October 2014 22:45
To: Open MPI User's List
Subject: Re: [OMPI users] Bug in OpenMPI-1.8.3: storage limitation in shared memory allocation (MPI_WIN_ALLOCATE_SHARED) in Ftn-code

Nathan tells me that this may well be related to a fix that was literally just 
pulled into the v1.8 branch today:

https://github.com/open-mpi/ompi-release/pull/56

Would you mind testing any nightly tarball after tonight?  (i.e., the v1.8 
tarballs generated tonight will be the first ones to contain this fix)

http://www.open-mpi.org/nightly/master/



On Oct 24, 2014, at 11:46 AM,   
wrote:

> Dear developers of OPENMPI,
>  
> I am running a small downsized Fortran-testprogram for sh

Re: [OMPI users] FW: Bug in OpenMPI-1.8.3: storage limitation in shared memory allocation (MPI_WIN_ALLOCATE_SHARED) in Ftn-code

2014-10-27 Thread Michael.Rachner


-Original Message-
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Gilles Gouaillardet
Sent: Monday, 27 October 2014 14:49
To: Open MPI Users
Subject: Re: [OMPI users] FW: Bug in OpenMPI-1.8.3: storage limitation in shared memory allocation (MPI_WIN_ALLOCATE_SHARED) in Ftn-code

Michael,

Could you please run
mpirun -np 1 df -h
mpirun -np 1 df -hi
on both compute and login nodes

Thanks

Gilles

michael.rach...@dlr.de wrote:
>Dear developers of OPENMPI,
>
>We have now installed and tested the bugfixed OPENMPI Nightly Tarball  of 
>2014-10-24  (openmpi-dev-176-g9334abc.tar.gz) on Cluster5 .
>As before (with OPENMPI-1.8.3 release version) the small Ftn-testprogram runs 
>correctly on the login-node.
>As before the program aborts on the compute node, but now with a different 
>error message: 
>
>The following message appears when launching the program with 2 processes: 
>mpiexec -np 2 -bind-to core -tag-output ./a.out
>
>[1,0]: on nodemaster: iwin= 685 :
>[1,0]:  total storage [MByte] alloc. in shared windows so far:   
>137.
>[ [1,0]: === allocation of shared window no. iwin= 686
>[1,0]:  starting now with idim_1=   5
>---
>-- It appears as if there is not enough space for 
>/tmp/openmpi-sessions-rachner@r5i5n13_0/48127/1/shared_window_688.r5i5n
>13 (the shared-memory backing file). It is likely that your MPI job 
>will now either abort or experience performance degradation.
>
>  Local host:  r5i5n13
>  Space Requested: 204256 B
>  Space Available: 208896 B
>---
>--- [r5i5n13:26917] *** An error occurred in MPI_Win_allocate_shared 
>[r5i5n13:26917] *** reported by process [3154051073,140733193388032] 
>[r5i5n13:26917] *** on communicator MPI_COMM_WORLD [r5i5n13:26917] *** 
>MPI_ERR_INTERN: internal error [r5i5n13:26917] *** MPI_ERRORS_ARE_FATAL 
>(processes in this communicator will now abort,
>[r5i5n13:26917] ***and potentially your MPI job)
>rachner@r5i5n13:~/dat>
>***
>*
>
>
>When I repeat the run using 24 processes (on same compute node) the same kind 
>of abort message occurs, but earlier:
>
>[1,0]: on nodemaster: iwin= 231 :
>[1,0]:  total storage [MByte] alloc. in shared windows so far:   
>46.2
> [1,0]: === allocation of shared window no. iwin= 232
>[1,0]:  starting now with idim_1=   5
>---
>-- It appears as if there is not enough space for 
>/tmp/openmpi-sessions-rachner@r5i5n13_0/48029/1/shared_window_234.r5i5n
>13 (the shared-memory backing file). It is likely that your MPI job 
>will now either abort or experience performance degradation.
>
>  Local host:  r5i5n13
>  Space Requested: 204784 B
>  Space Available: 131072 B
>---
>--- [r5i5n13:26947] *** An error occurred in MPI_Win_allocate_shared 
>[r5i5n13:26947] *** reported by process [3147628545,140733193388032] 
>[r5i5n13:26947] *** on communicator MPI_COMM_WORLD [r5i5n13:26947] *** 
>MPI_ERR_INTERN: internal error [r5i5n13:26947] *** MPI_ERRORS_ARE_FATAL 
>(processes in this communicator will now abort,
>[r5i5n13:26947] ***and potentially your MPI job)
>rachner@r5i5n13:~/dat>
>***
>*
>
>So the problem is not yet resolved.
>
>Greetings
> Michael Rachner
>
>
>
>
>
>
>-Ursprüngliche Nachricht-
>Von: Rachner, Michael
>Gesendet: Montag, 27. Oktober 2014 11:49
>An: 'Open MPI Users'
>Betreff: AW: [OMPI users] Bug in OpenMPI-1.8.3: storage limition in 
>shared memory allocation (MPI_WIN_ALLOCATE_SHARED) in Ftn-code
>
>Dear Mr. Squyres.
>
>We will try to install your bug-fixed nigthly tarball of 2014-10-24 on 
>Cluster5 to see whether it works or not.
>The installation however will take some time. I get back to you, if I know 
>more.
>
>Let me add the information that on the Laki each nodes has 16 GB of shared 
>memory (there it worked), the login-node on Cluster 5 has 64 GB (there it 
>worked too), whereas the compute nodes on Cluster5 have 128 GB (there it did 
>not work).
>So possibly the bug might have something to do with the size of the physical 
>shared memory available on the node.
>
>Greetings
>Michael Rachner
>
>-Ursprüngliche Nachricht-
>Von: users [mailto:users-boun...@open-mpi.org] Im Auftrag von Jeff 
>Squyres (jsquyres)
>Gesendet: Freitag, 24. Oktober 2014 22:45
>An: Open MPI User's List
>Betreff: Re: [OMPI users] Bug in OpenMPI-1.8.3:

Re: [OMPI users] FW: Bug in OpenMPI-1.8.3: storage limitation in shared memory allocation (MPI_WIN_ALLOCATE_SHARED) in Ftn-code

2014-10-27 Thread Michael.Rachner
Dear Gilles,

This is  the system response on the login node of cluster5:

cluster5:~/dat> mpirun -np 1 df -h
Filesystem  Size  Used Avail Use% Mounted on
/dev/sda31  228G  5.6G  211G   3% /
udev 32G  232K   32G   1% /dev
tmpfs32G 0   32G   0% /dev/shm
/dev/sda11  291M   39M  237M  15% /boot
/dev/gpfs10 495T  280T  216T  57% /gpfs10
/dev/loop1  3.2G  3.2G 0 100% /media
cluster5:~/dat> mpirun -np 1 df -hi
Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/sda3115M  253K   15M2% /
udev0 0 0 - /dev
tmpfs7.9M 3  7.9M1% /dev/shm
/dev/sda1176K41   76K1% /boot
/dev/gpfs10  128M   67M   62M   53% /gpfs10
/dev/loop1  0 0 0 - /media
cluster5:~/dat>


And this the system response on the compute node of cluster5:

rachner@r5i5n13:~>  mpirun -np 1 df -h
Filesystem  Size  Used Avail Use% Mounted on
tmpfs63G  1.4G   62G   3% /
udev 63G   92K   63G   1% /dev
tmpfs63G 0   63G   0% /dev/shm
tmpfs   150M   12M  139M   8% /tmp
/dev/gpfs10 495T  280T  216T  57% /gpfs10
rachner@r5i5n13:~>  mpirun -np 1 df -hi
Filesystem Inodes IUsed IFree IUse% Mounted on
tmpfs 16M   63K   16M1% /
udev0 0 0 - /dev
tmpfs 16M 3   16M1% /dev/shm
tmpfs 16M   183   16M1% /tmp
/dev/gpfs10  128M   67M   62M   53% /gpfs10
rachner@r5i5n13:~>

You wrote:
"From the logs, the error message makes sense to me: there is not enough space
in /tmp. Since the compute nodes have a lot of memory, you might want to try
using /dev/shm instead of /tmp for the backing files."

I do not understand that system output. Is it required now to switch to
/dev/shm? And how can I do that? Or must our operators change something
(the cluster is very new)?
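
Presumably, if it can be done on the user side, it goes through Open MPI's MCA parameters, roughly like this (untested here; the available shmem_mmap parameters and their exact names should be checked with ompi_info):

  # list the parameters of the mmap shmem component
  ompi_info --param shmem mmap --level 9

  # set the relocation parameter on the command line ...
  mpiexec --mca shmem_mmap_relocate_backing_file 1 -np 24 -bind-to core -tag-output ./a.out

  # ... or put it into $HOME/.openmpi/mca-params.conf:
  #   shmem_mmap_relocate_backing_file = 1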

Greetings
 Michael Rachner


-Original Message-
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Gilles Gouaillardet
Sent: Monday, 27 October 2014 14:49
To: Open MPI Users
Subject: Re: [OMPI users] FW: Bug in OpenMPI-1.8.3: storage limitation in shared memory allocation (MPI_WIN_ALLOCATE_SHARED) in Ftn-code

Michael,

Could you please run
mpirun -np 1 df -h
mpirun -np 1 df -hi
on both compute and login nodes

Thanks

Gilles

michael.rach...@dlr.de wrote:
>Dear developers of OPENMPI,
>
>We have now installed and tested the bugfixed OPENMPI Nightly Tarball  of 
>2014-10-24  (openmpi-dev-176-g9334abc.tar.gz) on Cluster5 .
>As before (with OPENMPI-1.8.3 release version) the small Ftn-testprogram runs 
>correctly on the login-node.
>As before the program aborts on the compute node, but now with a different 
>error message: 
>
>The following message appears when launching the program with 2 processes: 
>mpiexec -np 2 -bind-to core -tag-output ./a.out
>
>[1,0]: on nodemaster: iwin= 685 :
>[1,0]:  total storage [MByte] alloc. in shared windows so far:   
>137.
>[ [1,0]: === allocation of shared window no. iwin= 686
>[1,0]:  starting now with idim_1=   5
>---
>-- It appears as if there is not enough space for 
>/tmp/openmpi-sessions-rachner@r5i5n13_0/48127/1/shared_window_688.r5i5n
>13 (the shared-memory backing file). It is likely that your MPI job 
>will now either abort or experience performance degradation.
>
>  Local host:  r5i5n13
>  Space Requested: 204256 B
>  Space Available: 208896 B
>---
>--- [r5i5n13:26917] *** An error occurred in MPI_Win_allocate_shared 
>[r5i5n13:26917] *** reported by process [3154051073,140733193388032] 
>[r5i5n13:26917] *** on communicator MPI_COMM_WORLD [r5i5n13:26917] *** 
>MPI_ERR_INTERN: internal error [r5i5n13:26917] *** MPI_ERRORS_ARE_FATAL 
>(processes in this communicator will now abort,
>[r5i5n13:26917] ***and potentially your MPI job)
>rachner@r5i5n13:~/dat>
>***
>*
>
>
>When I repeat the run using 24 processes (on same compute node) the same kind 
>of abort message occurs, but earlier:
>
>[1,0]: on nodemaster: iwin= 231 :
>[1,0]:  total storage [MByte] alloc. in shared windows so far:   
>46.2
> [1,0]: === allocation of shared window no. iwin= 232
>[1,0]:  starting now with idim_1=   5
>---
>-- It appears as if there is not enough space for 
>/tmp/openmpi-sessions-rachner@r5i5n13_0/48029/1/shared_window_234.r5i5n
>13 (the shared-memory backing file). It is likely

Re: [OMPI users] Bug in OpenMPI-1.8.3: storage limitation in shared memory allocation (MPI_WIN_ALLOCATE_SHARED) in Ftn-code

2014-10-31 Thread Michael.Rachner
Dear developers of OPENMPI,

There remains a hang observed in MPI_WIN_ALLOCATE_SHARED.

But first: thank you for your advice to employ shmem_mmap_relocate_backing_file = 1.
It turned out that the bad (but silent) allocations by
MPI_WIN_ALLOCATE_SHARED, which I had observed in the past after ~140 MB of
allocated shared memory,
were indeed caused by too little storage being available for the shared memory backing
files. Applying the MCA parameter resolved the problem.

Now the allocation of shared data windows by MPI_WIN_ALLOCATE_SHARED in the
OpenMPI-1.8.3 release version works on both clusters!
I tested it both with my small shared-memory Fortran test program and with our
Fortran CFD code.
It worked even when allocating 1000 shared data windows containing a total of
40 GB. Very well.

But now I come to the remaining problem:
According to the attached email from Jeff (see below) of 2014-10-24,
we have alternatively installed and tested the bugfixed OpenMPI nightly tarball
of 2014-10-24 (openmpi-dev-176-g9334abc.tar.gz) on Cluster5.
That version worked well when our CFD code was running on only 1 node.
But I now observe that, when running the CFD code on 2 nodes with 2 processes
per node,
after a total of 200 MB of data has been allocated in 20 shared windows, the
allocation of the 21st window fails,
because all 4 processes enter MPI_WIN_ALLOCATE_SHARED but never leave it. The
code hangs in that routine, without any message.

In contrast, that bug does NOT occur with the OpenMPI-1.8.3 release version
with the same program on the same machine.

That means for you:
   In openmpi-dev-176-g9334abc.tar.gz the newly introduced bugfix concerning
the shared memory allocation may not yet be correctly coded,
   or that version contains another new bug in shared memory allocation
compared to the working(!) 1.8.3 release version.

Greetings to you all
  Michael Rachner




-Original Message-
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Jeff Squyres (jsquyres)
Sent: Friday, 24 October 2014 22:45
To: Open MPI User's List
Subject: Re: [OMPI users] Bug in OpenMPI-1.8.3: storage limitation in shared memory allocation (MPI_WIN_ALLOCATE_SHARED) in Ftn-code

Nathan tells me that this may well be related to a fix that was literally just 
pulled into the v1.8 branch today:

https://github.com/open-mpi/ompi-release/pull/56

Would you mind testing any nightly tarball after tonight?  (i.e., the v1.8 
tarballs generated tonight will be the first ones to contain this fix)

http://www.open-mpi.org/nightly/master/



On Oct 24, 2014, at 11:46 AM,   
wrote:

> Dear developers of OPENMPI,
>  
> I am running a small downsized Fortran-testprogram for shared memory 
> allocation (using MPI_WIN_ALLOCATE_SHARED and  MPI_WIN_SHARED_QUERY) )
> on only 1 node   of 2 different Linux-clusters with OPENMPI-1.8.3 and 
> Intel-14.0.4 /Intel-13.0.1, respectively.
>  
> The program simply allocates a sequence of shared data windows, each 
> consisting of 1 integer*4-array.
> None of the windows is freed, so the amount of allocated data  in shared 
> windows raises during the course of the execution.
>  
> That worked well on the 1st cluster (Laki, having 8 procs per node))  
> when allocating even 1000 shared windows each having 5 integer*4 array 
> elements, i.e. a total of  200 MBytes.
> On the 2nd cluster (Cluster5, having 24 procs per node) it also worked on the 
> login node, but it did NOT work on a compute node.
> In that error case, there occurs something like an internal storage limit of 
> ~ 140 MB for the total storage allocated in all shared windows.
> When that limit is reached, all later shared memory allocations fail (but 
> silently).
> So the first attempt to use such a bad shared data window results in a bus 
> error due to the bad storage address encountered.
>  
> That strange behavior could be observed in the small testprogram but also 
> with my large Fortran CFD-code.
> If the error occurs, then it occurs with both codes, and both at a storage 
> limit of  ~140 MB.
> I found that this storage limit depends only weakly on  the number of 
> processes (for np=2,4,8,16,24  it is: 144.4 , 144.0, 141.0, 137.0, 
> 132.2 MB)
>  
> Note that the shared memory storage available on both clusters was very large 
> (many GB of free memory).
>  
> Here is the error message when running with np=2 and an  array 
> dimension of idim_1=5  for the integer*4 array allocated per shared 
> window on the compute node of Cluster5:
> In that case, the error occurred at the 723-th shared window, which is the 
> 1st badly allocated window in that case:
> (722 successfully allocated shared windows * 5 array elements * 4 
> Bytes/el. = 144.4 MB)
>  
>  
> [1,0]: on nodemaster: iwin= 722 :
> [1,0]:  total storage [MByte] alloc. in shared windows so far:   
> 144.4000
> [1,0]: === allocation of shared window no. iw

[OMPI users] OPENMPI-1.8.3: missing fortran bindings for MPI_SIZEOF

2014-11-05 Thread Michael.Rachner
Dear OPENMPI developers,

In OpenMPI-1.8.3 the Fortran bindings for MPI_SIZEOF are missing when using the
mpi module and when using mpif.h.
(I have not checked whether they are present in the mpi_f08 module.)

I get this message from the linker (Intel-14.0.2):
 /home/vat/src/KERNEL/mpi_ini.f90:534: undefined reference to 
`mpi_sizeof0di4_'

So can you add the Fortran bindings for MPI_SIZEOF?
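
For reference, a minimal program that triggers the unresolved symbol might look like this (a sketch; the variable names are mine and not taken from mpi_ini.f90):

  PROGRAM check_mpi_sizeof
    USE mpi
    IMPLICIT NONE
    INTEGER :: i4dummy, nbytes, ierr
    CALL MPI_INIT(ierr)
    ! With a scalar default INTEGER argument this resolves to the specific routine
    ! the linker reports as missing (mpi_sizeof0di4_):
    CALL MPI_SIZEOF(i4dummy, nbytes, ierr)
    CALL MPI_FINALIZE(ierr)
  END PROGRAM check_mpi_sizeof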

Once again I feel that the Fortran bindings are the unloved stepchildren of
C programmers.

Greetings to you all
 Michael Rachner


Re: [OMPI users] Bug in OpenMPI-1.8.3: storage limitation in shared memory allocation (MPI_WIN_ALLOCATE_SHARED) in Ftn-code

2014-11-05 Thread Michael.Rachner
Dear Gilles,

My small downsized Fortran test program for testing the shared memory feature
(MPI_WIN_ALLOCATE_SHARED, MPI_WIN_SHARED_QUERY, C_F_POINTER)
assumes for simplicity that all processes are running on the same node (i.e.
the communicator containing the procs on the same node is just MPI_COMM_WORLD).
So the hang in MPI_WIN_ALLOCATE_SHARED when running on 2 nodes could only be
observed with our large CFD code.

Are the Open MPI developers nevertheless interested in that test program?

Greetings
Michael






-Original Message-
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Gilles Gouaillardet
Sent: Wednesday, 5 November 2014 10:46
To: Open MPI Users
Subject: Re: [OMPI users] Bug in OpenMPI-1.8.3: storage limitation in shared memory allocation (MPI_WIN_ALLOCATE_SHARED) in Ftn-code

Michael,

could you please share your test program so we can investigate it ?

Cheers,

Gilles

On 2014/10/31 18:53, michael.rach...@dlr.de wrote:
> Dear developers of OPENMPI,
>
> There remains a hanging observed in MPI_WIN_ALLOCATE_SHARED.
>
> But first: 
> Thank you for your advices to employ shmem_mmap_relocate_backing_file = 1
> It indeed turned out, that the bad (but silent) allocations  by 
> MPI_WIN_ALLOCATE_SHARED, which I observed in the past after ~140 MB of 
> allocated shared memory, were indeed caused by  a too small available storage 
> for the sharedmem backing files. Applying the MCA parameter resolved the 
> problem.
>
> Now the allocation of shared data windows by  MPI_WIN_ALLOCATE_SHARED in the 
> OPENMPI-1.8.3 release version works on both clusters!
> I tested it both with my small sharedmem-Ftn-testprogram  as well as with our 
> Ftn-CFD-code.
> It worked  even when allocating 1000 shared data windows containing a total 
> of 40 GB.  Very well.
>
> But now I come to the problem remaining:
> According to the attached email of Jeff (see below) of 2014-10-24, we 
> have alternatively installed and tested the bugfixed OPENMPI Nightly Tarball  
> of 2014-10-24  (openmpi-dev-176-g9334abc.tar.gz) on Cluster5 .
> That version worked well, when our CFD-code was running on only 1 node.
> But I observe now, that when running the CFD-code on 2 node with  2 
> processes per node, after having allocated a total of 200 MB of data 
> in 20 shared windows, the allocation of the 21-th window fails, because all 4 
> processes enter MPI_WIN_ALLOCATE_SHARED but never leave it. The code hangs in 
> that routine, without any message.
>
> In contrast, that bug does NOT occur with the  OPENMPI-1.8.3 release version  
>  with same program on same machine.
>
> That means for you:  
>In openmpi-dev-176-g9334abc.tar.gz   the new-introduced  bugfix concerning 
> the shared memory allocation may be not yet correctly coded ,
>or that version contains another new bug in sharedmemory allocation  
> compared to the working(!) 1.8.3-release version.
>
> Greetings to you all
>   Michael Rachner
> 
>
>
>
> -Ursprüngliche Nachricht-
> Von: users [mailto:users-boun...@open-mpi.org] Im Auftrag von Jeff 
> Squyres (jsquyres)
> Gesendet: Freitag, 24. Oktober 2014 22:45
> An: Open MPI User's List
> Betreff: Re: [OMPI users] Bug in OpenMPI-1.8.3: storage limition in 
> shared memory allocation (MPI_WIN_ALLOCATE_SHARED) in Ftn-code
>
> Nathan tells me that this may well be related to a fix that was literally 
> just pulled into the v1.8 branch today:
>
> https://github.com/open-mpi/ompi-release/pull/56
>
> Would you mind testing any nightly tarball after tonight?  (i.e., the 
> v1.8 tarballs generated tonight will be the first ones to contain this 
> fix)
>
> http://www.open-mpi.org/nightly/master/
>
>
>
> On Oct 24, 2014, at 11:46 AM,  
>  wrote:
>
>> Dear developers of OPENMPI,
>>  
>> I am running a small downsized Fortran-testprogram for shared memory 
>> allocation (using MPI_WIN_ALLOCATE_SHARED and  MPI_WIN_SHARED_QUERY) )
>> on only 1 node   of 2 different Linux-clusters with OPENMPI-1.8.3 and 
>> Intel-14.0.4 /Intel-13.0.1, respectively.
>>  
>> The program simply allocates a sequence of shared data windows, each 
>> consisting of 1 integer*4-array.
>> None of the windows is freed, so the amount of allocated data  in shared 
>> windows raises during the course of the execution.
>>  
>> That worked well on the 1st cluster (Laki, having 8 procs per node)) 
>> when allocating even 1000 shared windows each having 5 integer*4 array 
>> elements, i.e. a total of  200 MBytes.
>> On the 2nd cluster (Cluster5, having 24 procs per node) it also worked on 
>> the login node, but it did NOT work on a compute node.
>> In that error case, there occurs something like an internal storage limit of 
>> ~ 140 MB for the total storage allocated in all shared windows.
>> When that limit is reached, all later shared memory allocations fail (but 
>> silently).
>> So the first attempt to use such a bad shared data window results in a bus 
>> error due to th

Re: [OMPI users] Bug in OpenMPI-1.8.3: storage limitation in shared memory allocation (MPI_WIN_ALLOCATE_SHARED) in Ftn-code

2014-11-05 Thread Michael.Rachner


-Original Message-
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of michael.rach...@dlr.de
Sent: Wednesday, 5 November 2014 11:09
To: us...@open-mpi.org
Subject: Re: [OMPI users] Bug in OpenMPI-1.8.3: storage limitation in shared memory allocation (MPI_WIN_ALLOCATE_SHARED) in Ftn-code

Dear Gilles,

My small downsized Ftn-testprogram for testing the shared memory  feature 
(MPI_WIN_ALLOCATE_SHARED,  MPI_WIN_SHARED_QUERY, C_F_POINTER)  presumes for 
simplicity that all processes are running on the same node (i.e. the 
communicator containing the procs on the same node  is just MPI_COMM_WORLD).
So the hanging of MPI_WIN_ALLOCATE_SHARED when running on 2 nodes could only be 
observed with our large CFD-code. 

Are OPENMPI-developers nevertheless interested in that testprogram?

Greetings
Michael






-Original Message-
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Gilles Gouaillardet
Sent: Wednesday, 5 November 2014 10:46
To: Open MPI Users
Subject: Re: [OMPI users] Bug in OpenMPI-1.8.3: storage limitation in shared memory allocation (MPI_WIN_ALLOCATE_SHARED) in Ftn-code

Michael,

could you please share your test program so we can investigate it ?

Cheers,

Gilles

On 2014/10/31 18:53, michael.rach...@dlr.de wrote:
> Dear developers of OPENMPI,
>
> There remains a hanging observed in MPI_WIN_ALLOCATE_SHARED.
>
> But first: 
> Thank you for your advices to employ shmem_mmap_relocate_backing_file = 1
> It indeed turned out, that the bad (but silent) allocations  by 
> MPI_WIN_ALLOCATE_SHARED, which I observed in the past after ~140 MB of 
> allocated shared memory, were indeed caused by  a too small available storage 
> for the sharedmem backing files. Applying the MCA parameter resolved the 
> problem.
>
> Now the allocation of shared data windows by  MPI_WIN_ALLOCATE_SHARED in the 
> OPENMPI-1.8.3 release version works on both clusters!
> I tested it both with my small sharedmem-Ftn-testprogram  as well as with our 
> Ftn-CFD-code.
> It worked  even when allocating 1000 shared data windows containing a total 
> of 40 GB.  Very well.
>
> But now I come to the problem remaining:
> According to the attached email of Jeff (see below) of 2014-10-24, we 
> have alternatively installed and tested the bugfixed OPENMPI Nightly Tarball  
> of 2014-10-24  (openmpi-dev-176-g9334abc.tar.gz) on Cluster5 .
> That version worked well, when our CFD-code was running on only 1 node.
> But I observe now, that when running the CFD-code on 2 node with  2 
> processes per node, after having allocated a total of 200 MB of data 
> in 20 shared windows, the allocation of the 21-th window fails, because all 4 
> processes enter MPI_WIN_ALLOCATE_SHARED but never leave it. The code hangs in 
> that routine, without any message.
>
> In contrast, that bug does NOT occur with the  OPENMPI-1.8.3 release version  
>  with same program on same machine.
>
> That means for you:  
>In openmpi-dev-176-g9334abc.tar.gz   the new-introduced  bugfix concerning 
> the shared memory allocation may be not yet correctly coded ,
>or that version contains another new bug in sharedmemory allocation  
> compared to the working(!) 1.8.3-release version.
>
> Greetings to you all
>   Michael Rachner
> 
>
>
>
> -Ursprüngliche Nachricht-
> Von: users [mailto:users-boun...@open-mpi.org] Im Auftrag von Jeff 
> Squyres (jsquyres)
> Gesendet: Freitag, 24. Oktober 2014 22:45
> An: Open MPI User's List
> Betreff: Re: [OMPI users] Bug in OpenMPI-1.8.3: storage limition in 
> shared memory allocation (MPI_WIN_ALLOCATE_SHARED) in Ftn-code
>
> Nathan tells me that this may well be related to a fix that was literally 
> just pulled into the v1.8 branch today:
>
> https://github.com/open-mpi/ompi-release/pull/56
>
> Would you mind testing any nightly tarball after tonight?  (i.e., the
> v1.8 tarballs generated tonight will be the first ones to contain this
> fix)
>
> http://www.open-mpi.org/nightly/master/
>
>
>
> On Oct 24, 2014, at 11:46 AM,  
>  wrote:
>
>> Dear developers of OPENMPI,
>>  
>> I am running a small downsized Fortran-testprogram for shared memory 
>> allocation (using MPI_WIN_ALLOCATE_SHARED and  MPI_WIN_SHARED_QUERY) )
>> on only 1 node   of 2 different Linux-clusters with OPENMPI-1.8.3 and 
>> Intel-14.0.4 /Intel-13.0.1, respectively.
>>  
>> The program simply allocates a sequence of shared data windows, each 
>> consisting of 1 integer*4-array.
>> None of the windows is freed, so the amount of allocated data  in shared 
>> windows raises during the course of the execution.
>>  
>> That worked well on the 1st cluster (Laki, having 8 procs per node)) 
>> when allocating even 1000 shared windows each having 5 integer*4 array 
>> elements, i.e. a total of  200 MBytes.
>> On the 2nd cluster (Cluster5, having 24 procs per node) it also worked on 
>> the login node, but it did NOT work on a compute node.
>> In that error 

Re: [OMPI users] Bug in OpenMPI-1.8.3: storage limitation in shared memory allocation (MPI_WIN_ALLOCATE_SHARED) in Ftn-code

2014-11-05 Thread Michael.Rachner
Dear Gilles,

Sorry, the source of our CFD-code is not public. I could share the small 
downsized testprogram, not the large CFD-code.
The small testprogram uses the relevant MPI-routines for the shared memory 
allocation in the same manner as is done in the CFD-code.

Greetings
  Michael Rachner


-Original Message-
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Gilles Gouaillardet
Sent: Wednesday, 5 November 2014 11:11
To: Open MPI Users
Subject: Re: [OMPI users] Bug in OpenMPI-1.8.3: storage limitation in shared memory allocation (MPI_WIN_ALLOCATE_SHARED) in Ftn-code

Hi Michael,

The bigger the program, the bigger the fun ;-)

I will have a look at it.

Cheers,

Gilles

On 2014/11/05 19:08, michael.rach...@dlr.de wrote:
> Dear Gilles,
>
> My small downsized Ftn-testprogram for testing the shared memory  
> feature (MPI_WIN_ALLOCATE_SHARED,  MPI_WIN_SHARED_QUERY, C_F_POINTER)  
> presumes for simplicity that all processes are running on the same node (i.e. 
> the communicator containing the procs on the same node  is just 
> MPI_COMM_WORLD).
> So the hanging of MPI_WIN_ALLOCATE_SHARED when running on 2 nodes could only 
> be observed with our large CFD-code. 
>
> Are OPENMPI-developers nevertheless interested in that testprogram?
>
> Greetings
> Michael
>
>
>
>
>
>
> -Ursprüngliche Nachricht-
> Von: users [mailto:users-boun...@open-mpi.org] Im Auftrag von Gilles 
> Gouaillardet
> Gesendet: Mittwoch, 5. November 2014 10:46
> An: Open MPI Users
> Betreff: Re: [OMPI users] Bug in OpenMPI-1.8.3: storage limition in 
> shared memory allocation (MPI_WIN_ALLOCATE_SHARED) in Ftn-code
>
> Michael,
>
> could you please share your test program so we can investigate it ?
>
> Cheers,
>
> Gilles
>
> On 2014/10/31 18:53, michael.rach...@dlr.de wrote:
>> Dear developers of OPENMPI,
>>
>> There remains a hanging observed in MPI_WIN_ALLOCATE_SHARED.
>>
>> But first: 
>> Thank you for your advices to employ shmem_mmap_relocate_backing_file = 1
>> It indeed turned out, that the bad (but silent) allocations  by 
>> MPI_WIN_ALLOCATE_SHARED, which I observed in the past after ~140 MB of 
>> allocated shared memory, were indeed caused by  a too small available 
>> storage for the sharedmem backing files. Applying the MCA parameter resolved 
>> the problem.
>>
>> Now the allocation of shared data windows by  MPI_WIN_ALLOCATE_SHARED in the 
>> OPENMPI-1.8.3 release version works on both clusters!
>> I tested it both with my small sharedmem-Ftn-testprogram  as well as with 
>> our Ftn-CFD-code.
>> It worked  even when allocating 1000 shared data windows containing a total 
>> of 40 GB.  Very well.
>>
>> But now I come to the problem remaining:
>> According to the attached email of Jeff (see below) of 2014-10-24, we 
>> have alternatively installed and tested the bugfixed OPENMPI Nightly Tarball 
>>  of 2014-10-24  (openmpi-dev-176-g9334abc.tar.gz) on Cluster5 .
>> That version worked well, when our CFD-code was running on only 1 node.
>> But I observe now, that when running the CFD-code on 2 node with  2 
>> processes per node, after having allocated a total of 200 MB of data 
>> in 20 shared windows, the allocation of the 21-th window fails, because all 
>> 4 processes enter MPI_WIN_ALLOCATE_SHARED but never leave it. The code hangs 
>> in that routine, without any message.
>>
>> In contrast, that bug does NOT occur with the  OPENMPI-1.8.3 release version 
>>   with same program on same machine.
>>
>> That means for you:  
>>In openmpi-dev-176-g9334abc.tar.gz   the new-introduced  bugfix 
>> concerning the shared memory allocation may be not yet correctly coded ,
>>or that version contains another new bug in sharedmemory allocation  
>> compared to the working(!) 1.8.3-release version.
>>
>> Greetings to you all
>>   Michael Rachner
>> 
>>
>>
>>
>> -Ursprüngliche Nachricht-
>> Von: users [mailto:users-boun...@open-mpi.org] Im Auftrag von Jeff 
>> Squyres (jsquyres)
>> Gesendet: Freitag, 24. Oktober 2014 22:45
>> An: Open MPI User's List
>> Betreff: Re: [OMPI users] Bug in OpenMPI-1.8.3: storage limition in 
>> shared memory allocation (MPI_WIN_ALLOCATE_SHARED) in Ftn-code
>>
>> Nathan tells me that this may well be related to a fix that was literally 
>> just pulled into the v1.8 branch today:
>>
>> https://github.com/open-mpi/ompi-release/pull/56
>>
>> Would you mind testing any nightly tarball after tonight?  (i.e., the
>> v1.8 tarballs generated tonight will be the first ones to contain 
>> this
>> fix)
>>
>> http://www.open-mpi.org/nightly/master/
>>
>>
>>
>> On Oct 24, 2014, at 11:46 AM,  
>>  wrote:
>>
>>> Dear developers of OPENMPI,
>>>  
>>> I am running a small downsized Fortran-testprogram for shared memory 
>>> allocation (using MPI_WIN_ALLOCATE_SHARED and  MPI_WIN_SHARED_QUERY) )
>>> on only 1 node   of 2 different Linux-clusters with OPENMPI-1.8.3 and 
>>> Intel-14.0.4 /Intel-13.0.1, respectively.
>>>  
>>> The prog

Re: [OMPI users] OPENMPI-1.8.3: missing fortran bindings for MPI_SIZEOF

2014-11-05 Thread Michael.Rachner
Sorry, Gilles, you might be wrong:

The error also occurs with gfortran-4.9.1 when building my small shared memory
test program.

This is the linker's message with gfortran-4.9.1:
 sharedmemtest.f90:(.text+0x1145): undefined reference to `mpi_sizeof0di4_'

and this is the message with Intel-14.0.4:
sharedmemtest.f90:(.text+0x11c3): undefined reference to `mpi_sizeof0di4_'
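
Just to be sure, the compiler an Open MPI installation was built with can also be checked directly, e.g.:

  ompi_info | grep -i compiler    # shows the C and Fortran compilers used to build Open MPI
  mpif90 -show                    # shows the underlying compiler called by the wrapper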


If Open MPI actually shipped a module file mpi.mod that was precompiled by
Open MPI for one particular Fortran compiler,
then the whole installation of Open MPI on a user machine from the
Open MPI source code for a user-chosen Fortran compiler would be a farce.
The module file mpi.mod must either be generated during the installation
of Open MPI on the user's machine for the user-chosen Fortran compiler,
or alternatively Open MPI must provide the module not as an mpi.mod file but as an
mpi.f90 file. MS-MPI does it that way.
In my opinion, providing an mpi.f90 file is indeed better than providing an
mpi.mod file, because the user can look inside the module
and see directly whether something is missing or possibly wrongly coded.

Greetings 
  Michael Rachner


-Original Message-
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Gilles Gouaillardet
Sent: Wednesday, 5 November 2014 11:33
To: Open MPI Users
Subject: Re: [OMPI users] OPENMPI-1.8.3: missing fortran bindings for MPI_SIZEOF

Michael,

the root cause is that openmpi was not compiled with the Intel compilers but with the GNU
compiler.
Fortran modules are not binary compatible, so openmpi and your application must
be compiled with the same compiler.

Cheers,

Gilles

On 2014/11/05 18:25, michael.rach...@dlr.de wrote:
> Dear OPENMPI developers,
>
> In OPENMPI-1.8.3 the Ftn-bindings for  MPI_SIZEOF  are missing, when using 
> the mpi-module and when using mpif.h .
> (I have not controlled, whether they are present in the mpi_08 
> module.)
>
> I get this message from the linker (Intel-14.0.2):
>  /home/vat/src/KERNEL/mpi_ini.f90:534: undefined reference to 
> `mpi_sizeof0di4_'
>
> So can you add  the Ftn-bindings for MPI_SIZEOF?
>
> Once again I feel, that Fortran-bindings are unloved step-children for 
> C-programmers. 
>
> Greetings to you all
>  Michael Rachner


Re: [OMPI users] Bug in OpenMPI-1.8.3: storage limitation in shared memory allocation (MPI_WIN_ALLOCATE_SHARED) in Ftn-code

2014-11-05 Thread Michael.Rachner
Dear Mr. Squyres, 
  Dear MPI-Users and MPI-developers,

Here is my small MPI-3 shared memory Fortran 95 test program.

I am glad that it can be used in your test suite, because this will help to
keep the shared-memory feature working in future Open MPI releases.

Moreover, it can help any MPI user (whichever MPI implementation they use)
who intends to benefit from the storage and CPU-time savings of the MPI-3
shared memory feature in their Fortran code development.

Namely, my disappointing experience in the past was that there was no example
of how to make the MPI-3 shared memory feature work in a Fortran code.
It is trickier to do than in a C code, and the explanations in the MPI-3.0
standard document (Sept. 21, 2012) are not sufficient in this regard.
So we tried it, and finally we succeeded.
The outcome is comprised in this small example code sharedmemtest.f90.
In other words: this code sharedmemtest.f90 may serve as a coding recipe
for (hopefully) many other Fortran application programmers.

Greetings , and God bless you all
  Michael Rachner, DLR



-Original Message-
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Jeff Squyres (jsquyres)
Sent: Wednesday, 5 November 2014 13:49
To: Open MPI User's List
Subject: Re: [OMPI users] Bug in OpenMPI-1.8.3: storage limitation in shared memory allocation (MPI_WIN_ALLOCATE_SHARED) in Ftn-code

Yes, I would love to have a copy of that test program, if you could share it.  
I'll add it to our internal test suite.


On Nov 5, 2014, at 5:08 AM,   
wrote:

> Dear Gilles,
> 
> My small downsized Ftn-testprogram for testing the shared memory  
> feature (MPI_WIN_ALLOCATE_SHARED,  MPI_WIN_SHARED_QUERY, C_F_POINTER) 
> presumes for simplicity that all processes are running on the same node (i.e. 
> the communicator containing the procs on the same node  is just 
> MPI_COMM_WORLD).
> So the hanging of MPI_WIN_ALLOCATE_SHARED when running on 2 nodes could only 
> be observed with our large CFD-code. 
> 
> Are OPENMPI-developers nevertheless interested in that testprogram?
> 
> Greetings
> Michael
> 
> 
> 
> 
> 
> 
> -----Original Message-----
> From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Gilles 
> Gouaillardet
> Sent: Wednesday, November 5, 2014 10:46
> To: Open MPI Users
> Subject: Re: [OMPI users] Bug in OpenMPI-1.8.3: storage limitation in 
> shared memory allocation (MPI_WIN_ALLOCATE_SHARED) in Ftn-code
> 
> Michael,
> 
> could you please share your test program so we can investigate it ?
> 
> Cheers,
> 
> Gilles
> 
> On 2014/10/31 18:53, michael.rach...@dlr.de wrote:
>> Dear developers of OPENMPI,
>> 
>> There remains a hanging observed in MPI_WIN_ALLOCATE_SHARED.
>> 
>> But first: 
>> Thank you for your advice to employ shmem_mmap_relocate_backing_file = 1
>> It turned out that the bad (but silent) allocations by 
>> MPI_WIN_ALLOCATE_SHARED, which I observed in the past after ~140 MB of 
>> allocated shared memory, were indeed caused by too little available 
>> storage for the sharedmem backing files. Applying the MCA parameter resolved 
>> the problem.
>> 
>> Now the allocation of shared data windows by  MPI_WIN_ALLOCATE_SHARED in the 
>> OPENMPI-1.8.3 release version works on both clusters!
>> I tested it both with my small sharedmem-Ftn-testprogram  as well as with 
>> our Ftn-CFD-code.
>> It worked  even when allocating 1000 shared data windows containing a total 
>> of 40 GB.  Very well.
>> 
>> But now I come to the problem remaining:
>> According to the attached email of Jeff (see below) of 2014-10-24, we 
>> have alternatively installed and tested the bugfixed OPENMPI Nightly Tarball 
>>  of 2014-10-24  (openmpi-dev-176-g9334abc.tar.gz) on Cluster5 .
>> That version worked well when our CFD-code was running on only 1 node.
>> But I observe now that, when running the CFD-code on 2 nodes with 2 
>> processes per node, after having allocated a total of 200 MB of data 
>> in 20 shared windows, the allocation of the 21st window fails, because all 
>> 4 processes enter MPI_WIN_ALLOCATE_SHARED but never leave it. The code hangs 
>> in that routine, without any message.
>> 
>> In contrast, that bug does NOT occur with the  OPENMPI-1.8.3 release version 
>>   with same program on same machine.
>> 
>> That means for you:  
>>   In openmpi-dev-176-g9334abc.tar.gz the newly introduced bugfix concerning 
>> the shared memory allocation may not yet be correctly coded,
>>   or that version contains another new bug in shared memory allocation 
>> compared to the working(!) 1.8.3 release version.
>> 
>> Greetings to you all
>>  Michael Rachner
>> 
>> 
>> 
>> 
>> -----Original Message-----
>> From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Jeff 
>> Squyres (jsquyres)
>> Sent: Friday, October 24, 2014 22:45
>> To: Open MPI User's List
>> Subject: Re: [OMPI users] Bug in OpenMPI-1.8.3: storage limitation in 
>> shared memory allocation (MPI_WIN_ALLOCATE_SHARED) in Ftn-code

Re: [OMPI users] OPENMPI-1.8.3: missing fortran bindings for MPI_SIZEOF

2014-11-05 Thread Michael.Rachner
Dear Mr. Squyres,

In my   sharedmemtest.f90  coding   just sent to you,
I have added a call of MPI_SIZEOF (at present it is deactivated, because of the 
missing Ftn-binding in OPENMPI-1.8.3).
I suggest that you activate the 2 respective statements in the coding,
and use the program yourself to test whether MPI_SIZEOF now works in the 
upcoming 1.8.4 version.
For me, the installation of a tarball version is not as easy to do as it is for 
you, and the problem with the missing Ftn-bindings is not limited to a particular 
machine.
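For illustration, the kind of call in question looks like this (a minimal sketch,
not the coding of sharedmemtest.f90):

   program sizeof_sketch
      use mpi
      implicit none
      integer :: ierr, isize
      integer :: ivar                      ! default INTEGER
      call MPI_INIT( ierr )
      call MPI_SIZEOF( ivar, isize, ierr ) ! with Intel this resolves to the symbol mpi_sizeof0di4_
      print *, 'MPI_SIZEOF of a default INTEGER [bytes]: ', isize
      call MPI_FINALIZE( ierr )
   end program sizeof_sketch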

Can you tell me in which OPENMPI version the bug will be fixed?

To generalize the problem with the Ftn-bindings:
   I think OPENMPI development should go the whole hog
   and check whether the Ftn-bindings exist for all MPI-routines.
   This is not so much a complicated task as a somewhat time-consuming one.
   But otherwise, over a long time, more or less angry users will write emails about 
missing Ftn-bindings and grumble about "that buggy OPENMPI".
   And you will have to write the answers on and on... .
   This will finally take more time for developers and users than doing that 
work now once and for all.

Thank You
   Michael Rachner



-----Original Message-----
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Jeff Squyres 
(jsquyres)
Sent: Wednesday, November 5, 2014 13:40
To: Open MPI User's List
Subject: Re: [OMPI users] OPENMPI-1.8.3: missing fortran bindings for MPI_SIZEOF

Yes, this is a correct report.

In short, the MPI_SIZEOF situation before the upcoming 1.8.4 was a bit of a 
mess; it actually triggered a bunch of discussion up in the MPI Forum Fortran 
working group (because the design of MPI_SIZEOF actually has some unintended 
consequences that came to light when another OMPI user noted the same thing you 
did a few months ago).

Can you download a 1.8.4 nightly tarball (or the rc) and see if MPI_SIZEOF is 
working for you there?



On Nov 5, 2014, at 6:24 AM,   
wrote:

> Sorry, Gilles, you might be wrong:
> 
> The error occurs also with gfortran-4.9.1, when running my small shared 
> memory testprogram:
> 
> This is the answer of the linker with gfortran-4.9.1 :  
> sharedmemtest.f90:(.text+0x1145): undefined reference to `mpi_sizeof0di4_'
> 
> and this is the answer with intel-14.0.4:
>sharedmemtest.f90:(.text+0x11c3): undefined reference to `mpi_sizeof0di4_'
> 
> 
> If openmpi  actually provides a module file   mpi.mod,  that was  precompiled 
> by openmpi for a certain Fortran compiler,
> then the whole installation of openmpi on a User machine from the 
> openmpi-sourcecode for a User chosen Ftn-compiler would be a farce.
> The module file  mpi.mod  must be either generated during the 
> installation process of openmpi on the User-machine for the User chosen 
> Ftn-compiler, or alternatively Openmpi must provide the module not by a  
> mpi.mod file,  but a mpi.f90 file.  MS-MPI does it that way.
> In my opinion, providing a  mpi.f90  file is indeed  better than 
> providing an  mpi.mod file, because the User can look inside the module and 
> can directly see, if something is missing or possibly wrongly coded.
> 
> Greetings
>  Michael Rachner
> 
> 
> -----Original Message-----
> From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Gilles 
> Gouaillardet
> Sent: Wednesday, November 5, 2014 11:33
> To: Open MPI Users
> Subject: Re: [OMPI users] OPENMPI-1.8.3: missing fortran bindings for 
> MPI_SIZEOF
> 
> Michael,
> 
> the root cause is openmpi was not compiled with the intel compilers but the 
> gnu compiler.
> fortran modules are not binary compatible so openmpi and your application 
> must be compiled with the same compiler.
> 
> Cheers,
> 
> Gilles
> 
> On 2014/11/05 18:25, michael.rach...@dlr.de wrote:
>> Dear OPENMPI developers,
>> 
>> In OPENMPI-1.8.3 the Ftn-bindings for  MPI_SIZEOF  are missing, when using 
>> the mpi-module and when using mpif.h .
>> (I have not checked whether they are present in the mpi_f08
>> module.)
>> 
>> I get this message from the linker (Intel-14.0.2):
>> /home/vat/src/KERNEL/mpi_ini.f90:534: undefined reference to 
>> `mpi_sizeof0di4_'
>> 
>> So can you add  the Ftn-bindings for MPI_SIZEOF?
>> 
>> Once again I feel, that Fortran-bindings are unloved step-children for 
>> C-programmers. 
>> 
>> Greetings to you all
>> Michael Rachner
>> ___
>> users mailing list
>> us...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/users/2014/11/25676.php
> 
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2014/11/25682.php
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman

Re: [OMPI users] OPENMPI-1.8.3: missing fortran bindings for MPI_SIZEOF

2014-11-06 Thread Michael.Rachner
Dear Mr. Squyres,

a) When looking in your  mpi_sizeof_mpifh.f90  test program I found a little 
thing:  you may (but need not) change the name of the integer variable  size  
to e.g.  isize , because  size  is an intrinsic function in 
Fortran (you can see it immediately if you have an editor with 
Fortran highlighting).
   Although your type declaration overrides the intrinsic function, a renaming 
would make the coding unambiguous. 

b)  My idea was that OPENMPI should always provide a declaration (interface) 
for each MPI-routine
(and that is what the MPI-3.0 Standard document (Sept. 21, 2012) prescribes 
(p. 599+601+603)),
 independent of whether you already have a test program in your suite for an 
MPI-routine or not.
 Because:  if all the interfaces are present, you will a priori avoid 
"2-step" user messages: 
   first the user complains about a missing MPI-routine, 
and when the MPI-routine is made available, possibly later about a bug in that 
MPI-routine.
   So bugs in MPI-routines will be detected and removed 
faster in the course of the OPENMPI development. Good for all.
(A sketch of what such an explicit interface looks like follows below.)
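Such an explicit Ftn-binding in the mpi module is essentially a generic interface
block of the following form (an illustrative sketch with hypothetical specific
names, not OPENMPI's actual source; the real module needs one specific per type,
kind and rank, and kind=4 / kind=8 are compiler-dependent kind values):

   interface MPI_SIZEOF
      subroutine MPI_Sizeof_int4( x, size, ierror )
         integer(kind=4), intent(in)  :: x
         integer,         intent(out) :: size, ierror
      end subroutine MPI_Sizeof_int4
      subroutine MPI_Sizeof_real8( x, size, ierror )
         real(kind=8),    intent(in)  :: x
         integer,         intent(out) :: size, ierror
      end subroutine MPI_Sizeof_real8
      ! ... further specifics for the other types, kinds and ranks ...
   end interface MPI_SIZEOF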

Greetings 
 Michael Rachner 





-----Original Message-----
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Jeff Squyres 
(jsquyres)
Sent: Wednesday, November 5, 2014 16:48
To: Open MPI User's List
Subject: Re: [OMPI users] OPENMPI-1.8.3: missing fortran bindings for MPI_SIZEOF

Meh.  I forgot to attach the test.  :-)

Here it is.

On Nov 5, 2014, at 10:46 AM, Jeff Squyres (jsquyres)  wrote:

> On Nov 5, 2014, at 9:59 AM,   
> wrote:
> 
>> In my   sharedmemtest.f90  coding   just sent to you,
>> I have added a call of MPI_SIZEOF (at present it is deactivated, because of 
>> the missing Ftn-binding in OPENMPI-1.8.3).
> 
> FWIW, I attached one of the tests I put in our test suite for SIZEOF issues 
> after the last bug was found.  I have that same test replicated essentially 
> three times:
> 
> - once for mpif.h
> - once for "use mpi"
> - once for "use mpi_f08"
> 
>> I suggest, that you may activate the 2 respective statements in the 
>> coding , and use yourself the program for testing whether MPI_SIZEOF works 
>> now in the upcoming 1.8.4-version.
>> For me, the installation of a tarball version is not so easy to do as 
>> for you, and the problem with the missing Ftn-bindings is not limited to a 
>> special machine.
> 
> Right; it was a larger problem.
> 
>> Can you tell me, from which OPENMPI-version on  the bug will be removed?
> 
> 1.8.4 will contain the fix.
> 
>> To generalize the problem with the Ftn-bindings:
>>  I think OPENMPI-development should go the whole hog,  and check, 
>> whether for all MPI-routines the Ftn-bindings exist.
>> This not so much a complicated task, but a somewhat time-consuming task.
>> But otherwise, over a long time more or less angry Users will write emails 
>> on missing FTN-bindings and grumble on "that buggy OPENMPI".
>> And you will have to write the answers on and on... .
>> This will finally take more time for developers and users then doing that 
>> work now once-for-all.
> 
> We do have a bunch of fortran tests, but I admit that our coverage is 
> not complete.  SIZEOF was not tested at all, for example, until 
> recently.  :-(
> 
> SIZEOF is also a bit of a special case in the MPI API because it *must* be 
> polymorphic (I don't think any other MPI API is) -- even for mpif.h.
> 
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2014/11/25689.php


--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/


Re: [OMPI users] OPENMPI-1.8.3: missing fortran bindings for MPI_SIZEOF

2014-11-06 Thread Michael.Rachner
Dear Mr. Squyres,

I fully agree with omitting the explicit interfaces from  mpif.h . It is an 
important fallback for legacy codes.
But in the mpi and mpi_f08 modules explicit interfaces are required for 
all(!) MPI-routines.
So far, this is not fulfilled in the MPI versions I know. 
I want to point out here that this has a negative consequence for the 
Ftn-coding:
  'If someone uses the mpi (or mpi_f08) module, then he cannot put the name of 
an MPI-routine in the "only"-list of the use-stmt for that module.'

I explain that now: 
   The following stmt is an example of a desirable stmt, because the programmer 
sees at a glance which quantities are used from this module in his subroutine, 
and it limits the quantities from the mpi module to only those actually 
needed in the subroutine:

      use MPI, only:  MPI_COMM_WORLD, MPI_IN_PLACE, MPI_REDUCE

   However, this stmt will work only if the explicit interface for MPI_REDUCE 
is actually present in the mpi module.
   Unfortunately the explicit interfaces are not complete in the MPI 
distributions I know, so the programmer must use instead either:

   a)  use MPI, only:  MPI_COMM_WORLD, MPI_IN_PLACE
       This has the drawback that the implicit interface for MPI_REDUCE will 
       always be used, i.e. there is no control of the parameter list by the 
       explicit interface, even if an explicit interface exists in the mpi module,

   or b)  use MPI
       Here the explicit interface will be used if present in the module, 
       otherwise the implicit interface will be used. This is o.k., but the 
       drawback is now that the whole MPI world is (silently) present in the 
       subroutine, and the programmer cannot see at a glance what quantities 
       are really used from the module in the sbr.
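As a compact illustration, a hypothetical subroutine in the desired style would
look as follows (a sketch only; it compiles only if the mpi module really exports
an explicit interface for MPI_REDUCE):

   subroutine global_max( rbuffarr, rmaxarr, nelem )
      use MPI, only:  MPI_COMM_WORLD, MPI_REDUCE, MPI_REAL8, MPI_MAX
      implicit none
      integer,      intent(in)  :: nelem
      real(kind=8), intent(in)  :: rbuffarr(nelem)
      real(kind=8), intent(out) :: rmaxarr(nelem)    ! significant on rank 0 only
      integer :: ierr_mpi
      ! with the explicit interface in scope, a wrong argument list is
      ! rejected at compile time instead of failing at run time
      call MPI_REDUCE( rbuffarr, rmaxarr, nelem, MPI_REAL8, MPI_MAX, &
                       0, MPI_COMM_WORLD, ierr_mpi )
   end subroutine global_max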
  
Greetings 
  Michael Rachner


-----Original Message-----
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Jeff Squyres 
(jsquyres)
Sent: Thursday, November 6, 2014 12:42
To: Open MPI User's List
Subject: Re: [OMPI users] OPENMPI-1.8.3: missing fortran bindings for MPI_SIZEOF

On Nov 6, 2014, at 5:37 AM,   
wrote:

> a) When looking in your  mpi_sizeof_mpifh.f90  test program I found a little 
> thing:  You may (but need not) change the name of the integer variable  size
>to e.g.   isize  , because   size   is just an intrinsic function in 
> Fortran (you may see it already, if you have an editor with 
> Fortran-highlighting).
>   Although your type declaration overrides the intrinsic function, a renaming 
> would make the coding unambiguous. 

Good catch.  I'll do that.

> b)  My idea was, that OPENMPI should provide always an declaration 
> (interface) for each MPI-routine
>(and that's what the MPI-3.0 Standard document (Sept.21, 2012) prescribes 
> (p. 599+601+603)),

Note that MPI-3 p603 says (about mpif.h):

"For each MPI routine, an implementation can choose to use an implicit or 
explicit interface..."

I.e., it is *not* mandated that MPI implementations have explicit interfaces 
for mpif.h (although, obviously, it *is* mandated for the mpi and mpi_f08 
modules).

There are several reasons why MPI implementations have not added explicit 
interfaces to their mpif.h files, mostly boiling down to: they may/will break 
real world MPI programs.

1. All modern compilers have ignore-TKR syntax, so it's at least not a problem 
for subroutines like MPI_SEND (with a choice buffer).  However: a) this was not 
true at the time when MPI-3 was written, and b) it's not standard fortran.

2. There are (very) likely real-world programs out there that aren't quite 
right (i.e., would fail to compile with explicit interfaces), but still work 
fine.  On the one hand, it's terrible that we implementers continue to allow 
people to run "incorrect" programs.  But on the other hand, users *have* very 
simple option to run their codes through explicit interfaces (the mpi module), 
and can do so if they choose to.  Hence, the MPI Forum has decided that 
backwards compatibility is important enough for legacy codes -- some of which 
are tens of thousands of lines long (and more!), and there are no maintainers 
for them any more (!) -- to allow the "good enough" to keep going.

3. But #1 and #2 are mostly trumped by: the goal is to deprecate mpif.h, anyway 
(perhaps in MPI-4?) -- so why bother spending any more time on it than we have 
to?  Ultimately, we'd

Re: [OMPI users] OPENMPI-1.8.3: missing fortran bindings for MPI_SIZEOF

2014-11-06 Thread Michael.Rachner
Dear Mr. Squyres,

Thank you for your clear answer on the state of the interfaces in the mpi 
modules of OPENMPI.  A good state!
And I have coded enough bugs myself, so I do not become too angry about 
the bugs of others.
If I should stumble upon missing Ftn-bindings in the future, I will send you a 
hint.

Greetings to you all!
 Michael Rachner


-----Original Message-----
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Jeff Squyres 
(jsquyres)
Sent: Thursday, November 6, 2014 15:10
To: Open MPI User's List
Subject: Re: [OMPI users] OPENMPI-1.8.3: missing fortran bindings for MPI_SIZEOF

On Nov 6, 2014, at 8:55 AM,   
wrote:

> I agree fully with omitting the explicit interfaces from  mpif.h   . It is an 
> important  resort for legacy codes.
> But, in the mpi and mpi_f08 module  explicit interfaces are required for  
> all(!)  MPI-routines.
> So far, this is not fulfilled in MPI-versions I know. 

Bugs happen.

I think you're saying that we don't intend to have all the routines in the mpi 
and mpi_f08 modules.  That's not correct.  We *do* have all explicit MPI 
interfaces in the mpi and mpi_f08 modules.  If some are missing -- like 
WIN_ALLOCATE was just discovered to be missing in the 1.8.3 release -- those 
are bugs.  We try really hard to avoid bugs, but sometimes they happen.  :-(

Are you aware of other routines that are missing from the OMPI mpi / mpi_f08 
modules?

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

___
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: 
http://www.open-mpi.org/community/lists/users/2014/11/25700.php


Re: [OMPI users] Fortran and OpenMPI 1.8.3 compiled with Intel-15 does nothing silently

2014-11-18 Thread Michael.Rachner
It may possibly be a bug in Intel-15.0.
I suspect it has to do with the contains-block and with the fact that you 
call an intrinsic sbr in that contains-block.
Normally this must work. You may try to separate the two influences:

What happens with these 3 variants of your code:

variant a):   using an own sbr instead of the intrinsic sbr

program fred
use mpi
integer :: ierr
call mpi_init(ierr)
print *,"hello"
call mpi_finalize(ierr)
contains
  subroutine sub
 real :: a(10)
 call mydummy_random_number(a)
   end subroutine sub
   subroutine mydummy_random_number(a)
 real :: a(10)
 print *,'---I am in sbr mydummy_random_number'
   end subroutine mydummy_random_number
end program fred


variant b):   removing the  contains-block

program fred
use mpi
integer :: ierr
call mpi_init(ierr)
print *,"hello"
call mpi_finalize(ierr)
end program fred
!
subroutine sub
real :: a(10)
call random_number(a)
end subroutine sub


variant c): moving the contains-block into a module

module MYMODULE
contains
  subroutine sub
real :: a(10)
call random_number(a)
   end subroutine sub
end module MYMODULE
!
program fred
use MYMODULE
use mpi
integer :: ierr
call mpi_init(ierr)
print *,"hello"
call mpi_finalize(ierr)
end program fred


Greetings
Michael Rachner



From: users [mailto:users-boun...@open-mpi.org] On Behalf Of John Bray
Sent: Tuesday, November 18, 2014 10:10
To: Open MPI Users
Subject: Re: [OMPI users] Fortran and OpenMPI 1.8.3 compiled with Intel-15 does 
nothing silently

A delightful bug this, you get a segfault if you code contains a random_number 
call and is compiled with -fopenmp, EVEN IF YOU CANNOT CALL IT!

program fred
use mpi
integer :: ierr
call mpi_init(ierr)
print *,"hello"
call mpi_finalize(ierr)
contains
  subroutine sub
real :: a(10)
call random_number(a)
   end subroutine sub
end program fred
The segfault is nothing to do with OpenMPI, but there remains a mystery as to 
why I only get the segfault error messages on lower node counts.

mpif90 -O0 -fopenmp ./fred.f90
mpiexec -n 6 ./a.out
--
mpiexec noticed that process rank 4 with PID 28402 on node mic2 exited on 
signal 11 (Segmentation fault).
--
jbray@mic2:intel-15_openmpi-1.8.3% 
mpiexec -n 12 ./a.out

It was the silence that made me raise the issue here. I am running on a 12 
physical core hyperthreaded Xeon Phi. Is there something in OpenMPI that is 
suppressing the messages, as I am getting 4/5 core files each time.
John

On 18 November 2014 04:24, Ralph Castain 
mailto:r...@open-mpi.org>> wrote:
Just checked the head of the 1.8 branch (soon to be released as 1.8.4), and 
confirmed the same results. I know the thread-multiple option is still broken 
there, but will test that once we get the final fix committed.


On Mon, Nov 17, 2014 at 7:29 PM, Ralph Castain 
mailto:r...@open-mpi.org>> wrote:
FWIW: I don't have access to a Linux box right now, but I built the OMPI devel 
master on my Mac using Intel 2015 compilers and was able to build/run all of 
the Fortran examples in our "examples" directory.

I suspect the problem here is your use of the --enable-mpi-thread-multiple 
option. The 1.8 series had an issue with that option - we are in the process of 
fixing it (I'm waiting for an updated patch), and you might be hitting it.

If you remove that configure option, do things then work?
Ralph


On Mon, Nov 17, 2014 at 5:56 PM, Gilles Gouaillardet 
mailto:gilles.gouaillar...@iferc.org>> wrote:
Hi John,

do you MPI_Init() or do you MPI_Init_thread(MPI_THREAD_MULTIPLE) ?

does your program call MPI anywhere from an OpenMP region ?
does your program call MPI only within an !$OMP MASTER section ?
or does your program not invoke MPI at all from any OpenMP region ?

can you reproduce this issue with a simple fortran program ? or can you publish 
all your files ?

Cheers,

Gilles


On 2014/11/18 1:41, John Bray wrote:

I have successfully been using OpenMPI 1.8.3 compiled with Intel-14, using

./configure --prefix=/usr/local/mpi/$(basename $PWD) --with-threads=posix
--enable-mpi-thread-multiple --disable-vt --with-scif=no

I have now switched to Intel 15.0.1, and configuring with the same options,
I get minor changes in config.log about warning spotting, but it makes all
the binaries, and I can compile my own fortran code with mpif90/mpicc

but a command 'mpiexec --verbose -n 12 ./fortran_binary' does nothing

I checked the FAQ and started using

./configure --prefix=/usr/local/mpi/$(basename $PWD) --with-threads=posix
--enable-mpi-thread-multiple --disable-vt --with-scif=no CC=icc CXX=icpc
F77=ifort FC=ifort

but that makes no difference.

Only with -d do I get any more information

mpirun -d --verbose -n 12
/home/jbray/5.0/mic2/one/intel-15_openmpi-1.8.3/one_f_debug.exe
[mic2:21851] pr

Re: [OMPI users] Fortran and OpenMPI 1.8.3 compiled with Intel-15 does nothing silently

2014-11-18 Thread Michael.Rachner
Tip:  Intel Fortran compiler problems can be reported to Intel here:

https://software.intel.com/en-us/forums/intel-fortran-compiler-for-linux-and-mac-os-x

Greetings
Michael Rachner

From: users [mailto:users-boun...@open-mpi.org] On Behalf Of John Bray
Sent: Tuesday, November 18, 2014 11:03
To: Open MPI Users
Subject: Re: [OMPI users] Fortran and OpenMPI 1.8.3 compiled with Intel-15 does 
nothing silently

The original problem used a separate file and not a module. It's clearly a 
bizarre Intel bug; I am only continuing to pursue it here as I'm curious as to 
why the segfault messages disappear at higher process counts.
John

On 18 November 2014 09:58, 
mailto:michael.rach...@dlr.de>> wrote:
It may be possibly a bug in Intel-15.0 .
I suspect it has to do with the   contains-block   and with the fact, that you 
call an intrinsic sbr in that contains-block.
Normally this must work. You may try to separate the influence of both:

What happens with these 3 variants of your code:

variant a):   using an own sbr instead of the intrinsic sbr

program fred
use mpi
integer :: ierr
call mpi_init(ierr)
print *,"hello"
call mpi_finalize(ierr)
contains
  subroutine sub
 real :: a(10)
 call mydummy_random_number(a)
   end subroutine sub
   subroutine mydummy_random_number(a)
 real :: a(10)
 print *,'---I am in sbr mydummy_random_number'
   end subroutine mydummy_random_number
end program fred


variant b):   removing the  contains-block

program fred
use mpi
integer :: ierr
call mpi_init(ierr)
print *,"hello"
call mpi_finalize(ierr)
end program fred
!
subroutine sub
real :: a(10)
call random_number(a)
end subroutine sub

variant c): moving the contains-block into a module

module MYMODULE
contains
  subroutine sub
real :: a(10)
call random_number(a)
   end subroutine sub
end module MYMODULE
!
program fred
use MYMODULE
use mpi
integer :: ierr
call mpi_init(ierr)
print *,"hello"
call mpi_finalize(ierr)
end program fred


Greetings
Michael Rachner



From: users [mailto:users-boun...@open-mpi.org] On Behalf Of John Bray
Sent: Tuesday, November 18, 2014 10:10
To: Open MPI Users
Subject: Re: [OMPI users] Fortran and OpenMPI 1.8.3 compiled with Intel-15 does 
nothing silently

A delightful bug this, you get a segfault if you code contains a random_number 
call and is compiled with -fopenmp, EVEN IF YOU CANNOT CALL IT!

program fred
use mpi
integer :: ierr
call mpi_init(ierr)
print *,"hello"
call mpi_finalize(ierr)
contains
  subroutine sub
real :: a(10)
call random_number(a)
   end subroutine sub
end program fred
The segfault is nothing to do with OpenMPI, but there remains a mystery as to 
why I only get the segfault error messages on lower node counts.

mpif90 -O0 -fopenmp ./fred.f90
mpiexec -n 6 ./a.out
--
mpiexec noticed that process rank 4 with PID 28402 on node mic2 exited on 
signal 11 (Segmentation fault).
--
jbray@mic2:intel-15_openmpi-1.8.3% 
mpiexec -n 12 ./a.out

It was the silence that made me raise the issue here. I am running on a 12 
physical core hyperthreaded Xeon Phi. Is there something in OpenMPI that is 
suppressing the messages, as I am getting 4/5 core files each time.
John

On 18 November 2014 04:24, Ralph Castain 
mailto:r...@open-mpi.org>> wrote:
Just checked the head of the 1.8 branch (soon to be released as 1.8.4), and 
confirmed the same results. I know the thread-multiple option is still broken 
there, but will test that once we get the final fix committed.


On Mon, Nov 17, 2014 at 7:29 PM, Ralph Castain 
mailto:r...@open-mpi.org>> wrote:
FWIW: I don't have access to a Linux box right now, but I built the OMPI devel 
master on my Mac using Intel 2015 compilers and was able to build/run all of 
the Fortran examples in our "examples" directory.

I suspect the problem here is your use of the --enable-mpi-thread-multiple 
option. The 1.8 series had an issue with that option - we are in the process of 
fixing it (I'm waiting for an updated patch), and you might be hitting it.

If you remove that configure option, do things then work?
Ralph


On Mon, Nov 17, 2014 at 5:56 PM, Gilles Gouaillardet 
mailto:gilles.gouaillar...@iferc.org>> wrote:
Hi John,

do you MPI_Init() or do you MPI_Init_thread(MPI_THREAD_MULTIPLE) ?

does your program calls MPI anywhere from an OpenMP region ?
does your program calls MPI only within an !$OMP MASTER section ?
does your program does not invoke MPI at all from any OpenMP region ?

can you reproduce this issue with a simple fortran program ? or can you publish 
all your files ?

Cheers,

Gilles


On 2014/11/18 1:41, John Bray wrote:

I have successfully been using OpenMPI 1.8.3 compiled with Intel-14, using

./configure --prefix=/usr/local/mpi/$(basename $PWD) --with-threads=posix
--enable-mp

Re: [OMPI users] Open MPI SC'14 BOF slides: mpif.h --> module mpi

2014-11-21 Thread Michael.Rachner
Dear community,

Slide 92 of the Open MPI SC'14 slides describes the simple migration from   
mpif.h   to   use mpi   in a Fortran application code.

However, the description is not correct.
In a Fortran routine, the use-stmts (if there are any) must come before (!) any 
other stmts,
i.e. you cannot place the   implicit none   before the   use mpi.

Correct is only this:

   subroutine foo          -->   subroutine foo
     include 'mpif.h'               use mpi
     implicit none                  implicit none
     integer  a, ...                integer  a, ...


However (for the developers of the mpi-module), you can (and should!!) employ 
the   implicit none   stmt inside the mpi-module itself:

module mpi
implicit none
integer MPI_...
contains
...
end module mpi


Greetings
Michael Rachner 



-----Original Message-----
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Jeff Squyres 
(jsquyres)
Sent: Thursday, November 20, 2014 16:48
To: Open MPI User's List; Open MPI Developers List
Subject: [OMPI users] Open MPI SC'14 BOF slides

For those of you who weren't able to be at the SC'14 BOF yesterday -- and even 
for those of you who were there and wanted to be able to read the slides in a 
little more detail (and get the links from the slides) -- I have posted them 
here:

http://www.open-mpi.org/papers/sc-2014/

Enjoy!

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

___
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: 
http://www.open-mpi.org/community/lists/users/2014/11/25849.php


[OMPI users] Bug in Fortran-module MPI of OpenMPI 1.10.0 with Intel-Ftn-compiler

2015-11-19 Thread Michael.Rachner
Dear developers of OpenMPI,

I am trying to run our parallelized Ftn-95 code on a Linux cluster with 
OpenMPI-1.10.0 and the Intel-16.0.0 Fortran compiler.
In the code I use the  module MPI  ("use MPI"-stmts).

However I am not able to compile the code, because of compiler error messages 
like this:

/src_SPRAY/mpi_wrapper.f90(2065): error #6285: There is no matching specific 
subroutine for this generic subroutine call.   [MPI_REDUCE]


The problem seems to me to be this one:

The interfaces in the module MPI do not accept a send or receive buffer that 
is actually a scalar variable, an array element or a constant 
(like MPI_IN_PLACE).

Example 1:
 This does not work (gives the compiler error message:  error #6285: 
There is no matching specific subroutine for this generic subroutine call  )

   ivar = 123    ! <-- ivar is an integer variable, not an array
   call MPI_BCAST( ivar, 1, MPI_INTEGER, 0, MPI_COMM_WORLD, ierr_mpi )   ! <-- this should work, but is not accepted by the compiler

 Only this cumbersome workaround works:

   ivar = 123
   allocate( iarr(1) )
   iarr(1) = ivar
   call MPI_BCAST( iarr, 1, MPI_INTEGER, 0, MPI_COMM_WORLD, ierr_mpi )   ! <-- this workaround works
   ivar = iarr(1)
   deallocate( iarr )

Example 2:
 Any call of an MPI-routine with MPI_IN_PLACE does not work, like that 
coding:

  if(lmaster) then
call MPI_REDUCE( MPI_IN_PLACE, rbuffarr, nelem, MPI_REAL8, MPI_MAX &
! <--- this should work, but is not accepted by the compiler
 ,0_INT4, MPI_COMM_WORLD, ierr_mpi )
  else  ! slaves
call MPI_REDUCE( rbuffarr, rdummyarr, nelem, MPI_REAL8, MPI_MAX &
,0_INT4, MPI_COMM_WORLD, ierr_mpi )
  endif

This results in this compiler error message:

  /src_SPRAY/mpi_wrapper.f90(2122): error #6285: There is no matching 
specific subroutine for this generic subroutine call.   [MPI_REDUCE]
call MPI_REDUCE( MPI_IN_PLACE, rbuffarr, nelem, MPI_REAL8, MPI_MAX &
-^


In our code I observed the bug with MPI_BCAST, MPI_REDUCE, MPI_ALLREDUCE,
but probably there may be other MPI-routines with the same kind of bug.

This bug occurred for                   : OpenMPI-1.10.0 with Intel-16.0.0
In contrast, this bug did NOT occur for : OpenMPI-1.8.8  with Intel-16.0.0
                                          OpenMPI-1.8.8  with Intel-15.0.3
                                          OpenMPI-1.10.0 with gfortran-5.2.0

Greetings
Michael Rachner


Re: [OMPI users] Bug in Fortran-module MPI of OpenMPI 1.10.0 with Intel-Ftn-compiler

2015-11-19 Thread Michael.Rachner
Sorry, Gilles,

I cannot  update to more recent versions, because what I used is the newest 
combination of OpenMPI and Intel-Ftn  available on that cluster.

When looking at the list of improvements on the OpenMPI website for OpenMPI 
1.10.1 compared to 1.10.0, I do not remember seeing this item listed as 
fixed.

Greetings
Michael Rachner


From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Gilles 
Gouaillardet
Sent: Thursday, November 19, 2015 10:21
To: Open MPI Users
Subject: Re: [OMPI users] Bug in Fortran-module MPI of OpenMPI 1.10.0 with 
Intel-Ftn-compiler

Michael,

I remember I saw similar reports.

Could you give the latest v1.10.1 a try ?
And if that still does not work, can you upgrade the icc suite and give it another 
try ?

I cannot remember whether this is an ifort bug or the way ompi uses fortran...

Btw, any reason why you do not use mpi_f08 ?

HTH

Gilles

michael.rach...@dlr.de wrote:
Dear developers of OpenMPI,

I am trying to run our parallelized Ftn-95 code on a Linux cluster with 
OpenMPI-1-10.0 and Intel-16.0.0 Fortran compiler.
In the code I use the  module MPI  (“use MPI”-stmts).

However I am not able to compile the code, because of compiler error messages 
like this:

/src_SPRAY/mpi_wrapper.f90(2065): error #6285: There is no matching specific 
subroutin for this generic subroutine call.   [MPI_REDUCE]


The problem seems for me to be this one:

The interfaces in the module MPI for the MPI-routines do not accept a send or 
receive buffer array, which is
actually a variable, an array element or a constant (like MPI_IN_PLACE).

Example 1:
 This does not work (gives the compiler error message:  error #6285: 
There is no matching specific subroutin for this generic subroutine call  )
 ivar=123! <-- ivar is an integer variable, not an array
  call MPI_BCAST( ivar, 1, MPI_INTEGER, 0, MPI_COMM_WORLD), ierr_mpi )  
  ! <--- this should work, but is not accepted by the compiler

  only this cumbersome workaround works:
  ivar=123
allocate( iarr(1) )
iarr(1) = ivar
 call MPI_BCAST( iarr, 1, MPI_INTEGER, 0, MPI_COMM_WORLD, ierr_mpi )
! <--- this workaround works
ivar = iarr(1)
deallocate( iarr(1) )

Example 2:
 Any call of an MPI-routine with MPI_IN_PLACE does not work, like that 
coding:

  if(lmaster) then
call MPI_REDUCE( MPI_IN_PLACE, rbuffarr, nelem, MPI_REAL8, MPI_MAX &
! <--- this should work, but is not accepted by the compiler
 ,0_INT4, MPI_COMM_WORLD, ierr_mpi )
  else  ! slaves
call MPI_REDUCE( rbuffarr, rdummyarr, nelem, MPI_REAL8, MPI_MAX &
,0_INT4, MPI_COMM_WORLD, ierr_mpi )
  endif

This results in this compiler error message:

  /src_SPRAY/mpi_wrapper.f90(2122): error #6285: There is no matching 
specific subroutine for this generic subroutine call.   [MPI_REDUCE]
call MPI_REDUCE( MPI_IN_PLACE, rbuffarr, nelem, MPI_REAL8, MPI_MAX &
-^


In our code I observed the bug with MPI_BCAST, MPI_REDUCE, MPI_ALLREDUCE,
but probably there may be other MPI-routines with the same kind of bug.

This bug occurred for   : OpenMPI-1.10.0  with 
Intel-16.0.0
In contrast, this bug did NOT occur for: OpenMPI-1.8.8with Intel-16.0.0

OpenMPI-1.8.8with Intel-15.0.3

OpenMPI-1.10.0  with gfortran-5.2.0

Greetings
Michael Rachner


Re: [OMPI users] Bug in Fortran-module MPI of OpenMPI 1.10.0 with Intel-Ftn-compiler

2015-11-19 Thread Michael.Rachner
Thank You,  Nick and Gilles,

I hope the administrators of the cluster will be so kind as to update 
OpenMPI for me (and others) soon.

Greetings
Michael

From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Gilles 
Gouaillardet
Sent: Thursday, November 19, 2015 12:59
To: Open MPI Users
Subject: Re: [OMPI users] Bug in Fortran-module MPI of OpenMPI 1.10.0 with 
Intel-Ftn-compiler

Thanks Nick for the pointer !

Michael,

good news is you do not have to upgrade ifort,
but you have to update to 1.10.1
(intel 16 changed the way gcc pragmas are handled, and ompi has been made aware 
in 1.10.1)
1.10.1 fixes many bugs from 1.10.0, so I strongly encourage anyone to use 1.10.1

Cheers,

Gilles

On Thursday, November 19, 2015, Nick Papior 
mailto:nickpap...@gmail.com>> wrote:
Maybe I can chip in,

We use OpenMPI 1.10.1 with Intel /2016.1.0.423501 without problems.

I could not get 1.10.0 to work, one reason is: 
http://www.open-mpi.org/community/lists/users/2015/09/27655.php

On a side-note, please note that if you require scalapack you may need to 
follow this approach:
https://software.intel.com/en-us/forums/intel-math-kernel-library/topic/590302

2015-11-19 11:24 GMT+01:00 
>:
Sorry, Gilles,

I cannot  update to more recent versions, because what I used is the newest 
combination of OpenMPI and Intel-Ftn  available on that cluster.

When looking at the list of improvements  on the OpenMPI website for  OpenMPI 
1.10.1 compared to 1.10.0, I do not remember having seen this item to be 
corrected.

Greeting
Michael Rachner


From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Gilles Gouaillardet
Sent: Thursday, November 19, 2015 10:21
To: Open MPI Users
Subject: Re: [OMPI users] Bug in Fortran-module MPI of OpenMPI 1.10.0 with 
Intel-Ftn-compiler

Michael,

I remember i saw similar reports.

Could you give a try to the latest v1.10.1 ?
And if that still does not work, can you upgrade icc suite and give it an other 
try ?

I cannot remember whether this is an ifort bug or the way ompi uses fortran...

Btw, any reason why you do not
Use mpi_f08 ?

HTH

Gilles

michael.rach...@dlr.de 
wrote:
Dear developers of OpenMPI,

I am trying to run our parallelized Ftn-95 code on a Linux cluster with 
OpenMPI-1-10.0 and Intel-16.0.0 Fortran compiler.
In the code I use the  module MPI  (“use MPI”-stmts).

However I am not able to compile the code, because of compiler error messages 
like this:

/src_SPRAY/mpi_wrapper.f90(2065): error #6285: There is no matching specific 
subroutin for this generic subroutine call.   [MPI_REDUCE]


The problem seems for me to be this one:

The interfaces in the module MPI for the MPI-routines do not accept a send or 
receive buffer array, which is
actually a variable, an array element or a constant (like MPI_IN_PLACE).

Example 1:
 This does not work (gives the compiler error message:  error #6285: 
There is no matching specific subroutin for this generic subroutine call  )
 ivar=123! <-- ivar is an integer variable, not an array
  call MPI_BCAST( ivar, 1, MPI_INTEGER, 0, MPI_COMM_WORLD), ierr_mpi )  
  ! <--- this should work, but is not accepted by the compiler

  only this cumbersome workaround works:
  ivar=123
allocate( iarr(1) )
iarr(1) = ivar
 call MPI_BCAST( iarr, 1, MPI_INTEGER, 0, MPI_COMM_WORLD, ierr_mpi )
! <--- this workaround works
ivar = iarr(1)
deallocate( iarr(1) )

Example 2:
 Any call of an MPI-routine with MPI_IN_PLACE does not work, like that 
coding:

  if(lmaster) then
call MPI_REDUCE( MPI_IN_PLACE, rbuffarr, nelem, MPI_REAL8, MPI_MAX &
! <--- this should work, but is not accepted by the compiler
 ,0_INT4, MPI_COMM_WORLD, ierr_mpi )
  else  ! slaves
call MPI_REDUCE( rbuffarr, rdummyarr, nelem, MPI_REAL8, MPI_MAX &
,0_INT4, MPI_COMM_WORLD, ierr_mpi )
  endif

This results in this compiler error message:

  /src_SPRAY/mpi_wrapper.f90(2122): error #6285: There is no matching 
specific subroutine for this generic subroutine call.   [MPI_REDUCE]
call MPI_REDUCE( MPI_IN_PLACE, rbuffarr, nelem, MPI_REAL8, MPI_MAX &
-^


In our code I observed the bug with MPI_BCAST, MPI_REDUCE, MPI_ALLREDUCE,
but probably there may be other MPI-routines with the same kind of bug.

This bug occurred for   : OpenMPI-1.10.0  with 
Intel-16.0.0
In contrast, this bug did NOT occur for: OpenMPI-1.8.8with Intel-16.0.0

OpenMPI-1.8.8with Intel-15.0.3

OpenMPI-1.10.0  with gfortran-5.2.0

Greetings
Michael Rachner

___
users mailing list
us...@open-mpi.org
Subscription: http://www.ope

Re: [OMPI users] Bug in Fortran-module MPI of OpenMPI 1.10.0 with Intel-Ftn-compiler

2015-11-23 Thread Michael.Rachner
Dear Gilles,

In the meantime the administrators have installed (Thanks!)  OpenMPI-1.10.1 
with Intel-16.0.0 on the cluster.
I have tested it with our code:  It works.
The time spent for MPI-data transmission was the same as with 
OpenMPI-1.8.3&Intel-14.0.4, but was ~20% higher than with 
IMPI-5.1.1&Intel-16.0.0
for the same case running on 3 nodes and 8 procs per node.

Greetings
  Michael Rachner


From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Gilles 
Gouaillardet
Sent: Friday, November 20, 2015 00:53
To: Open MPI Users
Subject: Re: [OMPI users] Bug in Fortran-module MPI of OpenMPI 1.10.0 with 
Intel-Ftn-compiler

Michael,

in the meantime, you can use 'mpi_f08' instead of 'use mpi';
this is really an f90 binding issue, and the f08 bindings are safe
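A minimal sketch of what that change looks like for the two problematic cases
from the report (illustrative only; names reused from the examples above):

   subroutine bcast_and_reduce( rbuffarr, nelem, lmaster )
      use mpi_f08
      implicit none
      integer, intent(in)         :: nelem
      logical, intent(in)         :: lmaster
      real(kind=8), intent(inout) :: rbuffarr(nelem)
      real(kind=8) :: rdummyarr(nelem)
      integer :: ivar, ierr_mpi

      ivar = 123
      ! a scalar buffer is accepted directly, no 1-element array workaround needed
      call MPI_Bcast( ivar, 1, MPI_INTEGER, 0, MPI_COMM_WORLD, ierr_mpi )

      ! MPI_IN_PLACE is accepted as well
      if (lmaster) then
         call MPI_Reduce( MPI_IN_PLACE, rbuffarr, nelem, MPI_REAL8, MPI_MAX, &
                          0, MPI_COMM_WORLD, ierr_mpi )
      else
         call MPI_Reduce( rbuffarr, rdummyarr, nelem, MPI_REAL8, MPI_MAX, &
                          0, MPI_COMM_WORLD, ierr_mpi )
      end if
   end subroutine bcast_and_reduce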

Cheers,

Gilles
On 11/19/2015 10:21 PM, michael.rach...@dlr.de 
wrote:
Thank You,  Nick and Gilles,

I hope the administrators of the cluster will be so kind  and will update 
OpenMPI for me (and others) soon.

Greetings
Michael

From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Gilles 
Gouaillardet
Sent: Thursday, November 19, 2015 12:59
To: Open MPI Users
Subject: Re: [OMPI users] Bug in Fortran-module MPI of OpenMPI 1.10.0 with 
Intel-Ftn-compiler

Thanks Nick for the pointer !

Michael,

good news is you do not have to upgrade ifort,
but you have to update to 1.10.1
(intel 16 changed the way gcc pragmas are handled, and ompi has been made aware 
in 1.10.1)
1.10.1 fixes many bugs from 1.10.0, so I strongly encourage anyone to use 1.10.1

Cheers,

Gilles

On Thursday, November 19, 2015, Nick Papior 
mailto:nickpap...@gmail.com>> wrote:
Maybe I can chip in,

We use OpenMPI 1.10.1 with Intel /2016.1.0.423501 without problems.

I could not get 1.10.0 to work, one reason is: 
http://www.open-mpi.org/community/lists/users/2015/09/27655.php

On a side-note, please note that if you require scalapack you may need to 
follow this approach:
https://software.intel.com/en-us/forums/intel-math-kernel-library/topic/590302

2015-11-19 11:24 GMT+01:00 
mailto:michael.rach...@dlr.de>>:
Sorry, Gilles,

I cannot  update to more recent versions, because what I used is the newest 
combination of OpenMPI and Intel-Ftn  available on that cluster.

When looking at the list of improvements  on the OpenMPI website for  OpenMPI 
1.10.1 compared to 1.10.0, I do not remember having seen this item to be 
corrected.

Greeting
Michael Rachner


From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Gilles Gouaillardet
Sent: Thursday, November 19, 2015 10:21
To: Open MPI Users
Subject: Re: [OMPI users] Bug in Fortran-module MPI of OpenMPI 1.10.0 with 
Intel-Ftn-compiler

Michael,

I remember i saw similar reports.

Could you give a try to the latest v1.10.1 ?
And if that still does not work, can you upgrade icc suite and give it an other 
try ?

I cannot remember whether this is an ifort bug or the way ompi uses fortran...

Btw, any reason why you do not
Use mpi_f08 ?

HTH

Gilles

michael.rach...@dlr.de 
wrote:
Dear developers of OpenMPI,

I am trying to run our parallelized Ftn-95 code on a Linux cluster with 
OpenMPI-1-10.0 and Intel-16.0.0 Fortran compiler.
In the code I use the  module MPI  ("use MPI"-stmts).

However I am not able to compile the code, because of compiler error messages 
like this:

/src_SPRAY/mpi_wrapper.f90(2065): error #6285: There is no matching specific 
subroutin for this generic subroutine call.   [MPI_REDUCE]


The problem seems for me to be this one:

The interfaces in the module MPI for the MPI-routines do not accept a send or 
receive buffer array, which is
actually a variable, an array element or a constant (like MPI_IN_PLACE).

Example 1:
 This does not work (gives the compiler error message:  error #6285: 
There is no matching specific subroutin for this generic subroutine call  )
 ivar=123! <-- ivar is an integer variable, not an array
  call MPI_BCAST( ivar, 1, MPI_INTEGER, 0, MPI_COMM_WORLD), ierr_mpi )  
  ! <--- this should work, but is not accepted by the compiler

  only this cumbersome workaround works:
  ivar=123
allocate( iarr(1) )
iarr(1) = ivar
 call MPI_BCAST( iarr, 1, MPI_INTEGER, 0, MPI_COMM_WORLD, ierr_mpi )
! <--- this workaround works
ivar = iarr(1)
deallocate( iarr(1) )

Example 2:
 Any call of an MPI-routine with MPI_IN_PLACE does not work, like that 
coding:

  if(lmaster) then
call MPI_REDUCE( MPI_IN_PLACE, rbuffarr, nelem, MPI_REAL8, MPI_MAX &
! <--- this should work, but is not accepted by the compiler
 ,0_INT4, MPI_COMM_WORLD, ierr_mpi )
  else  ! slaves
call MPI_REDUCE( rbuffarr, rdummyarr, nelem, MPI_REAL8, MPI_MAX &
,0_INT4, MPI_COMM_WORLD, ierr_mpi )
  endif

This results in this compiler error message:

  /src_SPRAY/mpi_wrapper