Actually, sub-array passing is part of the F90 standard (at least according to every document I can find), not an Intel extension, so if it doesn't work you should complain to the compiler vendor. One reason for using it is that the compiler should be optimized for whichever passing method it chose. Since the F90 standard allows several options for how arrays get passed, it is not really a good idea to circumvent the official mechanism. Using user-defined data types is great as long as the compiler chooses a simple pointer pass; however, if it uses the copy-in/copy-out option, you will be making much larger temporary arrays than if you had just passed the correct subarray. Anyway, this is not really an MPI issue as much as an F90 bug in your compiler.

On 12/14/11 08:57, Gustavo Correa wrote:
Hi Patrick

From my mere MPI and Fortran-90 user point of view, I think that the solution offered by the MPI standard [at least up to MPI-2] to address the problem of non-contiguous memory layouts is to use MPI user-defined types, as I pointed out in my previous email. I like this solution because it is portable, it doesn't require the allocation of temporary arrays, and the additional programming effort is not that big.

As far as I know, MPI doesn't parse or comply with the Fortran-90 array-section notation and syntax. All buffers in the MPI calls are pointers/addresses to the first element of the buffer, which will be traversed according to the number of elements passed to the MPI call and according to the MPI type passed to the MPI routine [which should be a user-defined type, if you need to implement a fancy memory layout].

That MPI doesn't understand Fortran-90 array-sections doesn't surprise me much. I think Lapack doesn't do it either, and many other legitimate Fortran libraries don't 'understand' array-sections either. FFTW, for instance, goes a long way to define its own mechanism for specifying fancy memory layouts independently of the Fortran-90 array-section notation. Amongst the libraries with Fortran interfaces that I've used, MPI probably provides the most flexible and complete mechanism to describe memory layout, through user-defined types.

In your case I think the work required to declare an MPI_TYPE_VECTOR to handle your table 'tab' is not really big or complicated; a sketch is below.
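To make that concrete, here is a minimal sketch of what I mean, assuming the same layout of 'tab' as in your test program: each row tab(i,:) is 4 integers separated in memory by a stride of nbcpus elements, because Fortran stores arrays column-major. The program name and the variable 'row_type' are just illustrative.

PROGRAM bide_vector
   ! Sketch: broadcast each row of 'tab' with a user-defined MPI type
   ! instead of passing the non-contiguous section tab(i,:) directly.
   USE mpi
   IMPLICIT NONE
   INTEGER :: nbcpus, my_rank, ierr, i
   INTEGER :: row_type              ! MPI type describing one row of tab
   INTEGER, ALLOCATABLE :: tab(:,:)

   CALL MPI_INIT(ierr)
   CALL MPI_COMM_RANK(MPI_COMM_WORLD, my_rank, ierr)
   CALL MPI_COMM_SIZE(MPI_COMM_WORLD, nbcpus, ierr)

   ALLOCATE (tab(0:nbcpus-1,4))
   tab(:,:)       = -1
   tab(my_rank,:) = my_rank

   ! One row of tab is 4 blocks of 1 integer, with a stride of nbcpus
   ! elements between consecutive blocks (column-major storage).
   CALL MPI_TYPE_VECTOR(4, 1, nbcpus, MPI_INTEGER, row_type, ierr)
   CALL MPI_TYPE_COMMIT(row_type, ierr)

   DO i = 0, nbcpus-1
      ! Pass the first element of row i plus the user-defined type,
      ! instead of the array section tab(i,:).
      CALL MPI_BCAST(tab(i,1), 1, row_type, i, MPI_COMM_WORLD, ierr)
   ENDDO

   CALL MPI_TYPE_FREE(row_type, ierr)
   IF (my_rank .EQ. 0) print*, tab
   CALL MPI_FINALIZE(ierr)
END PROGRAM bide_vector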

As two other list subscribers mentioned, and you already tried, the Intel compiler seems to offer an extension to deal with this and shortcut the use of MPI user-defined types. This Intel compiler extension apparently uses, under the hood, the same idea of a temporary array that you used programmatically in one of the 'bide' program versions you sent in your original message. The temporary array is used to ship data to/from contiguous/non-contiguous memory before/after the MPI call is invoked. I presume this Intel compiler extension would work with libraries other than MPI, whenever the library doesn't understand the Fortran-90 array-section notation. I never used this extension, though. For one thing, this solution may not be portable to other compilers. Another aspect to consider is how much 'under the hood' memory allocation this solution would require if the array you pass to MPI_BCAST is really big, and how much that may impact performance.
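For comparison, the explicit temporary-array workaround amounts to roughly the fragment below, replacing the broadcast loop of your 'bide' test program; the contiguous buffer 'tmp' is a name I introduce just for illustration.

   INTEGER :: tmp(4)   ! illustrative contiguous buffer for one row of tab

   DO i = 0, nbcpus-1
      tmp(:) = tab(i,:)                ! copy the strided row into contiguous memory
      CALL MPI_BCAST(tmp, 4, MPI_INTEGER, i, MPI_COMM_WORLD, ierr)
      tab(i,:) = tmp(:)                ! copy it back after the call returns
   ENDDO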

I hope this helps,
Gus Correa

On Dec 14, 2011, at 11:03 AM, Patrick Begou wrote:

Thanks all for your answers. Yes, I understand that it is a non-contiguous memory access problem, since MPI_BCAST expects a pointer to a valid memory zone. But I'm surprised that, when the MPI module is used, Fortran does not hide this discontinuity behind a contiguous temporary copy of the array. I've spent some time building OpenMPI with g++/gcc/ifort (to create the right mpi module) and ran some additional tests:


Default OpenMPI is openmpi-1.2.8-17.4.x86_64

# module load openmpi
# mpif90 ess.F90 && mpirun -np 4 ./a.out
   0  1  2  3   0  1  2  3   0  1  2  3   0  1  2  3
# module unload openmpi
The result is OK, but sometimes it hangs (when I request a lot of processes).

With OpenMPI 1.4.4 and gfortran from gcc-fortran-4.5-19.1.x86_64

# module load openmpi-1.4.4-gcc-gfortran
# mpif90 ess.F90 && mpirun -np 4 ./a.out
   0 -1 -1 -1   0 -1 -1 -1   0 -1 -1 -1   0 -1 -1 -1
# module unload openmpi-1.4.4-gcc-gfortran
Only node 0 updated the global array with its own subarray (I only print node 0's result).


With OpenMPI 1.4.4 and ifort 10.1.018 (yes, it's quite old; I have the latest one but it isn't installed!)

# module load openmpi-1.4.4-gcc-intel
# mpif90 ess.F90 && mpirun -np 4 ./a.out
ess.F90(15): (col. 5) remark: LOOP WAS VECTORIZED.
   0 -1 -1 -1   0 -1 -1 -1   0 -1 -1 -1   0 -1 -1 -1

# mpif90 -check arg_temp_created ess.F90 && mpirun -np 4 ./a.out
gives a lot of messages like:
forrtl: warning (402): fort: (1): In call to MPI_BCAST1DI4, an array temporary was created for argument #1

So a temporary array is created for each call. Where, then, is the problem?

Regarding the Fortran compiler: I'm using similar constructs (non-contiguous subarrays) in MPI_SENDRECV calls and everything works fine; I ran some intensive tests from 1 to 128 processes on my quad-core workstation. This Fortran solution was easier than creating user-defined data types.

Can you reproduce this behavior with the test case? What are your OpenMPI and gfortran/ifort versions?

Thanks again

Patrick

The test code:

PROGRAM bide
USE mpi
    IMPLICIT NONE
    INTEGER :: nbcpus
    INTEGER :: my_rank
    INTEGER :: ierr,i,buf
    INTEGER, ALLOCATABLE:: tab(:,:)

     CALL MPI_INIT(ierr)
     CALL MPI_COMM_RANK(MPI_COMM_WORLD,my_rank,ierr)
     CALL MPI_COMM_SIZE(MPI_COMM_WORLD,nbcpus,ierr)

     ALLOCATE (tab(0:nbcpus-1,4))

     tab(:,:)=-1
     tab(my_rank,:)=my_rank
     DO i=0,nbcpus-1
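        ! tab(i,:) is a non-contiguous array section; how it reaches MPI_BCAST
        ! (array descriptor or copy-in/copy-out temporary) is compiler-dependent.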
        CALL MPI_BCAST(tab(i,:),4,MPI_INTEGER,i,MPI_COMM_WORLD,ierr)
     ENDDO
     IF (my_rank .EQ. 0) print*,tab
     CALL MPI_FINALIZE(ierr)

END PROGRAM bide

--
===============================================================
| Equipe M.O.S.T.      | http://most.hmg.inpg.fr              |
| Patrick BEGOU        | ------------                         |
| LEGI                 | mailto:patrick.be...@hmg.inpg.fr     |
| BP 53 X              | Tel 04 76 82 51 35                   |
| 38041 GRENOBLE CEDEX | Fax 04 76 82 52 71                   |
===============================================================