[OMPI users] freezing in mpi_allreduce operation

2011-09-08 Thread Greg Fischer
I am seeing mpi_allreduce operations freeze execution of my code on some
moderately-sized problems.  The freeze does not manifest itself in every
problem.  In addition, it is in a portion of the code that is repeated many
times.  In the case discussed below, the freeze appears in the 60th
iteration.

The current test case that I'm looking at is a 64-processor job.  This
particular mpi_allreduce call applies to all 64 processors, with each
communicator in the call containing a total of 4 processors.  When I add
print statements before and after the offending line, I see that all 64
processors successfully make it to the mpi_allreduce call, but only 32
successfully exit.  Stack traces on the other 32 yield something along the
lines of the trace listed at the bottom of this message.  The call, itself,
looks like:

 call mpi_allreduce(MPI_IN_PLACE,                                           &
                    phim(0:(phim_size-1),1:im,1:jm,1:kmloc(coords(2)+1),grp), &
                    phim_size*im*jm*kmloc(coords(2)+1),                     &
                    mpi_real, mpi_sum, ang_com, ierr)

These messages are sized to remain under the 32-bit integer size limitation
for the "count" parameter.  The intent is to perform the allreduce operation
on a contiguous block of the array.  Previously, I had been passing an
assumed-shape array (i.e., phim(:,:,:,:,grp)), but found some documentation
indicating that was potentially dangerous.  Making the change from assumed-
to explicit-shape arrays doesn't solve the problem.  However, if I declare
an additional array and use separate send and receive buffers:

 call mpi_allreduce(phim_local, phim_global,                &
                    phim_size*im*jm*kmloc(coords(2)+1),     &
                    mpi_real, mpi_sum, ang_com, ierr)
 phim(:,:,:,:,grp) = phim_global

Then the problem goes away, and everything works normally.  Does anyone
have any insight as to what may be happening here?  I'm using "include
'mpif.h'" rather than the f90 module; does that potentially explain this?

Thanks,
Greg

Stack trace(s) for thread: 1
-
[0] (1 processes)
-
main() at ?:?
  solver() at solver.f90:31
solver_q_down() at solver_q_down.f90:52
  iter() at iter.f90:56
mcalc() at mcalc.f90:38
  pmpi_allreduce__() at ?:?
PMPI_Allreduce() at ?:?
  ompi_coll_tuned_allreduce_intra_dec_fixed() at ?:?
ompi_coll_tuned_allreduce_intra_ring_segmented() at ?:?
  ompi_coll_tuned_sendrecv_actual() at ?:?
ompi_request_default_wait_all() at ?:?
  opal_progress() at ?:?
Stack trace(s) for thread: 2
-
[0] (1 processes)
-
start_thread() at ?:?
  btl_openib_async_thread() at ?:?
poll() at ?:?
Stack trace(s) for thread: 3
-
[0] (1 processes)
-
start_thread() at ?:?
  service_thread_start() at ?:?
select() at ?:?


Re: [OMPI users] freezing in mpi_allreduce operation

2011-09-08 Thread Greg Fischer
Note also that coding the mpi_allreduce as:

   call mpi_allreduce(MPI_IN_PLACE, phim(0,1,1,1,grp),        &
                      phim_size*im*jm*kmloc(coords(2)+1),     &
                      mpi_real, mpi_sum, ang_com, ierr)

results in the same freezing behavior in the 60th iteration.  (I don't
recall why the array sections were being passed; possibly just a mistake.)




[OMPI users] OMPI error in MPI_Cart_create (in code that works with MPICH2)

2009-09-01 Thread Greg Fischer
I'm receiving the error posted at the bottom of this message with a code
compiled with Intel Fortran/C Version 11.1 against OpenMPI version 1.3.2.

The same code works correctly when compiled against MPICH2.  (We have
re-compiled with OpenMPI to take advantage of newly-installed Infiniband
hardware.  The "ring" test problem appears to work correctly over
Infiniband.)

There are no "fork()" calls in our code, so I can only guess that something
weird is going on with MPI_COMM_WORLD.  The code in question is a Fortran 90
code.  Right now, it is being compiled with "include 'mpif.h'" statements at
the beginning of each MPI subroutine, instead of  making use of the "mpi"
modules.  Could this be causing the problem?  How else should I go about
diagnosing the problem?

Thanks,
Greg

--
An MPI process has executed an operation involving a call to the
"fork()" system call to create a child process.  Open MPI is currently
operating in a condition that could result in memory corruption or
other system errors; your MPI job may hang, crash, or produce silent
data corruption.  The use of fork() (or system() or other calls that
create child processes) is strongly discouraged.

The process that invoked fork was:

  Local host:  bl316 (PID 26806)
  MPI_COMM_WORLD rank: 0

If you are *absolutely sure* that your application will successfully
and correctly survive a call to fork(), you may disable this warning
by setting the mpi_warn_on_fork MCA parameter to 0.
--
[bl205:5014] *** An error occurred in MPI_Cart_create
[bl205:5014] *** on communicator MPI_COMM_WORLD
[bl205:5014] *** MPI_ERR_ARG: invalid argument of some other kind
[bl205:5014] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)

--
mpirun has exited due to process rank 4 with PID 5010 on
node bl205 exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--
[bl205:05008] 7 more processes have sent help message help-mpi-errors.txt /
mpi_errors_are_fatal
[bl205:05008] Set MCA parameter "orte_base_help_aggregate" to 0 to see all
help / error messages


Re: [OMPI users] OMPI error in MPI_Cart_create (in code that works withMPICH2)

2009-09-02 Thread Greg Fischer
Thanks, Jeff.

OK, I've found the offending code and gotten rid of the fork() warning.  I'm
still left with this:

[bl302:26556] *** An error occurred in MPI_Cart_create
[bl302:26556] *** on communicator MPI_COMM_WORLD
[bl302:26556] *** MPI_ERR_ARG: invalid argument of some other kind
[bl302:26556] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
--
mpirun has exited due to process rank 4 with PID 13693 on
node bl316 exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--
[bl316:13691] 7 more processes have sent help message help-mpi-errors.txt /
mpi_errors_are_fatal
[bl316:13691] Set MCA parameter "orte_base_help_aggregate" to 0 to see all
help / error messages

I'm going to try re-compiling OpenMPI, itself, with the Intel compilers.
Any other ideas?
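
In the meantime, for reference, this is the argument pattern I'm checking the
call against; a generic sketch with placeholder values, not the actual source:

  subroutine cart_sketch
    include 'mpif.h'
    integer :: dims(2), comm_cart, ierr
    logical :: periods(2), reorder

    dims    = (/ 4, 2 /)              ! placeholder decomposition
    periods = (/ .false., .false. /)
    reorder = .true.

    ! periods and reorder must be LOGICAL; with "include 'mpif.h'" nothing
    ! checks the argument kinds at compile time, so a mismatch only shows
    ! up at run time, and different MPI implementations may react differently.
    call MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, reorder, &
                         comm_cart, ierr)
  end subroutine cart_sketch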


On Wed, Sep 2, 2009 at 12:01 AM, Jeff Squyres  wrote:

> *Something* in your code is calling fork() -- it may be an indirect call
> such as system() or popen() or somesuch.  This particular error message is
> only printed during a "fork pre-hook" that Open MPI installs during MPI_INIT
> (registered via pthread_atfork()).
>
> Grep through your code for calls to system and popen -- see if any of these
> are used.
>
> There is no functional difference between "include 'mpif.h'" and "use mpi"
> in terms of MPI functionality at run time -- the only difference you get is
> a "better" level of compile-time protection from the Fortran compiler.
>  Specifically, "use mpi" will introduce strict type checking for many (but
> not all) of the MPI functions at compile time.  Hence, the compiler will
> complain if you forget an IERR parameter to an MPI function, for example.
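>
> A minimal illustration of that point (hypothetical snippet, not taken
> from your code):
>
>   subroutine rank_demo(comm)
>     use mpi
>     integer :: comm, rank
>     ! The IERROR argument is missing: with "use mpi" this is rejected at
>     ! compile time; with "include 'mpif.h'" it compiles and typically
>     ! crashes or corrupts memory at run time.
>     call MPI_Comm_rank(comm, rank)
>   end subroutine rank_demo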
>
> "use mpi" is not perfect, though -- there are many well-documented problems
> because of the design of the MPI-2 Fortran 90 interface (which are currently
> being addressed in MPI-3, if you care :-) ).  More generally: "use mpi" will
> catch *many* compile errors, but not *all* of them.
>
> But to answer your question succinctly: this problem won't be affected by
> using "use mpi" or "include 'mpif.h'".

[OMPI users] error compiling OpenMPI 1.3.3 with Intel compiler suite 11.1 on Linux

2009-09-04 Thread Greg Fischer
I'm attempting to compile OpenMPI version 1.3.3 with Intel C/C++/Fortran
version 11.1.046.  Others have reported success using these compilers (
http://software.intel.com/en-us/forums/intel-c-compiler/topic/68111/).  The
line where compilation fails is included at the end of this message.  I have
also attached complete "./configure" and "make" outputs.  Does anyone have
any insight as to what I'm doing wrong?

Thanks,
Greg

icpc11.1 -DHAVE_CONFIG_H -I. -I../../../opal/include -I../../../orte/include
-I../../../ompi/include
-I../../../opal/mca/paffinity/linux/plpa/src/libplpa
-DOMPI_CONFIGURE_USER="\"fischega\"" -DOMPI_CONFIGURE_HOST="\"susedev1\""
-DOMPI_CONFIGURE_DATE="\"Fri Sep  4 09:53:03 EDT 2009\""
-DOMPI_BUILD_USER="\"$USER\"" -DOMPI_BUILD_HOST="\"`hostname`\""
-DOMPI_BUILD_DATE="\"`date`\"" -DOMPI_BUILD_CFLAGS="\"-O3 -DNDEBUG
-finline-functions -fno-strict-aliasing -restrict -pthread
-fvisibility=hidden\"" -DOMPI_BUILD_CPPFLAGS="\"-I../../..  \""
-DOMPI_BUILD_CXXFLAGS="\"-O3 -DNDEBUG -finline-functions -pthread\""
-DOMPI_BUILD_CXXCPPFLAGS="\"-I../../..  \"" -DOMPI_BUILD_FFLAGS="\"\""
-DOMPI_BUILD_FCFLAGS="\"\"" -DOMPI_BUILD_LDFLAGS="\"-export-dynamic  \""
-DOMPI_BUILD_LIBS="\"-lnsl -lutil  \""
-DOMPI_CC_ABSOLUTE="\"/usr/scripts/icc11.1\""
-DOMPI_CXX_ABSOLUTE="\"/usr/scripts/icpc11.1\""
-DOMPI_F77_ABSOLUTE="\"/usr/scripts/ifort11.1\""
-DOMPI_F90_ABSOLUTE="\"/usr/scripts/ifort11.1\""
-DOMPI_F90_BUILD_SIZE="\"small\"" -I../../..-O3 -DNDEBUG
-finline-functions -pthread -MT components.o -MD -MP -MF $depbase.Tpo -c -o
components.o components.cc &&\
mv -f $depbase.Tpo $depbase.Po
icpc: error #10236: File not found:  'Sep'
icpc: error #10236: File not found:  '4'
icpc: error #10236: File not found:  '09:53:03'
icpc: error #10236: File not found:  'EDT'
icpc: error #10236: File not found:  '2009"'
icpc: error #10236: File not found:  'Sep'
icpc: error #10236: File not found:  '4'
icpc: error #10236: File not found:  '10:11:04'
icpc: error #10236: File not found:  'EDT'
icpc: error #10236: File not found:  '2009"'
icpc: command line warning #10159: invalid argument for option
'-fvisibility'
icpc: error #10236: File not found:  '"'
icpc: command line warning #10156: ignoring option '-p'; no argument
required
icpc: error #10236: File not found:  '"'
icpc: error #10236: File not found:  '"'
icpc: error #10236: File not found:  '"'
make[2]: *** [components.o] Error 1
make[2]: Leaving directory
`/home/fischega/src/openmpi-1.3.3/ompi/tools/ompi_info'


ompi-output.tar.bz2
Description: BZip2 compressed data


Re: [OMPI users] error compiling OpenMPI 1.3.3 with Intel compilersuite 11.1 on Linux

2009-09-05 Thread Greg Fischer
Yep, that was it.

The icpc11.1, ifort11.1, and icc11.1 scripts are included in the tar file
attached to my original email.  They set the PATH, LD_LIBRARY_PATH, and
INTEL_LICENSE_FILE correctly.  When I set the environment variables manually
and use the regular icpc, ifort, and icc commands, it works fine.  Good
catch!

Thanks,
Greg

On Fri, Sep 4, 2009 at 11:54 PM, Jeff Squyres  wrote:

> Can you clarify what icpc11.1 is?  Is it a sym link to the icpc 11.1
> compiler, or is it a shell script that ends up invoking the icpc v11.1
> compiler?
>
> I ask because the compile line in question ends up with a complex quoting
> scheme that includes a token with spaces in it:
>
>-DOMPI_CONFIGURE_DATE="\"Fri Sep  4 09:53:03 EDT 2009\""
>
> If icpc11.1 is a shell script that ends up invoking the real icpc compiler
> underneath, I could see how the quoting might get screwed up and end up
> passing "Sep" (and following) as individual tokens rather than One Big Token
> (including quotes).
>
> That's just a first guess -- can you check to see if this is happening?


[OMPI users] best way to ALLREDUCE multi-dimensional arrays in Fortran?

2009-09-24 Thread Greg Fischer
(I apologize in advance for the simplistic/newbie question.)

I'm performing an ALLREDUCE operation on a multi-dimensional array.  This
operation is the biggest bottleneck in the code, and I'm wondering if
there's a way to do it more efficiently than what I'm doing now.  Here's a
representative example of what's happening:

   ir=1
   do ikl=1,km
     do ij=1,jm
       do ii=1,im
         albuf(ir)=array(ii,ij,ikl,nl,0,ng)
         ir=ir+1
       enddo
     enddo
   enddo
   agbuf=0.0
   call mpi_allreduce(albuf, agbuf, im*jm*kmloc(coords(2)+1), &
                      mpi_real, mpi_sum, ang_com, ierr)
   ir=1
   do ikl=1,km
     do ij=1,jm
       do ii=1,im
         phim(ii,ij,ikl,nl,0,ng)=agbuf(ir)
         ir=ir+1
       enddo
     enddo
   enddo

Is there any way to just do this in one fell swoop, rather than buffering,
transmitting, and unbuffering?  This operation is looped over many times.
Are there savings to be had here?
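
For example, would something along these lines be legitimate, assuming the
declared extents of the first three dimensions really are im, jm, and km (so
the block is contiguous) and that km matches the kmloc(coords(2)+1) count I'm
currently passing?

   call mpi_allreduce(array(1,1,1,nl,0,ng), phim(1,1,1,nl,0,ng), &
                      im*jm*km, mpi_real, mpi_sum, ang_com, ierr)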

Thanks,
Greg


Re: [OMPI users] best way to ALLREDUCE multi-dimensional arrays in Fortran?

2009-09-25 Thread Greg Fischer
It looks like the buffering operations consume about 15% as much time as the
allreduce operations.  Not huge, but not trivial, all the same.  Is there
any way to avoid the buffering step?



On Thu, Sep 24, 2009 at 6:03 PM, Eugene Loh  wrote:

> There are three steps here:  buffering, transmitting, and unbuffering.  Any
> idea how the run time is distributed among those three steps?  E.g., if most
> time is spent in the MPI call, then combining all three steps into one is
> unlikely to buy you much... and might even hurt.  If most of the time is
> spent in the MPI call, then there may be some tuning of collective
> algorithms to do.  I don't have any experience doing this with OMPI.  I'm
> just saying it makes some sense to isolate the problem a little bit more.


[OMPI users] strange performance fluctuations and problems with mpif90-vt

2009-10-09 Thread Greg Fischer
I'm seeing some sporadic strange behavior in one of our MPI codes.  Here are
selected portions of the output:

---
|   |   |im |jm |km |  phi0   | | iter | sync |mcalc ||
|grp|itn|loc|loc|loc|Max Error|   NSR   |t(sec)|t(sec)|t(sec)| sysbal |
---
   1   2   1   1   9 1.000E+00 1.000E+00 16.789 15.923  0.079 1.00E+00
   1   3   1   1   5 1.000E+00 1.000E+00 16.800 15.935  0.078 1.00E+00
   1   4   1   1   1 1.000E+00 1.000E+00 17.500 15.906  0.079 1.00E+00
...
  11   7  18 118  84 1.485E-01 1.117E+00 16.600 15.929  0.077 1.00E+00
  11   8  20 124  84 1.516E-01 1.021E+00 16.600 15.929  0.077 1.00E+00
  11   9  21 127  86 1.596E-01 1.053E+00  1.253  0.450  0.083 1.00E+00
  11  10   7 131  88 1.290E-01 8.083E-01  0.808  0.014  0.272 1.00E+00
  11  11   7 131  85 8.267E-02 6.408E-01  1.000  0.002  0.262 1.00E+00
...
 101  10  25 111  77 5.690E-02 8.179E-01  0.480  0.023  0.087 1.00E+00
 101  11  32 113  77 4.782E-02 8.404E-01  0.479  0.023  0.087 1.00E+00
 101  12  37 116  79 4.330E-02 9.055E-01  0.479  0.023  0.087 1.00E+00

This is an iterative calculation.  The critical quantity of interest is
"iter t(sec)", which is the time per iteration.  (The other "t(sec)"
quantities are subsets of "iter t(sec)".)  Between "grp" 1 and 111, the
calculation is not becoming appreciably more or less difficult, yet there is
a factor of ~30 difference in performance between the beginning and the
end.  This problem does not appear all of the time.  In many cases,
performance is good throughout the entire calculation.  ("Good", here, is
being defined as what is seen in grp 101 above, which is roughly what I
expect to be seeing.)  However, when the problem does appear, it seems to
mysteriously go away after grinding through the calculation for a while.

Has anyone ever seen behavior like this?  Any thoughts as to what could be
causing it?

I tried to recompile the code with mpif90-vt and mpicc-vt, in hopes that the
VampirTrace output might shed some light on the true nature of the problem.
After recompiling, the code complains:

[lx102:15254] *** An error occurred in MPI_Cart_create
[lx102:15254] *** on communicator MPI_COMM_WORLD
[lx102:15254] *** MPI_ERR_ARG: invalid argument of some other kind
[lx102:15254] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)

...and then crashes out before doing anything useful.  My understanding is
that I only need to use the -vt compiler wrappers, and it will automatically
"instrument" my code.  Is there something else I should be doing?

Thanks
Greg


[OMPI users] MPI-IO: reading an unformatted binary fortran file

2009-06-11 Thread Greg Fischer
Hello,

I'm attempting to wrap my brain around the MPI I/O mechanisms, and I was
hoping to find some guidance.  I'm trying to read a file that contains a
117-character string, followed by a series of records that contain integers and
reals.  The following code would read it in serial:

---
character(len=117) :: cfx1

read (nin) cfx1
do i=1,end_of_file
  read(nin) integer1,integer2,real1,real2,real3,real4,real5,real6,real7
enddo
---

To simplify the problem, I removed the "cfx1" string from the file I'm
reading, and created an MPI_TYPE_STRUCT as follows:

---
  length( 1 ) = 1
  length( 2 ) = 2
  length( 3 ) = 7
  length( 4 ) = 1

  disp( 1 ) = 0
  disp( 2 ) = sizeof( MPI_LB )
  disp( 3 ) = disp( 2 ) + 2*sizeof(MPI_INTEGER)
  disp( 4 ) = disp( 3 ) + 7*sizeof(MPI_REAL)
  type( 1 ) = MPI_LB
  type( 2 ) = MPI_INTEGER
  type( 3 ) = MPI_REAL
  type( 4 ) = MPI_UB

  call MPI_TYPE_STRUCT( 4, length, disp, type, sptype, ierr )
  call MPI_TYPE_COMMIT( sptype, ierr )
---

I then open the file, set the view as follows and try to do a read:

---
  mode = MPI_MODE_RDONLY
  call MPI_FILE_OPEN( MPI_COMM_WORLD, filename, mode,
 +MPI_INFO_NULL, fh, ierr )

  offset = 0
  call MPI_FILE_SET_VIEW( fh, offset, sptype,
 +sptype, 'native', MPI_INFO_NULL, ierr )

  call MPI_FILE_READ( fh, sourcepart, 1, sptype,
 +   status, ierr )
---

where "sourcepart" is:

---
  type source_particle_datatype
    integer :: ipt, idm
    real    :: xxx, yyy, zzz, uuu, vvv, www, erg
  end type
---

This almost works.  With some fiddling (I can't seem to make it work right
now), I'm able to get most of the reals and integers into "sourcepart", but
something doesn't line up quite correctly.  I've spent a lot of time looking
at the documentation and tutorials on the web, but haven't found a resource
that helps me work through this problem.

Ultimately, the objective will be to allow an arbitrary number of processes
to read this file, with each record being read by exactly one process
(e.g., process 1 reads record 1, process 2 reads record 2, process 1 reads
record 3, process 2 reads record 4, and so on).

What's the best way to skin this cat?  Any assistance would be greatly
appreciated.
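
In case it helps to be concrete, the access pattern I'm after would look
roughly like the sketch below (nrecords, nprocs, and rank are placeholders,
and I'm assuming the view set above, so offsets are counted in sptype units):

   integer :: irec, nrecords, nprocs, rank, ierr
   integer :: status(MPI_STATUS_SIZE)
   integer(kind=MPI_OFFSET_KIND) :: rec_offset

   do irec = rank, nrecords - 1, nprocs
     rec_offset = irec     ! record index = offset in etype (sptype) units
     call MPI_FILE_READ_AT( fh, rec_offset, sourcepart, 1, sptype, status, ierr )
     ! ... use sourcepart ...
   enddo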

Thanks,
Greg