Eugene Loh wrote:
RightCFD wrote:

    Date: Thu, 29 Oct 2009 15:45:06 -0400
    From: Brock Palen <bro...@umich.edu>
    Subject: Re: [OMPI users] mpi functions are slow when first called and
           become normal afterwards
    To: Open MPI Users <us...@open-mpi.org>
    Message-ID: <890cc430-68b0-4307-8260-24a6fadae...@umich.edu>
    Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes

    > When MPI_Bcast and MPI_Reduce are called for the first time, they
    > are very slow. But after that, they run at a normal and stable speed.
    > Has anybody out there encountered this problem? If you need any
    > other information, please let me know and I'll provide it.
    > Thanks in advance.

    This is expected, and I think you can dig through the message archive
    to find the answer.  OMPI does not wire up all the communication at
    startup, so the first time you communicate with a host the
    connection has to be made; after that it is fast because the
    connection is already open.  This behavior is needed for very large
    systems, where you could run out of sockets for some types of
    communication with so many hosts.

    Brock Palen
    www.umich.edu/~brockp
    Center for Advanced Computing
    bro...@umich.edu
    (734)936-1985

    Thanks for your reply. I am surprised to learn that this is expected
    behavior in OMPI. I searched the archive but did not find many
    relevant messages. I wonder why other OMPI users do not complain
    about this. Is there a way to avoid it when timing an MPI
    program?

One example of how this is handled is the NAS Parallel Benchmarks (NPB), which have been around for nearly 20 years. They:

*) turn timers on after MPI_Init and off before MPI_Finalize
*) execute at least one iteration before starting timers

Even so, with at least one of the NPB tests and at least one MPI implementation, I've seen that a single warm-up iteration was not enough: timing each iteration individually showed that several were needed before performance settled. In performance analysis, it is reasonably common to have to run multiple iterations and choose an appropriate data set size to get representative behavior.
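
The approach described above (discard warm-up iterations, then time each remaining iteration separately) can be sketched as follows. This is only an illustration under stated assumptions, not the NPB code: `time_iterations` and `LazySetupOp` are hypothetical names, and `LazySetupOp` merely simulates an operation whose first call pays a one-time setup cost. In a real MPI program, `op` would wrap the collective being measured (for example MPI_Bcast), and MPI_Wtime could serve as the clock.

```python
import time


def time_iterations(op, warmup=1, iters=5, clock=time.perf_counter):
    """Call op() `warmup` times untimed, then `iters` times timed.

    Returns a list with one elapsed time per timed iteration, so that
    any remaining warm-up effect shows up in the first entries.
    """
    for _ in range(warmup):
        op()  # untimed: absorbs one-time costs such as connection setup
    timings = []
    for _ in range(iters):
        start = clock()
        op()
        timings.append(clock() - start)
    return timings


class LazySetupOp:
    """Hypothetical stand-in for a call that is slow only the first time."""

    def __init__(self, setup_cost=0.05, steady_cost=0.001):
        self.setup_cost = setup_cost    # assumed one-time cost, in seconds
        self.steady_cost = steady_cost  # assumed per-call cost afterwards
        self.calls = 0

    def __call__(self):
        self.calls += 1
        time.sleep(self.setup_cost if self.calls == 1 else self.steady_cost)


if __name__ == "__main__":
    cold = time_iterations(LazySetupOp(), warmup=0, iters=4)
    warm = time_iterations(LazySetupOp(), warmup=1, iters=4)
    print("no warm-up :", ["%.4f" % t for t in cold])
    print("one warm-up:", ["%.4f" % t for t in warm])
```

Printing the per-iteration times rather than a single total makes it easy to see whether the chosen number of warm-up iterations was enough; if the first timed entries are still noticeably larger than the rest, increase `warmup`.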



And I would guess that in Open MPI, and maybe in other implementations too,
the time spent warming up and probing for the best way to do things
is more than compensated for during steady-state execution,
provided the number of iterations is not very small.
This seems to be required to accommodate the large variety
of hardware and software platforms and to be efficient on all of them.
Right?

AFAIK, other high-quality software (e.g. FFTW)
follows a similar rationale.

Gus Correa

