Glen,

Thanks for spending the time benchmarking Open MPI and for sending us the
feedback. We know we have some issues in the 1.0.2 version, more precisely
with the collective communications. We just looked inside the CMAQ code, and
there are a lot of Reduce and Allreduce calls. Since it looks like the
collectives are used intensively, it's no surprise that 1.0.2a4 is slower
than MPICH (I would expect the same behaviour with both MPICH1 and MPICH2).
The collectives are now fixed in the nightly build, and we are working on
getting them into the next stable release. Until then, if you could redo the
benchmark with one of the nightly builds, that would be very useful. I'm
confident that the results will improve considerably.
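
For reference, here is a minimal C sketch (purely illustrative, not actual
CMAQ code) of the kind of allreduce-heavy pattern we are talking about,
where every iteration of the loop ends in a global reduction:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, step;
        double local, global;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        local = (double)rank;
        global = 0.0;

        /* Solver-style loop: every iteration ends with a global sum,
           so the collective code path is hit on every step. */
        for (step = 0; step < 1000; step++) {
            local += 1.0;
            MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM,
                          MPI_COMM_WORLD);
        }

        if (rank == 0)
            printf("final global sum: %f\n", global);

        MPI_Finalize();
        return 0;
    }

With that kind of pattern over gigabit ethernet, the cost of each collective
adds up quickly as the process count grows, which would be consistent with
the slowdown you saw on the 36 cpu run.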

For the second problem, Brian is taking a look. He has identified the
problem; we just have to figure out how to solve it correctly. It will go
into the trunk shortly.

  Thanks,
    george.

On Thu, 2 Feb 2006, Glen Kaukola wrote:

> Hi everyone,
>
> I recently took Open MPI (1.0.2a4) for a spin and thought you all might
> like to see how it's currently stacking up against MPICH (1.2.7p1).  The
> benchmark I used was the EPA's CMAQ (Community Multiscale Air Quality)
> model.
>
> Now bear in mind my results aren't completely scientific.  For one thing
> I'd need to run a series of jobs and take the averages.  Forgive me if
> I'm too lazy to do that.  I also didn't go through the trouble of
> completely isolating my jobs while they were running.  However, I did
> monitor them pretty closely and I'm fairly certain no jobs from other
> users crept in on the machines I was using.
>
> Anyway, without further ado, here are my results (in h:mm):
>
> Open MPI
> 1 cpu job: 2:38
> 2 cpu job: 1:26
> 4 cpu job: 1:38
> 8 cpu job: 1:08
> 36 cpu job: 3:09
>
> MPICH
> 1 cpu job: 2:38
> 2 cpu job: 1:27
> 4 cpu job: 0:48
> 8 cpu job: 0:32
>
> And while Open MPI does seem a bit slower, one real nice thing I can say
> is that a 16+ cpu job runs without a hitch.  I could never get away with
> that while using MPICH, as the jobs would just crash.  Whether MPICH is
> at fault, or the CMAQ code is buggy, or gigabit ethernet just isn't good
> enough, I really couldn't say.  But Open MPI sure doesn't seem to have
> that problem.
>
> It's also rather odd how the 4 cpu Open MPI job takes longer than the 2
> cpu Open MPI job.  In fact, that time is slightly faster than the first
> time I ran a 4 cpu Open MPI job (I couldn't believe the result, so I
> reran that one).
>
> And on a totally unrelated note, after swapping out MPICH for Open MPI,
> I can't seem to background my scripts.  When I do, my shell (bash) tells
> me the job has stopped.  Somewhat annoying.
>
> Anyway, keep up the good work.  I'll be paying close attention, and
> hopefully see some speedups in the not too distant future.
>
>
> Glen

"We must accept finite disappointment, but we must never lose infinite
hope."
                                  Martin Luther King
