On Fri, 02 Jun 2006 13:37:07 -0600, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:

> Troy --
>
> Just to make sure I understand the issues:
>
> - 1.1
>   - presta com works fine
>   - presta allred fails with the MPI_Gather error
> - 1.0.3
>   - presta com fails with MPI_Gather error
>   - presta allred fails with the MPI_Gather error
>
> And these all *only* fail on the pre-production Linux version you've
> got; they all pass on FC4.
>
> Is that correct?

Quite correct. (Well, with caveats -- FC4 has shown some scaling issues, which are tracked in tickets #40 and #41, but Open MPI on FC4 works fine with -np 4.)

If I didn't say so already, here's what I would add:
* If I add -mca btl tcp,sm,self (effectively disabling the openib BTL), allred works fine. If I use -mca btl openib,sm,self, it breaks.
* If I use -mca btl tcp,sm,self with com, the error is the same as with -mca btl openib,sm,self. (And com works fine in either case with 1.1, but breaks with 1.0.3.)

A bit of additional info: I am able to run linpack (hpl), HPCC, and IMB on Open MPI 1.1, 1.0.3, and 1.0.2 on this pre-production distro.

All tests were done on two nodes with two CPUs each (-np 4).

-----Original Message-----
From: users-boun...@open-mpi.org
[mailto:users-boun...@open-mpi.org] On Behalf Of Troy Telford
Sent: Friday, June 02, 2006 12:46 PM
To: Open MPI Users
Subject: Re: [OMPI users] openib /compiler issue?

On Thu, 01 Jun 2006 17:49:53 -0600, Troy Telford
<ttelf...@linuxnetworx.com> wrote:

> the 'com' test ends with:
> [n1:04941] *** An error occurred in MPI_Gather
> [n1:04941] *** on communicator MPI_COMM_WORLD
> [n1:04941] *** MPI_ERR_ARG: invalid argument of some other kind
> [n1:04941] *** MPI_ERRORS_ARE_FATAL (goodbye)
>
> And yes, I'm going to try out the dev snapshots of 1.0.3 and 1.1... I'm
> just not there yet...

I've now tried it on 1.0.3 and 1.1 nightly builds:
***presta 'com'***
1.1 works fine (hooray!!!)

1.0.3 doesn't work fine (booo!!!!)
[n1:28313] *** An error occurred in MPI_Gather
[n1:28313] *** on communicator MPI_COMM_WORLD
[n1:28313] *** MPI_ERR_ARG: invalid argument of some other kind
[n1:28313] *** MPI_ERRORS_ARE_FATAL (goodbye)

***presta 'allred' (allreduce)***
1.0.3 has the following error:
mpirun -np 4 -machinefile machines -prefix $MPIHOME allred 10 10 10
[n1:28366] *** An error occurred in MPI_Gather
[n1:28366] *** on communicator MPI_COMM_WORLD
[n1:28366] *** MPI_ERR_ARG: invalid argument of some other kind
[n1:28366] *** MPI_ERRORS_ARE_FATAL (goodbye)
[n1:28367] *** An error occurred in MPI_Gather
[n1:28367] *** on communicator MPI_COMM_WORLD
[n1:28367] *** MPI_ERR_ARG: invalid argument of some other kind
[n1:28367] *** MPI_ERRORS_ARE_FATAL (goodbye)

1.1 has the following error:
mpirun -np 4 -machinefile machines -prefix $MPIHOME allred 10 10 10
[n1:28536] *** An error occurred in MPI_Gather
[n1:28537] *** An error occurred in MPI_Gather
[n1:28537] *** on communicator MPI_COMM_WORLD
[n1:28537] *** MPI_ERR_ARG: invalid argument of some other kind
[n1:28537] *** MPI_ERRORS_ARE_FATAL (goodbye)
[n1:28536] *** on communicator MPI_COMM_WORLD
[n1:28536] *** MPI_ERR_ARG: invalid argument of some other kind
[n1:28536] *** MPI_ERRORS_ARE_FATAL (goodbye)

--
Troy Telford
