I generally build Open MPI from a source rpm (and I'm the author of that srpm's spec file). That way, Open MPI is built consistently between linux distros...

I'm running into an issue where a test works on one distro but breaks on another. I'd like to track down where the bug is (the distro or Open MPI). Since one distro is still a prerelease version, I'm quite willing to believe that it's a problem with the distro, but just in case...

I'm using InfiniBand (openib.org RC4), and presta's 'allred' and 'com' tests. Open MPI, the IB libraries, and the test are compiled from the same set of source RPMS on each distro.

I've got one machine, using Fedora Core 4 (gcc 4.0.0), vanilla linux kernel 2.6.16, and Open MPI 1.0.2.

With FC4, things work fine (for a sufficiently small number of nodes -- see ticket #40):
'mpirun -np 4 -machinefile foo allred 10 10 10'
'mpirun -np 4 -machinefile foo com -o 100'

Distro X (a pre-release version, and I don't want to violate any NDAs I don't know about...) is using GCC 4.1.0, the distro's 2.6.16 kernel, and Open MPI 1.0.2.

This time, when I try to run presta's 'allred', I receive the following:
[n1:04214] *** An error occurred in MPI_Gather
[n1:04214] *** on communicator MPI_COMM_WORLD
[n1:04214] *** MPI_ERR_ARG: invalid argument of some other kind
[n1:04214] *** MPI_ERRORS_ARE_FATAL (goodbye)
[n1:04215] *** An error occurred in MPI_Gather
[n1:04215] *** on communicator MPI_COMM_WORLD
[n1:04215] *** MPI_ERR_ARG: invalid argument of some other kind
[n1:04215] *** MPI_ERRORS_ARE_FATAL (goodbye)

Another note:  On FC4, openib works, TCP doesn't (see ticket #41).

The 'com' test ends with:
[n1:04941] *** An error occurred in MPI_Gather
[n1:04941] *** on communicator MPI_COMM_WORLD
[n1:04941] *** MPI_ERR_ARG: invalid argument of some other kind
[n1:04941] *** MPI_ERRORS_ARE_FATAL (goodbye)

note:  The error is identical for TCP and openib
note:  On FC4, openib works, TCP doesn't (see ticket #41).
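
One way to check that the failure really is the same in each case is to compare the error reports with the host names and PIDs stripped out. The following is only a minimal sketch in Python: the function name `error_signature` and the regular expression are illustrative assumptions based on the log lines above, not part of Open MPI or presta.

```python
import re
from collections import defaultdict

# Matches lines such as "[n1:04214] *** An error occurred in MPI_Gather".
# The pattern is an assumption derived from the output quoted in this message.
LINE_RE = re.compile(r"^\[(?P<host>[^:\]]+):(?P<pid>\d+)\] \*\*\* (?P<msg>.*)$")


def error_signature(log_text):
    """Return the set of distinct per-process error reports.

    Messages are grouped by (host, pid) so interleaved output from several
    processes is handled, then the host and PID are dropped so that runs
    can be compared with each other.
    """
    per_process = defaultdict(list)
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            key = (match.group("host"), match.group("pid"))
            per_process[key].append(match.group("msg"))
    return {tuple(messages) for messages in per_process.values()}


allred_log = """\
[n1:04214] *** An error occurred in MPI_Gather
[n1:04214] *** on communicator MPI_COMM_WORLD
[n1:04214] *** MPI_ERR_ARG: invalid argument of some other kind
[n1:04214] *** MPI_ERRORS_ARE_FATAL (goodbye)
[n1:04215] *** An error occurred in MPI_Gather
[n1:04215] *** on communicator MPI_COMM_WORLD
[n1:04215] *** MPI_ERR_ARG: invalid argument of some other kind
[n1:04215] *** MPI_ERRORS_ARE_FATAL (goodbye)
"""

com_log = """\
[n1:04941] *** An error occurred in MPI_Gather
[n1:04941] *** on communicator MPI_COMM_WORLD
[n1:04941] *** MPI_ERR_ARG: invalid argument of some other kind
[n1:04941] *** MPI_ERRORS_ARE_FATAL (goodbye)
"""

# True when both runs report the same sequence of messages per process.
print(error_signature(allred_log) == error_signature(com_log))
```

The same comparison could be applied to output captured from the TCP and openib runs, saved to files and read in as text.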

And yes, I'm going to try out the dev snapshots of 1.0.3 and 1.1... I'm just not there yet...

(For those tracking tickets #40 and #41 -- I know it would be nice to see whether distro X has the same behavior I see with FC4, but I don't have the hardware to do any sort of scale testing with distro X.)
--
Troy Telford
