This usually means that you have an Open MPI version mismatch between some of your nodes: on some nodes you're finding version X.Y.Z of Open MPI by default, while on other nodes you're finding version A.B.C.
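A quick way to confirm is to compare what each node picks up, e.g. by running "ompi_info | head" and "which mpirun" on every node. If it helps, below is a minimal C sketch (my own illustration, not something from this thread) that prints the Open MPI version macros from the mpi.h it was compiled against, along with the host name, so a mismatch across ranks shows up immediately. Note these macros only reflect the headers seen at compile time; comparing ompi_info output per node is still the more direct check.

    /* ompi_version_check.c -- a minimal sketch: print the Open MPI version
     * each rank was compiled against, plus the host it runs on, so a
     * per-node mismatch is easy to spot.  Assumes Open MPI's mpi.h, which
     * defines OMPI_MAJOR_VERSION / OMPI_MINOR_VERSION / OMPI_RELEASE_VERSION. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, len;
        char host[MPI_MAX_PROCESSOR_NAME];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Get_processor_name(host, &len);

        printf("rank %d on %s: compiled against Open MPI %d.%d.%d\n",
               rank, host,
               OMPI_MAJOR_VERSION, OMPI_MINOR_VERSION, OMPI_RELEASE_VERSION);

        MPI_Finalize();
        return 0;
    }

Compile with mpicc and launch with mpirun across all the nodes in your hostfile. If MPI_Init itself aborts before printing anything, fall back to comparing "ompi_info | head" on each node directly.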
On Oct 21, 2011, at 7:00 AM, devendra rai wrote:

> Hello Community,
>
> I have been struggling with this error for quite some time:
>
> It looks like MPI_INIT failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during MPI_INIT; some of which are due to configuration or environment
> problems. This failure appears to be an internal failure; here's some
> additional information (which may only be relevant to an Open MPI
> developer):
>
>   orte_grpcomm_modex failed
>   --> Returned "Data unpack would read past end of buffer" (-26) instead of
>       "Success" (0)
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> mpirun has exited due to process rank 1 with PID 18945 on
> node tik35x.ethz.ch exiting without calling "finalize". This may
> have caused other processes in the application to be
> terminated by signals sent by mpirun (as reported here).
>
> I am running this on a cluster, and this has started happening only after a
> recent rebuild of openmpi-1.4.3. Interestingly, I have the same version of
> openmpi on my PC, and the same application works fine.
>
> I have looked into this error on the web, but there is very little
> discussion of the causes or how to correct it. I asked the admin to attempt
> a re-install of openmpi, but I am not sure whether this will solve the
> problem.
>
> Can someone please help?
>
> Thanks a lot.
>
> Best,
>
> Devendra Rai
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/