Hi,

I have found the cause of the problem. When MPI_Alltoall(v) is called with many processes (300, for instance), it sets up a large number of connections in a very short time. This made my network card drop many packets, including the SYN packets, and as a result connection setup failed. Once I had figured this out, my program worked well.
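To give a rough sense of the scale involved, here is a small sketch that estimates how many TCP connections a full all-to-all exchange needs. It is only a back-of-the-envelope count, not a description of Open MPI internals. It assumes one connection per pair of ranks on different nodes and no TCP connection between ranks on the same node. The function name `tcp_pair_count` and the `procs_per_node` parameter are made up for this illustration.

```python
def tcp_pair_count(num_procs: int, procs_per_node: int = 1) -> int:
    """Estimate the TCP connections needed by a full all-to-all exchange.

    Assumptions (not taken from the original report): one connection per
    pair of ranks on different nodes, and no TCP connection between ranks
    that share a node. Nodes are filled in order, so the last node may
    hold fewer ranks than the others.
    """
    if num_procs < 0 or procs_per_node < 1:
        raise ValueError("num_procs must be >= 0 and procs_per_node >= 1")

    # Every unordered pair of ranks exchanges data in an all-to-all.
    total_pairs = num_procs * (num_procs - 1) // 2

    # Subtract the pairs that live on the same node.
    full_nodes, leftover = divmod(num_procs, procs_per_node)
    same_node_pairs = (
        full_nodes * (procs_per_node * (procs_per_node - 1) // 2)
        + leftover * (leftover - 1) // 2
    )
    return total_pairs - same_node_pairs


if __name__ == "__main__":
    for n in (150, 300):
        print(n, tcp_pair_count(n), tcp_pair_count(n, procs_per_node=8))
```

Under these assumptions, 300 ranks on separate nodes need 44850 connections, and each rank talks to 299 peers. If most of those connections are opened during the first collective call, that is a large burst of SYN packets in a short window.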
By the way, I think the error message is not very informative, which cost me a lot of time while fixing this bug :)

Thanks,
Xianjun

On May 20, 2011 at 7:26 PM, Jeff Squyres <jsquy...@cisco.com> wrote:

> I missed this email in my INBOX, sorry.
>
> Can you be more specific about what exact error is occurring? You just say
> that the application crashes...? Please send all the information listed
> here:
>
> http://www.open-mpi.org/community/help/
>
>
> On Apr 26, 2011, at 10:51 PM, 孟宪军 wrote:
>
> > It seems that the constant SOMAXCONN, which is used by the listen() system
> > call, causes this problem. Can anybody help me resolve this?
> >
> > 2011/4/25 孟宪军 <xjun.m...@gmail.com>
> > Dear all,
> >
> > As I mentioned, when I used mpirun to launch an application with the
> > parameter "np = 150" (or bigger), the application, which uses the
> > MPI_Alltoallv function, would crash. The problem recurred no matter how
> > many nodes we used.
> >
> > Open MPI version: 1.4.1 or 1.4.3
> > OS: linux redhat 2.6.32
> >
> > BTW, my nodes had enough memory to run the application, and the
> > MPI_Alltoall function worked well in my environment.
> > Has anybody met the same problem? Thanks.
> >
> > Best Regards
> >
> > _______________________________________________
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/