Hello Terry, Thanks for your answer.
2010/5/20 Terry Dontje <terry.don...@oracle.com>:

> Olivier Riff wrote:
>>
>> Hello,
>>
>> I assume this question has already been discussed many times, but I cannot
>> find a solution to my problem on the Internet.
>> It is about the buffer size limit of MPI_Send and MPI_Recv on a
>> heterogeneous system (32-bit laptop / 64-bit cluster).
>> My configuration is: Open MPI 1.4, configured with:
>> --without-openib --enable-heterogeneous --enable-mpi-threads
>> The program is launched on a laptop (32-bit Mandriva 2008) which
>> distributes tasks to a cluster of 70 processors (64-bit Red Hat Enterprise
>> distribution).
>> I have to send buffers of various sizes, from a few bytes up to 30 MB.
>
> You really want to get your program running without the tcp_eager_limit
> set if you want a better usage of memory. I believe the crash has something
> to do with the rendezvous protocol in OMPI. Have you narrowed this failure
> down to a simple MPI program? Also I noticed that you're configuring with
> --enable-mpi-threads, have you tried configuring without that option?

-> No, unfortunately I have not yet narrowed this behaviour down to a simple
MPI program. I think I will have to do it if I do not find a solution in the
next few days; a first sketch of such a reproducer is at the end of this mail.
I will also run the test without the --enable-mpi-threads configuration.

>> I tested the following commands:
>>
>> 1) mpirun -v -machinefile machinefile.txt MyMPIProgram
>> -> crashes on the client side (64-bit Red Hat Enterprise) when the sent
>> buffer size is > 65536.
>>
>> 2) mpirun --mca btl_tcp_eager_limit 30000000 -v -machinefile
>> machinefile.txt MyMPIProgram
>> -> works, but generates gigantic memory consumption on the 32-bit machine
>> side after MPI_Recv. Memory consumption goes from 800 MB to 2.1 GB after
>> receiving about 20 KB from each of the 70 clients (a total of about
>> 1.4 MB). This makes my program crash later because I have no more memory
>> to allocate new structures. I read in an Open MPI forum thread that
>> setting btl_tcp_eager_limit to a huge value explains this huge memory
>> consumption when a sent message does not have a preposted ready recv.
>> Also, after all messages have been received and there is no more traffic
>> activity, the memory consumed remains at 2.1 GB... and I do not
>> understand why.
>
> Are the 70 clients all on different nodes? I am curious if the 2.1GB is
> due to the SM BTL or possibly a leak in the TCP BTL.

No, the 70 clients are spread over only 9 nodes. In fact there are 72
clients: nine 8-processor machines. The 2.1 GB memory consumption appears
when I sequentially try to read the result from each of the 72 clients (a
for loop from 1 to 72 calling MPI_Recv). I assume that many clients have
already sent their result while the server has not yet called MPI_Recv for
the corresponding rank (see the preposted-receive sketch at the end of this
mail).

>> What is the best way to proceed in order to have a working program that
>> also has a small memory consumption (performance can be lower)?
>> I tried to play with the MCA parameters btl_tcp_sndbuf and btl_tcp_rcvbuf,
>> but without success.
>>
>> Thanks in advance for your help.
>>
>> Best regards,
>>
>> Olivier
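
For reference, this is roughly the minimal test program I have in mind to
narrow down the crash on sends larger than 65536 bytes. It is only a sketch
on my side, not my real application: the buffer sizes, the MPI_CHAR payload
and the file name send_sizes.c are assumptions, chosen so that the sizes
straddle the limit where the crash appears.

/* send_sizes.c - minimal sketch of a heterogeneous send/recv test
 * (assumed reproducer, not my actual application code).
 * Rank 0 (the 32-bit laptop) sends buffers of increasing size to
 * every other rank (the 64-bit cluster processes). */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    /* sizes chosen to straddle the 65536-byte limit where the crash appears */
    const int sizes[] = { 1024, 65536, 65537, 1 << 20, 30 * (1 << 20) };
    const int nsizes = sizeof(sizes) / sizeof(sizes[0]);
    int rank, size, s, dst;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    for (s = 0; s < nsizes; s++) {
        char *buf = calloc(sizes[s], 1);
        if (rank == 0) {
            for (dst = 1; dst < size; dst++)
                MPI_Send(buf, sizes[s], MPI_CHAR, dst, s, MPI_COMM_WORLD);
            printf("rank 0: sent %d bytes to %d ranks\n", sizes[s], size - 1);
        } else {
            MPI_Recv(buf, sizes[s], MPI_CHAR, 0, s, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        }
        free(buf);
        MPI_Barrier(MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}

I would build and launch it exactly like my application, e.g.
"mpicc send_sizes.c -o send_sizes" and then
"mpirun -v -machinefile machinefile.txt ./send_sizes", once with and once
without the btl_tcp_eager_limit setting.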
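Regarding the "preposted ready recv" point: if it helps the discussion, this
is the kind of change I am considering on the server side. Instead of the
blocking MPI_Recv loop from rank 1 to 72, I would post all the non-blocking
receives before the clients start sending results back, so that every
incoming message finds a matching receive and goes directly into my buffer
instead of being queued as an unexpected message. This is only a sketch under
assumptions: RESULT_BYTES, RESULT_TAG and gather_results are placeholder
names (not my real protocol), and the clients are assumed to be ranks 1 to
nclients.

/* Sketch: prepost one receive per client on the server (rank 0).
 * RESULT_BYTES and RESULT_TAG are placeholders for the real protocol;
 * 'results' must hold nclients * RESULT_BYTES bytes. */
#include <mpi.h>
#include <stdlib.h>

#define RESULT_BYTES 20480   /* ~20 KB per client, as in my measurements */
#define RESULT_TAG   42

void gather_results(int nclients, char *results)
{
    MPI_Request *reqs = malloc(nclients * sizeof(MPI_Request));
    int i;

    /* Post all receives up front: a client that finishes early now has a
     * matching receive waiting, so the library does not have to buffer its
     * eagerly sent message in the unexpected-message queue. */
    for (i = 0; i < nclients; i++)
        MPI_Irecv(results + (size_t)i * RESULT_BYTES, RESULT_BYTES, MPI_CHAR,
                  i + 1, RESULT_TAG, MPI_COMM_WORLD, &reqs[i]);

    /* ... distribute the work to the clients here, as before ... */

    /* Wait for every result, in whatever order the clients finish. */
    MPI_Waitall(nclients, reqs, MPI_STATUSES_IGNORE);
    free(reqs);
}

The clients would keep their MPI_Send calls as today; only the order on the
server changes (post the receives first, then trigger the work, then
MPI_Waitall).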