Olivier Riff wrote:
Hello,
I assume this question has already been discussed many times, but I
cannot find a solution to my problem on the Internet.
It is about the buffer size limit of MPI_Send and MPI_Recv on a
heterogeneous system (32-bit laptop / 64-bit cluster).
My configuration is:
Open MPI 1.4, configured with: --without-openib --enable-heterogeneous
--enable-mpi-threads
The program is launched from a laptop (32-bit Mandriva 2008), which
distributes tasks to a cluster of 70 processors (64-bit Red Hat
Enterprise distribution).
I have to send buffers of various sizes, from a few bytes up to 30 MB.
You really want to get your program running without btl_tcp_eager_limit
set if you want better memory usage. I believe the crash has
something to do with the rendezvous protocol in OMPI. Have you narrowed
this failure down to a simple MPI program? Also, I noticed that you're
configuring with --enable-mpi-threads; have you tried configuring
without that option?
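A rough sketch of how that test might be set up, assuming a source build of Open MPI 1.4. The install prefix below is a hypothetical path, and on a heterogeneous setup the same build would presumably be needed on both the laptop and the cluster nodes, with MyMPIProgram recompiled against it. The configure options and the mpirun arguments are the ones from the original post.

```shell
# Configure Open MPI 1.4 with the original options minus --enable-mpi-threads.
# $HOME/ompi-1.4-nothreads is a hypothetical install prefix.
./configure --prefix="$HOME/ompi-1.4-nothreads" \
    --without-openib --enable-heterogeneous
make all install

# Rerun with the default eager limit (no btl_tcp_eager_limit override),
# using the mpirun from the new install.
"$HOME/ompi-1.4-nothreads/bin/mpirun" -v -machinefile machinefile.txt MyMPIProgram
```

If the crash above 65536 disappears with this build, that would point at the thread support rather than at the rendezvous protocol itself.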
I tested the following commands:
1) mpirun -v -machinefile machinefile.txt MyMPIProgram
-> crashes on the client side (64-bit Red Hat Enterprise) when the sent
buffer size is > 65536.
2) mpirun --mca btl_tcp_eager_limit 30000000 -v -machinefile
machinefile.txt MyMPIProgram
-> works, but generates gigantic memory consumption on the 32-bit
machine after MPI_Recv. Memory consumption goes from 800 MB to 2.1 GB
after receiving about 20 KB from each of the 70 clients (a total of
about 1.4 MB). This makes my program crash later because I have no
more memory to allocate new structures. I read in an Open MPI forum
thread that setting btl_tcp_eager_limit to a huge value explains this
huge memory consumption when a sent message does not have a preposted
ready recv. Also, after all messages have been received and there is
no more traffic, the memory consumed remains at 2.1 GB, and I do not
understand why.
Are the 70 clients all on different nodes? I am curious if the 2.1GB is
due to the SM BTL or possibly a leak in the TCP BTL.
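One way to separate those two possibilities, as a sketch only: restrict the run to the tcp and self BTLs so that the shared-memory (sm) BTL is never used, then check whether memory on the laptop still grows to 2.1 GB. The `btl` MCA parameter and the `tcp`, `sm`, and `self` component names are standard Open MPI; the remaining arguments are copied from command 2 in the original post.

```shell
# Run with only the tcp and self BTLs; the sm BTL is excluded.
mpirun --mca btl tcp,self --mca btl_tcp_eager_limit 30000000 \
    -v -machinefile machinefile.txt MyMPIProgram

# List the TCP BTL parameters and their current values for this install.
ompi_info --param btl tcp
```

If the memory growth is unchanged with sm excluded, the TCP BTL becomes the more likely suspect.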
What is the best way to get a working program that also has low
memory consumption (lower speed is acceptable)?
I tried to play with the MCA parameters btl_tcp_sndbuf and
btl_tcp_rcvbuf, but without success.
Thanks in advance for your help.
Best regards,
Olivier
--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.650.633.7054
Oracle - Performance Technologies
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com