Hello,

Yes, I compiled Open MPI with --enable-heterogeneous. More precisely, I compiled with:

$ ./configure --prefix=/tmp/openmpi --enable-heterogeneous --enable-cxx-exceptions --enable-shared --enable-orterun-prefix-by-default
$ make all install
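A quick way to double-check such a build is to ask ompi_info, which should report whether heterogeneous support was compiled in (the exact label may vary between Open MPI versions):

$ ompi_info | grep -i hetero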
I attach the output of ompi_info from my 2 machines.

TMHieu

On Tue, Mar 2, 2010 at 1:57 PM, Jeff Squyres <jsquy...@cisco.com> wrote:
> Did you configure Open MPI with --enable-heterogeneous?
>
> On Feb 28, 2010, at 1:22 PM, TRINH Minh Hieu wrote:
>
>> Hello,
>>
>> I have some problems running MPI on my heterogeneous cluster. More
>> precisely, I get a segmentation fault when sending a large array
>> (about 10000 elements) of double from an i686 machine to an x86_64
>> machine. It does not happen with small arrays. Here is the send/recv
>> code (the complete source is in the attached file):
>> ======== code ================
>> if (me == 0) {
>>     for (int pe=1; pe<nprocs; pe++)
>>     {
>>         printf("Receiving from proc %d : ",pe); fflush(stdout);
>>         d=(double *)malloc(sizeof(double)*n);
>>         MPI_Recv(d,n,MPI_DOUBLE,pe,999,MPI_COMM_WORLD,&status);
>>         printf("OK\n"); fflush(stdout);
>>     }
>>     printf("All done.\n");
>> }
>> else {
>>     d=(double *)malloc(sizeof(double)*n);
>>     MPI_Send(d,n,MPI_DOUBLE,0,999,MPI_COMM_WORLD);
>> }
>> ======== code ================
>>
>> I get a segmentation fault with n=10000 but no error with n=1000.
>> I have 2 machines:
>>   sbtn155 : Intel Xeon, x86_64
>>   sbtn211 : Intel Pentium 4, i686
>>
>> The code is compiled on the x86_64 and the i686 machine, using
>> OpenMPI 1.4.1 installed in /tmp/openmpi:
>> [mhtrinh@sbtn211 heterogenous]$ make hetero
>> gcc -Wall -I. -std=c99 -O3 -I/tmp/openmpi/include -c hetero.c -o hetero.i686.o
>> /tmp/openmpi/bin/mpicc -Wall -I. -std=c99 -O3 -I/tmp/openmpi/include hetero.i686.o -o hetero.i686 -lm
>>
>> [mhtrinh@sbtn155 heterogenous]$ make hetero
>> gcc -Wall -I. -std=c99 -O3 -I/tmp/openmpi/include -c hetero.c -o hetero.x86_64.o
>> /tmp/openmpi/bin/mpicc -Wall -I. -std=c99 -O3 -I/tmp/openmpi/include hetero.x86_64.o -o hetero.x86_64 -lm
>>
>> I run the code using an appfile and get these errors:
>> $ cat appfile
>> --host sbtn155 -np 1 hetero.x86_64
>> --host sbtn155 -np 1 hetero.x86_64
>> --host sbtn211 -np 1 hetero.i686
>>
>> $ mpirun -hetero --app appfile
>> Input array length :
>> 10000
>> Receiving from proc 1 : OK
>> Receiving from proc 2 : [sbtn155:26386] *** Process received signal ***
>> [sbtn155:26386] Signal: Segmentation fault (11)
>> [sbtn155:26386] Signal code: Address not mapped (1)
>> [sbtn155:26386] Failing at address: 0x200627bd8
>> [sbtn155:26386] [ 0] /lib64/libpthread.so.0 [0x3fa4e0e540]
>> [sbtn155:26386] [ 1] /tmp/openmpi/lib/openmpi/mca_pml_ob1.so [0x2aaaad8d7908]
>> [sbtn155:26386] [ 2] /tmp/openmpi/lib/openmpi/mca_btl_tcp.so [0x2aaaae2fc6e3]
>> [sbtn155:26386] [ 3] /tmp/openmpi/lib/libopen-pal.so.0 [0x2aaaaafe39db]
>> [sbtn155:26386] [ 4] /tmp/openmpi/lib/libopen-pal.so.0(opal_progress+0x9e) [0x2aaaaafd8b9e]
>> [sbtn155:26386] [ 5] /tmp/openmpi/lib/openmpi/mca_pml_ob1.so [0x2aaaad8d4b25]
>> [sbtn155:26386] [ 6] /tmp/openmpi/lib/libmpi.so.0(MPI_Recv+0x13b) [0x2aaaaab30f9b]
>> [sbtn155:26386] [ 7] hetero.x86_64(main+0xde) [0x400cbe]
>> [sbtn155:26386] [ 8] /lib64/libc.so.6(__libc_start_main+0xf4) [0x3fa421e074]
>> [sbtn155:26386] [ 9] hetero.x86_64 [0x400b29]
>> [sbtn155:26386] *** End of error message ***
>> --------------------------------------------------------------------------
>> mpirun noticed that process rank 0 with PID 26386 on node sbtn155
>> exited on signal 11 (Segmentation fault).
>> --------------------------------------------------------------------------
>>
>> Am I missing an option needed to run on a heterogeneous cluster?
>> Do MPI_Send/MPI_Recv have a size limit when used on a heterogeneous
>> cluster?
>> Thanks for your help. Regards
>>
>> --
>> ============================================
>>    M. TRINH Minh Hieu
>>    CEA, IBEB, SBTN/LIRM,
>>    F-30207 Bagnols-sur-Cèze, FRANCE
>> ============================================
>>
>> <hetero.c.bz2>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
ompi_info.x86_64
Description: Binary data
ompi_info.i686
Description: Binary data
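For readers who do not have the hetero.c attachment: below is a minimal, self-contained sketch of the failing test, reconstructed from the snippet quoted above. The send/recv logic follows the quoted code; reading the array length on stdin at rank 0 and broadcasting it to the other ranks is an assumption about how the original program distributes n.

/* hetero.c -- minimal sketch, not the original attachment.
 * Build: mpicc -Wall -std=c99 -O3 hetero.c -o hetero -lm */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    int me, nprocs, n = 10000;
    double *d;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &me);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    if (me == 0) {
        printf("Input array length :\n");
        if (scanf("%d", &n) != 1)
            n = 10000;               /* fall back to the failing size */
    }
    /* Assumption: every rank needs the same n, so broadcast it from rank 0. */
    MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);

    if (me == 0) {
        for (int pe = 1; pe < nprocs; pe++) {
            printf("Receiving from proc %d : ", pe); fflush(stdout);
            d = (double *)malloc(sizeof(double) * n);
            MPI_Recv(d, n, MPI_DOUBLE, pe, 999, MPI_COMM_WORLD, &status);
            printf("OK\n"); fflush(stdout);
            free(d);
        }
        printf("All done.\n");
    } else {
        d = (double *)malloc(sizeof(double) * n);
        for (int i = 0; i < n; i++)
            d[i] = (double)i;        /* initialize so the payload is defined */
        MPI_Send(d, n, MPI_DOUBLE, 0, 999, MPI_COMM_WORLD);
        free(d);
    }

    MPI_Finalize();
    return 0;
}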