Ah, good. On the setup that fails, could you use gdb to find the line number where it is dividing by zero? It could be an uninitialized variable that gcc inits one way and icc inits another.
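[A sketch of one way to do that, assuming the setup from this thread (test.c, the intel13 build). The exact commands are a suggestion, not something run here; the core-file name depends on the kernel's core_pattern, and line numbers inside mca_btl_openib.so will only appear if Open MPI itself was built with debug symbols (e.g. configured with --enable-debug):]

```shell
# Rebuild the testcase with debug info so gdb can map addresses to source lines
mpicc -g ./test.c

# Allow a core dump on the SIGFPE, reproduce the crash, then load the core
ulimit -c unlimited
mpirun -n 1 ./a.out      # crashes with the integer divide-by-zero
gdb ./a.out core.*       # core file name varies by system

# Inside gdb:
#   bt        - print the stack (frame 1 should be mca_btl_openib_add_procs)
#   frame 1   - select that frame
#   list      - show the failing source line, if debug symbols are present
```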
On May 27, 2014, at 4:49 AM, Alain Miniussi <alain.miniu...@oca.eu> wrote:

> So it's working with a gcc compiled openmpi:
>
> [alainm@gurney mpi]$ /softs/openmpi-1.8.1-gnu447/bin/mpicc --showme
> gcc -I/softs/openmpi-1.8.1-gnu447/include -pthread -Wl,-rpath
> -Wl,/softs/openmpi-1.8.1-gnu447/lib -Wl,--enable-new-dtags
> -L/softs/openmpi-1.8.1-gnu447/lib -lmpi
> [alainm@gurney mpi]$ /softs/openmpi-1.8.1-gnu447/bin/mpicc ./test.c
> [alainm@gurney mpi]$ /softs/openmpi-1.8.1-gnu447/bin/mpiexec -n 2 ./a.out
> [alainm@gurney mpi]$ ldd ./a.out
>     linux-vdso.so.1 =>  (0x00007fffb47ff000)
>     libmpi.so.1 => /softs/openmpi-1.8.1-gnu447/lib/libmpi.so.1 (0x00002aaee80c1000)
>     libpthread.so.0 => /lib64/libpthread.so.0 (0x0000003bd9e00000)
>     libc.so.6 => /lib64/libc.so.6 (0x0000003bd9200000)
>     libopen-rte.so.7 => /softs/openmpi-1.8.1-gnu447/lib/libopen-rte.so.7 (0x00002aaee83b8000)
>     libopen-pal.so.6 => /softs/openmpi-1.8.1-gnu447/lib/libopen-pal.so.6 (0x00002aaee8630000)
>     libnuma.so.1 => /usr/lib64/libnuma.so.1 (0x0000003bd9600000)
>     libdl.so.2 => /lib64/libdl.so.2 (0x00002aaee8904000)
>     librt.so.1 => /lib64/librt.so.1 (0x0000003bda600000)
>     libnsl.so.1 => /lib64/libnsl.so.1 (0x0000003beb000000)
>     libutil.so.1 => /lib64/libutil.so.1 (0x0000003bea000000)
>     libm.so.6 => /lib64/libm.so.6 (0x0000003bd9a00000)
>     /lib64/ld-linux-x86-64.so.2 (0x0000003bd8e00000)
> [alainm@gurney mpi]$ ./a.out
> [alainm@gurney mpi]$
>
> So it seems to be specific to Intel's compiler.
>
> On 26/05/2014 17:35, Ralph Castain wrote:
>> If you wouldn't mind, yes - let's see if it is a problem with icc. We know
>> some versions have bugs, though this may not be the issue here
>>
>> On May 26, 2014, at 7:39 AM, Alain Miniussi <alain.miniu...@oca.eu> wrote:
>>
>>> Hi,
>>>
>>> Did that too, with the same result:
>>>
>>> [alainm@tagir mpi]$ mpirun -n 1 ./a.out
>>> [tagir:05123] *** Process received signal ***
>>> [tagir:05123] Signal: Floating point exception (8)
>>> [tagir:05123] Signal code: Integer divide-by-zero (1)
>>> [tagir:05123] Failing at address: 0x2adb507b3d9f
>>> [tagir:05123] [ 0] /lib64/libpthread.so.0[0x30f920f710]
>>> [tagir:05123] [ 1] /softs/openmpi-1.8.1-intel13/lib/openmpi/mca_btl_openib.so(mca_btl_openib_add_procs+0xe9f)[0x2adb507b3d9f]
>>> [tagir:05123] [ 2] /softs/openmpi-1.8.1-intel13/lib/openmpi/mca_bml_r2.so(+0x1481)[0x2adb505a7481]
>>> [tagir:05123] [ 3] /softs/openmpi-1.8.1-intel13/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_add_procs+0xa8)[0x2adb51af02f8]
>>> [tagir:05123] [ 4] /softs/openmpi-1.8.1-intel13/lib/libmpi.so.1(ompi_mpi_init+0x9f6)[0x2adb4b78b236]
>>> [tagir:05123] [ 5] /softs/openmpi-1.8.1-intel13/lib/libmpi.so.1(MPI_Init+0xef)[0x2adb4b7ad74f]
>>> [tagir:05123] [ 6] ./a.out[0x400dd1]
>>> [tagir:05123] [ 7] /lib64/libc.so.6(__libc_start_main+0xfd)[0x30f8a1ed1d]
>>> [tagir:05123] [ 8] ./a.out[0x400cc9]
>>> [tagir:05123] *** End of error message ***
>>> --------------------------------------------------------------------------
>>> mpirun noticed that process rank 0 with PID 5123 on node tagir exited on
>>> signal 13 (Broken pipe).
>>> --------------------------------------------------------------------------
>>> [alainm@tagir mpi]$
>>>
>>> do you want me to try a gcc build ?
>>>
>>> Alain
>>>
>>> On 26/05/2014 16:09, Ralph Castain wrote:
>>>> Strange - I note that you are running these as singletons. Can you try
>>>> running it under mpirun?
>>>>
>>>> mpirun -n 1 ./a.out
>>>>
>>>> just to see if it is the singleton that is causing the problem, or
>>>> something in the openib btl itself.
>>>>
>>>> On May 26, 2014, at 6:59 AM, Alain Miniussi <alain.miniu...@oca.eu> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I have a failure with the following minimalistic testcase:
>>>>> $: more ./test.c
>>>>> #include "mpi.h"
>>>>>
>>>>> int main(int argc, char* argv[]) {
>>>>>     MPI_Init(&argc,&argv);
>>>>>     MPI_Finalize();
>>>>>     return 0;
>>>>> }
>>>>> $: mpicc -v
>>>>> icc version 13.1.1 (gcc version 4.4.7 compatibility)
>>>>> $: mpicc ./test.c
>>>>> $: ./a.out
>>>>> [tagir:02855] *** Process received signal ***
>>>>> [tagir:02855] Signal: Floating point exception (8)
>>>>> [tagir:02855] Signal code: Integer divide-by-zero (1)
>>>>> [tagir:02855] Failing at address: 0x2aef6e5b2d9f
>>>>> [tagir:02855] [ 0] /lib64/libpthread.so.0[0x30f920f710]
>>>>> [tagir:02855] [ 1] /softs/openmpi-1.8.1-intel13/lib/openmpi/mca_btl_openib.so(mca_btl_openib_add_procs+0xe9f)[0x2aef6e5b2d9f]
>>>>> [tagir:02855] [ 2] /softs/openmpi-1.8.1-intel13/lib/openmpi/mca_bml_r2.so(+0x1481)[0x2aef6e3a6481]
>>>>> [tagir:02855] [ 3] /softs/openmpi-1.8.1-intel13/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_add_procs+0xa8)[0x2aef6f8ef2f8]
>>>>> [tagir:02855] [ 4] /softs/openmpi-1.8.1-intel13/lib/libmpi.so.1(ompi_mpi_init+0x9f6)[0x2aef69572236]
>>>>> [tagir:02855] [ 5] /softs/openmpi-1.8.1-intel13/lib/libmpi.so.1(MPI_Init+0xef)[0x2aef6959474f]
>>>>> [tagir:02855] [ 6] ./a.out[0x400dd1]
>>>>> [tagir:02855] [ 7] /lib64/libc.so.6(__libc_start_main+0xfd)[0x30f8a1ed1d]
>>>>> [tagir:02855] [ 8] ./a.out[0x400cc9]
>>>>> [tagir:02855] *** End of error message ***
>>>>> $:
>>>>>
>>>>> Versions info:
>>>>> $: mpicc -v
>>>>> icc version 13.1.1 (gcc version 4.4.7 compatibility)
>>>>> $: ldd ./a.out
>>>>>     linux-vdso.so.1 =>  (0x00007fffbb197000)
>>>>>     libmpi.so.1 => /softs/openmpi-1.8.1-intel13/lib/libmpi.so.1 (0x00002b20262ee000)
>>>>>     libm.so.6 => /lib64/libm.so.6 (0x00000030f8e00000)
>>>>>     libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00000030ff200000)
>>>>>     libpthread.so.0 => /lib64/libpthread.so.0 (0x00000030f9200000)
>>>>>     libc.so.6 => /lib64/libc.so.6 (0x00000030f8a00000)
>>>>>     libdl.so.2 => /lib64/libdl.so.2 (0x00000030f9600000)
>>>>>     libopen-rte.so.7 => /softs/openmpi-1.8.1-intel13/lib/libopen-rte.so.7 (0x00002b202660d000)
>>>>>     libopen-pal.so.6 => /softs/openmpi-1.8.1-intel13/lib/libopen-pal.so.6 (0x00002b20268a1000)
>>>>>     libnuma.so.1 => /usr/lib64/libnuma.so.1 (0x00002b2026ba6000)
>>>>>     librt.so.1 => /lib64/librt.so.1 (0x00000030f9e00000)
>>>>>     libnsl.so.1 => /lib64/libnsl.so.1 (0x0000003109800000)
>>>>>     libutil.so.1 => /lib64/libutil.so.1 (0x000000310aa00000)
>>>>>     libimf.so => /softs/intel/composer_xe_2013.3.163/compiler/lib/intel64/libimf.so (0x00002b2026db0000)
>>>>>     libsvml.so => /softs/intel/composer_xe_2013.3.163/compiler/lib/intel64/libsvml.so (0x00002b202726d000)
>>>>>     libirng.so => /softs/intel/composer_xe_2013.3.163/compiler/lib/intel64/libirng.so (0x00002b2027c37000)
>>>>>     libintlc.so.5 => /softs/intel/composer_xe_2013.3.163/compiler/lib/intel64/libintlc.so.5 (0x00002b2027e3e000)
>>>>>     /lib64/ld-linux-x86-64.so.2 (0x00000030f8600000)
>>>>> $:
>>>>>
>>>>> I tried to google the issue, and saw something regarding an old
>>>>> vectorization bug with the Intel compiler, but that was a long time ago and
>>>>> seemed to be fixed for 1.6.x.
>>>>> Also, "make check" went fine ???
>>>>>
>>>>> Any idea ?
>>>>>
>>>>> Cheers
>>>>>
>>>>> --
>>>>> ---
>>>>> Alain
>>>>>
>>>>> _______________________________________________
>>>>> users mailing list
>>>>> us...@open-mpi.org
>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
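[Not a step suggested in the thread, but since both backtraces point into mca_btl_openib_add_procs, one quick way to confirm the openib btl is at fault is to exclude it at run time with Open MPI's standard MCA selection syntax; if the SIGFPE disappears, the divide-by-zero lives in that component. A sketch, using the binary from the thread:]

```shell
# Run the same test with the openib btl excluded ("^" negates the list)
mpirun -n 1 --mca btl ^openib ./a.out

# Or restrict explicitly to loopback, shared memory and TCP transports
mpirun -n 1 --mca btl self,sm,tcp ./a.out
```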