Hello,

I'm playing with a copy of svn7132 that built and installed just fine. At first everything seemed OK; unlike earlier, it now runs on mvapi automagically :-)
But then a small test program failed, and then another. After scratching my head for a while I realised that the pattern was this: as soon as I had two ranks sharing one node and used "mpi_leave_pinned 1", it broke (segfaulted).

Here is a bidirectional point-to-point test running two ranks on the same host (this one actually starts but segfaults halfway through):

NODEFILE is "n50 n50"

[cap@n50 mpi]$ mpirun --machinefile $PBS_NODEFILE --mca mpi_leave_pinned 1 --np 2 mpibibench.ompi7132
Using Zero pattern.
starting _bidirect_ lat-bw test.
Latency: 1.8 µsec (total)  Bandwidth: 0.0 bytes/s (0 x 10000)
Latency: 2.0 µsec (total)  Bandwidth: 1.0 Mbytes/s (1 x 10000)
Latency: 2.0 µsec (total)  Bandwidth: 2.0 Mbytes/s (2 x 10000)
Latency: 1.9 µsec (total)  Bandwidth: 4.2 Mbytes/s (4 x 10000)
Latency: 2.0 µsec (total)  Bandwidth: 8.1 Mbytes/s (8 x 10000)
Latency: 2.2 µsec (total)  Bandwidth: 14.8 Mbytes/s (16 x 10000)
Latency: 2.0 µsec (total)  Bandwidth: 31.7 Mbytes/s (32 x 10000)
Latency: 2.2 µsec (total)  Bandwidth: 57.3 Mbytes/s (64 x 10000)
Latency: 2.2 µsec (total)  Bandwidth: 114.3 Mbytes/s (128 x 10000)
Latency: 2.3 µsec (total)  Bandwidth: 224.8 Mbytes/s (256 x 10000)
Latency: 2.8 µsec (total)  Bandwidth: 369.8 Mbytes/s (512 x 10000)
mpirun noticed that job rank 0 with PID 5879 on node "n50" exited on signal 11.
1 additional process aborted (not shown)

From dmesg:

mpibibench.ompi[5879]: segfault at 0000000000000000 rip 0000000000000000 rsp 0000007fbfffe8e8 error 14

Running on more than one node seems to die instantly (simple all-to-all app):

NODEFILE is "n50 n50 n49 n49"

[cap@n50 mpi]$ mpirun --machinefile $PBS_NODEFILE --mca mpi_leave_pinned 1 --np 4 alltoall.ompi7132
mpirun noticed that job rank 3 with PID 27857 on node "n49" exited on signal 11.
3 additional processes aborted (not shown)

with a similar segfault in dmesg.

Either running with one process per node or skipping mpi_leave_pinned makes it work 100%.

Is this expected?
tia,
 Peter

System config:
OS: centos-4.1 x86_64 2.6.9-11smp (el4u1)
ompi: svn7132 vpath build with recommended libtool/autoconf/automake
compilers: 64-bit icc/ifort 8.1-029
configure: ./configure --prefix=xxx --with-btl-mvapi=yyy --disable-cxx --disable-f90 --disable-io-romio

--
------------------------------------------------------------
Peter Kjellström | National Supercomputer Centre | Sweden | http://www.nsc.liu.se