Dear *,

I'm currently testing OpenMPI 1.0a1r7026 on a 32-node dual-Athlon cluster running Linux 2.6.6 with Myrinet (GM 2.1.1 on M3M-PCI64C boards). gcc is 3.3.3, and each node has 4GB RAM. Compilation from the snapshot and startup went fine; congratulations, that is surely not trivial.

Point-to-point tests (mpptest) pass. However, a rather simple benchmark that tests the performance of collective operations (not PMB, but a custom one) seems to deadlock. So far, I have found:

- using btl 'gm' (default)
  o 16 processes on 8 nodes: "deadlock" in Allreduce
  o 2 processes on 2 nodes: "deadlock" in Reduce_scatter
- explicitly using btl 'tcp'
  o 2 processes on 2 nodes: "deadlock" in Reduce_scatter

Additionally, I sporadically receive SEGVs using gm:

Core was generated by `collmeas_open-mpi'.
Program terminated with signal 11, Segmentation fault.
(gdb) bt
#0  0x00000000 in ?? ()
#1  0x4006d04c in mca_mpool_base_registration_destructor ()
    from /home/joachim/local/open-mpi/lib/libmpi.so.0
#2  0x40179a0c in mca_mpool_gm_free ()
    from /home/joachim/local/open-mpi//lib/openmpi/mca_mpool_gm.so
#3  0x4006cf9c in mca_mpool_base_free ()
    from /home/joachim/local/open-mpi/lib/libmpi.so.0
#4  0x4004efbc in PMPI_Free_mem ()
    from /home/joachim/local/open-mpi/lib/libmpi.so.0
#5  0x0804b1c9 in main ()

Sometimes, a segmentation fault also seems to happen when aborting an application (via CTRL-C to mpirun):

Core was generated by `collmeas_open-mpi'.
Program terminated with signal 11, Segmentation fault.
(gdb) bt
#0  0x401d0633 in mca_btl_tcp_proc_remove ()
    from /home/joachim/local/open-mpi//lib/openmpi/mca_btl_tcp.so
Cannot access memory at address 0xbfffe2bc

Of course, I'm not sure whether the deadlock really is a deadlock, but the respective tests take way too much time. Needless to say, other MPI implementations (MPICH-GM, our own MPI) run this benchmark reliably on the same machine; we have been using it for some time on a variety of platforms.

Any ideas or comments? I will try to run PMB.

Joachim

--
Joachim Worringen - NEC C&C research lab St.Augustin
fon +49-2241-9252.20 - fax .99 - http://www.ccrl-nece.de