On May 11, 2010, at 9:18, Gijsbert Wiesenekker wrote:

> An OpenMPI program of mine that uses MPI_Isend and MPI_Irecv crashes my
> Fedora Linux kernel (invalid opcode) after some non-reproducible time, which
> makes it hard to debug (there is no trace, even with the debug kernel, and if
> I run it under valgrind it does not crash).
> My guess is that the kernel crash is caused by OpenMPI running out of memory
> because too many MPI_Irecv messages have been sent but not been processed yet.
> My questions are:
> What does the OpenMPI specification say about the behaviour of MPI_Isend when
> many messages have been sent but have not been processed yet? Will it fail?
> Will it block until more memory becomes available (I hope not, because this
> would cause my program to deadlock)?
> Ideally I would like to check how many MPI_Isend messages have not been
> processed yet, so that I can stop sending messages if there are 'too many'
> waiting. Is there a way to do this?
>
> Regards,
> Gijsbert
I want to let you know that this crash (you get "invalid opcode: 0000 [1] SMP" painted on your screen) is specific to the combination of Fedora 12 with kernel version 2.6.32.11-99.fc12.x86_64, OpenMPI 1.4.2, a lot of MPI_Isend and MPI_Irecv calls, and perhaps my hardware. The same code runs fine on CentOS 5.4 with kernel version 2.6.18-164.15.1.el5.

Gijsbert