On May 11, 2010, at 9:18, Gijsbert Wiesenekker wrote:

> An OpenMPI program of mine that uses MPI_Isend and MPI_Irecv crashes my Fedora
> Linux kernel (invalid opcode) after some non-reproducible time, which makes it
> hard to debug (there is no trace, even with the debug kernel, and if I run it
> under valgrind it does not crash).
> My guess is that the kernel crash is caused by OpenMPI running out of memory 
> because too many MPI_Irecv messages have been sent but not been processed yet.
> My questions are:
> What does the OpenMPI specification say about the behaviour of MPI_Isend when 
> many messages have been sent but have not been processed yet? Will it fail? 
> Will it block until more memory becomes available (I hope not, because this 
> would cause my program to deadlock)?
> Ideally I would like to check how many MPI_Isend messages have not been 
> processed yet, so that I can stop sending messages if there are 'too many' 
> waiting. Is there a way to do this?
> 
> Regards,
> Gijsbert
> 

I want to let you know that this crash (the kernel prints "invalid opcode: 0000
[1] SMP" on the screen) is specific to the combination of Fedora 12 with kernel
version 2.6.32.11-99.fc12.x86_64, OpenMPI 1.4.2, a large number of MPI_Isend and
MPI_Irecv calls, and perhaps my hardware. The same code runs fine on CentOS 5.4
with kernel version 2.6.18-164.15.1.el5.

Gijsbert
