I'm not aware of any OMPI error message like that. It might be coming from 
libevent, which can use epoll, so it could indirectly be caused by us. However, 
I'm a little concerned about the use of a prerelease version of Slurm, as we 
know that PMI is having some problems over there.
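
For reference, on Linux epoll_ctl() returns EPERM ("Operation not permitted") 
when the target fd does not support polling, for example a regular file. 
Whether that is what happened to fd 0 and fd 2 in this job is only an 
assumption on my part (it would fit stdin/stdout being redirected to files), 
not something I have confirmed. A minimal sketch in Python that reproduces the 
same errno outside of Open MPI and libevent; the helper name try_epoll_add is 
made up for illustration:

```python
import errno
import os
import select
import tempfile


def try_epoll_add(fd, events):
    """Try to add fd to a fresh epoll instance (Linux only).

    Returns None if the add succeeds, otherwise the symbolic errno
    name (for example "EPERM") reported by the kernel.
    """
    ep = select.epoll()
    try:
        ep.register(fd, events)
        return None
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    finally:
        ep.close()


if __name__ == "__main__":
    # A pipe supports polling, so the add is accepted.
    r, w = os.pipe()
    print("pipe:", try_epoll_add(r, select.EPOLLIN))
    os.close(r)
    os.close(w)

    # A regular file does not support polling, so the add is rejected.
    with tempfile.TemporaryFile() as f:
        print("regular file:", try_epoll_add(f.fileno(), select.EPOLLIN))
```

If the job's stdin or stdout really was a regular file, an event library 
trying to add that fd to epoll would hit the same error and could log a 
warning of this kind.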

So out of curiosity - how was this job launched? Via mpirun or directly using 
srun?


On May 27, 2014, at 1:22 AM, Filippo Spiga <spiga.fili...@gmail.com> wrote:

> Dear all,
> 
> I am using an Open MPI v1.8.2 nightly snapshot compiled with SLURM support 
> (version 14.03pre5). The two messages below appeared during a job of 2048 
> MPI processes that died after 24 hours! 
> 
> [warn] Epoll ADD(1) on fd 0 failed.  Old events were 0; read change was 1 
> (add); write change was 0 (none): Operation not permitted
> 
> [warn] Epoll ADD(4) on fd 2 failed.  Old events were 0; read change was 0 
> (none); write change was 1 (add): Operation not permitted
> 
> 
> The first one appeared immediately at the beginning and had no effect. The 
> application started to compute and successfully called a big parallel 
> eigensolver. The second message appeared after 18~19 hours of non-stop 
> computation, and the application crashed without showing any other error 
> message! I was regularly checking that the MPI processes were not stuck; 
> after this message, all the processes were aborted without dumping anything 
> to stdout/stderr. It is quite weird.
> 
> I believe these messages come from Open MPI (but correct me if I am wrong!). 
> I am going to look at the application and the various libraries to find out 
> if something is wrong. In the meantime, it would be a great help if anyone 
> could clarify the exact meaning of these warning messages.
> 
> Many thanks in advance.
> 
> Regards,
> Filippo
> 
> --
> Mr. Filippo SPIGA, M.Sc.
> http://www.linkedin.com/in/filippospiga ~ skype: filippo.spiga
> 
> «Nobody will drive us out of Cantor's paradise.» ~ David Hilbert
> 
> 
> 
