I'm not aware of any OMPI error message like that. It might come from libevent, which may be using epoll, so it could still be caused by us. However, I'm a little concerned about the use of a prerelease version of Slurm, as we know that PMI is having some problems over there.
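As a rough illustration of where an "Operation not permitted" from epoll can come from: epoll_ctl(2) documents EPERM for descriptors that do not support epoll, such as regular files. The sketch below is Linux-only Python, not OMPI or libevent code, and it assumes (nothing in this thread confirms it) that fd 0 or fd 2 of the job pointed at a regular file, for example through output redirection. The helper names are made up for the example.

```python
import errno
import os
import select
import tempfile


def try_epoll_add(fd):
    """Register fd with a fresh epoll instance.

    Returns 0 on success, or the errno reported by the kernel on failure.
    """
    ep = select.epoll()
    try:
        ep.register(fd, select.EPOLLIN)
        return 0
    except OSError as exc:
        return exc.errno
    finally:
        ep.close()


def regular_file_result():
    # Hypothetical stand-in for a stdin/stderr that was redirected to a file.
    with tempfile.TemporaryFile() as f:
        return try_epoll_add(f.fileno())


def pipe_result():
    # A pipe supports polling, so the same registration should succeed.
    r, w = os.pipe()
    try:
        return try_epoll_add(r)
    finally:
        os.close(r)
        os.close(w)


if __name__ == "__main__":
    rc = regular_file_result()
    print("regular file:", errno.errorcode.get(rc, rc))
    print("pipe:", pipe_result())
```

If the regular-file case reports EPERM and the pipe case does not, that would be consistent with the warning being about the kind of descriptor rather than about permissions in the usual sense. It does not by itself explain why the job aborted.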
So out of curiosity - how was this job launched? Via mpirun or directly using srun?

On May 27, 2014, at 1:22 AM, Filippo Spiga <spiga.fili...@gmail.com> wrote:

> Dear all,
>
> I am using Open MPI v1.8.2 night snapshot compiled with SLURM support
> (version 14.03pre5). These two messages below appeared during a job of 2048
> MPI that died after 24 hours!
>
> [warn] Epoll ADD(1) on fd 0 failed. Old events were 0; read change was 1
> (add); write change was 0 (none): Operation not permitted
>
> [warn] Epoll ADD(4) on fd 2 failed. Old events were 0; read change was 0
> (none); write change was 1 (add): Operation not permitted
>
> The first one, appeared immediately at the beginning had no effect. The
> application started to compute and it successfully called a big parallel
> eigensolver. The second message appeared after 18~19 hours of non-stop
> computation and the application crashed without showing any other error
> message! Regularly I was checking that MPI processes were not stuck, after
> this message the processes were all aborted without dumping anything on
> stdout/stderr. It is quite weird.
>
> I believe these messages come from Open MPI (but correct me if I am wrong!).
> I am going to look at the application and the various libraries to find out
> if something is wrong. In the meanwhile it will be a great help if anyone can
> clarify the exact meaning of these warning messages.
>
> Many thanks in advance.
>
> Regards,
> Filippo
>
> --
> Mr. Filippo SPIGA, M.Sc.
> http://www.linkedin.com/in/filippospiga ~ skype: filippo.spiga
>
> «Nobody will drive us out of Cantor's paradise.» ~ David Hilbert