2008/5/29 Andreas Schäfer <gent...@gmx.de>:
> Hi Amy,
>
> On 16:10 Thu 29 May , Lee Amy wrote:
> > MicroTar parallel version was terminated after 463 minutes with following
> > error messages:
> > ================================================
> > [gnode5:31982] [ 0] /lib64/tls/libpthread.so.0 [0x345460c430]
> > [gnode5:31982] [ 1] microtar(LocateNuclei+0x137) [0x403037]
> > [gnode5:31982] [ 2] microtar(main+0x4ac) [0x40431c]
> > [gnode5:31982] [ 3] /lib64/tls/libc.so.6(__libc_start_main+0xdb)
> > [0x3453b1c3fb]
> > [gnode5:31982] [ 4] microtar [0x402e6a]
> > [gnode5:31982] *** End of error message ***
> > mpirun noticed that job rank 0 with PID 18710 on node gnode1 exited on
> > signal 15 (Terminated).
> > 19 additional processes aborted (not shown)
> > ================================================
>
> if I'm not mistaken, signal 15 is SIGTERM, which is sent to processes
> to terminate them. To me this sounds like your application is
> terminated from an external instance, maybe because your job exceeded
> the wall clock time limit of your scheduling system. Does the job
> repeatedly fail at the same time? Do shorter jobs finish successfully?
>
> Just my 0.02 Euros (-8
>
> Cheers
> -Andreas
>
>
> --
> ============================================
> Andreas Schäfer
> Cluster and Metacomputing Working Group
> Friedrich-Schiller-Universität Jena, Germany
> PGP/GPG key via keyserver
> I'm a bright... http://www.the-brights.net
> ============================================
>
> (\___/)
> (+'.'+)
> (")_(")
> This is Bunny. Copy and paste Bunny into your
> signature to help him gain world domination!
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>

Thank you very much. Shorter jobs seem to run fine. The job doesn't fail
at the same point in time on every run, but whenever it fails it produces
these same error messages. Also, I'm not using a scheduling system.
Do you have any suggestions?
Thank you again.

Regards,
Amy
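
P.S. For anyone who wants to double-check the signal number discussed
above, here is a minimal Python sketch, assuming a POSIX system such as
Linux. It looks up the symbolic name of signal 15 and then installs a
handler that records when a SIGTERM arrives. The names `on_sigterm` and
`received` are purely illustrative; this is not how MicroTar or Open MPI
handle signals, only a way to see the mechanism in isolation.

```python
import os
import signal
import time

# Look up the symbolic name of signal number 15 on this platform.
# On Linux this prints SIGTERM.
print(signal.Signals(15).name)

# Hypothetical log of (signal name, timestamp) pairs, for illustration only.
received = []

def on_sigterm(signum, frame):
    # Record which signal arrived and when, instead of exiting silently.
    received.append((signal.Signals(signum).name, time.time()))

# Replace the default action (terminate) with our recording handler.
signal.signal(signal.SIGTERM, on_sigterm)

# Simulate an external termination request by signalling this process.
os.kill(os.getpid(), signal.SIGTERM)

print(received)
```

Logging the arrival time this way would at least show when the
termination request reached a process, which may help narrow down who
sent it.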