Hi Amy,

On 16:10 Thu 29 May, Lee Amy wrote:
> MicroTar parallel version was terminated after 463 minutes with following
> error messages:
> ================================================
> [gnode5:31982] [ 0] /lib64/tls/libpthread.so.0 [0x345460c430]
> [gnode5:31982] [ 1] microtar(LocateNuclei+0x137) [0x403037]
> [gnode5:31982] [ 2] microtar(main+0x4ac) [0x40431c]
> [gnode5:31982] [ 3] /lib64/tls/libc.so.6(__libc_start_main+0xdb) [0x3453b1c3fb]
> [gnode5:31982] [ 4] microtar [0x402e6a]
> [gnode5:31982] *** End of error message ***
> mpirun noticed that job rank 0 with PID 18710 on node gnode1 exited on
> signal 15 (Terminated).
> 19 additional processes aborted (not shown)
> ================================================

If I'm not mistaken, signal 15 is SIGTERM, which is sent to a process
to ask it to terminate. To me this sounds like your application was
terminated by something external, maybe because your job exceeded
the wall clock time limit of your scheduling system. Does the job
repeatedly fail after the same run time? Do shorter jobs finish
successfully?
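
In case you want to double-check the signal number yourself, here is
a minimal sketch in Python. It has nothing to do with MicroTar or
mpirun; it only asks the operating system's signal table for the
name behind a number, and describe_signal is just a helper name I
made up for illustration.

```python
import signal


def describe_signal(number):
    """Return the symbolic name of a signal number on this platform."""
    # signal.Signals is the standard-library enum of the signals
    # known on the machine where the script runs.
    return signal.Signals(number).name


# On Linux this prints "SIGTERM" for the 15 that mpirun reported.
print(describe_signal(15))
```

If that prints SIGTERM on your cluster nodes too, the next thing to
look at is who sent it, for example the scheduler's log for that job.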

Just my 0.02 Euros (-8

Cheers
-Andreas


-- 
============================================
Andreas Schäfer
Cluster and Metacomputing Working Group
Friedrich-Schiller-Universität Jena, Germany
PGP/GPG key via keyserver
I'm a bright... http://www.the-brights.net
============================================

(\___/)
(+'.'+)
(")_(")
This is Bunny. Copy and paste Bunny into your 
signature to help him gain world domination!
