Hello

About a month ago we started seeing duplicate job numbers in SGE.

For example:

sysadmin@panda2[~]$ qacct -j 878815

==============================================================
qname        standard.q
hostname     node127.panda.pbtech
group        abc
owner        developer
project      NONE
department   cmlab.u
jobname      old job
jobnumber    878815
taskid       undefined
account      sge
priority     0
qsub_time    Tue Jan 10 11:49:45 2017
start_time   Tue Jan 10 11:51:40 2017
end_time     Tue Jan 10 11:51:40 2017
granted_pe   smp
slots        1
failed       0
exit_status  0
ru_wallclock 0
ru_utime     0.001
ru_stime     0.006
ru_maxrss    1428
ru_ixrss     0
ru_ismrss    0
ru_idrss     0
ru_isrss     0
ru_minflt    1254
ru_majflt    0
ru_nswap     0
ru_inblock   0
ru_oublock   8
ru_msgsnd    0
ru_msgrcv    0
ru_nsignals  0
ru_nvcsw     60
ru_nivcsw    4
cpu          0.007
mem          0.000
io           0.000
iow          0.000
maxvmem      0.000
arid         undefined
==============================================================
qname        standard.q
hostname     node120.panda.pbtech
group        abc
owner        developer
project      NONE
department   cmlab.u
jobname      newjob
jobnumber    878815
taskid       undefined
account      sge
priority     0
qsub_time    Wed Feb  8 12:37:38 2017
start_time   Wed Feb  8 13:20:49 2017
end_time     Wed Feb  8 13:41:01 2017
granted_pe   smp
slots        12
failed       100 : assumedly after job
exit_status  137
ru_wallclock 1212
ru_utime     0.002
ru_stime     0.022
ru_maxrss    1280
ru_ixrss     0
ru_ismrss    0
ru_idrss     0
ru_isrss     0
ru_minflt    623
ru_majflt    0
ru_nswap     0
ru_inblock   0
ru_oublock   8
ru_msgsnd    0
ru_msgrcv    0
ru_nsignals  0
ru_nvcsw     47
ru_nivcsw    2
cpu          13816.930
mem          48585.941
io           34.210
iow          0.000
maxvmem      3.692G
arid         undefined

As you can see, the two jobs sharing job number 878815 were submitted nearly
a month apart. This does not prevent the jobs from completing, but we require
that job numbers not be duplicated.
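One way to check how widespread the reuse is: scan the raw accounting file rather than querying one job at a time with qacct. The script below is a minimal sketch and was not part of the original report. It assumes the colon-separated record layout described in the accounting(5) man page, with job_number as the 6th field and submission_time as the 9th; other SGE forks may differ. It reports every job number recorded with more than one distinct submission time. Tasks of a single array job share both values, so they are not flagged. The `/opt/sge` fallback for `$SGE_ROOT` is a hypothetical default.

```python
import os
import sys
from collections import defaultdict

# 0-based field positions, assumed from accounting(5):
# qname:hostname:group:owner:job_name:job_number:account:priority:submission_time:...
JOB_NUMBER = 5
SUBMISSION_TIME = 8


def find_reused_job_numbers(lines):
    """Map each job number seen with more than one distinct submission time
    to its numerically sorted list of submission times (as strings)."""
    submits = defaultdict(set)
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.split(":")
        if len(fields) <= SUBMISSION_TIME:
            continue  # skip truncated or malformed records
        submits[fields[JOB_NUMBER]].add(fields[SUBMISSION_TIME])
    # Array tasks share one submission time, so they collapse into a single
    # entry; only separate submissions under the same number are reported.
    return {
        job: sorted(times, key=int)
        for job, times in submits.items()
        if len(times) > 1
    }


def main(argv):
    # Default location is $SGE_ROOT/$SGE_CELL/common/accounting;
    # "/opt/sge" is a hypothetical fallback if SGE_ROOT is unset.
    default = os.path.join(
        os.environ.get("SGE_ROOT", "/opt/sge"),
        os.environ.get("SGE_CELL", "default"),
        "common",
        "accounting",
    )
    path = argv[1] if len(argv) > 1 else default
    if not os.path.isfile(path):
        print(f"accounting file not found: {path}", file=sys.stderr)
        return
    with open(path, errors="replace") as fh:
        reused = find_reused_job_numbers(fh)
    for job in sorted(reused, key=int):
        # One line per reused job number, followed by its submission epochs.
        print(job, " ".join(reused[job]))


if __name__ == "__main__":
    main(sys.argv)
```

Run it with the accounting file path as the only argument, or with none to use the default location. Comparing the submission times of the first and last reused numbers can show when the job sequence restarted.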

Has anyone experienced this issue or have an idea of what could be causing
this behavior?

We are not rotating our accounting logs.

Thanks,

Douglas Duckworth, MSc, LFCS
HPC System Administrator
Scientific Computing Unit
Physiology and Biophysics
Weill Cornell Medicine
E: d...@med.cornell.edu
O: 212-746-6305
F: 212-746-8690
_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users
