Hello all,

Hopefully someone has a plausible explanation as to why we're seeing some time stamp inconsistencies within our ARCO db (postgres).
We have discovered that some j_ids within arco/sge_job report a 'j_submission_time' of 1969-12-31 19:00:00, while other j_ids for the same job report expected, current values.

Please see the sanitized output below with the UNIX epoch values:

  j_id   | j_job_number | j_task_number | j_pe_taskid | j_job_name | j_group | j_owner | j_account | j_priority |  j_submission_time  | j_project | j_department
---------+--------------+---------------+-------------+------------+---------+---------+-----------+------------+---------------------+-----------+--------------
 7356583 |      2906834 |            -1 | 1.host-1    | Re95064    | user    | user    | sge       |          0 | 1969-12-31 19:00:00 | NONE      | cefm
 7356104 |      2906834 |            -1 | 1.host-25   | Re95064    | user    | user    | sge       |          0 | 1969-12-31 19:00:00 | NONE      | cefm
 7356103 |      2906834 |            -1 | 1.host-29   | Re95064    | user    | user    | sge       |          0 | 1969-12-31 19:00:00 | NONE      | cefm
 7356101 |      2906834 |            -1 | 1.host-21   | Re95064    | user    | user    | sge       |          0 | 1969-12-31 19:00:00 | NONE      | cefm
 7356096 |      2906834 |            -1 | 1.host-27   | Re95064    | user    | user    | sge       |          0 | 1969-12-31 19:00:00 | NONE      | cefm
 7356062 |      2906834 |            -1 | 1.host-8    | Re95064    | user    | user    | sge       |          0 | 1969-12-31 19:00:00 | NONE      | cefm
 7356052 |      2906834 |            -1 | 1.host-3    | Re95064    | user    | user    | sge       |          0 | 1969-12-31 19:00:00 | NONE      | cefm

Please see the sanitized output from the same job with expected time stamps:

  j_id   | j_job_number | j_task_number | j_pe_taskid | j_job_name | j_group | j_owner | j_account | j_priority |  j_submission_time  | j_project | j_department
---------+--------------+---------------+-------------+------------+---------+---------+-----------+------------+---------------------+-----------+--------------
 7395559 |      2906834 |            -1 | 1.host-3    | Re95064    | user    | user    | sge       |          5 | 2014-09-29 11:45:09 | NONE      | cefm
 7395560 |      2906834 |            -1 | 1.host-8    | Re95064    | user    | user    | sge       |          5 | 2014-09-29 11:45:09 | NONE      | cefm
 7395561 |      2906834 |            -1 | 1.host-27   | Re95064    | user    | user    | sge       |          5 | 2014-09-29 11:45:09 | NONE      | cefm
 7395562 |      2906834 |            -1 | 1.host-1    | Re95064    | user    | user    | sge       |          5 | 2014-09-29 11:45:09 | NONE      | cefm
 7395563 |      2906834 |            -1 | 1.host-21   | Re95064    | user    | user    | sge       |          5 | 2014-09-29 11:45:09 | NONE      | cefm

Initially, I thought it had to do either with the time being out of sync or with sge_execd being restarted on the hosts in question. However, I did some testing and found that those were just red herrings. I also checked qmaster, and it has been stable for quite some time in terms of both clock sync and uptime of the qmaster process.

A site (https://www.gc3.uzh.ch/blog/GridEngine_accounting_queries_with_PostgreSQL/) suggested that jobs which have failed may manifest the NULL time stamp value, but I have already tested that by deliberately failing jobs, both by exceeding the wallclock limit and by calling commands which didn't exist.

Has anyone else seen this type of time stamp inconsistency in their ARCO installations before? If so, does anyone have a plausible idea as to why it happens?

Thank you,
John DeSantis
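
P.S. For anyone reading along, the 1969-12-31 19:00:00 value is what a stored UNIX epoch of 0 looks like when it is displayed at UTC-5. The Python sketch below only illustrates that arithmetic. The UTC-5 display offset and the helper name render_epoch are assumptions for illustration, not something taken from our ARCO or postgres configuration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: timestamps are displayed at a fixed UTC-5 offset
# (for example US Eastern standard time). Not confirmed from the DB settings.
DISPLAY_TZ = timezone(timedelta(hours=-5))

# The UNIX epoch itself: 1970-01-01 00:00:00 UTC.
EPOCH_UTC = datetime(1970, 1, 1, tzinfo=timezone.utc)


def render_epoch(seconds, tz=DISPLAY_TZ):
    """Format a UNIX epoch value (in seconds) as a timestamp string in tz.

    render_epoch is a hypothetical helper, not part of ARCO or SGE.
    """
    moment = EPOCH_UTC + timedelta(seconds=seconds)
    return moment.astimezone(tz).strftime("%Y-%m-%d %H:%M:%S")


if __name__ == "__main__":
    # An epoch value of 0 shown at UTC-5 matches the odd rows above.
    print(render_epoch(0))                # 1969-12-31 19:00:00
    # The same value shown in UTC.
    print(render_epoch(0, timezone.utc))  # 1970-01-01 00:00:00
```

If that reading is right, the affected rows carry a submission time of zero rather than a NULL, which may help narrow down where the value gets lost.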