Do you have parallel (PE) jobs in your cluster? A bug in SGE 6.2u5 can cause the qmaster to segfault when it receives job reports from parallel jobs.

Rayson

On Fri, Aug 31, 2012 at 2:52 PM, Bob Tupper <[email protected]> wrote:
> Greetings,
>
> Hope someone can help me out.
> I have a 6.2u5 install on CentOS 5.x.
>
> Last night the power company shut us down.
> This morning I cannot get the sge_master daemon to stay running.
> If I disable all the queues or shut down all the executable host daemons so
> jobs cannot run, it will stay up.
>
> As soon as I enable them and a job attempts to run, the sge_master daemon
> crashes. Sometimes the job sends an email error, often not, but it always
> segfaults.
>
> I restored from backup and have the same problem.
>
> I have a shadow master and it crashes on both the main and backup masters.
>
> I'm at a loss. Any help would be most appreciated.
>
> Thanks
> -Bob
>
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users
