Do you have parallel (PE) jobs in your cluster? A bug in SGE 6.2u5
can cause the qmaster to segfault when it receives job reports
from parallel jobs.

Rayson



On Fri, Aug 31, 2012 at 2:52 PM, Bob Tupper <[email protected]> wrote:
> Greetings,
>
> Hope someone can help me out.
> I have a 6.2u5 install on CentOS 5.x.
>
> Last night the power company shut us down.
> This morning I cannot get the sge_master daemon to stay running.
> If I disable all the queues or shut down all the execution host daemons so
> jobs cannot run, it will stay up.
>
> As soon as I enable them and a job attempts to run, the sge_master daemon
> crashes. Sometimes the job sends an email error, often not, but the daemon
> always segfaults.
>
> I restored from backup and have the same problem.
>
> I have a shadow master, and the daemon crashes on both the main and backup masters.
>
> I'm at a loss. Any help would be most appreciated.
>
> Thanks
> -Bob
>
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users