Two things you can try:

1) Run the qmaster under a debugger with $SGE_ND set (so that it stays in the foreground instead of daemonizing), and send us a backtrace of the crash.

2) Try the qmaster binary from a newer release (you don't need to upgrade the other parts of your cluster, and you don't need to drain the jobs). If it really is the job report issue, the newer qmaster should be able to handle the job reports without crashing:

http://dl.dropbox.com/u/47200624/respin/ge2011.11.tar.gz

Of course, you can compile from source if you want:

http://gridscheduler.sourceforge.net/

Rayson


On Fri, Aug 31, 2012 at 3:20 PM, Bob Tupper <[email protected]> wrote:
> Thanks for your help.
> I do have PE defined. But it crashes with just a simple job that just
> sleeps.
> Crashes every time.
> -Bob
>
> On 08/31/2012 11:59 AM, Rayson Ho wrote:
>>
>> Do you have parallel (or PE) jobs in your cluster? A bug in SGE 6.2u5
>> can cause the qmaster to seg fault when it receives the job reports
>> from parallel jobs.
>>
>> Rayson
>>
>> On Fri, Aug 31, 2012 at 2:52 PM, Bob Tupper <[email protected]> wrote:
>>>
>>> Greetings,
>>>
>>> Hope someone can help me out.
>>> I have a 6.2u5 install on centos 5.x
>>>
>>> Last night the power company shut us down.
>>> This morning I can not get sge_master daemon to stay running.
>>> If I disable all the queues or shutdown all the executable host daemons
>>> so jobs can not run, it will stay up.
>>>
>>> As soon as I enable and a job attempts to run, the sge_master daemon
>>> crashes. Sometimes the job sends an email error, often not, but it
>>> always segfaults.
>>>
>>> I restored from backup and have the same problem.
>>>
>>> I have a shadow master and it crashes on both the main and backup
>>> masters.
>>>
>>> I'm at a loss. Any help would be most appreciated.
>>>
>>> Thanks
>>> -Bob
>>>
>>> _______________________________________________
>>> users mailing list
>>> [email protected]
>>> https://gridengine.org/mailman/listinfo/users
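
For the first suggestion at the top of this message (qmaster under a debugger), here is a rough sketch of one way to capture the backtrace. It is not an official procedure: the function name qmaster_backtrace and the GDB override variable are made up for this example, and it assumes that gdb is installed, that SGE_ROOT and SGE_CELL are set the same way as for a normal qmaster start, and that the running qmaster has been stopped first.

```shell
# Hypothetical helper (not part of Grid Engine): run sge_qmaster in the
# foreground under gdb and print a backtrace of every thread once it crashes.
# Assumes SGE_ROOT (and SGE_CELL) are set as for a normal qmaster start.
qmaster_backtrace() {
    if [ -z "$SGE_ROOT" ]; then
        echo "SGE_ROOT is not set" >&2
        return 1
    fi
    # util/arch ships with Grid Engine and prints the architecture string
    # used as the directory name under $SGE_ROOT/bin.
    arch=$("$SGE_ROOT/util/arch") || return 1
    # SGE_ND=1 keeps the daemon from detaching from the terminal.
    # GDB is an override invented for this sketch; it defaults to plain gdb.
    SGE_ND=1 "${GDB:-gdb}" -batch -ex run -ex "thread apply all bt" \
        "$SGE_ROOT/bin/$arch/sge_qmaster"
}
```

After defining the function, something like `qmaster_backtrace 2>&1 | tee qmaster-bt.txt` would keep a copy of the output to attach to a reply. The qmaster is multi-threaded, which is why the sketch asks gdb for all threads rather than only the current one.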
