Oh, and when it crashes, e.g.:

Program received signal SIGSEGV, Segmentation fault.
...
(gdb) where
You will then see the stack trace.

Rayson

On Fri, Aug 31, 2012 at 3:36 PM, Rayson Ho <[email protected]> wrote:
> - Set SGE_ND in the environment.
> - At the shell, run "gdb sge_qmaster", and then type "r".
>
> Rayson
>
> On Fri, Aug 31, 2012 at 3:33 PM, Bob Tupper <[email protected]> wrote:
>> Can you please explain in more detail how to launch it with the debugger
>> enabled?
>>
>> Thanks
>>
>> On 08/31/2012 12:25 PM, Rayson Ho wrote:
>>>
>>> Two things you can try:
>>>
>>> 1) Run the qmaster under a debugger by setting $SGE_ND, and send us a
>>> backtrace of the crash.
>>>
>>> 2) Try the qmaster binary from a newer release (you don't need to
>>> upgrade other parts of your cluster, and don't need to drain the
>>> jobs). If it really is the job report issue, the newer qmaster
>>> should be able to handle the job reports without crashing:
>>>
>>> http://dl.dropbox.com/u/47200624/respin/ge2011.11.tar.gz
>>>
>>> Of course, you can compile from source if you want:
>>> http://gridscheduler.sourceforge.net/
>>>
>>> Rayson
>>>
>>> On Fri, Aug 31, 2012 at 3:20 PM, Bob Tupper <[email protected]> wrote:
>>>>
>>>> Thanks for your help.
>>>> I do have a PE defined, but it crashes even with a simple job that
>>>> just sleeps.
>>>> It crashes every time.
>>>> -Bob
>>>>
>>>> On 08/31/2012 11:59 AM, Rayson Ho wrote:
>>>>>
>>>>> Do you have parallel (PE) jobs in your cluster? A bug in SGE 6.2u5
>>>>> can cause the qmaster to segfault when it receives the job reports
>>>>> from parallel jobs.
>>>>>
>>>>> Rayson
>>>>>
>>>>> On Fri, Aug 31, 2012 at 2:52 PM, Bob Tupper <[email protected]> wrote:
>>>>>>
>>>>>> Greetings,
>>>>>>
>>>>>> Hope someone can help me out.
>>>>>> I have a 6.2u5 install on CentOS 5.x.
>>>>>>
>>>>>> Last night the power company shut us down.
>>>>>> This morning I cannot get the sge_qmaster daemon to stay running.
>>>>>> If I disable all the queues or shut down all the execution host
>>>>>> daemons so that jobs cannot run, it stays up.
>>>>>>
>>>>>> As soon as I enable them and a job attempts to run, the sge_qmaster
>>>>>> daemon crashes. Sometimes the job sends an error email, often not,
>>>>>> but it always segfaults.
>>>>>>
>>>>>> I restored from backup and have the same problem.
>>>>>>
>>>>>> I have a shadow master, and the qmaster crashes on both the main and
>>>>>> the backup master.
>>>>>>
>>>>>> I'm at a loss. Any help would be most appreciated.
>>>>>>
>>>>>> Thanks
>>>>>> -Bob
>>>>>>
>>>>>> _______________________________________________
>>>>>> users mailing list
>>>>>> [email protected]
>>>>>> https://gridengine.org/mailman/listinfo/users
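The debugging steps in this thread can also be run non-interactively, so the backtrace lands in a file that is easy to mail to the list. The steps are: set SGE_ND, start sge_qmaster under gdb, type "r", then type "where" once it receives SIGSEGV. What follows is a minimal sketch, not something shipped with Grid Engine. The function name run_qmaster_under_gdb and the output file name are made up for illustration. It assumes that gdb is installed, that the SGE environment is already set, and that no other sge_qmaster is running on the host.

```shell
#!/bin/sh
# Hypothetical helper (not part of Grid Engine): start sge_qmaster in the
# foreground under gdb and save a backtrace to a file when it stops.
# Assumes gdb is installed and the SGE environment (SGE_ROOT, SGE_CELL, ...)
# has already been set up, e.g. by sourcing the cell's settings.sh.
run_qmaster_under_gdb() {
    qmaster_bin=$1                      # path to the sge_qmaster binary
    out=${2:-qmaster-backtrace.txt}     # where to save gdb's output

    # SGE_ND ("no daemonize") keeps sge_qmaster in the foreground so gdb
    # stays attached to it; any value is assumed to be enough.
    SGE_ND=true
    export SGE_ND

    # -batch makes gdb exit after the -ex commands: "run" starts the daemon,
    # "where" prints the stack of the thread that received the signal, and
    # "thread apply all bt" adds the stacks of the other qmaster threads.
    gdb -batch \
        -ex run \
        -ex where \
        -ex "thread apply all bt" \
        --args "$qmaster_bin" > "$out" 2>&1
}
```

For example, assuming the default cell name, source `$SGE_ROOT/default/common/settings.sh` first. Then call `run_qmaster_under_gdb "$SGE_ROOT/bin/$("$SGE_ROOT"/util/arch)/sge_qmaster"`. From another shell, re-enable a queue with `qmod -e` so that a job triggers the crash. When gdb exits, qmaster-backtrace.txt holds the output of `where` and `thread apply all bt`.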

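Rayson's second suggestion is to run only the sge_qmaster binary from a newer release against the existing installation. That comes down to backing up the installed binary and copying the new one over it. The sketch below is hypothetical: the helper name swap_qmaster_binary and the backup suffix are illustrative choices. The thread does not say where sge_qmaster sits inside the ge2011.11 tarball, and the sketch assumes the tarball provides a prebuilt binary for the same architecture. Stop the qmaster (and the shadow master) first. On Linux, overwriting an executable that is still running fails with "Text file busy".

```shell
#!/bin/sh
# Hypothetical helper (not part of Grid Engine): install the sge_qmaster
# binary from a newer release in place of the current one, keeping a backup.
swap_qmaster_binary() {
    current=$1   # installed binary, e.g. $SGE_ROOT/bin/<arch>/sge_qmaster
    new=$2       # sge_qmaster taken from the newer release

    # Refuse to touch anything if the replacement is missing.
    if [ ! -f "$new" ]; then
        echo "swap_qmaster_binary: no such file: $new" >&2
        return 1
    fi

    # Keep the old binary (suffix chosen arbitrarily) so it can be restored.
    cp -p "$current" "$current.6.2u5" || return 1

    # Overwriting an existing file keeps its owner and permission bits.
    cp "$new" "$current"
}
```

To roll back, copy `sge_qmaster.6.2u5` back over `sge_qmaster` the same way.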