Oh, and when it crashes, type "where" at the (gdb) prompt, e.g.:

Program received signal SIGSEGV, Segmentation fault.
...
(gdb) where

You will then see the stack trace.
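
For reference, the steps in this thread (set SGE_ND, start the qmaster under gdb, run it, then ask for the stack trace after the crash) can also be scripted. This is only a sketch under a few assumptions: the function name run_qmaster_under_gdb is made up for illustration, sge_qmaster is assumed to be on your PATH unless you pass a path, and gdb's standard -batch and -ex options are used to issue "run" and "where" without typing them at the prompt.

```shell
# Sketch only: run sge_qmaster in the foreground under gdb and
# print a stack trace if it crashes.
# run_qmaster_under_gdb is a hypothetical helper name, not part of SGE.
run_qmaster_under_gdb() {
    # Set SGE_ND in the environment, as described in this thread.
    export SGE_ND=1

    # Optional first argument: path to the qmaster binary.
    # Assumption: plain "sge_qmaster" resolves via PATH otherwise.
    qmaster_bin="${1:-sge_qmaster}"

    # Same as typing "r" and then "where" at the (gdb) prompt;
    # -batch makes gdb exit after the listed commands have run.
    gdb -batch -ex run -ex where "$qmaster_bin"
}
```

Call it with no arguments, or with the full path to the binary, e.g. run_qmaster_under_gdb /path/to/sge_qmaster. The stack trace is printed once the SIGSEGV is hit.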

Rayson



On Fri, Aug 31, 2012 at 3:36 PM, Rayson Ho <[email protected]> wrote:
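> - Set SGE_ND in the environment, so the qmaster stays in the
>   foreground instead of daemonizing.
> - At the shell, run "gdb sge_qmaster", and then type "r" at the
>   (gdb) prompt to start it.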
>
> Rayson
>
>
>
>
> On Fri, Aug 31, 2012 at 3:33 PM, Bob Tupper <[email protected]> wrote:
>> Can you please explain in more detail how to launch with the debugger
>> enabled?
>>
>> Thanks
>>
>>
>> On 08/31/2012 12:25 PM, Rayson Ho wrote:
>>>
>>> Two things you can try:
>>>
>>> 1) Run the qmaster under a debugger by setting $SGE_ND, and send us a
>>> backtrace of the crash.
>>>
>>> 2) Try the qmaster binary in a newer release (you don't need to
>>> upgrade other parts of your cluster, and don't need to drain the
>>> jobs), and if it really is the job report issue, then the newer
>>> qmaster should be able to handle the job reports without crashing:
>>>
>>> http://dl.dropbox.com/u/47200624/respin/ge2011.11.tar.gz
>>>
>>> Of course, you can compile from source if you want:
>>> http://gridscheduler.sourceforge.net/
>>>
>>> Rayson
>>>
>>>
>>>
>>> On Fri, Aug 31, 2012 at 3:20 PM, Bob Tupper <[email protected]> wrote:
>>>>
>>>> Thanks for your help.
>>>> I do have a PE defined, but it crashes even with a simple job that
>>>> just sleeps.
>>>> It crashes every time.
>>>> -Bob
>>>>
>>>>
>>>>
>>>> On 08/31/2012 11:59 AM, Rayson Ho wrote:
>>>>>
>>>>> Do you have parallel (PE) jobs in your cluster? A bug in SGE 6.2u5
>>>>> can cause the qmaster to segfault when it receives the job reports
>>>>> from parallel jobs.
>>>>>
>>>>> Rayson
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Aug 31, 2012 at 2:52 PM, Bob Tupper <[email protected]>
>>>>> wrote:
>>>>>>
>>>>>> Greetings,
>>>>>>
>>>>>> Hope someone can help me out.
>>>>>> I have an SGE 6.2u5 install on CentOS 5.x.
>>>>>>
>>>>>> Last night the power company shut us down.
>>>>>> This morning I cannot get the sge_qmaster daemon to stay running.
>>>>>> If I disable all the queues or shut down all the execution host
>>>>>> daemons so that jobs cannot run, it stays up.
>>>>>>
>>>>>> As soon as I enable them and a job attempts to run, the sge_qmaster
>>>>>> daemon crashes. Sometimes the job sends an error email, often not,
>>>>>> but the daemon always segfaults.
>>>>>>
>>>>>> I restored from backup and have the same problem.
>>>>>>
>>>>>> I have a shadow master and it crashes on both the main and backup
>>>>>> masters.
>>>>>>
>>>>>> I'm at a loss. Any help would be most appreciated.
>>>>>>
>>>>>> Thanks
>>>>>> -Bob
>>>>>>
>>>>>> _______________________________________________
>>>>>> users mailing list
>>>>>> [email protected]
>>>>>> https://gridengine.org/mailman/listinfo/users
>>>>
>>>>
>>
