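- Set SGE_ND in the environment so that sge_qmaster stays in the
  foreground instead of daemonizing.
- At the shell, run "gdb sge_qmaster", then type "r" at the gdb prompt
  to start it. When it crashes, "bt" prints the backtrace.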
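
The two steps above can be sketched as a short shell snippet. This is only a sketch under some assumptions: the value 1 for SGE_ND is arbitrary (the variable just needs to be set), the QMASTER_BIN variable is a hypothetical helper name that is not part of SGE, and the snippet assumes the cluster settings file has already been sourced so that sge_qmaster is on the PATH. It uses gdb's -ex option to issue "run" and then "bt" instead of typing them at the prompt.

```shell
# Keep sge_qmaster in the foreground (any value works; 1 is an arbitrary choice).
export SGE_ND=1

# QMASTER_BIN is a hypothetical helper variable, not an SGE setting.
# It defaults to whatever sge_qmaster is found on the PATH.
QMASTER_BIN="${QMASTER_BIN:-sge_qmaster}"

if command -v gdb >/dev/null 2>&1 && command -v "$QMASTER_BIN" >/dev/null 2>&1; then
    # Start the qmaster under gdb; once it stops on the crash, print the backtrace.
    gdb -ex run -ex bt "$QMASTER_BIN"
else
    echo "gdb or $QMASTER_BIN not found on PATH; source the cluster settings file first" >&2
fi
```

After the backtrace is printed, gdb stays at its prompt, so the output can be copied into a reply to the list.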

Rayson




On Fri, Aug 31, 2012 at 3:33 PM, Bob Tupper <[email protected]> wrote:
> Can you please explain in more detail how to launch with the debugger
> enabled?
>
> Thanks
>
>
> On 08/31/2012 12:25 PM, Rayson Ho wrote:
>>
>> Two things you can try:
>>
>> 1) Run the qmaster under a debugger by setting $SGE_ND, and send us a
>> backtrace of the crash.
>>
>> 2) Try the qmaster binary in a newer release (you don't need to
>> upgrade other parts of your cluster, and don't need to drain the
>> jobs), and if it really is the job report issue, then the newer
>> qmaster should be able to handle the job reports without crashing:
>>
>> http://dl.dropbox.com/u/47200624/respin/ge2011.11.tar.gz
>>
>> Of course, you can compile from source if you want:
>> http://gridscheduler.sourceforge.net/
>>
>> Rayson
>>
>>
>>
>> On Fri, Aug 31, 2012 at 3:20 PM, Bob Tupper <[email protected]> wrote:
>>>
>>> Thanks for your help.
>>> I do have a PE defined, but it crashes even with a simple job that
>>> just sleeps.
>>> It crashes every time.
>>> -Bob
>>>
>>>
>>>
>>> On 08/31/2012 11:59 AM, Rayson Ho wrote:
>>>>
>>>> Do you have parallel (or PE) jobs in your cluster? A bug in SGE 6.2u5
>>>> can cause the qmaster to segfault when it receives the job reports
>>>> from parallel jobs.
>>>>
>>>> Rayson
>>>>
>>>>
>>>>
>>>> On Fri, Aug 31, 2012 at 2:52 PM, Bob Tupper <[email protected]>
>>>> wrote:
>>>>>
>>>>> Greetings,
>>>>>
>>>>> Hope someone can help me out.
>>>>> I have a 6.2u5 install on CentOS 5.x.
>>>>>
>>>>> Last night the power company shut us down.
>>>>> This morning I cannot get the sge_qmaster daemon to stay running.
>>>>> If I disable all the queues or shut down all the execution host
>>>>> daemons so jobs cannot run, it will stay up.
>>>>>
>>>>> As soon as I enable them and a job attempts to run, the sge_qmaster
>>>>> daemon crashes. Sometimes the job sends an email error, often not,
>>>>> but it always segfaults.
>>>>>
>>>>> I restored from backup and have the same problem.
>>>>>
>>>>> I have a shadow master and it crashes on both the main and backup
>>>>> masters.
>>>>>
>>>>> I'm at a loss. Any help would be most appreciated.
>>>>>
>>>>> Thanks
>>>>> -Bob
>>>>>
>>>>> _______________________________________________
>>>>> users mailing list
>>>>> [email protected]
>>>>> https://gridengine.org/mailman/listinfo/users
>>>
>>>
>