Gilles Gouaillardet writes:
> Are you sure ulimit -c unlimited is *really* applied on all hosts?
>
> can you please run the simple program below and confirm that?
Nothing specifically wrong with that, but it's worth installing
procenv(1) as a general solution for checking the (generalized)
environment.
>>> >>> POSIX message queues (bytes, -q) 819200
>>> >>> real-time priority (-r) 0
>>> >>> stack size (kbytes, -s) 8192
>>> >>> cpu time (seconds, -t) unlimited
>>> >>> max user processes (-u) 4096
>>>
>>> Gilles
>> >>>>
>> >>>> #include <stdio.h>
>> >>>> #include <stdlib.h>
>> >>>> #include <sys/time.h>
>> >>>> #include <sys/resource.h>
>> >>>>
>> >>>> int main(int argc, char *argv[]) {
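The program is truncated in the archive (and the #include targets above were
eaten by the HTML, so the header names shown are reconstructed). A minimal
sketch of such a check — assuming the intent is to print each rank's
RLIMIT_CORE via getrlimit(2); the body below is a reconstruction, not Gilles'
original — might be:

#include <stdio.h>
#include <unistd.h>
#include <sys/resource.h>
#include <mpi.h>

/* Print each rank's core-file size limit, so a per-host "ulimit -c"
   mismatch shows up immediately. */
int main(int argc, char *argv[]) {
    struct rlimit rl;
    char host[256];
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    gethostname(host, sizeof(host));
    getrlimit(RLIMIT_CORE, &rl);
    if (rl.rlim_cur == RLIM_INFINITY)
        printf("rank %d on %s: core limit unlimited\n", rank, host);
    else
        printf("rank %d on %s: core limit %lld bytes\n",
               rank, host, (long long)rl.rlim_cur);
    MPI_Finalize();
    return 0;
}

Run it under mpirun across all hosts; any rank reporting a finite limit
identifies a node where "ulimit -c unlimited" did not take effect.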
>>>> And the strange thing is, on the cluster where regular core dump is
>>>> happening, the output of
>>>> $ ompi_info | grep backtrace
>>>> is identical to the above. (Which kind of makes sense because they were
>>>> built the same way.)
> >>>> $ ompi_info --all | grep opal_signal
> >>>> MCA opal base: parameter "opal_signal" (current value:
> >>>> "6,7,8,11", data source: default, level: 3 user/all, type: string)
> >>>> [durga@smallMPI git]$
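(For reference — my gloss, not from the thread: on Linux, 6, 7, 8 and 11 are
SIGABRT, SIGBUS, SIGFPE and SIGSEGV, i.e. the fatal signals Open MPI
intercepts by default. A one-liner to confirm the mapping on a given
platform:)

#include <signal.h>
#include <stdio.h>

/* Confirm what opal_signal's default "6,7,8,11" maps to on this box. */
int main(void) {
    printf("SIGABRT=%d SIGBUS=%d SIGFPE=%d SIGSEGV=%d\n",
           SIGABRT, SIGBUS, SIGFPE, SIGSEGV);
    return 0;
}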
>>>> (I considered adding the 'SIGTERM' part later, just in case it would make a
>>>> difference; I didn't.)
>>>>
>>>> The resulting code still generates .btr files instead of core files.
>>>>
>>>> It looks like the 'execinfo' MCA component is being used as the
>>>> backtrace mechanism:
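(The ompi_info output that followed is cut off in the archive. For context,
'execinfo' wraps glibc's backtrace facility; a standalone illustration of
that underlying API — my sketch, not OMPI code — is:)

#include <execinfo.h>
#include <stdio.h>

/* Dump a glibc-style backtrace to stderr, the same primitive the
   'execinfo' opal backtrace component builds on. */
static void dump_trace(void) {
    void *frames[32];
    int n = backtrace(frames, 32);
    backtrace_symbols_fd(frames, n, 2);   /* 2 = stderr */
}

int main(void) {
    dump_trace();
    return 0;
}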
>>> The surgeon general advises you to eat right, exercise regularly and
>>> quit ageing.
>>>
>>> On Wed, May 11, 2016 at 3:37 AM, Gilles Gouaillardet <
>>> gilles.gouaillar...@gmail.com> wrote:
>>>
>>>> Durga,
>>>>
>>>> you might wanna try ... a btr file is a plain text file with a back
>>>> trace "ala" gdb
>>>
>>> Nathan,
>>>
>>> i did a 'grep btr' and could not find anything :-(
>>> opal_backtrace_buffer and opal_backtrace_print are only used with stderr.
>>> so i am puzzled who creates the tracefile name and where ...
>>> also, no stack is printed by default unless opal_abort_print_stack is
>>> true
>>>
>>> Cheers,
>>>
>>> Gilles
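(For what it's worth, MCA parameters such as opal_abort_print_stack can also
be supplied through the environment; a sketch assuming the standard OMPI_MCA_
prefix — set it before launching, or, as here, before MPI_Init reads it:)

#include <stdlib.h>
#include <mpi.h>

/* Sketch: request opal's abort-time stack print by setting the MCA
   parameter in its environment form before MPI_Init. */
int main(int argc, char *argv[]) {
    setenv("OMPI_MCA_opal_abort_print_stack", "true", 1);
    MPI_Init(&argc, &argv);
    /* ... code that may abort ... */
    MPI_Finalize();
    return 0;
}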
>> > the same code running on a
>> > different machine shows different behaviour.
>> >
>> > Best regards
>> > Durga
>> >
>> > The surgeon general advises you to eat right, exercise regularly and
>> > quit ageing.
>> It is coming from the backtrace mechanism. I think we
>> should revisit it at some point but for now the only effective way i have
>> found to prevent it is to restore the default signal handlers after
>> MPI_Init.
>>
>> Excuse the quoting style. Good sucks.
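Concretely, that workaround might look like the following sketch (the signal
list mirrors opal_signal's default "6,7,8,11" quoted earlier; this is an
illustration, not Nathan's verbatim code):

#include <signal.h>
#include <mpi.h>

/* Restore the default handlers for the signals Open MPI intercepts
   (SIGABRT, SIGBUS, SIGFPE, SIGSEGV), so a crash produces a normal
   kernel core dump instead of OMPI's backtrace output. */
int main(int argc, char *argv[]) {
    MPI_Init(&argc, &argv);

    int sigs[] = { SIGABRT, SIGBUS, SIGFPE, SIGSEGV };
    for (int i = 0; i < 4; i++)
        signal(sigs[i], SIG_DFL);

    /* ... application code that might crash ... */

    MPI_Finalize();
    return 0;
}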
>> ____________________
>> From: users on behalf of dpchoudh .
>> Sent: Monday, May 09, 2016 2:59:37 PM
>> To: Open MPI Users
>> Subject: Re: [OMPI users] No core dump in some cases
>>
>> Hi Gus
>>
>> Thanks for your suggestion. But I am not using any resource manager (i.e.
>> I am launching mpirun from the bash shell.). In fact, both of the two
>> clusters I talked about run CentOS 7 and I launch the job the same way on
>> both of these, yet one of them creates standard core files and the other
>> creates .btr files.
Hi Durga

Just in case ...
If you're using a resource manager to start the jobs (Torque, etc.),
you need to have them set the limits (for coredump size, stack size,
locked memory size, etc.).
This way the jobs will inherit the limits from the resource manager
daemon.
On Torque (which I use) I do this ...
I'm afraid I don't know what a .btr file is -- that is not something that is
controlled by Open MPI.
You might want to look into your OS settings to see if it has some kind of
alternate corefile mechanism...?
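On Linux, one such alternate mechanism is the kernel's core_pattern setting,
which can hand crashes to a helper program instead of writing a plain core
file (a guess at what is happening here, not something the thread confirms).
A quick check:

#include <stdio.h>

/* Print /proc/sys/kernel/core_pattern: if it names a pipe ("|...") or an
   unusual template, crashes will not leave a plain "core" file. */
int main(void) {
    char buf[256];
    FILE *f = fopen("/proc/sys/kernel/core_pattern", "r");
    if (!f) { perror("core_pattern"); return 1; }
    if (fgets(buf, sizeof buf, f))
        printf("core_pattern: %s", buf);
    fclose(f);
    return 0;
}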
> On May 6, 2016, at 8:58 PM, dpchoudh . wrote:
>
> Hello all
>
> I run MPI jobs ...