Re: [OMPI users] No core dump in some cases

2016-05-16 Thread Dave Love
Gilles Gouaillardet writes:
> Are you sure ulimit -c unlimited is *really* applied on all hosts
>
> can you please run the simple program below and confirm that ?
Nothing specifically wrong with that, but it's worth installing procenv(1) as a general solution to checking the (generalized) envi
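
The program Gilles refers to is not shown in this excerpt. As a rough sketch of that kind of check (not his actual code), the C helpers below query the core-file size limit with getrlimit(2) and print it next to the host name. The names format_limit and report_core_limit are made up for illustration, and values are reported in bytes, which is how getrlimit returns them.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/resource.h>
#include <unistd.h>

/* Render a limit value in bytes, or "unlimited" for RLIM_INFINITY. */
void format_limit(rlim_t value, char *buf, size_t len)
{
    if (value == RLIM_INFINITY)
        snprintf(buf, len, "unlimited");
    else
        snprintf(buf, len, "%llu", (unsigned long long)value);
}

/* Hypothetical helper: print this host's name and its soft and hard
 * core-size limits. Returns 0 on success, -1 if getrlimit() fails. */
int report_core_limit(FILE *out)
{
    struct rlimit rl;
    char host[256] = "unknown";
    char soft[32], hard[32];

    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    /* Leave the last byte untouched so the name stays NUL-terminated. */
    gethostname(host, sizeof(host) - 1);
    format_limit(rl.rlim_cur, soft, sizeof(soft));
    format_limit(rl.rlim_max, hard, sizeof(hard));
    fprintf(out, "%s: core soft=%s hard=%s\n", host, soft, hard);
    return 0;
}
```

Calling report_core_limit(stdout) from a small main and launching that binary with mpirun across all hosts would show whether every rank really sees an unlimited soft limit.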

Re: [OMPI users] No core dump in some cases

2016-05-12 Thread Gilles Gouaillardet
>>> e queues (bytes, -q) 819200
>>> real-time priority (-r) 0
>>> stack size (kbytes, -s) 8192
>>> cpu time (seconds, -t) unlimited
>>> max user processes (-u) 4096

Re: [OMPI users] No core dump in some cases

2016-05-12 Thread dpchoudh .
>> >>>> Gilles
>> >>>>
>> >>>> #include
>> >>>> #include
>> >>>> #include
>> >>>> #include
>> >>>>
>> >>>> int main(int argc, char *argv[]) {

Re: [OMPI users] No core dump in some cases

2016-05-12 Thread Gilles Gouaillardet
gt;>> >>>> And the strange thing is, on the cluster where regular core dump is >>>> happening, the output of >>>> $ ompi_info | grep backtrace >>>> is identical to the above. (Which kind of makes sense because they were >>>&g

Re: [OMPI users] No core dump in some cases

2016-05-12 Thread dpchoudh .
nal
> >>>> MCA opal base: parameter "opal_signal" (current value:
> >>>> "6,7,8,11", data source: default, level: 3 user/all, type: string)
> >>>> [durga@smallMPI git]$

Re: [OMPI users] No core dump in some cases

2016-05-12 Thread Gilles Gouaillardet
>>>> the 'SIGTERM' part later, just in case it would make a
>>>> difference; i didn't)
>>>>
>>>> The resulting code still generates .btr files instead of core files.
>>>>
>>>> It looks like the 'execinfo' MCA component is being used as the
>>>> backtrace mechanism:

Re: [OMPI users] No core dump in some cases

2016-05-11 Thread dpchoudh .
>>> advises you to eat right, exercise regularly and
>>> quit ageing.
>>>
>>> On Wed, May 11, 2016 at 3:37 AM, Gilles Gouaillardet <
>>> gilles.gouaillar...@gmail.com> wrote:
>>>
>>>> Durga,
>>>>
>>>> you might wanna tr

Re: [OMPI users] No core dump in some cases

2016-05-11 Thread dpchoudh .
>>> ile with a back trace "ala" gdb
>>>
>>> Nathan,
>>>
>>> i did a 'grep btr' and could not find anything :-(
>>> opal_backtrace_buffer and opal_backtrace_print are only used with stderr.
>>> so i am puzzle

Re: [OMPI users] No core dump in some cases

2016-05-11 Thread dpchoudh .
>>> btr' and could not find anything :-(
>>> opal_backtrace_buffer and opal_backtrace_print are only used with stderr.
>>> so i am puzzled who creates the tracefile name and where ...
>>> also, no stack is printed by default unless opal_abort_print_stack is
>>> true

Re: [OMPI users] No core dump in some cases

2016-05-11 Thread Gilles Gouaillardet
>> ay i have
>> found to prevent it is to restore the default signal handlers after
>> MPI_Init.
>>
>> Excuse the quoting style. Good sucks.

Re: [OMPI users] No core dump in some cases

2016-05-11 Thread dpchoudh .
>>> is a plain text file with a back trace "ala" gdb
>>>
>>> Nathan,
>>>
>>> i did a 'grep btr' and could not find anything :-(
>>> opal_backtrace_buffer and opal_backtrace_print are only used with stderr.
>>> so i am puzzled who creates the tracefile n

Re: [OMPI users] No core dump in some cases

2016-05-11 Thread Ralph Castain
>> anything :-(
>> opal_backtrace_buffer and opal_backtrace_print are only used with stderr.
>> so i am puzzled who creates the tracefile name and where ...
>> also, no stack is printed by default unless opal_abort_print_stack is true
>>
>> Cheers,
>>
>> Gill

Re: [OMPI users] No core dump in some cases

2016-05-11 Thread dpchoudh .
>> > running on a
>> > different machine shows different behaviour.
>> >
>> > Best regards
>> > Durga
>> >
>> > The surgeon general advises you to eat right, exercise regularly and quit
>> > ageing.

Re: [OMPI users] No core dump in some cases

2016-05-11 Thread Gilles Gouaillardet
>> hanism. I think we
>> should revisit it at some point but for now the only effective way i have
>> found to prevent it is to restore the default signal handlers after
>> MPI_Init.
>>
>> Excuse the quoting style. Good sucks.
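
For readers who want to try the workaround quoted above, here is a minimal sketch of restoring the default signal handlers. It assumes the signals to reset are the ones listed by the opal_signal parameter elsewhere in this thread ("6,7,8,11"), which on Linux x86 are SIGABRT, SIGBUS, SIGFPE and SIGSEGV. The function name restore_default_crash_handlers is invented for illustration and is not part of Open MPI.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/* Assumed mapping of opal_signal "6,7,8,11" on Linux x86. */
static const int crash_signals[] = { SIGABRT, SIGBUS, SIGFPE, SIGSEGV };

/* Hypothetical helper: reset each crash signal to SIG_DFL so the kernel's
 * default action (terminate and dump core) applies again.
 * Intended to be called right after MPI_Init().
 * Returns 0 on success, -1 if any sigaction() call fails. */
int restore_default_crash_handlers(void)
{
    struct sigaction sa;
    size_t i;

    sa.sa_handler = SIG_DFL;
    sa.sa_flags = 0;
    sigemptyset(&sa.sa_mask);

    for (i = 0; i < sizeof(crash_signals) / sizeof(crash_signals[0]); i++) {
        if (sigaction(crash_signals[i], &sa, NULL) != 0)
            return -1;
    }
    return 0;
}
```

In an MPI program the call would go on the line after MPI_Init(&argc, &argv). Whether this is enough to get core files back also depends on the core-size limit and the OS core settings discussed in the rest of the thread.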

Re: [OMPI users] No core dump in some cases

2016-05-11 Thread dpchoudh .
> >> it at some point but for now the only effective way i have
> >> found to prevent it is to restore the default signal handlers after
> >> MPI_Init.
> >>
> >> Excuse the quoting style. Good sucks.
> >>
> >> _

Re: [OMPI users] No core dump in some cases

2016-05-11 Thread Gilles Gouaillardet
>> ing style. Good sucks.
>>
>> ____________________
>> From: users on behalf of dpchoudh .
>> Sent: Monday, May 09, 2016 2:59:37 PM
>> To: Open MPI Users
>> Subject: Re: [OMPI users] No core dump in some cases
>>
>> Hi Gus

Re: [OMPI users] No core dump in some cases

2016-05-11 Thread dpchoudh .
> From: users on behalf of dpchoudh .
> Sent: Monday, May 09, 2016 2:59:37 PM
> To: Open MPI Users
> Subject: Re: [OMPI users] No core dump in some cases
>
> Hi Gus
>
> Thanks for your suggestion. But I am not using any resource manager (i.e.
> I am launching mpirun from the bash she

Re: [OMPI users] No core dump in some cases

2016-05-10 Thread Gus Correa
On 05/09/2016 04:59 PM, dpchoudh . wrote:
Hi Gus
Thanks for your suggestion. But I am not using any resource manager (i.e. I am launching mpirun from the bash shell.). In fact, both of the two clusters I talked about run CentOS 7 and I launch the job the same way on both of these, yet one of the

Re: [OMPI users] No core dump in some cases

2016-05-10 Thread Hjelm, Nathan Thomas
sucks.
From: users on behalf of dpchoudh .
Sent: Monday, May 09, 2016 2:59:37 PM
To: Open MPI Users
Subject: Re: [OMPI users] No core dump in some cases
Hi Gus
Thanks for your suggestion. But I am not using any resource manager (i.e. I am launching mpirun from the bash shell.). In

Re: [OMPI users] No core dump in some cases

2016-05-09 Thread dpchoudh .
Hi Gus
Thanks for your suggestion. But I am not using any resource manager (i.e. I am launching mpirun from the bash shell.). In fact, both of the two clusters I talked about run CentOS 7 and I launch the job the same way on both of these, yet one of them creates standard core files and the other

Re: [OMPI users] No core dump in some cases

2016-05-09 Thread Gus Correa
Hi Durga
Just in case ... If you're using a resource manager to start the jobs (Torque, etc), you need to have them set the limits (for coredump size, stacksize, locked memory size, etc). This way the jobs will inherit the limits from the resource manager daemon. On Torque (which I use) I do th
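
The inheritance Gus describes matters because an unprivileged process can raise its soft limit only as far as the hard limit it inherited from whatever launched it. A small sketch of that from inside a job, using getrlimit(2) and setrlimit(2), follows. The function name raise_core_limit_to_hard is hypothetical, and this is not the Torque configuration he goes on to describe.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/resource.h>

/* Hypothetical helper: raise the soft core-size limit to the inherited
 * hard limit. If the launching daemon started with a hard limit of 0,
 * this cannot enable core files; the daemon's limits must be fixed.
 * Returns 0 on success, -1 on failure. */
int raise_core_limit_to_hard(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    rl.rlim_cur = rl.rlim_max;   /* soft may never exceed hard */
    if (setrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    return 0;
}
```

If the soft limit reported afterwards is still 0, the hard limit was 0 to begin with, which points back at the daemon or shell that started the job.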

Re: [OMPI users] No core dump in some cases

2016-05-07 Thread Jeff Squyres (jsquyres)
I'm afraid I don't know what a .btr file is -- that is not something that is controlled by Open MPI. You might want to look into your OS settings to see if it has some kind of alternate corefile mechanism...?
> On May 6, 2016, at 8:58 PM, dpchoudh . wrote:
>
> Hello all
>
> I run MPI jobs (
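
One OS setting worth checking on Linux is /proc/sys/kernel/core_pattern, which controls where the kernel sends core images. The sketch below reads it and reports whether cores are piped to a helper program. It assumes a Linux host, the function names are made up for illustration, and nothing in the thread confirms that this setting is what produces the .btr files.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>

/* Read the Linux core_pattern setting into buf (trailing newline removed).
 * Returns 0 on success, -1 if the file cannot be read (e.g. not Linux). */
int read_core_pattern(char *buf, size_t len)
{
    FILE *f = fopen("/proc/sys/kernel/core_pattern", "r");

    if (f == NULL)
        return -1;
    if (fgets(buf, (int)len, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}

/* A pattern starting with '|' means the kernel pipes the core image to a
 * helper program instead of writing a core file itself. */
int core_pattern_is_piped(const char *pattern)
{
    return pattern[0] == '|';
}
```

Comparing the value on the two clusters would show whether they handle core dumps differently at the OS level.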