Mahmood,

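note that you have to compile the source file that contains the snippet
with '-g -O0', and link with '-g -O0' as well. otherwise the compiler
may optimize the _dbg loop so that changing _dbg from gdb has no effect.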

also, there was a typo in the gdb commands:
please read "frame 1" instead of "frame #1"
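
for reference, here is a minimal, self-contained sketch of that wait loop
written as a helper function. it is only an illustration under a few
assumptions, not part of the original snippet: the name wait_for_debugger()
and the WAIT_FOR_GDB environment variable are made up for this example,
and MPI_Init() is left out so the sketch builds without MPI. the flag is
declared volatile as an extra safeguard, so the loop re-reads it on every
pass even if the file ends up being built with optimization.

```c
#include <poll.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the original snippet): call it right
 * after MPI_Init(). It only blocks when the WAIT_FOR_GDB environment
 * variable (an assumed name) is set, so normal runs are unaffected.
 * Returns 1 if it waited for a debugger, 0 otherwise.
 */
int wait_for_debugger(void)
{
    /* volatile: the flag is re-read from memory on every iteration,
     * so a value written by gdb is picked up by the loop */
    volatile int _dbg = 1;

    if (getenv("WAIT_FOR_GDB") == NULL)
        return 0;

    /* print the attach command, and flush so it is visible
     * before the process starts waiting */
    printf("gdb --pid=%d\n", (int)getpid());
    fflush(stdout);

    /* sleep in 1 second steps until gdb clears the flag */
    while (_dbg)
        poll(NULL, 0, 1000);

    return 1;
}
```

to use it, call wait_for_debugger() right after MPI_Init() and start the
job with WAIT_FOR_GDB set in the environment of the MPI tasks (with
Open MPI, something like mpirun -x WAIT_FOR_GDB=1 ... should do that).
after attaching, select the frame of wait_for_debugger() (usually
frame 1, but check the bt output), then run set _dbg=0 followed by c.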

Cheers,

Gilles

On Fri, Sep 16, 2016 at 12:53 PM, Gilles Gouaillardet
<gilles.gouaillar...@gmail.com> wrote:
> Mahmood,
>
> -march=bdver1
>
> should be ok on your nodes.
> from the gcc command line, i was expecting -march=xxx, but it is
> missing (your gcc might be a bit too old for that)
> note you have to recompile all your libs (openblas and friends) with
> -march=bdver1
>
> i guess your gdb is also a bit too old to support all operations on a
> core file (fwiw, i am able to do that on RHEL7)
>
> at first, i recommend you find the smallest number of nodes necessary
> to reproduce the issue.
> ideally, you would confirm the app is working fine by running it
> exclusively on the frontend.
>
> if you do not have a parallel debugger, then you have to manually
> parallel debug your app.
>
> i usually update my main app like this
>
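> /* needs <poll.h>, <stdio.h> and <unistd.h> */
> volatile int _dbg = 1;  /* volatile, so the loop re-reads it after gdb changes it */
>
> MPI_Init(...);
> printf("gdb --pid=%d\n", getpid());
> fflush(stdout);  /* make sure the command shows up before we block */
> while (_dbg) poll(NULL, 0, 1);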
>
> rebuild and run.
>
> then log into the compute nodes, and run the gdb command that was
> displayed previously.
> in each gdb session (one per MPI task, in different terminals), you
> usually have to run
> bt
> frame #1
> set _dbg=0
> c
>
> and wait for a crash
>
> hopefully, you will be able to run
> disas
> info proc mapping
> x /100x $rip
>
> Cheers,
>
> Gilles
>
>
> On Fri, Sep 16, 2016 at 2:54 AM, Mahmood Naderan <mahmood...@gmail.com> wrote:
>> The differences are very very minor
>>
>> root@cluster:tpar# echo | gcc -v -E - 2>&1 | grep cc1
>>  /usr/libexec/gcc/x86_64-redhat-linux/4.4.7/cc1 -E -quiet -v -
>> -mtune=generic
>>
>> [root@compute-0-1 ~]# echo | gcc -v -E - 2>&1 | grep cc1
>>  /usr/libexec/gcc/x86_64-redhat-linux/4.4.6/cc1 -E -quiet -v -
>> -mtune=generic
>>
>>
>> I even tried to compile the program with -march=amdfam10, with something
>> like this:
>>
>> /export/apps/siesta/openmpi-2.0.0/bin/mpifort -c -g -Os -march=amdfam10
>> `FoX/FoX-config --fcflags`  -DMPI -DFC_HAVE_FLUSH -DFC_HAVE_ABORT
>> -DTRANSIESTA    /export/apps/siesta/siesta-4.0/Src/pspltm1.F
>>
>> But got the same error.
>>
>> /proc/cpuinfo on the frontend shows (family 21, model 2) and on the compute
>> node it shows (family 21, model 1).
>>
>>
>>
>>>That being said, my best bet is you compile on a compute node ...
>> gcc is there on the computes, but NFS permissions are another issue. It
>> seems that the nodes are not able to write to /share (the one which is
>> shared between frontend and computes).
>>
>>
>>
>> An important question is: how can I find out the name of the illegal
>> instruction? Then I hope to find the documentation that says which
>> instruction set (avx, sse4, ...) contains that instruction.
>>
>> Is there any option in mpirun to turn on the verbosity to see more
>> information?
>>
>> Regards,
>> Mahmood
>>
>>
>>
>> _______________________________________________
>> users mailing list
>> users@lists.open-mpi.org
>> https://rfd.newmexicoconsortium.org/mailman/listinfo/users