Mahmood, note you have to compile the source file that contains the snippet with '-g -O0', and link with '-g -O0'
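
for reference, here is a minimal sketch of a self-contained variant of that snippet in C. the names dbg_wait and wait_for_debugger are made up for illustration (they are not part of Open MPI), and the flag is renamed and declared volatile on the assumption that it is safer if the file is ever built with optimization. the idea is to call it right after MPI_Init() in every task.

```c
#include <stdio.h>
#include <unistd.h>   /* getpid */
#include <poll.h>     /* poll */

/* Hypothetical flag name. volatile forces the loop to re-read it from
 * memory, so a change made from the debugger is actually seen. */
volatile int dbg_wait = 1;

/* Hypothetical helper: print the gdb attach command for this process,
 * then wait until a debugger clears dbg_wait. */
void wait_for_debugger(void)
{
    printf("gdb --pid=%d\n", (int)getpid());
    fflush(stdout);            /* make sure the line is visible before blocking */
    while (dbg_wait)
        poll(NULL, 0, 1);      /* sleep about 1 ms per iteration instead of spinning */
}
```

because dbg_wait is a global in this sketch, "set var dbg_wait = 0" followed by "c" should work from any frame once gdb is attached.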
also, there was a typo in the gdb command, please read "frame 1" instead of "frame #1"

Cheers,

Gilles

On Fri, Sep 16, 2016 at 12:53 PM, Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote:
> Mahmood,
>
> -march=bdver1
>
> should be ok on your nodes.
> from the gcc command line, i was expecting -march=xxx, but it is
> missing (your gcc might be a bit older for that)
> note you have to recompile all your libs (openblas and friends) with
> -march=bdver1
>
> i guess your gdb is also a bit too old to support all operations on a core
> file
> (fwiw, i am able to do that on RHEL7)
>
> at first, i recommend you find the smallest number of nodes necessary
> to reproduce the issue.
> ideally, you would confirm the app is working fine by running it
> exclusively on the frontend.
>
> if you do not have a parallel debugger, then you have to manually
> parallel debug your app.
>
> i usually update my main app like this
>
> int _dbg=1;
>
> MPI_Init(...);
> printf("gdb --pid=%d\n", getpid());
> while (_dbg) poll(NULL, 0, 1);
>
> rebuild and run.
>
> then log into the compute nodes, and run the gdb command that was
> displayed previously
> you usually have to (for all your MPI tasks, in different terminals)
> bt
> frame #1
> set _dbg=0
> c
>
> and wait for a crash
>
> hopefully, you will be able to run
> disas
> info proc mapping
> x /100x $rp
>
> Cheers,
>
> Gilles
>
>
> On Fri, Sep 16, 2016 at 2:54 AM, Mahmood Naderan <mahmood...@gmail.com> wrote:
>> The differences are very very minor
>>
>> root@cluster:tpar# echo | gcc -v -E - 2>&1 | grep cc1
>> /usr/libexec/gcc/x86_64-redhat-linux/4.4.7/cc1 -E -quiet -v -
>> -mtune=generic
>>
>> [root@compute-0-1 ~]# echo | gcc -v -E - 2>&1 | grep cc1
>> /usr/libexec/gcc/x86_64-redhat-linux/4.4.6/cc1 -E -quiet -v -
>> -mtune=generic
>>
>>
>> Even I tried to compile the program with -march=amdfam10. Something like
>> these
>>
>> /export/apps/siesta/openmpi-2.0.0/bin/mpifort -c -g -Os -march=amdfam10
>> `FoX/FoX-config --fcflags` -DMPI -DFC_HAVE_FLUSH -DFC_HAVE_ABORT
>> -DTRANSIESTA /export/apps/siesta/siesta-4.0/Src/pspltm1.F
>>
>> But got the same error.
>>
>> /proc/cpuinfo on the frontend shows (family 21, model 2) and on the compute
>> node it shows (family 21, model 1).
>>
>>
>>
>>>That being said, my best bet is you compile on a compute node ...
>> gcc is there on the computes, but the NFS permission is another issue. It
>> seems that nodes are not able to write on /share (the one which is shared
>> between frontend and computes).
>>
>>
>>
>> An important question is that, how can I find out what is the name of the
>> illegal instruction. Then, I hope to find the document that points which
>> instruction set (avx, sse4, ...) contains that instruction.
>>
>> Is there any option in mpirun to turn on the verbosity to see more
>> information?
>>
>> Regards,
>> Mahmood
>>
>>
>>
>> _______________________________________________
>> users mailing list
>> users@lists.open-mpi.org
>> https://rfd.newmexicoconsortium.org/mailman/listinfo/users