Brian,

Thanks for the quick overnight fix. I could not find r9223 on the Subversion trunk, so I downloaded r9224 instead.
- Configure and compile are okay.

- However, compiling mpi.f90 takes over 35 *minutes* with -O1. This seems a bit excessive... I tried removing the -O option entirely and things are just as slow. Is this behaviour related to Open MPI or to some quirk of the Studio 11 compiler?
- 'mpirun --help' no longer crashes.

- Standard output seems messy:

  a) 'mpirun -np 4 pwd' randomly returns one or two lines, never four. The same behaviour occurs if the output is redirected to a file.
  b) When running some simple "demo" Fortran code, the standard output is buffered within Open MPI and all results are issued at the end. No intermediate output is shown.
- Running a slightly more elaborate program fails:

  a) Compilation behaves differently with mpif77 and mpif90. While mpif90 compiles and builds "silently", mpif77 is talkative:

     valiron@icare ~/BENCHES > mpif77 -xtarget=opteron -xarch=amd64 -o all all.f
     NOTICE: Invoking /opt/Studio11/SUNWspro/bin/f90 -f77 -ftrap=%none -I/users/valiron/lib/openmpi-1.1a1r9224/include -xtarget=opteron -xarch=amd64 -o all all.f -L/users/valiron/lib/openmpi-1.1a1r9224/lib -lmpi -lorte -lopal -lsocket -lnsl -lrt -lm -lthread -ldl
     all.f:
        rw_sched:
     MAIN all:
        lam_alltoall:
        my_alltoall1:
        my_alltoall2:
        my_alltoall3:
        my_alltoall4:
        check_buf:
        alltoall_sched_ori:
        alltoall_sched_new:

  b) Whether the code was compiled with mpif77 or mpif90, execution fails:

     valiron@icare ~/BENCHES > mpirun -np 2 all
     Signal:11 info.si_errno:0(Error 0) si_code:1(SEGV_MAPERR)
     Failing at addr:40
     *** End of error message ***
     Signal:11 info.si_errno:0(Error 0) si_code:1(SEGV_MAPERR)
     Failing at addr:40
     *** End of error message ***

     Compiling with -g adds no more information. I attach the all.f program... (This program was used last summer to discuss several strategies for alltoall over Ethernet on the LAM/MPI list.)
Pierre.

Brian Barrett wrote:
On Mar 8, 2006, at 4:46 AM, Pierre Valiron wrote:

> Sorry for the interruption. I'm back on MPI tracks again.
>
> I have rebuilt openmpi-1.0.2a9 with -g and the error is unchanged. I have also discovered that I don't need to run any Open MPI application to show up the error. 'mpirun --help' or 'mpirun' shows the same error:
>
>   valiron@icare ~ > mpirun
>   Segmentation fault (core dumped)
>
> and
>
>   valiron@icare ~ > pstack core
>   core 'core' of 13842:   mpirun
>    fffffd7ffee9dfe0 strlen () + 20
>    fffffd7ffeef6ab3 vsprintf () + 33
>    fffffd7fff180fd1 opal_vasprintf () + 41
>    fffffd7fff180f88 opal_asprintf () + 98
>    00000000004098a3 orterun () + 63
>    0000000000407214 main () + 34
>    000000000040708c ???????? ()
>
> Seems very basic!

It turns out this was an error in our compatibility code for asprintf(). We were doing something with va_list structures that Solaris didn't like. I'm actually surprised that it worked on the UltraSparc version of Solaris, but it had been working for some time for us.

Anyway, I committed a fix at r9223 on the Subversion trunk - it should make tonight's nightly tarball for the trunk. I've also asked the release managers for v1.0.2 to push the fix into that release.

Thanks for reporting the issue and for the account. Let me know if you have any further problems.

Brian
--
Support the SAUVONS LA RECHERCHE movement:
http://recherche-en-danger.apinc.org/

Dr. Pierre VALIRON
Laboratoire d'Astrophysique
Observatoire de Grenoble / UJF
BP 53, F-38041 Grenoble Cedex 9 (France)
http://www-laog.obs.ujf-grenoble.fr/~valiron/
Mail: pierre.vali...@obs.ujf-grenoble.fr
Phone: +33 4 7651 4787   Fax: +33 4 7644 8821
all.f.gz
Description: GNU Zip compressed data