Bummer! :-(

Just to be sure -- you had a clean config.cache file before you ran configure, right? (That is, the file didn't exist, so configure couldn't have picked up potentially erroneous values from a previous run.) Also, FWIW, it's not necessary to specify --enable-ltdl-convenience; that should be automatic.

If you had a clean configure, we *suspect* that this might be due to alignment issues on 64-bit Solaris platforms. We thought we had a pretty good handle on those in 1.1, but obviously we didn't solve everything. Bonk.

Did you get a corefile, perchance? If you could send a stack trace, that would be most helpful.
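For reference, BUS_ADRALN is the SIGBUS code for an invalid address alignment. The "Failing at addr" values in the traces below are all multiples of 4 but not of 8, which would be consistent with (though it does not prove) an 8-byte load or store through a pointer that is only 4-byte aligned. Here is a minimal C sketch of that check, plus one common alignment-safe way to read an 8-byte value. The names is_aligned and load_u64 are made up for this illustration; they are not Open MPI functions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 if addr is a multiple of align.
 * align is assumed to be a power of two (4, 8, ...). */
static int is_aligned(uintptr_t addr, size_t align)
{
    return (addr & (uintptr_t)(align - 1)) == 0;
}

/* Hypothetical helper: alignment-safe 8-byte load. Copying the bytes
 * with memcpy avoids dereferencing a possibly misaligned uint64_t
 * pointer, which is what raises SIGBUS on strict-alignment CPUs
 * such as SPARC. */
static uint64_t load_u64(const void *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

With these, is_aligned(0x2b12cc, 4) is true while is_aligned(0x2b12cc, 8) is false, which is the pattern all four reported addresses share.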
________________________________

From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Eric Thibodeau
Sent: Tuesday, June 20, 2006 8:36 PM
To: us...@open-mpi.org
Subject: Re: [OMPI users] Installing OpenMPI on a solaris

Hello Brian (and all),

Well, the joy was short-lived. On a 12-CPU Enterprise machine and on a 4-CPU one, I seem to be able to start up to 4 processes. Above 4, I seem to inevitably get BUS_ADRALN (Bus collisions?). Below are some traces of the failing runs, as well as a detailed (mpirun -d) trace of one of these situations and the ompi_info output. Obviously, don't hesitate to ask if more information is required.

Build version: openmpi-1.1b5r10421

Config parameters:

Open MPI config.status 1.1b5
configured by ./configure, generated by GNU Autoconf 2.59,
with options \"'--cache-file=config.cache' 'CFLAGS=-mcpu=v9' 'CXXFLAGS=-mcpu=v9' 'FFLAGS=-mcpu=v9' '--prefix=/export/lca/home/lca0/etudiants/ac38820/openmpi_sun4u' --enable-ltdl-convenience\"

The traces:

sshd@enterprise ~/1_Files/1_ETS/1_Maitrise/MGL810/Devoir2 $ ~/openmpi_sun4u/bin/mpirun -np 10 mandelbrot-mpi 100 400 400
Signal:10 info.si_errno:0(Error 0) si_code:1(BUS_ADRALN)
Failing at addr:2f4f04
*** End of error message ***
sshd@enterprise ~/1_Files/1_ETS/1_Maitrise/MGL810/Devoir2 $ ~/openmpi_sun4u/bin/mpirun -np 8 mandelbrot-mpi 100 400 400
Signal:10 info.si_errno:0(Error 0) si_code:1(BUS_ADRALN)
Failing at addr:2b354c
*** End of error message ***
sshd@enterprise ~/1_Files/1_ETS/1_Maitrise/MGL810/Devoir2 $ ~/openmpi_sun4u/bin/mpirun -np 6 mandelbrot-mpi 100 400 400
Signal:10 info.si_errno:0(Error 0) si_code:1(BUS_ADRALN)
Failing at addr:2b1ecc
*** End of error message ***
sshd@enterprise ~/1_Files/1_ETS/1_Maitrise/MGL810/Devoir2 $ ~/openmpi_sun4u/bin/mpirun -np 5 mandelbrot-mpi 100 400 400
Signal:10 info.si_errno:0(Error 0) si_code:1(BUS_ADRALN)
Failing at addr:2b12cc
*** End of error message ***
sshd@enterprise ~/1_Files/1_ETS/1_Maitrise/MGL810/Devoir2 $ ~/openmpi_sun4u/bin/mpirun -np 4 mandelbrot-mpi 100 400 400
maxiter = 100, width = 400, height = 400
execution time in seconds = 1.48
Taper q pour quitter le programme, autrement, on fait un refresh
q
sshd@enterprise ~/1_Files/1_ETS/1_Maitrise/MGL810/Devoir2 $ ~/openmpi_sun4u/bin/mpirun -np 5 mandelbrot-mpi 100 400 400
Signal:10 info.si_errno:0(Error 0) si_code:1(BUS_ADRALN)
Failing at addr:2b12cc
*** End of error message ***

I also got the same behaviour on a different machine with the same hardware but limited to 4 CPUs (and the exact same code base, since $HOME is an NFS mount). The following is a debug run of one such failing execution:

sshd@enterprise ~/1_Files/1_ETS/1_Maitrise/MGL810/Devoir2 $ ~/openmpi_sun4u/bin/mpirun -d -v -np 5 mandelbrot-mpi 100 400 400
[enterprise:24786] [0,0,0] setting up session dir with
[enterprise:24786] universe default-universe
[enterprise:24786] user sshd
[enterprise:24786] host enterprise
[enterprise:24786] jobid 0
[enterprise:24786] procid 0
[enterprise:24786] procdir: /tmp/openmpi-sessions-sshd@enterprise_0/default-universe/0/0
[enterprise:24786] jobdir: /tmp/openmpi-sessions-sshd@enterprise_0/default-universe/0
[enterprise:24786] unidir: /tmp/openmpi-sessions-sshd@enterprise_0/default-universe
[enterprise:24786] top: openmpi-sessions-sshd@enterprise_0
[enterprise:24786] tmp: /tmp
[enterprise:24786] [0,0,0] contact_file /tmp/openmpi-sessions-sshd@enterprise_0/default-universe/universe-setup.txt
[enterprise:24786] [0,0,0] wrote setup file
[enterprise:24786] pls:rsh: local csh: 0, local bash: 0
[enterprise:24786] pls:rsh: assuming same remote shell as local shell
[enterprise:24786] pls:rsh: remote csh: 0, remote bash: 0
[enterprise:24786] pls:rsh: final template argv:
[enterprise:24786] pls:rsh: /usr/local/bin/ssh <template> ( ! [ -e ./.profile ] || . ./.profile; orted --debug --bootproxy 1 --name <template> --num_procs 2 --vpid_start 0 --nodename <template> --universe sshd@enterprise:default-universe --nsreplica "0.0.0;tcp://10.45.117.37:40236" --gprreplica "0.0.0;tcp://10.45.117.37:40236" --mpi-call-yield 0 )
[enterprise:24786] pls:rsh: launching on node localhost
[enterprise:24786] pls:rsh: oversubscribed -- setting mpi_yield_when_idle to 1 (1 5)
[enterprise:24786] pls:rsh: localhost is a LOCAL node
[enterprise:24786] pls:rsh: reset PATH: /export/lca/home/lca0/etudiants/ac38820/openmpi_sun4u/bin:/bin:/usr/local/bin:/usr/bin:/usr/sbin:/usr/ccs/bin:/usr/dt/bin:/usr/local/lam-mpi/7.1.1/bin:/export/lca/appl/Forte/SUNWspro/WS6U2/bin:/opt/sfw/bin:/usr/bin:/usr/ucb:/etc:/usr/local/bin:.
[enterprise:24786] pls:rsh: reset LD_LIBRARY_PATH: /export/lca/home/lca0/etudiants/ac38820/openmpi_sun4u/lib:/export/lca/appl/Forte/SUNWspro/WS6U2/lib:/usr/local/lib:/usr/local/lam-mpi/7.1.1/lib:/opt/sfw/lib
[enterprise:24786] pls:rsh: changing to directory /export/lca/home/lca0/etudiants/ac38820
[enterprise:24786] pls:rsh: executing: orted --debug --bootproxy 1 --name 0.0.1 --num_procs 2 --vpid_start 0 --nodename localhost --universe sshd@enterprise:default-universe --nsreplica "0.0.0;tcp://10.45.117.37:40236" --gprreplica "0.0.0;tcp://10.45.117.37:40236" --mpi-call-yield 1
[enterprise:24787] [0,0,1] setting up session dir with
[enterprise:24787] universe default-universe
[enterprise:24787] user sshd
[enterprise:24787] host localhost
[enterprise:24787] jobid 0
[enterprise:24787] procid 1
[enterprise:24787] procdir: /tmp/openmpi-sessions-sshd@localhost_0/default-universe/0/1
[enterprise:24787] jobdir: /tmp/openmpi-sessions-sshd@localhost_0/default-universe/0
[enterprise:24787] unidir: /tmp/openmpi-sessions-sshd@localhost_0/default-universe
[enterprise:24787] top: openmpi-sessions-sshd@localhost_0
[enterprise:24787] tmp: /tmp
[enterprise:24789] [0,1,0] setting up session dir with
[enterprise:24789] universe default-universe
[enterprise:24789] user sshd
[enterprise:24789] host localhost
[enterprise:24789] jobid 1
[enterprise:24789] procid 0
[enterprise:24789] procdir: /tmp/openmpi-sessions-sshd@localhost_0/default-universe/1/0
[enterprise:24789] jobdir: /tmp/openmpi-sessions-sshd@localhost_0/default-universe/1
[enterprise:24789] unidir: /tmp/openmpi-sessions-sshd@localhost_0/default-universe
[enterprise:24789] top: openmpi-sessions-sshd@localhost_0
[enterprise:24789] tmp: /tmp
[enterprise:24791] [0,1,1] setting up session dir with
[enterprise:24791] universe default-universe
[enterprise:24791] user sshd
[enterprise:24791] host localhost
[enterprise:24791] jobid 1
[enterprise:24791] procid 1
[enterprise:24791] procdir: /tmp/openmpi-sessions-sshd@localhost_0/default-universe/1/1
[enterprise:24791] jobdir: /tmp/openmpi-sessions-sshd@localhost_0/default-universe/1
[enterprise:24791] unidir: /tmp/openmpi-sessions-sshd@localhost_0/default-universe
[enterprise:24791] top: openmpi-sessions-sshd@localhost_0
[enterprise:24791] tmp: /tmp
[enterprise:24793] [0,1,2] setting up session dir with
[enterprise:24793] universe default-universe
[enterprise:24793] user sshd
[enterprise:24793] host localhost
[enterprise:24793] jobid 1
[enterprise:24793] procid 2
[enterprise:24793] procdir: /tmp/openmpi-sessions-sshd@localhost_0/default-universe/1/2
[enterprise:24793] jobdir: /tmp/openmpi-sessions-sshd@localhost_0/default-universe/1
[enterprise:24793] unidir: /tmp/openmpi-sessions-sshd@localhost_0/default-universe
[enterprise:24793] top: openmpi-sessions-sshd@localhost_0
[enterprise:24793] tmp: /tmp
[enterprise:24795] [0,1,3] setting up session dir with
[enterprise:24795] universe default-universe
[enterprise:24795] user sshd
[enterprise:24795] host localhost
[enterprise:24795] jobid 1
[enterprise:24795] procid 3
[enterprise:24795] procdir: /tmp/openmpi-sessions-sshd@localhost_0/default-universe/1/3
[enterprise:24795] jobdir: /tmp/openmpi-sessions-sshd@localhost_0/default-universe/1
[enterprise:24795] unidir: /tmp/openmpi-sessions-sshd@localhost_0/default-universe
[enterprise:24795] top: openmpi-sessions-sshd@localhost_0
[enterprise:24795] tmp: /tmp
[enterprise:24797] [0,1,4] setting up session dir with
[enterprise:24797] universe default-universe
[enterprise:24797] user sshd
[enterprise:24797] host localhost
[enterprise:24797] jobid 1
[enterprise:24797] procid 4
[enterprise:24797] procdir: /tmp/openmpi-sessions-sshd@localhost_0/default-universe/1/4
[enterprise:24797] jobdir: /tmp/openmpi-sessions-sshd@localhost_0/default-universe/1
[enterprise:24797] unidir: /tmp/openmpi-sessions-sshd@localhost_0/default-universe
[enterprise:24797] top: openmpi-sessions-sshd@localhost_0
[enterprise:24797] tmp: /tmp
[enterprise:24786] spawn: in job_state_callback(jobid = 1, state = 0x4)
[enterprise:24786] Info: Setting up debugger process table for applications
  MPIR_being_debugged = 0
  MPIR_debug_gate = 0
  MPIR_debug_state = 1
  MPIR_acquired_pre_main = 0
  MPIR_i_am_starter = 0
  MPIR_proctable_size = 5
  MPIR_proctable:
    (i, host, exe, pid) = (0, localhost, mandelbrot-mpi, 24789)
    (i, host, exe, pid) = (1, localhost, mandelbrot-mpi, 24791)
    (i, host, exe, pid) = (2, localhost, mandelbrot-mpi, 24793)
    (i, host, exe, pid) = (3, localhost, mandelbrot-mpi, 24795)
    (i, host, exe, pid) = (4, localhost, mandelbrot-mpi, 24797)
[enterprise:24789] [0,1,0] ompi_mpi_init completed
[enterprise:24791] [0,1,1] ompi_mpi_init completed
[enterprise:24793] [0,1,2] ompi_mpi_init completed
[enterprise:24795] [0,1,3] ompi_mpi_init completed
[enterprise:24797] [0,1,4] ompi_mpi_init completed
Signal:10 info.si_errno:0(Error 0) si_code:1(BUS_ADRALN)
Failing at addr:2b12cc
*** End of error message ***
[enterprise:24787] sess_dir_finalize: found proc session dir empty - deleting
[enterprise:24787] sess_dir_finalize: job session dir not empty - leaving
[enterprise:24787] orted: job_state_callback(jobid = 1, state = ORTE_PROC_STATE_ABORTED)
[enterprise:24787] sess_dir_finalize: found job session dir empty - deleting
[enterprise:24787] sess_dir_finalize: univ session dir not empty - leaving
--------------------------------------------------------------------------
WARNING: A process refused to die!
Host: enterprise
PID: 24789
This process may still be running and/or consuming resources.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
WARNING: A process refused to die!
Host: enterprise
PID: 24791
This process may still be running and/or consuming resources.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
WARNING: A process refused to die!
Host: enterprise
PID: 24793
This process may still be running and/or consuming resources.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
WARNING: A process refused to die!
Host: enterprise
PID: 24795
This process may still be running and/or consuming resources.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
WARNING: A process refused to die!
Host: enterprise
PID: 24797
This process may still be running and/or consuming resources.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
WARNING: A process refused to die!
Host: enterprise
PID: 24789
This process may still be running and/or consuming resources.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
WARNING: A process refused to die!
Host: enterprise
PID: 24791
This process may still be running and/or consuming resources.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
WARNING: A process refused to die!
Host: enterprise
PID: 24793
This process may still be running and/or consuming resources.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
WARNING: A process refused to die!
Host: enterprise
PID: 24795
This process may still be running and/or consuming resources.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
WARNING: A process refused to die!
Host: enterprise
PID: 24797
This process may still be running and/or consuming resources.
--------------------------------------------------------------------------
[enterprise:24787] sess_dir_finalize: proc session dir not empty - leaving
[enterprise:24787] sess_dir_finalize: proc session dir not empty - leaving
[enterprise:24787] sess_dir_finalize: proc session dir not empty - leaving
[enterprise:24787] sess_dir_finalize: proc session dir not empty - leaving
[enterprise:24787] orted: job_state_callback(jobid = 1, state = ORTE_PROC_STATE_TERMINATED)
[enterprise:24787] sess_dir_finalize: found proc session dir empty - deleting
[enterprise:24787] sess_dir_finalize: found job session dir empty - deleting
[enterprise:24787] sess_dir_finalize: found univ session dir empty - deleting
[enterprise:24787] sess_dir_finalize: found top session dir empty - deleting

ompi_info output:

sshd@enterprise ~ $ ~/openmpi_sun4u/bin/ompi_info
Open MPI: 1.1b5r10421
Open MPI SVN revision: r10421
Open RTE: 1.1b5r10421
Open RTE SVN revision: r10421
OPAL: 1.1b5r10421
OPAL SVN revision: r10421
Prefix: /export/lca/home/lca0/etudiants/ac38820/openmpi_sun4u
Configured architecture: sparc-sun-solaris2.8
Configured by: sshd
Configured on: Tue Jun 20 15:25:44 EDT 2006
Configure host: averoes
Built by: ac38820
Built on: Tue Jun 20 15:59:47 EDT 2006
Built host: averoes
C bindings: yes
C++ bindings: yes
Fortran77 bindings: yes (all)
Fortran90 bindings: no
Fortran90 bindings size: na
C compiler: gcc
C compiler absolute: /usr/local/bin/gcc
C++ compiler: g++
C++ compiler absolute: /usr/local/bin/g++
Fortran77 compiler: g77
Fortran77 compiler abs: /usr/local/bin/g77
Fortran90 compiler: f90
Fortran90 compiler abs: /export/lca/appl/Forte/SUNWspro/WS6U2/bin/f90
C profiling: yes
C++ profiling: yes
Fortran77 profiling: yes
Fortran90 profiling: no
C++ exceptions: no
Thread support: solaris (mpi: no, progress: no)
Internal debug support: no
MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
libltdl support: yes
MCA paffinity: solaris (MCA v1.0, API v1.0, Component v1.1)
MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.1)
MCA timer: solaris (MCA v1.0, API v1.0, Component v1.1)
MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
MCA coll: basic (MCA v1.0, API v1.0, Component v1.1)
MCA coll: hierarch (MCA v1.0, API v1.0, Component v1.1)
MCA coll: self (MCA v1.0, API v1.0, Component v1.1)
MCA coll: sm (MCA v1.0, API v1.0, Component v1.1)
MCA coll: tuned (MCA v1.0, API v1.0, Component v1.1)
MCA io: romio (MCA v1.0, API v1.0, Component v1.1)
MCA mpool: sm (MCA v1.0, API v1.0, Component v1.1)
MCA pml: dr (MCA v1.0, API v1.0, Component v1.1)
MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.1)
MCA bml: r2 (MCA v1.0, API v1.0, Component v1.1)
MCA rcache: rb (MCA v1.0, API v1.0, Component v1.1)
MCA btl: self (MCA v1.0, API v1.0, Component v1.1)
MCA btl: sm (MCA v1.0, API v1.0, Component v1.1)
MCA btl: tcp (MCA v1.0, API v1.0, Component v1.0)
MCA topo: unity (MCA v1.0, API v1.0, Component v1.1)
MCA osc: pt2pt (MCA v1.0, API v1.0, Component v1.0)
MCA gpr: null (MCA v1.0, API v1.0, Component v1.1)
MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.1)
MCA gpr: replica (MCA v1.0, API v1.0, Component v1.1)
MCA iof: proxy (MCA v1.0, API v1.0, Component v1.1)
MCA iof: svc (MCA v1.0, API v1.0, Component v1.1)
MCA ns: proxy (MCA v1.0, API v1.0, Component v1.1)
MCA ns: replica (MCA v1.0, API v1.0, Component v1.1)
MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0)
MCA ras: dash_host (MCA v1.0, API v1.0, Component v1.1)
MCA ras: hostfile (MCA v1.0, API v1.0, Component v1.1)
MCA ras: localhost (MCA v1.0, API v1.0, Component v1.1)
MCA rds: hostfile (MCA v1.0, API v1.0, Component v1.1)
MCA rds: resfile (MCA v1.0, API v1.0, Component v1.1)
MCA rmaps: round_robin (MCA v1.0, API v1.0, Component v1.1)
MCA rmgr: proxy (MCA v1.0, API v1.0, Component v1.1)
MCA rmgr: urm (MCA v1.0, API v1.0, Component v1.1)
MCA rml: oob (MCA v1.0, API v1.0, Component v1.1)
MCA pls: fork (MCA v1.0, API v1.0, Component v1.1)
MCA pls: rsh (MCA v1.0, API v1.0, Component v1.1)
MCA sds: env (MCA v1.0, API v1.0, Component v1.1)
MCA sds: pipe (MCA v1.0, API v1.0, Component v1.1)
MCA sds: seed (MCA v1.0, API v1.0, Component v1.1)
MCA sds: singleton (MCA v1.0, API v1.0, Component v1.1)

On Tuesday, June 20, 2006 at 17:06, Eric Thibodeau wrote:
> Thanks for the pointer, it WORKS!! (yay)
>
> On Tuesday, June 20, 2006 at 12:21, Brian Barrett wrote:
> > On Jun 19, 2006, at 12:15 PM, Eric Thibodeau wrote:
> >
> > > I checked the thread with the same title as this e-mail and tried
> > > compiling openmpi-1.1b4r10418 with:
> > >
> > > ./configure CFLAGS="-mv8plus" CXXFLAGS="-mv8plus" FFLAGS="-mv8plus"
> > > FCFLAGS="-mv8plus" --prefix=$HOME/openmpi-SUN-`uname -r`
> > > --enable-pretty-print-stacktrace
> >
> > I put the incorrect flags in the error message - can you try again with:
> >
> > ./configure CFLAGS=-mcpu=v9 CXXFLAGS=-mcpu=v9 FFLAGS=-mcpu=v9
> > FCFLAGS=-mcpu=v9 --prefix=$HOME/openmpi-SUN-`uname -r`
> > --enable-pretty-print-stacktrace
> >
> > and see if that helps? By the way, I'm not sure if Solaris has the
> > required support for the pretty-print stack trace feature. It likely
> > will print what signal caused the error, but will not actually print
> > the stack trace. It's enabled by default on Solaris, with this
> > limited functionality (the option exists for platforms that have
> > broken half-support for GNU libc's stack trace feature, and for users
> > that don't like us registering a signal handler to do the work).
> >
> > Brian
> >
>

--
Eric Thibodeau
Neural Bucket Solutions Inc.
T. (514) 736-1436
C. (514) 710-0517