I'm afraid I have zero knowledge or experience with Gentoo Portage, so I can't help you there. I always install our releases from the source tarball, as it is pretty trivial to do and avoids any issues.
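For reference, a build from the source tarball usually looks roughly like the following - the version number and install prefix here are only examples, so adjust them for your setup:

# example only: substitute the tarball you actually downloaded and a prefix you can write to
$ tar xzf openmpi-1.2.tar.gz
$ cd openmpi-1.2
$ ./configure --prefix=$HOME/openmpi
$ make all install
# make sure the new install is the one that gets picked up
$ export PATH=$HOME/openmpi/bin:$PATH
$ export LD_LIBRARY_PATH=$HOME/openmpi/lib:$LD_LIBRARY_PATH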
I will have to defer to someone who knows that system to help you from here. It sounds like an installation or configuration issue.

Ralph


On 3/16/07 3:15 PM, "David Bronke" <whitel...@gmail.com> wrote:

> On 3/15/07, Ralph Castain <r...@lanl.gov> wrote:
>> Hmmm...well, a few thoughts to hopefully help with the debugging. One initial comment, though - 1.1.2 is quite old. You might want to upgrade to 1.2 (releasing momentarily - you can use the last release candidate in the interim as it is identical).
>
> Version 1.2 doesn't seem to be in gentoo portage yet, so I may end up having to compile from source... I generally prefer to do everything from portage if possible, because it makes upgrades and maintenance much cleaner.
>
>> Meantime, looking at this output, there appear to be a couple of common possibilities. First, I don't see any of the diagnostic output from after we do a local fork (we do this prior to actually launching the daemon). Is it possible your system doesn't allow you to fork processes (some don't, though it's unusual)?
>
> I don't see any problems with forking on this system... I'm able to start a dbus daemon as a regular user without any problems.
>
>> Second, it could be that the "orted" program isn't being found in your path. People often forget that the path in shells started up by programs isn't necessarily the same as that in their login shell. You might try executing a simple shellscript that outputs the results of "which orted" to verify this is correct.
>
> 'which orted' from a shell script gives me '/usr/bin/orted', which seems to be correct.
>
>> BTW, I should have asked as well: what are you running this on, and how did you configure openmpi?
>
> I'm running this on two identical machines with 2 dual-core hyperthreading Xeon processors (EM64T). I simply installed OpenMPI using portage, with the USE flags "debug fortran pbs -threads". (I've also tried it with "-debug fortran pbs threads")
>
>> Ralph
>>
>>
>> On 3/15/07 5:33 PM, "David Bronke" <whitel...@gmail.com> wrote:
>>
>>> I'm using OpenMPI version 1.1.2. I installed it using gentoo portage, so I think it has the right permissions... I tried doing 'equery f openmpi | xargs ls -dl' and inspecting the permissions of each file, and I don't see much out of the ordinary; it is all owned by root:root, but every file has read permission for user, group, and other (and execute for each as well when appropriate). From the debug output, I can tell that mpirun is creating the session tree in /tmp, and it does seem to be working fine...
>>> Here's the output when using --debug-daemons:
>>>
>>> $ mpirun -aborted 8 -v -d --debug-daemons -np 8 /workspace/bronke/mpi/hello
>>> [trixie:25228] [0,0,0] setting up session dir with
>>> [trixie:25228] universe default-universe
>>> [trixie:25228] user bronke
>>> [trixie:25228] host trixie
>>> [trixie:25228] jobid 0
>>> [trixie:25228] procid 0
>>> [trixie:25228] procdir: /tmp/openmpi-sessions-bronke@trixie_0/default-universe/0/0
>>> [trixie:25228] jobdir: /tmp/openmpi-sessions-bronke@trixie_0/default-universe/0
>>> [trixie:25228] unidir: /tmp/openmpi-sessions-bronke@trixie_0/default-universe
>>> [trixie:25228] top: openmpi-sessions-bronke@trixie_0
>>> [trixie:25228] tmp: /tmp
>>> [trixie:25228] [0,0,0] contact_file /tmp/openmpi-sessions-bronke@trixie_0/default-universe/universe-setup.txt
>>> [trixie:25228] [0,0,0] wrote setup file
>>> [trixie:25228] pls:rsh: local csh: 0, local bash: 1
>>> [trixie:25228] pls:rsh: assuming same remote shell as local shell
>>> [trixie:25228] pls:rsh: remote csh: 0, remote bash: 1
>>> [trixie:25228] pls:rsh: final template argv:
>>> [trixie:25228] pls:rsh: /usr/bin/ssh <template> orted --debug --debug-daemons --bootproxy 1 --name <template> --num_procs 2 --vpid_start 0 --nodename <template> --universe bronke@trixie:default-universe --nsreplica "0.0.0;tcp://141.238.31.33:43838" --gprreplica "0.0.0;tcp://141.238.31.33:43838" --mpi-call-yield 0
>>> [trixie:25228] sess_dir_finalize: proc session dir not empty - leaving
>>> [trixie:25228] spawn: in job_state_callback(jobid = 1, state = 0x100)
>>> mpirun noticed that job rank 0 with PID 0 on node "localhost" exited on signal 13.
>>> [trixie:25228] sess_dir_finalize: proc session dir not empty - leaving
>>> [trixie:25228] sess_dir_finalize: proc session dir not empty - leaving
>>> [trixie:25228] sess_dir_finalize: proc session dir not empty - leaving
>>> [trixie:25228] sess_dir_finalize: proc session dir not empty - leaving
>>> [trixie:25228] sess_dir_finalize: proc session dir not empty - leaving
>>> [trixie:25228] sess_dir_finalize: proc session dir not empty - leaving
>>> [trixie:25228] sess_dir_finalize: proc session dir not empty - leaving
>>> [trixie:25228] spawn: in job_state_callback(jobid = 1, state = 0x80)
>>> mpirun noticed that job rank 0 with PID 0 on node "localhost" exited on signal 13.
>>> mpirun noticed that job rank 1 with PID 0 on node "localhost" exited on signal 13.
>>> mpirun noticed that job rank 2 with PID 0 on node "localhost" exited on signal 13.
>>> mpirun noticed that job rank 3 with PID 0 on node "localhost" exited on signal 13.
>>> mpirun noticed that job rank 4 with PID 0 on node "localhost" exited on signal 13.
>>> mpirun noticed that job rank 5 with PID 0 on node "localhost" exited on signal 13.
>>> mpirun noticed that job rank 6 with PID 0 on node "localhost" exited on signal 13.
>>> [trixie:25228] ERROR: A daemon on node localhost failed to start as expected.
>>> [trixie:25228] ERROR: There may be more information available from
>>> [trixie:25228] ERROR: the remote shell (see above).
>>> [trixie:25228] The daemon received a signal 13.
>>> 1 additional process aborted (not shown)
>>> [trixie:25228] sess_dir_finalize: found proc session dir empty - deleting
>>> [trixie:25228] sess_dir_finalize: found job session dir empty - deleting
>>> [trixie:25228] sess_dir_finalize: found univ session dir empty - deleting
>>> [trixie:25228] sess_dir_finalize: found top session dir empty - deleting
>>>
>>> On 3/15/07, Ralph H Castain <r...@lanl.gov> wrote:
>>>> It isn't a /dev issue. The problem is likely that the system lacks sufficient permissions to either:
>>>>
>>>> 1. create the Open MPI session directory tree. We create a hierarchy of subdirectories for temporary storage used for things like your shared memory file - the location of the head of that tree can be specified at run time, but has a series of built-in defaults it can search if you don't specify it (we look at your environment variables - e.g., TMP or TMPDIR - as well as the typical Linux/Unix places). You might check to see what your tmp directory is, and that you have write permission into it. Alternatively, you can specify your own location (where you know you have permissions!) by setting --tmpdir your-dir on the mpirun command line.
>>>>
>>>> 2. execute or access the various binaries and/or libraries. This is usually caused when someone installs OpenMPI as root, and then tries to execute as a non-root user. Best thing here is to either run through the installation directory and add the correct permissions (assuming it is a system-level install), or reinstall as the non-root user (if the install is solely for you anyway).
>>>>
>>>> You can also set --debug-daemons on the mpirun command line to get more diagnostic output from the daemons and then send that along.
>>>>
>>>> BTW: if possible, it helps us to advise you if we know which version of OpenMPI you are using. ;-)
>>>>
>>>> Hope that helps.
>>>> Ralph
>>>>
>>>>
>>>> On 3/15/07 1:51 PM, "David Bronke" <whitel...@gmail.com> wrote:
>>>>
>>>>> Ok, now that I've figured out what the signal means, I'm wondering exactly what is running into permission problems... the program I'm running doesn't use any functions except printf, sprintf, and MPI_*... I was thinking that possibly changes to permissions on certain /dev entries in newer distros might cause this, but I'm not even sure what /dev entries would be used by MPI.
>>>>>
>>>>> On 3/15/07, McCalla, Mac <macmcca...@hess.com> wrote:
>>>>>> Hi,
>>>>>> If the perror command is available on your system it will tell you what the message is associated with the signal value. On my system, RHEL4U3, it is permission denied.
>>>>>>
>>>>>> HTH,
>>>>>>
>>>>>> mac mccalla
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of David Bronke
>>>>>> Sent: Thursday, March 15, 2007 12:25 PM
>>>>>> To: us...@open-mpi.org
>>>>>> Subject: [OMPI users] Signal 13
>>>>>>
>>>>>> I've been trying to get OpenMPI working on two of the computers at a lab I help administer, and I'm running into a rather large issue.
>>>>>> When running anything using mpirun as a normal user, I get the following output:
>>>>>>
>>>>>> $ mpirun --no-daemonize --host localhost,localhost,localhost,localhost,localhost,localhost,localhost,localhost /workspace/bronke/mpi/hello
>>>>>> mpirun noticed that job rank 0 with PID 0 on node "localhost" exited on signal 13.
>>>>>> [trixie:18104] ERROR: A daemon on node localhost failed to start as expected.
>>>>>> [trixie:18104] ERROR: There may be more information available from
>>>>>> [trixie:18104] ERROR: the remote shell (see above).
>>>>>> [trixie:18104] The daemon received a signal 13.
>>>>>> 8 additional processes aborted (not shown)
>>>>>>
>>>>>> However, running the same exact command line as root works fine:
>>>>>>
>>>>>> $ sudo mpirun --no-daemonize --host localhost,localhost,localhost,localhost,localhost,localhost,localhost,localhost /workspace/bronke/mpi/hello
>>>>>> Password:
>>>>>> p is 8, my_rank is 0
>>>>>> p is 8, my_rank is 1
>>>>>> p is 8, my_rank is 2
>>>>>> p is 8, my_rank is 3
>>>>>> p is 8, my_rank is 6
>>>>>> p is 8, my_rank is 7
>>>>>> Greetings from process 1!
>>>>>>
>>>>>> Greetings from process 2!
>>>>>>
>>>>>> Greetings from process 3!
>>>>>>
>>>>>> p is 8, my_rank is 5
>>>>>> p is 8, my_rank is 4
>>>>>> Greetings from process 4!
>>>>>>
>>>>>> Greetings from process 5!
>>>>>>
>>>>>> Greetings from process 6!
>>>>>>
>>>>>> Greetings from process 7!
>>>>>>
>>>>>>
>>>>>> I've looked up signal 13, and have found that it is apparently SIGPIPE; I also found a thread on the LAM-MPI site: http://www.lam-mpi.org/MailArchives/lam/2004/08/8486.php
>>>>>> However, this thread seems to indicate that the problem would be in the application (/workspace/bronke/mpi/hello in this case), but there are no pipes in use in this app, and the fact that it works as expected as root doesn't seem to fit either. I have tried running mpirun with --verbose and it doesn't show any more output than without it, so I've run into a sort of dead end on this issue. Does anyone know of any way I can figure out what's going wrong or how I can fix it?
>>>>>>
>>>>>> Thanks!
>>>>>> --
>>>>>> David H. Bronke
>>>>>> Lead Programmer
>>>>>> G33X Nexus Entertainment
>>>>>> http://games.g33xnexus.com/precursors/
>>>>>>
>>>>>> v3sw5/7Hhw5/6ln4pr6Ock3ma7u7+8Lw3/7Tm3l6+7Gi2e4t4Mb7Hen5g8+9ORPa22s6MSr7p6 hackerkey.com
>>>>>> Support Web Standards! http://www.webstandards.org/
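P.S. For anyone finding this thread later, the checks discussed above boil down to roughly the following. This is only a sketch: the /usr/bin/orted result, the hello binary path, and the --tmpdir location are examples taken from (or standing in for values in) this thread, not requirements.

# 1. check what a non-interactive shell sees, since that is the environment
#    the rsh/ssh launcher starts orted under
$ ssh localhost 'which orted; echo $PATH'

# 2. confirm you can write into the temporary directory the session tree goes under
$ ls -ld /tmp
$ touch /tmp/ompi-write-test && rm /tmp/ompi-write-test

# 3. or point the session directory tree somewhere you know is writable
#    (/some/dir/you/own is a placeholder)
$ mpirun --tmpdir /some/dir/you/own -np 8 /workspace/bronke/mpi/hello

# 4. signal 13 is SIGPIPE; note that "perror 13" reports errno 13
#    ("permission denied"), which is a different numbering than signals
$ kill -l 13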