I got it to work, but it didn't have anything to do with the environment variables on the shell.
I am running CentOS 4.2, and the system has everything compiled with GCC 3.4, but it also has GCC 4.0 installed. I was building ompi with GCC 4.0, and I think it was having trouble loading dynamic libraries since the rest of the system is built with 3.4. I am not sure if that is exactly the case, but I recompiled ompi with GCC 3.4 and only used gfortran for FC. After that things seemed to be working properly. Was my guess correct, or do you know the real reason why this is?

Also, does ompi have something similar to "lamboot" and "recon", or is the only option adding --hostfile or --host a,b to the mpirun command?

Sam Adams
General Dynamics - Network Systems
Phone: 210.536.5945
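For anyone hitting the same compiler-mismatch problem, a rough sketch of the rebuild and the hostfile-style launch being described; the hostfile name, node names, and slot counts below are placeholders, not details from this thread:

# point configure at the compilers that match the rest of the system (GCC 3.4 here)
$ ./configure CC=gcc CXX=g++ F77=gfortran FC=gfortran --prefix=/usr/local
$ make all install

$ cat my_hostfile          # hypothetical hostfile: one host per line
nodeA slots=2
nodeB slots=2
$ mpirun --hostfile my_hostfile -np 2 ./f_5x5

Open MPI does not require a lamboot/recon-style boot step; mpirun starts its daemons on the hosts named in the hostfile (or given with --host) when the job is launched, so the hostfile alone is enough.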
-----Original Message-----
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Michael Kluskens
Sent: Monday, April 10, 2006 1:03 PM
To: Open MPI Users
Subject: Re: [OMPI users] job running question

You need to confirm that /etc/bashrc is actually being read in that environment; bash is a little different about which files get read depending on whether you log in interactively or not. Also, I don't think ~/.bashrc is read on a noninteractive login.

Michael

On Apr 10, 2006, at 1:06 PM, Adams Samuel D Contr AFRL/HEDR wrote:

> I put it in /etc/bashrc and opened a new shell, but I still am not
> seeing any core files.
>
> Sam Adams
> General Dynamics - Network Systems
> Phone: 210.536.5945
>
> -----Original Message-----
> From: users-boun...@open-mpi.org [mailto:users-bounces@open-mpi.org]
> On Behalf Of Pavel Shamis (Pasha)
> Sent: Monday, April 10, 2006 8:56 AM
> To: Open MPI Users
> Subject: Re: [OMPI users] job running question
>
> Mpirun opens a separate shell on each machine/node, so the "ulimit"
> will not be available in the new shell. I think if you add "ulimit -c
> unlimited" to your default shell configuration file (~/.bashrc in the
> BASH case and ~/.tcshrc in the TCSH/CSH case) you will find your core
> files :)
>
> Regards,
> Pavel Shamis (Pasha)
>
> Adams Samuel D Contr AFRL/HEDR wrote:
>> I set bash to have unlimited size core files like this:
>>
>> $ ulimit -c unlimited
>>
>> But it was not dropping core files for some reason when I was running
>> with mpirun. Just to make sure it would do what I expected, I wrote a
>> little C program that was kind of like this:
>>
>> int ptr = 4;
>> fprintf(stderr, "bad! %s\n", (char*)ptr);
>>
>> That would give a segmentation fault. It dropped a core file like you
>> would expect. Am I missing something?
>>
>> Sam Adams
>> General Dynamics - Network Systems
>> Phone: 210.536.5945
>>
>> -----Original Message-----
>> From: users-boun...@open-mpi.org [mailto:users-bounces@open-mpi.org]
>> On Behalf Of Jeff Squyres (jsquyres)
>> Sent: Saturday, April 08, 2006 6:25 AM
>> To: Open MPI Users
>> Subject: Re: [OMPI users] job running question
>>
>> Some process is exiting on a segv -- are you getting any corefiles?
>>
>> If not, can you increase your coredumpsize to unlimited? This should
>> let you get a corefile; can you send the backtrace from that
>> corefile?
>>
>>
>>> -----Original Message-----
>>> From: users-boun...@open-mpi.org
>>> [mailto:users-boun...@open-mpi.org] On Behalf Of Adams Samuel
>>> D Contr AFRL/HEDR
>>> Sent: Friday, April 07, 2006 11:53 AM
>>> To: 'us...@open-mpi.org'
>>> Subject: [OMPI users] job running question
>>>
>>> We are trying to build a new cluster running OpenMPI. We were
>>> previously running LAM-MPI. To run jobs we would do the following:
>>>
>>> $ lamboot lam-host-file
>>> $ mpirun C program
>>>
>>> I am not sure if this works more or less the same way with ompi. We
>>> were trying to run it like this:
>>>
>>> [james.parker@Cent01 FORTRAN]$ mpirun --np 2 f_5x5 localhost
>>> mpirun noticed that job rank 1 with PID 0 on node "localhost" exited
>>> on signal 11.
>>> [Cent01.brooks.afmc.ds.af.mil:16124] ERROR: A daemon on node localhost
>>> failed to start as expected.
>>> [Cent01.brooks.afmc.ds.af.mil:16124] ERROR: There may be more information
>>> available from
>>> [Cent01.brooks.afmc.ds.af.mil:16124] ERROR: the remote shell (see above).
>>> [Cent01.brooks.afmc.ds.af.mil:16124] The daemon received a signal 11.
>>> 1 additional process aborted (not shown)
>>> [james.parker@Cent01 FORTRAN]$
>>>
>>> We have ompi installed to /usr/local, and these are our environment
>>> variables:
>>>
>>> [james.parker@Cent01 FORTRAN]$ export
>>> declare -x COLORTERM="gnome-terminal"
>>> declare -x DBUS_SESSION_BUS_ADDRESS="unix:abstract=/tmp/dbus-sfzFctmRFS"
>>> declare -x DESKTOP_SESSION="default"
>>> declare -x DISPLAY=":0.0"
>>> declare -x GDMSESSION="default"
>>> declare -x GNOME_DESKTOP_SESSION_ID="Default"
>>> declare -x GNOME_KEYRING_SOCKET="/tmp/keyring-x8WQ1E/socket"
>>> declare -x GTK_RC_FILES="/etc/gtk/gtkrc:/home/BROOKS-2K/james.parker/.gtkrc-1.2-gnome2"
>>> declare -x G_BROKEN_FILENAMES="1"
>>> declare -x HISTSIZE="1000"
>>> declare -x HOME="/home/BROOKS-2K/james.parker"
>>> declare -x HOSTNAME="Cent01"
>>> declare -x INPUTRC="/etc/inputrc"
>>> declare -x KDEDIR="/usr"
>>> declare -x LANG="en_US.UTF-8"
>>> declare -x LD_LIBRARY_PATH="/usr/local/lib:/usr/local/lib/openmpi"
>>> declare -x LESSOPEN="|/usr/bin/lesspipe.sh %s"
>>> declare -x LOGNAME="james.parker"
>>> declare -x LS_COLORS="no=00:fi=00:di=00;34:ln=00;36:pi=40;33:so=00;35:bd=40;33;01:cd=40;33;01:or=01;05;37;41:mi=01;05;37;41:ex=00;32:*.cmd=00;32:*.exe=00;32:*.com=00;32:*.btm=00;32:*.bat=00;32:*.sh=00;32:*.csh=00;32:*.tar=00;31:*.tgz=00;31:*.arj=00;31:*.taz=00;31:*.lzh=00;31:*.zip=00;31:*.z=00;31:*.Z=00;31:*.gz=00;31:*.bz2=00;31:*.bz=00;31:*.tz=00;31:*.rpm=00;31:*.cpio=00;31:*.jpg=00;35:*.gif=00;35:*.bmp=00;35:*.xbm=00;35:*.xpm=00;35:*.png=00;35:*.tif=00;35:"
>>> declare -x MAIL="/var/spool/mail/james.parker"
>>> declare -x OLDPWD="/home/BROOKS-2K/james.parker/build/SuperLU_DIST_2.0"
>>> declare -x PATH="/usr/kerberos/bin:/usr/local/bin:/usr/bin:/bin:/usr/X11R6/bin:/home/BROOKS-2K/james.parker/bin:/usr/local/bin"
>>> declare -x PERL5LIB="/usr/lib/perl5/site_perl/5.8.5/i386-linux-thread-multi:/usr/lib/perl5/site_perl/5.8.5"
>>> declare -x PWD="/home/BROOKS-2K/james.parker/build/SuperLU_DIST_2.0/FORTRAN"
>>> declare -x SESSION_MANAGER="local/Cent01.brooks.afmc.ds.af.mil:/tmp/.ICE-unix/14516"
>>> declare -x SHELL="/bin/bash"
>>> declare -x SHLVL="2"
>>> declare -x SSH_AGENT_PID="14541"
>>> declare -x SSH_ASKPASS="/usr/libexec/openssh/gnome-ssh-askpass"
>>> declare -x SSH_AUTH_SOCK="/tmp/ssh-JUIxl14540/agent.14540"
>>> declare -x TERM="xterm"
>>> declare -x USER="james.parker"
>>> declare -x WINDOWID="35651663"
>>> declare -x XAUTHORITY="/home/BROOKS-2K/james.parker/.Xauthority"
>>> [james.parker@Cent01 FORTRAN]$
>>>
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
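Following up on the core-file discussion quoted above, a quick check of whether the ulimit setting actually reaches the non-interactive shells that mpirun uses on each host; the node name and startup file below are assumptions, not details from the thread:

# append the limit to the shell startup file
# (as noted above, confirm this file is really sourced for non-interactive logins)
$ echo 'ulimit -c unlimited' >> ~/.bashrc

# a non-interactive remote shell should now report "unlimited"
$ ssh nodeA 'ulimit -c'
unlimited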