Hi Prentice,

After some tests I've concluded that this is not an environment problem; below you can see the environment printed by a job, and it looks correct.

What I've found is that if the library /usr/local/lib/openmpi/mca_plm_lsf is in its expected location, the job fails:

> mpirun: symbol lookup error: /usr/local/lib/openmpi/mca_plm_lsf.so:
> undefined symbol: lsb_init

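The error above comes from the dynamic loader, which could not resolve `lsb_init` when the plugin was opened. A minimal sketch of how one might list the symbols a shared object leaves for the loader to resolve, assuming binutils `nm` is installed on the node; the helper names `undefined_symbols` and `plugin_undefined_symbols` are hypothetical and not part of Open MPI or LSF:

```python
import subprocess


def undefined_symbols(nm_output):
    """Return the names marked undefined ("U") in `nm -D` output text."""
    names = set()
    for line in nm_output.splitlines():
        parts = line.split()
        # Undefined entries have no address column, so they split into
        # exactly two fields: the type letter "U" and the symbol name.
        if len(parts) == 2 and parts[0] == "U":
            # Drop any version suffix such as "@GLIBC_2.1".
            names.add(parts[1].split("@")[0])
    return names


def plugin_undefined_symbols(path):
    """Run `nm -D` on a shared object and return its undefined symbols."""
    result = subprocess.run(
        ["nm", "-D", path], check=True, capture_output=True, text=True
    )
    return undefined_symbols(result.stdout)


# Example use on the node (path taken from the error message above):
#   "lsb_init" in plugin_undefined_symbols("/usr/local/lib/openmpi/mca_plm_lsf.so")
```

If `lsb_init` shows up in that set, it has to be provided at run time by a library the loader actually maps into the process; finding the name with `strings` in a file only shows that the text occurs there.
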
The problem disappears if I rename/remove the library /usr/local/lib/openmpi/mca_plm_lsf. So I think the LSF support included in the latest version of Open MPI doesn't interact well with the LSF process that runs Open MPI jobs (perhaps TaskManager).

Do you have any ideas?

Bye
Alex

+ exec pam -g /opt/lsf/7.0/linux2.6-glibc2.3-x86/bin/openmpi_wrapper /mnt/ewd/mpi/hello/hello
[grid01.ags.wan:11820] mca: base: component_find: unable to open /usr/local/lib/openmpi/mca_plm_lsf: file not found (ignored)
Hello World! from process 2 out of 4 on grid01.ags.wan
Hello World! from process 3 out of 4 on grid01.ags.wan
Hello World! from process 1 out of 4 on grid05.ags.wan
Hello World! from process 0 out of 4 on grid03.ags.wan

MANPATH=/opt/lsf/7.0/man:
EGO_CONFDIR=/opt/lsf/conf/ego/grid-cluster-01/kernel
LSB_EXEC_CLUSTER=grid-cluster-01
LSF_EAUTH_AUX_PASS=yes
HOSTNAME=grid01
EGO_TOP=/opt/lsf
LSF_LIM_API_NTRIES=1
LSF_LOGDIR=/opt/lsf/log
LSB_BATCH_JID=748
EGO_SERVERDIR=/opt/lsf/7.0/linux2.6-glibc2.3-x86/etc
LSB_TRAPSIGS=trap # 15 10 12 2 1
LS_JOBPID=11809
LSB_JOBRES_CALLBACK=45290@grid01
LSB_JOB_EXECUSER=lsfadmin
LSB_JOBID=748
LSF_SERVERDIR=/opt/lsf/7.0/linux2.6-glibc2.3-x86/etc
LSB_JOBRES_PID=11809
LSF_TS_OPTIONS=-p grid01:42740 -c /opt/lsf/conf -s /opt/lsf/7.0/linux2.6-glibc2.3-x86/etc -a LINUX86
LSB_JOBNAME=mpirun.lsf /mnt/ewd/mpi/hello/hello
PM_SOURCE=pam
LSF_PJL_TYPE=openmpi
LSF_LIBDIR=/opt/lsf/7.0/linux2.6-glibc2.3-x86/lib
USER=lsfadmin
LSB_EEXEC_REAL_UID=
EGO_LIBDIR=/opt/lsf/7.0/linux2.6-glibc2.3-x86/lib
HOSTTYPE=LINUX86
LSF_INVOKE_CMD=bsub
LS_EXEC_T=START
LSF_EAUTH_SERVER=mbatchd@grid-cluster-01
LS_SUBCWD=/mnt/ewd/mpi/hello
LSF_VERSION=7.0
LSB_DJOB_RU_INTERVAL=15
LSB_HOSTS=grid01 grid01 grid05 grid03
LSB_UNIXGROUP_INT=lsfadmin
LSB_DJOB_HB_INTERVAL=15
LSB_JOBFILENAME=/home/lsfadmin/.lsbatch/1239206877.748
LSB_JOBINDEX=0
PATH=/opt/lsf/7.0/linux2.6-glibc2.3-x86/bin:/opt/lsf/7.0/linux2.6-glibc2.3-x86/etc:/opt/lsf/7.0/linux2.6-glibc2.3-x86/bin:/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/home/lsfadmin/bin
MAIL=/var/spool/mail/lsfadmin
LSB_EXIT_PRE_ABORT=99
LSB_JOBEXIT_STAT=0
LSF_TSOPT_NUM=0
PWD=/mnt/ewd/mpi/hello
LSB_CHKFILENAME=/home/lsfadmin/.lsbatch/1239206877.748
LSF_EAUTH_CLIENT=user
LSB_DJOB_HOSTFILE=/home/lsfadmin/.lsbatch/1239206877.748.hostfile
LSF_BINDIR=/opt/lsf/7.0/linux2.6-glibc2.3-x86/bin
HOME=/home/lsfadmin
SHLVL=3
LSB_ACCT_FILE=/tmp/.1239206877.748.acct
BINARY_TYPE_HPC=
LSF_PM_MPIARGS=-p4pg /home/lsfadmin/pam_pg.11813
LSB_SUB_HOST=grid03
EGO_LOCAL_CONFDIR=/opt/lsf/conf/ego/grid-cluster-01/kernel
LSFUSER=lsfadmin
LSB_QUEUE=normal
LSB_MCPU_HOSTS=grid03 1 grid05 1 grid01 2
LOGNAME=lsfadmin
CVS_RSH=ssh
XLSF_UIDDIR=/opt/lsf/7.0/linux2.6-glibc2.3-x86/lib/uid
LESSOPEN=|/usr/bin/lesspipe.sh %s
EGO_ESRVDIR=/opt/lsf/conf/ego/grid-cluster-01/eservice
LSB_EEXEC_REAL_GID=
LSF_ENVDIR=/opt/lsf/conf
LSF_EGO_ENVDIR=/opt/lsf/conf/ego/grid-cluster-01/kernel
G_BROKEN_FILENAMES=1
EGO_BINDIR=/opt/lsf/7.0/linux2.6-glibc2.3-x86/bin
_=/bin/env

ldd /mnt/ewd/mpi/fibonacci/fibonacci_mpi
    linux-gate.so.1 => (0x40000000)
    libmpi.so.0 => /usr/local/lib/libmpi.so.0 (0x40002000)
    libopen-rte.so.0 => /usr/local/lib/libopen-rte.so.0 (0x40090000)
    libopen-pal.so.0 => /usr/local/lib/libopen-pal.so.0 (0x400d2000)
    libdl.so.2 => /lib/libdl.so.2 (0x00c00000)
    libnsl.so.1 => /lib/libnsl.so.1 (0x00cca000)
    libutil.so.1 => /lib/libutil.so.1 (0x03668000)
    libm.so.6 => /lib/i686/nosegneg/libm.so.6 (0x00c06000)
    libpthread.so.0 => /lib/i686/nosegneg/libpthread.so.0 (0x00c2f000)
    libc.so.6 => /lib/i686/nosegneg/libc.so.6 (0x00ab8000)
    /lib/ld-linux.so.2 (0x00a95000)

On Mon, Apr 6, 2009 at 10:02 PM, Prentice Bisbal <prent...@ias.edu> wrote:
> Alessandro Surace wrote:
> > Hi guys, I try to repost my question...
> > I've a problem with the last stable build and the last nightly snapshot.
> >
> > When I run a job directly with mpirun no problem.
> > If I try to submit it with lsf:
> > bsub -a openmpi -m grid01 mpirun.lsf /mnt/ewd/mpi/fibonacci/fibonacci_mpi
> >
> > I get the following error:
> > mpirun: symbol lookup error: /usr/local/lib/openmpi/mca_plm_lsf.so:
> > undefined symbol: lsb_init
> > Job /opt/lsf/7.0/linux2.6-glibc2.3-x86/bin/openmpi_wrapper
> > /mnt/ewd/mpi/fibonacci/fibonacci_mpi
> >
> > I've verified that the lsb_init symbol is present in the library:
> > [root@grid01 lib]# strings libbat.* |grep lsb_init
> > lsb_init
> > sch_lsb_init
> > lsb_init()
> > lsb_init
> > sch_lsb_init
> > sch_lsb_init
> > sch_lsb_init
> > sch_lsb_init
> > lsb_init()
> > sch_lsb_init
> >
>
> Can you verify that LSF is passing your environment along correctly? It
> looks like your LD_LIBRARY_PATH is set in your login environment, but
> not the environment that the LSF job runs in.
>
> You can check this by submitting a job that executes just the command
> 'printenv'. Compare the output to what you get when you type 'printenv'
> on the command line. Compare the values for LD_LIBRARY_PATH, in particular.
>
> If that looks okay, then try running a job that just executes
>
> ldd /mnt/ewd/mpi/fibonacci/fibonacci_mpi
>
> This will show you any libraries that ld can't find in the LSF run-time
> environment.
>
> --
> Prentice
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
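
The `printenv` comparison suggested in the quoted message can be done by eye, but a small script makes the differences easier to spot. This is only a sketch, assuming both outputs were saved as text with one `NAME=value` entry per line; the function names `parse_env` and `diff_env` are hypothetical:

```python
def parse_env(text):
    """Parse `printenv`-style text (one NAME=value per line) into a dict.

    Lines without "=" are skipped, so multi-line values are not handled.
    """
    env = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if sep and name:
            env[name] = value
    return env


def diff_env(login_text, job_text, names=None):
    """Return {name: (login_value, job_value)} for variables that differ.

    A value of None means the variable is absent on that side.
    Pass `names` to restrict the comparison, e.g. ["LD_LIBRARY_PATH"].
    """
    login = parse_env(login_text)
    job = parse_env(job_text)
    keys = names if names is not None else sorted(set(login) | set(job))
    return {
        key: (login.get(key), job.get(key))
        for key in keys
        if login.get(key) != job.get(key)
    }


# Example use, with file names chosen only for illustration:
#   diff_env(open("login.env").read(), open("job.env").read(),
#            names=["LD_LIBRARY_PATH", "PATH"])
```

Restricting the comparison to a few names keeps the result readable, since a batch job normally carries many scheduler-specific variables that a login shell does not.
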