On Wed, Jul 28, 2010 at 11:09 AM, Gus Correa <g...@ldeo.columbia.edu> wrote:
> Hi Cristobal
>
> In case you are not using the full path name for mpiexec/mpirun,
> what does "which mpirun" say?
>

--> $ which mpirun
    /opt/openmpi-1.4.2

>
> Oftentimes this is a source of confusion: old versions may
> be first on the PATH.
>
> Gus
>

The Open MPI version problem is now gone. I can confirm that the version is
consistent now :), thanks.

However, I keep getting a kernel crash at random when I execute with -np
higher than 5. These are Xeons with Hyper-Threading on; is that a problem?

I am trying to locate the kernel error in the logs, but after rebooting from
a crash the error is not in kern.log (nor in kern.log.1). All I remember is
that it starts with "Kernel BUG..." and at some point it mentions a certain
CPU X, where X can be anything from 0 to 15 (I am testing only on the main
node). Does anyone know where the kernel error would be logged?

>
> Cristobal Navarro wrote:
>
>>
>> On Tue, Jul 27, 2010 at 7:29 PM, Gus Correa <g...@ldeo.columbia.edu> wrote:
>>
>> Hi Cristobal
>>
>> Does it run only on the head node alone?
>> (Fuego? Agua? Acatenango?)
>> Try to put only the head node on the hostfile and execute with mpiexec.
>>
>> --> I will try with only the head node, and post results back.
>>
>> This may help sort out what is going on.
>> Hopefully it will run on the head node.
>>
>> Also, do you have Infiniband connecting the nodes?
>> The error messages refer to the openib btl (i.e. Infiniband),
>> and complain of
>>
>> --> No, we are just using a normal 100 Mbit/s network, since I am only
>> testing for now.
>>
>> "perhaps a missing symbol, or compiled for a different
>> version of Open MPI?".
>> It sounds like a mixup of versions/builds.
>>
>> --> I agree, somewhere there must be remains of the older version.
>>
>> Did you configure/build OpenMPI from source, or did you install
>> it with apt-get?
>> It may be easier/less confusing to install from source.
>> If you did, what configure options did you use?
>>
>> --> I installed from source: ./configure --prefix=/opt/openmpi-1.4.2
>> --with-sge --without-xgid --disable--static
>>
>> Also, as for the OpenMPI runtime environment,
>> it is not enough to set it on
>> the command line, because it will be effective only on the head node.
>> You need to either add the OpenMPI directories to PATH and LD_LIBRARY_PATH
>> in your .bashrc/.cshrc files (assuming these files and your home
>> directory are *also* shared with the nodes via NFS),
>> or use the --prefix option of mpiexec to point to the OpenMPI main
>> directory.
>>
>> --> Yes, all nodes have their PATH and LD_LIBRARY_PATH set up properly
>> inside the login scripts (.bashrc in my case).
>>
>> Needless to say, you need to check and ensure that the OpenMPI
>> directory (and maybe your home directory, and your work directory)
>> is (are) really mounted on the nodes.
>>
>> --> Yes, double-checked that they are.
>>
>> I hope this helps,
>>
>> --> Thanks, really!
>>
>> Gus Correa
>>
>> Update: I just reinstalled Open MPI with the same parameters, and it
>> seems that the problem is gone. I couldn't test it fully, but when I
>> get back to the lab I'll confirm.
>>
>> best regards! Cristobal
>>
>> ------------------------------------------------------------------------
>>
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
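
For the PATH check discussed in this thread ("old versions may be first on the PATH"), "which mpirun" reports only the first match. A minimal sketch that lists every match in search order could look like the following. The function name find_all_on_path is made up for illustration and is not part of Open MPI; the sketch also assumes a POSIX shell and PATH entries without glob characters.

```shell
#!/bin/sh
# Hypothetical helper (not an Open MPI tool): print every executable
# called "$1" that appears on PATH, in the order the shell searches.
# The first line printed is the one "which" would report; any further
# lines are other installs that can cause a version mixup.
find_all_on_path() {
    name=$1
    old_ifs=$IFS
    IFS=:
    for dir in $PATH; do
        # An empty PATH entry means the current directory.
        [ -n "$dir" ] || dir=.
        if [ -x "$dir/$name" ] && [ ! -d "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
        fi
    done
    IFS=$old_ifs
}

# Example: show all mpirun candidates (prints nothing if none are found).
find_all_on_path mpirun
```

Running the same function on each compute node (for example over ssh) would show whether the nodes resolve the same mpirun as the head node; that step is a suggestion and was not tried in this thread.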