Yes, somehow after the second install, the installation is consistent.
I'm only running into one issue now; it might be MPI, I'm not sure. Each of these nodes has 8 physical cores (2x Intel Xeon quad-core) and 16 virtual ones; by the way, I have Ubuntu Server 10.04 64-bit installed on these nodes. The problem is that whenever I try to use more than 8 processes (i.e., make use of the virtual cores), I get a horrible kernel error saying that a certain CPU crashed. The error hangs there for about a minute, then it switches to another CPU and shows the same error. I have no option other than pressing the power-off button. I'll try to copy the error and post it.

On Wed, Jul 28, 2010 at 7:39 AM, Jeff Squyres <jsquy...@cisco.com> wrote:
> This issue is usually caused by installing one version of Open MPI over an
> older version:
>
> http://www.open-mpi.org/faq/?category=building#install-overwrite
>
>
> On Jul 27, 2010, at 10:35 PM, Cristobal Navarro wrote:
>
> >
> > On Tue, Jul 27, 2010 at 7:29 PM, Gus Correa <g...@ldeo.columbia.edu> wrote:
> > Hi Cristobal
> >
> > Does it run only on the head node alone?
> > (Fuego? Agua? Acatenango?)
> > Try to put only the head node on the hostfile and execute with mpiexec.
> > --> I will try with only the head node and post the results back.
> > This may help sort out what is going on.
> > Hopefully it will run on the head node.
> >
> > Also, do you have InfiniBand connecting the nodes?
> > The error messages refer to the openib btl (i.e. InfiniBand),
> > and complains of
> >
> > --> No, we are just using a normal 100 Mbit/s network, since I am still only testing.
> >
> > "perhaps a missing symbol, or compiled for a different
> > version of Open MPI?".
> > It sounds like a mixup of versions/builds.
> >
> > --> I agree; the remains of the older version must be somewhere.
> >
> > Did you configure/build OpenMPI from source, or did you install
> > it with apt-get?
> > It may be easier/less confusing to install from source.
> > If you did, what configure options did you use?
> >
> > --> I installed from source:
> > ./configure --prefix=/opt/openmpi-1.4.2 --with-sge --without-xgid --disable--static
> >
> > Also, as for the OpenMPI runtime environment,
> > it is not enough to set it on
> > the command line, because it will be effective only on the head node.
> > You need to either add them to the PATH and LD_LIBRARY_PATH
> > on your .bashrc/.cshrc files (assuming these files and your home directory are *also* shared with the nodes via NFS),
> > or use the --prefix option of mpiexec to point to the OpenMPI main directory.
> >
> > --> Yes, all nodes have their PATH and LD_LIBRARY_PATH set up properly in the login scripts (.bashrc in my case).
> >
> > Needless to say, you need to check and ensure that the OpenMPI directory (and maybe your home directory, and your work directory) is (are)
> > really mounted on the nodes.
> >
> > --> Yes, I double-checked that they are.
> >
> > I hope this helps,
> >
> > --> Thanks, really!
> >
> > Gus Correa
> >
> > Update: I just reinstalled Open MPI with the same parameters, and it seems that the problem is gone. I couldn't test it completely, but I'll confirm when I get back to the lab.
> >
> > Best regards!
> > Cristobal
> >
> > _______________________________________________
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
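For reference, a minimal sketch of the first option Gus describes (setting PATH and LD_LIBRARY_PATH in the login script), assuming bash as the login shell and the /opt/openmpi-1.4.2 prefix from the configure line in the thread. These lines would go in ~/.bashrc on every node, or once in a home directory shared over NFS. The variable name `ompi_dir` is only an illustrative placeholder.

```shell
# Hypothetical helper variable; the path is the --prefix from the ./configure line.
ompi_dir=/opt/openmpi-1.4.2

# Put this build first on the search path, so its mpiexec and the remote
# daemons are found before any leftovers from an older install.
export PATH="$ompi_dir/bin:$PATH"

# Prepend the library directory; the old value is appended only if it was
# non-empty, to avoid a trailing ":" (an empty entry means the current
# directory to the dynamic linker).
export LD_LIBRARY_PATH="$ompi_dir/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
```

The second option leaves the login scripts alone and passes the prefix at launch time instead, e.g. `mpiexec --prefix /opt/openmpi-1.4.2 --hostfile hosts -np 8 ./a.out`, where `hosts` and `./a.out` are placeholder names for the hostfile and the MPI program.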