On Wed, Jul 28, 2010 at 11:09 AM, Gus Correa <g...@ldeo.columbia.edu> wrote:

> Hi Cristobal
>
> In case you are not using full path name for mpiexec/mpirun,
> what does "which mpirun" say?
>

--> $ which mpirun
      /opt/openmpi-1.4.2

>
> Oftentimes this is a source of confusion: old versions may
> be first on the PATH.
>
> Gus
>

The Open MPI version problem is now gone; I can confirm that the version is
consistent now :), thanks.
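
For anyone hitting the same mix-up: `which` only reports the first match, so it can help to list every mpirun on the PATH. A minimal sketch in plain sh; the function name list_on_path is only illustrative, not part of Open MPI:

```shell
#!/bin/sh
# List every executable called "$1" found on PATH, in search order.
# The first line printed is the one the shell would actually run.
# (list_on_path is a hypothetical helper name.)
list_on_path() {
    name=$1
    old_ifs=$IFS
    IFS=:
    for dir in $PATH; do
        # An empty PATH entry means the current directory.
        [ -n "$dir" ] || dir=.
        if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
        fi
    done
    IFS=$old_ifs
}

# Usage: list_on_path mpirun
```

If more than one line comes back, an older install may be shadowing the new one. Running the same check on each node and comparing the first line is one way to confirm that all nodes pick up the same build.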

However, I keep getting a kernel crash at random when I run with -np
higher than 5.
These are Xeons with Hyper-Threading on; is that a problem?

I'm trying to locate the kernel error in the logs, but after rebooting from a
crash the error is not in kern.log (nor in kern.log.1).
All I remember is that it starts with "Kernel BUG..."
and at some point it mentions a certain CPU X, where X can be anything from 0
to 15 (I'm testing only on the main node). Does anyone know where the kernel
error log could be?
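
A rough sketch of how the log search could be scripted, in plain sh. The function name find_kernel_bug_logs is hypothetical, and the file names other than kern.log (syslog, messages) are assumptions about common syslog setups, so adjust them to the distribution in use:

```shell
#!/bin/sh
# Print the names of log files that contain a kernel BUG, Oops or panic line.
# $1 is the log directory; /var/log is assumed when it is omitted.
# (find_kernel_bug_logs is a hypothetical helper name.)
find_kernel_bug_logs() {
    log_dir=${1:-/var/log}
    # -l: print matching file names only, -i: ignore case,
    # -s: stay quiet about missing or unreadable files, -E: extended regex.
    # Compressed rotated logs (*.gz) are not searched properly by plain grep.
    grep -lisE 'kernel BUG|Oops:|kernel panic' \
        "$log_dir"/kern.log* "$log_dir"/syslog* "$log_dir"/messages*
    # Finding nothing is not treated as an error.
    return 0
}

# Usage: find_kernel_bug_logs            (searches /var/log)
#        find_kernel_bug_logs /some/dir  (searches another directory)
```

If this prints nothing, the message may never have reached the disk, since a machine that locks up hard may not get the chance to flush its logs. In that case, capturing the console output over a serial line or with netconsole is a common fallback.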

>
> Cristobal Navarro wrote:
>
>>
>> On Tue, Jul 27, 2010 at 7:29 PM, Gus Correa <g...@ldeo.columbia.edu> wrote:
>>
>>    Hi Cristobal
>>
>>    Does it run only on the head node alone?
>>    (Fuego? Agua? Acatenango?)
>>    Try to put only the head node on the hostfile and execute with mpiexec.
>>
>> --> I will try with only the head node and post the results back.
>>
>>    This may help sort out what is going on.
>>    Hopefully it will run on the head node.
>>
>>    Also, do you have InfiniBand connecting the nodes?
>>    The error messages refer to the openib btl (i.e. InfiniBand),
>>    and complain of
>>
>>
>> No, we are just using a normal 100 Mbit/s network, since I am still
>> only testing.
>>
>>
>>    "perhaps a missing symbol, or compiled for a different
>>    version of Open MPI?".
>>    It sounds like a mix-up of versions/builds.
>>
>>
>> --> I agree, there must be remains of the older version somewhere.
>>
>>    Did you configure/build OpenMPI from source, or did you install
>>    it with apt-get?
>>    It may be easier/less confusing to install from source.
>>    If you did, what configure options did you use?
>>
>>
>> --> I installed from source: ./configure --prefix=/opt/openmpi-1.4.2
>> --with-sge --without-xgrid --disable-static
>>
>>    Also, as for the OpenMPI runtime environment,
>>    it is not enough to set it on
>>    the command line, because it will be effective only on the head node.
>>    You need to either add the OpenMPI bin and lib directories to
>>    PATH and LD_LIBRARY_PATH in your .bashrc/.cshrc files (assuming
>>    these files and your home directory are *also* shared with the
>>    nodes via NFS),
>>    or use the --prefix option of mpiexec to point to the OpenMPI main
>>    directory.
>>
>>
>> Yes, all nodes have their PATH and LD_LIBRARY_PATH set up properly in
>> the login scripts (.bashrc in my case).
>>
>>    Needless to say, you need to check and ensure that the OpenMPI
>>    directory (and maybe your home directory, and your work directory)
>>    is (are) really mounted on the nodes.
>>
>>
>> --> Yes, I double-checked that they are.
>>
>>    I hope this helps,
>>
>>
>> --> Thanks, really!
>>
>>    Gus Correa
>>
>> Update: I just reinstalled Open MPI with the same parameters, and it
>> seems that the problem is gone. I couldn't test it fully, but when I
>> get back to the lab I'll confirm.
>>
>> best regards! Cristobal
>>
>>
>> ------------------------------------------------------------------------
>>
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
