2009/3/31 Ralph Castain :
> I have no idea why your processes are crashing when run via Torque - are you
> sure that the processes themselves crash? Are they segfaulting - if so, can
> you use gdb to find out where?
I have to admit I'm a newbie with gdb. I am trying to recompile my
code as "ifort
Dear Rolf,
Thanks for your reply.
I've created another PE and changed the submission script, explicitly
specifying the hostname with "--host".
However the result is the same.
# qconf -sp orte
pe_name            orte
slots              8
user_lists         NONE
xuser_lists        NONE
start_proc_args
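(For reference, a commonly used PE definition for Open MPI tight integration with SGE looks roughly like the following. The values shown here are typical illustrative settings, not PN's actual configuration; the essential fields for tight integration are control_slaves TRUE and job_is_first_task FALSE:)
pe_name            orte
slots              9999
user_lists         NONE
xuser_lists        NONE
start_proc_args    /bin/true
stop_proc_args     /bin/true
allocation_rule    $round_robin
control_slaves     TRUE
job_is_first_task  FALSE
urgency_slots      min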
2009/3/31 Ralph Castain :
> It is very hard to debug the problem with so little information. We
> regularly run OMPI jobs on Torque without issue.
Another small thing that I noticed. Not sure if it is relevant.
When the job starts running there is an orte process. The args to this
process are sli
2009/3/31 Ralph Castain :
>
> Information would be most helpful - the information we really need is
> specified here: http://www.open-mpi.org/community/help/
Output of "ompi_info --all" is attached in a file.
echo $LD_LIBRARY_PATH
/usr/local/ompi-ifort/lib:/opt/intel/fce/10.1.018/lib:/opt/intel
2009/3/31 Ralph Castain :
> It is very hard to debug the problem with so little information. We
Thanks Ralph! I'm sorry my first post didn't include enough specifics. I'll
try my best to fill you guys in on as much debug info as I can.
> regularly run OMPI jobs on Torque without issue.
So do we. In fac
It is very hard to debug the problem with so little information. We
regularly run OMPI jobs on Torque without issue.
Are you getting an allocation from somewhere for the nodes? If so, are
you using Moab to get it? Do you have a $PBS_NODEFILE in your
environment?
I have no idea why your pr
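(A quick sanity check, run from inside the Torque job script, shows whether the allocation is actually being passed through to Open MPI. These are standard commands offered only as an illustration, not something the original poster has run:)
echo "nodefile: $PBS_NODEFILE"
cat $PBS_NODEFILE
mpirun hostname    # should print one line per allocated slot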
I've a strange OpenMPI/Torque problem while trying to run a job on our
Opteron-SC-1435 based cluster:
Each node has 8 CPUs.
If I go to a node and run it like so, then the job works:
mpirun -np 6 ${EXE_PATH}/${DACAPOEXE_PAR} ${ARGS}
Same job if I submit through PBS/Torque then it starts running but
On Tue, Mar 31, 2009 at 05:36:19PM -0400, Jeff Squyres wrote:
> On Mar 31, 2009, at 5:25 PM, Kevin McManus wrote:
>
> >--- MCA component mtl:psm (m4 configuration macro)
> >checking for MCA component mtl:psm compile mode... static
> >checking --with-psm value... simple ok (unspecified)
> >checking
On Mar 31, 2009, at 5:25 PM, Kevin McManus wrote:
--- MCA component mtl:psm (m4 configuration macro)
checking for MCA component mtl:psm compile mode... static
checking --with-psm value... simple ok (unspecified)
checking --with-psm-libdir value... sanity check ok (/usr/lib64)
checking psm.h usab
On Tue, Mar 31, 2009 at 04:59:00PM -0400, Jeff Squyres wrote:
> My goal in having you try that statement in a standalone shell script
> wasn't the success or failure of the uname command -- but rather to
> figure out if something in that statement itself was causing the
> syntax error.
>
> A
On 03/31/09 14:50, Dave Love wrote:
Rolf Vandevaart writes:
However, I found that if I explicitly specify the "-machinefile
$TMPDIR/machines", all 8 mpi processes were spawned within a single
node, i.e. node0002.
I had that sort of behaviour recently when the tight integration was
broken on
My goal in having you try that statement in a standalone shell script
wasn't the success or failure of the uname command -- but rather to
figure out if something in that statement itself was causing the
syntax error.
Apparently it is not. There's an errant character elsewhere that is
cau
On Tue, Mar 31, 2009 at 10:11:17PM +0200, Bogdan Costescu wrote:
> On Tue, 31 Mar 2009, Bogdan Costescu wrote:
>
> >'uname -X' is valid on Solaris, but not on Linux.
>
> Not good to reply to oneself, but I've looked at the archives and
> realized that 'uname -X' comes from a message of the OP. M
On Tue, Mar 31, 2009 at 09:57:02PM +0200, Bogdan Costescu wrote:
> On Tue, 31 Mar 2009, Jeff Squyres wrote:
>
> >UNAME_REL=`(/bin/uname -X|grep Release|sed -e 's/.*= //')`
>
> Not sure what you want to achieve here... 'uname -X' is valid on
> Solaris, but not on Linux. The OP has indicated alrea
Hi Jeff, list
Jeff: Thank you for your help and suggestions.
Please correct my argument below if I am wrong.
I am not sure yet whether the problem is caused by libtool,
because somehow it was not present in OpenMPI 1.2.8.
Just as a comparison, the libtool commands on 1.2.8 and 1.3 are very
similar,
On Tue, 31 Mar 2009, Bogdan Costescu wrote:
'uname -X' is valid on Solaris, but not on Linux.
Not good to reply to oneself, but I've looked at the archives and
realized that 'uname -X' comes from a message of the OP. My guess is
that the same source directory was used to build for Solaris
p
On Tue, 31 Mar 2009, Jeff Squyres wrote:
UNAME_REL=`(/bin/uname -X|grep Release|sed -e 's/.*= //')`
Not sure what you want to achieve here... 'uname -X' is valid on
Solaris, but not on Linux. The OP has indicated already that he is
running this on Linux (SLES) so the above line is supposed t
I think that the missing configure option might be the problem as
well. The BLCR configure logic checks to see if you have enabled
checkpoint/restart in Open MPI. If you haven't then it fails out of
configure (probably should print a better error message - I'll put
that on my todo list).
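(For Open MPI 1.3.x the checkpoint/restart code is switched on at configure time, so a configure line along these lines should get past that check. The BLCR paths are taken from the excerpt below; the fault-tolerance flags are the usual ones for this release series, offered here only as a sketch:)
% ./configure --with-ft=cr --enable-mpi-threads \
              --with-blcr=/opt/blcr --with-blcr-libdir=/opt/blcr/lib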
Rolf Vandevaart writes:
>> However, I found that if I explicitly specify the "-machinefile
>> $TMPDIR/machines", all 8 mpi processes were spawned within a single
>> node, i.e. node0002.
I had that sort of behaviour recently when the tight integration was
broken on the installation we'd been give
M C writes:
> --- MCA component crs:blcr (m4 configuration macro)
> checking for MCA component crs:blcr compile mode... dso
> checking --with-blcr value... sanity check ok (/opt/blcr)
> checking --with-blcr-libdir value... sanity check ok (/opt/blcr/lib)
> configure: WARNING: BLCR support request
On Tue, Mar 31, 2009 at 01:37:22PM -0400, Jeff Squyres wrote:
> On Mar 31, 2009, at 1:31 PM, Terry Dontje wrote:
>
> >Can you manually run UNAME_REL=`(/bin/uname -X|grep Release|sed -e
> >'s/.*= //')` in your shell without error?
> >
>
> Better would be to put this small script by itself:
>
> #!
On Tue, Mar 31, 2009 at 01:31:12PM -0400, Terry Dontje wrote:
> I was talking with Jeff Squyres about your issue and he thinks the
> config.guess issue needs to be resolved first, even though your
> specification of x86_64 seems to get you by.
>
> So, do you still see the unexpected "(" if you t
On Mar 31, 2009, at 1:31 PM, Terry Dontje wrote:
Can you manually run UNAME_REL=`(/bin/uname -X|grep Release|sed -e
's/.*= //')` in your shell without error?
Better would be to put this small script by itself:
#! /bin/sh
UNAME_REL=`(/bin/uname -X|grep Release|sed -e 's/.*= //')`
echo got $UN
On 03/31/09 11:43, PN wrote:
Dear all,
I'm using Open MPI 1.3.1 and SGE 6.2u2 on CentOS 5.2
I have 2 compute nodes for testing, each node has a single quad core CPU.
Here is my submission script and PE config:
$ cat hpl-8cpu.sge
#!/bin/bash
#
#$ -N HPL_8cpu_IB
#$ -pe mpi-fu 8
#$ -cwd
#$ -j y
#$
I was talking with Jeff Squyres about your issue and he thinks the
config.guess issue needs to be resolved first, even though your
specification of x86_64 seems to get you by.
So, do you still see the unexpected "(" if you try and run
config/config.guess directly? The original issue IIRC was
Hi guys,
This is my first foray into the world of OpenMPI (MPICH 1, 2 and LAM so far),
and I'm keen to test checkpointing using the BLCR kernel modules. I get the
BLCR components to build just fine (v0.8.1), but the OpenMPI build fails with:
% ./configure --with-blcr=/opt/blcr --with-blcr-lib
Dear all,
I'm using Open MPI 1.3.1 and SGE 6.2u2 on CentOS 5.2
I have 2 compute nodes for testing, each node has a single quad core CPU.
Here is my submission script and PE config:
$ cat hpl-8cpu.sge
#!/bin/bash
#
#$ -N HPL_8cpu_IB
#$ -pe mpi-fu 8
#$ -cwd
#$ -j y
#$ -S /bin/bash
#$ -V
#
cd /home/
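(With a working tight integration, i.e. Open MPI built with SGE support, the mpirun line inside such a job script normally needs neither -machinefile nor an explicit host list, since the allocation is read from SGE. A typical invocation, with the HPL binary name given only as an example, would be:)
mpirun -np $NSLOTS ./xhpl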
Hi,
unfortunately it's up to us to provide the starting address within the
receive buffer, computed from the number of elements to be received
multiplied by the datatype extent.
This kind of thing is dealt with automatically in the internals of
collective communication operations.
Massimo
On 31/mar/09, at 14:00,
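(To make the arithmetic concrete, here is a minimal sketch of a gather built on MPI_Irecv in which the root offsets the receive buffer by rank * recvcount * extent, as described above. The function name MPI_FT_Gather comes from the excerpt below, but the body and the minimal error handling are purely illustrative, not the poster's actual code:)
#include <stdlib.h>
#include <mpi.h>

int MPI_FT_Gather(void *sendbuf, int sendcount, MPI_Datatype sendtype,
                  void *recvbuf, int recvcount, MPI_Datatype recvtype,
                  int root, MPI_Comm comm)
{
    int rank, size, i;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);

    if (rank == root) {
        MPI_Aint lb, extent;
        MPI_Request *reqs;
        MPI_Type_get_extent(recvtype, &lb, &extent);
        reqs = (MPI_Request *) malloc(size * sizeof(MPI_Request));
        for (i = 0; i < size; i++) {
            /* starting address for rank i: base + i * recvcount * extent */
            char *dst = (char *) recvbuf + (MPI_Aint) i * recvcount * extent;
            MPI_Irecv(dst, recvcount, recvtype, i, 0, comm, &reqs[i]);
        }
        /* root contributes its own data; its matching Irecv is already posted */
        MPI_Send(sendbuf, sendcount, sendtype, root, 0, comm);
        MPI_Waitall(size, reqs, MPI_STATUSES_IGNORE);
        free(reqs);
    } else {
        MPI_Send(sendbuf, sendcount, sendtype, root, 0, comm);
    }
    return MPI_SUCCESS;
}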
Thanks Massimo,
now it works well.
I'd erroneously thought that Irecv did this automatically using the recvtype fields.
2009/3/31 Massimo Cafaro :
> Hi,
>
> let me say that it is still not clear to me why you want to reimplement the
> MPI_Gather supplied by an MPI implementation with your own version.
>
Hi,
let me say that it is still not clear to me why you want to
reimplement the MPI_Gather supplied by an MPI implementation with your
own version.
You will never be able to attain the same level of performance using
point to point communication, since MPI_Gather uses internally a
binomia
Mm,
OpenMPI functions like MPI_Irecv do pointer arithmetic over the recv
buffer using the type info in ompi_datatype_t, I suppose. I'm trying to
write a wrapper for MPI_Gather using Irecv:
int MPI_FT_Gather(void*sendbuf, int sendcount, MPI_Datatype sendtype,
void*recvbuff,
Hi Jeff,
Yes, I've installed LSF, and liblsf and libbat are found by configure,
as you can see in the previous attachment and here:
/opt/lsf/7.0/linux2.6-glibc2.3-x86/lib
-rw-r--r-- 1 root 10007 1771182 Sep 24 2008 libbat.a
-rw-r--r-- 1 root 10007 31278 Nov 23 2007 libbat.jsdl.a
-rwxr-xr-x 1