I am trying to run WRF on 1024 cores with OpenMPI 1.3.3 and
1.4. I can get the code to run with 512 cores, but it crashes
at startup on 1024 cores. I am getting the following error message:
[n172][[43536,1],0][connect/btl_openib_connect_oob.c:463:qp_create_one] error creating qp errno says Can
On Dec 17, 2009, at 5:55 PM, wrote:
> I am happy to be able to inform you that the problems we were
> seeing would seem to have been arising down at the OpenMPI
> level.
Happy for *them*, at least. ;-)
> If I remove any acknowledgement of IPv6 within the OpenMPI
> code, then both the PETSc exa
A whole swathe of people have been made aware of the issues
that have arisen as a result of a researcher here looking to
run PISM, which sits on top of PETSc, which sits on top of
OpenMPI.
I am happy to be able to inform you that the problems we were
seeing would seem to have been arising down at the OpenMPI level.
That would be great!
On Dec 17, 2009, at 3:52 PM, Ralph Castain wrote:
> If it would help, I have time and am willing to add notifier calls to this
> area of the code base. You'll still get the errors shown here as I always
> bury the notifier call behind the error check that surrounds these error
> messages to avoid impacting the critical path, but you would be able to gt
If it would help, I have time and am willing to add notifier calls to this area
of the code base. You'll still get the errors shown here as I always bury the
notifier call behind the error check that surrounds these error messages to
avoid impacting the critical path, but you would be able to gt
Will be in the 1.4 nightly tarball generated later tonight...
Thanks again
Ralph
On Dec 17, 2009, at 4:07 AM, Marcia Cristina Cera wrote:
> Very good news!
> I will wait eagerly for the release :)
>
> Thanks, Ralph
> márcia.
>
> On Wed, Dec 16, 2009 at 10:56 PM, Ralph Castain wrote:
> Ah crumb - I found the problem. Sigh.
Ok, I'll give it a try.
Thanks, nick
On Thu, Dec 17, 2009 at 12:44, Ralph Castain wrote:
> In case you missed it, this patch should be in the 1.4 nightly tarballs -
> feel free to test and let me know what you find.
>
> Thanks
> Ralph
>
> On Dec 2, 2009, at 10:06 PM, Nicolas Bock wrote:
>
> That was quick. I will try the patch as soon as you release it.
Simon Su writes:
> Hi Tom,
>
> "hello world" MPI program also won't compile if
> librdmacm-devel-1.0.8-5.el5 is not installed. I have asked the person
> who maintain the openmpi package on how they were compiled. My guess
> is librdmacm-devel-1.0.8-5.el5 may need to be added as dependency
> packa
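A minimal sketch of pulling that dependency in, assuming a RHEL 5 / CentOS 5
system where yum is available (hello.c is just a placeholder MPI program, not
something from Simon's mail):
# install the librdmacm development package that mpicc needs at link time
yum install librdmacm-devel
# then retry the hello-world compile
mpicc hello.c -o hello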
In case you missed it, this patch should be in the 1.4 nightly tarballs - feel
free to test and let me know what you find.
Thanks
Ralph
On Dec 2, 2009, at 10:06 PM, Nicolas Bock wrote:
> That was quick. I will try the patch as soon as you release it.
>
> nick
>
>
> On Wed, Dec 2, 2009 at 21:
Hi Min,
I've had a chance to use the openmpi-mpirun script.
Though unfortunately I do not have openmpi available at the moment
(the customer where I'm at runs RHEL4 with LAM/MPI), I've been able to see
what quotes need to be used, and this one seems to work on my end:
bsub -e ERR -o OUT -n 16 './open
Hi, Jeroen,
Thanks a lot. Unfortunately I don't think I have got mpirun.lsf. The Dell
company only asked us to use openmpi-mpirun as a wrapper script.
Cheers,
Min Zhu
-Original Message-
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf
Of Jeroen Kleijer
S
Hi Min,
Sorry for the mixup but it's been a while since I've actually used LSF.
I've had a look at my notes and to use mpirun with LSF you should give
something like this:
bsub -a openmpi -n 16 "mpirun.lsf -x PATH -x LD_LIBRARY_PATH -x
MPI_BUFFER_SIZE \" ulimit -s unlimited ; ./wrf.exe \" "
(a
Hi, Ashley,
I changed the content of the openmpi-mpirun script as you mentioned, but
wrf.exe was still not executed.
Cheers,
Min Zhu
-Original Message-
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On
Behalf Of Ashley Pittman
Sent: 17 December 2009 16:39
To: Open MPI Users
Su
Hi,
This time the OUT file is:
Sender: LSF System
Subject: Job 667: Exited
Job was submitted from host by user .
Job was executed on host(s) <8*compute-01> <8*compute-12>, in queue , as user .
was used as the home directory.
was u
On Thu, 2009-12-17 at 14:40 +, Min Zhu wrote:
> Here is the content of openmpi-mpirun file, so maybe something needs to
> be changed?
> if [ x"${LSB_JOBFILENAME}" = x -o x"${LSB_HOSTS}" = x ]; then
> usage
> exit -1
> fi
>
> MYARGS=$*
Shouldn't this be MYARGS=$@? It'll change the way
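A minimal sketch of the difference being pointed at here, assuming a POSIX
shell; splat.sh and the sample argument are hypothetical, not part of
Platform's wrapper:
#!/bin/sh
# splat.sh - MYARGS=$* flattens all arguments into one whitespace-separated
# string, so an argument that itself contains spaces gets re-split later;
# "$@" preserves each original argument as it was passed in.
MYARGS=$*
echo 'built from $*:'
for a in $MYARGS; do echo "  [$a]"; done
echo 'using "$@":'
for a in "$@"; do echo "  [$a]"; done
# ./splat.sh "ulimit -s unlimited; ./wrf.exe"
# prints four re-split words in the first loop, but the single original
# argument in the second.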
Hi Min,
It seems the ulimit command was executed 16 times, but everything after that
(the "; ./wrf.exe") was ignored.
Could you give the following a try:
bsub -e ERR -o OUT -n 16 "openmpi-mpirun \"/bin/sh -c 'ulimit -s
unlimited ; ./wrf.exe ' \" "
Somewhere, quoting goes wrong and I'm trying to figure out
Hi, Jeroen,
Here is the OUT file; the ERR file is empty.
--
Sender: LSF System
Subject: Job 662: Done
Job was
submitted from host by user .
Job was executed on host(s) <8*compute-10>, in queue , as user .
Hi Min
Did you get any type of error message in the ERR or OUT files?
I don't have mpirun installed in the environment at the moment but
giving the following:
bsub -q interq -I "ssh /bin/sh -c 'ulimit -s unlimited ;
/bin/hostname ' "
seems to work for me, so I'm kind of curious what the error m
Hi, Jeroen,
Thanks for your reply. I tried the command bsub -e ERR -o OUT -n 16
"openmpi-mpirun /bin/sh -c 'ulimit -s unlimited; ./wrf.exe ' " and wrf.exe was
not executed.
Cheers,
Min Zhu
-Original Message-
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf
It's just that the " characters on the command line get parsed by LSF / bash
(or whatever shell you use).
If you wish to use it without the script you can give this a try:
bsub -e ERR -o OUT -n 16 "openmpi-mpirun /bin/sh -c 'ulimit -s
unlimited; ./wrf.exe ' "
This causes the shell to pass the whole string "openmpi-m
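A small sketch of the quote stripping being described, assuming bash or a
POSIX shell; show_args.sh is a hypothetical stand-in for bsub that only prints
what arrives as each argument:
#!/bin/sh
# show_args.sh - print each argument on its own line so the effect of each
# quoting layer is visible.
i=1
for arg in "$@"; do
    echo "arg $i: [$arg]"
    i=$((i + 1))
done
# The submitting shell strips the outer double quotes, so everything inside
# them reaches the command as one argument, with the inner single quotes kept
# literally for the next shell level to interpret:
# ./show_args.sh "openmpi-mpirun /bin/sh -c 'ulimit -s unlimited; ./wrf.exe '"
# arg 1: [openmpi-mpirun /bin/sh -c 'ulimit -s unlimited; ./wrf.exe ']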
Hi, Jeff,
Your script method works for me. Thank you very much,
Cheers,
Min Zhu
-Original Message-
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On
Behalf Of Jeff Squyres
Sent: 17 December 2009 14:56
To: Open MPI Users
Subject: Re: [OMPI users] About openmpi-mpir
On Dec 17, 2009, at 3:54 AM, jody wrote:
> yeah, now that you mention it, i remember (old brain here, as well)
> But IIRC you created an OMPI version which was called 1.4a1r or something,
> where i indeed could use this xterm. When i updated to 1.3.2, i sort
> of forgot about it again...
The 1.4
This might be something you need to talk to Platform about...?
Another option would be to openmpi-mpirun a script that is just a few lines
long:
#!/bin/sh
ulimit -s unlimited
./wrf.exe
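One possible way to submit that wrapper, assuming it is saved as run_wrf.sh (a
placeholder name) alongside wrf.exe and made executable:
chmod +x run_wrf.sh
bsub -e ERR -o OUT -n 16 openmpi-mpirun ./run_wrf.sh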
On Dec 17, 2009, at 9:40 AM, Min Zhu wrote:
> Hi, Jeff,
>
> Thanks. For bsub -e ERR -o OUT -n 16 openmpi-m
Hi, Jeff,
Thanks. For bsub -e ERR -o OUT -n 16 openmpi-mpirun /bin/sh -c "ulimit
-s unlimited; ./wrf.exe", I tried it and wrf.exe wasn't executed.
Here is the content of the openmpi-mpirun file, so maybe something needs to
be changed?
--
#!/bin/sh
#
# Copyr
On Dec 17, 2009, at 9:15 AM, Min Zhu wrote:
> Thanks for your reply. Yes, your mpirun command works for me. But I need to
> use the bsub job scheduler. I wonder why
> bsub -e ERR -o OUT -n 16 openmpi-mpirun "/bin/sh -c ulimit -s unlimited;
> ./wrf.exe" doesn't work.
Try with different quoting...?
Hi,
Thanks for your reply. Yes, your mpirun command works for me. But I need to use
the bsub job scheduler. I wonder why
bsub -e ERR -o OUT -n 16 openmpi-mpirun "/bin/sh -c ulimit -s unlimited;
./wrf.exe" doesn't work.
Cheers,
Min Zhu
-Original Message-
From: users-boun...@open-mpi.org
Min Zhu wrote:
Hi,
I executed
bsub -e ERR -o OUT -n 16 openmpi-mpirun "/bin/sh -c ulimit -s unlimited; ./wrf.exe"
Hello,
Here we use:
mpirun /bin/tcsh -c " limit stacksize unlimited ; /full/path/to/wrf.exe"
Regards,
R. David
Hi,
I executed
bsub -e ERR -o OUT -n 16 openmpi-mpirun "/bin/sh -c ulimit -s unlimited;
./wrf.exe"
Here are the results; it seems that wrf.exe didn't run. The ERR file is empty.
Sender: LSF System
Subject: Job 647: Done
Job was subm
Min Zhu wrote:
Dear all,
I have a question to ask you. If I issue the command "bsub -n 16
openmpi-mpirun ./wrf.exe" on my 128-core (16-node) cluster, the job
fails to run due to a stacksize problem. Openmpi-mpirun is a wrapper
script written by Platform. If I want to add '/bin/sh -c ulimit -s
Dear all,
I have a question to ask you. If I issue the command "bsub -n 16
openmpi-mpirun ./wrf.exe" on my 128-core (16-node) cluster, the job
fails to run due to a stacksize problem. Openmpi-mpirun is a wrapper
script written by Platform. If I want to add '/bin/sh -c ulimit -s
unlimited' to the above
Hi all
I have a question. I'm starting to use MPI-IO and was wondering if I can use
MPI_BUFFER_ATTACH to provide the necessary IO buffer (or will it use
the array I'm passing to MPI_Write...?)
many thanks,
Ricardo Reis
'Non Serviam'
PhD candidate @ Lasef
Computational Fluid Dynam
Very good news!
I will wait eagerly for the release :)
Thanks, Ralph
márcia.
On Wed, Dec 16, 2009 at 10:56 PM, Ralph Castain wrote:
> Ah crumb - I found the problem. Sigh.
>
> I actually fixed this in the trunk over 5 months ago when the problem first
> surfaced in my own testing, but it n
yeah, now that you mention it, i remember (old brain here, as well)
But IIRC you created an OMPI version which was called 1.4a1r or something,
where i indeed could use this xterm. When i updated to 1.3.2, i sort
of forgot about it again...
Another question though:
You said "If it includes the -xte
> You could confirm that it is the IPv6 loop by simply disabling IPv6
> support - configure with --disable-ipv6 and see if you still get the error
> messages
>
> Thanks for continuing to pursue this!
> Ralph
>
Yeah, but if you disable the IPv6 stuff then there's a completely
different path taken
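A sketch of the check Ralph suggests, assuming a rebuild from the Open MPI
source tarball (the install prefix is just a placeholder):
./configure --prefix=/opt/openmpi-noipv6 --disable-ipv6
make all
make install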