Back to this problem.
The last suggestion was to upgrade to 1.3.3, which has been done. I still cannot
get this code to run in 64-bit mode with Torque. What I can do is run the job
in 64-bit mode using a hostfile.
Specifically, if I use
qsub -I -l nodes=2:ppn=1, Torque allocates two nodes to the
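As a hedged illustration of the hostfile workaround (the application name and
process count below are placeholders): inside an interactive Torque job the
allocated nodes are listed in $PBS_NODEFILE, so one can do

  cat $PBS_NODEFILE > myhosts
  mpirun -np 2 -hostfile myhosts ./my_app

whereas an Open MPI built with TM support should pick the allocation up
directly from Torque with a plain "mpirun -np 2 ./my_app".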
Hi Allen,
The invalid reads come from lines 30 and 31 of your code, and I guess
they are the two printfs before MPI_Wait.
In Open MPI, when memchecker is enabled, OMPI marks the receive buffer
as invalid internally, immediately after the receive starts, for MPI
semantic checking; in this case, it
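A minimal sketch of the pattern memchecker is meant to catch (this is not the
poster's actual code; the tag, count, and buffer are invented): with a
non-blocking receive outstanding, any read of the receive buffer before
MPI_Wait is reported by valgrind, even though the memory itself is valid.
Run it with mpirun -np 2 under valgrind and a memchecker-enabled build.

  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char **argv)
  {
      int rank, buf = 0;
      MPI_Request req;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      if (rank == 0) {
          MPI_Irecv(&buf, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &req);
          printf("before wait: %d\n", buf);   /* flagged: receive in flight */
          MPI_Wait(&req, MPI_STATUS_IGNORE);
          printf("after wait: %d\n", buf);    /* fine: buffer defined again */
      } else if (rank == 1) {
          int val = 42;
          MPI_Send(&val, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
      }

      MPI_Finalize();
      return 0;
  }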
Hello,
Trying to link Open MPI 1.3.3 with PGI 9.0-1, I got the following error:
# ./configure --prefix=/opt/ofed/mpi/pgi/openmpi-1.3.3 --with-openib=/opt/ofed
FC=pgf95 CC=gcc CXX=g++
# make
[...]
libtool: link: pgf95 -shared -fpic -Mnomain .libs/mpi.o .libs/mpi_sizeof.o
.libs/mpi_comm_spaw
We use Torque with OMPI here on almost every cluster, running 64-bit
jobs with the Intel compilers, so I doubt the problem is with Torque.
It is probably an issue with library paths.
Torque doesn't automatically forward your environment, nor does it
execute your remote .bashrc (or equivalen
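In case it helps, the usual fixes (install paths below are placeholders) are
either to export the Open MPI paths in a startup file that the compute nodes
actually source, or to let mpirun set them via --prefix:

  # in ~/.bashrc (sourced on every node, or on a shared home):
  export PATH=/opt/openmpi-1.3.3/bin:$PATH
  export LD_LIBRARY_PATH=/opt/openmpi-1.3.3/lib:$LD_LIBRARY_PATH

  # or per job, without touching the startup files:
  mpirun --prefix /opt/openmpi-1.3.3 -np 2 ./my_app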
Hi Shiqing:
Invalidating the buffer memory until the communication completes is very
clever! However, I guess I'm still confused by my results. Lines 30 and 31
identified by valgrind are the lines after the Wait, and if I comment out
the prints before the Wait, I still get the valgrind errors on the
Hello *,
I would like to understand in more detail how much time some collective
communication calls really spend waiting for the last process to enter. I
know this can be done by logging entry times for each process, but I
wonder if there is a better and more efficient way.
The peruse interface
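As a rough sketch of the entry-time logging mentioned above (the MPI_Allreduce
and the dummy payload are placeholders): timing an explicit MPI_Barrier right
before the collective approximates how long each rank waits for the slowest
one, at the cost of perturbing the run.

  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char **argv)
  {
      int rank;
      double in, out, t0, t1, t2;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      in = (double)rank;                 /* dummy payload */
      t0 = MPI_Wtime();
      MPI_Barrier(MPI_COMM_WORLD);       /* time spent here ~ imbalance */
      t1 = MPI_Wtime();
      MPI_Allreduce(&in, &out, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
      t2 = MPI_Wtime();

      printf("rank %d: waited %.6f s, allreduce took %.6f s\n",
             rank, t1 - t0, t2 - t1);

      MPI_Finalize();
      return 0;
  }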
Hello Manfred,
this is more of an MPI standardization question; Open MPI just happens to be
(the only?) implementation providing Peruse.
While there are people using Peruse event tracing to collect information on
collectives in Open MPI, those extensions are not in trunk.
The specification itself has n
Manfred Muecke wrote:
I would like to understand in more detail how much time some collective
communication calls really spend waiting for the last process to enter. I
know this can be done by logging entry times for each process, but I
wonder if there is a better and more efficient way.
"Bette
So, pushing this along a little more.
Running with openmpi-1.3 svn rev 20295:
mpirun -np 2
-mca btl sm,self
-mca mpi_paffinity_alone 1
-mca mpi_leave_pinned 1
-mca btl_sm_eager_limit 8192
$PWD/IMB-MPI1 pingpong
Yields ~390MB/sec
So we're getting there, but still only about half speed
On
Hi Jalel, list
This is a libtool problem, I was told.
I had the same problem with PGI 8.0-4 and OpenMPI 1.2.8 to 1.3.2
(I haven't tried 1.3.3 yet).
From what you say, apparently the problem is still there on OpenMPI
1.3.3, PGI 9.0-1, and whatever libtool you have in your system.
The workarou
Dear OpenMPI developers,
regarding the following problem:
http://openmpi.igor.onlinedirect.bg/faq/?category=troubleshooting#parallel-debugger-attach
Cristiano Calonaci and I have compiled Open MPI 1.3.3 with Intel 11
and ran an example under TotalView 8.6.
The problem below we solved by setting th
On Aug 11, 2009, at 18:55 PM, Gus Correa wrote:
Did you wipe off the old directories before reinstalling?
Check.
I prefer to install on an NFS-mounted directory,
Check
Have you tried to ssh from node to node on all possible pairs?
check - fixed this today, works fine with the spawni
I believe TCP works fine, Jody, as it is used on Macs fairly widely. I
suspect this is something funny about your installation.
One thing I have found is that you can get this error message when you have
multiple NICs installed, each with a different subnet, and the procs try to
connect across dif
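One common way to deal with that, if it turns out to be the cause (the
interface names below are only examples for a Mac), is to tell the TCP btl
which interfaces it may use:

  mpirun -np 4 --mca btl sm,tcp,self --mca btl_tcp_if_include en0 ./my_app
  # or exclude the troublesome interfaces instead:
  mpirun -np 4 --mca btl sm,tcp,self --mca btl_tcp_if_exclude lo0,en1 ./my_app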
Hi Allen,
Sorry for the confusion. Your application doesn't use non-blocking
communications, so the receive buffers are still valid after you call
MPI_Recv_init; that's why the first two printfs didn't complain. But in
MPI_Wait, it still checks the buffer and makes it invalid after packing
the
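For reference, a generic persistent-request sketch (not Allen's code; the
count, tag, and partner rank are invented) of the sequence being discussed:
the buffer stays defined after MPI_Recv_init, becomes off-limits once the
receive is posted, and is defined again after MPI_Wait.

  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char **argv)
  {
      int rank, buf[4] = {0};
      MPI_Request req;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      if (rank == 0) {
          MPI_Recv_init(buf, 4, MPI_INT, 1, 0, MPI_COMM_WORLD, &req);
          printf("%d\n", buf[0]);          /* ok: receive not started yet */
          MPI_Start(&req);
          MPI_Wait(&req, MPI_STATUS_IGNORE);
          printf("%d\n", buf[0]);          /* ok: receive completed */
          MPI_Request_free(&req);
      } else if (rank == 1) {
          int data[4] = {1, 2, 3, 4};
          MPI_Send(data, 4, MPI_INT, 0, 0, MPI_COMM_WORLD);
      }

      MPI_Finalize();
      return 0;
  }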
Sorry, I don't understand what you want me to do. I assume you want me to run
the app on n296 as
rank 0 and run the app on n298 as rank 1, but I don't know how to do that
outside of either torque
or mpirun -hostfile
Jim
P.S. I tried -x LD_LIBRARY_PATH and it doesn't work.
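For the record, mpirun can also take the hosts directly on the command line,
without Torque or a hostfile (the executable name is a placeholder):

  mpirun -np 2 -host n296,n298 ./my_app

which should place rank 0 on n296 and rank 1 on n298.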
Hi Ralph,
That gives me something more to work with...
On Aug 12, 2009, at 9:44 AM, Ralph Castain wrote:
I believe TCP works fine, Jody, as it is used on Macs fairly widely.
I suspect this is something funny about your installation.
One thing I have found is that you can get this error me
Well, it is getting better! :-)
On your cmd line, what btl's are you specifying? You should try -mca btl
sm,tcp,self for this to work. Reason: sometimes systems block tcp loopback
on the node. What I see below indicates that inter-node comm was fine, but
the two procs that share a node couldn't co
Hi,
I want to configure Open MPI to checkpoint MPI applications using DMTCP. Does
anyone know how to specify the path to the DMTCP application when installing
Open MPI?
Also, I want to use Open MPI with SELF instead of BLCR. Is there a guide for
setting up Open MPI with SELF?
Thanks a lot.
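Not an authoritative answer, but a sketch of the SELF side of the question
(flag spellings are from memory; check them against ./configure --help and
ompi_info for your version): Open MPI's checkpoint/restart support is enabled
at configure time, and the CRS component is chosen at run time.

  ./configure --with-ft=cr --enable-ft-thread --prefix=/opt/openmpi-cr
  make all install
  # run with checkpoint/restart enabled and the self CRS component:
  mpirun -am ft-enable-cr -mca crs self -np 4 ./my_app
  # checkpoint from another shell using mpirun's PID:
  ompi-checkpoint <pid_of_mpirun>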
On Aug 12, 2009, at 12:31 PM, Ralph Castain wrote:
Well, it is getting better! :-)
On your cmd line, what btl's are you specifying? You should try -mca
btl sm,tcp,self for this to work. Reason: sometimes systems block
tcp loopback on the node. What I see below indicates that inter-node
On Aug 12, 2009, at 12:46 PM, Jody Klymak wrote:
So I think ranks 0 and 2 are on xserve02 and rank 1 is on xserve01,
That should read xserve03.
--
Jody Klymak
http://web.uvic.ca/~jklymak/
Update:
No, it still doesn't work.
I have been trying setups with different environment variables like OPAL_PREFIX,
but I just get the same error all over again.
I've also been trying to compile the package, but I didn't even get past the
configure script. I got stuck with configure being unable to compute
Hi Jody
Jody Klymak wrote:
On Aug 11, 2009, at 18:55 PM, Gus Correa wrote:
Did you wipe off the old directories before reinstalling?
Check.
I prefer to install on an NFS-mounted directory,
Check
Have you tried to ssh from node to node on all possible pairs?
check - fixed this toda
If I use -mca orte_launch_agent /home/kenneth/info/openmpi/install/bin/orted,
I get an error:
...
bash: -c: line 0: `( test ! -r ./.profile || . ./.profile;
PATH=/home/kenneth/info/openmpi/install/bin:$PATH ; export PATH ;
LD_LIBRARY_PATH=/home/kenneth/info/openmpi/install/lib:$LD_LIBRARY_PATH
This is using 1.3.3, devel trunk, ...??
I doubt anyone has really tested it in a long time as everyone just
uses the default orted - are you just trying to see if it works, or
are you trying your own orted out?
On Aug 12, 2009, at 4:04 PM, Kenneth Yoshimoto wrote:
If I use -mca orte_lau
This is 1.3.3. I would like to specify the path to orted on
different sets of nodes.
Thanks,
Kenneth
On Wed, 12 Aug 2009, Ralph Castain wrote:
Date: Wed, 12 Aug 2009 17:03:17 -0600
From: Ralph Castain
To: Kenneth Yoshimoto , Open MPI Users
Subject: Re: [OMPI users] orte_launch_agent usage?
Okay - let me debug this. It is likely broken, but I can get the fix
into 1.3.4 (probably coming out fairly soon).
Will update shortly.
On Aug 12, 2009, at 6:26 PM, Kenneth Yoshimoto wrote:
This is 1.3.3. I would like to specify the path to orted on
different sets of nodes.
Thanks,
Kenneth
Hmmm...well, I'm going to ask our TCP friends for some help here.
Meantime, I do see one thing that stands out. Port 4 is an awfully low
port number that usually sits in the reserved range. I checked the
/etc/services file on my Mac, and it was commented out as unassigned,
which should mean
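If the low port really turns out to be the culprit, one heavily hedged thing
to try (parameter names are from memory; confirm them with
"ompi_info --param btl tcp") is forcing the TCP btl into a high port range:

  mpirun -np 2 --mca btl_tcp_port_min_v4 20000 --mca btl_tcp_port_range_v4 100 ./my_app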
Hi:
I recently tried to build my MPI application against OpenMPI 1.3.3. It
worked fine with OMPI 1.2.9, but with OMPI 1.3.3, it hangs part way
through. It does a fair amount of comm, but eventually it stops in a
Send/Recv point-to-point exchange. If I turn off the openib btl, it runs
to completion.
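For anyone reproducing this, "turning off the openib btl" refers to excluding
it on the command line; a typical (illustrative) invocation is

  mpirun -np 16 --mca btl ^openib ./my_app

which makes Open MPI fall back to the tcp/sm/self btls and, in this report,
lets the job run to completion.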