Did you tell it --bind-to-core? If not, then the procs would be unbound to any
particular core - so your code might well think they are "sharing" cores.
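To make the binding explicit, a command sketch (the app name and process count are placeholders; `--report-bindings` is the usual way to verify where each rank actually lands):

```shell
# Bind each rank to its own core and print the resulting bindings (1.4-series option names)
mpirun -np 4 --bind-to-core --report-bindings ./my_app
```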
On Apr 24, 2012, at 4:46 PM, Kyle Boe wrote:
> Right, I tried using a hostfile, and it made no difference. This is running
> OpenMPI 1.4.4 on
Hi Ralph,
Yes, you are absolutely correct. A user can suppress the warning, however, by
simply setting shmem_mmap_enable_nfs_warning to 0.
For what it's worth, I just verified that the warning shows itself on Panasas
and NFS. Looks like Lustre and GPFS will behave similarly.
Sam
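For anyone who wants the exact knob, the warning can be silenced per-run on the command line (app name illustrative):

```shell
# Suppress the NFS-backed /tmp warning for this run only
mpirun -np 4 -mca shmem_mmap_enable_nfs_warning 0 ./my_app
```

Or persistently, by adding the line `shmem_mmap_enable_nfs_warning = 0` to `~/.openmpi/mca-params.conf`.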
On Apr 24, 2
Right, I tried using a hostfile, and it made no difference. This is running
OpenMPI 1.4.4 on CentOS 5.x machines. The original issue was an error trap
built into my code, where it said one of the cores was asking for
information it already owned. I'm sorry to be vague, but I can't share
anything fr
I thought we had code in the 1.5 series that would "bark" if the tmp dir was on
a network mount? Is that not true?
On Apr 24, 2012, at 3:20 PM, Gutierrez, Samuel K wrote:
> Hi,
>
> I just wanted to record the behind the scenes resolution to this particular
> issue. For more info, take a look
You don't need a hostfile to run multiple procs on the localhost.
What version of OMPI are you using? What was the original issue?
On Apr 24, 2012, at 4:07 PM, Jingcha Joba wrote:
> Try using slots in hostfile ?
>
> --
> Sent from my iPhone
>
> On Apr 24, 2012, at 2:52 PM, Kyle Boe wrote:
>
Try using slots in hostfile ?
--
Sent from my iPhone
On Apr 24, 2012, at 2:52 PM, Kyle Boe wrote:
> I'm having a problem trying to use OpenMPI on some multicore machines I have.
> The code I am running was giving me errors which suggested that MPI was
> assigning multiple processes to the sam
I'm having a problem trying to use OpenMPI on some multicore machines I
have. The code I am running was giving me errors which suggested that MPI
was assigning multiple processes to the same core (which I do not want).
So, I tried launching my job using the -nooversubscribe option, and I get
this e
Hi,
I just wanted to record the behind the scenes resolution to this particular
issue. For more info, take a look at:
https://svn.open-mpi.org/trac/ompi/ticket/3076
It seems as if the problem stems from /tmp being mounted as an NFS space that
is shared between the compute nodes.
This problem
On Apr 24, 2012, at 3:33 PM, Tom Rosmond wrote:
> Yes, I would be interested in such a plugin. But be advised that I am
> strictly a fortran programmer, so if it requires any C/C++ talent, I
> would be in trouble. So maybe, before jumping into that, I would like
> to be able to look at what proc
To throw in my $0.02, though it is worth less.
Were you running this on verbs-based InfiniBand?
We see a problem, for which we have a workaround, even with the newest 1.4.5,
but only on IB; we can reproduce it with IMB. You can find an old thread from
me about it. Your problem might not be the same.
Will do. My machine is currently quite busy, so it will be a while
before I get answers. Stay tuned.
T. Rosmond
On Tue, 2012-04-24 at 13:36 -0600, Ralph Castain wrote:
> Add --display-map to your mpirun cmd line
>
> On Apr 24, 2012, at 1:33 PM, Tom Rosmond wrote:
>
> > Jeff,
> >
> > Yes, I
Add --display-map to your mpirun cmd line
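For example (app name and process count illustrative):

```shell
# Print the process map -- which ranks land on which nodes/slots -- at launch
mpirun -np 4 --display-map ./my_app
```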
On Apr 24, 2012, at 1:33 PM, Tom Rosmond wrote:
> Jeff,
>
> Yes, I would be interested in such a plugin. But be advised that I am
> strictly a fortran programmer, so if it requires any C/C++ talent, I
> would be in trouble. So maybe, before jumping int
Jeff,
Yes, I would be interested in such a plugin. But be advised that I am
strictly a fortran programmer, so if it requires any C/C++ talent, I
would be in trouble. So maybe, before jumping into that, I would like
to be able to look at what processor/node mapping Open-mpi is actually
giving me.
That's very odd, indeed -- it's listed as being inside MPI_INIT, but we don't
get any further details from there. :-\
Any chance you could try upgrading to OMPI 1.4.5 and/or 1.5.5?
On Apr 24, 2012, at 1:57 PM, Jeffrey A Cummings wrote:
> I've been having an intermittent failure during MPI init
On Apr 24, 2012, at 3:01 PM, Tom Rosmond wrote:
> My question is this: If the cartesian mapping is done so the two
> spacial dimensions are the 'most rapidly varying' in equivalent 1-D
> processor mapping, will Open-mpi automatically assign those 2 dimensions
> 'on-node', and assign the 'ensemble
Could you repeat your tests with 1.4.5 and/or 1.5.5?
On Apr 23, 2012, at 1:32 PM, Martin Siegert wrote:
> Hi,
>
> I am debugging a program that hangs in MPI_Allreduce (openmpi-1.4.3).
> An strace of one of the processes shows:
>
> Process 10925 attached with 3 threads - interrupt to quit
> [pi
We have a large ensemble-based atmospheric data assimilation system that
does a 3-D cartesian partitioning of the 'domain' using MPI_DIMS_CREATE,
MPI_CART_CREATE, etc. Two of the dimensions are spatial, i.e. latitude
and longitude; the third is an 'ensemble' dimension, across which
subsets of ense
The ~/.openmpi/mca-params.conf file should contain the same
information on all nodes.
You can install Open MPI as root. However, we do not recommend that
you run Open MPI as root.
If the user $HOME directory is NFS mounted, then you can use an NFS
mounted directory to store your files. With this
Hi,
I ran those cmd's and have posted the outputs on:
https://svn.open-mpi.org/trac/ompi/ticket/3076
-mca shmem posix worked for all -np (even when oversubscribing), however
sysv did not work for any -np.
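For anyone following along, the two invocations being compared look like this (app name and process count illustrative):

```shell
# Force the POSIX shared-memory component (worked here):
mpirun -np 8 -mca shmem posix ./my_app
# Force the System V shared-memory component (failed here):
mpirun -np 8 -mca shmem sysv ./my_app
```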
On Tue, Apr 24, 2012 at 5:36 PM, Gutierrez, Samuel K wrote:
> Hi,
>
> Just out of curios
Hi Jeffrey,
Assuming you are on Linux, a frequent cause of out-of-nowhere segfaults is a
limited/small stack size. They can happen if you [ab]use big automatic
arrays, etc. You can set the stack size bigger/unlimited with the
ulimit/limit command, or edit /etc/security/limits.conf.
Of course,
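A quick sketch of checking and raising the limit (bash/sh syntax; raising past the hard limit needs root, hence the guard):

```shell
# Show the current soft stack limit (kilobytes, or "unlimited")
ulimit -s

# Try to raise it for this shell and its children; ignore failure if the
# hard limit is lower. Root can raise the hard limit in
# /etc/security/limits.conf, e.g.:
#   *  soft  stack  unlimited
#   *  hard  stack  unlimited
ulimit -s unlimited 2>/dev/null || true
```

Remember that the limit must be raised on every node the job runs on, not just the launch node.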
I've been having an intermittent failure during MPI initialization (v
1.4.3) for several months. It comes and goes as I make changes to my
application, that is changes unrelated to MPI calls. Even when I have a
version of my app which shows the problem, it doesn't happen on every
submittal.
Hi, thank you for your reply.
I have some problems:
Q1: I set two kinds of settings in mca-params.conf:
(1) crs_base_snapshot_dir=/root/kidd_openMPI/Tmp
snapc_base_global_snapshot_dir=/root/kidd_openMPI/checkpoints
My master: /root/kidd_openMPI is my Open MPI install dir,
Hi,
Just out of curiosity, what happens when you add
-mca shmem posix
to your mpirun command line using 1.5.5?
Can you also please try:
-mca shmem sysv
I'm shooting in the dark here, but I want to make sure that the failure isn't
due to a small backing store.
Thanks,
Sam
On Apr 16, 2012,
On Tue, Apr 24, 2012 at 10:10 AM, kidd wrote:
> Hi ,Thank you For your reply.
> but I still failed. I must add -x LD_LIBRARY_PATH
> this is my All Setting ;
> 1) Master-Node(cuda07) & Slaves Node(cuda08) :
> Configure:
> ./configure --prefix=/root/kidd_openMPI --with-ft=cr
> --enable-f
Hi, thank you for your reply.
But it still failed; I must add -x LD_LIBRARY_PATH.
These are all my settings:
1) Master-Node(cuda07) & Slaves Node(cuda08) :
Configure:
./configure --prefix=/root/kidd_openMPI --with-ft=cr --enable-ft-thread
--with-blcr=/usr/local/BLCR
--with-blcr-
I am not sure about everything that is going wrong, but there are at least two
issues I found.
First, you are skipping the first line that you read from integers.txt. Maybe
something like this instead.
while (fgets(line, sizeof line, fp) != NULL) {
    sscanf(line, "%d", &data[k]);
    sum = sum + data[k];
    k++;
}
It looks like you are using LAM/MPI. This list is for supporting Open MPI, a
wholly different MPI software implementation. However, speaking as one of the
core LAM/MPI developers, I'll tell you that you should uninstall LAM and
install Open MPI instead. We abandoned LAM/MPI several years ago.
On 4/24/2012 6:19 AM, Syed Ahsan Ali wrote:
I am not familiar with attaching debugger to the processes. Other
things you asked are as follows:
The easiest is to get TotalView or Allinea (both are parallel debuggers)
and attach them to the job; however, they cost money. Another option is to
try padb, look
I am not familiar with attaching debugger to the processes. Other things
you asked are as follows:
Is this the first time you've run it (with Open MPI? with any MPI?) *No.
We have been running this and other models, but this problem has arisen only now.
* How many processes is the job using? Are you o
To determine if an MPI process is waiting for a message, do what Rayson
suggested: attach a debugger to the processes and see if any of them
are stuck in MPI, either internally in an MPI_Recv or MPI_Wait call or
looping on an MPI_Test call.
Other things to consider.
Is this the first time y
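The attach step itself is short; a sketch with plain gdb (the PID and app name are illustrative — padb or a parallel debugger automates this across all ranks):

```shell
# On a node where the job is running, find one rank's PID...
pgrep -f my_app
# ...then grab its stack without stopping it for long; look for MPI_Recv,
# MPI_Wait, or MPI_Test frames near the top of the backtrace
gdb -p 12345 -batch -ex 'thread apply all bt'
```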
Hi,
I have installed MPI, and when I tried to run MPI in parallel on all the
nodes, I got the following error while MPI was trying to establish
connections:
"*ERROR: LAM/MPI unexpectedly received the following on stderr: Permission
denied (publickey,gssapi-with-mic)." So could anyone
I am combining mpi and cuda. Trying to find out sum of array elements
using cuda and using mpi to distribute the array.
my cuda code
#include <stdio.h>

__global__ void add(int *devarray, int *devsum)
{
    int index = blockIdx.x * blockDim.x + threadIdx.x;
    *devsum = *devsum + devarray[index];
Dear Rayson,
That is a numerical model written by the national weather service of a
country. The logs of the model show every detail about the simulation
progress. I have checked on the remote nodes as well; the application binary
is running, but the logs show no progress, it is just waiting at
Seems like there's a bug in the application. Did you or someone else
write it, or did you get it from an ISV?
You can log onto one of the nodes, attach a debugger, and see if the
MPI task is waiting for a message (looping in one of the MPI receive
functions)...
Rayson
==
Dear All,
I am having a problem running an application on a Dell cluster. The model
starts well, but no further progress is shown; it just gets stuck. I have
checked the systems, and there is no apparent hardware error. Other Open MPI
applications are running well on the same cluster. I have tried running