On Mon, Feb 23, 2015 at 4:37 PM, Joshua Ladd wrote:
Nathan,
I do, but the hang comes later on. It looks like it's a situation where the
root is way, way faster than the children and he's inducing an overrun in
the unexpected message queue. I think the queue is set to just keep growing
and it eventually blows up the memory?
$/hpc/mtl_scrap/user
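(For context, not something proposed in the thread: when a fast root keeps
posting broadcasts while the other ranks fall behind, buffered fragments
can pile up without bound. A common mitigation, sketched below in C under
the assumption that the test broadcasts in a tight loop, is to let all
ranks catch up periodically; the interval of 100 is an arbitrary
illustrative choice.)

    /* Sketch: throttle a tight MPI_Bcast loop so a fast root cannot
       run arbitrarily far ahead of slower receivers. */
    #include <mpi.h>

    void bcast_loop(double *buf, int count, int iters)
    {
        for (int i = 0; i < iters; i++) {
            MPI_Bcast(buf, count, MPI_DOUBLE, 0, MPI_COMM_WORLD);
            if (i % 100 == 99)               /* every 100 broadcasts... */
                MPI_Barrier(MPI_COMM_WORLD); /* ...resynchronize */
        }
    }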
Josh, do you see a hang when using vader? It is preferred over the old
sm btl.
-Nathan
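(For anyone trying Nathan's suggestion: the BTL can be forced on the
mpirun command line. The rank count and executable below are placeholders
taken from earlier in the thread; note that vader needs the self BTL
listed alongside it.)

    mpirun --mca btl vader,self -np 2 ./a.out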
On Mon, Feb 23, 2015 at 03:48:17PM -0500, Joshua Ladd wrote:
Sachin,
I am able to reproduce something funny. Looks like your issue. When I run
on a single host with two ranks, the test works fine. However, when I try
three or more, it looks like only the root, rank 0, is making any progress
after the first iteration.
$/hpc/mtl_scrap/users/joshual/openmpi-1
George,
I was able to run the code without any errors in an older version of
Open MPI on another machine. It looks like some problem with my machine,
as Josh pointed out.
Adding --mca coll tuned or basic to the mpirun command resulted in an
MPI_Init failed error with some additional information.
Sachin,
I can't replicate your issue with either the latest 1.8 or the trunk. I
tried using a single host, while forcing SM and then TCP, to no avail.
Can you try restricting the collective modules in use (adding --mca coll
tuned,basic to your mpirun command)?
George.
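(Spelled out, George's suggestion is a command line like the first one
below; the rank count and executable are placeholders. If MPI_Init aborts
with the restricted list, one assumption worth checking, not confirmed in
this thread, is that the self coll component also needs to be listed, as
in the second line.)

    mpirun --mca coll tuned,basic -np 3 ./a.out
    mpirun --mca coll tuned,basic,self -np 3 ./a.out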
Josh,
Thanks for the help.
I'm running on a single host. How do I confirm that it's an issue with
shared memory?
Sachin
On Fri, Feb 20, 2015 at 11:58 PM, Joshua Ladd wrote:
Sachin,
Are you running this on a single host or across multiple hosts (i.e., are
you communicating between processes over a network)? If it's on a single
host, then it might be an issue with shared memory.
Josh
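(One way to test Josh's shared-memory hypothesis, assuming the TCP
loopback path works on the machine, is to take shared memory out of the
picture and rerun; if the hang disappears over TCP but returns with sm,
that points at the shared-memory path.)

    mpirun --mca btl tcp,self -np 2 ./a.out
    mpirun --mca btl sm,self -np 2 ./a.out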
On Fri, Feb 20, 2015 at 1:51 AM, Sachin Krishnan wrote:
Hello Josh,
The command I use to compile the code is:
mpicc bcast_loop.c
To run the code I use:
mpirun -np 2 ./a.out
Output is unpredictable. It gets stuck at different places.
I'm attaching lstopo and ompi_info outputs. Do you need any other info?
lstopo-no-graphics output:
Machine (3433M
Sachin,
Can you please provide a command line? Additional information about your
system would also be helpful.
Josh
On Wed, Feb 18, 2015 at 3:43 AM, Sachin Krishnan wrote:
Hello,
I am new to MPI and also this list.
I wrote an MPI code with several MPI_Bcast calls in a loop. My code was
getting stuck at random points, i.e., it was not systematic. After a few
hours of debugging and googling, I found that the issue may be with the
several MPI_Bcast calls in a loop.
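(The original bcast_loop.c is not reproduced in the archive, but a
minimal test in the same spirit, several MPI_Bcast calls in a loop, looks
like the sketch below; the buffer size and iteration count are
illustrative guesses, not Sachin's actual values.)

    /* bcast_loop-style reproducer: repeated broadcasts from rank 0. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        double buf[1024] = {0};
        int rank;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        for (int i = 0; i < 100000; i++)
            MPI_Bcast(buf, 1024, MPI_DOUBLE, 0, MPI_COMM_WORLD);
        if (rank == 0)
            printf("done\n");
        MPI_Finalize();
        return 0;
    }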