Dear Yann,
Here is the output:
[root@compute-01-01 ~]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5.3 (Tikanga)
[root@compute-01-01 ~]# uname -a
Linux compute-01-01.private.dns.zone 2.6.18-128.el5 #1 SMP Wed Dec 17
11:41:38 EST 2008 x86_64 x86_64 x86_64 GNU/Linux
[root@com
Hello,
I am trying to confirm that I am using Open MPI correctly. I seem to be
losing messages, but I don't like to assume there's a bug when I'm still
new to MPI in general.
I have multiple processes in a master/slave-style setup, and I am trying
to have multiple persistent non-blocking m
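The message is cut off here, but for reference, persistent non-blocking requests
in a master/slave setup usually follow the MPI_Recv_init / MPI_Start / MPI_Wait /
MPI_Request_free pattern. The sketch below only illustrates that pattern; the
buffer size, tag, and round count are made up and this is not the poster's code.

/* Minimal sketch of a persistent non-blocking receive on the worker side.
 * Buffer size, tag, and round count are assumptions for illustration. */
#include <mpi.h>

#define MSG_LEN  256
#define MSG_TAG  7
#define N_ROUNDS 10

int main(int argc, char **argv)
{
    int rank, size;
    double buf[MSG_LEN] = {0};

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {
        /* Master: plain sends, one per worker per round. */
        for (int round = 0; round < N_ROUNDS; round++)
            for (int dest = 1; dest < size; dest++)
                MPI_Send(buf, MSG_LEN, MPI_DOUBLE, dest, MSG_TAG, MPI_COMM_WORLD);
    } else {
        /* Worker: one persistent receive, set up once and restarted each round. */
        MPI_Request req;
        MPI_Recv_init(buf, MSG_LEN, MPI_DOUBLE, 0, MSG_TAG, MPI_COMM_WORLD, &req);

        for (int round = 0; round < N_ROUNDS; round++) {
            MPI_Start(&req);                   /* re-arm the request    */
            MPI_Wait(&req, MPI_STATUS_IGNORE); /* block until received  */
            /* ... process buf here ... */
        }

        MPI_Request_free(&req);                /* release when finished */
    }

    MPI_Finalize();
    return 0;
}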
w00t :-)
Thanks
Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
bro...@umich.edu
(734)936-1985
On Dec 20, 2012, at 10:46 AM, Ralph Castain wrote:
> Hmmm... I'll see what I can do about the error message. I don't think there
> is much in 1.6 I can do, but in 1.7 I could generate an
Hmmm... I'll see what I can do about the error message. I don't think there is
much in 1.6 I can do, but in 1.7 I could generate an appropriate error message
as we have a way to check the topologies.
On Dec 20, 2012, at 7:11 AM, Brock Palen wrote:
> Ralph,
>
> Thanks for the info,
> That sai
Ralph,
Thanks for the info.
That said, I found the problem: one of the new nodes had Hyperthreading on and
the rest didn't, so the nodes didn't match. A quick
pdsh lstopo | dshbak -c
uncovered the one different node. The error just didn't give me a clue that this
was the cause, which
Simon,
The goal of any MPI implementation is to be as fast as possible.
Unfortunately there is no "one size fits all" algorithm that works on all
networks and for all possible kinds of peculiarities that your specific
communication scheme may have. That's why there are different algorithms, and
yo
Glad you got it resolved!
On Dec 18, 2012, at 8:53 PM, Kumar, Sudhir wrote:
> Hi
> The error is resolved. The solution was actually in a previous post.
> http://www.open-mpi.org/community/lists/users/2011/03/15954.php
>
>
>
> -Original Message-
> From: Kumar, Sudhir
> Sent: Tuesday, D
On Dec 19, 2012, at 11:26 AM, Handerson, Steven wrote:
> I fixed the problem we were experiencing by adding a barrier.
> The bug occurred between a piece of code that uses (many, over a loop) SEND
> (from the leader)
> and RECV (in the worker processes) to ship data to the
> processing nodes fro
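The message is truncated here, but the pattern it describes (a leader looping
over sends, workers posting matching receives, and a barrier added as the fix)
looks roughly like the sketch below. Tags, counts, and the data are assumptions
for illustration, not the actual code from the thread.

/* Hedged sketch of the pattern described above: the leader ships data to the
 * workers with sends in a loop, and an MPI_Barrier separates that distribution
 * phase from whatever code follows.  Sizes, tags, and data are made up. */
#include <mpi.h>

#define DATA_TAG 42
#define CHUNK    1024

int main(int argc, char **argv)
{
    int rank, size;
    double chunk[CHUNK] = {0};

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {
        /* Leader: many sends over a loop, one chunk per worker. */
        for (int dest = 1; dest < size; dest++)
            MPI_Send(chunk, CHUNK, MPI_DOUBLE, dest, DATA_TAG, MPI_COMM_WORLD);
    } else {
        /* Workers: matching receive. */
        MPI_Recv(chunk, CHUNK, MPI_DOUBLE, 0, DATA_TAG, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
    }

    /* The fix mentioned above: a barrier between the distribution phase and
     * the code that follows it. */
    MPI_Barrier(MPI_COMM_WORLD);

    MPI_Finalize();
    return 0;
}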