Hi Jeff,
Thanks. Did you see my followup?
The code is the same on every rank, i.e. SPMD.
And the error appears only when I take a binary built at my institution
and run it at another.
Thx....John
On 2/16/16 6:35 AM, Jeff Squyres (jsquyres) wrote:
John --
+1 on what Gilles said.
The initial error says that a broadcast message was truncated. This likely
indicates that someone is calling MPI_Bcast with a different size than its
peers (it *could* indicate what Gilles mentioned about
different-but-supposed-to-be-compatible-datatypes, but more often than not,
it's a simple accounting error in message lengths).
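To make the "accounting error" case concrete, here is a minimal sketch in plain Python. It is only a toy model, not MPI and not how Open MPI detects the problem. The function name `find_truncating_ranks` and the `rank -> (count, element_size)` layout are hypothetical, made up for illustration.

```python
# Toy model (not MPI itself): every rank passes a count and an element size
# to the same broadcast. A receiver whose buffer is smaller than the root's
# message is the kind of rank that would report a truncated message.

def find_truncating_ranks(calls, root=0):
    """Return the non-root ranks whose buffer is smaller than the root's message.

    `calls` maps rank -> (count, element_size_in_bytes); hypothetical layout.
    """
    root_count, root_size = calls[root]
    message_bytes = root_count * root_size
    return sorted(
        rank
        for rank, (count, size) in calls.items()
        if rank != root and count * size < message_bytes
    )


# Rank 2 was handed a shorter length than its peers (an accounting error).
calls = {0: (100, 8), 1: (100, 8), 2: (96, 8)}
print(find_truncating_ranks(calls))
```

A rank that passes a larger count than the root is also incorrect, but in this model it is not flagged, since the symptom discussed here is truncation on the receiving side.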
Also, as a sidenote: I notice you're running Open MPI 1.6.5. That's pretty
ancient. Any chance you can upgrade to something more modern, like Open MPI
1.10.x?
On February 15, 2016 at 7:04:15 PM, Gilles Gouaillardet (gil...@rist.or.jp)
wrote:
John,
the readv error is likely a consequence of the abort, and not the root
cause of the issue.
an obvious user error is if not all MPI tasks call MPI_Bcast with
compatible signatures.
coll/tuned module is known to be broken when using different but
compatible signatures.
for example, one process calls MPI_Bcast with one vector of N MPI_DOUBLE,
and another process calls MPI_Bcast with N MPI_DOUBLE.
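The "different but compatible" case can be sketched as follows. This is a plain Python toy model, assuming a type signature is simply the flattened sequence of basic types. The names `type_signature`, `DOUBLE` and `vector_of_n` are stand-ins invented for illustration and are not MPI calls.

```python
# Toy model (not MPI itself): a type signature is the flattened sequence of
# basic types described by a (count, datatype) pair.

def type_signature(count, datatype):
    """Flatten `count` copies of `datatype` into a tuple of basic type names."""
    return tuple(datatype) * count


N = 4
DOUBLE = ("double",)          # stand-in for the basic type MPI_DOUBLE
vector_of_n = DOUBLE * N      # stand-in for a derived type holding N doubles

sig_vector = type_signature(1, vector_of_n)   # one vector of N doubles
sig_plain = type_signature(N, DOUBLE)         # N doubles
sig_short = type_signature(N - 1, DOUBLE)     # a miscounted call

print(sig_vector == sig_plain)   # different arguments, same signature
print(sig_vector == sig_short)   # signatures differ: a genuine user error
```

In this model the first two calls describe the same data even though their arguments differ, which is the case reported above as a problem for coll/tuned. The third call is a real mismatch.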
you can try to
mpirun --mca coll ^tuned ...
and see if it helps
fwiw, OpenMPI 1.6.5 is quite old nowadays...
Cheers,
Gilles
On 2/16/2016 7:28 AM, JR Cary wrote:
We have distributed a binary to a person with a Linux cluster. When
he runs our binary, he gets
[server1:10978] *** An error occurred in MPI_Bcast
[server1:10978] *** on communicator MPI COMMUNICATOR 8 DUP FROM 7
[server1:10978] *** MPI_ERR_TRUNCATE: message truncated
[server1:10978] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
[server2][[14125,1],2][/..../openmpi-1.6.5/ompi/mca/btl/tcp/btl_tcp_frag.c:215:mca_btl_tcp_frag_recv]
mca_btl_tcp_frag_recv: readv failed: Connection reset by peer (104)
Anyone have any ideas on how to debug this?
Thanks......John Cary
_______________________________________________
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post:
http://www.open-mpi.org/community/lists/users/2016/02/28534.php
--
Jeff Squyres
jsquy...@cisco.com