John,
the readv error is likely a consequence of the abort, and not the root
cause of the issue.
An obvious user error is if not all MPI tasks call MPI_Bcast with
compatible signatures.
The coll/tuned module is also known to be broken when the tasks use
different but compatible signatures: for example, one task describes the
buffer with a derived datatype of four ints while another simply passes a
count of four MPI_INT.
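For illustration only (this is not John's code, just a minimal sketch of
the plain mismatch case), something like the following can make the
non-root ranks abort with MPI_ERR_TRUNCATE:

    /* hypothetical reproducer: ranks disagree on the MPI_Bcast signature */
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rank, buf[4] = {0, 0, 0, 0};

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            /* the root broadcasts 4 ints ... */
            MPI_Bcast(buf, 4, MPI_INT, 0, MPI_COMM_WORLD);
        } else {
            /* ... but the other ranks only expect 1 int */
            MPI_Bcast(buf, 1, MPI_INT, 0, MPI_COMM_WORLD);
        }

        MPI_Finalize();
        return 0;
    }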
We have distributed a binary to a person with a Linux cluster. When
he runs our binary, he gets
[server1:10978] *** An error occurred in MPI_Bcast
[server1:10978] *** on communicator MPI COMMUNICATOR 8 DUP FROM 7
[server1:10978] *** MPI_ERR_TRUNCATE: message truncated
[server1:10978] *** MPI_ERRO
Yes, that is correct: the Open MPI libraries used by all components (including
mpirun and all the individual MPI processes) must be compatible. We usually
advise users to use the same version across the board, and that will make
things work.
More specifically: it's not enough to statically link libmpi into your
executable, because Open MPI still loads its plugins and talks to the
run-time (mpirun/orted) from the installation on the target machine, and
those pieces must be compatible as well.
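One quick sanity check (just a suggestion, assuming an MPI-3 library) is to
have the shipped binary report which MPI library it actually picked up at
run time:

    /* print the MPI library the executable is really using, to spot
       version mismatches between the build and deployment machines */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        char version[MPI_MAX_LIBRARY_VERSION_STRING];
        int len, rank;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Get_library_version(version, &len);
        if (rank == 0)
            printf("MPI library in use: %s\n", version);

        MPI_Finalize();
        return 0;
    }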
Jeff Hammond writes:
> You can rely upon e.g. https://www.mpich.org/abi/ when redistributing MPI
> binaries built with MPICH, but a better option would be to wrap all of your
> MPI code in an implementation-agnostic wrapper and then ship a binary that
> can dlopen a different version of the wrapper depending on which MPI
> implementation is present on the system.
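For what it's worth, here is a minimal sketch of that dlopen pattern; the
names libmywrap_openmpi.so and mywrap_bcast are made up for illustration,
and the wrapper library (built once per MPI implementation) would own
MPI_Init/MPI_Finalize itself:

    /* the shipped binary never links MPI directly: it dlopen()s a wrapper
       built against whichever MPI is installed on the target machine */
    #include <dlfcn.h>
    #include <stdio.h>

    typedef int (*bcast_fn)(void *buf, int count, int root);

    int main(void)
    {
        /* choose the wrapper at run time, e.g. from an environment
           variable or by probing the system; hard-coded here for brevity */
        void *lib = dlopen("libmywrap_openmpi.so", RTLD_NOW | RTLD_GLOBAL);
        if (!lib) {
            fprintf(stderr, "dlopen failed: %s\n", dlerror());
            return 1;
        }

        bcast_fn my_bcast = (bcast_fn) dlsym(lib, "mywrap_bcast");
        if (!my_bcast) {
            fprintf(stderr, "dlsym failed: %s\n", dlerror());
            return 1;
        }

        int value = 42;
        my_bcast(&value, 1, 0);   /* forwards to the real MPI_Bcast */

        dlclose(lib);
        return 0;
    }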
Rob Latham writes:
> PLFS folks seem to think it's not worth the effort?
>
> https://github.com/plfs/plfs-core/issues/361
I guess they'd know, but I'm a bit puzzled, as EMC seemed to be relying
on it (renamed?) fairly recently. It's in the latest OMPIO anyhow.
Rob Latham writes:
>> [I wonder why ANL apparently dumped PVFS but I probably shouldn't ask.]
>
> No harm in asking.
I wrongly guessed it was political!
> This is getting a bit off topic for an OpenMPI list
> but OMPIO and ROMIO both support PVFS so perhaps it's interesting to
> some.
At least
Rob Latham writes:
>> Assuming MPI-IO, to be on-topic, is MX known to have any real advantage
>> over TCP filesystem drivers, e.g. Lustre and PVFS?
>
> Both lustre (portals) and PVFS (BMI) have transport abstraction
> layers. File systems benefit from native transport (compared to
> ip-over-whatever).
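For context, here is a minimal MPI-IO write that exercises whichever driver
gets selected; the "pvfs2:" prefix is ROMIO's convention for forcing a
specific file-system driver, and the path is only an example:

    /* each rank writes its own int at its own offset */
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rank, value;
        MPI_File fh;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        value = rank;

        MPI_File_open(MPI_COMM_WORLD, "pvfs2:/mnt/pvfs2/testfile",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
        MPI_File_write_at(fh, (MPI_Offset) (rank * sizeof(int)), &value, 1,
                          MPI_INT, MPI_STATUS_IGNORE);
        MPI_File_close(&fh);

        MPI_Finalize();
        return 0;
    }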