On Sun, Dec 22, 2019 at 3:33 PM Raymond, David <david.raym...@nmt.edu>
wrote:

> I am running openmpi-4.0.2 (self-compiled with GDS patches) on
> up-to-date 6.6 stable with a Go program that calls Clang MPI routines.
> With particular hardware (details provided if desired), readv and
> writev calls randomly fail with respectively "Timeout" and "Permission
> denied" errors for calls from one machine to another across the
> ethernet.


While "Permission denied" is the error message for EACCES, "Timeout" is not
a complete errno error message on OpenBSD.  Has it been established that the
underlying readv/writev syscalls are returning particular errors by using
ktrace/kdump?
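
As a cross-check from inside the program, here is a minimal C sketch that
prints the raw errno value next to its strerror(3) text when writev(2) fails.
The helper name checked_writev is hypothetical and is not part of Open MPI;
the same pattern would apply to readv(2).  This only shows what the caller
sees, so ktrace/kdump is still the way to confirm what the syscall itself
returned.

```c
#include <sys/types.h>
#include <sys/uio.h>

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: call writev(2) and, on failure, report the numeric
 * errno together with its strerror(3) message.
 * Returns 0 on success, otherwise the errno value from the failed call.
 */
int
checked_writev(int fd, const struct iovec *iov, int iovcnt)
{
	ssize_t n = writev(fd, iov, iovcnt);

	if (n == -1) {
		int saved = errno;	/* save it before stdio can change it */

		fprintf(stderr, "writev: errno %d (%s)\n",
		    saved, strerror(saved));
		return saved;
	}
	return 0;
}
```

Comparing the printed number against the values in <errno.h> shows whether a
message such as "Timeout" really comes from the syscall or is produced by a
layer above it.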

Next: if you have a device open, then the device driver *totally controls*
what errnos syscalls get.  If a device driver wanted to return EDOM
("Numerical argument out of domain") it totally could.  If you're getting a
weird errno from a device, well, review the device source!


> The errors don't occur between cores on the same machine.

THIS SHOULD NOT BE A SURPRISE: the net is not the same as your local
machine.


> The man pages for readv and writev don't document the possibility of
> such errors.


IMO, weird errnos from devices should be documented in the manpage for the
device.  Consider the termios(4) manpage, for example.


Philip Guenther
