On Sun, Dec 22, 2019 at 3:33 PM Raymond, David <david.raym...@nmt.edu> wrote:
> I am running openmpi-4.0.2 (self-compiled with GDS patches) on
> up-to-date 6.6 stable with a Go program that calls Clang MPI routines.
> With particular hardware (details provided if desired), readv and
> writev calls randomly fail with respectively "Timeout" and "Permission
> denied" errors for calls from one machine to another across the
> ethernet.

While "Permission denied" is the error message for EACCES, "Timeout" is
not a complete errno error message on OpenBSD. Has it been established
that the underlying readv/writev syscalls are returning particular
errors by using ktrace/kdump?

Next: if you have a device open, then the device driver *totally
controls* what errnos syscalls get. If a device driver wanted to return
EDOM ("Numerical argument out of domain") it totally could. If you're
getting weird errnos from a device, well, review the device source!

> The errors don't occur between cores on the same machine.

THIS SHOULD NOT BE A SURPRISE: the net is not the same as your local
machine.

> The man pages for readv and writev don't document the possibility of
> such errors.

IMO, weird errnos from devices should be documented in the manpage for
the device. Consider the termios(4) manpage, for example.

Philip Guenther