ping

2018-06-01 22:29 GMT+03:00 Dmitry N. Mikushin <maemar...@gmail.com>:
> Dear all,
>
> Looks like I have a weird issue never encountered before. While trying to
> run simplest "Hello world" program, I get:
>
> $ cat hello.c
> #include <mpi.h>
>
> int main(int argc, char* argv[])
> {
>     MPI_Init(&argc, &argv);
>
>     MPI_Finalize();
>
>     return 0;
> }
> $ mpicc hello.c -o hello
> $ mpirun -np 1 ./hello
> --------------------------------------------------------------------------
> WARNING: The accept(3) system call failed on a TCP socket. While this
> should generally never happen on a well-configured HPC system, the
> most common causes when it does occur are:
>
> * The process ran out of file descriptors
> * The operating system ran out of file descriptors
> * The operating system ran out of memory
>
> Your Open MPI job will likely hang until the failure resason is fixed
> (e.g., more file descriptors and/or memory becomes available), and may
> eventually timeout / abort.
>
>   Local host: M17xR4
>   Errno: 9 (Bad file descriptor)
>   Probable cause: Unknown cause; job will try to continue
> --------------------------------------------------------------------------
>
> Further tracing shows the following:
>
> [pid 13498] accept(0, 0x7f2ec8000960, 0x7f2ee6740e7c) = -1 EBADF (Bad file descriptor)
> [pid 13498] shutdown(0, SHUT_RDWR) = -1 EBADF (Bad file descriptor)
> [pid 13498] close(0) = -1 EBADF (Bad file descriptor)
> [pid 13498] open("/usr/share/openmpi/help-oob-tcp.txt", O_RDONLY) = 0
> [pid 13498] ioctl(0, TCGETS, 0x7f2ee6740be0) = -1 ENOTTY (Inappropriate ioctl for device)
> [pid 13499] <... nanosleep resumed> NULL) = 0
> [pid 13498] fstat(0, <unfinished ...>
> [pid 13499] nanosleep({0, 100000}, <unfinished ...>
> [pid 13498] <... fstat resumed> {st_mode=S_IFREG|0644, st_size=3025, ...}) = 0
> [pid 13498] read(0, "# -*- text -*-\n#\n# Copyright (c)"..., 8192) = 3025
> [pid 13498] read(0, "", 4096) = 0
> [pid 13498] read(0, "", 8192) = 0
> [pid 13498] ioctl(0, TCGETS, 0x7f2ee6740b40) = -1 ENOTTY (Inappropriate ioctl for device)
> [pid 13498] close(0) = 0
> [pid 13499] <... nanosleep resumed> NULL) = 0
> [pid 13499] nanosleep({0, 100000}, <unfinished ...>
> [pid 13498] write(1, "--------------------------------"..., 768--------------------------------------------------------------------------
> WARNING: The accept(3) system call failed on a TCP socket. While this
> should generally never happen on a well-configured HPC system, the
> most common causes when it does occur are:
>
> * The process ran out of file descriptors
> * The operating system ran out of file descriptors
> * The operating system ran out of memory
>
> Your Open MPI job will likely hang until the failure resason is fixed
> (e.g., more file descriptors and/or memory becomes available), and may
> eventually timeout / abort.
>
>   Local host: M17xR4
>   Errno: 9 (Bad file descriptor)
>   Probable cause: Unknown cause; job will try to continue
> --------------------------------------------------------------------------
> ) = 768
>
> In fact, "Bad file descriptor" first occurs a bit earlier, here:
>
> [pid 13499] open("/proc/self/fd", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 20
> [pid 13499] fstat(20, {st_mode=S_IFDIR|0500, st_size=0, ...}) = 0
> [pid 13499] getdents(20, /* 25 entries */, 32768) = 600
> [pid 13499] close(3) = 0
> [pid 13499] close(4) = 0
> [pid 13499] close(5) = 0
> [pid 13499] close(6) = 0
> [pid 13499] close(7) = 0
> [pid 13499] close(8) = 0
> [pid 13499] close(9) = 0
> [pid 13499] close(10) = 0
> [pid 13499] close(11) = 0
> [pid 13499] close(12) = 0
> [pid 13499] close(13) = 0
> [pid 13499] close(14) = 0
> [pid 13499] close(15) = 0
> [pid 13499] close(16) = 0
> [pid 13499] close(17) = 0
> [pid 13499] close(18) = 0
> [pid 13499] close(19) = 0
> [pid 13499] close(20) = 0
> [pid 13499] getdents(20, 0x1cc04a0, 32768) = -1 EBADF (Bad file descriptor)
> [pid 13499] close(20) = -1 EBADF (Bad file descriptor)
>
> Any idea how to fix this? System is Ubuntu 16.04:
>
> Linux M17xR4 4.10.0-42-generic #46~16.04.1-Ubuntu SMP Mon Dec 4 15:57:59 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
>
> Kind regards,
> - Dmitry.
>
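A note on the second trace in the quoted message: close(3) through close(20) followed by getdents(20) = EBADF looks like what a loop produces when it walks /proc/self/fd and closes every entry it finds, including the descriptor of the directory it is still reading. That is only my reading of the strace output, not something I have confirmed in the Open MPI sources. A minimal C sketch of such a loop that skips its own descriptor is below; close_fds_from is a hypothetical name used for illustration only.

```c
#define _POSIX_C_SOURCE 200809L   /* for dirfd() */
#include <assert.h>
#include <dirent.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper, NOT taken from Open MPI: close every open file
 * descriptor numbered >= lowest by listing /proc/self/fd (Linux only).
 * The descriptor that backs the directory stream is skipped, so the
 * listing itself never fails with EBADF.
 * Returns the number of descriptors closed, or -1 if /proc/self/fd
 * could not be opened. */
int close_fds_from(int lowest)
{
    assert(lowest >= 0);

    DIR *dir = opendir("/proc/self/fd");
    if (dir == NULL)
        return -1;

    int self = dirfd(dir);   /* descriptor used by the listing itself */
    int closed = 0;
    struct dirent *ent;

    while ((ent = readdir(dir)) != NULL) {
        char *end;
        long fd = strtol(ent->d_name, &end, 10);

        /* Skip ".", ".." and anything else that is not a plain number. */
        if (end == ent->d_name || *end != '\0')
            continue;
        /* Keep descriptors below the threshold and the directory's own. */
        if (fd < lowest || fd == self)
            continue;
        if (close((int)fd) == 0)
            closed++;
    }

    closedir(dir);           /* releases the directory descriptor last */
    return closed;
}
```

Calling close_fds_from(3) in a freshly forked child would keep stdin, stdout and stderr and close the rest; the EBADF on getdents in the trace is what I would expect to see without the "fd == self" check.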
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users