I have an MPI program (C code for a school project) that I want to run on multiple nodes (two, for now), but it doesn't work: it waits indefinitely. First I tried running it on both machines with `mpirun -np 2 --host 192.168.0.147,192.168.0.116 ./mandelbrot_mpi_omp`, and then I tried being more specific with subnet masks and enabling logging: `mpirun --mca oob_base_verbose 100 --mca oob_tcp_if_include 192.168.0.0/24 --mca btl_tcp_if_include 192.168.0.0/24 -np 2 --host 192.168.0.147,192.168.0.116 ./mandelbrot_mpi_omp`. The local IP addresses are the addresses of those computers, entered in the same order on both PCs, so 192.168.0.147 is rank 0 - the master. The first command just waits without any text or error. The second one produces a log and stops at the step "get transports for component tcp" - on both machines.

log from both machines: https://pastebin.com/bt32ZddX

lsof from both machines: https://pastebin.com/s3HHFWZB

The lsof output is not trimmed; it is everything that had any ports open at the moment - nothing else was running. The weird thing is the CLOSE_WAIT state on the ssh connection on both sides.
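To rule out my program entirely, this is the kind of minimal test I can run: launching a plain `hostname` through mpirun still exercises the Open MPI startup (oob) layer without involving any MPI code in the binary. The hostfile below just restates my two IPs; `slots=1` is my assumption that one rank per machine is wanted:

```
# hostfile listing both machines (same IPs as above), one slot each
cat > hosts <<'EOF'
192.168.0.147 slots=1
192.168.0.116 slots=1
EOF

# hostname is not an MPI program: if even this hangs, the problem is in the
# launcher/network layer, not in my mandelbrot code
mpirun --hostfile hosts -np 2 hostname
```

If `hostname` prints from both machines, the network side is fine and the problem is in the application; if it hangs the same way, the application is irrelevant.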
here is my code:

```
int main(int argc, char* argv[]) {
    int width = SCALE_X;
    int height = SCALE_Y;

    // MPI init & setup
    MPI_Init(&argc, &argv);
    int world_size;
    int rank;
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // calculate size of buffer according to server count
    int part_height = SCALE_Y / world_size;
    int buffer_size = (width + 1) * (part_height + 1) * 3;

    // dynamically allocate arrays for image data according to server count
    send_buffer = calloc(buffer_size, sizeof(PIXEL));
    recv_buffer = calloc(buffer_size * world_size, sizeof(PIXEL));

    if (rank == 0)
        printf("MPI node count: %i\n", world_size);
    MPI_Barrier(MPI_COMM_WORLD);

    // OpenMP setup
    int cpu_count = omp_get_num_procs();
    omp_set_num_threads(cpu_count);
    printf("OpenMP cpu count on node %i: %i\n", rank, cpu_count);
    printf("OpenMP (max) thread count on node %i: %i\n", rank, omp_get_num_threads());
    MPI_Barrier(MPI_COMM_WORLD);

    // generate a part of the mandelbrot set according to world size and rank of this server
    mandelbrot(rank, world_size, width, part_height);

    // gather parts of the mandelbrot set from all nodes
    MPI_Gather(send_buffer, width * part_height * 3, MPI_CHAR,
               recv_buffer, width * part_height * 3, MPI_CHAR,
               0, MPI_COMM_WORLD);

    // save raster array of mandelbrot data to a png file
    if (rank == 0)
        save_to_png(width, height);

    printf("Process %i finished.\n", rank);
    MPI_Finalize();
    return 0;
}
```

My OS is Debian 11, and Open MPI (v4.1.0) is installed from the official Debian repositories on both machines. Neither iptables nor nftables is installed on either system, so IP blocking should not be a problem right now. The machines are connected to one router and can reach each other - I can ping them and ssh from one to the other. I also tried connecting them directly with an Ethernet cable and setting the IP addresses manually, but that didn't work either. Both systems have the same username and password. What am I missing? I am new to MPI and not very savvy about networking either.
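For completeness, here is a stripped-down hello-world I can substitute for `./mandelbrot_mpi_omp` to check whether the hang happens even without OpenMP, the gather, and my buffers (everything in it is plain MPI and POSIX; none of my project code is needed):

```
#include <mpi.h>
#include <stdio.h>
#include <unistd.h>   /* gethostname */

int main(int argc, char* argv[]) {
    MPI_Init(&argc, &argv);

    int world_size, rank;
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    char host[256];
    gethostname(host, sizeof(host));
    printf("rank %d of %d on %s\n", rank, world_size, host);

    /* forces real communication between the nodes, like the barriers in my program */
    MPI_Barrier(MPI_COMM_WORLD);

    MPI_Finalize();
    return 0;
}
```

Compiled with `mpicc hello.c -o hello` and launched with the same mpirun commands as above.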
Thanks in advance.