Hi,
An MPI job is running on two nodes and everything seems to be fine.
However, in the middle of the run, the program aborts with the following
error
[compute-0-1.local][[47664,1],14][btl_tcp_frag.c:215:mca_btl_tcp_frag_recv]
mca_btl_tcp_frag_recv: readv failed: Connection reset by peer (104)
[c
In the absence of a clearer error message, btl_tcp_frag-related errors like
this can suggest a peer process was killed by the oom-killer.
That is not your case, though, since rank 0 died because of an illegal instruction.
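When the oom-killer is a suspect, the kernel log on each compute node is the place to confirm it. A minimal sketch, using a hypothetical sample line (on a real node the text would come from `dmesg` itself, and the PID and process name would be your own):

```shell
# Hypothetical dmesg line; on a real compute node you would run:
#   dmesg -T | grep -i -e "out of memory" -e "killed process"
sample='Out of memory: Killed process 12345 (a.out) total-vm:1048576kB'
printf '%s\n' "$sample" | grep -i 'killed process'
```

If nothing matches on any node, an oom kill is unlikely and the failure happened in the remote rank itself.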
Are you running under a batch manager?
On which architecture?
Do your compute nodes have the
I checked again and, as far as I can tell, everything was set up correctly. I
added "HCC debug" to the output message to make sure it's the correct plugin.
The updated outputs:
$ mpirun ./a.out < test.in
[c1725.crane.hcc.unl.edu:218844] HCC debug: [[26513,0],0] iof:hnp pushing fd 35
for process [
Hmmm...well, the problem appears to be that we aren’t setting up the input
channel to read stdin. This happens immediately after the application is
launched - there is no “if” clause or anything else in front of it. The only
way it wouldn’t get called is if all the procs weren’t launched, but th
Hi everyone,
I am running Linux Fedora. I downloaded and installed
openmpi-1.7.3-1.fc20 (64-bit) and openmpi-devel-1.7.3-1.fc20 (64-bit), as
well as pypar-openmpi-2.1.5_108-3.fc20 (64-bit) and
python3-mpi4py-openmpi-1.3.1-1.fc20 (64-bit). The problem I am having is
building mpi4py using the mpicc wrapper.
Yes, all procs were launched properly. I added “-mca plm_base_verbose 5” to the
mpirun command. Please see attached for the results.
$mpirun -mca plm_base_verbose 5 ./a.out < test.in
I mentioned in my initial post that the test job runs properly the first
time. But if I kill the job and
Well, that helped a bit. For some reason, your system is skipping a step in the
launch state machine, and so we never hit the step where we setup the IO
forwarding system.
Sorry to keep poking, but I haven’t seen this behavior anywhere else, so I
have no way to replicate it. Must be a subtle
Oh my - that indeed illustrated the problem!! It is indeed a race condition on
the backend orted. I’ll try to fix it - probably have to send you a patch to
test?
> On Aug 30, 2016, at 1:04 PM, Jingchao Zhang wrote:
>
> $mpirun -mca state_base_verbose 5 ./a.out < test.in
>
> Please see attache
Hello everyone,
I am using openmpi-1.10.2 and calling the `spawn_multiple` MPI function
inside a for-loop. My program spawns N workers in each iteration of the
for-loop, makes some changes to the input for the next iteration, and then
proceeds to the next iteration.
After a few iterations
Yes, I can definitely help to test the patch.
Jingchao
From: users on behalf of r...@open-mpi.org
Sent: Tuesday, August 30, 2016 2:23:12 PM
To: Open MPI Users
Subject: Re: [OMPI users] stdin issue with openmpi/2.0.0
Oh my - that indeed illustrated the problem!
Thank you! The patch fixed the problem. I did multiple tests with your program
and another application. No more process hangs!
Cheers,
Dr. Jingchao Zhang
Holland Computing Center
University of Nebraska-Lincoln
402-472-6400
From: users on behalf of r...@open-mp
Sam,
at first you mentioned Open MPI 1.7.3.
Though this is now a legacy version, you posted to the right place.
Then you
# python setup.py build --mpicc=/usr/lib64/mpich/bin/mpicc
this is MPICH, which is a very reputable MPI implementation, but not
Open MPI.
So I do invite you to use Op
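A quick way to tell which implementation a given wrapper belongs to is its install path; Open MPI's wrappers also accept `mpicc --showme`, whereas MPICH's use `-show` instead. A minimal sketch using the wrapper path from the build command above (the branch messages are illustrative):

```shell
# Path taken from the reported build command; inspecting it is enough here
MPICC=/usr/lib64/mpich/bin/mpicc
case "$MPICC" in
  */openmpi/*) echo "Open MPI wrapper" ;;
  */mpich/*)   echo "MPICH wrapper, not Open MPI" ;;
  *)           echo "unknown MPI wrapper" ;;
esac
```

To build mpi4py against Open MPI instead, point `--mpicc` at an Open MPI wrapper; on Fedora that is typically `/usr/lib64/openmpi/bin/mpicc` after `module load mpi/openmpi-x86_64`.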