Hi,
An MPI job is running on two nodes and everything seems to be fine.
However, in the middle of the run, the program aborts with the following
error
[compute-0-1.local][[47664,1],14][btl_tcp_frag.c:215:mca_btl_tcp_frag_recv]
mca_btl_tcp_frag_recv: readv failed: Connection reset by peer (104)
[c
In the absence of a clearer error message, btl_tcp_frag-related errors like
this can suggest a peer process was killed by the oom-killer.
That is not your case, though, since rank 0 died because of an illegal instruction.
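When the oom-killer is a suspect, the kernel log on each compute node is the place to confirm it. A minimal sketch, using a hypothetical sample line (on a real node the text would come from `dmesg` itself, and the PID and process name would be your own):

```shell
# Hypothetical dmesg line; on a real compute node you would run:
#   dmesg -T | grep -i -e "out of memory" -e "killed process"
sample='Out of memory: Killed process 12345 (a.out) total-vm:1048576kB'
printf '%s\n' "$sample" | grep -i 'killed process'
```

If nothing matches on any node, an oom kill is unlikely and the failure happened in the remote rank itself.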
Are you running under a batch manager?
On which architecture?
Do your compute nodes have the
I checked again and, as far as I can tell, everything was set up correctly. I
added "HCC debug" to the output message to make sure it's the correct plugin.
The updated outputs:
$ mpirun ./a.out < test.in
[c1725.crane.hcc.unl.edu:218844] HCC debug: [[26513,0],0] iof:hnp pushing fd 35
for process [
Hmmm...well, the problem appears to be that we aren’t setting up the input
channel to read stdin. This happens immediately after the application is
launched - there is no “if” clause or anything else in front of it. The only
way it wouldn’t get called is if all the procs weren’t launched, but th
Hi everyone,
I am running Linux Fedora. I downloaded and installed
openmpi-1.7.3-1.fc20 (64-bit) and openmpi-devel-1.7.3-1.fc20 (64-bit), as
well as pypar-openmpi-2.1.5_108-3.fc20 (64-bit) and
python3-mpi4py-openmpi-1.3.1-1.fc20 (64-bit). The problem I am having is
building mpi4py using the mpicc wrapper.
Yes, all procs were launched properly. I added “-mca plm_base_verbose 5” to the
mpirun command. Please see attached for the results.
$mpirun -mca plm_base_verbose 5 ./a.out < test.in
I mentioned in my initial post that the test job runs properly the first
time. But if I kill the job and
Well, that helped a bit. For some reason, your system is skipping a step in the
launch state machine, and so we never hit the step where we setup the IO
forwarding system.
Sorry to keep poking, but I haven’t seen this behavior anywhere else, so I
have no way to replicate it. Must be a subtle
Oh my - that indeed illustrated the problem!! It is indeed a race condition on
the backend orted. I’ll try to fix it - probably have to send you a patch to
test?
> On Aug 30, 2016, at 1:04 PM, Jingchao Zhang wrote:
>
> $mpirun -mca state_base_verbose 5 ./a.out < test.in
>
> Please see attache
Hello everyone,
I am using openmpi-1.10.2 and calling the `spawn_multiple` MPI function
inside a for-loop. My program spawns N workers in each iteration of the
for-loop, makes some changes to the input for the next iteration, and then
proceeds to the next iteration.
After a few iterations
Yes, I can definitely help to test the patch.
Jingchao
From: users on behalf of r...@open-mpi.org
Sent: Tuesday, August 30, 2016 2:23:12 PM
To: Open MPI Users
Subject: Re: [OMPI users] stdin issue with openmpi/2.0.0
Oh my - that indeed illustrated the problem!
Thank you! The patch fixed the problem. I did multiple tests with your program
and another application. No more process hangs!
Cheers,
Dr. Jingchao Zhang
Holland Computing Center
University of Nebraska-Lincoln
402-472-6400
From: users on behalf of r...@open-mp
Sam,
at first you mentioned Open MPI 1.7.3.
Though this is now a legacy version, you posted to the right place.
Then you
# python setup.py build --mpicc=/usr/lib64/mpich/bin/mpicc
this is MPICH, which is a very reputable MPI implementation, but not
Open MPI.
So I do invite you to use Op
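A quick way to tell which implementation a given wrapper belongs to is its install path; Open MPI's wrappers also accept `mpicc --showme`, whereas MPICH's use `-show` instead. A minimal sketch using the wrapper path from the build command above (the branch messages are illustrative):

```shell
# Path taken from the reported build command; inspecting it is enough here
MPICC=/usr/lib64/mpich/bin/mpicc
case "$MPICC" in
  */openmpi/*) echo "Open MPI wrapper" ;;
  */mpich/*)   echo "MPICH wrapper, not Open MPI" ;;
  *)           echo "unknown MPI wrapper" ;;
esac
```

To build mpi4py against Open MPI instead, point `--mpicc` at an Open MPI wrapper; on Fedora that is typically `/usr/lib64/openmpi/bin/mpicc` after `module load mpi/openmpi-x86_64`.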