ed that there’s a problem with their
>> realm-specific IP addressing (RSIP) for the compute nodes, which they’re
>> working on fixing. I also tried running the same Q-Chem / OpenMPI job
>> on a management node which I think has the same hardware (but not the
>> RSIP), a
stion about running single-node jobs.
Hi Ralph,
Thanks. I'll add some print statements to the code and try to figure
out precisely where the failure is happening.
- Lee-Ping
On Sep 30, 2014, at 12:06 PM, Ralph Castain <r...@open-mpi.org> wrote:
On Sep 30, 2014, at 11:19 A
Blue Waters support gets back to me with the fix. :)
Thanks,
- Lee-Ping
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Lee-Ping Wang
Sent: Tuesday, September 30, 2014 1:15 PM
To: Open MPI Users
Subject: Re: [OMPI users] General question about running single-node jobs
On Sep 30, 2014, at 12:06 PM, Ralph Castain wrote:
>
> On Sep 30, 2014, at 11:19 AM, Lee-Ping Wang wrote:
>
>> Hi Ralph,
>>
If so, then I should b
On Sep 30, 2014, at 11:19 AM, Lee-Ping Wang wrote:
> Hi Ralph,
>
>>> If so, then I should be able to (1) locate where the port number is
>>> defined in the code, and (2) randomize the port number every time it's
>>> called to work around the issue. What do you think?
>>
>> That might work,
Hi Ralph,
>> If so, then I should be able to (1) locate where the port number is defined
>> in the code, and (2) randomize the port number every time it's called to
>> work around the issue. What do you think?
>
> That might work, depending on the code. I'm not sure what it is trying to
> co
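Lee-Ping's proposed workaround, which picks a random port on each call instead of a fixed one, might look roughly like the minimal Python sketch below. It is not taken from Q-Chem or Open MPI. The helper name `bind_random_port`, the 20000 to 30000 range, and the retry count are hypothetical, and the sketch assumes that a collision shows up as `EADDRINUSE` from `bind()`.

```python
import errno
import random
import socket


def bind_random_port(host="127.0.0.1", low=20000, high=30000, attempts=50):
    """Bind a TCP socket to a randomly chosen port in [low, high].

    Hypothetical helper: the port range and attempt count are illustrative
    assumptions, not values taken from Q-Chem or Open MPI.
    """
    for _ in range(attempts):
        port = random.randint(low, high)  # inclusive on both ends
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((host, port))
            return sock, port
        except OSError as exc:
            sock.close()
            # Retry only when the port is already taken; re-raise anything else.
            if exc.errno != errno.EADDRINUSE:
                raise
    raise RuntimeError(f"no free port found in {attempts} attempts")


if __name__ == "__main__":
    listener, port = bind_random_port()
    listener.listen()
    print("listening on port", port)
    listener.close()
```

Randomizing only makes a collision less likely, because two processes can still draw the same number. The retry on `EADDRINUSE` is what actually handles that case.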
On Sep 30, 2014, at 10:49 AM, Lee-Ping Wang wrote:
> Hi Ralph,
>
> Thank you. I think your diagnosis is probably correct. Are these sockets
> the same as TCP/UDP ports (though different numbers) that are used in web
> servers, email etc?
Yes
> If so, then I should be able to (1) locate w
Hi Ralph,
Thank you. I think your diagnosis is probably correct. Are these sockets the
same as TCP/UDP ports (though different numbers) that are used in web servers,
email etc? If so, then I should be able to (1) locate where the port number is
defined in the code, and (2) randomize the port
I don't know anything about your application, or what the functions in your
code are doing. I imagine it's possible that you are trying to open statically
defined ports, which means that running the job again too soon could leave the
OS thinking the socket is already busy. It takes a while for th
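Ralph's point about statically defined ports can be reproduced in isolation. In the minimal Python sketch below (an illustration, not code from the application), a second socket that asks for the same fixed port is refused with `EADDRINUSE` while the first socket still holds it, whereas asking for port 0 lets the OS assign any free ephemeral port. The helper names `try_bind` and `demo` and the use of the loopback address are assumptions made for the demo.

```python
import errno
import socket


def try_bind(port, host="127.0.0.1"):
    """Bind a TCP socket to (host, port); return (sock, None) or (None, errno)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        return sock, None
    except OSError as exc:
        sock.close()
        return None, exc.errno


def demo():
    # A first process takes a port; port 0 is used here only to find a free one.
    first, _ = try_bind(0)
    fixed_port = first.getsockname()[1]
    first.listen()
    try:
        # A second process that hard-codes the same port is refused.
        second, err_static = try_bind(fixed_port)
        if second is not None:
            second.close()
        # Requesting port 0 lets the kernel pick an unused ephemeral port.
        third, _ = try_bind(0)
        auto_port = third.getsockname()[1]
        third.close()
    finally:
        first.close()
    return fixed_port, err_static, auto_port


if __name__ == "__main__":
    fixed, err, auto = demo()
    print("static port", fixed, "->", errno.errorcode.get(err, err))
    print("port 0 ->", auto)
```

The sketch only covers the case where another socket is still bound to the port. A port left in the TCP TIME_WAIT state by a previous run is a related but separate situation (commonly handled with `SO_REUSEADDR`), which this demo does not reproduce.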
Here's another data point that might be useful: the error message is much
rarer if I run my application on 4 cores instead of 8.
Thanks,
- Lee-Ping
On Sep 29, 2014, at 5:38 PM, Lee-Ping Wang wrote:
> Sorry for my last email - I think I spoke too quickly. I realized after
> reading some mo
Sorry for my last email - I think I spoke too quickly. I realized after reading
some more documentation that OpenMPI always uses TCP sockets for out-of-band
communication, so it doesn't make sense for me to set OMPI_MCA_oob=^tcp. That
said, I am still running into a strange problem in my applica
Hi there,
My application uses MPI to run parallel jobs on a single node, so I have no
need of any support for communication between nodes. However, when I use
mpirun to launch my application I see strange errors such as:
-