Re: [OMPI users] General question about running single-node jobs.

2014-10-02 Thread Lee-Ping Wang
ed that there’s a problem with their >> realm-specific IP addressing (RSIP) for the compute nodes, which they’re >> working on fixing. I also tried running the same Q-Chem / OpenMPI job >> on a management node which I think has the same hardware (but not the >> RSIP), a

Re: [OMPI users] General question about running single-node jobs.

2014-10-02 Thread Gus Correa
stion about running single-node jobs. Hi Ralph, Thanks. I'll add some print statements to the code and try to figure out precisely where the failure is happening. - Lee-Ping On Sep 30, 2014, at 12:06 PM, Ralph Castain <r...@open-mpi.org> wrote: On Sep 30, 2014, at 11:19 A

Re: [OMPI users] General question about running single-node jobs.

2014-10-02 Thread Lee-Ping Wang
Blue Waters support gets back to me with the fix. :) Thanks, - Lee-Ping From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Lee-Ping Wang Sent: Tuesday, September 30, 2014 1:15 PM To: Open MPI Users Subject: Re: [OMPI users] General question about running single-node jobs

Re: [OMPI users] General question about running single-node jobs.

2014-09-30 Thread Lee-Ping Wang
Hi Ralph, Thanks. I'll add some print statements to the code and try to figure out precisely where the failure is happening. - Lee-Ping On Sep 30, 2014, at 12:06 PM, Ralph Castain wrote: > > On Sep 30, 2014, at 11:19 AM, Lee-Ping Wang wrote: > >> Hi Ralph, >> If so, then I should b

Re: [OMPI users] General question about running single-node jobs.

2014-09-30 Thread Ralph Castain
On Sep 30, 2014, at 11:19 AM, Lee-Ping Wang wrote: > Hi Ralph, > >>> If so, then I should be able to (1) locate where the port number is >>> defined in the code, and (2) randomize the port number every time it's >>> called to work around the issue. What do you think? >> >> That might work,

Re: [OMPI users] General question about running single-node jobs.

2014-09-30 Thread Lee-Ping Wang
Hi Ralph, >> If so, then I should be able to (1) locate where the port number is defined >> in the code, and (2) randomize the port number every time it's called to >> work around the issue. What do you think? > > That might work, depending on the code. I'm not sure what it is trying to > co

Re: [OMPI users] General question about running single-node jobs.

2014-09-30 Thread Ralph Castain
On Sep 30, 2014, at 10:49 AM, Lee-Ping Wang wrote: > Hi Ralph, > > Thank you. I think your diagnosis is probably correct. Are these sockets > the same as TCP/UDP ports (though different numbers) that are used in web > servers, email etc? Yes > If so, then I should be able to (1) locate w

Re: [OMPI users] General question about running single-node jobs.

2014-09-30 Thread Lee-Ping Wang
Hi Ralph, Thank you. I think your diagnosis is probably correct. Are these sockets the same as TCP/UDP ports (though different numbers) that are used in web servers, email etc? If so, then I should be able to (1) locate where the port number is defined in the code, and (2) randomize the port

Re: [OMPI users] General question about running single-node jobs.

2014-09-29 Thread Ralph Castain
I don't know anything about your application, or what the functions in your code are doing. I imagine it's possible that you are trying to open statically defined ports, which means that running the job again too soon could leave the OS thinking the socket is already busy. It takes a while for th
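
[Editor's sketch, not part of the original message.] The failure mode described above, and the port workaround discussed later in the thread, can be illustrated with plain TCP sockets. This is a minimal sketch in Python and does not reflect how Q-Chem or Open MPI actually open their sockets; the helper names bind_fixed and bind_ephemeral are hypothetical. It assumes the application binds a listening socket on localhost. Rather than picking a random port by hand, the second helper binds to port 0, which asks the OS to assign a free port.

```python
import socket


def bind_fixed(port, reuse=False):
    """Bind a listening TCP socket to a statically chosen port (hypothetical helper).

    Raises OSError if the port is still held by another socket.
    With reuse=True, SO_REUSEADDR is set, which lets a new socket bind
    while old connections on that port are still in TIME_WAIT.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        if reuse:
            s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        s.bind(("127.0.0.1", port))
        s.listen(1)
    except OSError:
        s.close()  # do not leak the descriptor when the bind fails
        raise
    return s


def bind_ephemeral():
    """Bind to port 0 so the OS picks a free port (hypothetical helper).

    Returns the socket and the port number the OS assigned.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind(("127.0.0.1", 0))
    s.listen(1)
    return s, s.getsockname()[1]


if __name__ == "__main__":
    first, port = bind_ephemeral()
    print("OS assigned port", port)
    try:
        bind_fixed(port)  # same port while it is still in use
    except OSError as err:
        print("fixed-port bind failed:", err)
    first.close()
```

If the port number must be shared with other processes, the value returned by bind_ephemeral would have to be passed to them by some other channel; whether that is practical depends on the code in question.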

Re: [OMPI users] General question about running single-node jobs.

2014-09-29 Thread Lee-Ping Wang
Here's another data point that might be useful: The error message is much more rare if I run my application on 4 cores instead of 8. Thanks, - Lee-Ping On Sep 29, 2014, at 5:38 PM, Lee-Ping Wang wrote: > Sorry for my last email - I think I spoke too quick. I realized after > reading some mo

Re: [OMPI users] General question about running single-node jobs.

2014-09-29 Thread Lee-Ping Wang
Sorry for my last email - I think I spoke too quick. I realized after reading some more documentation that OpenMPI always uses TCP sockets for out-of-band communication, so it doesn't make sense for me to set OMPI_MCA_oob=^tcp. That said, I am still running into a strange problem in my applica

[OMPI users] General question about running single-node jobs.

2014-09-29 Thread Lee-Ping Wang
Hi there, My application uses MPI to run parallel jobs on a single node, so I have no need of any support for communication between nodes. However, when I use mpirun to launch my application I see strange errors such as: -