Re: [OMPI users] General question about running single-node jobs.

2014-10-02 Thread Lee-Ping Wang
sometimes preferable to > write the temporary files to a disk local to the node. > QCLOCALSCR > specifies this directory. The temporary files will be copied to > QCSCRATCH > at > the end of the job, unless the job is terminated abnormally. > In such cases

Re: [OMPI users] General question about running single-node jobs.

2014-10-02 Thread Lee-Ping Wang
Blue Waters support gets back to me with the fix. :) Thanks, - Lee-Ping From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Lee-Ping Wang Sent: Tuesday, September 30, 2014 1:15 PM To: Open MPI Users Subject: Re: [OMPI users] General question about running single-node jobs

Re: [OMPI users] General question about running single-node jobs.

2014-09-30 Thread Lee-Ping Wang
Hi Ralph, Thanks. I'll add some print statements to the code and try to figure out precisely where the failure is happening. - Lee-Ping On Sep 30, 2014, at 12:06 PM, Ralph Castain wrote: > > On Sep 30, 2014, at 11:19 AM, Lee-Ping Wang wrote: > >> Hi Ralph, >>

Re: [OMPI users] General question about running single-node jobs.

2014-09-30 Thread Lee-Ping Wang
st time it's executed on the BW compute node - it's only subsequent executions that give the error messages. Thanks, - Lee-Ping On Sep 30, 2014, at 11:05 AM, Ralph Castain wrote: > > On Sep 30, 2014, at 10:49 AM, Lee-Ping Wang wrote: > >> Hi Ralph, >> >>

Re: [OMPI users] General question about running single-node jobs.

2014-09-30 Thread Lee-Ping Wang
ou are trying to open > statically defined ports, which means that running the job again too soon > could leave the OS thinking the socket is already busy. It takes a while for > the OS to release a socket resource. > > > On Sep 29, 2014, at 5:49 PM, Lee-Ping Wang wrote: > >>

Re: [OMPI users] OpenMPI 1.8.3 build without BTL

2014-09-30 Thread Lee-Ping Wang
ing library issues. It looks like >> someone has left incorrect configure logic in the system such that we always >> attempt to build Infiniband-related code, but without linking against the >> library. >> >> We'll have to try and track it down. >> >>

Re: [OMPI users] General question about running single-node jobs.

2014-09-29 Thread Lee-Ping Wang
Here's another data point that might be useful: The error message is much more rare if I run my application on 4 cores instead of 8. Thanks, - Lee-Ping On Sep 29, 2014, at 5:38 PM, Lee-Ping Wang wrote: > Sorry for my last email - I think I spoke too quick. I realized after > r

Re: [OMPI users] General question about running single-node jobs.

2014-09-29 Thread Lee-Ping Wang
lly a novice to MPI - but it seems like the initial execution of the program isn't freeing up some system resource as it should. Is there something that needs to be corrected in the code? Thanks, - Lee-Ping On Sep 29, 2014, at 5:12 PM, Lee-Ping Wang wrote: > Hi there, > >

[OMPI users] General question about running single-node jobs.

2014-09-29 Thread Lee-Ping Wang
Hi there, My application uses MPI to run parallel jobs on a single node, so I have no need of any support for communication between nodes. However, when I use mpirun to launch my application I see strange errors such as: -

Re: [OMPI users] OpenMPI 1.8.3 build without BTL

2014-09-29 Thread Lee-Ping Wang
runtime. - Lee-Ping On Sep 29, 2014, at 6:03 AM, Gustavo Correa wrote: > Hi Lee-Ping > > Did you cleanup the old build, to start fresh? > > make distclean > configure --disable-vt ... > ... > > I hope this helps, > Gus Correa > > On Sep 29, 2014, at 8:4

Re: [OMPI users] OpenMPI 1.8.3 build without BTL

2014-09-29 Thread Lee-Ping Wang
libmpi.so: undefined reference to `ibv_destroy_cq' collect2: error: ld returned 1 exit status Thanks, - Lee-Ping On Sep 29, 2014, at 5:27 AM, Lee-Ping Wang wrote: > Hi there, > > I'm building OpenMPI 1.8.3 on a system where I explicitly don't want any of > the BTL compo

[OMPI users] OpenMPI 1.8.3 build without BTL

2014-09-29 Thread Lee-Ping Wang
Hi there, I'm building OpenMPI 1.8.3 on a system where I explicitly don't want any of the BTL components (they tend to break my single node jobs). ./configure CC=gcc CXX=g++ F77=gfortran FC=gfortran --prefix=$QC_EXT_LIBS/openmpi --enable-static --enable-mca-no-build=btl Building gives me thi

Re: [OMPI users] Process is hanging

2014-09-22 Thread Lee-Ping Wang
er in the 1.8 series, but I can't replicate it now with the nightly 1.8 tarball that is about to be released as 1.8.3. http://www.open-mpi.org/nightly/v1.8/ On Sep 21, 2014, at 12:25 PM, Lee-Ping Wang wrote: Hmm, I didn't know those were segfault reports. It could indeed be a

Re: [OMPI users] Process is hanging

2014-09-21 Thread Lee-Ping Wang
On Sep 21, 2014, at 10:02 AM, Lee-Ping Wang wrote: My program isn't segfaulting - it's returning a non-zero status and then exiting. Thanks, - Lee-Ping From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Ralph Castain Sent: Sunday, September 21, 2014 8:5

Re: [OMPI users] Process is hanging

2014-09-21 Thread Lee-Ping Wang
Process is hanging Just to be clear: is your program returning a non-zero status and then exiting, or is it segfaulting? On Sep 21, 2014, at 8:22 AM, Lee-Ping Wang wrote: I'm using version 1.8.1. Thanks, - Lee-Ping From: users [mailto:users-boun...@open-mpi.org] On

Re: [OMPI users] Process is hanging

2014-09-21 Thread Lee-Ping Wang
u are using? On Sep 21, 2014, at 6:08 AM, Lee-Ping Wang wrote: Hi there, I'm running into an issue where mpirun isn't terminating when my executable has a nonzero exit status - instead it's hanging indefinitely. I'm attaching my process tree, the error message from

[OMPI users] Process is hanging

2014-09-21 Thread Lee-Ping Wang
Hi there, I'm running into an issue where mpirun isn't terminating when my executable has a nonzero exit status - instead it's hanging indefinitely. I'm attaching my process tree, the error message from the application, and the messages printed to stderr. Please let me know what I can do.

Re: [OMPI users] Changing directory from /tmp

2013-09-04 Thread Lee-Ping Wang
wrote: Hi, Am 04.09.2013 um 19:21 schrieb Ralph Castain: you can specify it with OMPI_TMPDIR in your environment, or "-mca orte_tmpdir_base " on your cmd line Wouldn't --tmpdir=... do the same with `mpirun` for way the latter you mentioned? -- Reuti On Sep 4, 2013, at 10:
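The relocation discussed in this message could be set up roughly as follows. This is only a sketch, assuming the generic OMPI_MCA_<name> environment form of the orte_tmpdir_base parameter quoted above; SCRATCH_DIR, the fallback path, and my_app are hypothetical placeholders, not names from the thread.

```shell
# Pick a directory on a filesystem that has free space.
# SCRATCH_DIR and the fallback path are hypothetical placeholders.
scratch="${SCRATCH_DIR:-${TMPDIR:-/tmp}/ompi_sessions}"
mkdir -p "$scratch"

# Environment form of the MCA parameter named in the message
# (assumption: any MCA parameter can be set as OMPI_MCA_<name>).
export OMPI_MCA_orte_tmpdir_base="$scratch"
echo "session directories will be created under: $OMPI_MCA_orte_tmpdir_base"

# Command-line form, left commented out because it needs a real executable:
# mpirun -mca orte_tmpdir_base "$scratch" -np 8 ./my_app
```

Setting the variable once in a job script affects every mpirun launched from it, whereas the command-line form applies to a single launch.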

[OMPI users] Changing directory from /tmp

2013-09-04 Thread Lee-Ping Wang
Hi there, On a few clusters I am running into an issue where a temporary directory cannot be created due to the root filesystem being full, causing mpirun to crash. Would it be possible to change the location where this directory is being created? [compute-109-4.local:12055] opal_os_dirpath_c

Re: [OMPI users] Error launching single-node tasks from multiple-node job.

2013-08-10 Thread Lee-Ping Wang
the command line argument: -machinefile /scratch/leeping/pbs_nodefile.$HOSTNAME This will leave the PBS_NODEFILE variable intact, and have the same net effect as your workflow. Anyway, congratulations for sorting things out and making it work! Gus Correa On Aug 10, 2013, at 7:40 PM, Lee-Ping

Re: [OMPI users] Error launching single-node tasks from multiple-node job.

2013-08-10 Thread Lee-Ping Wang
t-tm Regardless, you can always deselect Torque support at runtime. Just put the following in your environment: OMPI_MCA_ras=^tm That will tell ORTE to ignore the Torque allocation module and it should then look at the machinefile. On Aug 10, 2013, at 4:18 PM, "Lee-Ping Wang"
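In a job script, the environment setting suggested in this message might look like the sketch below. The machinefile path and my_app are hypothetical placeholders, and the launch line is commented out so the fragment runs without Open MPI installed.

```shell
# Tell ORTE to skip the Torque (tm) allocation module, as suggested above.
# The quotes keep the shell from treating ^ specially in any shell dialect.
export OMPI_MCA_ras='^tm'
echo "OMPI_MCA_ras=$OMPI_MCA_ras"

# Hypothetical launch that would then read the machinefile instead of the
# Torque allocation (path and executable are placeholders):
# mpirun -machinefile ./my_machinefile -np 16 ./my_app
```

Because the value is exported, it applies to every mpirun started later in the same script.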

Re: [OMPI users] Error launching single-node tasks from multiple-node job.

2013-08-10 Thread Lee-Ping Wang
PBS environment variables: echo $PBS_JOBID echo $PBS_NODEFILE ls -l $PBS_NODEFILE cat $PBS_NODEFILE cat $PBS_JOBID [this one should fail, because that is not a file, but may work if the PBS variables were messed up along the way] I hope this helps, Gus Correa On Aug 10, 2013, at 6:39 PM, Lee-Ping Wang w
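The checks listed in this message could be bundled into a small helper for a job script. This is a minimal sketch: check_pbs_env is a hypothetical name, and it assumes only that PBS_JOBID and PBS_NODEFILE are set by the batch system as described above.

```shell
# Report whether the PBS variables look sane inside a job script.
# check_pbs_env is a hypothetical helper name, not part of Torque or Open MPI.
check_pbs_env() {
    if [ -z "${PBS_JOBID:-}" ]; then
        echo "PBS_JOBID is not set"
        return 1
    fi
    if [ ! -f "${PBS_NODEFILE:-}" ]; then
        echo "PBS_NODEFILE does not point to a regular file"
        return 1
    fi
    # Count distinct hosts; the node file lists one line per allocated slot.
    echo "job $PBS_JOBID: $(sort -u "$PBS_NODEFILE" | wc -l) unique node(s)"
    return 0
}

# Run the check once; "|| true" keeps a failure from aborting the script.
check_pbs_env || true
```

Calling it at the top of the job script, before any mpirun, shows early whether the variables were altered along the way.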

Re: [OMPI users] Error launching single-node tasks from multiple-node job.

2013-08-10 Thread Lee-Ping Wang
riginal Message- From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Lee-Ping Wang Sent: Saturday, August 10, 2013 3:07 PM To: 'Open MPI Users' Subject: Re: [OMPI users] Error launching single-node tasks from multiple-node job. Hi Gus, I tried your suggestions. Here is the co

Re: [OMPI users] Error launching single-node tasks from multiple-node job.

2013-08-10 Thread Lee-Ping Wang
- From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Lee-Ping Wang Sent: Saturday, August 10, 2013 12:51 PM To: 'Open MPI Users' Subject: Re: [OMPI users] Error launching single-node tasks from multiple-node job. Hi Gus, Thank you. You gave me many helpful suggestions, whi

Re: [OMPI users] Error launching single-node tasks from multiple-node job.

2013-08-10 Thread Lee-Ping Wang
-boun...@open-mpi.org] On Behalf Of Gustavo Correa Sent: Saturday, August 10, 2013 12:39 PM To: Open MPI Users Subject: Re: [OMPI users] Error launching single-node tasks from multiple-node job. Hi Lee-Ping On Aug 10, 2013, at 3:15 PM, Lee-Ping Wang wrote: > Hi Gus, > > Thank you for your

Re: [OMPI users] Error launching single-node tasks from multiple-node job.

2013-08-10 Thread Lee-Ping Wang
_NODEFILE -np 16 ./my-Q-chem-executable I hope this helps, Gus Correa On Aug 10, 2013, at 1:51 PM, Lee-Ping Wang wrote: > Hi there, > > Recently, I've begun some calculations on a cluster where I submit a multiple node job to the Torque batch system, and the job executes mult

[OMPI users] Error launching single-node tasks from multiple-node job.

2013-08-10 Thread Lee-Ping Wang
behavior so it always thinks it's running on a single node, regardless of the type of job I submit to the batch system? Thank you, - Lee-Ping Wang (Postdoc in Dept. of Chemistry, Stanford University) [compute-1-1.local:10910] [[42010,0],0] ORTE_ERROR_LOG: File