sometimes preferable to
> write the temporary files to a disk local to the node. QCLOCALSCR specifies
> this directory. The temporary files will be copied to QCSCRATCH at the end of
> the job, unless the job is terminated abnormally. In such cases
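For readers unfamiliar with those two variables, here is a minimal job-script sketch of how they fit together; the directory paths are placeholders for illustration, not taken from any real cluster.

```shell
# QCSCRATCH: shared filesystem where finished scratch files end up.
# QCLOCALSCR: node-local disk holding the temporary files while the job runs.
# Both paths below are illustrative placeholders.
export QCSCRATCH=/tmp/demo-shared-scratch
export QCLOCALSCR=/tmp/demo-local-scratch
mkdir -p "$QCSCRATCH" "$QCLOCALSCR"
```

With this setup, Q-Chem writes temporaries under $QCLOCALSCR and copies them to $QCSCRATCH when the job ends normally.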
Blue Waters support gets back to me with the fix. :)
Thanks,
- Lee-Ping
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Lee-Ping Wang
Sent: Tuesday, September 30, 2014 1:15 PM
To: Open MPI Users
Subject: Re: [OMPI users] General question about running single-node jobs
Hi Ralph,
Thanks. I'll add some print statements to the code and try to figure out
precisely where the failure is happening.
- Lee-Ping
On Sep 30, 2014, at 12:06 PM, Ralph Castain wrote:
>
> On Sep 30, 2014, at 11:19 AM, Lee-Ping Wang wrote:
>
>> Hi Ralph,
>>
first time it's executed on the BW compute node - it's only
subsequent executions that give the error messages.
Thanks,
- Lee-Ping
On Sep 30, 2014, at 11:05 AM, Ralph Castain wrote:
>
> On Sep 30, 2014, at 10:49 AM, Lee-Ping Wang wrote:
>
>> Hi Ralph,
>>
>>
> you are trying to open
> statically defined ports, which means that running the job again too soon
> could leave the OS thinking the socket is already busy. It takes awhile for
> the OS to release a socket resource.
>
>
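Ralph's point about lingering sockets can be checked directly on Linux; this sketch just counts sockets the kernel is still holding in TIME-WAIT (it assumes the `ss` tool from iproute2 is installed).

```shell
# Count TCP sockets still in TIME-WAIT. A nonzero count right after a run
# means the OS has not yet released those ports, and an immediate relaunch
# on the same statically defined ports can fail with "address already in use".
waiting=$(ss -tan 2>/dev/null | grep -c TIME-WAIT || true)
echo "sockets in TIME-WAIT: $waiting"
```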
> On Sep 29, 2014, at 5:49 PM, Lee-Ping Wang wrote:
>
>>
>> linking library issues. It looks like
>> someone has left incorrect configure logic in the system such that we always
>> attempt to build Infiniband-related code, but without linking against the
>> library.
>>
>> We'll have to try and track it down.
>>
>
Here's another data point that might be useful: The error message is much more
rare if I run my application on 4 cores instead of 8.
Thanks,
- Lee-Ping
On Sep 29, 2014, at 5:38 PM, Lee-Ping Wang wrote:
> Sorry for my last email - I think I spoke too quickly. I realized after
> r
really a novice to MPI - but it
seems like the initial execution of the program isn't freeing up some system
resource as it should. Is there something that needs to be corrected in the
code?
Thanks,
- Lee-Ping
On Sep 29, 2014, at 5:12 PM, Lee-Ping Wang wrote:
> Hi there,
>
Hi there,
My application uses MPI to run parallel jobs on a single node, so I have no
need of any support for communication between nodes. However, when I use
mpirun to launch my application I see strange errors such as:
-
runtime.
- Lee-Ping
On Sep 29, 2014, at 6:03 AM, Gustavo Correa wrote:
> Hi Lee-Ping
>
> Did you cleanup the old build, to start fresh?
>
> make distclean
> configure --disable-vt ...
> ...
>
> I hope this helps,
> Gus Correa
>
> On Sep 29, 2014, at 8:4
libmpi.so: undefined reference to `ibv_destroy_cq'
collect2: error: ld returned 1 exit status
Thanks,
- Lee-Ping
On Sep 29, 2014, at 5:27 AM, Lee-Ping Wang wrote:
> Hi there,
>
> I'm building OpenMPI 1.8.3 on a system where I explicitly don't want any of
> the BTL components
Hi there,
I'm building OpenMPI 1.8.3 on a system where I explicitly don't want any of the
BTL components (they tend to break my single node jobs).
./configure CC=gcc CXX=g++ F77=gfortran FC=gfortran
--prefix=$QC_EXT_LIBS/openmpi --enable-static --enable-mca-no-build=btl
Building gives me this error:
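As a sanity check after installing, ompi_info can report which MCA components actually got built; this is a sketch, with $QC_EXT_LIBS taken from the configure line above and the default path below being only an illustrative placeholder.

```shell
# List the BTL components compiled into the install; with
# --enable-mca-no-build=btl the grep should come back empty.
prefix="${QC_EXT_LIBS:-/opt/qchem-ext}/openmpi"
out=$("$prefix/bin/ompi_info" 2>/dev/null | grep "MCA btl" \
      || echo "no btl components built")
echo "$out"
```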
earlier in the 1.8 series, but I can't replicate it now with the nightly 1.8
tarball that is about to be released as 1.8.3.
http://www.open-mpi.org/nightly/v1.8/
On Sep 21, 2014, at 12:25 PM, Lee-Ping Wang wrote:
Hmm, I didn't know those were segfault reports. It could indeed be a
On Sep 21, 2014, at 10:02 AM, Lee-Ping Wang wrote:
My program isn't segfaulting - it's returning a non-zero status and then
exiting.
Thanks,
- Lee-Ping
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Ralph Castain
Sent: Sunday, September 21, 2014 8:5
Process is hanging
Just to be clear: is your program returning a non-zero status and then
exiting, or is it segfaulting?
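The distinction Ralph is drawing can be seen from any POSIX shell: a parent reports a signal death as status 128 + signal number, while an ordinary failure keeps its exit code. A quick sketch:

```shell
# An ordinary nonzero exit keeps its code...
sh -c 'exit 3'
plain=$?        # 3
# ...while a process killed by a signal is reported as 128 + signal number.
sh -c 'kill -11 $$'
signaled=$?     # 139 = 128 + 11 (SIGSEGV)
echo "$plain $signaled"
```

mpirun uses the same mechanism to decide whether a rank exited or crashed.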
On Sep 21, 2014, at 8:22 AM, Lee-Ping Wang wrote:
I'm using version 1.8.1.
Thanks,
- Lee-Ping
From: users [mailto:users-boun...@open-mpi.org] On
you are using?
On Sep 21, 2014, at 6:08 AM, Lee-Ping Wang wrote:
Hi there,
I'm running into an issue where mpirun isn't terminating when my executable
has a nonzero exit status - instead it's hanging indefinitely. I'm
attaching my process tree, the error message from the application, and the
messages printed to stderr. Please let me know what I can do.
wrote:
Hi,
On 04.09.2013 at 19:21, Ralph Castain wrote:
you can specify it with OMPI_TMPDIR in your environment, or
"-mca orte_tmpdir_base <dir>" on your cmd line
Wouldn't --tmpdir=... do the same with `mpirun` as the latter way you
mentioned?
-- Reuti
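Putting the options together in one job-script sketch (the path is an illustrative placeholder; OMPI_TMPDIR and -mca orte_tmpdir_base are the routes Ralph named, while --tmpdir is Reuti's suggested equivalent):

```shell
# Route Open MPI's session directory to a writable location via the
# environment...
export OMPI_TMPDIR=/tmp/demo-ompi-session
mkdir -p "$OMPI_TMPDIR"
# ...or equivalently on the mpirun command line (shown, not run):
#   mpirun -mca orte_tmpdir_base /tmp/demo-ompi-session -np 4 ./my_app
#   mpirun --tmpdir /tmp/demo-ompi-session -np 4 ./my_app   # Reuti's variant
```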
On Sep 4, 2013, at 10:
Hi there,
On a few clusters I am running into an issue where a temporary directory cannot
be created due to the root filesystem being full, causing mpirun to crash.
Would it be possible to change the location where this directory is being
created?
[compute-109-4.local:12055] opal_os_dirpath_c
the command line argument: -machinefile /scratch/leeping/pbs_nodefile.$HOSTNAME
This will leave the PBS_NODEFILE variable intact, and have the same net
effect as your workflow.
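Gus's copy-the-nodefile trick, sketched with a stand-in nodefile since no real PBS environment is present here:

```shell
# In a real job PBS sets $PBS_NODEFILE; a stand-in file is fabricated here
# purely to illustrate the trick.
PBS_NODEFILE=/tmp/demo-pbs-nodefile
printf 'node01\nnode01\nnode02\nnode02\n' > "$PBS_NODEFILE"
# Snapshot the nodefile (the real trick names the copy with .$HOSTNAME) and
# point mpirun at the copy, leaving $PBS_NODEFILE itself untouched:
cp "$PBS_NODEFILE" /tmp/demo-pbs-nodefile.copy
# mpirun -machinefile /tmp/demo-pbs-nodefile.copy -np 4 ./my_app   # not run
```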
Anyway, congratulations for sorting things out and making it work!
Gus Correa
On Aug 10, 2013, at 7:40 PM, Lee-Ping
--without-tm
Regardless, you can always deselect Torque support at runtime. Just put the
following in your environment:
OMPI_MCA_ras=^tm
That will tell ORTE to ignore the Torque allocation module and it should
then look at the machinefile.
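In a job script that choice looks like the following (machinefile path and executable are placeholders):

```shell
# Tell ORTE to skip the Torque (tm) allocation module; the leading ^ means
# "every ras component except tm", so mpirun falls back to the machinefile.
export OMPI_MCA_ras="^tm"
# mpirun -machinefile ./nodes.txt -np 8 ./my_app    # shown, not run
echo "$OMPI_MCA_ras"
```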
On Aug 10, 2013, at 4:18 PM, "Lee-Ping Wang"
PBS environment
variables:
echo $PBS_JOBID
echo $PBS_NODEFILE
ls -l $PBS_NODEFILE
cat $PBS_NODEFILE
cat $PBS_JOBID [this one should fail, because that is not a file, but may
work if the PBS variables were messed up along the way]
I hope this helps,
Gus Correa
On Aug 10, 2013, at 6:39 PM, Lee-Ping Wang w
-----Original Message-----
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Lee-Ping Wang
Sent: Saturday, August 10, 2013 3:07 PM
To: 'Open MPI Users'
Subject: Re: [OMPI users] Error launching single-node tasks from
multiple-node job.
Hi Gus,
I tried your suggestions. Here is the co
-----Original Message-----
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Lee-Ping Wang
Sent: Saturday, August 10, 2013 12:51 PM
To: 'Open MPI Users'
Subject: Re: [OMPI users] Error launching single-node tasks from
multiple-node job.
Hi Gus,
Thank you. You gave me many helpful suggestions, whi
-boun...@open-mpi.org] On Behalf Of Gustavo Correa
Sent: Saturday, August 10, 2013 12:39 PM
To: Open MPI Users
Subject: Re: [OMPI users] Error launching single-node tasks from
multiple-node job.
Hi Lee-Ping
On Aug 10, 2013, at 3:15 PM, Lee-Ping Wang wrote:
> Hi Gus,
>
> Thank you for your
$PBS_NODEFILE -np 16 ./my-Q-chem-executable
I hope this helps,
Gus Correa
On Aug 10, 2013, at 1:51 PM, Lee-Ping Wang wrote:
> Hi there,
>
> Recently, I've begun some calculations on a cluster where I submit a
multiple node job to the Torque batch system, and the job executes multiple
behavior so it always thinks it's running on a single node,
regardless of the type of job I submit to the batch system?
Thank you,
- Lee-Ping Wang (Postdoc in Dept. of Chemistry, Stanford
University)
[compute-1-1.local:10910] [[42010,0],0] ORTE_ERROR_LOG: File