Re: [OMPI users] OpenMPI job initializing problem

Beichuan Yan Fri, 21 Mar 2014 00:00:27 -0400 (EDT)

Here is an example of my data measured in seconds:

communication overhead = commuT + migraT + print, compuT is computational cost, 
totalT = compuT + communication overhead, overhead% denotes percentage of 
communication overhead


intelmpi (walltime=00:03:51)
iter         [commuT          migraT              printT]              compuT   
         totalT          overhead%
3999   4.945993e-03   2.689362e-04   1.440048e-04   1.689100e-02   2.224994e-02 
  2.343795e+01
5999   4.938126e-03   1.451969e-04   2.689362e-04   1.663089e-02   2.198315e-02 
  2.312373e+01
7999   4.904985e-03   1.490116e-04   1.451969e-04   1.678491e-02   2.198410e-02 
  2.298933e+01
9999   4.915953e-03   1.380444e-04   1.490116e-04   1.687193e-02   2.207494e-02 
  2.289473e+01

openmpi (walltime=00:04:32)
iter          [commuT          migraT             printT]              compuT   
           totalT         overhead%
3999   3.574133e-03   1.139641e-04   1.089573e-04   1.598001e-02   1.977706e-02 
  1.864836e+01
5999   3.574848e-03   1.189709e-04   1.139641e-04   1.599526e-02   1.980305e-02 
  1.865278e+01
7999   3.571033e-03   1.168251e-04   1.189709e-04   1.601100e-02   1.981783e-02 
  1.860879e+01
9999   3.587008e-03   1.258850e-04   1.168251e-04   1.596618e-02   1.979589e-02 
  1.875587e+01

It can be seen that Open MPI is faster in both communication and computation 
measured by MPI_Wtime calls, but the wall time reported by PBS pro is larger.


-----Original Message-----
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Gus Correa
Sent: Thursday, March 20, 2014 15:08
To: Open MPI Users
Subject: Re: [OMPI users] OpenMPI job initializing problem

On 03/20/2014 04:48 PM, Beichuan Yan wrote:
> Ralph and Noam,
>
> Thanks for the clarifications, they are important.
I could be wrong in understanding the filesystem.
>
> Spirit appears to use a scratch directory for
shared memory backing which is mounted on Lustre, and does not seem to have 
local directories or does not allow user to change TEMPDIR. Here is the info:
> [compute node]$ stat -f -L -c %T /tmp
> tmpfs
> [compute node]$ stat -f -L -c %T /home/yanb/scratch lustre
>

So, /tmp is a tmpfs, in memory/RAM.
Maybe they don't open writing permissions for regular users on /tmp?

> On another university supercomputer, I found the following:
> node0448[~]$ stat -f -L -c %T /tmp
> ramfs
> node0448[~]$ stat -f -L -c %T /home/yanb/scratch/ lustre Is this /tmp
> at compute node a local directory? I don't know how to tell it.
>
> Thanks,
> Beichuan
>
>
>
> -----Original Message-----
> From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Ralph
> Castain
> Sent: Thursday, March 20, 2014 12:13
> To: Open MPI Users
> Subject: Re: [OMPI users] OpenMPI job initializing problem
>
>
> On Mar 20, 2014, at 9:48 AM, Beichuan Yan <beichuan....@colorado.edu> wrote:
>
>> Hi,
>>
>> Today I tested OMPI v1.7.5rc5 and surprisingly, it works like a charm!
>>
>> I found discussions related to this issue:
>>
>> 1. http://www.open-mpi.org/community/lists/users/2011/11/17688.php
>> The correct solution here is get your sys admin to make /tmp local. Making 
>> /tmp NFS mounted across multiple nodes is a major "faux pas" in the Linux 
>> world - it should never be done, for the reasons stated by Jeff.
>>
>> my comment: for most clusters I have used, /tmp is NOT local. Open MPI 
>> community may not enforce it.
>
> We don't enforce anything, but /tmp being network mounted is a VERY
> unusual situation in the cluster world, and highly unrecommended
>
>
>>
>> 2. http://www.open-mpi.org/community/lists/users/2011/11/17684.php
>> In the upcoming OMPI v1.7, we revamped the shared memory setup code such 
>> that it'll actually use /dev/shm properly, or use some other mechanism other 
>> than a mmap file backed in a real filesystem. So the issue goes away.
>>
>> my comment: up to OMPI v1.7.4, this shmem issue is still there. However, it 
>> is resolved in OMPI v1.7.5rc5. This is surprising.
>>
>> Anyway, OMPI v1.7.5rc5 works well for multi-processes-on-one-node (shmem) 
>> mode on Spirit. There is no need to tune TCP or IB parameters to use it. My 
>> code just runs well:
>>
>> My test data takes 20 minutes to run with OMPI v1.7.4, but needs less than 1 
>> minute with OMPI v1.7.5rc5. I don't know what the magic is. I am wondering 
>> when OMPI v1.7.5 final will be released.
>>
>> I will update performance comparison between Intel MPI and Open MPI.
>>
>> Thanks,
>> Beichuan
>>
>>
>>
>> -----Original Message-----
>> From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Gus
>> Correa
>> Sent: Friday, March 07, 2014 18:41
>> To: Open MPI Users
>> Subject: Re: [OMPI users] OpenMPI job initializing problem
>>
>> On 03/06/2014 04:52 PM, Beichuan Yan wrote:
>>> No, I did all these and none worked.
>>>
>>> I just found, with exact the same code, data and job settings, a job can 
>>> really run one day while cannot the other day. It is NOT repeatable. I 
>>> don't know what the problem is: hardware? OpenMPI? PBS Pro?
>>>
>>> Anyway, I may have to give up using OpenMPI on that system and switch to 
>>> IntelMPI which always work.
>>>
>>> Thanks,
>>> Beichuan
>>
>> Well, this machine may have been setup to run only Intel MPI (DAPL?) and SGI 
>> MPI.
>> It is a pity that it doesn't seem to work with OpenMPI.
>>
>> In any case, good luck with your research project.
>>
>> Gus Correa
>>
>>>
>>> -----Original Message-----
>>> From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Gus
>>> Correa
>>> Sent: Thursday, March 06, 2014 13:51
>>> To: Open MPI Users
>>> Subject: Re: [OMPI users] OpenMPI job initializing problem
>>>
>>> On 03/06/2014 03:35 PM, Beichuan Yan wrote:
>>>> Gus,
>>>>
>>>> Yes, 10.148.0.0/16 is the IB subnet.
>>>>
>>>> I did try others but none worked:
>>>> #export
>>>> TCP="--mca btl sm,openib"
>>>> No run, no output
>>>
>>> If I remember right, and unless this changed in recent OMPI vervsions, you 
>>> also need "self":
>>>
>>> -mca btl sm,openib,self
>>>
>>> Alternatively, you could rule out tcp:
>>>
>>> -mca btl ^tcp
>>>
>>>>
>>>> #export
>>>> TCP="--mca btl sm,openib --mca btl_tcp_if_include 10.148.0.0/16"
>>>> No run, no output
>>>>
>>>> Beichuan
>>>
>>> Likewise, "self" is missing here.
>>>
>>> Also, I don't know if you can ask for openib and also add --mca 
>>> btl_tcp_if_include 10.148.0.0/16.
>>> Note that one turns off tcp (I think), whereas the other requests a
>>> tcp interface (or that the IB interface with IPoIB functionality).
>>> That combination sounds weird to me.
>>> The OMPI developers may clarify if this is valid syntax/syntax combination.
>>>
>>> I would try simply -mca btl sm,openib,self, which is likely to give
>>> you the IB transport with verbs, plus shared memory intra-node, plus
>>> the
>>> (mandatory?) self (loopback interface?).
>>> In my experience, this will also help identify any malfunctioning IB HCA in 
>>> the nodes (with a failure/error message).
>>>
>>>
>>> I hope it helps,
>>> Gus Correa
>>>
>>>
>>>>
>>>> -----Original Message-----
>>>> From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Gus
>>>> Correa
>>>> Sent: Thursday, March 06, 2014 13:16
>>>> To: Open MPI Users
>>>> Subject: Re: [OMPI users] OpenMPI job initializing problem
>>>>
>>>> Hi Beichuan
>>>>
>>>> So, it looks like that now the program runs, even though with specific 
>>>> settings depending on whether you're using OMPI 1.6.5 or 1.7.4, right?
>>>>
>>>> It looks like the problem now is performance, right?
>>>>
>>>> System load affects performance, but unless the network is overwhelmed, or 
>>>> perhaps the Lustre file system is hanging or too slow, I would think that 
>>>> a walltime increase from 1min to 10min is not related to system load, but 
>>>> something else.
>>>>
>>>> Do you remember the setup that gave you 1min walltime?
>>>> Was it the same that you sent below?
>>>> Do you happen to know which nodes?
>>>> Are you sharing nodes with other jobs, or are you running alone on the 
>>>> nodes?
>>>> Sharing with other processes may slow down your job.
>>>> If you request all cores in the node, PBS should give you a full node 
>>>> (unless they tricked PBS to think the nodes have more cores than they 
>>>> actually do).
>>>> How do you request the nodes in your #PBS directives?
>>>> Do you request nodes and ppn, or do you request procs?
>>>>
>>>> I suggest that you do:
>>>> cat $PBS_NODEFILE
>>>> in your PBS script, just to document which nodes are actually given to you.
>>>>
>>>> Also helpful to document/troubleshoot is to add -v and -tag-output to your 
>>>> mpiexec command line.
>>>>
>>>>
>>>> The difference in walltime could be due to some malfunction of IB HCAs on 
>>>> the nodes, for instance.
>>>> Since you are allowing (if I remember right) the use of TCP, OpenMPI will 
>>>> try to use any interfaces that you did not rule out.
>>>> If your mpiexec command line doesn't make any restriction, it will use 
>>>> anything available, if I remember right.
>>>> (Jeff will correct me in the next second.) If your mpiexec command line 
>>>> has mca btl_tcp_if_include 10.148.0.0/16 it will use the 10.148.0.0/16 
>>>> subnet in with TCP transport, I think.
>>>> (Jeff will cut my list subscription after that one, for spreading
>>>> misinformation.)
>>>>
>>>> In either case my impression is that you may have left a door open to the 
>>>> use of non-IB (and non-IB-verbs) transport.
>>>>
>>>> Is 10.148.0.0/16 the an Infiniband subnet or an Ethernet subnet?
>>>>
>>>> Did you remeber Jeff's suggestion from a while ago to avoid TCP (over 
>>>> Ethernet or over IB), and stick to IB verbs?
>>>>
>>>>
>>>> Is 10.148.0.0/16 the IB or the Ethernet subnet?
>>>>
>>>> On 03/02/2014 02:38 PM, Jeff Squyres (jsquyres) wrote:
>>>>>   Both 1.6.x and 1.7.x/1.8.x will need verbs.h to use the native
>>>>> verbs  network stack.
>>>>>
>>>>>   You can use emulated TCP over IB (e.g., using the OMPI TCP BTL),
>>>>> but  it's nowhere near as fast/efficient the native verbs network stack.
>>>>>
>>>>
>>>>
>>>> You could force the use of IB verbs with
>>>>
>>>> -mca btl ^tcp
>>>>
>>>> or with
>>>>
>>>> -mca btl sm,openib,self
>>>>
>>>> on the mpiexec command line.
>>>>
>>>> In this case, if any of the IB HCAs on the nodes is bad, the job
>>>> will abort with an error message, instead of running too slow (if
>>>> it is using other networks).
>>>>
>>>> There are also ways to tell OMPI to do a more verbose output, that
>>>> may perhaps help diagnose the problem.
>>>> ompi_info | grep verbose
>>>> may give some hints (I confess I don't remember them).
>>>>
>>>>
>>>> Believe me, this did happen to me, i.e., to run MPI programs in a
>>>> cluster that had all sorts of non-homogeneous nodes, some with
>>>> faulty IB HCAs, some with incomplete OFED installation, some that
>>>> were not mounting shared file systems properly, etc.
>>>> [I didn't administer that one!]
>>>> Hopefully that is not the problem you are facing, but verbose
>>>> output may help anyways.
>>>>
>>>> I hope this helps,
>>>> Gus Correa
>>>>
>>>>
>>>>
>>>> On 03/06/2014 01:49 PM, Beichuan Yan wrote:
>>>>> 1. For $TMPDIR and $TCP, there are four combinations by commenting on/off 
>>>>> (note the system's default TMPDIR=/work3/yanb):
>>>>> export TMPDIR=/work1/home/yanb/tmp TCP="--mca btl_tcp_if_include
>>>>> 10.148.0.0/16"
>>>>>
>>>>> 2. I tested the 4 combinations for OpenMPI 1.6.5 and OpenMPI 1.7.4 
>>>>> respectively for the pure-MPI mode (no OPENMP threads; 8 nodes, each node 
>>>>> runs 16 processes). The results are weird: of all 8 cases, only TWO of 
>>>>> them can run, but run so slow:
>>>>>
>>>>> OpenMPI 1.6.5:
>>>>> export TMPDIR=/work1/home/yanb/tmp TCP="--mca btl_tcp_if_include
>>>>> 10.148.0.0/16"
>>>>> Warning: shared-memory, /work1/home/yanb/tmp/ Run, take 10
>>>>> minutes, slow
>>>>>
>>>>> OpenMPI 1.7.4:
>>>>> #export TMPDIR=/work1/home/yanb/tmp #TCP="--mca btl_tcp_if_include
>>>>> 10.148.0.0/16"
>>>>> Warning: shared-memory /work3/yanb/605832.SPIRIT/ Run, take 10
>>>>> minutess, slow
>>>>>
>>>>> So you see, a) openmpi 1.6.5 and 1.7.4 need different settings to
>>>>> run;
>>>> b) whether specifying TMPDIR, I got the shared memory warning.
>>>>>
>>>>> 3. But a few days ago, OpenMPI 1.6.5 worked great and took only 1
>>>>> minute
>>>> (now it takes 10 minutes). I am so confused by the results.
>>>> Does the system loading level or fluctuation or PBS pro affect
>>>> OpenMPI performance?
>>>>>
>>>>> Thanks,
>>>>> Beichuan
>>>>>
>>>>> -----Original Message-----
>>>>> From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Gus
>>>>> Correa
>>>>> Sent: Tuesday, March 04, 2014 08:48
>>>>> To: Open MPI Users
>>>>> Subject: Re: [OMPI users] OpenMPI job initializing problem
>>>>>
>>>>> Hi Beichuan
>>>>>
>>>>> So, from "df" it looks like /home is /work1, right?
>>>>>
>>>>> Also, "mount" shows only /work[1-4], not the other
>>>>> 7 CWFS panfs (Panasas?), which apparently are not available in the 
>>>>> compute nodes/blades.
>>>>>
>>>>> I presume you have access and are using only some of the
>>>>> /work[1-4]
>>>>> (lustre) file systems for all your MPI and other software installation, 
>>>>> right? Not the panfs, right?
>>>>>
>>>>> Awkward that it doesn't work, because lustre is supposed to be a parallel 
>>>>> file system, highly available to all nodes (assuming it is mounted on all 
>>>>> nodes).
>>>>>
>>>>> It also shows a small /tmp with a tmpfs file system, which is volatile, 
>>>>> in memory:
>>>>>
>>>>> http://en.wikipedia.org/wiki/Tmpfs
>>>>>
>>>>> I would guess they don't let you write there, so TMPDIR=/tmp may not be a 
>>>>> possible option, but this is just a wild guess.
>>>>> Or maybe OMPI requires an actual non-volatile file system to write its 
>>>>> shared memory auxiliary files and other stuff that normally goes on /tmp? 
>>>>>  [Jeff, Ralph, help!!] I kind of remember some old discussion on this 
>>>>> list about this, but maybe it was in another list.
>>>>>
>>>>> [You could ask the sys admin about this, and perhaps what he
>>>>> recommends to use to replace /tmp.]
>>>>>
>>>>> Just in case they may have some file system mount point mixup, you could 
>>>>> try perhaps TMPDIR=/work1/yanb/tmp (rather than /home) You could also try 
>>>>> TMPDIR=/work3/yanb/tmp, as if I remember right this is another file 
>>>>> system you have access to (not sure anymore, it may have been in the 
>>>>> previous emails).
>>>>> Either way, you may need to create the tmp directory beforehand.
>>>>>
>>>>> **
>>>>>
>>>>> Any chances that this is an environment mixup?
>>>>>
>>>>> Say, that you may be inadvertently using the SGI-MPI mpiexec Using a 
>>>>> /full/path/to/mpiexec in your job may clarify this.
>>>>>
>>>>> "which mpiexec" will tell, but since the environment on the compute nodes 
>>>>> may not be exactly the same as in the login node, it may not be reliable 
>>>>> information.
>>>>>
>>>>> Or perhaps you may not be pointing to the OMPI libraries?
>>>>> Are you exporting PATH and LD_LIBRARY_PATH on .bashrc/.tcshrc, with the 
>>>>> OMPI items (bin and lib) *PREPENDED* (not appended), so as to take 
>>>>> precedence over other possible/SGI/pre-existent MPI items?
>>>>>
>>>>> Those are pretty (ugly) common problems.
>>>>>
>>>>> **
>>>>>
>>>>> I hope this helps,
>>>>> Gus Correa
>>>>>
>>>>> On 03/03/2014 10:13 PM, Beichuan Yan wrote:
>>>>>> 1. info from a compute node
>>>>>> -bash-4.1$ hostname
>>>>>> r32i1n1
>>>>>> -bash-4.1$ df -h /home
>>>>>> Filesystem            Size  Used Avail Use% Mounted on
>>>>>> 10.148.18.45@o2ib:10.148.18.46@o2ib:/fs1
>>>>>>                           1.2P  136T  1.1P  12% /work1 -bash-4.1$
>>>>>> mount devpts on /dev/pts type devpts (rw,gid=5,mode=620) tmpfs on
>>>>>> /tmp type tmpfs (rw,size=150m) none on /proc/sys/fs/binfmt_misc
>>>>>> type binfmt_misc
>>>>>> (rw) cpuset on /dev/cpuset type cpuset (rw)
>>>>>> 10.148.18.45@o2ib:10.148.18.46@o2ib:/fs1 on /work1 type lustre
>>>>>> (rw,flock)
>>>>>> 10.148.18.76@o2ib:10.148.18.164@o2ib:/fs2 on /work2 type lustre
>>>>>> (rw,flock)
>>>>>> 10.148.18.104@o2ib:10.148.18.165@o2ib:/fs3 on /work3 type lustre
>>>>>> (rw,flock)
>>>>>> 10.148.18.132@o2ib:10.148.18.133@o2ib:/fs4 on /work4 type lustre
>>>>>> (rw,flock)
>>>>>>
>>>>>>
>>>>>> 2. For "export TMPDIR=/home/yanb/tmp", I created it beforehand, and I 
>>>>>> did see mpi-related temporary files there when the job gets started.
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Gus
>>>>>> Correa
>>>>>> Sent: Monday, March 03, 2014 18:23
>>>>>> To: Open MPI Users
>>>>>> Subject: Re: [OMPI users] OpenMPI job initializing problem
>>>>>>
>>>>>> Hi Beichuan
>>>>>>
>>>>>> OK, it says "unclassified.html", so I presume it is not a problem.
>>>>>>
>>>>>> The web site says the computer is an SGI ICE X.
>>>>>> I am not familiar to it, so what follows are guesses.
>>>>>>
>>>>>> The SGI site brochure suggests that the nodes/blades have local disks:
>>>>>> https://www.sgi.com/pdfs/4330.pdf
>>>>>>
>>>>>> The file systems prefixed with IP addresses (work[1-4]) and with panfs 
>>>>>> (cwfs and CWFS[1-6]) and a colon (:) are shared exports (not local), but 
>>>>>> not necessarily NFS (panfs may be Panasas?).
>>>>>>      From this output it is hard to tell where /home is, but I would 
>>>>>> guess it is also shared (not local).
>>>>>> Maybe "df -h /home" will tell.  Or perhaps "mount".
>>>>>>
>>>>>> You may be logged in to a login/service node, so although it does have a 
>>>>>> /tmp (your ls / shows tmp), this doesn't guarantee that the compute 
>>>>>> nodes/blades also do.
>>>>>>
>>>>>> Since your jobs failed when you specified TMPDIR=/tmp, I would guess 
>>>>>> /tmp doesn't exist on the nodes/blades, or is not writable.
>>>>>>
>>>>>> Did you try to submit a job with, say, "mpiexec -np 16 ls -ld /tmp"?
>>>>>> This should tell if /tmp exists on the nodes, if it is writable.
>>>>>>
>>>>>> A stupid question:
>>>>>> When you tried your job with this:
>>>>>>
>>>>>> export TMPDIR=/home/yanb/tmp
>>>>>>
>>>>>> Did you create the directory /home/yanb/tmp beforehand?
>>>>>>
>>>>>> Anyway, you may need to ask the help of a system administrator of this 
>>>>>> machine.
>>>>>>
>>>>>> Gus Correa
>>>>>>
>>>>>> On 03/03/2014 07:43 PM, Beichuan Yan wrote:
>>>>>>> Gus,
>>>>>>>
>>>>>>> I am using this system: 
>>>>>>> http://centers.hpc.mil/systems/unclassified.html#Spirit. I don't know 
>>>>>>> exactly configurations of the file system. Here is the output of "df 
>>>>>>> -h":
>>>>>>> Filesystem            Size  Used Avail Use% Mounted on
>>>>>>> /dev/sda6             919G   16G  857G   2% /
>>>>>>> tmpfs                  32G     0   32G   0% /dev/shm
>>>>>>> /dev/sda5             139M   33M  100M  25% /boot
>>>>>>> adfs3v-s:/adfs3/hafs14
>>>>>>>                            6.5T  678G  5.5T  11% /scratch
>>>>>>> adfs3v-s:/adfs3/hafs16
>>>>>>>                            6.5T  678G  5.5T  11% /var/spool/mail
>>>>>>> 10.148.18.45@o2ib:10.148.18.46@o2ib:/fs1
>>>>>>>                            1.2P  136T  1.1P  12% /work1
>>>>>>> 10.148.18.132@o2ib:10.148.18.133@o2ib:/fs4
>>>>>>>                            1.2P  793T  368T  69% /work4
>>>>>>> 10.148.18.104@o2ib:10.148.18.165@o2ib:/fs3
>>>>>>>                            1.2P  509T  652T  44% /work3
>>>>>>> 10.148.18.76@o2ib:10.148.18.164@o2ib:/fs2
>>>>>>>                            1.2P  521T  640T  45% /work2
>>>>>>> panfs://172.16.0.10/CWFS
>>>>>>>                            728T  286T  443T  40% /p/cwfs
>>>>>>> panfs://172.16.1.61/CWFS1
>>>>>>>                            728T  286T  443T  40% /p/CWFS1
>>>>>>> panfs://172.16.0.210/CWFS2
>>>>>>>                            728T  286T  443T  40% /p/CWFS2
>>>>>>> panfs://172.16.1.125/CWFS3
>>>>>>>                            728T  286T  443T  40% /p/CWFS3
>>>>>>> panfs://172.16.1.224/CWFS4
>>>>>>>                            728T  286T  443T  40% /p/CWFS4
>>>>>>> panfs://172.16.1.224/CWFS5
>>>>>>>                            728T  286T  443T  40% /p/CWFS5
>>>>>>> panfs://172.16.1.224/CWFS6
>>>>>>>                            728T  286T  443T  40% /p/CWFS6
>>>>>>> panfs://172.16.1.224/CWFS7
>>>>>>>                            728T  286T  443T  40% /p/CWFS7
>>>>>>>
>>>>>>> 1. My home directory is /home/yanb.
>>>>>>> My simulation files are located at /work3/yanb.
>>>>>>> The default TMPDIR set by system is just /work3/yanb
>>>>>>>
>>>>>>> 2. I did try not to set TMPDIR and let it default, which is just case 1 
>>>>>>> and case 2.
>>>>>>>        Case1: #export TMPDIR=/home/yanb/tmp
>>>>>>>                  TCP="--mca btl_tcp_if_include 10.148.0.0/16"
>>>>>>>           It gives no apparent reason.
>>>>>>>        Case2: #export TMPDIR=/home/yanb/tmp
>>>>>>>                  #TCP="--mca btl_tcp_if_include 10.148.0.0/16"
>>>>>>>           It gives warning of shared memory file on network file system.
>>>>>>>
>>>>>>> 3. With "export TMPDIR=/tmp", the job gives the same, no apparent 
>>>>>>> reason.
>>>>>>>
>>>>>>> 4. FYI, "ls /" gives:
>>>>>>> ELT    apps  cgroup  hafs1   hafs12  hafs2  hafs5  hafs8        home   
>>>>>>> lost+found  mnt  p      root     selinux  tftpboot  var    work3
>>>>>>> admin  bin   dev     hafs10  hafs13  hafs3  hafs6  hafs9        lib    
>>>>>>> media       net  panfs  sbin     srv      tmp       work1  work4
>>>>>>> app    boot  etc     hafs11  hafs15  hafs4  hafs7  hafs_x86_64  lib64  
>>>>>>> misc        opt  proc   scratch  sys      usr       work2  workspace
>>>>>>>
>>>>>>> Beichuan
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Gus
>>>>>>> Correa
>>>>>>> Sent: Monday, March 03, 2014 17:24
>>>>>>> To: Open MPI Users
>>>>>>> Subject: Re: [OMPI users] OpenMPI job initializing problem
>>>>>>>
>>>>>>> Hi Beichuan
>>>>>>>
>>>>>>> If you are using the university cluster, chances are that /home is not 
>>>>>>> local, but on an NFS share, or perhaps Lustre (which you may have 
>>>>>>> mentioned before, I don't remember).
>>>>>>>
>>>>>>> Maybe "df -h" will show what is local what is not.
>>>>>>> It works for NFS, it prefixes file systems with the server name, but I 
>>>>>>> don't know about Lustre.
>>>>>>>
>>>>>>> Did you try just not to set TMPDIR and let it default?
>>>>>>> If the default TMPDIR is on Lustre (did you say this?, anyway I
>>>>>>> don't
>>>>>>> remember) you could perhaps try to force it to /tmp:
>>>>>>> export TMPDIR=/tmp,
>>>>>>> If the cluster nodes are diskfull /tmp is likely to exist and be local 
>>>>>>> to the cluster nodes.
>>>>>>> [But the cluster nodes may be diskless ... :( ]
>>>>>>>
>>>>>>> I hope this helps,
>>>>>>> Gus Correa
>>>>>>>
>>>>>>> On 03/03/2014 07:10 PM, Beichuan Yan wrote:
>>>>>>>> How to set TMPDIR to a local filesystem? Is /home/yanb/tmp a local 
>>>>>>>> filesystem? I don't know how to tell a directory is local file system 
>>>>>>>> or network file system.
>>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: users [mailto:users-boun...@open-mpi.org] On Behalf Of
>>>>>>>> Jeff Squyres (jsquyres)
>>>>>>>> Sent: Monday, March 03, 2014 16:57
>>>>>>>> To: Open MPI Users
>>>>>>>> Subject: Re: [OMPI users] OpenMPI job initializing problem
>>>>>>>>
>>>>>>>> How about setting TMPDIR to a local filesystem?
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mar 3, 2014, at 3:43 PM, Beichuan Yan<beichuan....@colorado.edu>    
>>>>>>>>    wrote:
>>>>>>>>
>>>>>>>>> I agree there are two cases for pure-MPI mode: 1. Job fails with no 
>>>>>>>>> apparent reason;  2 job complains shared-memory file on network file 
>>>>>>>>> system, which can be resolved by " export TMPDIR=/home/yanb/tmp", 
>>>>>>>>> /home/yanb/tmp is my local directory. The default TMPDIR points to a 
>>>>>>>>> Lustre directory.
>>>>>>>>>
>>>>>>>>> There is no any other output. I checked my job with "qstat -n" and 
>>>>>>>>> found that processes were actually not started on compute nodes even 
>>>>>>>>> though PBS Pro has "started" my job.
>>>>>>>>>
>>>>>>>>> Beichuan
>>>>>>>>>
>>>>>>>>>> 3. Then I test pure-MPI mode: OPENMP is turned off, and each compute 
>>>>>>>>>> node runs 16 processes (clearly shared-memory of MPI is used). Four 
>>>>>>>>>> combinations of "TMPDIR" and "TCP" are tested:
>>>>>>>>>> case 1:
>>>>>>>>>> #export TMPDIR=/home/yanb/tmp TCP="--mca btl_tcp_if_include
>>>>>>>>>> 10.148.0.0/16"
>>>>>>>>>> mpirun $TCP -np 64 -npernode 16 -hostfile $PBS_NODEFILE
>>>>>>>>>> ./paraEllip3d input.txt
>>>>>>>>>> output:
>>>>>>>>>> Start Prologue v2.5 Mon Mar  3 15:47:16 EST 2014 End Prologue
>>>>>>>>>> v2.5 Mon Mar  3 15:47:16 EST 2014
>>>>>>>>>> -bash: line 1: 448597 Terminated              
>>>>>>>>>> /var/spool/PBS/mom_priv/jobs/602244.service12.SC
>>>>>>>>>> Start Epilogue v2.5 Mon Mar  3 15:50:51 EST 2014 Statistics
>>>>>>>>>> cpupercent=0,cput=00:00:00,mem=7028kb,ncpus=128,vmem=495768kb
>>>>>>>>>> ,
>>>>>>>>>> w
>>>>>>>>>> all
>>>>>>>>>> t
>>>>>>>>>> i
>>>>>>>>>> m
>>>>>>>>>> e
>>>>>>>>>> =00:03:24 End Epilogue v2.5 Mon Mar  3 15:50:52 EST 2014
>>>>>>>>>
>>>>>>>>> It looks like you have two general cases:
>>>>>>>>>
>>>>>>>>> 1. The job fails for no apparent reason (like above), or 2.
>>>>>>>>> The job complains that your TMPDIR is on a shared filesystem
>>>>>>>>>
>>>>>>>>> Right?
>>>>>>>>>
>>>>>>>>> I think the real issue, then, is to figure out why your jobs are 
>>>>>>>>> failing with no output.
>>>>>>>>>
>>>>>>>>> Is there anything in the stderr output?
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Jeff Squyres
>>>>>>>>> jsquy...@cisco.com
>>>>>>>>> For corporate legal information go to:
>>>>>>>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> users mailing list
>>>>>>>>> us...@open-mpi.org
>>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>>>>> _______________________________________________
>>>>>>>>> users mailing list
>>>>>>>>> us...@open-mpi.org
>>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Jeff Squyres
>>>>>>>> jsquy...@cisco.com
>>>>>>>> For corporate legal information go to:
>>>>>>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> users mailing list
>>>>>>>> us...@open-mpi.org
>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>>>> _______________________________________________
>>>>>>>> users mailing list
>>>>>>>> us...@open-mpi.org
>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> users mailing list
>>>>>>> us...@open-mpi.org
>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>>> _______________________________________________
>>>>>>> users mailing list
>>>>>>> us...@open-mpi.org
>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>>
>>>>>> _______________________________________________
>>>>>> users mailing list
>>>>>> us...@open-mpi.org
>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>> _______________________________________________
>>>>>> users mailing list
>>>>>> us...@open-mpi.org
>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>
>>>>> _______________________________________________
>>>>> users mailing list
>>>>> us...@open-mpi.org
>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>> _______________________________________________
>>>>> users mailing list
>>>>> us...@open-mpi.org
>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>
>>>> _______________________________________________
>>>> users mailing list
>>>> us...@open-mpi.org
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>> _______________________________________________
>>>> users mailing list
>>>> us...@open-mpi.org
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>
>>> _______________________________________________
>>> users mailing list
>>> us...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>> _______________________________________________
>>> users mailing list
>>> us...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>

_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users

Re: [OMPI users] OpenMPI job initializing problem

Reply via email to