Hi Reuti,

A simple hello_world program works without the h_vmem limit. Honestly, I am not familiar with Open MPI. The output of qconf -spl and qconf -sp ompi is below. But strangely, the job begins to work after I insert *unset SGE_ROOT* in my job script; I don't know why.
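One possible explanation (my assumption, not something confirmed in this thread): when SGE_ROOT and the usual job variables are set, an Open MPI built with Grid Engine support starts its daemons via "qrsh -inherit"; unsetting SGE_ROOT makes mpirun fall back to its ssh/rsh launcher, which would sidestep the failing qrsh task (at the cost of SGE no longer controlling the slave processes). A hedged way to check whether a build has the Grid Engine components:

```shell
# Hedged sketch: probe an Open MPI installation for Grid Engine support.
# If gridengine components are listed, mpirun launches daemons through
# `qrsh -inherit` whenever it detects an SGE environment (SGE_ROOT etc.).
if command -v ompi_info >/dev/null 2>&1; then
  ompi_info | grep -i gridengine || echo "no gridengine components in this build"
else
  echo "ompi_info not on PATH"   # Open MPI not installed on this machine
fi
```

If no gridengine components show up, tight integration never worked in the first place and unsetting SGE_ROOT should make no difference.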
However, it still does not run through the full 60 hours I set up. After running for about two hours, it stops without any error message. Is this related to the h_vmem limit?

$ qconf -spl
16per
1per
2per
4per
hadoop
make
ompi
openmp

$ qconf -sp ompi
pe_name            ompi
slots              9999
user_lists         NONE
xuser_lists        NONE
start_proc_args    /bin/true
stop_proc_args     /bin/true
allocation_rule    $fill_up
control_slaves     TRUE
job_is_first_task  FALSE
urgency_slots      min

SGE version: 6.1u6
Open MPI version: 1.2.9

*Job script updated:*

#$ -S /bin/bash
#$ -N couple
#$ -cwd
#$ -j y
#$ -R y
#$ -l h_rt=62:00:00
#$ -l h_vmem=2G
#$ -o couple.out
#$ -e couple.err
#$ -pe ompi* 8
unset SGE_ROOT
./app

Thanks,
Pengcheng

On Sun, Aug 24, 2014 at 1:00 PM, <users-requ...@open-mpi.org> wrote:

> Date: Sat, 23 Aug 2014 18:49:38 +0200
> From: Reuti <re...@staff.uni-marburg.de>
> To: Open MPI Users <us...@open-mpi.org>
> Subject: Re: [OMPI users] A daemon on node cl231 failed to start as expected
>
> Hi,
>
> Am 23.08.2014 um 16:09 schrieb Pengcheng Wang:
>
> > I need to run a single driver program that only requires one proc, with
> > the command mpirun -np 1 ./app or ./app. But it schedules the launch of
> > other executables, both parallel and sequential, so I need more than
> > one proc to run it. It runs smoothly as an interactive job with the
> > command below.
> >
> > qrsh -cwd -pe "ompi*" 6 -l h_rt=00:30:00,test=true ./app
> >
> > But after I submitted it as a batch job, a strange error occurred and it
> > stopped... Please find the job script and error message below:
> >
> > job submission script:
> > #$ -S /bin/bash
> > #$ -N couple
> > #$ -cwd
> > #$ -j y
> > #$ -l h_rt=05:00:00
> > #$ -l h_vmem=2G
>
> Is a simple hello_world program listing the threads working? Does it work
> without the h_vmem limit?
>
> > #$ -o couple.out
> > #$ -pe ompi* 6
>
> Which PEs can be addressed here? What are their allocation rules? (It looks
> like you need "$pe_slots".)
>
> What version of SGE?
> What version of Open MPI?
> Compiled with --with-sge?
>
> For me it's working either way.
>
> -- Reuti
>
> > ./app
> >
> > error message:
> > error: executing task of job 6777095 failed:
> > [cl231:23777] ERROR: A daemon on node cl231 failed to start as expected.
> > [cl231:23777] ERROR: There may be more information available from
> > [cl231:23777] ERROR: the 'qstat -t' command on the Grid Engine tasks.
> > [cl231:23777] ERROR: If the problem persists, please restart the
> > [cl231:23777] ERROR: Grid Engine PE job
> > [cl231:23777] ERROR: The daemon exited unexpectedly with status 1.
> >
> > Thanks for any help!
> > Pengcheng
>
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2014/08/25141.php
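On the job dying after about two hours with nothing in couple.out: that pattern is consistent with the h_vmem limit, since SGE typically enforces h_vmem via setrlimit and a process killed that way leaves no message of its own. A rough back-of-the-envelope sketch, under the assumption (common but configuration-dependent) that h_vmem is a per-slot request:

```shell
# Hedged sketch: with `-l h_vmem=2G` and `-pe ompi* 8`, a per-slot
# h_vmem request gives the job an 8 x 2G budget overall, but each
# individual process may still be capped at the per-slot value
# (depending on how the h_vmem complex is configured).
slots=8          # from `#$ -pe ompi* 8`
h_vmem_gb=2      # from `#$ -l h_vmem=2G`
total_gb=$((slots * h_vmem_gb))
echo "per-slot limit: ${h_vmem_gb}G, job-wide budget: ${total_gb}G"
```

If this is the cause, qacct -j on the finished job should show a maxvmem close to the limit and an abnormal exit, and raising h_vmem (or removing it as a test) should let the job run longer.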