Hi Reuti,

A simple hello_world program works without the h_vmem limit. Honestly, I am
not familiar with Open MPI. The output of qconf -spl and qconf -sp ompi is
given below. Strangely, the job begins to work after I insert *unset
SGE_ROOT* in my job script. I don't know why.

However, it still does not run through the 60 hours I set up. After running
for about two hours, it stops without any error messages. Is this related
to the h_vmem limit?
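
One way to check whether the h_vmem or h_rt limit ended the job is to look at
the accounting record once the job has finished. A minimal sketch, assuming
that qacct -j <job_id> prints the usual failed, exit_status, maxvmem and
ru_wallclock fields; the helper name summarize_qacct is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: keep only the accounting fields that hint at a limit
# being hit. Reads `qacct -j <job_id>` output on stdin.
summarize_qacct() {
    awk '$1 == "failed" || $1 == "exit_status" || $1 == "maxvmem" || $1 == "ru_wallclock" {
        key = $1
        $1 = ""             # drop the field name, keep the rest of the line
        sub(/^ +/, "")      # trim the leading blank left behind
        print key "=" $0
    }'
}

# Usage on the cluster (the job ID is a placeholder):
#   qacct -j <job_id> | summarize_qacct
```

If maxvmem is close to the requested limit and the exit status indicates a
kill signal, the memory limit would be a plausible cause, but that is only a
guess until the accounting record is checked.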

*$ qconf -spl*
16per
1per
2per
4per
hadoop
make
ompi
openmp

*$ qconf -sp ompi*
pe_name           ompi
slots             9999
user_lists        NONE
xuser_lists       NONE
start_proc_args   /bin/true
stop_proc_args    /bin/true
allocation_rule   $fill_up
control_slaves    TRUE
job_is_first_task FALSE
urgency_slots     min
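
The allocation rule here is $fill_up, which may spread the requested slots
over several hosts, whereas $pe_slots keeps all of them on one host. A small
sketch for listing the rule of every PE; it uses only qconf -spl and qconf -sp
as shown above, and the helper name pe_rule is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: print the allocation_rule value from `qconf -sp <pe>`
# output supplied on stdin.
pe_rule() {
    awk '$1 == "allocation_rule" { print $2 }'
}

# On the cluster: show the rule for each PE reported by `qconf -spl`.
# Skipped silently on machines where qconf is not installed.
if command -v qconf >/dev/null 2>&1; then
    for pe in $(qconf -spl); do
        printf '%s\t%s\n' "$pe" "$(qconf -sp "$pe" | pe_rule)"
    done
fi
```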

SGE version: 6.1u6
Open MPI version: 1.2.9

*Job script updated:*
#$ -S /bin/bash
#$ -N couple
#$ -cwd
#$ -j y
#$ -R y
#$ -l h_rt=62:00:00
#$ -l h_vmem=2G
#$ -o couple.out
#$ -e couple.err
#$ -pe ompi* 8
*unset SGE_ROOT*
   ./app
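
To narrow down why the job stops after about two hours, a few diagnostic
lines could go into the script before ./app (and before unset SGE_ROOT). This
is a sketch only: it assumes the usual SGE job environment variables (JOB_ID,
NSLOTS, PE_HOSTFILE) and the shell's ulimit builtin; the function name
log_job_env is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: record what the job actually received from SGE.
log_job_env() {
    echo "job_id=${JOB_ID:-unset}"
    echo "nslots=${NSLOTS:-unset}"
    echo "sge_root=${SGE_ROOT:-unset}"
    # ulimit -v reports the virtual memory limit in kB (or "unlimited");
    # under SGE this should reflect h_vmem if the limit is enforced.
    echo "vmem_limit_kb=$(ulimit -v)"
    # The PE hostfile lists the granted hosts and the slots on each.
    if [ -n "${PE_HOSTFILE:-}" ] && [ -r "$PE_HOSTFILE" ]; then
        echo "granted hosts:"
        cat "$PE_HOSTFILE"
    fi
}

log_job_env
```

With -j y and -o couple.out, this output would end up in couple.out next to
the output of the application.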

Thanks,
Pengcheng

On Sun, Aug 24, 2014 at 1:00 PM, <users-requ...@open-mpi.org> wrote:

> Send users mailing list submissions to
>         us...@open-mpi.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
>         http://www.open-mpi.org/mailman/listinfo.cgi/users
> or, via email, send a message with subject or body 'help' to
>         users-requ...@open-mpi.org
>
> You can reach the person managing the list at
>         users-ow...@open-mpi.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of users digest..."
>
>
> Today's Topics:
>
>    1. Re: A daemon on node cl231 failed to start as expected (Reuti)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Sat, 23 Aug 2014 18:49:38 +0200
> From: Reuti <re...@staff.uni-marburg.de>
> To: Open MPI Users <us...@open-mpi.org>
> Subject: Re: [OMPI users] A daemon on node cl231 failed to start as
>         expected
> Message-ID:
>         <8f21a4d9-9e8d-4e20-9ae6-04a495a33...@staff.uni-marburg.de>
> Content-Type: text/plain; charset=windows-1252
>
> Hi,
>
> Am 23.08.2014 um 16:09 schrieb Pengcheng Wang:
>
> > I need to run a single driver program that only requires one proc with
> the command mpirun -np 1 ./app or ./app. But it will schedule the launch of
> other executable files including parallel and sequential computing. So I
> require more than one proc to run it. It can be run smoothly as an
> interactive job with the command below.
> >
> > qrsh -cwd -pe "ompi*" 6 -l h_rt=00:30:00,test=true ./app
> >
> > But after I submitted the job, a strange error occurred and it
> stopped... Please find the job script and error message below:
> >
> > job submission script:
> > #$ -S /bin/bash
> > #$ -N couple
> > #$ -cwd
> > #$ -j y
> > #$ -l h_rt=05:00:00
> > #$ -l h_vmem=2G
>
> Is a simple hello_world program listing the threads working? Does it work
> without the h_vmem limit?
>
>
> > #$ -o couple.out
> > #$ -pe ompi*  6
>
> Which PEs can be addressed here? What are their allocation rules (looks
> like you need "$pe_slots").
>
> What version of SGE?
> What version of Open MPI?
> Compiled with --with-sge?
>
> For me it's working in either way.
>
> -- Reuti
>
>
> >     ./app
> >
> > error message:
> > error: executing task of job 6777095 failed:
> > [cl231:23777] ERROR: A daemon on node cl231 failed to start as expected.
> > [cl231:23777] ERROR: There may be more information available from
> > [cl231:23777] ERROR: the 'qstat -t' command on the Grid Engine tasks.
> > [cl231:23777] ERROR: If the problem persists, please restart the
> > [cl231:23777] ERROR: Grid Engine PE job
> > [cl231:23777] ERROR: The daemon exited unexpectedly with status 1.
> >
> > Thanks for any help!
> >
> > Pengcheng
> > _______________________________________________
> > users mailing list
> > us...@open-mpi.org
> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> > Link to this post:
> http://www.open-mpi.org/community/lists/users/2014/08/25141.php
>
>
>
> ------------------------------
>
> End of users Digest, Vol 2966, Issue 1
> **************************************
