On 11/17/2010 09:32 AM, Ralph Castain wrote:
Chris' output is coming solely from the HNP, which is correct given the way things were executed. My comment was based on another email in which he did what I asked, which was to include the flags:

--report-bindings --leave-session-attached

so we could see the output from each orted. In that email, it was clear that while mpirun was bound to multiple cores, the orteds are being bound to a -single- core.

Hence the problem.
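One way to check this directly is to have each launched process report which CPUs it, and the daemon that started it, may run on. The following is a hedged diagnostic sketch, not part of Open MPI, under several assumptions. It is Linux-only because it relies on os.sched_getaffinity. It assumes the parent of each rank is the local orted, or mpirun on the node where mpirun itself runs. It reads the OMPI_COMM_WORLD_RANK environment variable that Open MPI sets for launched processes and falls back to "?" when that variable is absent. The script name report_affinity.py is hypothetical.

```python
#!/usr/bin/env python3
"""Report the CPU binding of this process and of its parent (Linux only)."""
import os
import socket


def binding_report() -> str:
    # Open MPI exports OMPI_COMM_WORLD_RANK to the processes it launches;
    # use "?" when the variable is absent (e.g. when run outside mpirun).
    rank = os.environ.get("OMPI_COMM_WORLD_RANK", "?")
    # CPUs this process is currently allowed to run on.
    my_cpus = sorted(os.sched_getaffinity(0))
    parent = os.getppid()
    try:
        # Assumption: under mpirun the parent is the local orted
        # (or mpirun itself on the node where mpirun runs).
        parent_cpus = sorted(os.sched_getaffinity(parent))
    except OSError:
        # Parent already exited or cannot be inspected.
        parent_cpus = []
    return (f"{socket.gethostname()} rank={rank} pid={os.getpid()} "
            f"cpus={my_cpus} parent_pid={parent} parent_cpus={parent_cpus}")


if __name__ == "__main__":
    print(binding_report(), flush=True)
```

Inside the qsub script this could be launched as, for example, mpirun --report-bindings -bind-to-core python3 report_affinity.py. If every rank on a node and its parent daemon all show the same single CPU, that would be consistent with the single-core orted binding described above.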

Hmm, I see Ralph's comment on 11/15, but I don't see any output that shows what Ralph says above. The only --report-bindings output I see is from when he runs without OGE binding. Can someone give me the date and time of Chris' email with --report-bindings and --leave-session-attached? Alternatively, a rerun of the command below with the --leave-session-attached option would also help.

I find it confusing that --leave-session-attached is not required when the OGE binding argument is not given.

--td
HTH
Ralph


On Wed, Nov 17, 2010 at 6:57 AM, Terry Dontje <terry.don...@oracle.com <mailto:terry.don...@oracle.com>> wrote:

    On 11/17/2010 07:41 AM, Chris Jewell wrote:
    On 17 Nov 2010, at 11:56, Terry Dontje wrote:
    You are absolutely correct, Terry, and the 1.4 release series does include the proper code. The point here, though, is that SGE binds the orted to a single core, even though other cores are also allocated. So the orted detects an external binding of one core, and binds all its children to that same core.
    I do not think you are right here. Chris sent the following, which looks like OGE (fka SGE) actually did bind the HNP to multiple cores. However, I believe that message is not coming from the processes themselves; it is only shown by the HNP. I wonder whether, if Chris adds a "-bind-to-core" option, we'll see more output from the a.out's before they exec unterm?
    As requested, using

    $ qsub -pe mpi 8 -binding linear:2 myScript.com

    and

    $ mpirun -mca ras_gridengine_verbose 100 --report-bindings -by-core -bind-to-core ./unterm

    [exec5:06671] System has detected external process binding to cores 0028
    [exec5:06671] ras:gridengine: JOB_ID: 59434
    [exec5:06671] ras:gridengine: PE_HOSTFILE: /usr/sge/default/spool/exec5/active_jobs/59434.1/pe_hostfile
    [exec5:06671] ras:gridengine: exec5.cluster.stats.local: PE_HOSTFILE shows slots=2
    [exec5:06671] ras:gridengine: exec1.cluster.stats.local: PE_HOSTFILE shows slots=2
    [exec5:06671] ras:gridengine: exec4.cluster.stats.local: PE_HOSTFILE shows slots=1
    [exec5:06671] ras:gridengine: exec3.cluster.stats.local: PE_HOSTFILE shows slots=1
    [exec5:06671] ras:gridengine: exec2.cluster.stats.local: PE_HOSTFILE shows slots=1
    [exec5:06671] ras:gridengine: exec7.cluster.stats.local: PE_HOSTFILE shows slots=1

    No more info. I note that the external binding is slightly different to what I had before, but our cluster is busier today :-)
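The "0028" in the first log line appears to be a hexadecimal core mask. Assuming bit i of the mask stands for core i (an assumption; the thread does not spell out the encoding), a few lines of Python decode it. The helper name mask_to_cores is hypothetical.

```python
from typing import List


def mask_to_cores(mask_hex: str) -> List[int]:
    """Decode a hexadecimal core bitmask into a list of core indices.

    Assumption: bit i of the mask corresponds to core i.
    """
    mask = int(mask_hex, 16)
    # Keep every bit position that is set in the mask.
    return [i for i in range(mask.bit_length()) if (mask >> i) & 1]


if __name__ == "__main__":
    print(mask_to_cores("0028"))
```

Under that assumption, 0x0028 is binary 101000, so it names cores 3 and 5: two cores, which matches the two cores requested with -binding linear:2 for this node.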

    I would have expected more output.

    --td

    Chris


    --
    Dr Chris Jewell
    Department of Statistics
    University of Warwick
    Coventry
    CV4 7AL
    UK
    Tel: +44 (0)24 7615 0778






    _______________________________________________
    users mailing list
    us...@open-mpi.org  <mailto:us...@open-mpi.org>
    http://www.open-mpi.org/mailman/listinfo.cgi/users


-- Oracle
    Terry D. Dontje | Principal Software Engineer
    Developer Tools Engineering | +1.781.442.2631
    Oracle *- Performance Technologies*
    95 Network Drive, Burlington, MA 01803
    Email terry.don...@oracle.com <mailto:terry.don...@oracle.com>











