Oh come on, Howard - before you go dumping more components into the system, 
let’s explore WHY he hit this problem.

Geez…

> On Mar 25, 2015, at 9:16 AM, Howard Pritchard <hpprit...@gmail.com> wrote:
> 
> Kind of working fine.  I don't like users having to add these kinds of
> specialized --mca settings just to get something to work.  Sounds like
> time for yet another Cray-specific component.
> 
> 
> 
> 2015-03-25 10:14 GMT-06:00 Ralph Castain <r...@open-mpi.org>:
> It’s working just fine, Howard - we found the problem.
> 
>> On Mar 25, 2015, at 9:12 AM, Howard Pritchard <hpprit...@gmail.com> wrote:
>> 
>> Mark,
>> 
>> If you're wanting to use the orte-submit feature, you will need to get 
>> mpirun working.
>> 
>> Could you rerun using the mpirun launch method but with
>> 
>> --mca oob_base_verbose 10 --mca ess_base_verbose 2
>> 
>> set?
>> 
>> 
>> Also, you may want to make sure you are using the ipogif0 eth device. This
>> can be controlled using the oob_tcp_if_include MCA parameter, e.g.
>> 
>> mpirun --mca oob_tcp_if_include ipogif0
>> 
>> I'm assuming your use case doesn't require connectivity between processes
>> running on the compute nodes and some external service in suggesting this
>> option.
>> 
>> 2015-03-25 8:14 GMT-06:00 Mark Santcroos <mark.santcr...@rutgers.edu>:
>> Hi Howard,
>> 
>> > On 25 Mar 2015, at 14:58, Howard Pritchard <hpprit...@gmail.com> wrote:
>> > How are you building ompi?
>> 
>> My configure is rather straightforward:
>> ./configure --prefix=$OMPI_PREFIX --disable-getpwuid
>> 
>> Maybe I got spoiled on Hopper/Edison and I need more explicit configuration 
>> on BW ...
>> 
>> > Also, what happens if you use aprun?
>> 
>> Not sure if you meant in combination with mpirun or not, so I'll provide 
>> both:
>> 
>> > aprun -n2 ./a.out
>> Hello from rank 1, thread 0, on nid16869. (core affinity = 0)
>> Hello from rank 0, thread 0, on nid16868. (core affinity = 0)
>> After sleep from rank 1, thread 0, on nid16869. (core affinity = 0)
>> After sleep from rank 0, thread 0, on nid16868. (core affinity = 0)
>> Application 23791589 resources: utime ~0s, stime ~2s, Rss ~27304, inblocks ~13229, outblocks ~66
>> 
>> > aprun -n2 mpirun ./a.out
>> apstat: error opening /ufs/alps_shared/reservations: No such file or directory
>> apstat: error opening /ufs/alps_shared/reservations: No such file or directory
>> [nid16868:17876] [[699,0],0] ORTE_ERROR_LOG: File open failure in file ../../../../../orte/mca/ras/tm/ras_tm_module.c at line 159
>> [nid16868:17876] [[699,0],0] ORTE_ERROR_LOG: File open failure in file ../../../../../orte/mca/ras/tm/ras_tm_module.c at line 85
>> [nid16868:17876] [[699,0],0] ORTE_ERROR_LOG: File open failure in file ../../../../orte/mca/ras/base/ras_base_allocate.c at line 190
>> [nid16869:17034] [[9344,0],0] ORTE_ERROR_LOG: File open failure in file ../../../../../orte/mca/ras/tm/ras_tm_module.c at line 159
>> [nid16869:17034] [[9344,0],0] ORTE_ERROR_LOG: File open failure in file ../../../../../orte/mca/ras/tm/ras_tm_module.c at line 85
>> [nid16869:17034] [[9344,0],0] ORTE_ERROR_LOG: File open failure in file ../../../../orte/mca/ras/base/ras_base_allocate.c at line 190
>> Application 23791590 exit codes: 1
>> Application 23791590 resources: utime ~0s, stime ~2s, Rss ~27304, inblocks ~9596, outblocks ~478
>> 
>> > I work with ompi on the NERSC Edison and Hopper daily.
>> 
>> I use Edison and Hopper too, and there it works for me indeed.
>> 
>> > Typically I use aprun, though.
>> 
>> I want to use orte-submit and friends, so I "explicitly" don't want to use 
>> aprun.
>> 
>> > You definitely don't need to use CCM,
>> > and shouldn't.
>> 
>> Depends on the use-case, but happy to leave that out of scope for now :-)
>> 
>> Thanks!
>> 
>> Mark
>> 
>> 
>> >
>> > On Mar 25, 2015 6:00 AM, "Mark Santcroos" <mark.santcr...@rutgers.edu> wrote:
>> > Hi,
>> >
>> > Any users of Open MPI on Blue Waters here?
>> > And then I specifically mean in "native" mode, not inside CCM.
>> >
>> > After configuring and building as I do on other Crays, mpirun gives me
>> > the following:
>> > [nid25263:31700] [[23896,0],0] ORTE_ERROR_LOG: Authentication failed in file ../../../../../orte/mca/oob/tcp/oob_tcp_connection.c at line 803
>> > [nid25263:31700] [[23896,0],0] ORTE_ERROR_LOG: Authentication failed in file ../../../../../orte/mca/oob/tcp/oob_tcp_connection.c at line 803
>> >
>> > Version is the latest and greatest from git.
>> >
>> > So I'm interested to hear whether people have been successful on Blue 
>> > Waters and/or whether the error rings a bell for people.
>> >
>> > Thanks!
>> >
>> > Mark
>> > _______________________________________________
>> > users mailing list
>> > us...@open-mpi.org
>> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>> > Link to this post: http://www.open-mpi.org/community/lists/users/2015/03/26505.php
>> 
> 
