Hi Ralph,

Besides the items in the other mail, I have three more items that would need 
resolving at some point.

1. STDOUT/STDERR currently go to the orte-dvm console.
   I'm sure this is not a fundamental limitation.
   Even if forwarding the output to the orte-submit instance proves
   problematic, having orte-dvm write it to a file per session would be
   good enough.

2. Failing applications currently tear down the DVM.
   Ideally that would not be the case, and a failure would instead be
   handled together with item (3).
   Possibly this needs to be configurable, in case others would like to see
   different behaviour.

3. orte-submit doesn't return the exit code of the application.
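For item (3), the behaviour I have in mind is what a plain local launcher
does. Below is a minimal C sketch, assuming a simple fork/exec model rather
than the DVM's actual client/server path. The helper names
`status_to_exit_code` and `run_and_report` are hypothetical, not ORTE
functions.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: map a waitpid() status to a shell-style exit code.
 * Normal exit -> the child's own code; killed by signal N -> 128 + N
 * (the usual shell convention). */
static int status_to_exit_code(int status)
{
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    if (WIFSIGNALED(status))
        return 128 + WTERMSIG(status);
    return 1; /* anything else: report a generic failure */
}

/* Hypothetical launcher: run argv[0] with its arguments, wait for it,
 * and return its exit code so the caller can pass it on to exit(). */
static int run_and_report(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return 1;
    }
    if (pid == 0) {
        /* child: replace ourselves with the application */
        execvp(argv[0], argv);
        perror("execvp");
        _exit(127); /* command not found / not executable */
    }

    int status = 0;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR) { /* retry only if interrupted by a signal */
            perror("waitpid");
            return 1;
        }
    }
    return status_to_exit_code(status);
}
```

A `main()` would then simply `return run_and_report(argv + 1);`. In the DVM
case the raw status would presumably first have to be sent from orte-dvm back
to orte-submit; only the final mapping step would look like this.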

To be clear, I realise the current implementation is a proof of concept, so
these are not complaints, just wishes for where I hope to see this going!

FWIW: these items might require less intricate knowledge of OMPI in general, so 
with some pointers/guidance I can probably work on these myself if needed.

Cheers,

Mark 

PS. I did a quick-and-dirty integration with our own tool, and the ORTE
abstraction maps like a charm!
(https://github.com/radical-cybertools/radical.pilot/commit/2d36e886081bf8531097edfc95ada1826257e460)

> On 03 Feb 2015, at 20:38 , Mark Santcroos <mark.santcr...@rutgers.edu> wrote:
> 
> Hi Ralph,
> 
>> On 03 Feb 2015, at 16:28 , Ralph Castain <r...@open-mpi.org> wrote:
>> I think I fixed some of the handshake issues - please give it another try.
>> You should see orte-submit properly shut down upon completion,
> 
> Indeed, it works on my laptop now! Great!
> It feels quite fast too, for short tasks :-)
> 
>> and orte-dvm properly shut down when sent the terminate cmd.
> 
> ACK. This also works as expected.
> 
>> I was able to cleanly run MPI jobs on my laptop.
> 
> Do you also see the following errors/warnings on the dvm side?
> 
> [netbook:28324] [[20896,0],0] Releasing job data for [INVALID]
> Hello, world, I am 0 of 1, (Open MPI v1.9a1, package: Open MPI mark@netbook Distribution, ident: 1.9.0a1, repo rev: dev-811-g7299cc3, Unreleased developer copy, 132)
> [netbook:28324] sess_dir_finalize: proc session dir does not exist
> [netbook:28324] [[20896,0],0] dvm: job [20896,20] has completed
> [netbook:28324] [[20896,0],0] Releasing job data for [20896,20]
> 
> The "INVALID" message appears for every "submit"; the sess_dir_finalize
> message appears once per instance/core.
> Is that something to worry about that needs fixing, or is it a
> configuration issue?
> 
> I haven't been able to test on Edison because of maintenance 
> (today+tomorrow), so I will report on that later.
> 
> Thanks again!
> 
> Mark
