On Jun 2, 2009, at 3:26 PM, Allen Barnett wrote:

>  Does OMPI say that it has IBV fork support?
> ompi_info --param btl openib --parsable | grep have_fork_support

My RHEL4 system reports:

MCA btl: parameter "btl_openib_want_fork_support" (current value: "-1")
MCA btl: information "btl_openib_have_fork_support" (value: "1")

as does the build installed on the Altix system.


Ok, good. Note, however, that OMPI reporting fork support only means that the installed verbs library supports it. It does *not* mean that the underlying kernel supports it.

> Be sure to also see http://www.open-mpi.org/faq/?category=openfabrics#ofa-fork

We're using OMPI 1.2.8.


Good.

> > Also, would MPI_COMM_SPAWN suffer from the same difficulties?
>
> It shouldn't; we proxy the launch of new commands off to mpirun /
> OMPI's run-time system.  Specifically: the new process(es) are not
> POSIX children of the process(es) that called MPI_COMM_SPAWN.

Is a program started with MPI_COMM_SPAWN required to call MPI_INIT?


Yes. OMPI v1.3 has an extension (a specific MPI_Info key) to indicate that the spawned program is not an MPI application, but I do not believe that that existed back in the 1.2 series.

I guess what I'm asking is: will I have to make my partitioner an OpenMPI program as well?



If you use MPI_COMM_SPAWN with the 1.2 series, yes.

Another less attractive but functional solution would be to do what I did for the new "command" notifier due in the OMPI v1.5 series. (The "notifier" is a subsystem that notifies external agents when OMPI detects something wrong, e.g., by writing to the syslog, sending an email, or writing to a sysadmin MySQL database; "command" is a plugin that simply forks and runs whatever command you want.) During MPI_INIT, the command notifier pre-forks a dummy process. This dummy process then waits for commands on a pipe. When the parent (the MPI process itself) wants to fork a child, it sends the argv to exec down the pipe, and the dummy process does the actual fork and exec on its behalf.

Proxying all fork requests through a secondary process like this avoids all the problems with registered memory in the child process. It is icky, but it is an unfortunate necessity for OS-bypass/registration-based networks like OpenFabrics.

In your case, you'd want to pre-fork before calling MPI_INIT. But the rest of the technique is pretty much the same.
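For illustration, here is a minimal C sketch of that pre-forked proxy, assuming a POSIX system. It is not the OMPI notifier code: the function names proxy_start(), proxy_run() and proxy_stop() and the pipe message format are hypothetical. Unlike the notifier, it also waits for each command and sends the exit status back on a second pipe, an assumption that fits running a partitioner to completion.

```c
/* Hypothetical fork proxy (names and wire format are illustrative):
 * proxy_start() forks a helper before MPI_Init(); proxy_run() sends it
 * argv as a uint32_t length plus NUL-terminated strings, and the helper
 * forks/execs the command and replies with its exit status (int32_t). */
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

static int cmd_fd = -1, status_fd = -1;   /* request / reply pipe ends */
static pid_t proxy_pid = -1;

/* Read or write exactly n bytes, retrying on EINTR; 0 on success. */
static int xfer(int fd, void *buf, size_t n, int writing) {
    char *p = buf;
    while (n > 0) {
        ssize_t r = writing ? write(fd, p, n) : read(fd, p, n);
        if (r < 0 && errno == EINTR) continue;
        if (r <= 0) return -1;            /* error, or EOF when reading */
        p += r;
        n -= (size_t)r;
    }
    return 0;
}

/* Helper body: serve requests until the request pipe reaches EOF. */
static void proxy_loop(int in, int out) {
    uint32_t len;
    while (xfer(in, &len, sizeof len, 0) == 0) {
        char *buf = malloc(len);
        if (buf == NULL || xfer(in, buf, len, 0) != 0) _exit(1);
        size_t argc = 0;                  /* one NUL per argument */
        for (uint32_t i = 0; i < len; i++) argc += (buf[i] == '\0');
        char **argv = calloc(argc + 1, sizeof *argv);  /* NULL-terminated */
        if (argv == NULL) _exit(1);
        char *s = buf;
        for (size_t i = 0; i < argc; i++) { argv[i] = s; s += strlen(s) + 1; }
        int32_t result = -1;              /* -1: did not exit normally */
        int st;
        pid_t pid = fork();               /* helper has no registered memory */
        if (pid == 0) {
            close(in); close(out);
            execvp(argv[0], argv);
            _exit(127);                   /* exec failed */
        }
        if (pid > 0 && waitpid(pid, &st, 0) == pid && WIFEXITED(st))
            result = WEXITSTATUS(st);
        free(argv); free(buf);
        if (xfer(out, &result, sizeof result, 1) != 0) _exit(1);
    }
    _exit(0);
}

/* Call once, before MPI_Init(). Returns 0 on success. */
int proxy_start(void) {
    int req[2], rep[2];
    if (pipe(req) != 0) return -1;
    if (pipe(rep) != 0) { close(req[0]); close(req[1]); return -1; }
    pid_t pid = fork();
    if (pid == 0) {                       /* helper process */
        close(req[1]); close(rep[0]);
        proxy_loop(req[0], rep[1]);       /* never returns */
    }
    close(req[0]); close(rep[1]);
    if (pid < 0) { close(req[1]); close(rep[0]); return -1; }
    cmd_fd = req[1]; status_fd = rep[0]; proxy_pid = pid;
    return 0;
}

/* Run argv via the helper; returns its exit status, or -1 on failure.
 * If the helper has died, the write raises SIGPIPE unless ignored. */
int proxy_run(char *const argv[]) {
    if (cmd_fd < 0 || argv == NULL || argv[0] == NULL) return -1;
    size_t len = 0;
    for (size_t i = 0; argv[i] != NULL; i++) len += strlen(argv[i]) + 1;
    char *buf = malloc(len), *p = buf;
    if (buf == NULL) return -1;
    for (size_t i = 0; argv[i] != NULL; i++) {
        size_t n = strlen(argv[i]) + 1;   /* copy string plus its NUL */
        memcpy(p, argv[i], n);
        p += n;
    }
    uint32_t len32 = (uint32_t)len;
    int32_t result = -1;
    int ok = xfer(cmd_fd, &len32, sizeof len32, 1) == 0 &&
             xfer(cmd_fd, buf, len, 1) == 0 &&
             xfer(status_fd, &result, sizeof result, 0) == 0;
    free(buf);
    return ok ? (int)result : -1;
}

/* Close the request pipe (EOF makes the helper exit) and reap it. */
void proxy_stop(void) {
    if (cmd_fd >= 0) close(cmd_fd);
    if (status_fd >= 0) close(status_fd);
    if (proxy_pid > 0) waitpid(proxy_pid, NULL, 0);
    cmd_fd = status_fd = proxy_pid = -1;
}
```

In an MPI application you would call proxy_start() at the top of main(), before MPI_Init(), and then call proxy_run() for the partitioner at any point afterwards, so the MPI process itself never calls fork(). Because the helper is forked before MPI_Init(), it holds no registered memory, and its own fork() calls are therefore safe.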

Have a look at the code in this tree if it helps:

    https://svn.open-mpi.org/trac/ompi/browser/trunk/orte/mca/notifier/command

--
Jeff Squyres
Cisco Systems
