Hi,

I would just like to know whether this functionality (a prefix field in the
hostfile, if I understand correctly) has been integrated in 1.2.4?
Thanks for your answer.

------- On Mar 22, 2007, at 10:38 AM, Ralph Castain wrote:

We had a nice chat about this on the OpenRTE telecon this morning. The
question of what to do with multiple prefixes has been a long-running issue,
most recently captured in trac bug report #497. The problem is that the prefix
is intended to tell us where to find the ORTE/OMPI executables, and
therefore is associated with a node - not an app_context. What we haven't
been able to define is an appropriate notation that a user can exploit to
tell us the association.

This issue has arisen on several occasions, in two situations: (a) users have
heterogeneous clusters with a common file system, so the prefix must be
adjusted on each *type* of node to point to the correct type of binary; and
(b) for whatever reason, typically on rsh/ssh clusters, users have installed
the binaries in different locations on some of the nodes. In this latter
case, the reports have been from homogeneous clusters, so the *type* of
binary was never the issue - it just wasn't located where we expected.

Sun's solution is (I believe) what most of us would expect - they locate
their executables in the same relative location on all their nodes. The
binary in that location is correct for that local architecture. This
requires, though, that the "prefix" location not be on a common file system.

Unfortunately, that isn't the case with LANL's Roadrunner, nor can we expect
that everyone will follow that sensible approach :-). So we need a notation
to support the "exception" case where someone truly needs to specify the
prefix for individual node(s).

We discussed a number of options, including auto-detecting the local arch
and appending it to the specified "prefix" and several others. After
discussing them, those of us on the call decided that adding a field to the
hostfile that specifies the prefix to use on that host would be the best
solution. This could be done on a cluster-level basis, so - although it is
annoying to create the data file - at least it would only have to be done
once.
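
To make the idea concrete, here is a minimal Python sketch of how a per-host
prefix field might be read. Everything in it is an assumption for
illustration only: the "prefix=" keyword, the sample host names and paths,
and the parse_hostfile helper are hypothetical, and none of it reflects the
actual ORTE hostfile parser. Only the "hostname slots=N" layout follows the
existing hostfile convention.

```python
# Hypothetical hostfile syntax: "prefix=" is NOT an existing keyword; it
# stands in for the per-host field proposed in this thread.
SAMPLE_HOSTFILE = """\
# hostname  slots    install prefix for this host (hypothetical field)
node01 slots=4 prefix=/opt/openmpi/x86_64
node02 slots=4 prefix=/opt/openmpi/ppc64
node03 slots=2
"""


def parse_hostfile(text, default_prefix):
    """Return {hostname: prefix}; hosts without a prefix get default_prefix."""
    prefixes = {}
    for raw in text.splitlines():
        # Drop trailing comments and skip blank lines.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        host, *fields = line.split()
        # Collect key=value fields such as slots=4 or prefix=/some/path.
        options = dict(f.split("=", 1) for f in fields if "=" in f)
        prefixes[host] = options.get("prefix", default_prefix)
    return prefixes


if __name__ == "__main__":
    for host, prefix in parse_hostfile(SAMPLE_HOSTFILE, "/usr/local").items():
        print(host, prefix)
```

In this sketch, a host with no prefix field simply falls back to a single
default, so only the "exception" nodes would need the extra entry.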

Again, this is the exception case, so requiring a little inconvenience seems
a reasonable thing to do.

Anyone have heartburn and/or other suggestions? If not, we might start to
play with this next week. We would have to do some small modifications to
the RAS, RMAPS, and PLS components to ensure that any multi-prefix info gets
correctly propagated and used across all platforms for consistent behavior.

Ralph
