On Jan 30, 2009, at 4:54 PM, Dirk Eddelbuettel wrote:
| > where things end in the loop over opal_list() elements. I still see a
| > fprintf() statement just before
| >
| > if (MCA_SUCCESS == component->mca_register_component_params()) {
| >
| > in the middle of the open_components function in the file
| > mca_base_components_open.c
|
| Do you know if component is non-NULL and has a sensible value (i.e.,
| pointing to a valid component)?
Do not. Everything (in particular below /etc/openmpi/) is at default values
with the sole exception of
# edd 18 Dec 2008
mca_component_show_load_errors = 0
Could that kill it? [ Goes off and tests... ] No, still dies with segfault
in open_components.
FWIW: mca_component_show_load_errors should only affect conditional
output of some warning messages.
| Does ompi_info work? (ompi_info uses this exact same code to find/
| open components) If ompi_info fails, you should be able to attach a
| debugger to that, since it's a serial and [relatively] straightforward
| app.
Yes, ompi_info happily runs and returns around 111 lines. It seems to loop
over around 25 mca components.
Open MPI is otherwise healthy and happy. It's just that Rmpi does not get
along with Open MPI 1.3 .... but this happens to be my personal use-case :-/
Quite puzzling. This portion of the code has already successfully
opened the components and is looping over a list of the components
that were found. It *sounds* like that list has somehow gotten
corrupted.
Is there any way you can check that the values of component and
component->mca_register_component_params are non-NULL / valid?
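One way to do that check is to guard the call and report what was found. The following is only a minimal sketch: sketch_component_t, sketch_register_ok and sketch_component_is_callable are hypothetical stand-ins, not the real Open MPI types or functions, and only the one field discussed here is modeled. Note that a non-NULL pointer can still point at garbage, so this catches only the simplest form of corruption.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical stand-in for the real component struct; only the
 * function pointer discussed in this thread is modeled. */
typedef struct sketch_component {
    int (*mca_register_component_params)(void);
} sketch_component_t;

/* Hypothetical registration function used to build a valid example. */
static int sketch_register_ok(void)
{
    return 0;
}

/* Returns 1 if both the component pointer and its registration
 * function pointer are non-NULL, 0 otherwise.  Diagnostics go to
 * stderr so they are not lost in buffered stdout on a crash. */
static int sketch_component_is_callable(const sketch_component_t *component)
{
    fprintf(stderr, "component = %p\n", (const void *) component);
    if (NULL == component) {
        fprintf(stderr, "component is NULL\n");
        return 0;
    }
    if (NULL == component->mca_register_component_params) {
        fprintf(stderr, "mca_register_component_params is NULL\n");
        return 0;
    }
    return 1;
}
```

Calling something like this just before the failing line (or inspecting the same two pointers in a debugger) would show whether the segfault comes from a NULL pointer or from a non-NULL but invalid one.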
FWIW, component should be a pointer to the struct that we use to
represent plugins; it's a member of the list element from the list of
found components. Here's some code from right above the problematic
line:
    for (item = opal_list_get_first(src);
         opal_list_get_end(src) != item;
         item = opal_list_get_next(item)) {
        cli = (mca_base_component_list_item_t *) item;
        component = cli->cli_component;
So you might want to examine cli as well and ensure that it has
sensible values (the casting trick that we do is fairly common in the
OMPI code base -- the list item is the first data member of the
mca_base_component_list_item_t, so we can cast to/from it as required).
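To illustrate the casting trick in isolation, here is a small self-contained sketch. All demo_* names are hypothetical and much simpler than the real OPAL types; in particular, this toy list is NULL-terminated rather than using opal_list_get_end(). The only point is that a struct whose first member is the list item can be safely cast to and from a pointer to that item.

```c
#include <stddef.h>

/* Hypothetical, simplified list node (not the real opal_list_item_t). */
typedef struct demo_list_item {
    struct demo_list_item *next;
} demo_list_item_t;

/* Hypothetical stand-in for a component description. */
typedef struct demo_component {
    const char *name;
} demo_component_t;

/* The list node is the FIRST member, so a pointer to this struct and
 * a pointer to its 'super' member refer to the same address. */
typedef struct demo_component_list_item {
    demo_list_item_t super;
    const demo_component_t *cli_component;
} demo_component_list_item_t;

/* Walk the list and count the entries whose component pointer is
 * non-NULL; entries with a NULL component are what you would look
 * for when hunting a corrupted list. */
static int demo_count_valid(demo_list_item_t *head)
{
    int valid = 0;
    for (demo_list_item_t *item = head; NULL != item; item = item->next) {
        /* Cast from the generic node back to the enclosing struct. */
        demo_component_list_item_t *cli =
            (demo_component_list_item_t *) item;
        if (NULL != cli->cli_component) {
            ++valid;
        }
    }
    return valid;
}
```

If the list items were not allocated as the larger struct, or were freed or overwritten, the cast still compiles but cli->cli_component reads garbage, which would be consistent with a crash at the call through component.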
--
Jeff Squyres
Cisco Systems