Daryl -
I'm unable to replicate your problem. I was testing on a Fedora Core
3 system with Clustermatic 5. Is it possible that you have a stray
DSO from a previous build in your installation path? How are you
running mpirun -- maybe I'm just not hitting the same code path you
are...
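For what it's worth, here is a rough Python sketch of one way to look for leftovers. It is only a heuristic I'm assuming would help, not anything Open MPI provides: it assumes the components are installed as mca_*.so files in one plugin directory (e.g. $prefix/lib/openmpi) and flags any .so whose timestamp trails the newest one there by more than an hour. The helper name find_stale_dsos and the one-hour threshold are made up for illustration.

```python
from pathlib import Path


def find_stale_dsos(plugin_dir, max_gap_seconds=3600.0):
    """Hypothetical helper: return the names of .so files in plugin_dir whose
    modification time trails the newest .so there by more than max_gap_seconds.

    Heuristic only: a plugin left behind by an earlier 'make install' is
    usually much older than the ones the latest install just wrote.
    """
    # Only regular files ending in ".so" (e.g. mca_*.so components).
    sos = [p for p in Path(plugin_dir).iterdir() if p.is_file() and p.suffix == ".so"]
    if not sos:
        return []
    mtimes = {p.name: p.stat().st_mtime for p in sos}
    newest = max(mtimes.values())
    # Anything much older than the newest plugin is a candidate leftover.
    return sorted(name for name, t in mtimes.items() if newest - t > max_gap_seconds)
```

Usage would be something like find_stale_dsos("/opt/OpenMPI/openmpi-1.0rc7/ib/lib/openmpi"), where the lib/openmpi part is my assumption about your layout. Anything it reports is worth deleting before reinstalling, though a clean rm -rf of the prefix followed by a fresh install is the more reliable fix.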
Thanks,
Brian
On Nov 17, 2005, at 8:17 AM, Daryl W. Grunau wrote:
Date: Tue, 15 Nov 2005 08:43:58 -0800
From: Jeff Squyres <jsquy...@open-mpi.org>
Subject: Re: [O-MPI users] OMPI 1.0 rc6 --with-bproc errors
To: Open MPI Users <us...@open-mpi.org>
Message-ID: <de7cd3a86b5a3e18ca88a83925c58...@open-mpi.org>
Daryl --
I don't think that anyone directly replied to you, but I saw some
commits fixing this yesterday (actually, they were already on the
trunk; we forgot to bring them over to the v1.0 branch). Could you
give this morning's nightly snapshot tarball a whirl?
On Nov 14, 2005, at 10:30 AM, Daryl W. Grunau wrote:
[[ snip ]]
Jeff, thanks for the reply. I was able to compile rc7, but now I'm
getting a core dump from orterun. Here's the gdb output:
bluesteel> gdb /opt/OpenMPI/openmpi-1.0rc7/ib/bin/orterun core.11247
GNU gdb Red Hat Linux (6.1post-1.20040607.43.0.1rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu"...Using host libthread_db library "/lib64/tls/libthread_db.so.1".
Core was generated by `orterun -H 200,201,202,203,204,205,206,207,208,209,210,211,212,213,214,215 -np'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /usr/lib64/libbproc.so.4...done.
Loaded symbols for /usr/lib64/libbproc.so.4
Reading symbols from /lib64/libdl.so.2...done.
Loaded symbols for /lib64/libdl.so.2
Reading symbols from /usr/lib64/libaio.so.1...done.
Loaded symbols for /usr/lib64/libaio.so.1
Reading symbols from /lib64/tls/libm.so.6...done.
Loaded symbols for /lib64/tls/libm.so.6
Reading symbols from /lib64/libutil.so.1...done.
Loaded symbols for /lib64/libutil.so.1
Reading symbols from /lib64/libnsl.so.1...done.
Loaded symbols for /lib64/libnsl.so.1
Reading symbols from /lib64/tls/libpthread.so.0...done.
Loaded symbols for /lib64/tls/libpthread.so.0
Reading symbols from /lib64/tls/libc.so.6...done.
Loaded symbols for /lib64/tls/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /lib64/libnss_files.so.2...done.
Loaded symbols for /lib64/libnss_files.so.2
#0 0x0000000000418de8 in orte_totalview_init_after_spawn (jobid=1)
at totalview.c:267
267 totalview.c: No such file or directory.
in totalview.c
(gdb) where
#0 0x0000000000418de8 in orte_totalview_init_after_spawn (jobid=1)
    at totalview.c:267
#1 0x0000000000417158 in job_state_callback (jobid=1, state=3 '\003')
    at orterun.c:582
#2 0x0000000000463c21 in orte_rmgr_urm_callback (data=0x7a9440,
    cbdata=Variable "cbdata" is not available.
) at rmgr_urm.c:253
#3 0x0000000000420d28 in orte_gpr_replica_deliver_notify_msg (msg=0x7a94a0)
    at gpr_replica_deliver_notify_msg_api.c:141
#4 0x00000000004269f1 in orte_gpr_replica_process_callbacks ()
    at gpr_replica_messaging_fn.c:78
#5 0x000000000042d7a5 in orte_gpr_replica_recv (status=Variable "status" is not available.
) at gpr_replica_recv_proxy_msgs.c:85
#6 0x0000000000451e59 in mca_oob_recv_callback (status=2326, peer=0x812f90,
    msg=0x758c80, count=Variable "count" is not available.
)
    at oob_base_recv_nb.c:159
#7 0x0000000000456308 in mca_oob_tcp_msg_recv_complete (msg=0x5e7210,
    peer=Variable "peer" is not available.
) at oob_tcp_msg.c:461
#8 0x0000000000457e9f in mca_oob_tcp_peer_recv_handler (sd=Variable "sd" is not available.
) at oob_tcp_peer.c:733
#9 0x000000000047795d in opal_event_loop (flags=1) at event.c:428
#10 0x000000000047ceb3 in opal_progress () at opal_progress.c:256
#11 0x0000000000416b45 in opal_condition_wait (c=0x5d0700, m=0x5d06c0)
    at condition.h:74
#12 0x000000000041687e in orterun (argc=6, argv=0x7ffffffff3c8)
    at orterun.c:384
#13 0x0000000000416223 in main (argc=6, argv=0x7ffffffff3c8) at main.c:13
I'm presently trying to build/run rc8 to see if it also has problems;
I'll report what I find.
Daryl
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users