Hugh --

We are actually unable to replicate the problem; we've run some single-threaded and multi-threaded apps with no problems. Unfortunately, this is probably symptomatic of bugs still remaining in the code. :-(

Can you try disabling MPI progress threads? (I believe that tcp may be the only BTL component that has async progress support implemented anyway; sm *may* have it, but I'd have to go back and check.) Leave MPI threads enabled (i.e., MPI_THREAD_MULTIPLE) and see if that gets you further.
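
A rebuild along those lines might look like the sketch below. The --enable-mpi-threads flag is the one from the original report; the install prefix is only a placeholder, and the sketch assumes that leaving out --enable-progress-threads is enough to keep progress threads off in this release.

```sh
# Sketch only: build with MPI thread support but without progress threads.
# The prefix is a hypothetical install location; substitute your own.
./configure --prefix=/opt/openmpi-1.0rc2_noprogress --enable-mpi-threads
make all install
```

Then rebuild the test program against that install and rerun it with the same mpirun command line as before.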



Hugh Merz wrote:
It's still only lightly tested.  I'm surprised that it totally hangs for
you, though -- what is your simple test program doing?


It just initializes MPI (tried both mpi_init and mpi_init_thread), prints a string, and exits. It works fine without thread support compiled into ompi.

It happens with any MPI program I try.

Attaching gdb to each thread of the executable gives:

(original process)
#0  0x420293d5 in sigsuspend () from /lib/i686/libc.so.6
#1  0x401e8609 in __pthread_wait_for_restart_signal () from /lib/i686/libpthread.so.0
#2  0x401e4eec in pthread_cond_wait () from /lib/i686/libpthread.so.0
#3  0x40bda418 in mca_oob_tcp_msg_wait () from /opt/openmpi-1.0rc2_asynch/lib/openmpi/mca_oob_tcp.so

(thread 1)
#0  0x420e01a7 in poll () from /lib/i686/libc.so.6
#1  0x401e5c30 in __pthread_manager () from /lib/i686/libpthread.so.0

(thread 2)
#0  0x420e01a7 in poll () from /lib/i686/libc.so.6
#1  0x4013268b in poll_dispatch () from /opt/openmpi-1.0rc2_asynch/lib/libopal.so.0
Cannot access memory at address 0x3e8

(thread 3)
#0  0x420dae14 in read () from /lib/i686/libc.so.6
#1  0x401f3b18 in __DTOR_END__ () from /lib/i686/libpthread.so.0
#2  0x40c8dfe3 in mca_btl_sm_component_event_thread ()
    from /opt/openmpi-1.0rc2_asynch/lib/openmpi/mca_btl_sm.so

And there are also 2 additional threads spawned by each of mpirun and orted.

Any clues or hints on how to debug this would be appreciated, but I understand that it is probably not high priority right now.

Thanks,

Hugh


Hugh Merz wrote:

Howdy,

  I tried installing the release candidate with thread support
enabled ( --enable-mpi-threads and --enable-progress-threads ) using an
old rh7.3 install and a recent fc4 install (Intel compilers). When I try
to run a simple test program, the executable, mpirun and orted all sleep
in what appears to be a deadlock.  If I compile ompi without threads
everything works fine.

  The FAQ states that thread support has only been lightly tested, and
there was only brief discussion about it on the mailing list 8 months ago.
Have there been any developments, and should I expect it to work properly?

Thanks,

Hugh
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


--
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/


