Only the obvious, and not very helpful, one: comm_spawn isn't thread-safe
at this time. You'll need to serialize your requests to that function.
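Something along these lines is usually enough (a minimal sketch only, assuming
C++; the spawn_mutex/spawn_worker names and the arguments are illustrative,
and this serializes only the spawn call itself):

  #include <mpi.h>
  #include <mutex>

  static std::mutex spawn_mutex;   // shared by all request threads

  MPI_Comm spawn_worker(const char* worker_path, int nprocs)
  {
      // Only one thread may be inside MPI_Comm_spawn at a time.
      std::lock_guard<std::mutex> lock(spawn_mutex);
      MPI_Comm intercomm;
      MPI_Comm_spawn(const_cast<char*>(worker_path), MPI_ARGV_NULL, nprocs,
                     MPI_INFO_NULL, 0, MPI_COMM_SELF, &intercomm,
                     MPI_ERRCODES_IGNORE);
      return intercomm;
  }

Each request thread would call spawn_worker() instead of MPI_Comm_spawn
directly; whether the rest of your per-request MPI traffic is safe still
depends on the thread level MPI actually grants you.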
I believe the thread-safety constraints within OMPI are discussed to
some extent on the FAQ site. At the very least, they have been discussed in
some depth on this mailing list several times. There might be some further
nuggets of advice on workarounds in there.
On Sep 16, 2009, at 7:37 PM, Ashika Umanga Umagiliya wrote:
Any tips ? Anyone ? :(
Ashika Umanga Umagiliya wrote:
One more modification: I do not call MPI_Finalize() from the
"libParallel.so" library.
Ashika Umanga Umagiliya wrote:
Greetings all,
After some reading, I found out that I have to build Open MPI
using "--enable-mpi-threads".
After that, I changed the MPI_Init() code in my "libParallel.so" and
in "parallel-svr" (please refer to http://i27.tinypic.com/mtqurp.jpg) to:
int sup;
MPI_Init_thread(NULL, NULL, MPI_THREAD_MULTIPLE, &sup);  // ask for full thread support; the granted level is returned in sup
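A minimal sketch of also checking the level MPI actually granted (the
function name is illustrative; the init happens once in the hosting process):

  #include <mpi.h>
  #include <cstdio>

  void init_mpi_threaded()
  {
      int provided = MPI_THREAD_SINGLE;
      MPI_Init_thread(NULL, NULL, MPI_THREAD_MULTIPLE, &provided);
      // Open MPI may grant less than MPI_THREAD_MULTIPLE if it was not
      // built with thread support, so check before using MPI from threads.
      if (provided < MPI_THREAD_MULTIPLE) {
          std::fprintf(stderr, "MPI granted thread level %d only\n", provided);
          MPI_Abort(MPI_COMM_WORLD, 1);
      }
  }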
Now when multiple requests come in (multiple threads), MPI gives one of
the following two errors:
"<stddiag rank="0">[umanga:06127] [[8004,1],0] ORTE_ERROR_LOG:
Data unpack would read past end of buffer in file dpm_orte.c at
line 299</stddiag>
[umanga:6127] *** An error occurred in MPI_Comm_spawn
[umanga:6127] *** on communicator MPI_COMM_SELF
[umanga:6127] *** MPI_ERR_UNKNOWN: unknown error
[umanga:6127] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
[umanga:06126] [[8004,0],0]-[[8004,1],0] mca_oob_tcp_msg_recv:
readv failed: Connection reset by peer (104)
--------------------------------------------------------------------------
mpirun has exited due to process rank 0 with PID 6127 on
node umanga exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
"
or sometimes:
"[umanga:5477] *** An error occurred in MPI_Comm_spawn
[umanga:5477] *** on communicator MPI_COMM_SELF
[umanga:5477] *** MPI_ERR_UNKNOWN: unknown error
[umanga:5477] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
<stddiag rank="0">[umanga:05477] [[7630,1],0] ORTE_ERROR_LOG: Data
unpack would read past end of buffer in file dpm_orte.c at line
299</stddiag>
--------------------------------------------------------------------------
mpirun has exited due to process rank 0 with PID 5477 on
node umanga exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------"
Any tips ?
Thank you
Ashika Umanga Umagiliya wrote:
Greetings all,
Please refer to image at:
http://i27.tinypic.com/mtqurp.jpg
Here is the process illustrated in the image:
1) The C++ webservice loads "libParallel.so" when it starts up
(via dlopen).
2) When a new request comes from a client, a *new thread* is
created, the SOAP data is bound to C++ objects, and the calcRisk() method
of the webservice is invoked. Inside this method, "calcRisk()" of
"libParallel" is invoked (using dlsym, etc.).
3) Inside "calcRisk()" of "libParallel" ,it spawns "parallel-svr"
MPI application.
(I am using boost MPI and boost serializarion to send custom-data-
types across spawned processes.)
4) "parallel-svr" (MPI Application in image) execute the parallel
logic and send the result back to "libParallel.so" using boost
MPI send..etc.
5) "libParallel.so" send the result to webservice,bind into SOAP
and sent result to client and the thread ends.
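For reference, a minimal sketch of how steps 3 and 4 fit together inside
calcRisk() (the RiskResult type, the tag, and the "./parallel-svr" path are
illustrative, not my actual code):

  #include <mpi.h>
  #include <boost/mpi.hpp>

  struct RiskResult {
      double value;
      template <class Archive>
      void serialize(Archive& ar, unsigned /*version*/) { ar & value; }
  };

  RiskResult calcRisk(int nworkers)
  {
      // Spawn the worker executable; the intercommunicator connects us to it.
      MPI_Comm spawned;
      char cmd[] = "./parallel-svr";
      MPI_Comm_spawn(cmd, MPI_ARGV_NULL, nworkers,
                     MPI_INFO_NULL, 0, MPI_COMM_SELF, &spawned,
                     MPI_ERRCODES_IGNORE);

      // Wrap it so Boost MPI can send/recv Boost-serializable types.
      boost::mpi::intercommunicator inter(spawned,
                                          boost::mpi::comm_take_ownership);

      RiskResult result;
      inter.recv(0, 0, result);   // worker rank 0 sends the result back
      return result;
  }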
My problem is: everything works fine for the first request from the
client, but for the second request it throws an error (I assume from
"libParallel.so") saying:
"--------------------------------------------------------------------------
Calling any MPI-function after calling MPI_Finalize is erroneous.
The only exceptions are MPI_Initialized, MPI_Finalized and
MPI_Get_version.
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** after MPI was finalized
*** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
[umanga:19390] Abort after MPI_FINALIZE completed successfully;
not able to guarantee that all other processes were killed!"
Is this because of multithreading ? Any idea how to fix this ?
Thanks in advance,
umanga