One more modification: I do not call MPI_Finalize() from the "libParallel.so" library.

Ashika Umanga Umagiliya wrote:
Greetings all,

After some reading, I found out that I have to build Open MPI using "--enable-mpi-threads". After that, I changed the MPI_Init() code in my "libParallel.so" and in "parallel-svr" (please refer to http://i27.tinypic.com/mtqurp.jpg ) to:

    int sup;  /* thread level actually provided by the library */
    MPI_Init_thread(NULL, NULL, MPI_THREAD_MULTIPLE, &sup);
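
As far as I understand, MPI_Init_thread() only *requests* MPI_THREAD_MULTIPLE and reports the level it can actually give back through the last argument, so it is probably worth checking that value before spawning from multiple threads. A minimal sketch (the wrapper name init_mpi_threads and the abort-on-failure behaviour are just for illustration, not from my real code):

    #include <mpi.h>
    #include <cstdio>

    /* Sketch: initialize MPI and verify the thread level actually provided. */
    void init_mpi_threads(int *argc, char ***argv)
    {
        int sup = MPI_THREAD_SINGLE;
        MPI_Init_thread(argc, argv, MPI_THREAD_MULTIPLE, &sup);
        if (sup < MPI_THREAD_MULTIPLE) {
            /* This Open MPI build cannot give full thread support. */
            std::fprintf(stderr, "MPI_THREAD_MULTIPLE not provided (got %d)\n", sup);
            MPI_Abort(MPI_COMM_WORLD, 1);
        }
    }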

Now when multiple requests come in (multiple threads), MPI gives the following two errors:

"<stddiag rank="0">[umanga:06127] [[8004,1],0] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file dpm_orte.c at line 299</stddiag>
[umanga:6127] *** An error occurred in MPI_Comm_spawn
[umanga:6127] *** on communicator MPI_COMM_SELF
[umanga:6127] *** MPI_ERR_UNKNOWN: unknown error
[umanga:6127] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
[umanga:06126] [[8004,0],0]-[[8004,1],0] mca_oob_tcp_msg_recv: readv failed: Connection reset by peer (104)
--------------------------------------------------------------------------
mpirun has exited due to process rank 0 with PID 6127 on
node umanga exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
"

or sometimes:

"[umanga:5477] *** An error occurred in MPI_Comm_spawn
[umanga:5477] *** on communicator MPI_COMM_SELF
[umanga:5477] *** MPI_ERR_UNKNOWN: unknown error
[umanga:5477] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
<stddiag rank="0">[umanga:05477] [[7630,1],0] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file dpm_orte.c at line 299</stddiag>
--------------------------------------------------------------------------
mpirun has exited due to process rank 0 with PID 5477 on
node umanga exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------"


Any tips?

Thank you

Ashika Umanga Umagiliya wrote:
Greetings all,

Please refer to the image at:
http://i27.tinypic.com/mtqurp.jpg

Here is the process illustrated in the image:

1) The C++ webservice loads "libParallel.so" when it starts up (dlopen).
2) When a new request comes from a client, a *new thread* is created, the SOAP data is bound to C++ objects, and the calcRisk() method of the webservice is invoked. Inside this method, "calcRisk()" of "libParallel" is invoked (using dlsym, etc.).
3) Inside "calcRisk()" of "libParallel", it spawns the "parallel-svr" MPI application. (I am using Boost.MPI and Boost.Serialization to send custom data types across the spawned processes.) A rough sketch of this spawn/exchange step follows the list.
4) "parallel-svr" (the MPI application in the image) executes the parallel logic and sends the result back to "libParallel.so" using Boost.MPI send, etc.
5) "libParallel.so" sends the result to the webservice, which binds it into SOAP and sends it to the client, and the thread ends.
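
Roughly, steps 3 and 4 on the libParallel side look like the following (a simplified sketch; the function name calc_risk_via_spawn, the message tags and the double result type are placeholders, not the exact ones from my sources). It spawns one parallel-svr process over MPI_COMM_SELF, wraps the resulting intercommunicator in a boost::mpi::communicator, and exchanges the request and result via Boost.Serialization:

    #include <mpi.h>
    #include <boost/mpi.hpp>
    #include <string>

    /* Hypothetical sketch of the spawn/exchange in calcRisk(); names are placeholders. */
    double calc_risk_via_spawn(const std::string &request)
    {
        char command[] = "parallel-svr";
        MPI_Comm children;

        /* Spawn one instance of parallel-svr and obtain an intercommunicator to it. */
        MPI_Comm_spawn(command, MPI_ARGV_NULL, 1, MPI_INFO_NULL,
                       0, MPI_COMM_SELF, &children, MPI_ERRCODES_IGNORE);

        /* Wrap the intercommunicator so Boost.MPI/Boost.Serialization can be used on it. */
        boost::mpi::communicator comm(children, boost::mpi::comm_take_ownership);

        comm.send(0, 0, request);        /* to rank 0 of the spawned group   */
        double result = 0.0;
        comm.recv(0, 1, result);         /* result computed by parallel-svr  */
        return result;
    }

    /* On the parallel-svr side the matching communicator would come from
       MPI_Comm_get_parent(), e.g.:
           MPI_Comm parent;
           MPI_Comm_get_parent(&parent);
           boost::mpi::communicator to_parent(parent, boost::mpi::comm_attach);  */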

My problem is :

Everything works fine for the first request from the client.
For the second request it throws an error (I assume from "libParallel.so") saying:

"--------------------------------------------------------------------------
Calling any MPI-function after calling MPI_Finalize is erroneous.
The only exceptions are MPI_Initialized, MPI_Finalized and MPI_Get_version.
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** after MPI was finalized
*** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
[umanga:19390] Abort after MPI_FINALIZE completed successfully; not able to guarantee that all other processes were killed!"


Is this because of multithreading? Any idea how to fix this?
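
The "MPI_Init after MPI was finalized" error makes me think MPI is being initialized and finalized once per request; once MPI_Finalize() has run, the process can never re-initialize MPI. One arrangement I am considering (just a sketch; the helper name ensure_mpi_initialized is made up) is to initialize at most once per process and keep MPI_Finalize() out of the request path entirely:

    #include <mpi.h>

    /* Sketch: initialize MPI at most once per process; never finalize per request. */
    bool ensure_mpi_initialized()
    {
        int finalized = 0, initialized = 0;
        MPI_Finalized(&finalized);
        if (finalized) {
            return false;   /* too late: MPI cannot be restarted in this process */
        }
        MPI_Initialized(&initialized);
        if (!initialized) {
            int provided = MPI_THREAD_SINGLE;
            MPI_Init_thread(NULL, NULL, MPI_THREAD_MULTIPLE, &provided);
        }
        return true;        /* MPI_Finalize() is left to process shutdown */
    }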

Thanks in advance,
umanga


