Any tips? Anyone? :(
Ashika Umanga Umagiliya wrote:
One more modification: I do not call MPI_Finalize() from the
"libParallel.so" library.
Ashika Umanga Umagiliya wrote:
Greetings all,
After some reading, I found out that I have to build Open MPI with
"--enable-mpi-threads".
After that, I changed the MPI_Init() code in my "libParallel.so" and in
"parallel-svr" (please refer to http://i27.tinypic.com/mtqurp.jpg ) to:
int sup;
MPI_Init_thread(NULL,NULL,MPI_THREAD_MULTIPLE,&sup);
Now, when multiple requests come in (multiple threads), MPI gives one of
the following two errors:
"<stddiag rank="0">[umanga:06127] [[8004,1],0] ORTE_ERROR_LOG: Data
unpack would read past end of buffer in file dpm_orte.c at line
299</stddiag>
[umanga:6127] *** An error occurred in MPI_Comm_spawn
[umanga:6127] *** on communicator MPI_COMM_SELF
[umanga:6127] *** MPI_ERR_UNKNOWN: unknown error
[umanga:6127] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
[umanga:06126] [[8004,0],0]-[[8004,1],0] mca_oob_tcp_msg_recv: readv
failed: Connection reset by peer (104)
--------------------------------------------------------------------------
mpirun has exited due to process rank 0 with PID 6127 on
node umanga exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
"
or sometimes :
"[umanga:5477] *** An error occurred in MPI_Comm_spawn
[umanga:5477] *** on communicator MPI_COMM_SELF
[umanga:5477] *** MPI_ERR_UNKNOWN: unknown error
[umanga:5477] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
<stddiag rank="0">[umanga:05477] [[7630,1],0] ORTE_ERROR_LOG: Data
unpack would read past end of buffer in file dpm_orte.c at line
299</stddiag>
--------------------------------------------------------------------------
mpirun has exited due to process rank 0 with PID 5477 on
node umanga exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------"
Any tips?
Thank you
Ashika Umanga Umagiliya wrote:
Greetings all,
Please refer to image at:
http://i27.tinypic.com/mtqurp.jpg
Here is the process illustrated in the image:
1) The C++ webservice loads "libParallel.so" when it starts up
(via dlopen).
2) When a new request comes from a client, a *new thread* is created,
the SOAP data is bound to C++ objects, and the calcRisk() method of the
webservice is invoked. Inside this method, "calcRisk()" of "libParallel"
is invoked (using dlsym, etc.).
3) Inside "calcRisk()" of "libParallel", it spawns the "parallel-svr"
MPI application.
(I am using Boost.MPI and Boost.Serialization to send
custom data types across the spawned processes.)
4) "parallel-svr" (the MPI application in the image) executes the
parallel logic and sends the result back to "libParallel.so" using
Boost.MPI send, etc.
5) "libParallel.so" passes the result to the webservice, which binds it
into SOAP and sends it to the client; then the thread ends.
My problem is: everything works fine for the first request from the
client, but for the second request it throws an error (I assume from
"libParallel.so") saying:
"--------------------------------------------------------------------------
Calling any MPI-function after calling MPI_Finalize is erroneous.
The only exceptions are MPI_Initialized, MPI_Finalized and
MPI_Get_version.
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** after MPI was finalized
*** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
[umanga:19390] Abort after MPI_FINALIZE completed successfully; not
able to guarantee that all other processes were killed!"
Is this because of multithreading? Any idea how to fix this?
Thanks in advance,
umanga