We'll follow up on this GitHub issue. Alexander -- thanks for the bug report. If you'd like to follow the progress of this issue, comment on https://github.com/open-mpi/ompi/issues/369.
> On Feb 1, 2015, at 5:08 PM, Oscar Vega-Gisbert <ov...@dsic.upv.es> wrote:
>
> Hi,
>
> I created an issue with a simplified example:
>
> https://github.com/open-mpi/ompi/issues/369
>
> Regards,
> Oscar
>
> On 25/01/15 at 19:36, Oscar Vega-Gisbert wrote:
>> Hi,
>>
>> I also reproduce this behaviour. But I think this crash is not related to the
>> garbage collector. Java is much better than you think.
>>
>> Maybe MPI corrupts the Java runtime heap.
>>
>> Regards,
>> Oscar
>>
>> On 22/01/15 at 08:07, Gilles Gouaillardet wrote:
>>> Alexander,
>>>
>>> I was able to reproduce this behaviour.
>>>
>>> Basically, bad things happen when the garbage collector is invoked.
>>> I was even able to reproduce some crashes (that happen at random stages)
>>> very early in the code by manually inserting calls to the garbage
>>> collector (e.g. System.gc();).
>>>
>>> Cheers,
>>>
>>> Gilles
>>>
>>> On 2015/01/19 9:03, Alexander Daryin wrote:
>>>> Hi,
>>>>
>>>> I am using the Java MPI bindings and periodically get fatal errors. This
>>>> is illustrated by the following model Java program.
>>>>
>>>> import mpi.MPI;
>>>> import mpi.MPIException;
>>>> import mpi.Prequest;
>>>> import mpi.Request;
>>>> import mpi.Status;
>>>>
>>>> import java.nio.ByteBuffer;
>>>> import java.util.Random;
>>>>
>>>> public class TestJavaMPI {
>>>>
>>>>     private static final int NREQ = 16;
>>>>     private static final int BUFFSIZE = 0x2000;
>>>>     private static final int NSTEP = 1000000000;
>>>>
>>>>     public static void main(String... args) throws MPIException {
>>>>         MPI.Init(args);
>>>>         Random random = new Random();
>>>>         Prequest[] receiveRequests = new Prequest[NREQ];
>>>>         Request[] sendRequests = new Request[NREQ];
>>>>         ByteBuffer[] receiveBuffers = new ByteBuffer[NREQ];
>>>>         ByteBuffer[] sendBuffers = new ByteBuffer[NREQ];
>>>>         for (int i = 0; i < NREQ; i++) {
>>>>             receiveBuffers[i] = MPI.newByteBuffer(BUFFSIZE);
>>>>             sendBuffers[i] = MPI.newByteBuffer(BUFFSIZE);
>>>>             receiveRequests[i] = MPI.COMM_WORLD.recvInit(receiveBuffers[i],
>>>>                     BUFFSIZE, MPI.BYTE, MPI.ANY_SOURCE, MPI.ANY_TAG);
>>>>             receiveRequests[i].start();
>>>>             sendRequests[i] = MPI.COMM_WORLD.iSend(sendBuffers[i], 0,
>>>>                     MPI.BYTE, MPI.PROC_NULL, 0);
>>>>         }
>>>>         for (int step = 0; step < NSTEP; step++) {
>>>>             if (step % 128 == 0) System.out.println(step);
>>>>             int index;
>>>>             do {
>>>>                 Status status = Request.testAnyStatus(receiveRequests);
>>>>                 if (status != null)
>>>>                     receiveRequests[status.getIndex()].start();
>>>>                 index = Request.testAny(sendRequests);
>>>>             } while (index == MPI.UNDEFINED);
>>>>             sendRequests[index].free();
>>>>             sendRequests[index] = MPI.COMM_WORLD.iSend(sendBuffers[index],
>>>>                     BUFFSIZE, MPI.BYTE,
>>>>                     random.nextInt(MPI.COMM_WORLD.getSize()), 0);
>>>>         }
>>>>         MPI.Finalize();
>>>>     }
>>>> }
>>>>
>>>> On Linux, this produces a segfault after about a million steps. On OS X,
>>>> instead of a segfault it prints the following error message:
>>>>
>>>> java(64053,0x127e4d000) malloc: *** error for object 0x7f80eb828808:
>>>> incorrect checksum for freed object - object was probably modified after
>>>> being freed.
>>>> *** set a breakpoint in malloc_error_break to debug
>>>> [mbp:64053] *** Process received signal ***
>>>> [mbp:64053] Signal: Abort trap: 6 (6)
>>>> [mbp:64053] Signal code: (0)
>>>> [mbp:64053] [ 0] 0 libsystem_platform.dylib 0x00007fff86b5ff1a _sigtramp + 26
>>>> [mbp:64053] [ 1] 0 ??? 0x0000000000000000 0x0 + 0
>>>> [mbp:64053] [ 2] 0 libsystem_c.dylib 0x00007fff80c7bb73 abort + 129
>>>> [mbp:64053] [ 3] 0 libsystem_malloc.dylib 0x00007fff8c26ce06 szone_error + 625
>>>> [mbp:64053] [ 4] 0 libsystem_malloc.dylib 0x00007fff8c2645c8 small_free_list_remove_ptr + 154
>>>> [mbp:64053] [ 5] 0 libsystem_malloc.dylib 0x00007fff8c2632bf szone_free_definite_size + 1856
>>>> [mbp:64053] [ 6] 0 libjvm.dylib 0x000000010e257d89 _ZN2os4freeEPvt + 63
>>>> [mbp:64053] [ 7] 0 libjvm.dylib 0x000000010dea2b0a _ZN9ChunkPool12free_all_butEm + 136
>>>> [mbp:64053] [ 8] 0 libjvm.dylib 0x000000010e30ab33 _ZN12PeriodicTask14real_time_tickEi + 77
>>>> [mbp:64053] [ 9] 0 libjvm.dylib 0x000000010e3372a3 _ZN13WatcherThread3runEv + 267
>>>> [mbp:64053] [10] 0 libjvm.dylib 0x000000010e25d87e _ZL10java_startP6Thread + 246
>>>> [mbp:64053] [11] 0 libsystem_pthread.dylib 0x00007fff8f1402fc _pthread_body + 131
>>>> [mbp:64053] [12] 0 libsystem_pthread.dylib 0x00007fff8f140279 _pthread_body + 0
>>>> [mbp:64053] [13] 0 libsystem_pthread.dylib 0x00007fff8f13e4b1 thread_start + 13
>>>> [mbp:64053] *** End of error message ***
>>>>
>>>> The Open MPI version is 1.8.4. The Java version is 1.8.0_25-b17.
>>>>
>>>> Best regards,
>>>> Alexander Daryin
>>>> _______________________________________________
>>>> users mailing list
>>>> us...@open-mpi.org
>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>> Link to this post: http://www.open-mpi.org/community/lists/users/2015/01/26215.php
>>>
>>> _______________________________________________
>>> users mailing list
>>> us...@open-mpi.org
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>> Link to this post: http://www.open-mpi.org/community/lists/users/2015/01/26230.php
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Searchable archives: http://www.open-mpi.org/community/lists/users/2015/02/index.php

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
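The GC-stress technique Gilles mentions (manually inserting System.gc() calls to make heap-corruption bugs surface early and deterministically) can be sketched without any MPI dependency. The snippet below is a minimal, hypothetical illustration of that pattern only: the GcStressSketch class, its run() method, and the doWork() placeholder workload are stand-ins invented for this sketch, not part of the original reproducer; doWork() allocates direct ByteBuffers the way MPI.newByteBuffer does, so the native allocator is exercised alongside the managed heap.

```java
// Minimal sketch (assumed names, not from the thread) of the GC-stress
// technique: run a workload that mixes Java-heap and native (direct
// buffer) allocations, and periodically force a collection so that any
// bug where native code touches freed memory shows up early instead of
// after ~10^6 steps.
import java.nio.ByteBuffer;

public class GcStressSketch {

    private static final int NSTEP = 10_000;
    private static final int GC_EVERY = 1_000; // force a GC every N steps

    // Placeholder workload standing in for the MPI send/receive loop:
    // allocate a direct buffer (native memory, like MPI.newByteBuffer)
    // and write/read a value through it.
    private static ByteBuffer doWork(int step) {
        ByteBuffer buf = ByteBuffer.allocateDirect(0x2000);
        buf.putInt(0, step);
        return buf;
    }

    // Returns a checksum so the loop's work is observable and testable.
    static long run() {
        long checksum = 0;
        for (int step = 0; step < NSTEP; step++) {
            ByteBuffer buf = doWork(step);
            checksum += buf.getInt(0);
            if (step % GC_EVERY == 0) {
                System.gc(); // the manually inserted GC call
            }
        }
        return checksum;
    }

    public static void main(String[] args) {
        System.out.println("steps=" + NSTEP + " checksum=" + run());
    }
}
```

In the pure-Java form above the checksum is simply the sum 0 + 1 + ... + 9999; the point of the pattern is that, applied inside the MPI reproducer, the forced collections moved the crash from "after about a million steps" to very early in the run, which is what implicated interaction between the collector and natively held buffer memory.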