Just checking whether there is a solution for this.

Thank you,
Saliya

On Tue, Mar 11, 2014 at 10:54 PM, Saliya Ekanayake <esal...@gmail.com> wrote:

> I forgot to mention that I tried the hello.c version instead of Java and
> it too failed in a similar manner, but
>
> 1. On a single node with --mca btl ^tcp it went up to 24 procs before
>    failing
> 2. On 8 nodes with --mca btl ^tcp it could go only up to 16 procs
>
> On Tue, Mar 11, 2014 at 5:06 PM, Saliya Ekanayake <esal...@gmail.com> wrote:
>
>> I just tested with "ml" turned off as you suggested, but unfortunately it
>> didn't solve the issue.
>>
>> However, I found that by explicitly setting --mca btl ^tcp the code
>> worked on up to 4 nodes with each running 8 procs. If I don't specify this
>> it'll simply fail even on one node with 8 procs.
>>
>> Thank you,
>> Saliya
>>
>> On Tue, Mar 11, 2014 at 4:35 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:
>>
>>> Looks like we still have a bug in one of our components -- can you try:
>>>
>>>     mpirun --mca coll ^ml ...
>>>
>>> This will deactivate the "ml" collective component. See if that enables
>>> you to run (this particular component has nothing to do with Java).
>>>
>>> On Mar 11, 2014, at 1:33 AM, Saliya Ekanayake <esal...@gmail.com> wrote:
>>>
>>> > Just tested that this happens even with the simple Hello.java program
>>> > given in OMPI distribution.
>>> >
>>> > I've made a tarball containing details of the error adhering to
>>> > http://www.open-mpi.org/community/help/. Please let me know if I have
>>> > missed any info necessary.
>>> >
>>> > Thank you,
>>> > Saliya
>>> >
>>> > <hellobug.tar.gz>
>>> >
>>> > On Mon, Mar 10, 2014 at 10:46 AM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:
>>> >
>>> > Greetings, and thanks for trying out our Java bindings.
>>> >
>>> > Can you provide some more details? E.g., is there a particular
>>> > program you're running that incurs these problems? Or is there even a
>>> > particular MPI function that you're using that results in this segv (e.g.,
>>> > perhaps we have a specific bug somewhere)?
>>> >
>>> > Can you reduce the segv to a small example that we can reproduce (and
>>> > therefore fix)?
>>> >
>>> > --
>>> > Jeff Squyres
>>> > jsquy...@cisco.com
>>> > For corporate legal information go to:
>>> > http://www.cisco.com/web/about/doing_business/legal/cri/
>>> >
>>> > On Mar 10, 2014, at 12:05 AM, Saliya Ekanayake <esal...@gmail.com> wrote:
>>> >
>>> > > Hi,
>>> > >
>>> > > I have 8 nodes each with 2 quad core sockets. Also, the nodes have
>>> > > IB connectivity. I am trying to run OMPI Java binding in OMPI trunk
>>> > > revision 30301 with 8 procs per node totaling 64 procs. This gives a
>>> > > SIGSEGV error as below.
>>> > >
>>> > > I wonder if you have any suggestion to resolve this?
>>> > >
>>> > > Thank you,
>>> > > Saliya
>>> > >
>>> > > # A fatal error has been detected by the Java Runtime Environment:
>>> > > #
>>> > > # SIGSEGV (0xb) at pc=0x000000313867b75b, pid=12229, tid=47864973515072
>>> > > #
>>> > > # JRE version: Java(TM) SE Runtime Environment (8.0-b118) (build 1.8.0-ea-b118)
>>> > > # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.0-b60 mixed mode linux-amd64 compressed oops)
>>> > > # Problematic frame:
>>> > > # C [libc.so.6+0x7b75b] memcpy+0x15b
>>> > >
>>> > > --
>>> > > Saliya Ekanayake esal...@gmail.com
>>> > > http://saliya.org
>>> > > _______________________________________________
>>> > > users mailing list
>>> > > us...@open-mpi.org
>>> > > http://www.open-mpi.org/mailman/listinfo.cgi/users

--
Saliya Ekanayake esal...@gmail.com
Cell 812-391-4914 Home 812-961-6383
http://saliya.org