Thanks Warner,

This is frustrating... I read the ticket: six months already and two releases postponed. Frankly, I am very skeptical that this will be fixed in 1.3.4. I really hope so, but when will 1.3.4 be released?
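In the meantime, one thing I may try on my side (just a guess on my part, not a confirmed workaround): since ompi_info below lists both rsh and xgrid plm components, it should be possible to tell om-mpirun to skip the Xgrid launcher via an MCA parameter, assuming the rsh launcher can handle a purely local run:

    # untested sketch: force a launcher other than xgrid
    /sw/bin/om-mpirun --mca plm rsh -c 2 mpiapp
    # or exclude only the xgrid component
    /sw/bin/om-mpirun --mca plm ^xgrid -c 2 mpiapp

Of course that bypasses Xgrid entirely, which is not what I want in the long run, and I have no idea whether it avoids the crash in plm_xgrid_module.m.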
I have to decide between going back to 1.2.x, with the possible disruptions to my configuration (I use Fink), or waiting. And I volunteer to test any nightly snapshot that claims to fix this bug.

Cheers,
Alan

On Fri, Aug 14, 2009 at 17:20, Warner Yuen <wy...@apple.com> wrote:
> Hi Alan,
>
> Xgrid support for Open MPI is currently broken in the latest version of
> Open MPI. See the ticket below. However, I believe that Xgrid still works
> with one of the earlier 1.2 versions of Open MPI. I don't recall for sure,
> but I think that it's Open MPI 1.2.3.
>
> #1777: Xgrid support is broken in the v1.3 series
> ---------------------+------------------------------------------------------
>  Reporter: jsquyres  |      Owner: brbarret
>      Type: defect    |     Status: accepted
>  Priority: major     |  Milestone: Open MPI 1.3.4
>   Version: trunk     |  Resolution:
>  Keywords:           |
> ---------------------+------------------------------------------------------
> Changes (by bbenton):
>
>  * milestone: Open MPI 1.3.3 => Open MPI 1.3.4
>
> Warner Yuen
> Scientific Computing
> Consulting Engineer
> Apple, Inc.
> email: wy...@apple.com
> Tel: 408.718.2859
>
> On Aug 14, 2009, at 6:21 AM, users-requ...@open-mpi.org wrote:
>
>> Message: 1
>> Date: Fri, 14 Aug 2009 14:21:30 +0100
>> From: Alan <alanwil...@gmail.com>
>> Subject: [OMPI users] openmpi with xgrid
>> To: us...@open-mpi.org
>> Message-ID: <cf58c8d00908140621v18d384f2wef97ee80ca3de...@mail.gmail.com>
>> Content-Type: text/plain; charset="utf-8"
>>
>> Hi there,
>> I saw http://www.open-mpi.org/community/lists/users/2007/08/3900.php .
>> I use Fink, and so I changed the openmpi.info file in order to get openmpi with xgrid support.
>>
>> As you can see:
>> amadeus[2081]:~/Downloads% /sw/bin/ompi_info
>> Package: Open MPI root@amadeus.local Distribution
>> Open MPI: 1.3.3
>> Open MPI SVN revision: r21666
>> Open MPI release date: Jul 14, 2009
>> Open RTE: 1.3.3
>> Open RTE SVN revision: r21666
>> Open RTE release date: Jul 14, 2009
>> OPAL: 1.3.3
>> OPAL SVN revision: r21666
>> OPAL release date: Jul 14, 2009
>> Ident string: 1.3.3
>> Prefix: /sw
>> Configured architecture: x86_64-apple-darwin9
>> Configure host: amadeus.local
>> Configured by: root
>> Configured on: Fri Aug 14 12:58:12 BST 2009
>> Configure host: amadeus.local
>> Built by:
>> Built on: Fri Aug 14 13:07:46 BST 2009
>> Built host: amadeus.local
>> C bindings: yes
>> C++ bindings: yes
>> Fortran77 bindings: yes (single underscore)
>> Fortran90 bindings: yes
>> Fortran90 bindings size: small
>> C compiler: gcc
>> C compiler absolute: /sw/var/lib/fink/path-prefix-10.6/gcc
>> C++ compiler: g++
>> C++ compiler absolute: /sw/var/lib/fink/path-prefix-10.6/g++
>> Fortran77 compiler: gfortran
>> Fortran77 compiler abs: /sw/bin/gfortran
>> Fortran90 compiler: gfortran
>> Fortran90 compiler abs: /sw/bin/gfortran
>> C profiling: yes
>> C++ profiling: yes
>> Fortran77 profiling: yes
>> Fortran90 profiling: yes
>> C++ exceptions: no
>> Thread support: posix (mpi: no, progress: no)
>> Sparse Groups: no
>> Internal debug support: no
>> MPI parameter check: runtime
>> Memory profiling support: no
>> Memory debugging support: no
>> libltdl support: yes
>> Heterogeneous support: no
>> mpirun default --prefix: no
>> MPI I/O support: yes
>> MPI_WTIME support: gettimeofday
>> Symbol visibility support: yes
>> FT Checkpoint support: no (checkpoint thread: no)
>> MCA backtrace: execinfo (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA paffinity: darwin (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA carto: auto_detect (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA carto: file (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA maffinity: first_use (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA timer: darwin (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA installdirs: env (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA installdirs: config (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA dpm: orte (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA pubsub: orte (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA allocator: basic (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA allocator: bucket (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA coll: basic (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA coll: hierarch (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA coll: inter (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA coll: self (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA coll: sm (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA coll: sync (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA coll: tuned (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA io: romio (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA mpool: fake (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA mpool: rdma (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA mpool: sm (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA pml: cm (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA pml: csum (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA pml: ob1 (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA pml: v (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA bml: r2 (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA rcache: vma (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA btl: self (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA btl: sm (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA btl: tcp (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA topo: unity (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA osc: pt2pt (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA osc: rdma (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA iof: hnp (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA iof: orted (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA iof: tool (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA oob: tcp (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA odls: default (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA ras: slurm (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA rmaps: rank_file (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA rmaps: round_robin (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA rmaps: seq (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA rml: oob (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA routed: binomial (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA routed: direct (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA routed: linear (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA plm: rsh (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA plm: slurm (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA plm: xgrid (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA filem: rsh (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA errmgr: default (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA ess: env (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA ess: hnp (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA ess: singleton (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA ess: slurm (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA ess: tool (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA grpcomm: bad (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA grpcomm: basic (MCA v2.0, API v2.0, Component v1.3.3)
>>
>> All seemed fine. I also have the Xgrid controller and agent running on my laptop, but then when I tried:
>>
>> /sw/bin/om-mpirun -c 2 mpiapp   # hello world example for mpi
>> [amadeus.local:40293] [[804,0],0] ORTE_ERROR_LOG: Unknown error: 1 in file src/plm_xgrid_module.m at line 119
>> [amadeus.local:40293] [[804,0],0] ORTE_ERROR_LOG: Unknown error: 1 in file src/plm_xgrid_module.m at line 153
>> --------------------------------------------------------------------------
>> om-mpirun was unable to start the specified application as it encountered an error.
>> More information may be available above.
>> --------------------------------------------------------------------------
>> 2009-08-14 14:16:19.715 om-mpirun[40293:10b] *** Terminating app due to uncaught exception 'NSInvalidArgumentException', reason: '*** -[NSKVONotifying_XGConnection<0x1001164b0> finalize]: called when collecting not enabled'
>> 2009-08-14 14:16:19.716 om-mpirun[40293:10b] Stack: (
>>     140735390096156,
>>     140735366109391,
>>     140735390122388,
>>     4295943988,
>>     4295939168,
>>     4295171139,
>>     4295883300,
>>     4295025321,
>>     4294973498,
>>     4295401605,
>>     4295345774,
>>     4295056598,
>>     4295116412,
>>     4295119970,
>>     4295401605,
>>     4294972881,
>>     4295401605,
>>     4295345774,
>>     4295056598,
>>     4295172615,
>>     4295938185,
>>     4294971936,
>>     4294969401,
>>     4294969340
>> )
>> terminate called after throwing an instance of 'NSException'
>> [amadeus:40293] *** Process received signal ***
>> [amadeus:40293] Signal: Abort trap (6)
>> [amadeus:40293] Signal code: (0)
>> [amadeus:40293] [ 0] 2 libSystem.B.dylib 0x00000000831443fa _sigtramp + 26
>> [amadeus:40293] [ 1] 3 ??? 0x000000005fbfb1e8 0x0 + 1606398440
>> [amadeus:40293] [ 2] 4 libstdc++.6.dylib 0x00000000827f2085 _ZN9__gnu_cxx27__verbose_terminate_handlerEv + 377
>> [amadeus:40293] [ 3] 5 libobjc.A.dylib 0x0000000081811adf objc_end_catch + 280
>> [amadeus:40293] [ 4] 6 libstdc++.6.dylib 0x00000000827f0425 __gxx_personality_v0 + 1259
>> [amadeus:40293] [ 5] 7 libstdc++.6.dylib 0x00000000827f045b _ZSt9terminatev + 19
>> [amadeus:40293] [ 6] 8 libstdc++.6.dylib 0x00000000827f054c __cxa_rethrow + 0
>> [amadeus:40293] [ 7] 9 libobjc.A.dylib 0x0000000081811966 objc_exception_rethrow + 0
>> [amadeus:40293] [ 8] 10 CoreFoundation 0x0000000082ef8194 _CF_forwarding_prep_0 + 5700
>> [amadeus:40293] [ 9] 11 mca_plm_xgrid.so 0x00000000000ee734 orte_plm_xgrid_finalize + 4884
>> [amadeus:40293] [10] 12 mca_plm_xgrid.so 0x00000000000ed460 orte_plm_xgrid_finalize + 64
>> [amadeus:40293] [11] 13 libopen-rte.0.dylib 0x0000000000031c43 orte_plm_base_close + 195
>> [amadeus:40293] [12] 14 mca_ess_hnp.so 0x00000000000dfa24 0x0 + 916004
>> [amadeus:40293] [13] 15 libopen-rte.0.dylib 0x000000000000e2a9 orte_finalize + 89
>> [amadeus:40293] [14] 16 om-mpirun 0x000000000000183a start + 4210
>> [amadeus:40293] [15] 17 libopen-pal.0.dylib 0x000000000006a085 opal_event_add_i + 1781
>> [amadeus:40293] [16] 18 libopen-pal.0.dylib 0x000000000005c66e opal_progress + 142
>> [amadeus:40293] [17] 19 libopen-rte.0.dylib 0x0000000000015cd6 orte_trigger_event + 70
>> [amadeus:40293] [18] 20 libopen-rte.0.dylib 0x000000000002467c orte_daemon_recv + 4332
>> [amadeus:40293] [19] 21 libopen-rte.0.dylib 0x0000000000025462 orte_daemon_cmd_processor + 722
>> [amadeus:40293] [20] 22 libopen-pal.0.dylib 0x000000000006a085 opal_event_add_i + 1781
>> [amadeus:40293] [21] 23 om-mpirun 0x00000000000015d1 start + 3593
>> [amadeus:40293] [22] 24 libopen-pal.0.dylib 0x000000000006a085 opal_event_add_i + 1781
>> [amadeus:40293] [23] 25 libopen-pal.0.dylib 0x000000000005c66e opal_progress + 142
>> [amadeus:40293] [24] 26 libopen-rte.0.dylib 0x0000000000015cd6 orte_trigger_event + 70
>> [amadeus:40293] [25] 27 libopen-rte.0.dylib 0x0000000000032207 orte_plm_base_launch_failed + 135
>> [amadeus:40293] [26] 28 mca_plm_xgrid.so 0x00000000000ed089 orte_plm_xgrid_spawn + 89
>> [amadeus:40293] [27] 29 om-mpirun 0x0000000000001220 start + 2648
>> [amadeus:40293] [28] 30 om-mpirun 0x0000000000000839 start + 113
>> [amadeus:40293] [29] 31 om-mpirun 0x00000000000007fc start + 52
>> [amadeus:40293] *** End of error message ***
>> [1] 40293 abort /sw/bin/om-mpirun -c 2 mpiapp
>>
>> Is there anyone using Open MPI with Xgrid successfully who is keen to share his/her experience? I am not new to Xgrid or MPI, but with the two integrated I must say that I am in uncharted waters.
>>
>> Any help would be very much appreciated.
>>
>> Many thanks in advance,
>> Alan
>> --
>> Alan Wilter S. da Silva, D.Sc. - CCPN Research Associate
>> Department of Biochemistry, University of Cambridge.
>> 80 Tennis Court Road, Cambridge CB2 1GA, UK.
>> >>http://www.bio.cam.ac.uk/~awd28<<
>>
>> ------------------------------
>>
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>> End of users Digest, Vol 1318, Issue 2
>> **************************************
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users

--
Alan Wilter S. da Silva, D.Sc. - CCPN Research Associate
Department of Biochemistry, University of Cambridge.
80 Tennis Court Road, Cambridge CB2 1GA, UK.
>>http://www.bio.cam.ac.uk/~awd28<<