Re: [OMPI users] MPI_Comm_spawn errors
Hi Joao,

Unfortunately, spawn is broken on the development trunk right now. We are working on a major revamp of the runtime system which should fix these problems, but it is not ready yet.

Sorry about that :(

Tim

Joao Vicente Lima wrote:
> Hi all,
>
> I'm getting errors with spawn in the following situations:
>
> 1) spawn1.c - spawning 2 processes on localhost, one by one, the error is:
>
> spawning ...
> [localhost:31390] *** Process received signal ***
> [localhost:31390] Signal: Segmentation fault (11)
> [localhost:31390] Signal code: Address not mapped (1)
> [localhost:31390] Failing at address: 0x98
> [localhost:31390] [ 0] /lib/libpthread.so.0 [0x2b1d38a17ed0]
> [localhost:31390] [ 1] /usr/local/mpi/openmpi-svn/lib/libmpi.so.0(ompi_comm_dyn_finalize+0xd2) [0x2b1d37667cb2]
> [localhost:31390] [ 2] /usr/local/mpi/openmpi-svn/lib/libmpi.so.0(ompi_comm_finalize+0x3b) [0x2b1d3766358b]
> [localhost:31390] [ 3] /usr/local/mpi/openmpi-svn/lib/libmpi.so.0(ompi_mpi_finalize+0x248) [0x2b1d37679598]
> [localhost:31390] [ 4] ./spawn1(main+0xac) [0x400ac4]
> [localhost:31390] [ 5] /lib/libc.so.6(__libc_start_main+0xf4) [0x2b1d38c43b74]
> [localhost:31390] [ 6] ./spawn1 [0x400989]
> [localhost:31390] *** End of error message ***
> --
> mpirun has exited due to process rank 0 with PID 31390 on node localhost calling "abort". This will have caused other processes in the application to be terminated by signals sent by mpirun (as reported here).
> --
>
> With 1 process spawned, or with 2 processes spawned in one call, there is no output from the child.
>
> 2) spawn2.c - no response; the init call in this program is
> MPI_Init_thread (&argc, &argv, MPI_THREAD_MULTIPLE, &required)
>
> The attachments contain the programs, ompi_info and config.log.
>
> Any suggestions?
>
> Thanks a lot.
> Joao.

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
[OMPI users] openmpi/openib problems
Hi all,

I'm having problems getting openmpi to work correctly using verbs on some systems. It's been working using openib for quite some time, but I need to get it working using verbs for some research I'm doing.

Anyway, all seems to be good on the openib side of things. ibv_devinfo and ibv_devices return device information, and the devices are listed as active on each node. Also, all hosts are visible to each other (ibhosts shows a full list).

The problem I see with openmpi is that I have the openib btl, but not the openib mpool, and when looking at the contents of ompi/mca/mpool/ I don't see openib there (sm and rdma are both listed, and ompi_info shows they've been included in the build).

Any help would be appreciated.

Thanks,
Jessie
Re: [OMPI users] openmpi/openib problems
jessie puls wrote:
> Hi all,
>
> I'm having problems getting openmpi to work correctly using verbs on some systems. It's been working using openib for quite some time, but I need to get it working using verbs for some research I'm doing.

This would make a whole lot more sense if I'd typed it correctly. It's been working using ipoib.

> Anyway, all seems to be good on the openib side of things. ibv_devinfo and ibv_devices return device information, and the devices are listed as active on each node. Also, all hosts are visible to each other (ibhosts shows a full list).
>
> The problem I see with openmpi is that I have the openib btl, but not the openib mpool, and when looking at the contents of ompi/mca/mpool/ I don't see openib there (sm and rdma are both listed, and ompi_info shows they've been included in the build).
>
> Any help would be appreciated.
>
> Thanks,
> Jessie
[OMPI users] mpi.h macro naming
Thanks in advance if this is already fixed in a later release I've not caught up to; I'm at 1.2.3.

Is there some subtle reason that ompi's mpi.h leaves the following macros both unguarded with an ifndef and un-prefixed with OMPI_ ? This produces considerable amounts of compiler whinage for other codes that include mpi.h. As always, extraneous whinage makes real errors harder to find. (And yes, those other codes also need *their* definitions of HAVE_LONG_LONG, etc. properly protected.) And of course, who knows how the answer was defined for any given unprotected appearance of these macros?

    /* Define to 1 if the system has the type `long long'. */
    #undef HAVE_LONG_LONG

    /* The size of a `bool', as computed by sizeof. */
    #undef SIZEOF_BOOL

    /* The size of a `int', as computed by sizeof. */
    #undef SIZEOF_INT

If it's simply a matter of developer hours, I can post a patch somewhere to address this. It appears that of these, only SIZEOF_INT affects more than a few source files.

thanks,
Ben Allan
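The guarding and prefixing asked about above could look roughly like the sketch below. This is only an illustration under stated assumptions: the EXAMPLE_MPI_ prefix, the EXAMPLE_MPI_H include guard and the two helper functions are made up for this note and are not taken from any Open MPI release, and the value 4 for the size of int is hard-coded here rather than computed by configure.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for another package's config.h that was included first. */
#define HAVE_LONG_LONG 1
#define SIZEOF_INT 4

/* Sketch of a guarded, prefixed public header.
 * The EXAMPLE_MPI_ names are hypothetical, not Open MPI's real macros. */
#ifndef EXAMPLE_MPI_H
#define EXAMPLE_MPI_H

/* Prefixed names cannot collide with the application's own macros. */
#define EXAMPLE_MPI_HAVE_LONG_LONG 1
#define EXAMPLE_MPI_SIZEOF_INT 4   /* assumed value, normally set by configure */

/* If an unprefixed name has to stay, define it only when nobody else has. */
#ifndef SIZEOF_INT
#define SIZEOF_INT EXAMPLE_MPI_SIZEOF_INT
#endif

#ifndef HAVE_LONG_LONG
#define HAVE_LONG_LONG EXAMPLE_MPI_HAVE_LONG_LONG
#endif

#endif /* EXAMPLE_MPI_H */

/* Returns the int size recorded under the prefixed name. */
size_t example_recorded_int_size(void)
{
    return EXAMPLE_MPI_SIZEOF_INT;
}

/* Aborts if the application's value and the library's value disagree. */
void example_check_config(void)
{
    assert(SIZEOF_INT == EXAMPLE_MPI_SIZEOF_INT);
}
```

With this layout the earlier definitions are left alone, so the compiler has no redefinition to complain about, and a caller can run example_check_config() once at startup to catch the case where two packages recorded different answers.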
[OMPI users] processes aborting on MPI_Finalize
This is probably some coding error on my part, but under some problem divisions I get processes aborting when I call MPI_Finalize(). Perhaps they are still incorrectly waiting to receive some message or something like that. Sometimes it seems to work. Is there a good way to get to the bottom of this error?

output-
4 additional processes aborted (not shown)

Sam Adams
General Dynamics Information Technology
Phone: 210.536.5945
Re: [OMPI users] Can't get OPENMPI to run parallel job with Myrinet/GM
Would you be able to send me the mpirun command and args that you use?

How can I get more output to study? I added "--display-map -d -v" to my mpirun command, which gives more output, but not the reason for the failure.

The information contained herein is GOODYEAR PROPRIETARY information and includes GOODYEAR CONFIDENTIAL information. Reproduction of this document, disclosure of the information, and use for any purpose other than to conduct business with Goodyear is expressly prohibited.

From: George Bosilca
Sent by: users-bounces@open-mpi.org
Date: 02/14/2008 10:18 PM
To: Open MPI Users
cc: t901...@rds4020.akr.goodyear.com
Subject: Re: [OMPI users] Can't get OPENMPI to run parallel job with Myrinet/GM
Please respond to Open MPI Users

I ran a full test run on GM with 1.2.5 and with the trunk. Both of them ran to completion without any errors. Moreover, the error message only says that one of the processes was terminated, which usually means that something bad happened somewhere else and the runtime decided to terminate the whole job. This might be a segfault or an abort. Without more information it will be difficult to help or to offer any advice.

george.

On Feb 14, 2008, at 11:15 AM, Tom Wurgler wrote:

> I am trying to use openmpi 1.2.5 (I also tried 1.2.4) to run a parallel job
> using GM drivers. The only message I get is:
>
> mpirun noticed that job rank 0 with PID 19508 on node node93 exited on
> signal 15 (Terminated).
>
> I can run serially on one node (4 processors), it just dies when trying to use
> more than one node.
>
> Any help appreciated.
Re: [OMPI users] Can't get OPENMPI to run parallel job with Myrinet/GM
Tom,

Here is how I configured Open MPI. It's mostly the default configuration ...

    ../../ompi-trunk/configure --prefix=/nfs/home/bosilca/opt/unstable/fog/fast \
        --disable-debug --enable-picky --with-platform=optimized \
        --disable-mpi-cxx --disable-mpi-f90 --enable-mpi-f77 \
        --disable-mpi-profiling --with-gm=/opt/gm -enable-visibility

No specific arguments were required to run the tests. You can force the GM BTL by using "--mca btl gm,self", or "--mca btl gm,sm,self" if you need shared memory.

george.

On Feb 19, 2008, at 4:59 PM, twu...@goodyear.com wrote:

> Would you be able to send me the mpirun command and args that you use?
>
> How can I get more output to study? I added "--display-map -d -v" to my mpirun command, which gives more output, but not the reason for the failure.
>
> From: George Bosilca
> Sent by: users-bounces@open-mpi.org
> Date: 02/14/2008 10:18 PM
> To: Open MPI Users
> cc: t901...@rds4020.akr.goodyear.com
> Subject: Re: [OMPI users] Can't get OPENMPI to run parallel job with Myrinet/GM
> Please respond to Open MPI Users
>
> I ran a full test run on GM with 1.2.5 and with the trunk. Both of them ran to completion without any errors. Moreover, the error message only says that one of the processes was terminated, which usually means that something bad happened somewhere else and the runtime decided to terminate the whole job. This might be a segfault or an abort. Without more information it will be difficult to help or to offer any advice.
>
> george.
>
> On Feb 14, 2008, at 11:15 AM, Tom Wurgler wrote:
>
> > I am trying to use openmpi 1.2.5 (I also tried 1.2.4) to run a parallel job using GM drivers. The only message I get is:
> >
> > mpirun noticed that job rank 0 with PID 19508 on node node93 exited on signal 15 (Terminated).
> >
> > I can run serially on one node (4 processors), it just dies when trying to use more than one node.
> >
> > Any help appreciated.