[OMPI users] Segfaults w/ both 1.4 and 1.5 on CentOS 6.2/SGE

2012-03-13 Thread Joshua Baker-LePain
etty motivated to get this working. Thanks for any insights. -- Joshua Baker-LePain QB3 Shared Cluster Sysadmin UCSF

Re: [OMPI users] Segfaults w/ both 1.4 and 1.5 on CentOS 6.2/SGE

2012-03-13 Thread Joshua Baker-LePain
=0x7fff8588b2c8) at mpihello-long.c:11 Thanks! -- Joshua Baker-LePain QB3 Shared Cluster Sysadmin UCSF

Re: [OMPI users] Segfaults w/ both 1.4 and 1.5 on CentOS 6.2/SGE

2012-03-13 Thread Joshua Baker-LePain
MPI_Comm_rank(MPI_COMM_WORLD, &node); printf("Hello World from Node %d.\n", node); for(i=0; i<=1; i++) f=i*2.718281828*i+i+i*3.141592654; MPI_Finalize(); } And my environment is a pretty standard CentOS-6.2 install. -- Joshua Baker-LePain QB3 Shared Cluster Sysadmin UCSF

Re: [OMPI users] Segfaults w/ both 1.4 and 1.5 on CentOS 6.2/SGE

2012-03-13 Thread Joshua Baker-LePain
I_Init () from /netapp/sali/jlb/ompi-1.4.5/lib/libmpi.so.0 #5 0x00400826 in main (argc=1, argv=0x7fff9fe113f8) at mpihello-long.c:11 Another question. How reproducible is this on your system? In my testing today, it's been 100% reproducible. -- Joshua Baker-LePain QB3 Shared Cluster Sysadmin UCSF

Re: [OMPI users] Segfaults w/ both 1.4 and 1.5 on CentOS 6.2/SGE

2012-03-13 Thread Joshua Baker-LePain
mpirun -np $NSLOTS $HOME/mybin/mpihello-long.ompi-1.4-debug where $NSLOTS is set by SGE based on how many slots in the PE one requests. -- Joshua Baker-LePain QB3 Shared Cluster Sysadmin UCSF

Re: [OMPI users] Segfaults w/ both 1.4 and 1.5 on CentOS 6.2/SGE

2012-03-13 Thread Joshua Baker-LePain
surprising. Heh. You're telling me. Thanks for taking an interest in this. -- Joshua Baker-LePain QB3 Shared Cluster Sysadmin UCSF

Re: [OMPI users] Segfaults w/ both 1.4 and 1.5 on CentOS 6.2/SGE

2012-03-13 Thread Joshua Baker-LePain
pp/sali/jlb/ompi-1.4.5/lib/libmpi.so.0 #7 0x00400826 in main (argc=1, argv=0x7fff93634788) at mpihello-long.c:11 -- Joshua Baker-LePain QB3 Shared Cluster Sysadmin UCSF

Re: [OMPI users] Segfaults w/ both 1.4 and 1.5 on CentOS 6.2/SGE

2012-03-13 Thread Joshua Baker-LePain
On Tue, 13 Mar 2012 at 11:28pm, Gutierrez, Samuel K wrote: Can you rebuild without the "--enable-mpi-threads" option and try again? I did and still got segfaults (although w/ slightly different backtraces). See the response I just sent to Ralph. -- Joshua Baker-LePain QB3 Shar

Re: [OMPI users] Segfaults w/ both 1.4 and 1.5 on CentOS 6.2/SGE

2012-03-13 Thread Joshua Baker-LePain
when I run across multiple machines with all the threads un-niced, but I haven't been able to reproduce that at will like I can for the other case. -- Joshua Baker-LePain QB3 Shared Cluster Sysadmin UCSF

Re: [OMPI users] Segfaults w/ both 1.4 and 1.5 on CentOS 6.2/SGE

2012-03-14 Thread Joshua Baker-LePain
t fail either. Do you face the same if you stay in one and the same queue across the machines? Jobs don't crash if they either: a) all run in the same queue, or b) run in multiple queues all on one machine -- Joshua Baker-LePain QB3 Shared Cluster Sysadmin UCSF

Re: [OMPI users] Segfaults w/ both 1.4 and 1.5 on CentOS 6.2/SGE

2012-03-14 Thread Joshua Baker-LePain
t file and kept fully up to date. And, yes, the application is compiled against the exact library I'm running it with. Thanks again to all for looking at this. -- Joshua Baker-LePain QB3 Shared Cluster Sysadmin UCSF

Re: [OMPI users] Segfaults w/ both 1.4 and 1.5 on CentOS 6.2/SGE

2012-03-15 Thread Joshua Baker-LePain
e desired queue in `qrsh -inherit ...`, because then the $TMPDIR would be unique for each orted again (assuming it's using different ports for each). Gotcha! I suspect getting the allocator to handle this cleanly is the better solution, though. If I can help (testing patches, e.g.), let me know

Re: [OMPI users] Segfaults w/ both 1.4 and 1.5 on CentOS 6.2/SGE

2012-03-15 Thread Joshua Baker-LePain
ne of the most productive exchanges I've had on a mailing list in far too long. -- Joshua Baker-LePain QB3 Shared Cluster Sysadmin UCSF

Re: [OMPI users] Segfaults w/ both 1.4 and 1.5 on CentOS 6.2/SGE

2012-03-15 Thread Joshua Baker-LePain
e truth is our cluster is primarily used for, and thus SGE is tuned for, large numbers of serial jobs. We do have *some* folks running parallel code, and it *is* starting to get to the point where I need to reconfigure things to make that part work better. -- Joshua Baker-LePain QB3 Shared Cluster Sysadmin UCSF

Re: [OMPI users] Segfaults w/ both 1.4 and 1.5 on CentOS 6.2/SGE

2012-03-15 Thread Joshua Baker-LePain
the OP would be the smarter way IMO. And I agree with that as well. I understand if the decision is made to leave the parser the way it is, given that my setup is outside the norm. -- Joshua Baker-LePain QB3 Shared Cluster Sysadmin UCSF

Re: [OMPI users] Segfaults w/ both 1.4 and 1.5 on CentOS 6.2/SGE

2012-03-15 Thread Joshua Baker-LePain
On Thu, 15 Mar 2012 at 11:38am, Ralph Castain wrote: No, I'll fix the parser as we should be able to run anyway. Just can't guarantee which queue the job will end up in, but at least it -will- run. Makes sense to me. Thanks! -- Joshua Baker-LePain QB3 Shared Cluster Sysadmin UCSF

Re: [OMPI users] Segfaults w/ both 1.4 and 1.5 on CentOS 6.2/SGE

2012-03-15 Thread Joshua Baker-LePain
issue), but I downloaded it from <https://svn.open-mpi.org/trac/ompi/changeset/26148> and applied that. My test job ran just fine, and looking at the nodes verified a single orted process per node despite SGE assigning slots in multiple queues. In short, WORKSFORME. Thanks! -- Jo
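To illustrate the behavior verified above (one orted per node even when SGE grants slots on that node from several queues), here is a minimal C sketch of merging hostfile entries by host name. It is not the code from the changeset mentioned in this thread. It assumes each $PE_HOSTFILE line has the form "host slots queue processor-range", and the function name merge_pe_hostfile, the struct host_slots, and the fixed buffer sizes are all hypothetical names chosen for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_NAME 256

/* One entry per physical host, no matter how many queues granted slots. */
struct host_slots {
    char name[MAX_NAME];
    int slots;
};

/*
 * Parse hostfile text whose lines are assumed to look like
 *   "host slots queue processor-range"
 * and sum the slots of lines that name the same host.
 * Returns the number of distinct hosts, or -1 on a malformed line,
 * an over-long line, or more hosts than max_hosts.
 */
int merge_pe_hostfile(const char *text, struct host_slots *out, int max_hosts)
{
    int nhosts = 0;
    const char *p = text;

    assert(text != NULL && out != NULL);

    while (*p != '\0') {
        char line[512];
        char name[MAX_NAME];
        int slots = 0;
        int i;
        size_t len = strcspn(p, "\n");      /* length of the current line */

        if (len >= sizeof line)
            return -1;
        memcpy(line, p, len);
        line[len] = '\0';
        p += len;
        if (*p == '\n')
            p++;

        if (line[strspn(line, " \t\r")] == '\0')
            continue;                       /* skip blank lines */

        /* Only the first two fields matter here; the queue is ignored. */
        if (sscanf(line, "%255s %d", name, &slots) != 2 || slots < 0)
            return -1;

        /* Look for an existing entry for this host. */
        for (i = 0; i < nhosts; i++)
            if (strcmp(out[i].name, name) == 0)
                break;

        if (i == nhosts) {                  /* first time we see this host */
            if (nhosts == max_hosts)
                return -1;
            strcpy(out[nhosts].name, name);
            out[nhosts].slots = 0;
            nhosts++;
        }
        out[i].slots += slots;
    }
    return nhosts;
}
```

With this approach, two lines for the same host from different queues collapse into a single entry whose slot count is their sum, so a launcher walking the result would start one daemon per host. The queue field is deliberately dropped, which matches the trade-off described in the thread: the job runs, but which queue it lands in is not guaranteed.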

Re: [OMPI users] mpicc command not found - Fedora

2012-03-29 Thread Joshua Baker-LePain
ll the env variables properly set. But I don't know what Fedora version that started with. -- Joshua Baker-LePain QB3 Shared Cluster Sysadmin UCSF