[OMPI users] Segfaults w/ both 1.4 and 1.5 on CentOS 6.2/SGE

2012-03-13 Thread Joshua Baker-LePain
I run a decent size (600+ nodes, 4000+ cores) heterogeneous (multiple 
generations of x86_64 hardware) cluster.  We use SGE (currently 6.1u4, 
which, yes, is pretty ancient) and just upgraded from CentOS 5.7 to 6.2. 
We had been using MPICH2 under CentOS 5, but I'd much rather use OpenMPI 
as packaged by RH/CentOS.  Our SGE queues are set up with a high priority 
queue running un-niced and a low priority queue running at nice 19, each 
with 1 slot per core on every node.
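
For reference, the nice level is just the "priority" attribute in the SGE 
queue configuration, so the relevant bits of the two queues (as shown by 
qconf -sq <queue>) look roughly like this -- a sketch assuming an 8-core node:

   # high priority queue
   priority              0
   slots                 8

   # low priority queue
   priority              19
   slots                 8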


I'm seeing consistent segfaults with OpenMPI when I submit jobs without 
specifying a queue (meaning some threads run niced, others run un-niced). 
This was initially reported to me by 2 users, each with their own code, 
but I can reproduce it with my own very simple test program.  The 
segfaults occur whether I'm using the default OpenMPI version of 1.5 or 
compat-openmpi-1.4.3.  I'll note that I did upgrade the distro RPM of 
openmpi-1.5.3 to 1.5.4 to get around the broken SGE integration 
<https://bugzilla.redhat.com/show_bug.cgi?id=789150>.  I can't say for 
certain that jobs run entirely in the high priority queue never segfault, 
but if they do, it's not nearly as reproducible.  The segfaults also 
don't seem to occur if a job runs entirely on one node.


The error logs of failed jobs contain a stanza like this for each thread 
which segfaulted:

[opt207:03766] *** Process received signal ***
[opt207:03766] Signal: Segmentation fault (11)
[opt207:03766] Signal code: Address not mapped (1)
[opt207:03766] Failing at address: 0x2b4e279e778c
[opt207:03766] [ 0] /lib64/libpthread.so.0() [0x37f940f4a0]
[opt207:03766] [ 1] /usr/lib64/openmpi/lib/openmpi/mca_btl_sm.so(+0x42fc) 
[0x2b17aa6002fc]
[opt207:03766] [ 2] /usr/lib64/openmpi/lib/libmpi.so.1(opal_progress+0x5a) 
[0x37fa0d1aba]
[opt207:03766] [ 3] /usr/lib64/openmpi/lib/openmpi/mca_grpcomm_bad.so(+0x24d5) 
[0x2b17a7d234d5]
[opt207:03766] [ 4] /usr/lib64/openmpi/lib/libmpi.so.1() [0x37fa04bd57]
[opt207:03766] [ 5] /usr/lib64/openmpi/lib/libmpi.so.1(MPI_Init+0x170) 
[0x37fa063c70]
[opt207:03766] [ 6] /netapp/sali/jlb/mybin/mpihello-long.ompi-1.5-debug() 
[0x4006e6]
[opt207:03766] [ 7] /lib64/libc.so.6(__libc_start_main+0xfd) [0x37f901ecdd]
[opt207:03766] [ 8] /netapp/sali/jlb/mybin/mpihello-long.ompi-1.5-debug() 
[0x400609]
[opt207:03766] *** End of error message ***

A backtrace of the core file looks like this:
#0  sm_fifo_read () at btl_sm.h:353
#1  mca_btl_sm_component_progress () at btl_sm_component.c:588
#2  0x0037fa0d1aba in opal_progress () at runtime/opal_progress.c:207
#3  0x2b17a7d234d5 in barrier () at grpcomm_bad_module.c:277
#4  0x0037fa04bd57 in ompi_mpi_init (argc=1, argv=0x7fff253658f8,
requested=<value optimized out>, provided=<value optimized out>)
at runtime/ompi_mpi_init.c:771
#5  0x0037fa063c70 in PMPI_Init (argc=0x7fff253657fc, argv=0x7fff253657f0)
at pinit.c:84
#6  0x004006e6 in main (argc=1, argv=0x7fff253658f8)
at mpihello-long.c:11

Those are both from a test with 1.5.  The 1.4 errors are essentially 
identical, with the differences mainly in line numbers.  I'm happy to post 
full logs, but I'm trying (albeit unsuccessfully) to keep this from 
turning into a novel.  I'm happy to do as much debugging as I can -- I'm 
pretty motivated to get this working.


Thanks for any insights.

--
Joshua Baker-LePain
QB3 Shared Cluster Sysadmin
UCSF


Re: [OMPI users] Segfaults w/ both 1.4 and 1.5 on CentOS 6.2/SGE

2012-03-13 Thread Joshua Baker-LePain

On Tue, 13 Mar 2012 at 7:20pm, Gutierrez, Samuel K wrote

Just to be clear, what specific version of Open MPI produced the 
provided backtrace?  This smells like a missing memory barrier problem.


The backtrace in my original post was from 1.5.4 -- I took the 1.5.4 
source and put it into the 1.5.3 SRPM provided by Red Hat.  Below is a 
backtrace from 1.4.3 as shipped by RH/CentOS:


#0  sm_fifo_read () at btl_sm.h:267
#1  mca_btl_sm_component_progress () at btl_sm_component.c:391
#2  0x003e54a129ca in opal_progress () at runtime/opal_progress.c:207
#3  0x2b00fa6bb8d5 in barrier () at grpcomm_bad_module.c:270
#4  0x003e55e37d04 in ompi_mpi_init (argc=<value optimized out>,
argv=<value optimized out>, requested=<value optimized out>,
provided=<value optimized out>) at runtime/ompi_mpi_init.c:722
#5  0x003e55e5bae0 in PMPI_Init (argc=0x7fff8588b1cc, argv=0x7fff8588b1c0)
at pinit.c:80
#6  0x00400826 in main (argc=1, argv=0x7fff8588b2c8)
at mpihello-long.c:11

Thanks!

--
Joshua Baker-LePain
QB3 Shared Cluster Sysadmin
UCSF


Re: [OMPI users] Segfaults w/ both 1.4 and 1.5 on CentOS 6.2/SGE

2012-03-13 Thread Joshua Baker-LePain

On Tue, 13 Mar 2012 at 7:53pm, Gutierrez, Samuel K wrote

The failure signature isn't exactly what we were seeing here at LANL, 
but there were misplaced memory barriers in Open MPI 1.4.3.  Ticket 2619 
talks about this issue (https://svn.open-mpi.org/trac/ompi/ticket/2619). 
This doesn't explain, however, the failures that you are experiencing 
within Open MPI 1.5.4.  Can you give 1.4.4 a whirl and see if this fixes 
the issue?


Would it be best to use 1.4.4 specifically, or simply the most recent 
1.4.x (which appears to be 1.4.5 at this point)?


Any more information surrounding your failures in 1.5.4 is greatly 
appreciated.


I'm happy to provide, but what exactly are you looking for?  The test code 
I'm running is *very* simple:


#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
   int node;

   int i, j;
   float f;

   MPI_Init(&argc, &argv);
   MPI_Comm_rank(MPI_COMM_WORLD, &node);

   printf("Hello World from Node %d.\n", node);

   /* a bit of floating-point busywork */
   for(i=0; i<=1; i++)
      f=i*2.718281828*i+i+i*3.141592654;

   MPI_Finalize();

   return 0;
}
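
For reference, it's compiled and launched in the usual way -- a sketch, 
with -np 4 as an arbitrary process count:

mpicc mpihello-long.c -o mpihello-long.ompi-1.4-debug
mpirun -np 4 ./mpihello-long.ompi-1.4-debug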

And my environment is a pretty standard CentOS-6.2 install.

--
Joshua Baker-LePain
QB3 Shared Cluster Sysadmin
UCSF


Re: [OMPI users] Segfaults w/ both 1.4 and 1.5 on CentOS 6.2/SGE

2012-03-13 Thread Joshua Baker-LePain

On Tue, 13 Mar 2012 at 9:15pm, Gutierrez, Samuel K wrote

Any more information surrounding your failures in 1.5.4 is greatly 
appreciated.


I'm happy to provide, but what exactly are you looking for?  The test 
code I'm running is *very* simple:


If you experience this type of failure with 1.4.5, can you send another 
backtrace?  We'll go from there.


In an odd way I'm relieved to say that 1.4.5 failed in the same way.  From 
the SGE log of the run, here's the error message from one of the threads 
that segfaulted:

[iq104:05697] *** Process received signal ***
[iq104:05697] Signal: Segmentation fault (11)
[iq104:05697] Signal code: Address not mapped (1)
[iq104:05697] Failing at address: 0x2ad032188e8c
[iq104:05697] [ 0] /lib64/libpthread.so.0() [0x3e5420f4a0]
[iq104:05697] [ 1] 
/netapp/sali/jlb/ompi-1.4.5/lib/openmpi/mca_btl_sm.so(+0x3c4c) [0x2b0099ec4c4c]
[iq104:05697] [ 2] 
/netapp/sali/jlb/ompi-1.4.5/lib/libopen-pal.so.0(opal_progress+0x6a) 
[0x2b00967737ca]
[iq104:05697] [ 3] 
/netapp/sali/jlb/ompi-1.4.5/lib/openmpi/mca_grpcomm_bad.so(+0x18d5) 
[0x2b00975ef8d5]
[iq104:05697] [ 4] /netapp/sali/jlb/ompi-1.4.5/lib/libmpi.so.0(+0x38a24) 
[0x2b009628da24]
[iq104:05697] [ 5] /netapp/sali/jlb/ompi-1.4.5/lib/libmpi.so.0(MPI_Init+0x1b0) 
[0x2b00962b24f0]
[iq104:05697] [ 6] 
/netapp/sali/jlb/mybin/mpihello-long.ompi-1.4-debug(main+0x22) [0x400826]
[iq104:05697] [ 7] /lib64/libc.so.6(__libc_start_main+0xfd) [0x3e53e1ecdd]
[iq104:05697] [ 8] /netapp/sali/jlb/mybin/mpihello-long.ompi-1.4-debug() 
[0x400749]
[iq104:05697] *** End of error message ***

And the backtrace of the resulting core file:
#0  0x2b0099ec4c4c in mca_btl_sm_component_progress ()
   from /netapp/sali/jlb/ompi-1.4.5/lib/openmpi/mca_btl_sm.so
#1  0x2b00967737ca in opal_progress ()
   from /netapp/sali/jlb/ompi-1.4.5/lib/libopen-pal.so.0
#2  0x2b00975ef8d5 in barrier ()
   from /netapp/sali/jlb/ompi-1.4.5/lib/openmpi/mca_grpcomm_bad.so
#3  0x2b009628da24 in ompi_mpi_init ()
   from /netapp/sali/jlb/ompi-1.4.5/lib/libmpi.so.0
#4  0x2b00962b24f0 in PMPI_Init ()
   from /netapp/sali/jlb/ompi-1.4.5/lib/libmpi.so.0
#5  0x00400826 in main (argc=1, argv=0x7fff9fe113f8)
at mpihello-long.c:11


Another question.  How reproducible is this on your system?


In my testing today, it's been 100% reproducible.

--
Joshua Baker-LePain
QB3 Shared Cluster Sysadmin
UCSF


Re: [OMPI users] Segfaults w/ both 1.4 and 1.5 on CentOS 6.2/SGE

2012-03-13 Thread Joshua Baker-LePain

On Tue, 13 Mar 2012 at 5:06pm, Ralph Castain wrote

Out of curiosity: could you send along the mpirun cmd line you are using 
to launch these jobs? I'm wondering if the SGE integration itself is the 
problem, and it only shows up in the sm code.


It's about as simple as it gets:

mpirun -np $NSLOTS $HOME/mybin/mpihello-long.ompi-1.4-debug

where $NSLOTS is set by SGE based on how many slots in the PE one 
requests.
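
A minimal submit script for such a job would look something like the 
following sketch (the PE name "orte" and the 64-slot request are 
placeholders; any parallel environment with enough slots works the same way):

#!/bin/bash
#$ -S /bin/bash
#$ -cwd
#$ -pe orte 64
mpirun -np $NSLOTS $HOME/mybin/mpihello-long.ompi-1.4-debug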


--
Joshua Baker-LePain
QB3 Shared Cluster Sysadmin
UCSF


Re: [OMPI users] Segfaults w/ both 1.4 and 1.5 on CentOS 6.2/SGE

2012-03-13 Thread Joshua Baker-LePain

On Tue, 13 Mar 2012 at 10:57pm, Gutierrez, Samuel K wrote

Fooey.  What compiler are you using to build Open MPI and how are you 
configuring your build?


I'm using gcc as packaged by RH/CentOS 6.2:

[jlb@opt200 1.4.5-2]$ gcc --version
gcc (GCC) 4.4.6 20110731 (Red Hat 4.4.6-3)

I actually tried 2 custom builds of Open MPI 1.4.5.  For the first I tried 
to stick close to the options in RH's compat-openmpi SRPM:


./configure --prefix=$HOME/ompi-1.4.5 --enable-mpi-threads --enable-openib-ibcm 
--with-sge --with-libltdl=external --with-valgrind --enable-memchecker 
--with-psm=no --with-esmtp LDFLAGS='-Wl,-z,noexecstack'

That resulted in the backtrace I sent previously:
#0  0x2b0099ec4c4c in mca_btl_sm_component_progress ()
   from /netapp/sali/jlb/ompi-1.4.5/lib/openmpi/mca_btl_sm.so
#1  0x2b00967737ca in opal_progress ()
   from /netapp/sali/jlb/ompi-1.4.5/lib/libopen-pal.so.0
#2  0x2b00975ef8d5 in barrier ()
   from /netapp/sali/jlb/ompi-1.4.5/lib/openmpi/mca_grpcomm_bad.so
#3  0x2b009628da24 in ompi_mpi_init ()
   from /netapp/sali/jlb/ompi-1.4.5/lib/libmpi.so.0
#4  0x2b00962b24f0 in PMPI_Init ()
   from /netapp/sali/jlb/ompi-1.4.5/lib/libmpi.so.0
#5  0x00400826 in main (argc=1, argv=0x7fff9fe113f8)
at mpihello-long.c:11

For kicks, I tried a 2nd compile of 1.4.5 with a bare minimum of options:

./configure --prefix=$HOME/ompi-1.4.5 --with-sge

That resulted in a slightly different backtrace that seems to be missing 
a bit:

#0  0x2b7bbc8681d0 in ?? ()
#1  <signal handler called>
#2  0x2b7bbd2b8f6c in mca_btl_sm_component_progress ()
   from /netapp/sali/jlb/ompi-1.4.5/lib/openmpi/mca_btl_sm.so
#3  0x2b7bb9b2feda in opal_progress ()
   from /netapp/sali/jlb/ompi-1.4.5/lib/libopen-pal.so.0
#4  0x2b7bba9a98d5 in barrier ()
   from /netapp/sali/jlb/ompi-1.4.5/lib/openmpi/mca_grpcomm_bad.so
#5  0x2b7bb965d426 in ompi_mpi_init ()
   from /netapp/sali/jlb/ompi-1.4.5/lib/libmpi.so.0
#6  0x2b7bb967cba0 in PMPI_Init ()
   from /netapp/sali/jlb/ompi-1.4.5/lib/libmpi.so.0
#7  0x00400826 in main (argc=1, argv=0x7fff93634788)
at mpihello-long.c:11

Can you also run with a debug build of Open MPI 
so we can see the line numbers?


I'll do that first thing tomorrow.


Another question.  How reproducible is this on your system?


In my testing today, it's been 100% reproducible.


That's surprising.


Heh.  You're telling me.

Thanks for taking an interest in this.

--
Joshua Baker-LePain
QB3 Shared Cluster Sysadmin
UCSF


Re: [OMPI users] Segfaults w/ both 1.4 and 1.5 on CentOS 6.2/SGE

2012-03-13 Thread Joshua Baker-LePain

On Tue, 13 Mar 2012 at 6:05pm, Ralph Castain wrote

I started playing with this configure line on my Centos6 machine, and 
I'd suggest a couple of things:


1. drop the --with-libltdl=external  ==> not a good idea

2. drop --with-esmtp   ==> useless unless you really want pager messages 
notifying you of problems

3. drop --enable-mpi-threads for now

I'm continuing to play with it, but thought I'd pass those along.


After my first custom build of 1.4.5 didn't work, I built it again using 
an utterly minimal configure line:


./configure --prefix=$HOME/ompi-1.4.5 --with-sge

Runs with this library still failed, but the backtrace did change 
slightly:


#0  0x2b7bbc8681d0 in ?? ()
#1  <signal handler called>
#2  0x2b7bbd2b8f6c in mca_btl_sm_component_progress ()
   from /netapp/sali/jlb/ompi-1.4.5/lib/openmpi/mca_btl_sm.so
#3  0x2b7bb9b2feda in opal_progress ()
   from /netapp/sali/jlb/ompi-1.4.5/lib/libopen-pal.so.0
#4  0x2b7bba9a98d5 in barrier ()
   from /netapp/sali/jlb/ompi-1.4.5/lib/openmpi/mca_grpcomm_bad.so
#5  0x2b7bb965d426 in ompi_mpi_init ()
   from /netapp/sali/jlb/ompi-1.4.5/lib/libmpi.so.0
#6  0x2b7bb967cba0 in PMPI_Init ()
   from /netapp/sali/jlb/ompi-1.4.5/lib/libmpi.so.0
#7  0x00400826 in main (argc=1, argv=0x7fff93634788)
at mpihello-long.c:11

--
Joshua Baker-LePain
QB3 Shared Cluster Sysadmin
UCSF


Re: [OMPI users] Segfaults w/ both 1.4 and 1.5 on CentOS 6.2/SGE

2012-03-13 Thread Joshua Baker-LePain

On Tue, 13 Mar 2012 at 11:28pm, Gutierrez, Samuel K wrote


Can you rebuild without the "--enable-mpi-threads" option and try again.


I did and still got segfaults (although w/ slightly different backtraces). 
See the response I just sent to Ralph.


--
Joshua Baker-LePain
QB3 Shared Cluster Sysadmin
UCSF


Re: [OMPI users] Segfaults w/ both 1.4 and 1.5 on CentOS 6.2/SGE

2012-03-13 Thread Joshua Baker-LePain

On Tue, 13 Mar 2012 at 5:31pm, Ralph Castain wrote

FWIW: I have a Centos6 system myself, and I have no problems running 
OMPI on it (1.4 or 1.5). I can try building it the same way you do and 
see what happens.


I can run as many threads as I like on a single system with no problems, 
even if those threads are running at different nice levels.  The problem 
seems to arise when I'm both a) running across multiple machines and b) 
running threads at differing nice levels (which often happens as a result 
of our queueing setup).  I can't guarantee that the problem *never* 
happens when I run across multiple machines with all the threads un-niced, 
but I haven't been able to reproduce that at will like I can for the other 
case.


--
Joshua Baker-LePain
QB3 Shared Cluster Sysadmin
UCSF


Re: [OMPI users] Segfaults w/ both 1.4 and 1.5 on CentOS 6.2/SGE

2012-03-14 Thread Joshua Baker-LePain

On Wed, 14 Mar 2012 at 9:33am, Reuti wrote

I can run as many threads as I like on a single system with no 
problems, even if those threads are running at different nice levels.


How do they get different nice levels -- do you renice them? I would assume 
that they all start at the same nice level as the parent. In the test 
program you posted there are no threads.


Ah, thanks for pointing this out.  Yes, when a job runs on a single host 
(even if SGE has assigned it to multiple queues), there's no qrsh 
involved.  There's just a simple mpirun and all the threads run at the 
same priority.  I did try renicing half the threads, and the job didn't 
fail.


 The problem seems to arise when I'm both a) running across multiple 
machines and b) running threads at differing nice levels (which often 
happens as a result of our queueing setup).


This sounds like you are getting slots from different queues assigned to 
one and the same job. My experience: don't do it, unless you need it.


You are correct -- the problem is specific to a parallel job getting slots 
from different queues.  Our cluster is used by a combination of folks 
who've financially supported it, and those that haven't.  Our high 
priority queue, lab.q, runs un-niced and is available only to those who 
have donated money and/or machines to us.  Our low priority queue, long.q, 
runs nice 19 and is available to all.  The goal is to ensure instant 
access by a lab to its "share" of the cluster while letting both those 
users and non-supporting users use as many cores as they can in long.q. 
We explicitly allow overloading to further support our goal of keeping the 
usage both full and fair.


The setup is a bit convoluted, but it has kept the users (and, more 
importantly, the PIs) happy.  Until the recent upgrade to CentOS 6 and 
concomitant switch from MPICH2 to Open MPI, we've had no issues with 
parallel jobs and this queue setup.  And the test jobs I've tried with our 
old MPICH2 install (and the MPICH tight integration) running under CentOS 
6 don't fail either.


Do you face the same if you stay in one and the same queue across the 
machines?


Jobs don't crash if they either:

a) all run in the same queue, or

b) run in multiple queues all on one machine

--
Joshua Baker-LePain
QB3 Shared Cluster Sysadmin
UCSF


Re: [OMPI users] Segfaults w/ both 1.4 and 1.5 on CentOS 6.2/SGE

2012-03-14 Thread Joshua Baker-LePain
llo-long.ompi-1.4.3-debug
jlb  12796  2.0  0.0 153232  3752 ?S14:41   0:00
  \_ /netapp/sali/jlb/mybin/mpihello-long.ompi-1.4.3-debug


Joshua: the CentOS 6 install is the same on all nodes, and you recompiled 
the application with the actual version of the library? By "threads" you 
refer to "processes"?


All the nodes are installed from the same kickstart file and kept fully
up to date.  And, yes, the application is compiled against the exact
library I'm running it with.

Thanks again to all for looking at this.

--
Joshua Baker-LePain
QB3 Shared Cluster Sysadmin
UCSF


Re: [OMPI users] Segfaults w/ both 1.4 and 1.5 on CentOS 6.2/SGE

2012-03-15 Thread Joshua Baker-LePain

On Wed, 14 Mar 2012 at 5:50pm, Ralph Castain wrote


On Mar 14, 2012, at 5:44 PM, Reuti wrote:


(I was just typing when Ralph's message came in: I can confirm this. To 
avoid it, Open MPI would need to collect all lines from the hostfile 
which are on the same machine. SGE creates entries for each queue/host 
pair in the machine file).


Hmmm…I can take a look at the allocator module and see why we aren't 
doing it. Would the host names be the same for the two queues?


I can't speak authoritatively like Reuti can, but here's what a hostfile
looks like on my cluster (note that all our name resolution is done via 
/etc/hosts -- there's no DNS involved):


iq103 8 lab.q@iq103 
iq103 1 test.q@iq103 
iq104 8 lab.q@iq104 
iq104 1 test.q@iq104 
opt221 2 lab.q@opt221 
opt221 1 test.q@opt221 
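
As an aside, collapsing those per-queue entries down to one entry per 
physical host only requires summing the slot counts per hostname -- roughly 
what the allocator would need to do.  A quick sketch against SGE's 
$PE_HOSTFILE:

awk '{slots[$1] += $2} END {for (h in slots) print h, slots[h]}' $PE_HOSTFILE

For the file above that would yield iq103 9, iq104 9, and opt221 3.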

@Ralph: it could work if SGE had a facility to request the desired queue 
in `qrsh -inherit ...`, because then the $TMPDIR would be unique for each 
orted again (assuming it's using different ports for each).


Gotcha! I suspect getting the allocator to handle this cleanly is the 
better solution, though.


If I can help (testing patches, e.g.), let me know.

--
Joshua Baker-LePain
QB3 Shared Cluster Sysadmin
UCSF

Re: [OMPI users] Segfaults w/ both 1.4 and 1.5 on CentOS 6.2/SGE

2012-03-15 Thread Joshua Baker-LePain

On Thu, 15 Mar 2012 at 12:44am, Reuti wrote

Which version of SGE are you using? The traditional rsh startup was 
replaced by the builtin startup some time ago (although it should still 
work).


We're currently running the rather ancient 6.1u4 (due to the "If it ain't 
broke..." philosophy).  The hardware for our new queue master recently 
arrived and I'll soon be upgrading to the most recent Open Grid Scheduler 
release.  Are you saying that the upgrade with the new builtin startup 
method should avoid this problem?


Maybe this already shows the problem: there are two `qrsh -inherit` calls, 
as Open MPI thinks these are different machines (I ran with only one slot 
on each host, hence didn't hit it at first, but can reproduce it now). But 
for SGE both may end up in the same queue, overriding the openmpi-session 
in $TMPDIR.


Although it's running: do you get all the output? If I request 4 slots and 
get one from each queue on both machines, the mpihello outputs only 3 lines: 
the "Hello World from Node 3" line is always missing.


I do seem to get all the output -- there are indeed 64 Hello World lines.

Thanks again for all the help on this.  This is one of the most productive 
exchanges I've had on a mailing list in far too long.


--
Joshua Baker-LePain
QB3 Shared Cluster Sysadmin
UCSF


Re: [OMPI users] Segfaults w/ both 1.4 and 1.5 on CentOS 6.2/SGE

2012-03-15 Thread Joshua Baker-LePain

On Thu, 15 Mar 2012 at 1:53pm, Reuti wrote

PS: In your example you also had the case of 2 slots in the low priority 
queue; what is the actual setup in your cluster?


Our actual setup is:

 o lab.q, slots=numprocs, load_thresholds=np_load_avg=1.5, labs (=SGE
   projects) limited by RQS to a number of slots equal to their "share" of
   the cluster, seq_no=0, priority=0.

 o long.q, slots=numprocs, load_thresholds=np_load_avg=0.9, seq_no=1,
   priority=19

 o short.q, slots=numprocs, load_thresholds=np_load_avg=1.25, users
   limited by RQS to 200 slots, runtime limited to 30 minutes, seq_no=2,
   priority=10

Users are instructed to not select a queue when submitting jobs.  The 
theory is that even if non-contributing users have filled the cluster with 
long.q jobs, contributing users will still have instant access to "their" 
lab.q slots, overloading nodes with jobs running at a higher priority than 
the long.q jobs.  long.q jobs won't start on nodes full of lab.q jobs. 
And short.q is for quick, high priority jobs regardless of cluster status 
(the main use case being processing MRI data into images while a patient 
is physically in the scanner).


The truth is our cluster is primarily used for, and thus SGE is tuned for, 
large numbers of serial jobs.  We do have *some* folks running parallel 
code, and it *is* starting to get to the point where I need to reconfigure 
things to make that part work better.


--
Joshua Baker-LePain
QB3 Shared Cluster Sysadmin
UCSF


Re: [OMPI users] Segfaults w/ both 1.4 and 1.5 on CentOS 6.2/SGE

2012-03-15 Thread Joshua Baker-LePain

On Thu, 15 Mar 2012 at 4:41pm, Reuti wrote


Am 15.03.2012 um 15:50 schrieb Ralph Castain:


On Mar 15, 2012, at 8:46 AM, Reuti wrote:


Am 15.03.2012 um 15:37 schrieb Ralph Castain:

FWIW: I see the problem. Our parser was apparently written assuming 
every line was a unique host, so it doesn't even check to see if 
there is duplication. Easy fix - can shoot it to you today.


But even with the fix the nice value will be the same for all 
processes forked there. Either all have the nice value of the low 
priority queue or of the high priority queue.


Agreed - nothing I can do about that, though. We only do the one qrsh 
call, so the daemons are going to fall into a single queue, and so will 
all their children. In this scenario, it isn't clear to me (from this 
discussion) that I can control which queue gets used.


Correct.


Which I understand.  Our queue setup is admittedly a bit wonky (which is
probably why I'm the first one to have this issue).  I'm much more 
concerned with things not crashing than with them absolutely having the 
"right" nice levels.  :)



Should I?


I can't speak for the community. Personally I would say: don't 
distribute parallel jobs among different queues at all, as some 
applications use internal communication to distribute the environment 
variables of the master process to the slaves (even if SGE's 
`qrsh -inherit ...` is called without -V, and even if Open MPI is not told 
to forward any specific environment variable). If you have a custom 
application it can work of course, but with closed source ones you can 
only test and learn from experience whether it works or not.


Not to mention the timing issue of differently niced processes. 
Adjusting the SGE setup of the OP would be the smarter way IMO.


And I agree with that as well.  I understand if the decision is made to 
leave the parser the way it is, given that my setup is outside the norm.


--
Joshua Baker-LePain
QB3 Shared Cluster Sysadmin
UCSF


Re: [OMPI users] Segfaults w/ both 1.4 and 1.5 on CentOS 6.2/SGE

2012-03-15 Thread Joshua Baker-LePain

On Thu, 15 Mar 2012 at 11:38am, Ralph Castain wrote

No, I'll fix the parser as we should be able to run anyway. Just can't 
guarantee which queue the job will end up in, but at least it -will- 
run.


Makes sense to me.  Thanks!

--
Joshua Baker-LePain
QB3 Shared Cluster Sysadmin
UCSF


Re: [OMPI users] Segfaults w/ both 1.4 and 1.5 on CentOS 6.2/SGE

2012-03-15 Thread Joshua Baker-LePain

On Thu, 15 Mar 2012 at 11:49am, Ralph Castain wrote

Here's the patch: I've set it up to go into 1.5, but not 1.4 as that 
series is being closed out. Please let me know if this solves the 
problem for you.


I couldn't get the included inline patch to apply to 1.5.4 (probably my 
issue), but I downloaded it from 
<https://svn.open-mpi.org/trac/ompi/changeset/26148> and applied that.  My 
test job ran just fine, and looking at the nodes verified a single orted 
process per node despite SGE assigning slots in multiple queues.
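
For the record, a quick check along these lines (a sketch using two of the 
node names from earlier in the thread) is what I mean by looking at the nodes:

for h in iq103 iq104; do ssh $h pgrep -lf orted; done

It now reports exactly one orted per host.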


In short, WORKSFORME.

Thanks!

--
Joshua Baker-LePain
QB3 Shared Cluster Sysadmin
UCSF


Re: [OMPI users] mpicc command not found - Fedora

2012-03-29 Thread Joshua Baker-LePain

On Thu, 29 Mar 2012 at 7:45pm, Rohan Deshpande wrote


I have installed MPI successfully on Fedora using "yum install openmpi 
openmpi-devel openmpi-libs".


What version of Fedora are you using, and on what architecture (i.e. i686 
or x86_64)?  As far as I can see, the last Fedora distro to use 
openmpi-libs was Fedora 11, which is rather old and unsupported.



I have also added /usr/lib/openmpi/bin to the PATH and LD_LIBRARY_PATH 
variables.

But when I try to compile my program using "mpicc hello.c" or 
"/usr/lib/openmpi/bin/mpicc hello.c" I get an error saying "mpicc: command 
not found".

I checked the contents of /usr/lib/openmpi/bin and there is no mpicc...
here is the screenshot


Current versions of Fedora use the "module" command to load the proper 
environment for Open MPI.  On a 64bit machine, e.g., one would run
"module load openmpi-x86_64" to get all the env variables properly set. 
But I don't know what Fedora version that started with.
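
For example, on an x86_64 box the full sequence would look roughly like 
this (a sketch; hello.c stands in for your own program):

module load openmpi-x86_64
which mpicc
mpicc hello.c -o hello
mpirun -np 2 ./hello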


--
Joshua Baker-LePain
QB3 Shared Cluster Sysadmin
UCSF