Re: [OMPI users] Bad Infiniband latency with subounce

2010-02-15 Thread Ralph Castain
On Feb 15, 2010, at 8:44 PM, Terry Frankcombe wrote:
> On Mon, 2010-02-15 at 20:18 -0700, Ralph Castain wrote:
>> Did you run it with -mca mpi_paffinity_alone 1? Given this is 1.4.1, you can
>> set the bindings to -bind-to-socket or -bind-to-core. Either will give you
>> improved performance.
>

Re: [OMPI users] Bad Infiniband latency with subounce

2010-02-15 Thread Terry Frankcombe
On Mon, 2010-02-15 at 20:18 -0700, Ralph Castain wrote:
> Did you run it with -mca mpi_paffinity_alone 1? Given this is 1.4.1, you can
> set the bindings to -bind-to-socket or -bind-to-core. Either will give you
> improved performance.
>
> IIRC, MVAPICH defaults to -bind-to-socket. OMPI defaults

Re: [OMPI users] Bad Infiniband latency with subounce

2010-02-15 Thread Ralph Castain
Did you run it with -mca mpi_paffinity_alone 1? Given this is 1.4.1, you can set the bindings to -bind-to-socket or -bind-to-core. Either will give you improved performance.

IIRC, MVAPICH defaults to -bind-to-socket. OMPI defaults to no binding.

On Feb 15, 2010, at 6:51 PM, Repsher, Stephen J

[OMPI users] Bad Infiniband latency with subounce

2010-02-15 Thread Repsher, Stephen J
Hello again, Hopefully this is an easier question. My cluster uses Infiniband interconnects (Mellanox Infinihost III and some ConnectX). I'm seeing terrible and sporadic latency (on the order of ~1000 microseconds) as measured by the subounce code (http://sourceforge.net/projects/subounce/), but th

Re: [OMPI users] Seg fault with PBS Pro 10.2

2010-02-15 Thread Ralph Castain
Could you please ask them about this: OMPI makes the following call to connect to the mother superior:

    struct tm_roots tm_root;
    ret = tm_init(NULL, &tm_root);

Could they tell us why this segfaults in PBS Pro? It works correctly with all releases of Torque.

Thanks
Ralph

On Feb 15, 2010, at 12:

Re: [OMPI users] Seg fault with PBS Pro 10.2

2010-02-15 Thread Joshua Bernstein
Well, we all wish the Altair guys would at least try to maintain backwards compatibility with the community, but they have a big habit of breaking things. This isn't the first time they've broken a more customer-facing function like tm_spawn. (They also like breaking pbs_statjob!)

Re: [OMPI users] Seg fault with PBS Pro 10.2

2010-02-15 Thread Jeff Squyres
Bummer! If it helps, could you put us in touch with the PBS Pro people? We usually only have access to Torque when developing the TM-launching stuff (PBS Pro and Torque supposedly share the same TM interface, but we don't have access to PBS Pro, so we don't know if it has diverged over time).

Re: [OMPI users] Seg fault with PBS Pro 10.2

2010-02-15 Thread Repsher, Stephen J
Ralph, This is my first build of OpenMPI, so I haven't had this working before. I'm pretty confident that PATH and LD_LIBRARY_PATH issues are not the cause; otherwise, launches outside of PBS would fail too. Also, I tried compiling everything statically, with the same result. Some additional in