Re: [OMPI users] 1.6.2 affinity failures

2012-12-19 Thread Ralph Castain
I'm afraid these are both known problems in the 1.6.2 release. I believe we fixed npersocket in 1.6.3, though you might check to be sure. On the large-scale issue, cpus-per-rank may well fail under those conditions. The algorithm in the 1.6 series hasn't seen much use, especially at scale. In

[OMPI users] 1.6.2 affinity failures

2012-12-19 Thread Brock Palen
Using openmpi 1.6.2 with intel 13.0, though the problem is not specific to the compiler. Using two 12-core, 2-socket nodes:
mpirun -np 4 -npersocket 2 uptime
--
Your job has requested a conflicting number of processes for the a

Re: [OMPI users] mpi problems/many cpus per node

2012-12-19 Thread Ralph Castain
Hooray!! Great to hear - I was running out of ideas :-)
On Dec 19, 2012, at 2:01 PM, Daniel Davidson wrote:
> I figured this out.
>
> ssh was working, but scp was not due to an mtu mismatch between the systems.
> Adding MTU=1500 to my /etc/sysconfig/network-scripts/ifcfg-eth2 fixed the
> pro

Re: [OMPI users] mpi problems/many cpus per node

2012-12-19 Thread Daniel Davidson
I figured this out.
ssh was working, but scp was not, due to an MTU mismatch between the systems. Adding MTU=1500 to my /etc/sysconfig/network-scripts/ifcfg-eth2 fixed the problem.
Dan
On 12/17/2012 04:12 PM, Daniel Davidson wrote:
Yes, it does.
Dan
[root@compute-2-1 ~]# ssh compute-2-0 W
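A minimal sketch of how such an MTU mismatch could be spotted, assuming Linux nodes that expose each interface's MTU under /sys/class/net/<interface>/mtu. The helper names read_mtu and mtu_outliers, and the 9000 value in the example, are hypothetical and not taken from the thread.

```python
from __future__ import annotations

from pathlib import Path


def read_mtu(interface: str, sysfs_root: str = "/sys/class/net") -> int:
    """Return the MTU the Linux kernel reports for one network interface."""
    return int((Path(sysfs_root) / interface / "mtu").read_text().strip())


def mtu_outliers(mtu_by_host: dict[str, int]) -> dict[str, int]:
    """Return the hosts whose MTU differs from the most common value.

    If two values are equally common the choice of "expected" is arbitrary,
    so inspect the full mapping by hand in that case.
    """
    values = list(mtu_by_host.values())
    if not values:
        return {}
    expected = max(set(values), key=values.count)
    return {host: mtu for host, mtu in mtu_by_host.items() if mtu != expected}


if __name__ == "__main__":
    # Hypothetical values; in practice, collect read_mtu("eth2") on every node.
    collected = {"compute-2-0": 1500, "compute-2-1": 9000, "head": 1500}
    print(mtu_outliers(collected))
```

Small packets (an interactive ssh login) can pass where large ones (an scp transfer) stall, so comparing the configured MTU on every node is a cheap first check.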

Re: [OMPI users] openmpi-1.9a1r27674 on Cygwin-1.7.17

2012-12-19 Thread marco atzeri
On 12/19/2012 12:28 PM, marco atzeri wrote:
working on openmpi-1.7rc5. It needs some cleaning and after I need to test.
built and passed test
http://www.open-mpi.org/community/lists/devel/2012/12/11855.php
Regards
Marco

Re: [OMPI users] MPI_Alltoallv performance regression 1.6.0 to 1.6.1

2012-12-19 Thread Number Cruncher
On 19/12/12 11:08, Paul Kapinos wrote:
Did you *really* want to dig into code just in order to switch a default communication algorithm?
No, I didn't want to, but with a huge change in performance, I'm forced to do something! And having looked at the different algorithms, I think there's a p

Re: [OMPI users] Possible memory error

2012-12-19 Thread Handerson, Steven
Jeff, others: I fixed the problem we were experiencing by adding a barrier. The bug occurred between a piece of code that uses (many, over a loop) SEND (from the leader) and RECV (in the worker processes) to ship data to the processing nodes from the head / leader, and I think what might have be

Re: [OMPI users] Infiniband errors

2012-12-19 Thread Yann Droneaud
On Wednesday, 19 December 2012 at 12:12 +0500, Syed Ahsan Ali wrote:
> Dear John
>
> I found this output of ibstatus on some nodes (most probably the
> cause of the problem)
> [root@compute-01-08 ~]# ibstatus
>
> Fatal error: device '*': sys files not found
> (/sys/class/infiniband/*/ports)
>
> Do

Re: [OMPI users] Infiniband errors

2012-12-19 Thread Shamis, Pavel
It seems the driver was not started. I would suggest running lspci to check whether the HCA is visible at the hardware level.
Pavel (Pasha) Shamis
---
Computer Science Research Group
Computer Science and Math Division
Oak Ridge National Laboratory
On Dec 19, 2012, at 2:12 AM, Syed Ahsan Ali wrote:
Dear Joh

Re: [OMPI users] [Open MPI] #3351: JAVA scatter error

2012-12-19 Thread Siegmar Gross
Hi
I shortened this email so that you get to my comments sooner.
> > In my opinion Datatype.Vector must set the size of the
> > base datatype as extent of the vector and not the true extent, because
> > MPI-Java doesn't provide a function to resize a datatype.
>
> No, I think Datatype.Vector is

Re: [OMPI users] openmpi-1.9a1r27674 on Cygwin-1.7.17

2012-12-19 Thread marco atzeri
On 12/19/2012 11:04 AM, Siegmar Gross wrote:
Hi
On 12/18/2012 6:55 PM, Jeff Squyres wrote:
...but only of v1.6.x.
okay, adding development version on Christmas wishlist ;-)
Can you build the package with thread and Java support?
--enable-mpi-java \
--enable-opal-multi-threads \
-

Re: [OMPI users] MPI_Alltoallv performance regression 1.6.0 to 1.6.1

2012-12-19 Thread Paul Kapinos
Did you *really* want to dig into code just in order to switch a default communication algorithm? Note there are several ways to set the parameters; --mca on command line is just one of them (suitable for quick online tests). http://www.open-mpi.org/faq/?category=tuning#setting-mca-params W
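A minimal sketch of three ways to pass the same MCA parameters: on the mpirun command line with --mca, through OMPI_MCA_-prefixed environment variables, and in a parameter file such as $HOME/.openmpi/mca-params.conf. The helper functions are hypothetical, and the two parameter names in the example are an assumption about the 1.6 series; check them with ompi_info before relying on them.

```python
from __future__ import annotations


def mca_cli_args(params: dict[str, object]) -> list[str]:
    """Build '--mca name value' arguments for an mpirun command line."""
    args: list[str] = []
    for name, value in params.items():
        args += ["--mca", name, str(value)]
    return args


def mca_env(params: dict[str, object]) -> dict[str, str]:
    """Build OMPI_MCA_<name> environment variables for the same settings."""
    return {f"OMPI_MCA_{name}": str(value) for name, value in params.items()}


def mca_params_file(params: dict[str, object]) -> str:
    """Render 'name = value' lines for an mca-params.conf file."""
    return "".join(f"{name} = {value}\n" for name, value in params.items())


if __name__ == "__main__":
    # Assumed parameter names; verify with ompi_info on your installation.
    params = {
        "coll_tuned_use_dynamic_rules": 1,
        "coll_tuned_alltoallv_algorithm": 1,
    }
    print(" ".join(["mpirun", *mca_cli_args(params), "-np", "4", "./app"]))
    print(mca_env(params))
    print(mca_params_file(params), end="")
```

The command-line form suits quick experiments; the environment and file forms keep a setting in place without editing every job script.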

Re: [OMPI users] MPI_Alltoallv performance regression 1.6.0 to 1.6.1

2012-12-19 Thread Number Cruncher
Having run some more benchmarks, the new default is *really* bad for our application (2-10x slower), so I've been looking at the source to try and figure out why. It seems that the biggest difference will occur when the all_to_all is actually sparse (e.g. our application); if most N-M process
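To illustrate why a sparse exchange could suffer, here is a simplified model, not Open MPI's implementation: it walks a lock-step pairwise schedule in which rank r sends to rank (r + step) mod P at each step, and counts how many of those sends carry no data. The function names and the ring-neighbour traffic pattern are assumptions made for this example.

```python
from __future__ import annotations

from typing import Iterator


def pairwise_schedule(nprocs: int) -> Iterator[tuple[int, int, int]]:
    """Yield (step, sender, receiver) for a lock-step pairwise exchange."""
    for step in range(1, nprocs):
        for rank in range(nprocs):
            yield step, rank, (rank + step) % nprocs


def count_empty_messages(send_counts: list[list[int]]) -> tuple[int, int]:
    """Return (empty, total) sends when every scheduled pair is visited.

    send_counts[src][dst] is the number of elements src sends to dst.
    """
    nprocs = len(send_counts)
    total = empty = 0
    for _step, src, dst in pairwise_schedule(nprocs):
        total += 1
        if send_counts[src][dst] == 0:
            empty += 1
    return empty, total


def ring_neighbours(nprocs: int) -> list[list[int]]:
    """Hypothetical sparse pattern: each rank sends only to rank-1 and rank+1."""
    counts = [[0] * nprocs for _ in range(nprocs)]
    for rank in range(nprocs):
        counts[rank][(rank + 1) % nprocs] = 1
        counts[rank][(rank - 1) % nprocs] = 1
    return counts


if __name__ == "__main__":
    empty, total = count_empty_messages(ring_neighbours(8))
    print(f"{empty} of {total} scheduled sends carry no data")
```

In this toy pattern with 8 ranks, 40 of the 56 scheduled sends are empty, so most steps would synchronise ranks that have nothing to exchange.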

Re: [OMPI users] openmpi-1.9a1r27674 on Cygwin-1.7.17

2012-12-19 Thread Siegmar Gross
Hi
> On 12/18/2012 6:55 PM, Jeff Squyres wrote:
> > ...but only of v1.6.x.
>
> okay, adding development version on Christmas wishlist
> ;-)
Can you build the package with thread and Java support?
--enable-mpi-java \
--enable-opal-multi-threads \
--enable-mpi-thread-multiple \
--with-thr

Re: [OMPI users] Infiniband errors

2012-12-19 Thread Syed Ahsan Ali
Dear John
I found this output of ibstatus on some nodes (most probably the cause of the problem)
[root@compute-01-08 ~]# ibstatus
Fatal error: device '*': sys files not found (/sys/class/infiniband/*/ports)
Does this show any hardware or software issue?
Thanks
On Wed, Nov 28, 2012 at 3:17 PM, Jo
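The ibstatus message points at /sys/class/infiniband/*/ports, so a quick way to compare nodes is to list what exists under that path. This is a minimal sketch assuming a Linux sysfs layout of <device>/ports/<number>; the function name and the mlx4_0 device name in the comment are hypothetical.

```python
from __future__ import annotations

from pathlib import Path


def infiniband_ports(sysfs_root: str = "/sys/class/infiniband") -> list[str]:
    """Return sorted 'device/port' names found under the sysfs tree.

    An empty list means no port entries are present (for example
    'mlx4_0/1' would be missing), which is the situation ibstatus
    reports as 'sys files not found'.
    """
    root = Path(sysfs_root)
    if not root.is_dir():
        return []
    return sorted(
        f"{port.parent.parent.name}/{port.name}"
        for port in root.glob("*/ports/*")
    )


if __name__ == "__main__":
    ports = infiniband_ports()
    if ports:
        print("InfiniBand ports:", ", ".join(ports))
    else:
        print("No InfiniBand sysfs entries found")
```

An empty result only shows that the kernel exposes no InfiniBand devices; it does not by itself distinguish a driver that was never loaded from missing hardware.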