Dear John
I found this output of ibstatus on some nodes (most probably the cause of
the problem):
[root@compute-01-08 ~]# ibstatus
Fatal error: device '*': sys files not found
(/sys/class/infiniband/*/ports)
Does this show any hardware or software issue?
Thanks
On Wed, Nov 28, 2012 at 3:17 PM, Jo
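A quick sanity check that helps tell an unloaded driver (software) apart from a
missing adapter (hardware); the module and service names below are assumptions
that vary by HCA vendor and distribution:

  # Anything registered with the InfiniBand core? (empty output matches the error above)
  ls /sys/class/infiniband/

  # Are the IB kernel modules loaded? (names vary, e.g. mlx4_ib / ib_core for Mellanox)
  lsmod | grep -E 'ib_|mlx'

  # On RHEL-style systems with OFED, check the stack (service name may differ)
  service openibd status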
Hi
> On 12/18/2012 6:55 PM, Jeff Squyres wrote:
> > ...but only of v1.6.x.
>
> okay, adding the development version to my Christmas wishlist
> ;-)
Can you build the package with thread and Java support?
--enable-mpi-java \
--enable-opal-multi-threads \
--enable-mpi-thread-multiple \
--with-thr
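For context, a hedged sketch of what a complete configure invocation with those
flags could look like; the install prefix is a placeholder, and the last flag in
the message above is truncated, so it is left out here:

  ./configure --prefix=$HOME/openmpi-install \
      --enable-mpi-java \
      --enable-opal-multi-threads \
      --enable-mpi-thread-multiple
  make -j 4 && make install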
Having run some more benchmarks, the new default is *really* bad for our
application (2-10x slower), so I've been looking at the source to try
and figure out why.
It seems that the biggest difference will occur when the all_to_all is
actually sparse (e.g. our application); if most N-M process
Did you *really* want to dig into the code just in order to switch a default
communication algorithm?
Note there are several ways to set the parameters; --mca on the command line is
just one of them (suitable for quick online tests).
http://www.open-mpi.org/faq/?category=tuning#setting-mca-params
W
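To make that concrete, a sketch of the usual ways to set an MCA parameter; the
tuned-collective parameter names and values below are illustrative and should be
verified with ompi_info, and ./app with its process count is a placeholder:

  # 1) On the command line (quick tests):
  mpirun --mca coll_tuned_use_dynamic_rules 1 \
         --mca coll_tuned_alltoall_algorithm 2 -np 64 ./app

  # 2) Via environment variables:
  export OMPI_MCA_coll_tuned_use_dynamic_rules=1
  export OMPI_MCA_coll_tuned_alltoall_algorithm=2

  # 3) In $HOME/.openmpi/mca-params.conf (one "name = value" per line):
  #      coll_tuned_use_dynamic_rules = 1
  #      coll_tuned_alltoall_algorithm = 2

  # List the tuned-collective parameters and their allowed values:
  ompi_info --param coll tuned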
On 12/19/2012 11:04 AM, Siegmar Gross wrote:
Hi
On 12/18/2012 6:55 PM, Jeff Squyres wrote:
...but only of v1.6.x.
okay, adding the development version to my Christmas wishlist
;-)
Can you build the package with thread and Java support?
--enable-mpi-java \
--enable-opal-multi-threads \
-
Hi
I shortened this email so that you get to my comments sooner.
> > In my opinion Datatype.Vector must set the size of the
> > base datatype as the extent of the vector, and not the true extent,
> > because MPI-Java doesn't provide a function to resize a datatype.
>
> No, I think Datatype.Vector is
Seems like the driver was not started. I would suggest running lspci and
checking whether the HCA is visible at the HW level.
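A minimal sketch of that check (the grep pattern assumes a Mellanox HCA; adjust
it for other vendors):

  # Is the HCA visible on the PCI bus at all?
  lspci | grep -i -E 'infiniband|mellanox'

  # If it shows up here but /sys/class/infiniband is empty,
  # the problem is most likely on the driver (software) side.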
Pavel (Pasha) Shamis
---
Computer Science Research Group
Computer Science and Math Division
Oak Ridge National Laboratory
On Dec 19, 2012, at 2:12 AM, Syed Ahsan Ali wrote:
Dear Joh
On Wednesday, 19 December 2012 at 12:12 +0500, Syed Ahsan Ali wrote:
> Dear John
>
> I found this output of ibstatus on some nodes (most probably the
> cause of the problem):
> [root@compute-01-08 ~]# ibstatus
>
> Fatal error: device '*': sys files not found
> (/sys/class/infiniband/*/ports)
>
> Do
Jeff, others:
I fixed the problem we were experiencing by adding a barrier.
The bug occurred between a piece of code that uses (many, over a loop) SEND
(from the leader)
and RECV (in the worker processes) to ship data to the
processing nodes from the head / leader, and I think what might have be
On 19/12/12 11:08, Paul Kapinos wrote:
Did you *really* want to dig into the code just in order to switch a
default communication algorithm?
No, I didn't want to, but with a huge change in performance, I'm forced
to do something! And having looked at the different algorithms, I think
there's a p
On 12/19/2012 12:28 PM, marco atzeri wrote:
working on openmpi-1.7rc5.
It needs some cleaning and afterwards I need to test.
It built and passed the test:
http://www.open-mpi.org/community/lists/devel/2012/12/11855.php
Regards
Marco
I figured this out.
ssh was working, but scp was not, due to an MTU mismatch between the
systems. Adding MTU=1500 to my
/etc/sysconfig/network-scripts/ifcfg-eth2 fixed the problem.
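Roughly what the relevant part of that file looks like after the change; every
value except DEVICE and MTU is a site-specific placeholder:

  # /etc/sysconfig/network-scripts/ifcfg-eth2
  DEVICE=eth2
  BOOTPROTO=static
  ONBOOT=yes
  MTU=1500

  # Apply without rebooting (RHEL/CentOS style):
  ifdown eth2 && ifup eth2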
Dan
On 12/17/2012 04:12 PM, Daniel Davidson wrote:
Yes, it does.
Dan
[root@compute-2-1 ~]# ssh compute-2-0
W
Hooray!! Great to hear - I was running out of ideas :-)
On Dec 19, 2012, at 2:01 PM, Daniel Davidson wrote:
> I figured this out.
>
> ssh was working, but scp was not, due to an MTU mismatch between the systems.
> Adding MTU=1500 to my /etc/sysconfig/network-scripts/ifcfg-eth2 fixed the
> pro
Using Open MPI 1.6.2 with Intel 13.0, though the problem is not specific to
the compiler.
Using two 12-core, two-socket nodes:
mpirun -np 4 -npersocket 2 uptime
--
Your job has requested a conflicting number of processes for the
a
I'm afraid these are both known problems in the 1.6.2 release. I believe we
fixed npersocket in 1.6.3, though you might check to be sure. On the
large-scale issue, cpus-per-rank might well fail under those conditions. The
algorithm in the 1.6 series hasn't seen much use, especially at scale.
In
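A quick way to confirm which Open MPI release is actually on the PATH before
re-testing -npersocket (assuming ompi_info and mpirun come from the same
installation):

  ompi_info | grep "Open MPI:"   # shows the installed release number
  mpirun --version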