Re: [OMPI users] IMB-MPI broadcast test stalls for large core counts: debug ideas?

2010-08-25 Thread Rahul Nabar
On Wed, Aug 25, 2010 at 6:41 AM, John Hearns wrote:
> You could sort that out with udev rules on each machine.
Sure. I'd always wanted consistent names for the eth interfaces when I set up the cluster but I couldn't get udev to co-operate. Maybe this time! Let me try.
> Look in the directory /et

Re: [OMPI users] communicate C++ STL strucutures ??

2010-08-25 Thread Cristobal Navarro
Regarding Boost::MPI, does this library serialize custom objects? This is the object I need to send: class Lattice{ int numNodes; int numEdges; map edges; map nodes; stringstream key; list< list > keyLists; bool c

Re: [OMPI users] IMB-MPI broadcast test stalls for large core counts: debug ideas?

2010-08-25 Thread Jeff Squyres
On Aug 24, 2010, at 6:26 PM, Rahul Nabar wrote:
>> Are all the eth0's on one subnet and all the eth2's on a different subnet?
>>
>> Or are all eth0's and eth2's all on the same subnet?
>
> Thanks Jeff! Different subnets. All 10GigE's are on 192.168.x.x and
> all 1GigE's are on 10.0.x.x
It would

Re: [OMPI users] IMB-MPI broadcast test stalls for large core counts: debug ideas?

2010-08-25 Thread Rahul Nabar
On Thu, Aug 19, 2010 at 9:03 PM, Rahul Nabar wrote:
> --
> gather:
>    NP256    hangs
>    NP128    hangs
>    NP64    hangs
>    NP32    OK
>
> Note: "gather" always hangs at the following line of the test:
>       #bytes #repetitio

[OMPI users] delivering SIGUSR2 to an ompi process

2010-08-25 Thread Steve Wise
Hey Open MPI wizards, I'm trying to debug something in my library that gets loaded into my mpi processes when they are started via mpirun. With other MPIs, I've been able to deliver SIGUSR2 to the process and trigger some debug code I have in my library that sets up a handler for SIGUSR2. Ho

Re: [OMPI users] IMB-MPI broadcast test stalls for large core counts: debug ideas?

2010-08-25 Thread John Hearns
On 24 August 2010 18:58, Rahul Nabar wrote:
> There are a few unusual things about the cluster. We are using a
> 10GigE ethernet fabric. Each node has dual eth adapters. One 1GigE and
> the other 10GigE. These are on separate subnets although the order of
> the eth interfaces is variable. i.e. 10G

Re: [OMPI users] OpenMPI with BLCR runtime problem

2010-08-25 Thread 陈文浩
I was so careless. The BLCR Admin Guide says: as root, load the kernel modules in this order:
# /sbin/insmod /usr/local/lib/blcr/2.6.12-1.234/blcr_imports.ko
# /sbin/insmod /usr/local/lib/blcr/2.6.12-1.234/blcr.ko
In the last email, I loaded the kernel modules in the wrong order. And I followed the o

Re: [OMPI users] OpenMPI with BLCR runtime problem

2010-08-25 Thread 陈文浩
Thank you very much for your advice, Josh. As you say, when I check 'lsmod | grep blcr' on blade02, nothing shows. That means no blcr module is inserted on blade02. I think that's the main reason why I can't C/R mpi programs on these two nodes. But here is the problem: I installed blcr under /opt/blcr