Re: [OMPI users] mixing MX and TCP

2007-06-11 Thread Reese Faucette
! if( (status = mx_get_info( mx_btl->mx_endpoint, MX_LINE_SPEED, !&nic_id, sizeof(nic_id), &value, sizeof(int))) != MX_SUCCESS ) { yes, a NIC ID is required for this call because a host may have multiple NICs with

Re: [OMPI users] mixing MX and TCP

2007-06-01 Thread Reese Faucette
Just to brainstorm on this a little - the two different clusters will have different "mapper IDs", and this can be learned via the attached code snippet. As long as fma is the mapper (as opposed to the older, deprecated "gm_mapper" or "mx_mapper"), then Myrinet topology rules ensure that NIC 0

Re: [OMPI users] running OpenMPI jobs over Myrinet gm interconnect

2007-02-06 Thread Reese Faucette
What version of GM are you running? # rpm -qa |egrep "^gm-[0-9]+|^gm-devel" gm-2.0.24-1 gm-devel-2.0.24-1 Is this too old? Nope, that's just fine. A mismatch between the list of nodes actually configured onto the Myrinet fabric and the machine file is a common source of errors like this. T

Re: [OMPI users] running OpenMPI jobs over Myrinet gm interconnect

2007-02-05 Thread Reese Faucette
I'm having difficulty with running a simple hello world OpenMPI program over Myrinet gm interconnect - please see the log at the end of this email. The error is tripped by a call to the function gm_global_id_to_node_id( gm_btl->port, gm_endpoint->endpoint_addr.global_id, &gm_e

Re: [OMPI users] coredump in guess_strlen()

2007-01-24 Thread Reese Faucette
[repost - apologies, apparently my first one was unintentionally a followup to another thread] If you ever do an opal_output() with a "%p" in the format string, guess_strlen() can segfault because it neglects to consume the corresponding argument, causing subsequent "%s" in the same format strin

[OMPI users] coredump in guess_strlen()

2007-01-24 Thread Reese Faucette
If you ever do an opal_output() with a "%p" in the format string, guess_strlen() can segfault because it neglects to consume the corresponding argument, causing subsequent "%s" in the same format string to blow up in strlen() on a bad address. Any objections to the following patch to add %p su
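To illustrate the failure mode described above, here is a minimal sketch of a format-string length estimator. It is not the Open MPI source: the names estimate_len and estimate_len_v, the set of conversions handled, and the per-conversion size allowances are all assumptions made for illustration. The point is only that every conversion, including "%p", has to consume its argument with va_arg(), otherwise a later "%s" reads the wrong argument and strlen() is called on a bad address.

```c
#include <stdarg.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified estimator (not the real guess_strlen()).
 * Returns an upper bound on the bytes needed to format fmt. */
static size_t estimate_len_v(const char *fmt, va_list ap)
{
    size_t len = strlen(fmt) + 1;            /* literal text plus NUL */

    for (const char *p = fmt; *p != '\0'; ++p) {
        if (*p != '%' || p[1] == '\0') {
            continue;
        }
        ++p;                                 /* look at the conversion char */
        switch (*p) {
        case 's': {
            const char *s = va_arg(ap, const char *);
            len += (s != NULL) ? strlen(s) : 6;   /* room for "(null)" */
            break;
        }
        case 'd':
        case 'c':
            (void) va_arg(ap, int);          /* consume the argument */
            len += 12;                       /* generous allowance (assumed) */
            break;
        case 'p':
            /* The case the post is about: consume the pointer so that
             * any following "%s" sees its own argument. */
            (void) va_arg(ap, void *);
            len += 2 + 2 * sizeof(void *);   /* "0x" plus hex digits */
            break;
        default:                             /* "%%" and anything unhandled */
            break;
        }
    }
    return len;
}

static size_t estimate_len(const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    size_t n = estimate_len_v(fmt, ap);
    va_end(ap);
    return n;
}
```

With the 'p' case removed, a call such as estimate_len("%p %s", ptr, "hello") would hand the pointer to the "%s" branch, which is the crash described in the post.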

Re: [OMPI users] Ompi failing on mx only

2007-01-08 Thread Reese Faucette
Right, that's the maximum number of open MX channels, i.e. processes that can run on the node using MX. With MX (1.2.0c I think), I get weird messages if I run a second mpirun quickly after the first one failed. The myrinet guys, I'm quite sure, can explain why and how. Somehow, when an application

Re: [OMPI users] Ompi failing on mx only

2007-01-08 Thread Reese Faucette
Second thing. From one of your previous emails, I see that MX is configured with 4 instances per node. You're running with exactly 4 processes on the first 2 nodes. Weird things might happen ... 4 processes per node will be just fine. This is not like GM where the 4 includes some "reserved" port

Re: [OMPI users] Ompi failing on mx only

2007-01-03 Thread Reese Faucette
$ mpirun --prefix /usr/local/openmpi-1.2b2 --hostfile ./h1-3 -np 1 --mca btl mx,sm,self ./cpi [node-1:09704] mca: base: component_find: unable to open mtl mx: file not found (ignored) [node-1:09704] mca: base: component_find: unable to open btl mx: file not found (ignored) This in particular is

Re: [OMPI users] Ompi failing on mx only

2007-01-02 Thread Reese Faucette
As for the MTL, there is a bug in the MX MTL for v1.2 that has been fixed, but after 1.2b2 ... oops, i was stupidly assuming he already had that fix. yes, this is an important fix... -reese

Re: [OMPI users] Ompi failing on mx only

2007-01-02 Thread Reese Faucette
> I've attached the ompi_info from node-1 and node-2. thanks, but i need "mx_info", not "ompi_info" ;-) But now that you mention mapper, I take it that's what SEGV_MAPERR might be referring to. this is an ompi red herring; it has nothing to do with Myrinet mapping, even

Re: [OMPI users] Ompi failing on mx only

2007-01-02 Thread Reese Faucette
Hi, Gary- This looks like a config problem, and not a code problem yet. Could you send the output of mx_info from node-1 and from node-2? Also, forgive me counter-asking a possibly dumb OMPI question, but is "-x LD_LIBRARY_PATH" really what you want, as opposed to "-x LD

Re: [OMPI users] mpool_gm_module error

2006-12-12 Thread Reese Faucette
Well I have no luck in finding a way to up the amount the system will allow GM to use. What is a recommended solution? Is this even a problem in most cases? Like am i encountering a corner case? upping the limit was not what i'm suggesting as a fix, just pointing out that it is kind of low an

Re: [OMPI users] mpool_gm_module error

2006-12-11 Thread Reese Faucette
GM: gm_register_memory will be able to lock XXX pages (YYY MBytes) Is there a way to tell GM to pull more memory from the system? GM reserves all IOMMU space that the OS is willing to give it, so what is needed is a way to tell the OS and/or machine to allow a bigger chunk of IOMMU space to b

Re: [OMPI users] mpool_gm_module error

2006-12-11 Thread Reese Faucette
Also I have no idea what the memory window question is, i will look it up on google. aon075:~ root# dmesg | grep GM GM: gm_register_memory will be able to lock 96000 pages (375 MBytes) This just answered it - there is 375MB available for GM to register, which is the IOMMU window size available
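As a quick check of the figures in that dmesg line, the page count converts to the reported size if one assumes 4 KiB pages. The page size is an assumption here (the log does not state it), but it is the value that makes the two numbers agree: 96000 * 4096 bytes = 393,216,000 bytes = 375 MBytes. The helper name below is hypothetical.

```c
#include <stdint.h>

/* Assumption: 4 KiB pages. The dmesg line does not state the page size;
 * this value is chosen because it reproduces the reported figure. */
#define ASSUMED_PAGE_BYTES 4096u

/* Convert a count of lockable pages into whole MBytes (2^20 bytes). */
static uint64_t pages_to_mbytes(uint64_t pages)
{
    return pages * ASSUMED_PAGE_BYTES / (1024u * 1024u);
}
```

Dividing that budget by the number of MPI processes started per node gives a rough idea of how much memory each process can register.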

Re: [OMPI users] mpool_gm_module error

2006-12-11 Thread Reese Faucette
I have tried moving around machines that the run is done on to the same result in multiple places. The error is: [aon049.engin.umich.edu:21866] [mpool_gm_module.c:100] error(8) registering gm memory This is on a PPC-based OSX system? How many MPI processes per node are you starting? And I as

Re: [OMPI users] myrinet problems on OSX

2006-12-07 Thread Reese Faucette
This is due to a problem in (void *)->(uint64_t) conversion in OMPI. The following patch fixes the problem, as would an appropriate cast of pval, I suspect. The problem is an inappropriate use of ompi_ptr_t. I would guess that other uses of lval might be suspect also (such as in the Portals c
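A minimal sketch of the kind of conversion being discussed, under stated assumptions. The union demo_ptr_t is a hypothetical stand-in that only borrows the field names lval and pval from the post; it is not the actual ompi_ptr_t definition, and the patch from the thread is not reproduced here. The idea: where pointers are 32 bits wide, storing through pval fills only half of the 64-bit lval, so reading lval afterwards gives a value that depends on byte order and on whatever was in the other half. Converting explicitly through uintptr_t avoids that.

```c
#include <stdint.h>

/* Hypothetical stand-in for a pointer/integer union; only the field
 * names are taken from the post. */
typedef union {
    uint64_t lval;
    void    *pval;
} demo_ptr_t;

/* Explicit, well-defined pointer -> 64-bit integer conversion. */
static uint64_t ptr_to_u64(const void *p)
{
    return (uint64_t)(uintptr_t)p;
}

/* Inverse conversion, valid for values produced by ptr_to_u64(). */
static void *u64_to_ptr(uint64_t v)
{
    return (void *)(uintptr_t)v;
}

/* Fill the union through lval so all 64 bits are defined, whether
 * pointers are 32 or 64 bits wide. */
static demo_ptr_t make_demo_ptr(void *p)
{
    demo_ptr_t d;
    d.lval = ptr_to_u64(p);
    return d;
}
```

A receiver that wants the pointer back should use u64_to_ptr(d.lval) rather than reading d.pval directly.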