> That text message is hard-coded (and apparently out of date); it
> does not show the current value.
>
> I agree that that is misleading. This error message needs to be
> improved.
OK, good to have that clarified, Jeff, cheers.
> This might suggest a hardware issue; let us know what you find.
> So the error output is not showing what you two think should be
> the default value, 20, but then nor is it showing what I think I
> have set it to globally, again, 20.
>
> But anyroad, what I wanted from this is confirmation that the output
> is telling me the value that the job was running with
Pasha writes:
> > Actually I'm surprised that the default value is 10. I think it
> > used to be 20
Jeff writes:
> FWIW, the default for the ib_timeout is 20 in both v1.4.x and v1.5.x.
>
> As Ralph said, ompi_info will show the current value -- not the default
> value. Of course, the current v
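For reference, a quick way to see the value your build is actually
using (assuming the ompi_info from the installation in question is
first in your PATH):

  # show the current (not default) openib timeout setting
  ompi_info --param btl openib | grep ib_timeout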
Ralph,
> Are you getting those messages from ompi_info? Or from an MPI app
>(and if so, what are you doing to get them)?
They're coming out of a user's application.
The reason I wanted to check what the errors are saying is
that things are still in testing mode wrt the IB kit: I have
recently seen some OpenIB timeout errors, with the following
reported:
* btl_openib_ib_retry_count - The number of times the sender will
attempt to retry (defaulted to 7, the maximum value).
* btl_openib_ib_timeout - The local ACK timeout parameter (defaulted
to 10). The actual
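For completeness, either parameter can be overridden per job; the
values below are illustrative, not recommendations (myprog as in the
script further down):

  # raise the ACK timeout for a single run
  mpirun --mca btl_openib_ib_timeout 20 -np 4 ./myprog
  # or set it in the environment for everything launched from this shell
  export OMPI_MCA_btl_openib_ib_timeout=20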
>> #!/bin/bash
>> #$ -cwd
>> #$ -j y
>> #$ -S /bin/bash
>> #$ -q all.q
>> #$ -pe orte 18
>> MPI_DIR=/home/jason/openmpi-1.4.3-install/bin
>> /home/jason/openmpi-1.4.3-install/bin/mpirun -np $NSLOTS myprog
> If you have SGE integration, you should not specify the number
> of slots requested on the mpirun command line
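For what it's worth, with tight SGE integration the quoted script
reduces to something like this (same paths as above, and MPI_DIR is
now actually used):

  #!/bin/bash
  #$ -cwd
  #$ -j y
  #$ -S /bin/bash
  #$ -q all.q
  #$ -pe orte 18
  # mpirun picks the slot count up from the orte parallel environment,
  # so -np $NSLOTS can be dropped
  MPI_DIR=/home/jason/openmpi-1.4.3-install/bin
  $MPI_DIR/mpirun myprog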
> It's not hard to test whether or not SELinux is the problem. You can
> turn SELinux off on the command-line with this command:
>
> setenforce 0
>
> Of course, you need to be root in order to do this.
>
> After turning SELinux off, you can try reproducing the error. If it
> still occurs, it's SEL
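So the whole test cycle, as root, looks something like the following
(substitute whatever actually reproduces the error for the mpirun
line; note that setenforce 0 only switches SELinux to permissive mode
and does not persist across reboots):

  getenforce             # check the current mode
  setenforce 0           # switch to permissive
  mpirun -np 2 ./myprog  # try to reproduce the error
  setenforce 1           # restore enforcing mode afterwards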
This probably shows my lack of understanding as to how OpenMPI
negotiates the connectivity between nodes when given a choice
of interfaces but anyway:
Does dasher have any network interfaces that vixen does not?
The scenario I am imagining would be that you ssh into dasher
from vixen using a "n
Just touting around for any experiences with the following
combination (if it's already out there somewhere?) ahead
of fully spec-ing a required software stack:
Mellanox Connect-X HCAs talking through
a Voltaire ISR4036 IB QDR switch
RHEL (yep, not the usual NetBSD!)
OFED (built with
> Here's what seems to be a solution that works for SuSE. May be
> something similar for other systems:
>
>1) Edit the file /etc/sysconfig/SuSEfirewall2
>2) Look for the keyword FW_TRUSTED_NETS
>3) Add the IP addresses of your internal machines there. The format
> for multiple
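A sketch of the result (the addresses are placeholders; from memory
the entries are space-separated, but check the comments in the file
itself for the full syntax):

  # /etc/sysconfig/SuSEfirewall2
  FW_TRUSTED_NETS="192.168.0.10 192.168.0.11 192.168.0.0/24"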
> This is basic fortran programming issue, you may want to consult
> some fortran programming book.
>
> A.Chan
It is more an issue with understanding the usual implementations
of the MPI Fortran bindings, namely, having to include mpif.h in
ALL procedures that wish to make use of the MPI variables
(MPI_INTEGER, MPI_COMM_WORLD, and so on).
> I think the problem is that you didn't include mpif.h in testsubr().
> Hence, the value of MPI_INTEGER was undefined -- I don't think it's a
> problem with the value of MPI_Comm.
That's correct.
You also don't then need to pass MPI_COMM_WORLD around; it is a
parameter defined in mpif-common.h
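A minimal illustration of both points (testsubr named as in the
original post; with the Fortran 90 bindings you would "use mpi"
instead of the include):

      program main
      implicit none
      include 'mpif.h'     ! defines MPI_INTEGER, MPI_COMM_WORLD, ...
      integer ierr
      call MPI_INIT(ierr)
      call testsubr()
      call MPI_FINALIZE(ierr)
      end program main

      subroutine testsubr()
      implicit none
      include 'mpif.h'     ! needed here too, otherwise MPI_INTEGER
      integer buf(1), ierr ! is undefined in this scope
      buf(1) = 42
      call MPI_BCAST(buf, 1, MPI_INTEGER, 0, MPI_COMM_WORLD, ierr)
      end subroutine testsubr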
Beyond what's documented at the FAQ (Questions 20 and 21)
http://www.open-mpi.org/faq/?category=building#build-sun-compilers
is there anything else worth tweaking for building on a
SPARC-Enterprise-T5120 with the June 2010 Express compiler
suite ?
Perhaps, instead of
-xtarget=ultra3
one should
A whole swathe of people have been made aware of the issues
that have arisen as a result of a researcher here looking to
run PISM, which sits on top of PETSc, which sits on top of
OpenMPI.
I am happy to be able to inform you that the problems we were
seeing would seem to have been arising down at
> You could confirm that it is the IPv6 loop by simply disabling IPv6
> support - configure with --disable-ipv6 and see if you still get the error
> messages
>
> Thanks for continuing to pursue this!
> Ralph
>
Yeah, but if you disable the IPv6 stuff then there's a completely
different path taken
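For reference, trying the quoted suggestion is just a rebuild (the
prefix path is an example):

  ./configure --disable-ipv6 --prefix=$HOME/ompi-no-ipv6
  make all install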
> Just to say that I built the NetBSD OpenMPI 1.4 port from the CVS,
> so including all the recent work, and got the examples to run, albeit
> still with the:
>
> opal_sockaddr2str failed:Unknown error (return code 4)
>
> non-fatal errors.
>
> As promised, I'll do a bit more digging into this.
>> Are you going to upgrade the NetBSD port to build against OpenMPI 1.4
>> now that it's available? Might be a good time to check the fuzz in the
>> existing patches.
>
> http://pkgsrc-wip.cvs.sourceforge.net/viewvc/pkgsrc-wip/wip/openmpi/Makefile
Just to say that I built the NetBSD OpenMPI 1.4 p
>> 26a27
>>> CONFIGURE_ARGS+= --enable-contrib-no-build=vt
>>
>> I have no idea how NetBSD goes about resolving such clashes in the long
>> term though?
>
> I've disabled it the same way for this time, my local package differs
> from what's in wip:
>
> --- PLIST 3 Dec 2009 10:18:00 -
>> I attach a patch, but it doesn't work and I don't see where the
>> error lies now. It may be that I'm doing something stupid.
>> It produces working OpenMPI-1.3.4 package on Dragonfly though.
>
> Ok, I'll try and merge it in to the working stuff we have here.
> I, obviously, just #ifdef'd for N
OK, it works although there are some temporary errors.
This is the NetBSD wip openmpi package as downloaded from the
webCVS a couple of days ago but with my patches as detailed
before (I have not tried comparing yours with mine as yet)
and the removal of the compilation and install of the VampirTrace contrib.
Aleksej
Cc: to the OpenMPI list as the otfdump clash might be of interest
elsewhere.
> I attach a patch, but it doesn't work and I don't see where the
> error lies now. It may be that I'm doing something stupid.
> It produces working OpenMPI-1.3.4 package on Dragonfly though.
Ok, I'll try and m
>> I have actually already taken the IPv6 block and simply tried to
>> replace any IPv6 stuff with IPv4 "equivalents", eg:
>
> At the risk of showing a lot of ignorance, here's the block I cobbled
> together based on the IPv6 block.
>
> I have tried to keep it looking as close to the original IPv6
Oh bugger, I did miss the obvious.
The "old" code which I had ifdef'd out contained an actual
construction of the list itself.
OBJ_CONSTRUCT(&opal_if_list, opal_list_t);
If I make sure I do one of those, I now get a different
set of messages but we are back to running again.
mpirun -v -
> I would be leery of the hard-coded stuff.
Indeed, so I changed it to:
intf.if_mask = prefix(sin_addr->sin_addr.s_addr);
which seems to match what the "old" code was doing: still blowing
up though.
> Reason: the IPv6 code has been a continual source of trouble,
> while the IPv4 code has wor
> I believe this line is incorrect:
>
>>opal_list_append(&opal_if_list, (opal_list_item_t*)
>> intf_ptr);
>
> It needs to be
>
> opal_list_append(&opal_if_list, &intf_ptr->super);
Didn't seem to change things.
Any thoughts on the:
/*
* hardcoded netmask, adri
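Piecing together the fragments quoted in this thread, the IPv4 block
under discussion amounts to roughly the following (a sketch only: the
surrounding interface-discovery loop is omitted, and the exact
opal_if_t fields vary between versions):

  /* construct the interface list itself first -- the step that was
     lost when the old code was #ifdef'd out */
  OBJ_CONSTRUCT(&opal_if_list, opal_list_t);

  /* for each IPv4 interface, derive the netmask from the address
     rather than hard-coding it */
  intf.if_mask = prefix(sin_addr->sin_addr.s_addr);

  /* append a copy to the list through its opal_list_item_t base
     ("super"), per the correction above */
  opal_list_append(&opal_if_list, &intf_ptr->super);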
> I have actually already taken the IPv6 block and simply tried to
> replace any IPv6 stuff with IPv4 "equivalents", eg:
At the risk of showing a lot of ignorance, here's the block I cobbled
together based on the IPv6 block.
I have tried to keep it looking as close to the original IPv6
block as p
> Given that it is working for us at the moment, and my current
> priorities, I doubt I'll get to this over the next 2-3 weeks.
> So if you have time and care to look at it before then, please
> do!
I have actually already taken the IPv6 block and simply tried to
replace any IPv6 stuff with IPv4
> Interesting - especially since the existing code works quite well over a
> wide range of platforms. So I'm not quite so eager to declare it incorrect
> and only working by accident.
>
> However, I would welcome a proposed patch so we can look at it. This is
> always an important area for us, so t
>> I assume that both of you have seen the reply from Aleksej Saushev,
>> who seems to be the bloke looking after the port of OpenMPI to the
>> NetBSD platform.
>>
>>
>> Aleksej suggested some mods he had partially looked at, in
>>
>> opal/util/if.c
>
> Nope - didn't see anything like that :-/
Aah
> "Jeff Squyres"
>
>
> Oy. This is ick, because this error code is coming from horrendously
> complex code deep in the depths of OMPI that is probing the OS to
> figure out what ethernet interfaces you have. It may or may not be
> simple to fix this.
>
> Do you mind diving into the OMPI code a
Hi there,
I recently compiled OpenMPI 1.3.3 for a NetBSD platform
as part of an attempt to get some MPI-based codes running
on the SGE cycle-stealing grid we have in the School here.
I should point out that this has not been done within the
pkgsrc build system as yet but that I found I was able t