> That text message is hard-coded (and apparently out of date); it
> does not show the current value.
>
> I agree that that is misleading. This error message needs to be
> improved.
OK, good to have that clarified, Jeff, cheers.
> This might suggest a hardware issue; let us know what you find.
> So the error output is not showing what you two think should be
> the default value, 20, but then nor is it showing what I think I
> have set it to globally, again, 20.
>
> But anyroad, what I wanted from this is confirmation that the output
> is telling me the value that the job was running with
Pasha writes:
> > Actually I'm surprised that the default value is 10. I think it
> > used to be 20
Jeff writes:
> FWIW, the default for the ib_timeout is 20 in both v1.4.x and v1.5.x.
>
> As Ralph said, ompi_info will show the current value -- not the default
> value. Of course, the current v
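For reference, a quick way to see the value your build is actually
using (assuming the ompi_info from the installation in question is
first in your PATH):

  # show the current (not default) openib timeout setting
  ompi_info --param btl openib | grep ib_timeout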
Ralph,
> Are you getting those messages from ompi_info? Or from an MPI app
>(and if so, what are you doing to get them)?
They're coming out of a user's application.
The reason I wanted to check what the errors are saying is
that things are still in testing mode wrt the IB kit: I have
recently seen some OpenIB timeout errors, with the following
reported:
* btl_openib_ib_retry_count - The number of times the sender will
attempt to retry (defaulted to 7, the maximum value).
* btl_openib_ib_timeout - The local ACK timeout parameter (defaulted
to 10). The actual
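For completeness, either parameter can be overridden per job; the
values below are illustrative, not recommendations (myprog as in the
script further down):

  # raise the ACK timeout for a single run
  mpirun --mca btl_openib_ib_timeout 20 -np 4 ./myprog
  # or set it in the environment for everything launched from this shell
  export OMPI_MCA_btl_openib_ib_timeout=20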
>> #!/bin/bash
>> #$ -cwd
>> #$ -j y
>> #$ -S /bin/bash
>> #$ -q all.q
>> #$ -pe orte 18
>> MPI_DIR=/home/jason/openmpi-1.4.3-install/bin
>> /home/jason/openmpi-1.4.3-install/bin/mpirun -np $NSLOTS myprog
> If you have SGE integration, you should not specify the number
> of slots requested on the mpirun command line
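For what it's worth, with tight SGE integration the quoted script
reduces to something like this (same paths as above, and MPI_DIR is
now actually used):

  #!/bin/bash
  #$ -cwd
  #$ -j y
  #$ -S /bin/bash
  #$ -q all.q
  #$ -pe orte 18
  # mpirun picks the slot count up from the orte parallel environment,
  # so -np $NSLOTS can be dropped
  MPI_DIR=/home/jason/openmpi-1.4.3-install/bin
  $MPI_DIR/mpirun myprog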
> It's not hard to test whether or not SELinux is the problem. You can
> turn SELinux off on the command-line with this command:
>
> setenforce 0
>
> Of course, you need to be root in order to do this.
>
> After turning SELinux off, you can try reproducing the error. If it
> still occurs, it's SEL
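So the whole test cycle, as root, looks something like the following
(substitute whatever actually reproduces the error for the mpirun
line; note that setenforce 0 only switches SELinux to permissive mode
and does not persist across reboots):

  getenforce             # check the current mode
  setenforce 0           # switch to permissive
  mpirun -np 2 ./myprog  # try to reproduce the error
  setenforce 1           # restore enforcing mode afterwards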
This probably shows my lack of understanding as to how OpenMPI
negotiates the connectivity between nodes when given a choice
of interfaces but anyway:
Does dasher have any network interfaces that vixen does not?
The scenario I am imagining would be that you ssh into dasher
from vixen using a "n
Just touting around for any experiences with the following
combination (if it's already out there somewhere?) ahead
of fully spec-ing a required software stack:
Mellanox Connect-X HCAs talking through
a Voltaire ISR4036 IB QDR switch
RHEL (yep, not the usual NetBSD!)
OFED (built with
> Here's what seems to be a solution that works for SuSE. May be
> something similar for other systems:
>
>1) Edit the file /etc/sysconfig/SuSEfirewall2
>2) Look for the keyword FW_TRUSTED_NETS
>3) Add the IP addresses of your internal machines there. The format
> for multiple
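A sketch of the result (the addresses are placeholders; from memory
the entries are space-separated, but check the comments in the file
itself for the full syntax):

  # /etc/sysconfig/SuSEfirewall2
  FW_TRUSTED_NETS="192.168.0.10 192.168.0.11 192.168.0.0/24"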
> This is basic fortran programming issue, you may want to consult
> some fortran programming book.
>
> A.Chan
It is more an issue with understanding the usual implementations
of the MPI Fortran bindings, namely, having to include mpif.h in
ALL procedures that wish to make use of the MPI variables
(MPI_INTEGER, MPI_COMM_WORLD, and so on).
> I think the problem is that you didn't include mpif.h in testsubr().
> Hence, the value of MPI_INTEGER was undefined -- I don't think it's a
> problem with the value of MPI_Comm.
That's correct.
You also don't then need to pass MPI_COMM_WORLD around; it is a
parameter defined in mpif-common.h
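A minimal illustration of both points (testsubr named as in the
original post; with the Fortran 90 bindings you would "use mpi"
instead of the include):

      program main
      implicit none
      include 'mpif.h'     ! defines MPI_INTEGER, MPI_COMM_WORLD, ...
      integer ierr
      call MPI_INIT(ierr)
      call testsubr()
      call MPI_FINALIZE(ierr)
      end program main

      subroutine testsubr()
      implicit none
      include 'mpif.h'     ! needed here too, otherwise MPI_INTEGER
      integer buf(1), ierr ! is undefined in this scope
      buf(1) = 42
      call MPI_BCAST(buf, 1, MPI_INTEGER, 0, MPI_COMM_WORLD, ierr)
      end subroutine testsubr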
Beyond what's documented at the FAQ (Questions 20 and 21)
http://www.open-mpi.org/faq/?category=building#build-sun-compilers
is there anything else worth tweaking for building on a
SPARC-Enterprise-T5120 with the June 2010 Express compiler
suite ?
Perhaps, instead of
-xtarget=ultra3
one should
A whole swathe of people have been made aware of the issues
that have arisen as a result of a researcher here looking to
run PISM, which sits on top of PETSc, which sits on top of
OpenMPI.
I am happy to be able to inform you that the problems we were
seeing would seem to have been arising down at
> You could confirm that it is the IPv6 loop by simply disabling IPv6
> support - configure with --disable-ipv6 and see if you still get the error
> messages
>
> Thanks for continuing to pursue this!
> Ralph
>
Yeah, but if you disable the IPv6 stuff then there's a completely
different path taken
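For reference, trying the quoted suggestion is just a rebuild (the
prefix path is an example):

  ./configure --disable-ipv6 --prefix=$HOME/ompi-no-ipv6
  make all install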
> Just to say that I built the NetBSD OpenMPI 1.4 port from the CVS,
> so including all the recent work, and got the examples to run, albeit
> still with the:
>
> opal_sockaddr2str failed:Unknown error (return code 4)
>
> non-fatal errors.
>
> As promised, I'll do a bit more digging into this.
>> Are you going to upgrade the NetBSD port to build against OpenMPI 1.4
>> now that it's available? Might be a good time to check the fuzz in the
>> existing patches.
>
> http://pkgsrc-wip.cvs.sourceforge.net/viewvc/pkgsrc-wip/wip/openmpi/Makefile
Just to say that I built the NetBSD OpenMPI 1.4 p
>> 26a27
>>> CONFIGURE_ARGS+= --enable-contrib-no-build=vt
>>
>> I have no idea how NetBSD goes about resolving such clashes in the long
>> term though?
>
> I've disabled it the same way for this time, my local package differs
> from what's in wip:
>
> --- PLIST 3 Dec 2009 10:18:00 -
>> I attach a patch, but it doesn't work and I don't see where the
>> error lies now. It may be that I'm doing something stupid.
>> It produces working OpenMPI-1.3.4 package on Dragonfly though.
>
> Ok, I'll try and merge it in to the working stuff we have here.
> I, obviously, just #ifdef'd for N
OK, it works although there are some temporary errors.
This is the NetBSD wip openmpi package as downloaded from the
webCVS a couple of days ago but with my patches as detailed
before (I have not tried comparing yours with mine as yet)
and the removal of the compilation and install of the VampirTrace contrib.
Aleksej
Cc: to the OpenMPI list as the otfdump clash might be of interest
elsewhere.
> I attach a patch, but it doesn't work and I don't see where the
> error lies now. It may be that I'm doing something stupid.
> It produces working OpenMPI-1.3.4 package on Dragonfly though.
Ok, I'll try and m
>> I have actually already taken the IPv6 block and simply tried to
>> replace any IPv6 stuff with IPv4 "equivalents", eg:
>
> At the risk of showing a lot of ignorance, here's the block I cobbled
> together based on the IPv6 block.
>
> I have tried to keep it looking as close to the original IPv6
Oh bugger, I did miss the obvious.
The "old" code which I had ifdef'd out contained an actual
construction of the list itself.
OBJ_CONSTRUCT(&opal_if_list, opal_list_t);
If I make sure I do one of those, I now get a different
set of messages but we are back to running again.
mpirun -v -
> I would be leery of the hard-coded stuff.
Indeed, so I changed it to:
intf.if_mask = prefix(sin_addr->sin_addr.s_addr);
which seems to match what the "old" code was doing: still blowing
up though.
> Reason: the IPv6 code has been a continual source of trouble,
> while the IPv4 code has wor
> I believe this line is incorrect:
>
>>opal_list_append(&opal_if_list, (opal_list_item_t*)
>> intf_ptr);
>
> It needs to be
>
> opal_list_append(&opal_if_list, &intf_ptr->super);
Didn't seem to change things.
Any thoughts on the:
/*
* hardcoded netmask, adri
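Piecing together the fragments quoted in this thread, the IPv4 block
under discussion amounts to roughly the following (a sketch only: the
surrounding interface-discovery loop is omitted, and the exact
opal_if_t fields vary between versions):

  /* construct the interface list itself first -- the step that was
     lost when the old code was #ifdef'd out */
  OBJ_CONSTRUCT(&opal_if_list, opal_list_t);

  /* for each IPv4 interface, derive the netmask from the address
     rather than hard-coding it */
  intf.if_mask = prefix(sin_addr->sin_addr.s_addr);

  /* append a copy to the list through its opal_list_item_t base
     ("super"), per the correction above */
  opal_list_append(&opal_if_list, &intf_ptr->super);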
> I have actually already taken the IPv6 block and simply tried to
> replace any IPv6 stuff with IPv4 "equivalents", eg:
At the risk of showing a lot of ignorance, here's the block I cobbled
together based on the IPv6 block.
I have tried to keep it looking as close to the original IPv6
block as p
> Given that it is working for us at the moment, and my current
> priorities, I doubt I'll get to this over the next 2-3 weeks.
> So if you have time and care to look at it before then, please
> do!
I have actually already taken the IPv6 block and simply tried to
replace any IPv6 stuff with IPv4
> Interesting - especially since the existing code works quite well over a
> wide range of platforms. So I'm not quite so eager to declare it incorrect
> and only working by accident.
>
> However, I would welcome a proposed patch so we can look at it. This is
> always an important area for us, so t
>> I assume that both of you have seen the reply from Aleksej Saushev,
>> who seems to be the bloke looking after the port of OpenMPI to the
>> NetBSD platform.
>>
>>
>> Aleksej suggested some mods he had partially looked at, in
>>
>> opal/util/if.c
>
> Nope - didn't see anything like that :-/
Aah
> "Jeff Squyres"
>
>
> Oy. This is ick, because this error code is coming from horrendously
> complex code deep in the depths of OMPI that is probing the OS to
> figure out what ethernet interfaces you have. It may or may not be
> simple to fix this.
>
> Do you mind diving into the OMPI code a
Hi there,
I recently compiled OpenMPI 1.3.3 for a NetBSD platform
as part of an attempt to get some MPI-based codes running
on the SGE cycle-stealing grid we have in the School here.
I should point out that this has not been done within the
pkgsrc build system as yet but that I found I was able t