Hi
Add the following line to /etc/openmpi-mca-params.conf:
btl=^openib
- Original Message -
From: "Jeff Squyres"
To: "Open MPI Users"
Sent: Friday, January 11, 2008 12:32:10 AM (GMT+0330) Asia/Tehran
Subject: Re: [OMPI users] openib problems
This can mean that you have a user-level libibverbs and kernel mismatch.
Do any of the OFED sample programs work properly, or perhaps the
ibv_devinfo program? (ibv_devinfo should query the HCAs on your host
and list the status of all the ports)
On Jan 10, 2008, at 2:33 PM, Brock Palen wrote:
We updated RHEL4 a few days ago, and now we get the following
errors when trying to run on InfiniBand nodes with openmpi-1.2.3 and
openmpi-1.2.0:
[0,1,1]: OpenIB on host nyx397 was unable to find any HCAs.
Another transport will be used instead, although this may result in
lower performan
Hi Guys,
The alternative for the THREAD_MULTIPLE problem is to pass --mca
mpi_leave_pinned 1 to mpirun. This ensures a single RDMA operation, rather
than splitting the data into chunks of the maximum RDMA size (1 MB by default).
If your data size is small, say below 1 MB, the program will run well with
THREAD_MULTIPLE. P
Jeff, thanks for all the replies.
I hate to admit it, but at the moment we can't log onto the switch.
But the ibcheckerrors command returns nothing out of bounds, and I
think that command also checks the switch ports.
Thanks, we will do some tests
Brock Palen
Center for Advanced Computing
bro...@u
e-
> From: users-boun...@open-mpi.org
> [mailto:users-boun...@open-mpi.org] On Behalf Of Andrew Friedley
> Sent: Wednesday, November 28, 2007 9:36 AM
> To: Open MPI Users
> Subject: Re: [OMPI users] OpenIB problems
>
What value do you suggest then? I know I've seen the problem persist at
values of 14 and 16, and would rather be certain that this isn't going
to kill the job that just sat in the queue for a week.
Andrew
Jeff Squyres wrote:
Roland thought that the default value of 10 might be a bit too low and
that tuning it to be higher, particularly in apps that pound on a
single port, would probably be acceptable.
Tuning up to 20 is probably a bit overkill.
On Nov 27, 2007, at 3:54 PM, Jeff Squyres wrote:
BTW, Andrew is correct about the unit for btl_openib_ib_timeout and
that the value is simply passed down to the verbs library when making
an IB connection. Open MPI does nothing else with that value; it's an
IBTA-defined value.
The help message was wrong on the 1.2 branch for a while; I th
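On the question of units: as far as I know, the IBTA local ACK timeout that btl_openib_ib_timeout is passed down as is an exponent, with the actual timeout being 4.096 microseconds times 2 to the power of the value, so it is not a count of seconds. A minimal Python sketch of that conversion for the values mentioned in this thread (the helper name ib_timeout_seconds is made up for illustration and is not part of Open MPI or the verbs library):

```python
# Sketch only: convert an IBTA local ACK timeout exponent (the value given to
# btl_openib_ib_timeout) into seconds, assuming timeout = 4.096 us * 2**value.
# ib_timeout_seconds is a hypothetical helper, not an Open MPI or verbs API.

def ib_timeout_seconds(value: int) -> float:
    """Return the timeout in seconds for a 5-bit timeout exponent."""
    # 0 is treated specially by the spec (no timeout), so it is excluded here.
    if not 1 <= value <= 31:
        raise ValueError("timeout exponent must be in the range 1..31")
    return 4.096e-6 * (2 ** value)


if __name__ == "__main__":
    # Values discussed in the thread: the default (10), 14, 16, and 20.
    for v in (10, 14, 16, 20):
        print(f"btl_openib_ib_timeout={v}: {ib_timeout_seconds(v):.6f} s")
```

Under that assumption the default of 10 works out to roughly 4.2 ms and 20 to roughly 4.3 s, and each step up doubles the timeout.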
Sorry for jumping in late; the holiday and other travel prevented me
from getting to all my mail recently... :-\
Have you checked the counters on the subnet manager to see if any
other errors are occurring? It might be good to clear all the
counters, run the job, and see if the counters a
OK, I will open a case with Cisco.
Brock Palen
Center for Advanced Computing
bro...@umich.edu
(734)936-1985
On Nov 27, 2007, at 4:19 PM, Andrew Friedley wrote:
Brock Palen wrote:
What would be a place to look? Should this just be default then for
OMPI? ompi_info shows the default as 10 seconds? Is that right
'seconds' ?
The other IB guys can probably answer better than I can -- I'm not an
expert in this part of IB (or really any part I guess :). Not sure why a
On Nov 27, 2007, at 10:49 AM, Andrew Friedley wrote:
Brock Palen wrote:
On Nov 21, 2007, at 3:39 PM, Andrew Friedley wrote:
If this is what I think it is, try using this MCA parameter:
-mca btl_openib_ib_timeout 20
The user used this option and it allowed the run to complete.
You say it's an issue with the fabric, but ibshowerrors does not show any
problems.
Its to
Hi Andrew, Brock, and everyone else,
Andrew Friedley wrote:
If this is what I think it is, try using this MCA parameter:
-mca btl_openib_ib_timeout 20
Just FYI, in addition to the above, I retried using the gigabit links
('--mca btl tcp,self', right?) and that failed too, so at least in /m
Thanks,
We have asked the user to try that and let us know if it fails. I
will let the list know if this works.
Brock Palen
Center for Advanced Computing
bro...@umich.edu
(734)936-1985
On Nov 21, 2007, at 3:39 PM, Andrew Friedley wrote:
If this is what I think it is, try using this MCA par
Hi Brock
We have a user whose code keeps failing at a similar point in the
code. The errors (below) would make me think it's a fabric problem,
but ibcheckerrors is not returning any issues. He is using
openmpi-1.2.0 with OFED on RHEL4,
Strangely enough, I hit this exact problem about half an
If this is what I think it is, try using this MCA parameter:
-mca btl_openib_ib_timeout 20
If this fixes it, then it's an issue in the IB fabric itself, though I
don't fully understand what's going on. Someone else might be able to
explain in more detail.
Andrew
Brian Dobbins wrote:
Hi Brock
We have a user whose code keeps failing at a similar point in the
code. The errors (below) would make me think it's a fabric problem,
but ibcheckerrors is not returning any issues. He is using
openmpi-1.2.0 with OFED on RHEL4,
Strangely enough, I hit this exact problem about ha
We have a user whose code keeps failing at a similar point in the
code. The errors (below) would make me think it's a fabric problem,
but ibcheckerrors is not returning any issues. He is using
openmpi-1.2.0 with OFED on RHEL4,
Far field AIM propagators require(MB):1.441955566406250
Arra