Brett Pemberton wrote:
[[1176,1],0][btl_openib_component.c:2905:handle_wc] from
tango092.vpac.org to: tango090 error polling LP CQ with status RETRY
EXCEEDED ERROR status number 12 for wr_id 38996224 opcode 0 qp_idx 0
I've seen this error with Mellanox ConnectX cards and OFED 1.2.x with
al
Bogdan Costescu wrote:
Brett Pemberton wrote:
[[1176,1],0][btl_openib_component.c:2905:handle_wc] from
tango092.vpac.org to: tango090 error polling LP CQ with status RETRY
EXCEEDED ERROR status number 12 for wr_id 38996224 opcode 0 qp_idx 0
I've seen this error with Mellanox ConnectX cards
Hi,
I am trying to build openmpi 1.3 on CentOS with gcc and the Lahey f95
compiler with the following configuration:
./configure F77=/share/apps/lf6481/bin/lfc FC=/share/apps/lf6481/bin/lfc
--prefix=/opt/openmpi-1.3_lfc
When I "make install all" the process fails to build libmpi_f90.la
Hi,
In one of my applications I am using cascaded derived MPI datatypes
created with MPI_Type_struct. One of these types is used to just send
a part (one MPI_Char) of a struct consisting of an int followed by two
chars, i.e. the int at the beginning is/should be ignored.
This works fine if I use
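For reference, a minimal sketch (not the original code; field names are hypothetical) of how such a partial-struct datatype can be built with the MPI-2 name MPI_Type_create_struct, assuming a struct of an int followed by two chars:

#include <mpi.h>
#include <stddef.h>

struct rec { int id; char a; char b; };

/* Build a datatype that transmits only the char 'a', skipping the leading int,
 * and resize it so that arrays of 'struct rec' stride correctly. */
MPI_Datatype make_partial_type(void)
{
    MPI_Datatype tmp, partial;
    int          blocklen[1] = { 1 };
    MPI_Aint     disp[1]     = { offsetof(struct rec, a) };
    MPI_Datatype types[1]    = { MPI_CHAR };

    MPI_Type_create_struct(1, blocklen, disp, types, &tmp);
    MPI_Type_create_resized(tmp, 0, sizeof(struct rec), &partial);
    MPI_Type_commit(&partial);
    MPI_Type_free(&tmp);
    return partial;   /* use as: MPI_Send(array, n, partial, dest, tag, comm) */
}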
Can you please send all the information listed here:
http://www.open-mpi.org/community/help/
On Feb 27, 2009, at 6:38 AM, Tiago Silva wrote:
Hi,
I am trying to build openmpi 1.3 on CentOS with gcc and the Lahey
f95 compiler with the following configuration:
./configure F77=/share/a
ok, here is the complete output in the tgz file attached. The output is
slightly different as I am now only using "make all" and not installing.
I did a full "make clean" and "rm -fr /*" and the
already exists but is empty.
Thanks
ts-output.tgz
Description: Binary data
Hi
It seems to me more like a timing issue.
All the runs end with something similar to
Exception Type: EXC_BAD_ACCESS (SIGSEGV)
Exception Codes: KERN_INVALID_ADDRESS at 0x45485308
Crashed Thread: 0
Thread 0 Crashed:
0 libSystem.B.dylib 0x95208f04 strcmp + 84
1 libopen-rte.
Found the problem: in orte_pls_xgrid_terminate_orteds,
orte_pls_base_get_active_daemons is being called as
orte_pls_base_get_active_daemons(&daemons, jobid)
when the correct way to call it is
orte_pls_base_get_active_daemons(&daemons, jobid, attrs)
Yours,
Ricardo
I just tried trunk-1.4a1r20458 and I did not see this error, although my
configuration was rather different. I ran across 100 2-CPU sparc nodes,
np=256, connected with TCP.
Hopefully George's comment helps out with this issue.
One other thought to see whether SGE has anything to do with thi
Hello, and thanks for both replies,
I've tried to run a non-MPI program, but I still measured some latency
before it starts, around 2 seconds this time.
SSH should be properly configured; in fact I can log in to both machines
without a password, and openmpi and mvapich use ssh by default.
i'v
Hello, I'm posting here another problem with my installation.
I wanted to benchmark the differences between the tcp and openib transports.
If I run a simple non-MPI application I get
randori ~ # mpirun --mca btl tcp,self -np 2 -host randori -host tatami
hostname
randori
tatami
but as soon as I switch to
I'm not entirely sure what is causing the problem here, but one thing
does stand out. You have specified two -host options for the same
application - this is not our normal syntax. The usual way of
specifying this would be:
mpirun --mca btl tcp,self -np 2 -host randori,tatami hostname
I'
2009/2/26 Brett Pemberton :
> [[1176,1],0][btl_openib_component.c:2905:handle_wc] from tango092.vpac.org
> to: tango090 error polling LP CQ with status RETRY EXCEEDED ERROR status
> number 12 for wr_id 38996224 opcode 0 qp_idx 0
What OS are you using? I've seen this error and many other Infiniban
On Fri, 2009-02-27 at 09:54 -0700, Matt Hughes wrote:
> 2009/2/26 Brett Pemberton :
> > [[1176,1],0][btl_openib_component.c:2905:handle_wc] from tango092.vpac.org
> > to: tango090 error polling LP CQ with status RETRY EXCEEDED ERROR status
> > number 12 for wr_id 38996224 opcode 0 qp_idx 0
>
> Wha
Usually "retry exceeded error" points to some network issues, like bad
cable or some bad connector. You may use ibdiagnet tool for the network
debug - *http://linux.die.net/man/1/ibdiagnet. *This tool is part of OFED.
Pasha
Brett Pemberton wrote:
Hey,
I've had a couple of errors recently, of
Hello
I am looking for a way to set an environment variable with a different value on
each node before running an MPI executable (not just export the environment
variable!).
Let's consider that I have a cluster with two nodes (n001 and n002) and I want
to set the environment variable GMON_OUT_PREFIX with d
2009/2/27 Nicolas Deladerriere :
> I am looking for a way to set an environment variable with a different value on
> each node before running an MPI executable (not just export the environment
> variable!)
I typically use a script for things like this. So instead of
specifying your executable directly
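For example, a minimal sketch of such a wrapper, written here in C for
concreteness (a shell script would do the same job); the file and program
names are hypothetical:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* mpirun launches this wrapper instead of the real program; the wrapper
 * sets GMON_OUT_PREFIX from the local hostname and then execs the real
 * executable given as argv[1], passing along the remaining arguments. */
int main(int argc, char **argv)
{
    char host[256];
    char prefix[320];

    if (argc < 2) {
        fprintf(stderr, "usage: %s program [args...]\n", argv[0]);
        return 1;
    }
    gethostname(host, sizeof(host));
    snprintf(prefix, sizeof(prefix), "gmon.%s", host);
    setenv("GMON_OUT_PREFIX", prefix, 1);

    execvp(argv[1], &argv[1]);
    perror("execvp");   /* only reached if the exec failed */
    return 1;
}

It would then be launched as, e.g.,
mpirun -np 2 -host n001,n002 ./gmon_wrapper ./my_mpi_program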
Dear All,
I am using intel lc_prof-11 (and its own mkl) and have built openmpi-1.3.1
with configure options: "FC=ifort F77=ifort CC=icc CXX=icpc". Then I have
built my application.
The linux box is a 2x amd64 quad. In the middle of running my application
(after some 15 iterations), I receive the
On Thu, Feb 26, 2009 at 08:27:15PM -0700, Justin wrote:
> Also the stable version of openmpi on Debian is 1.2.7rc2. Are there any
> known issues with this version and valgrind?
For a now-forgotten reason, I ditched the openmpi that comes on Debian
etch, and installed 1.2.8 in /usr/local.
HTH,
Do
Matt,
Thanks for your solution, but I thought about that, and it is not really
convenient in my configuration to change the executable on each node.
I would like to change only the mpirun command.
2009/2/27 Matt Hughes
>
> 2009/2/27 Nicolas Deladerriere :
> > I am looking for a way to set environm
I don't know if anyone has tried OMPI on HP-UX, sorry.
On Feb 26, 2009, at 9:14 AM, Nader wrote:
Hello,
Has anyone installed OMPI on an HP-UX system?
I do appreciate any info.
Best Regards.
Nader
With further investigation, I have reproduced this problem. I think I
was originally testing against a version that was not recent enough. I
do not see it with r20594 which is from February 19. So, something must
have happened over the last 8 days. I will try and narrow down the issue.
Rol
On Feb 27, 2009, at 12:09 PM, Åke Sandgren wrote:
We see these errors fairly frequently on our CentOS 5.2 system with
Mellanox InfiniHost III cards. The OFED stack is whatever CentOS 5.2
uses. Has anyone tested that with the 1.4 OFED stack?
FWIW, I have tested OMPI's openib BTL with sev
Hello, I've corrected the syntax and added the flag you suggested, but
unfortunately the result doesn't change.
randori ~ # mpirun --display-map --mca btl tcp,self -np 2 -host
randori,tatami graph
[randori:22322] Map for job: 1  Generated by mapping mode: byslot
  Starting vpid: 0  Vpid ra
Unfortunately, I think I have reproduced the problem as well -- with
SVN trunk HEAD (r20655):
[15:12] svbu-mpi:~/mpi % mpirun --mca bogus foo --bynode -np 2 uptime
[svbu-mpi.cisco.com:24112] [[62779,0],0] ORTE_ERROR_LOG: Data unpack
failed in file base/odls_base_default_fns.c at line 566
---
I notice the following:
- you're creating an *enormous* array on the stack. You might be
better off allocating it on the heap.
- the value of "exchanged" will quickly grow beyond 2^31 (i.e.,
INT_MAX), which is the max that the MPI API can handle. Bad Things can/
will happen beyond that value (i
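A minimal sketch of the first point (the array size here is hypothetical,
not taken from the original code):

#include <stdio.h>
#include <stdlib.h>

#define N (64 * 1024 * 1024)   /* hypothetical element count, ~512 MB of doubles */

int main(void)
{
    /* double buf[N];              an automatic array this large will likely
                                   overflow the stack and segfault */
    double *buf = malloc(N * sizeof *buf);   /* heap allocation instead */
    if (buf == NULL) {
        perror("malloc");
        return 1;
    }
    /* ... fill buf and hand it to the MPI calls as before ... */
    free(buf);
    return 0;
}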