Thanks for your help. kfc is the machine name and clement is the username
on this machine. Do you think that is the problem?
Then I tried removing the kfc machine and running again. This time I can run
the MPI program and there is no error message, but there is no program
output either. I think it is som
One minor thing that I notice in your ompi_info output -- your build
and run machines are different (kfc vs. clement).
Are these both FC4 machines, or are they different OS's/distros?
On Nov 10, 2005, at 10:01 AM, Clement Chu wrote:
[clement@kfc TestMPI]$ mpirun -d -np 2 test
[kfc:29199] procdir: (null)
The name of the launcher is "rsh", but it actually defaults to trying
to fork/exec ssh.
Unfortunately, your backtrace doesn't tell much because there are no
debugging symbols. Can you recompile OMPI with debugging enabled and
send a new backtrace? Use:
./configure CFLAGS=-g
Clarification on this -- my earlier response wasn't quite right...
We actually do not provide F90 bindings for MPI_Reduce (and several
other collectives) because they have 2 user-provided buffers. This
means that for N intrinsic types, there are N^2 possible overloads for
this function (because each of the two buffers can independently be any of
the N types).
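To make the two-buffer point concrete, here is a small C sketch (mine, not
from the thread): the C binding takes both user buffers as untyped
pointers, so a single prototype covers every datatype, whereas an explicit
F90 interface needs a concrete type for each buffer, and the two buffers
vary independently.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, local = 1, total = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    /* sendbuf and recvbuf are both int here, but the C prototype
     * (void *sendbuf, void *recvbuf, ...) would accept any types;
     * a strongly typed F90 interface cannot do that. */
    MPI_Reduce(&local, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("sum of ranks' values: %d\n", total);
    MPI_Finalize();
    return 0;
}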
Mike,
I believe this issue has been corrected on the trunk, and should
be in the next release candidate, probably by the end of the week.
Thanks,
Tim
Mike Houston wrote:
mpirun -mca btl_mvapi_rd_min 128 -mca btl_mvapi_rd_max 256 -np 2
-hostfile /u/mhouston/mpihosts mpi_bandwidth 21 131072
13
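For context, mpi_bandwidth-style tests boil down to a timed ping-pong
between two ranks. The sketch below is only an illustration of that
pattern (it is not the actual mpi_bandwidth source; the 131072-byte
message size is just borrowed from the command line above).

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    /* Run with exactly two ranks, e.g. mpirun -np 2 ... */
    const int reps = 100;
    const int size = 131072;            /* bytes per message */
    char *buf = (char *) malloc(size);
    int rank, i;
    double start, elapsed;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    start = MPI_Wtime();
    for (i = 0; i < reps; ++i) {
        if (rank == 0) {
            MPI_Send(buf, size, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, size, MPI_BYTE, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, size, MPI_BYTE, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(buf, size, MPI_BYTE, 0, 0, MPI_COMM_WORLD);
        }
    }
    elapsed = MPI_Wtime() - start;

    /* Each repetition moves the buffer there and back, hence 2 * size. */
    if (rank == 0)
        printf("approx. %.1f MB/s\n", 2.0 * reps * size / elapsed / 1.0e6);

    free(buf);
    MPI_Finalize();
    return 0;
}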
Here is the backtrace result (now I am using r8085):
Does mpirun start rsh? I think I need ssh instead of rsh.
[clement@kfc tmp]$ gdb mpirun core.17766
GNU gdb Red Hat Linux (6.3.0.0-1.21rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you
are welcome to change it and/or distribute copies of it under certain conditions.
[clement@kfc TestMPI]$ mpirun -d -np 2 test
[kfc:29199] procdir: (null)
[kfc:29199] jobdir: (null)
[kfc:29199] unidir: /tmp/openmpi-sessions-clement@kfc_0/default-universe
[kfc:29199] top: openmpi-sessions-clement@kfc_0
[kfc:29199] tmp: /tmp
[kfc:29199] [0,0,0] setting up session dir with
[kfc:291
I'm sorry -- I wasn't entirely clear:
1. Are you using a 1.0 nightly tarball or a 1.1 nightly tarball? We
have made a bunch of fixes to the 1.1 tree (i.e., the Subversion
trunk), but have not fully vetted them yet, so they have not yet been
taken to the 1.0 release branch. If you have no
I have tried the latest version (rc5, r8053), but the error is still there.
Jeff Squyres wrote:
We've actually made quite a few bug fixes since RC4 (RC5 is not
available yet). Would you mind trying with a nightly snapshot tarball?
(there were some SVN commits last night after the nightly snapshot was made)
Here's a core I'm getting...
[humphrey@zelda01 humphrey]$ mpiexec --mca btl_tcp_if_include eth0 --mca
oob_tcp_include eth0 -np 2 a.out
mpiexec noticed that job rank 1 with PID 20028 on node "localhost" exited on
signal 11.
1 process killed (possibly by Open MPI)
[humphrey@zelda01 humphrey]$ gdb
By the way, it just *feels* like a race condition somewhere, because the
very next invocation worked (I ctrl-C'd it)...
[humphrey@zelda01 humphrey]$ mpiexec -d --mca btl_tcp_if_include eth0 --mca
oob_tcp_include eth0 -np 2 a.out
[zelda01.localdomain:19923] procdir: (null)
[zelda01.localdomain:19
I'm not seeing any cores -- I'll see if there's anything stopping them from
being produced.
I've attached to one of the hanging "a.out"s (this is with the "mpiexec"
invocation that includes "--mca oob_tcp_include eth0")
(gdb) bt
#0 0x001e3007 in sched_yield () from /lib/tls/libc.so.6
#1 0x00512
Great Leaping Lizards, Batman!
Unbelievably, the MPI_Reduce interfaces were left out. I'm going to do
a complete F90 audit right now to ensure that no other interfaces were
unintentionally excluded; I'll commit a fix today.
Thanks for catching this!
On Nov 9, 2005, at 8:15 PM, Brent LEBACK wrote:
We've actually made quite a few bug fixes since RC4 (RC5 is not
available yet). Would you mind trying with a nightly snapshot tarball?
(there were some SVN commits last night after the nightly snapshot was
made; I've just initiated another snapshot build -- r8085 should be on
the web site
Hi,
I got an error when I tried mpirun on an MPI program. The following is
the error message:
[clement@kfc TestMPI]$ mpicc -g -o test main.c
[clement@kfc TestMPI]$ mpirun -np 2 test
mpirun noticed that job rank 1 with PID 0 on node "localhost" exited on
signal 11.
[kfc:28466] ERROR: A daemon on
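One way to narrow this down (my suggestion, not from the thread) is to
compile a trivial MPI program the same way -- mpicc -g -o test main.c,
then mpirun -np 2 test -- and see whether it also dies on signal 11; if it
does, the problem is in the Open MPI installation or launch setup rather
than in the application code. A minimal sketch:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank = -1, size = 0;

    /* Nothing application-specific here: just init, report, finalize. */
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("hello from rank %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}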