I'm testing a couple of applications with Open MPI v1.2b, using over 1000
processors, and am getting TCP errors. These apps ran fine with smaller
processor counts.
The errors differ from run to run. Here's one:
[blade90][0,1,223][../../../../../ompi/mca/btl/tcp/btl_tcp_endpoint.c:
For some reason, I am getting intermittent process crashes in
MPI_Bcast. I run my program, which distributes some data via lots
(thousands or more) of 64k MPI_Bcast calls. The program that is
crashing is fairly big, and it would take some time to whittle it down
to a small example program. I *am* willing
> -----Original Message-----
> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On
> Behalf Of Jeff Squyres
>
> Bummer. FWIW, we have some internal OMPI Fortran test codes that
> are difficult to get to compile uniformly across all Fortran
> compilers (the most difficult po
Yes, only the first segfault is fixed in the nightly builds. You can
run mx_endpoint_info to see how many endpoints are available and
whether any are in use.
As for the segfault you are seeing now, I am unsure what is
causing it. Hopefully someone who knows more about that area of the
code
Thanks Ralph. I will wait for your Torque dynamic host addition solution.
Prakash
>>> r...@lanl.gov 04/02/07 1:00 PM >>>
Hi Prakash
This is telling you that you have an error in the comm_spawn command itself.
I am no expert there, so I'll have to let someone else identify it for you.
There are no limits to launching on nodes in a hostfile - they are all
automatically considered "allocated" when the file is read. If
Hello,
Thanks for the patch. I still do not know the internals of Open MPI, so I
can't test this right away. But here is another test I ran that failed too.
I have now removed Torque from the equation. I am NOT requesting nodes through
Torque. I SSH to a compute node and start up the code as
No offense, but I would definitely advise against that path. There are
other, much simpler solutions to dynamically add hosts.
We *do* allow dynamic allocation changes - you just have to know how to do
them. Nobody asked before... ;-) Future variations will include an even
simpler, single-API solution
Ralph Castain wrote:
> The runtime underneath Open MPI (called OpenRTE) will not allow you to spawn
> processes on nodes outside of your allocation. This is for several reasons,
> but primarily because (a) we only know about the nodes that were allocated,
> so we have no idea how to spawn a proc
Thanks for the info, Ralph. It is as I thought, but I was hoping it
wouldn't be that way.
I am requesting more nodes from the resource manager from inside my
application code using the RM's API. When I know they are available
(allocated by the RM), I am trying to split the application data across
the
The runtime underneath Open MPI (called OpenRTE) will not allow you to spawn
processes on nodes outside of your allocation. This is for several reasons,
but primarily because (a) we only know about the nodes that were allocated,
so we have no idea how to spawn a process anywhere else, and (b) most
Hello,
I have built Open MPI (1.2) with run-time environment support for the Torque
(2.1.6) resource manager. Initially I am requesting 4 nodes (1 CPU each)
from Torque. Then, from inside my MPI code, I am trying to spawn more
processes onto nodes outside the Torque-assigned set using
MPI_Comm_spawn, b
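For reference, a minimal sketch of the kind of MPI_Comm_spawn call under
discussion, using the MPI-standard "host" info key; the child executable
name, host name, and process count are illustrative assumptions, not taken
from the poster's code:

#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Comm intercomm;
    MPI_Info info;
    int errcodes[2];

    MPI_Init(&argc, &argv);

    /* Ask for the children to run on a specific host (hypothetical name).
       Open MPI 1.2 refuses hosts outside the current allocation, which is
       what this thread is about. */
    MPI_Info_create(&info);
    MPI_Info_set(info, "host", "node05");

    /* Spawn 2 copies of a child executable (placeholder name). */
    MPI_Comm_spawn("./worker", MPI_ARGV_NULL, 2, info, 0,
                   MPI_COMM_SELF, &intercomm, errcodes);

    MPI_Info_free(&info);
    MPI_Finalize();
    return 0;
}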
Apologies if you received multiple copies of this posting.
Please feel free to distribute it to those who might be interested.
Hot In
On Apr 1, 2007, at 12:50 PM, de Almeida, Valmor F. wrote:
I would be interested in hearing folks' experiences with gfortran and
ompi-1.2. Is gfortran good enough for prime time?
It is my understanding that gfortran is the GNU path forward -- they
are not spending any time on g77. I've been
Hi Tim,
I installed the openmpi-1.2.1a0r14178 tarball (took this opportunity to
use the Intel Fortran compiler instead of gfortran). With a simple test it
seems to work, but I see the same messages:
->mpirun -np 8 -machinefile mymachines a.out
[x1:25417] mca_btl_mx_init: mx_open_endpoint() failed wit