Hi,
Administrators changed our cluster network topology, and now it has
narrowly-scoped netmasks for the internal and external parts of the cluster.
Of course my software stopped working, giving an error during MPI_Init, so
I checked the FAQ:
How does Open MPI know which TCP addresses are routable
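If the new netmasks make Open MPI decide that peers are unreachable, one
knob worth trying (the interface name eth0 below is a placeholder, not
something from the original posts) is to point the TCP BTL at an explicit
interface:

$ mpirun --mca btl_tcp_if_include eth0 --hostfile ./../hostfile -np 10 ./src/smallTest

The complementary parameter btl_tcp_if_exclude works the other way around,
listing interfaces to skip.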
I had almost the same situation when I upgraded Open MPI from a very old
version to 1.2.2. All processes seemed to be stuck in MPI_Barrier; as a
workaround I just commented out all MPI_Barrier occurrences in my
program and it started to work perfectly.
greets, Marcin
Chris Reeves wrote:
(This tim
Hello,
After a whole day of coding I'm struggling a bit with one small
fragment which seems strange to me.
For testing I have one head node and two worker nodes on localhost.
Having this code (with debug stuff added like sleeps, barriers, etc.):
void CImageData::SpreadToNodes()
{
sleep(5)
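The archive cut the function off here. Purely as an illustration (the
names, types and sizes below are my guesses, not Marcin's code), the
broadcast pattern such a SpreadToNodes() usually boils down to is:

#include <mpi.h>
#include <unistd.h>
#include <vector>

// Hypothetical sketch, not the original code: the head (root) fills
// the pixel buffer, then every rank calls MPI_Bcast with the same
// root, count and communicator.
void SpreadToNodes(std::vector<unsigned char>& pixels, int root)
{
    sleep(5);  // debug delay, as in the trimmed snippet

    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // Broadcast the size first so the workers can allocate.
    int count = static_cast<int>(pixels.size());
    MPI_Bcast(&count, 1, MPI_INT, root, MPI_COMM_WORLD);
    if (rank != root)
        pixels.resize(count);

    // Then broadcast the payload itself; after the first Bcast,
    // count is identical on every rank, so this branch matches.
    if (count > 0)
        MPI_Bcast(&pixels[0], count, MPI_UNSIGNED_CHAR, root, MPI_COMM_WORLD);

    MPI_Barrier(MPI_COMM_WORLD);  // debug barrier
}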
Sorry I forgot to mention: Open MPI version 1.2.4
Marcin Skoczylas wrote:
Hello,
After a whole day of coding I'm struggling a bit with one small
fragment which seems strange to me.
For testing I have one head node and two worker nodes on localhost.
Having this code (with debug stuff
shows the same behavior.
And yes, the Send and Recv can work independently of the
Broadcast, as they use different tags to match up their data.
Rolf
PS: Simple program at end of message.
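(Rolf's program got trimmed by the archive; the following is my own
minimal reconstruction of such a test, not his code. It needs at least
two ranks.)

#include <mpi.h>
#include <stdio.h>

/* A Send/Recv pair and a Broadcast in flight on the same
   communicator: the point-to-point message is matched by its tag
   (99), the collective by the order of calls on the communicator,
   so the two cannot steal each other's data. */
int main(int argc, char** argv)
{
    int rank, bval = 0, pval = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        bval = 42;
        pval = 7;
        MPI_Send(&pval, 1, MPI_INT, 1, 99, MPI_COMM_WORLD);
    }

    MPI_Bcast(&bval, 1, MPI_INT, 0, MPI_COMM_WORLD);

    if (rank == 1) {
        MPI_Recv(&pval, 1, MPI_INT, 0, 99, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1: bcast=%d, send/recv=%d\n", bval, pval);
    }

    MPI_Finalize();
    return 0;
}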
Marcin Skoczylas wrote:
Sorry I forgot to mention: Open MPI version 1.2.4
Marcin Skoczylas
Hello,
I'm having trouble running my software after our administrators changed
the cluster configuration. It was working perfectly before; however, now
I get these errors:
$ mpirun --hostfile ./../hostfile -np 10 ./src/smallTest
-
Jeff Squyres wrote:
On Oct 18, 2007, at 9:24 AM, Marcin Skoczylas wrote:
PML add procs failed
--> Returned "Unreachable" (-12) instead of "Success" (0)
--
*** An error occurred in MPI_
te network.
So in this configuration, one of the worker nodes became the head, and
the cluster's head is not being used at all.
That solved the problem.
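For reference, the kind of hostfile this implies (hostnames and slot
counts are placeholders, not the real cluster's): list only the worker
nodes, and with the default by-slot mapping the first one listed gets
rank 0, i.e. becomes the head:

# workers only -- the cluster's front-end node is left out
node01 slots=4
node02 slots=4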
Thank you for your support!
regards, Marcin Skoczylas
Open MPI 1.2.4
mpirun noticed that job rank 0 with PID 19021 on node pc801 exited on
signal 15 (Terminated).
11 additional processes aborted (not shown)
(gdb) bt
#0 0x411b776c in mca_pml_ob1_recv_frag_match () from
/usr/local/openmpi//lib/openmpi/mca_pml_ob1.so
#1 0x411ce010 in mca_btl_sm_co
…recursive calls to opal_progress by the SM BTL mean that the yield
within opal_progress (intended to give up the CPU to others) doesn't
always work on all OSes.
--td
--
Dear open-mpi users,
I saw, some posts ago, almost the same question as mine, but it didn't
give me a satisfactory answer.
I have a setup like this:
GUI program on some machine (e.g. a laptop)
Head listening on a TCP/IP socket for commands from the GUI
Workers waiting for commands from the Head / processing
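To make the shape of this concrete, here is a minimal sketch of the Head
side only (the port number, raw-int message format, and the lack of error
and byte-order handling are all my simplifications, not Marcin's code):
rank 0 accepts one GUI connection, reads a command, and broadcasts it to
the workers.

#include <mpi.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstdio>

int main(int argc, char** argv)
{
    MPI_Init(&argc, &argv);
    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int cmd = 0;
    if (rank == 0) {                      // Head
        int srv = socket(AF_INET, SOCK_STREAM, 0);
        sockaddr_in addr = {};
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = INADDR_ANY;
        addr.sin_port = htons(5000);      // placeholder port
        bind(srv, (sockaddr*)&addr, sizeof(addr));
        listen(srv, 1);
        int gui = accept(srv, 0, 0);      // block until the GUI connects
        read(gui, &cmd, sizeof(cmd));     // one raw int as the "command"
        close(gui);
        close(srv);
    }

    // Workers sit here until the Head has a command for them.
    MPI_Bcast(&cmd, 1, MPI_INT, 0, MPI_COMM_WORLD);
    if (rank != 0)
        printf("worker %d got command %d\n", rank, cmd);

    MPI_Finalize();
    return 0;
}

A real Head would of course loop on accept/read and use a sentinel
command to tell the workers to shut down.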
Brian Barrett wrote:
On Jul 5, 2006, at 8:54 AM, Marcin Skoczylas wrote:
I saw, some posts ago, almost the same question as mine, but it didn't
give me a satisfactory answer.
I have a setup like this:
GUI program on some machine (e.g. a laptop)
Head listening on a TCP/IP socket for commands
hello,
recently my administrator made some changes to our cluster and now I
have a crash during MPI_Barrier:
[our-host:12566] *** Process received signal ***
[our-host:12566] Signal: Segmentation fault (11)
[our-host:12566] Signal code: Address not mapped (1)
[our-host:12566] Failing at address
during the execution...
although a segfault really should not occur.
Thanks,
Jelena
On Tue, 29 May 2007, Marcin Skoczylas wrote:
hello,
recently my administrator made some changes to our cluster and now I
have a crash during MPI_Barrier:
[our-host:12566] *** Process received signal ***
[