Hello, I've corrected the syntax and added the flag you suggested, but
unfortunately the result doesn't change.

randori ~ # mpirun --display-map --mca btl tcp,self -np 2 -host randori,tatami graph
[randori:22322]  Map for job: 1    Generated by mapping mode: byslot
     Starting vpid: 0    Vpid range: 2    Num app_contexts: 1
     Data for app_context: index 0    app: graph
         Num procs: 2
         Argv[0]: graph
         Env[0]: OMPI_MCA_btl=tcp,self
         Env[1]: OMPI_MCA_rmaps_base_display_map=1
         Env[2]: OMPI_MCA_orte_precondition_transports=d45d47f6e1ed0e0b-691fd7f24609dec3
         Env[3]: OMPI_MCA_rds=proxy
         Env[4]: OMPI_MCA_ras=proxy
         Env[5]: OMPI_MCA_rmaps=proxy
         Env[6]: OMPI_MCA_pls=proxy
         Env[7]: OMPI_MCA_rmgr=proxy
         Working dir: /root (user: 0)
         Num maps: 1
         Data for app_context_map: Type: 1    Data: randori,tatami
     Num elements in nodes list: 2
     Mapped node:
         Cell: 0    Nodename: randori    Launch id: -1    Username: NULL
         Daemon name:
             Data type: ORTE_PROCESS_NAME    Data Value: NULL
         Oversubscribed: False    Num elements in procs list: 1
         Mapped proc:
             Proc Name:
             Data type: ORTE_PROCESS_NAME    Data Value: [0,1,0]
             Proc Rank: 0    Proc PID: 0    App_context index: 0

     Mapped node:
         Cell: 0    Nodename: tatami    Launch id: -1    Username: NULL
         Daemon name:
             Data type: ORTE_PROCESS_NAME    Data Value: NULL
         Oversubscribed: False    Num elements in procs list: 1
         Mapped proc:
             Proc Name:
             Data type: ORTE_PROCESS_NAME    Data Value: [0,1,1]
             Proc Rank: 1    Proc PID: 0    App_context index: 0
Master thread reporting
matrix size 33554432 kB, time is in [us]

(and then it just hangs)
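
For reference, a stripped-down two-rank ping-pong that exercises the same
MPI_Send/MPI_Recv path (a minimal sketch, separate from the full benchmark
quoted below; it only assumes the same two hosts and the same tcp,self BTL
selection) looks like this:

/* pingpong.c - check that a single point-to-point message completes
 * over the tcp,self BTLs between the two hosts.
 * Build:  mpicc pingpong.c -o pingpong
 * Run:    mpirun --mca btl tcp,self -np 2 -host randori,tatami ./pingpong
 */
#include <mpi.h>
#include <stdio.h>

int main (int argc, char *argv[])
{
    int rank, size, token = 42;
    MPI_Status stat;

    MPI_Init (&argc, &argv);
    MPI_Comm_rank (MPI_COMM_WORLD, &rank);
    MPI_Comm_size (MPI_COMM_WORLD, &size);

    if (size < 2) {
        fprintf (stderr, "need at least 2 ranks\n");
        MPI_Abort (MPI_COMM_WORLD, 1);
    }

    if (rank == 0) {
        /* rank 0 sends first, then waits for the echo */
        MPI_Send (&token, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        MPI_Recv (&token, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &stat);
        printf ("rank 0: ping-pong completed, token = %d\n", token);
    } else if (rank == 1) {
        /* rank 1 echoes the message back */
        MPI_Recv (&token, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &stat);
        MPI_Send (&token, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    }

    MPI_Finalize ();
    return 0;
}

If this also hangs, that would point at the TCP setup between the nodes rather
than at the benchmark code.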

Vittorio

On Fri, Feb 27, 2009 at 6:00 PM, <users-requ...@open-mpi.org> wrote:

>
> Date: Fri, 27 Feb 2009 08:22:17 -0700
> From: Ralph Castain <r...@lanl.gov>
> Subject: Re: [OMPI users] TCP instead of openIB doesn't work
> To: Open MPI Users <us...@open-mpi.org>
> Message-ID: <e3c4683c-1f97-4558-ab68-006e39a83...@lanl.gov>
> Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes
>
> I'm not entirely sure what is causing the problem here, but one thing
> does stand out. You have specified two -host options for the same
> application - this is not our normal syntax. The usual way of
> specifying this would be:
>
> mpirun  --mca btl tcp,self  -np 2 -host randori,tatami hostname
>
> I'm not entirely sure what OMPI does when it gets two separate -host
> arguments - could be equivalent to the above syntax, but could also
> cause some unusual behavior.
>
> Could you retry your job with the revised syntax? Also, could you add
> --display-map to your mpirun cmd line? This will tell us where OMPI
> thinks the procs are going, and a little info about how it interpreted
> your cmd line.
>
> Thanks
> Ralph
>
>
> On Feb 27, 2009, at 8:00 AM, Vittorio Giovara wrote:
>
> > Hello, I'm posting here another problem with my installation.
> > I wanted to benchmark the differences between the tcp and openib transports.
> >
> > If I run a simple non-MPI application I get
> > randori ~ # mpirun --mca btl tcp,self -np 2 -host randori -host tatami hostname
> > randori
> > tatami
> >
> > but as soon as I switch to my benchmark program I get
> > mpirun  --mca btl tcp,self  -np 2 -host randori -host tatami graph
> > Master thread reporting
> > matrix size 33554432 kB, time is in [us]
> >
> > and instead of starting the send/receive functions it just hangs
> > there; I also checked the transmitted packets with Wireshark, but
> > after the handshake no more packets are exchanged.
> >
> > I read in the archives that there were some problems in this area,
> > so I tried what was suggested in previous emails:
> >
> > mpirun --mca btl ^openib -np 2 -host randori -host tatami graph
> > mpirun --mca pml ob1 --mca btl tcp,self -np 2 -host randori -host tatami graph
> >
> > These give exactly the same output as before (no MPI send/receive),
> > while the next command gives something more interesting:
> >
> > mpirun --mca pml cm --mca btl tcp,self -np 2 -host randori -host tatami graph
> >
> --------------------------------------------------------------------------
> > No available pml components were found!
> >
> > This means that there are no components of this type installed on your
> > system or all the components reported that they could not be used.
> >
> > This is a fatal error; your MPI process is likely to abort.  Check the
> > output of the "ompi_info" command and ensure that components of this
> > type are available on your system.  You may also wish to check the
> > value of the "component_path" MCA parameter and ensure that it has at
> > least one directory that contains valid MCA components.
> >
> >
> --------------------------------------------------------------------------
> > [tatami:06619] PML cm cannot be selected
> > mpirun noticed that job rank 0 with PID 6710 on node randori exited
> > on signal 15 (Terminated).
> >
> > which should not be possible, because if I run ompi_info --param all the
> > cm PML component is listed:
> >
> >                  MCA pml: cm (MCA v1.0, API v1.0, Component v1.2.8)
> >                  MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.2.8)
> >
> >
> > My test program is quite simple, just a couple of MPI_Send and
> > MPI_Recv calls (included just after the signature).
> > Do you have any ideas that might help me?
> > Thanks a lot,
> > Vittorio
> >
> > ========================
> > #include "mpi.h"
> > #include <stdio.h>
> > #include <stdlib.h>
> > #include <string.h>
> > #include <math.h>
> > #include <sys/time.h>   /* needed for gettimeofday() and struct timeval */
> >
> > #define M_COL 4096
> > #define M_ROW 524288
> > #define NUM_MSG 25
> >
> > unsigned long int  gigamatrix[M_ROW][M_COL];
> >
> > int main (int argc, char *argv[]) {
> >     int numtasks, rank, dest, source, rc, tmp, count, tag=1;
> >     unsigned long int  exp, exchanged;
> >     unsigned long int i, j, e;
> >     unsigned long matsize;
> >     MPI_Status Stat;
> >     struct timeval timing_start, timing_end;
> >     double inittime = 0;
> >     long int totaltime = 0;
> >
> >     MPI_Init (&argc, &argv);
> >     MPI_Comm_size (MPI_COMM_WORLD, &numtasks);
> >     MPI_Comm_rank (MPI_COMM_WORLD, &rank);
> >
> >
> >     if (rank == 0) {
> >         fprintf (stderr, "Master thread reporting\n");
> >         matsize = (long) M_COL * M_ROW / 64;
> >         fprintf (stderr, "matrix size %lu kB, time is in [us]\n", matsize);
> >
> >         source = 1;
> >         dest = 1;
> >
> >         /*warm up phase*/
> >         rc = MPI_Send (&tmp, 1, MPI_INT, dest, tag, MPI_COMM_WORLD);
> >         rc = MPI_Recv (&tmp, 1, MPI_INT, source, tag, MPI_COMM_WORLD, &Stat);
> >         rc = MPI_Send (&tmp, 1, MPI_INT, dest, tag, MPI_COMM_WORLD);
> >         rc = MPI_Send (&tmp, 1, MPI_INT, dest, tag, MPI_COMM_WORLD);
> >         rc = MPI_Recv (&tmp, 1, MPI_INT, source, tag, MPI_COMM_WORLD, &Stat);
> >         rc = MPI_Send (&tmp, 1, MPI_INT, dest, tag, MPI_COMM_WORLD);
> >
> >         for (e = 0; e < NUM_MSG; e++) {
> >             exp = pow (2, e);
> >             exchanged = 64 * exp;
> >
> >             /*timing of ops*/
> >             gettimeofday (&timing_start, NULL);
> >             rc = MPI_Send (&gigamatrix[0], exchanged, MPI_UNSIGNED_LONG, dest, tag, MPI_COMM_WORLD);
> >             rc = MPI_Recv (&gigamatrix[0], exchanged, MPI_UNSIGNED_LONG, source, tag, MPI_COMM_WORLD, &Stat);
> >             gettimeofday (&timing_end, NULL);
> >
> >             totaltime = (timing_end.tv_sec - timing_start.tv_sec) * 1000000
> >                         + (timing_end.tv_usec - timing_start.tv_usec);
> >             memset (&timing_start, 0, sizeof(struct timeval));
> >             memset (&timing_end, 0, sizeof(struct timeval));
> >             fprintf (stdout, "%lu kB\t%ld\n", exp, totaltime);
> >         }
> >
> >         fprintf(stderr, "task complete\n");
> >
> >     } else {
> >         if (rank >= 1) {
> >             dest = 0;
> >             source = 0;
> >
> >             rc = MPI_Recv (&tmp, 1, MPI_INT, source, tag, MPI_COMM_WORLD, &Stat);
> >             rc = MPI_Send (&tmp, 1, MPI_INT, dest, tag, MPI_COMM_WORLD);
> >             rc = MPI_Recv (&tmp, 1, MPI_INT, source, tag, MPI_COMM_WORLD, &Stat);
> >             rc = MPI_Recv (&tmp, 1, MPI_INT, source, tag, MPI_COMM_WORLD, &Stat);
> >             rc = MPI_Send (&tmp, 1, MPI_INT, dest, tag, MPI_COMM_WORLD);
> >             rc = MPI_Recv (&tmp, 1, MPI_INT, source, tag, MPI_COMM_WORLD, &Stat);
> >
> >             for (e = 0; e < NUM_MSG; e++) {
> >                 exp = pow (2, e);
> >                 exchanged = 64 * exp;
> >
> >                 rc = MPI_Recv (&gigamatrix[0], (unsigned) exchanged, MPI_UNSIGNED_LONG, source, tag, MPI_COMM_WORLD, &Stat);
> >                 rc = MPI_Send (&gigamatrix[0], (unsigned) exchanged, MPI_UNSIGNED_LONG, dest, tag, MPI_COMM_WORLD);
> >
> >             }
> >         }
> >     }
> >
> >     MPI_Finalize ();
> >
> >     return 0;
> > }
> >
> >
> > _______________________________________________
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>
