users-boun...@open-mpi.org wrote on 19/04/2012 12:42:44:

> From: Rohan Deshpande <rohan...@gmail.com>
> To: Open MPI Users <us...@open-mpi.org>
> Date: 19/04/2012 12:44
> Subject: Re: [OMPI users] machine exited on signal 11 (Segmentation fault).
> Sent by: users-boun...@open-mpi.org
> 
> No, I haven't tried using valgrind. 
> 
> Here is the latest code:
> 
> #include "mpi.h"
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #define  ARRAYSIZE 200000
> #define  MASTER 0
> 
> int  *data;
> 
> int main(int argc, char* argv[])
> {
> int   numtasks, taskid, rc, dest, offset, i, j, tag1, tag2, source, 
> chunksize, namelen; 
> int mysum, sum;
> int update(int myoffset, int chunk, int myid);
> char myname[MPI_MAX_PROCESSOR_NAME];
> MPI_Status status;
> double start, stop, time;
> double totaltime;
> FILE *fp;
> char line[128];
> char element;
> int n;
> int k=0;
> 
> /***** Initializations *****/
> MPI_Init(&argc, &argv);
> MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
> if (numtasks % 4 != 0) {
>    printf("Quitting. Number of MPI tasks must be divisible by 4.\n");
>    MPI_Abort(MPI_COMM_WORLD, rc);
>    MPI_Finalize();
>    }
> MPI_Comm_rank(MPI_COMM_WORLD,&taskid); 
> MPI_Get_processor_name(myname, &namelen);
> printf ("MPI task %d has started on host %s...\n", taskid, myname);
> chunksize = (ARRAYSIZE / numtasks);
> tag2 = 1;
> tag1 = 2;
> 
> /***** Master task only ******/
> if (taskid == MASTER){
>   
>  
>   /* Initialize the array */
>   data = malloc(ARRAYSIZE * sizeof(int));
>   if(NULL == data){
>     printf("Data null");
>   }


==> This array is used in the update subroutine, which is called by all 
processes, so this allocation should be done in the common code (see the 
sketch below).
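
==> A minimal sketch of that change, reusing only names already declared in 
the posted program (the malloc/perror error handling follows the pattern Jeff 
suggested earlier in the thread, quoted further down):

    /* common code, executed by every rank */
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
    MPI_Comm_rank(MPI_COMM_WORLD, &taskid);

    data = malloc(ARRAYSIZE * sizeof(int));   /* every rank needs its own full-size buffer */
    if (NULL == data) {
        perror("malloc");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    /* ... master fills data[] and sends chunks; workers receive into &data[offset] ... */

    free(data);
    MPI_Finalize();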


>   sum = 0;
>   for(i=0; i<ARRAYSIZE; i++) {
>     data[i] =  i * 1 ;
>     sum = sum + data[i];
>     }
>   printf("Initialized array sum = %d\n",sum);
> 
>   /* Send each task its portion of the array - master keeps 1st part */
>   offset = chunksize;
>   for (dest=1; dest<numtasks; dest++) {
>     MPI_Send(&offset, 1, MPI_INT, dest, tag1, MPI_COMM_WORLD);
>     MPI_Send(&data[offset], chunksize, MPI_INT, dest, tag2, MPI_COMM_WORLD);
>     printf("Sent %d elements to task %d offset= %d\n",chunksize,dest,offset);
>     offset = offset + chunksize;
>     }
> 
>   /* Master does its part of the work */
>   offset = 0;
>   mysum = update(offset, chunksize, taskid);
> 
>   /* Wait to receive results from each task */
>   for (i=1; i<numtasks; i++) {
>     source = i;
>     MPI_Recv(&offset, 1, MPI_INT, source, tag1, MPI_COMM_WORLD, &status);
>     MPI_Recv(&data[offset], chunksize, MPI_INT, source, tag2,
>       MPI_COMM_WORLD, &status);
>     }
> 
>   /* Get final sum and print sample results */  
>   MPI_Reduce(&mysum, &sum, 1, MPI_INT, MPI_SUM, MASTER, MPI_COMM_WORLD);
>  /* printf("Sample results: \n");
>   offset = 0;
>   for (i=0; i<numtasks; i++) {
>     for (j=0; j<5; j++) 
>       printf("  %d",data[offset+j]);
>     printf("\n");
>     offset = offset + chunksize;
>     }*/
>   printf("\n*** Final sum= %d ***\n",sum);
> 
>   }  /* end of master section */
> 
> /***** Non-master tasks only *****/
> 
> if (taskid > MASTER) {
> 
>   /* Receive my portion of array from the master task */
>   start= MPI_Wtime();
>   source = MASTER;
>   MPI_Recv(&offset, 1, MPI_INT, source, tag1, MPI_COMM_WORLD, &status);
>   MPI_Recv(&data[offset], chunksize, MPI_INT, source, tag2, MPI_COMM_WORLD, &status);
> 
>   mysum = update(offset, chunksize, taskid);
>   stop = MPI_Wtime();
>   time = stop -start;
>   printf("time taken by process %d to receive elements and calculate own sum is = %lf seconds \n", taskid, time);
>   totaltime = totaltime + time;
> 
>   /* Send my results back to the master task */
>   dest = MASTER;
>   MPI_Send(&offset, 1, MPI_INT, dest, tag1, MPI_COMM_WORLD);
>   MPI_Send(&data[offset], chunksize, MPI_INT, MASTER, tag2, MPI_COMM_WORLD);
> 
>   MPI_Reduce(&mysum, &sum, 1, MPI_INT, MPI_SUM, MASTER, MPI_COMM_WORLD);
> 
>   } /* end of non-master */
> 
> // printf("Total time taken for distribution is -  %lf  seconds", totaltime);
> MPI_Finalize();
> 
> }   /* end of main */
> 
> int update(int myoffset, int chunk, int myid) {
>   int i,j; 
>   int mysum;
>   int mydata[myoffset+chunk];
>   /* Perform addition to each of my array elements and keep my sum */
>   mysum = 0;
>  /*  printf("task %d has elements:",myid);
>   for(j = myoffset; j<myoffset+chunk; j++){
>       printf("\t%d", data[j]);
>   }
>  printf("\n");*/
>   for(i=myoffset; i < myoffset + chunk; i++) {
>     
>     //data[i] = data[i] + i;
>     mysum = mysum + data[i];
>     }
>   printf("Task %d has sum = %d\n",myid,mysum);
>   return(mysum);
> }
> 
> On Thu, Apr 19, 2012 at 5:59 PM, Jeffrey Squyres <jsquy...@cisco.com> wrote:
> Send the most recent version of your code.
> 
> Have you tried running it through a memory-checking debugger, like valgrind?
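
==> With Open MPI, one way to do this is to run valgrind under mpirun, e.g. 
something like "mpirun -np 4 valgrind ./mpi_array" (the binary name mpi_array 
is taken from the backtrace quoted further down; the process count is only an 
example).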
> 
> 
> On Apr 19, 2012, at 4:24 AM, Rohan Deshpande wrote:
> 
> > Hi Pascal,
> >
> > The offset is received from the master task, so there is no need to initialize it for non-master tasks?
> >
> > I am not sure what you meant exactly.
> >
> > Thanks
> >
> >
> >
> > On Thu, Apr 19, 2012 at 3:36 PM, <pascal.dev...@bull.net> wrote:
> > I do not see where you initialize the offset on the "Non-master tasks". This could be the problem.
> >
> > Pascal
> >
> > users-boun...@open-mpi.org wrote on 19/04/2012 09:18:31:
> >
> > > From: Rohan Deshpande <rohan...@gmail.com>
> > > To: Open MPI Users <us...@open-mpi.org>
> > > Date: 19/04/2012 09:18
> > > Subject: Re: [OMPI users] machine exited on signal 11 (Segmentation fault).
> > > Sent by: users-boun...@open-mpi.org
> > >
> > > Hi Jeffy,
> > >
> > > I checked the send/receive buffers and they look OK to me. The code I
> > > have sent works only when I statically initialize the array.
> > >
> > > The code fails every time I use malloc to initialize the array.
> > >
> > > Can you please look at the code and let me know what is wrong?
> >
> > > On Wed, Apr 18, 2012 at 8:11 PM, Jeffrey Squyres <jsquy...@cisco.com> wrote:
> > > As a guess, you're passing in a bad address.
> > >
> > > Double check the buffers that you're passing to MPI_SEND/MPI_RECV/etc.
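
==> A rough sketch of such a check in the posted code, reusing the variables 
data, offset, chunksize, taskid and ARRAYSIZE from the program above (the 
placement and the error handling are only suggestions):

    /* sanity-check the destination buffer before receiving into &data[offset] */
    if (data == NULL || offset < 0 || offset + chunksize > ARRAYSIZE) {
        fprintf(stderr, "rank %d: bad receive buffer (offset=%d, chunksize=%d)\n",
                taskid, offset, chunksize);
        MPI_Abort(MPI_COMM_WORLD, 1);
    }
    MPI_Recv(&data[offset], chunksize, MPI_INT, source, tag2, MPI_COMM_WORLD, &status);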
> > >
> > >
> > > On Apr 17, 2012, at 10:43 PM, Rohan Deshpande wrote:
> > >
> > > > After using malloc I am getting the following error:
> > > >
> > > >  *** Process received signal ***
> > > >  Signal: Segmentation fault (11)
> > > >  Signal code: Address not mapped (1)
> > > > Failing at address: 0x1312d08
> > > >  [ 0] [0x5e840c]
> > > > [ 1] /usr/local/lib/openmpi/mca_btl_tcp.so(+0x5bdb) [0x119bdb]
> > > >  [ 2] /usr/local/lib/libopen-pal.so.0(+0x19ce0) [0xb2cce0]
> > > >  [ 3] /usr/local/lib/libopen-pal.so.0(opal_event_loop+0x27) [0xb2cf47]
> > > >  [ 4] /usr/local/lib/libopen-pal.so.0(opal_progress+0xda) [0xb200ba]
> > > >  [ 5] /usr/local/lib/openmpi/mca_pml_ob1.so(+0x3f75) [0xa9ef75]
> > > >  [ 6] /usr/local/lib/libmpi.so.0(MPI_Recv+0x136) [0xea7c46]
> > > >  [ 7] mpi_array(main+0x501) [0x8048e25]
> > > > [ 8] /lib/libc.so.6(__libc_start_main+0xe6) [0x2fece6]
> > > >  [ 9] mpi_array() [0x8048891]
> > > >  *** End of error message ***
> > > > [machine4][[3968,1],0][btl_tcp_frag.c:216:mca_btl_tcp_frag_recv] mca_btl_tcp_frag_recv: readv failed: Connection reset by peer (104)
> > > > --------------------------------------------------------------------------
> > > > mpirun noticed that process rank 1 with PID 2936 on node machine4 exited on signal 11 (Segmentation fault).
> > > >
> > > > Can someone help, please?
> > > >
> > > > Thanks
> > > >
> > > >
> > > >
> > > > On Tue, Apr 17, 2012 at 6:01 PM, Jeffrey Squyres <jsquy...@cisco.com> wrote:
> > > > Try malloc'ing your array instead of creating it statically on the stack.  Something like:
> > > >
> > > > int *data;
> > > >
> > > > int main(..) {
> > > >    data = malloc(ARRAYSIZE * sizeof(int));
> > > >    if (NULL == data) {
> > > >        perror("malloc");
> > > >        exit(1);
> > > >    }
> > > >    // ...
> > > > }
> > > >
> > > >
> > > > On Apr 17, 2012, at 5:05 AM, Rohan Deshpande wrote:
> > > >
> > > > >
> > > > > Hi,
> > > > >
> > > > > I am trying to distribute a large amount of data using MPI.
> > > > >
> > > > > When I exceed a certain data size, a segmentation fault occurs.
> > > > >
> > > > > Here is my code,
> > > > >
> > > > >
> > > > > #include "mpi.h"
> > > > > #include <stdio.h>
> > > > > #include <stdlib.h>
> > > > > #include <string.h>
> > > > > #define  ARRAYSIZE    2000000
> > > > > #define  MASTER        0
> > > > >
> > > > > int  data[ARRAYSIZE];
> > > > >
> > > > >
> > > > > int main(int argc, char* argv[])
> > > > > {
> > > > > int   numtasks, taskid, rc, dest, offset, i, j, tag1, tag2, source, chunksize, namelen;
> > > > > int mysum, sum;
> > > > > int update(int myoffset, int chunk, int myid);
> > > > > char myname[MPI_MAX_PROCESSOR_NAME];
> > > > > MPI_Status status;
> > > > > double start, stop, time;
> > > > > double totaltime;
> > > > > FILE *fp;
> > > > > char line[128];
> > > > > char element;
> > > > > int n;
> > > > > int k=0;
> > > > >
> > > > >
> > > > >
> > > > > /***** Initializations *****/
> > > > > MPI_Init(&argc, &argv);
> > > > > MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
> > > > > MPI_Comm_rank(MPI_COMM_WORLD,&taskid);
> > > > > MPI_Get_processor_name(myname, &namelen);
> > > > > printf ("MPI task %d has started on host %s...\n", taskid, myname);
> > > > > chunksize = (ARRAYSIZE / numtasks);
> > > > > tag2 = 1;
> > > > > tag1 = 2;
> > > > >
> > > > >
> > > > > /***** Master task only ******/
> > > > > if (taskid == MASTER){
> > > > >
> > > > >    /* Initialize the array */
> > > > >   sum = 0;
> > > > >   for(i=0; i<ARRAYSIZE; i++) {
> > > > >     data[i] =  i * 1 ;
> > > > >     sum = sum + data[i];
> > > > >     }
> > > > >   printf("Initialized array sum = %d\n",sum);
> > > > >
> > > > >   /* Send each task its portion of the array - master keeps 1st part */
> > > > >   offset = chunksize;
> > > > >   for (dest=1; dest<numtasks; dest++) {
> > > > >     MPI_Send(&offset, 1, MPI_INT, dest, tag1, MPI_COMM_WORLD);
> > > > >     MPI_Send(&data[offset], chunksize, MPI_INT, dest, tag2, MPI_COMM_WORLD);
> > > > >     printf("Sent %d elements to task %d offset= %d\n",chunksize,dest,offset);
> > > > >     offset = offset + chunksize;
> > > > >     }
> > > > >
> > > > >   /* Master does its part of the work */
> > > > >   offset = 0;
> > > > >   mysum = update(offset, chunksize, taskid);
> > > > >
> > > > >   /* Wait to receive results from each task */
> > > > >   for (i=1; i<numtasks; i++) {
> > > > >     source = i;
> > > > >     MPI_Recv(&offset, 1, MPI_INT, source, tag1, MPI_COMM_WORLD, &status);
> > > > >     MPI_Recv(&data[offset], chunksize, MPI_INT, source, tag2,
> > > > >       MPI_COMM_WORLD, &status);
> > > > >     }
> > > > >
> > > > >   /* Get final sum and print sample results */
> > > > >   MPI_Reduce(&mysum, &sum, 1, MPI_INT, MPI_SUM, MASTER, MPI_COMM_WORLD);
> > > > >  /* printf("Sample results: \n");
> > > > >   offset = 0;
> > > > >   for (i=0; i<numtasks; i++) {
> > > > >     for (j=0; j<5; j++)
> > > > >       printf("  %d",data[offset+j]);
> > > > >     printf("\n");
> > > > >     offset = offset + chunksize;
> > > > >     }*/
> > > > >   printf("\n*** Final sum= %d ***\n",sum);
> > > > >
> > > > >   }  /* end of master section */
> > > > >
> > > > >
> > > > > /***** Non-master tasks only *****/
> > > > >
> > > > > if (taskid > MASTER) {
> > > > >
> > > > >   /* Receive my portion of array from the master task */
> > > > >   start= MPI_Wtime();
> > > > >   source = MASTER;
> > > > >   MPI_Recv(&offset, 1, MPI_INT, source, tag1, MPI_COMM_WORLD, &status);
> > > > >   MPI_Recv(&data[offset], chunksize, MPI_INT, source, tag2, MPI_COMM_WORLD, &status);
> > > > >
> > > > >   mysum = update(offset, chunksize, taskid);
> > > > >   stop = MPI_Wtime();
> > > > >   time = stop -start;
> > > > >   printf("time taken by process %d to receive elements and calculate own sum is = %lf seconds \n", taskid, time);
> > > > >   totaltime = totaltime + time;
> > > > >
> > > > >   /* Send my results back to the master task */
> > > > >   dest = MASTER;
> > > > >   MPI_Send(&offset, 1, MPI_INT, dest, tag1, MPI_COMM_WORLD);
> > > > >   MPI_Send(&data[offset], chunksize, MPI_INT, MASTER, tag2, MPI_COMM_WORLD);
> > > > >
> > > > >   MPI_Reduce(&mysum, &sum, 1, MPI_INT, MPI_SUM, MASTER, MPI_COMM_WORLD);
> > > > >
> > > > >   } /* end of non-master */
> > > > >
> > > > > // printf("Total time taken for distribution is -  %lf  seconds", totaltime);
> > > > > MPI_Finalize();
> > > > >
> > > > > }   /* end of main */
> > > > >
> > > > >
> > > > > int update(int myoffset, int chunk, int myid) {
> > > > >   int i,j;
> > > > >   int mysum;
> > > > >   int mydata[myoffset+chunk];
> > > > >   /* Perform addition to each of my array elements and keep my sum */
> > > > >   mysum = 0;
> > > > >  /*  printf("task %d has elements:",myid);
> > > > >   for(j = myoffset; j<myoffset+chunk; j++){
> > > > >       printf("\t%d", data[j]);
> > > > >   }
> > > > >  printf("\n");*/
> > > > >   for(i=myoffset; i < myoffset + chunk; i++) {
> > > > >
> > > > >     //data[i] = data[i] + i;
> > > > >     mysum = mysum + data[i];
> > > > >     }
> > > > >   printf("Task %d has sum = %d\n",myid,mysum);
> > > > >   return(mysum);
> > > > > }
> > > > >
> > > > >
> > > > > When I run it with ARRAYSIZE = 2000000, the program works fine. But when I increase the size to ARRAYSIZE = 20000000, the program ends with a segmentation fault.
> > > > > I am running it on a cluster (machine4 is the master; machine5 and machine6 are slaves) with np=20.
> > > > >
> > > > > MPI task 0 has started on host machine4
> > > > > MPI task 2 has started on host machine4
> > > > > MPI task 3 has started on host machine4
> > > > > MPI task 14 has started on host machine4
> > > > > MPI task 8 has started on host machine6
> > > > > MPI task 10 has started on host machine6
> > > > > MPI task 13 has started on host machine4
> > > > > MPI task 4 has started on host machine5
> > > > > MPI task 6 has started on host machine5
> > > > > MPI task 7 has started on host machine5
> > > > > MPI task 16 has started on host machine5
> > > > > MPI task 11 has started on host machine6
> > > > > MPI task 12 has started on host machine4
> > > > > MPI task 5 has started on host machine5
> > > > > MPI task 17 has started on host machine5
> > > > > MPI task 18 has started on host machine5
> > > > > MPI task 15 has started on host machine4
> > > > > MPI task 19 has started on host machine5
> > > > > MPI task 1 has started on host machine4
> > > > > MPI task 9 has started on host machine6
> > > > > Initialized array sum = 542894464
> > > > > Sent 1000000 elements to task 1 offset= 1000000
> > > > > Task 1 has sum = 1055913696
> > > > > time taken by process 1 to receive elements and calculate own sum is = 0.249345 seconds
> > > > > Sent 1000000 elements to task 2 offset= 2000000
> > > > > Sent 1000000 elements to task 3 offset= 3000000
> > > > > Task 2 has sum = 328533728
> > > > > time taken by process 2 to receive elements and calculate own sum is = 0.274285 seconds
> > > > > Sent 1000000 elements to task 4 offset= 4000000
> > > > > --------------------------------------------------------------------------
> > > > > mpirun noticed that process rank 3 with PID 5695 on node machine4 exited on signal 11 (Segmentation fault).
> > > > >
> > > > > Any idea what could be wrong here?
> > > > >
> > > > >
> > > > > --
> > > > >
> > > > > Best Regards,
> > > > >
> > > > > ROHAN DESHPANDE
> > > > >
> > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > Jeff Squyres
> > > > jsquy...@cisco.com
> > > > For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > >
> > >
> > > --
> > > Jeff Squyres
> > > jsquy...@cisco.com
> > > For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
> > >
> > >
> > >
> > >
> > >
> > > --
> > >
> > > Best Regards,
> > >
> > > ROHAN DESHPANDE
> > >
> >
> >
> >
> >
> >
> > --
> >
> > Best Regards,
> >
> > ROHAN DESHPANDE
> >
> >
> >
> 
> 
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> 
> 

> 
> -- 
> 
> Best Regards,
> 
> ROHAN DESHPANDE  
> 

> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
