Hello Tom,

Actually I have already tried MPI_Type_create_hindexed, but the same
problem persisted for the same matrix dimensions.
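For reference, the hindexed variant I tried was along these lines (just
a sketch, reusing the variables from my original code quoted below; the
displacements become byte offsets of type MPI_Aint, normally 64-bit, so
the intermediate products stay out of 32-bit range even for this matrix
size):

    /* extent of double_complex_type: two contiguous doubles */
    MPI_Aint elem_extent = 2 * sizeof(double);
    MPI_Aint *send_displ_b =
        (MPI_Aint *) malloc(Nx * Ny/gsize * sizeof(MPI_Aint));

    for (i = Ny/gsize - 1; i >= 0; i--) {
        for (j = 0; j < Nx; j++) {
            /* same layout as before, but expressed in bytes; the casts
               to MPI_Aint avoid 32-bit overflow in the products */
            send_displ_b[(Ny/gsize - 1 - i) * Nx + j] =
                ((MPI_Aint) i * Nz + (MPI_Aint) j * Ny * Nz) * elem_extent;
        }
    }

    MPI_Type_create_hindexed(Nx * Ny/gsize, send_blocklen, send_displ_b,
                             double_complex_type, &block_send_complex_type);
    MPI_Type_commit(&block_send_complex_type);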
The displacement values themselves are not the problem. A matrix of
size 800x640x480 creates a type that is a bit less than 4 GB in the
complex-datatype case, and the element displacements passed to
MPI_Type_indexed definitely fit in the 32-bit range. So it is not a
32/64-bit issue, at least not for the displacement values in this case.

Regards,

----
Bogdan Sataric

email: bogdan.sata...@gmail.com
phone: +381 21-485-2441

Teaching & Research Assistant
Chair for Applied Computer Science
Faculty of Technical Sciences, Novi Sad, Serbia


On Thu, Mar 5, 2015 at 8:13 PM, Tom Rosmond <rosm...@reachone.com> wrote:
> Actually, you are not the first to encounter the problem with
> 'MPI_Type_indexed' for very large datatypes. I also run a 1.6
> release, and solved the problem by switching to
> 'MPI_Type_create_hindexed' for the datatype. The critical difference
> is that the displacements for 'MPI_Type_indexed' are of type integer,
> i.e. 32-bit values, while for 'MPI_Type_create_hindexed' the
> displacements are in bytes, of kind 'MPI_ADDRESS_KIND' (MPI_Aint in
> C), i.e. normally 64-bit, and therefore of effectively unlimited
> range. Otherwise the two types can be used identically.
>
> T. Rosmond
>
>
> On Thu, 2015-03-05 at 12:31 -0500, George Bosilca wrote:
> > Bogdan,
> >
> > As far as I can tell your code is correct, and the problem is
> > coming from Open MPI. More specifically, I used alloca in the
> > optimization stage of MPI_Type_commit, and as your arrays of
> > lengths were too large, alloca failed and led to a segfault. I
> > fixed this in the trunk (3c489ea), and the fix will get into our
> > next release.
> >
> > Unfortunately there is no fix for the 1.6 series that I can think
> > of. Apparently you really are the first to run into this kind of
> > problem, so I guess you are the first to create gigantic datatypes.
> >
> > Thanks for the bug report,
> > George.
> >
> >
> > On Thu, Mar 5, 2015 at 9:09 AM, Bogdan Sataric
> > <bogdan.sata...@gmail.com> wrote:
> >
> > I've been having problems with my 3D matrix transpose program. I'm
> > using MPI_Type_indexed in order to align the specific blocks that
> > I want to send and receive across one or more nodes of a cluster.
> > Up until a few days ago I was able to run my program without any
> > errors. However, several test cases on the cluster in the last few
> > days exposed a segmentation fault when I try to form the indexed
> > type for some specific matrix configurations.
> >
> > The code that forms the indexed type is as follows:
> >
> > #include <stdio.h>
> > #include <stdlib.h>
> > #include <mpi.h>
> >
> > int main(int argc, char **argv) {
> >     int Nx = 800;
> >     int Ny = 640;
> >     int Nz = 480;
> >     int gsize;
> >     int i, j;
> >
> >     MPI_Init(&argc, &argv);
> >     MPI_Comm_size(MPI_COMM_WORLD, &gsize);
> >
> >     printf("GSIZE: %d\n", gsize);
> >
> >     MPI_Datatype double_complex_type;
> >     MPI_Datatype block_send_complex_type;
> >
> >     int *send_displ    = (int *) malloc(Nx * Ny/gsize * sizeof(int));
> >     int *send_blocklen = (int *) malloc(Nx * Ny/gsize * sizeof(int));
> >
> >     /* one complex value = 2 contiguous doubles */
> >     MPI_Type_contiguous(2, MPI_DOUBLE, &double_complex_type);
> >     MPI_Type_commit(&double_complex_type);
> >
> >     /* one block per Nz "rod"; displacements are in units of
> >        double_complex_type */
> >     for (i = Ny/gsize - 1; i >= 0; i--) {
> >         for (j = 0; j < Nx; j++) {
> >             send_displ[(Ny/gsize - 1 - i) * Nx + j] = i * Nz + j * Ny * Nz;
> >             send_blocklen[(Ny/gsize - 1 - i) * Nx + j] = Nz;
> >         }
> >     }
> >
> >     MPI_Type_indexed(Nx * Ny/gsize, send_blocklen, send_displ,
> >                      double_complex_type, &block_send_complex_type);
> >     MPI_Type_commit(&block_send_complex_type);   /* segfaults here */
> >
> >     free(send_displ);
> >     free(send_blocklen);
> >
> >     MPI_Finalize();
> >     return 0;
> > }
> >
> > The values of Nx, Ny and Nz are 800, 640 and 480, respectively. The
> > value of gsize for this test was 1 (simulating the MPI program on a
> > single node). The node has 32 GB of RAM and no other memory was
> > allocated (only this code was run).
> >
> > In the code I basically create double_complex_type to represent
> > complex-number values (2 contiguous MPI_DOUBLEs). The whole matrix
> > holds 800 * 640 * 480 of these values and I am trying to capture
> > all of them in the indexed type. Each indexed-type block is a whole
> > Nz "rod", and the ordering of these rods in the displacements array
> > is given by the formula i * Nz + j * Ny * Nz. The displacements
> > start from the top row and left column of the 3D matrix; I then
> > gradually sweep to the right side of that top row, go one row down,
> > sweep to the right side again, and so on until the bottom row.
> >
> > The strange thing is that this formula and algorithm WORK if I use
> > the plain MPI_DOUBLE type instead of the derived complex type (1
> > instead of 2 in MPI_Type_contiguous). The formula also WORKS if I
> > set the Nz dimension to 1 instead of 480. However, if I change Nz
> > to even 2, I get a segmentation fault in the MPI_Type_commit call.
> >
> > I checked all of the displacements and they seem fine. There is no
> > overlap between displacements, and none goes below 0 or beyond the
> > extent of the formed indexed type. Also, the size of the datatype
> > is below 4 GB, which I believe is the limit for MPI datatypes,
> > since MPI_Type_size reports the size through an int *. And I
> > believe the amount of memory is not an issue either: even if I set
> > Nz to 2 I get the same segmentation fault, and the node has 32 GB
> > of RAM just for this test.
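> >
> > (As a small illustration that the formula covers the matrix exactly
> > once: with Nx = 2, Ny = 2, Nz = 3 and gsize = 1 it yields the
> > displacements 3, 9, 0, 6 with a block length of 3 everywhere, i.e.
> > the four "rods" cover all 12 elements exactly once, just in a
> > non-monotonic order.)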
> >
> > What bothers me is that most other indexed-type configurations
> > (with plain MPI_DOUBLE elements), or the complex type with a
> > smaller matrix (say 400 * 640 * 480), WORK without a segmentation
> > fault. And if I commit the indexed type with plain MPI_DOUBLE
> > elements, even larger matrices work (say 960 x 800 x 640) that have
> > exactly the same type size as the 800 x 640 x 480 complex indexed
> > type (just under 4 GB)! So basically the type size is not the issue
> > here; somehow the number of blocks, the size of particular blocks,
> > or the size of the block elements creates the problem. I am not
> > sure whether the problem is in the Open MPI implementation or
> > whether something in my code is wrong...
> >
> > I would greatly appreciate any help, as I have been stuck on this
> > problem for days now and nothing in the MPI documentation or in the
> > examples I found on the internet gives me a clue where the error
> > might be.
> >
> > Finally, I should mention that the code was compiled with Open MPI
> > version 1.6.5.
> >
> > Thank you,
> >
> > Bogdan Sataric
> >
> > ----
> > email: bogdan.sata...@gmail.com
> > phone: +381 21-485-2441
> >
> > Teaching & Research Assistant
> > Chair for Applied Computer Science
> > Faculty of Technical Sciences, Novi Sad, Serbia