Let me explain a little bit. The stack is used to parse the data-type description efficiently. The 32K limit is not a limit on the number of predefined MPI types in the data-type, but a limit on the number of different data descriptions (a description is something like a vector of a predefined type). As an example, an MPI_Type_struct with count 10 will use 11 entries. So in order to overflow this data description one has to use an MPI_Type_struct with a count bigger than 32K (which might be the case with the BOOST library you're using in your code).
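As a rough illustration of the arithmetic above, here is a minimal C sketch. It assumes that a struct type with a given count always needs count + 1 description entries, which is extrapolated from the single "count 10 uses 11 entries" example and may not hold for every data-type. The helper names struct_entries and fits_int16_index are hypothetical and are not part of Open MPI.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: estimated number of description entries for a
 * struct type with `count` blocks. ASSUMPTION: entries = count + 1,
 * extrapolated from the "count 10 uses 11 entries" example. */
static long struct_entries(long count)
{
    assert(count >= 0);
    return count + 1;
}

/* Hypothetical helper: true if the positions 0 .. entries-1 can all be
 * stored in a signed 16-bit index without wrapping. */
static bool fits_int16_index(long entries)
{
    return entries - 1 <= (long)INT16_MAX;
}
```

With these assumptions, struct_entries(10) gives 11, which fits easily, while a count of 40000 gives 40001 entries, which no longer fits in a 16-bit signed index.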
In conclusion, if your data-type description contains more than 32K entries, the current implementation will definitely not work for you. How many entries are in your data-type description? There is an easy way to figure out if this is the problem with your code. Attaching gdb to your process and setting a breakpoint in the ompi_generic_simple_pack function is the first step. Once there, doing "call ompi_ddt_dump(pData)" in gdb will print a high-level description of the data as represented internally in Open MPI. If you can provide the output of this call, I can tell you in a few seconds whether this is the real issue or not.
However, this raises another question about the performance you expect from your code. A data description with more than 32K items cannot be efficiently optimized by any automatic data-type engine. Moreover, it cannot be easily parsed. If it is possible to identify repetitive access patterns, I suggest using them to improve the data-type description.
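To make the idea of a repetitive access pattern concrete, here is a small C sketch. It is not how Open MPI optimizes data-types; it only checks whether a list of byte displacements is evenly spaced, in which case the whole list could be summarized by one (count, first, stride) triple instead of one entry per element. The type strided_desc and the function compress_displacements are hypothetical names introduced for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical compact description: `count` blocks, the first one at
 * byte offset `first`, each following one `stride` bytes further on. */
typedef struct {
    size_t    count;
    ptrdiff_t first;
    ptrdiff_t stride;
} strided_desc;

/* Returns true and fills *out if the n displacements are evenly spaced.
 * Returns false for an empty list or an irregular pattern, in which
 * case *out is left untouched. */
static bool compress_displacements(const ptrdiff_t *disp, size_t n,
                                   strided_desc *out)
{
    assert(disp != NULL && out != NULL);
    if (n == 0) {
        return false;
    }
    ptrdiff_t stride = (n > 1) ? disp[1] - disp[0] : 0;
    for (size_t i = 2; i < n; ++i) {
        if (disp[i] - disp[i - 1] != stride) {
            return false;
        }
    }
    out->count  = n;
    out->first  = disp[0];
    out->stride = stride;
    return true;
}
```

When the check succeeds, a pattern like this is the kind that a strided constructor such as MPI_Type_vector or MPI_Type_create_hvector can express in a single description, rather than through a struct with one entry per element.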
Thanks,
george.

On Apr 18, 2007, at 4:16 PM, Michael Gauckler wrote:

Dear Open-MPI Developers,

investigations into the segmentation fault (see previous postings "Signal: Segmentation fault (11) Problem") lead us to suspect that Open-MPI allows only a limited number of elements in the description of user-defined MPI_Datatypes.

Our application segmentation-faults when a large user-defined data structure is passed to MPI_Send.

The segmentation fault happens in the function ompi_generic_simple_pack in datatype_pack.c when trying to access pElem (Bad address). The structure pElem is set in line 276, where it is retrieved as

    276: pElem = &(description[pos_desc]);

pos_desc is of type uint32_t with the value 0xffff929f (4294939295), which itself is set on line 271 by a variable of type int16_t and value -1. This leads to the indexing of the description structure at position -1, producing the segmentation fault. The origin of pos_desc can be found in the same function at line 271:

    271: pos_desc = pStack->index;

The structure to which pStack is pointing is of type dt_stack, defined in ompi/datatype/convertor.h starting at line 65, where index is an int16_t and commented with “index in the element description”:

    typedef struct dt_stack {
        int16_t   index;  /**< index in the element description */
        int16_t   type;   /**< the type used for the last pack/unpack (original or DT_BYTE) */
        size_t    count;  /**< number of times we still have to do it */
        ptrdiff_t disp;   /**< actual displacement depending on the count field */
    } dt_stack_t;

We therefore conclude that MPI_Datatypes constructed with Open-MPI (in the release 1.2.1a of April 10th 2007) are limited to a maximum of 32’768 separate entries.

Although changing the type of the index to int32_t solves the problem of the segmentation fault, I would be happy if the author / maintainer of the code could have a look at it and decide if this is a viable fix.

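The conversion described above can be reproduced in isolation with a short C sketch. It is not the Open MPI code; store_in_int16 and load_into_uint32 are hypothetical helpers that only mimic keeping an index in a 16-bit signed field and then assigning it to a uint32_t, as in "pos_desc = pStack->index".

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: keep only the low 16 bits of an index and
 * reinterpret them as int16_t, which is what effectively happens when
 * an index above 32767 is kept in a 16-bit signed field. */
static int16_t store_in_int16(uint32_t index)
{
    uint16_t low = (uint16_t)(index & 0xFFFFu);
    int16_t stored;
    assert(sizeof stored == sizeof low);
    memcpy(&stored, &low, sizeof stored);  /* copy the bit pattern as is */
    return stored;
}

/* Hypothetical helper mirroring "pos_desc = pStack->index": a negative
 * int16_t converted to uint32_t becomes a very large unsigned value. */
static uint32_t load_into_uint32(int16_t stored)
{
    return (uint32_t)stored;
}
```

Under these assumptions, an index of 37535 (0x929f) is stored as -28001 and read back as 0xffff929f (4294939295), which matches the reported pos_desc value, whereas a stored -1 would read back as 0xffffffff.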
Having spent a lot of time hunting down the issue in the Open-MPI code, I would be glad to see the issue fixed in upcoming releases.

Thanx and regards,
Michael Gauckler