Ah - cool! Thanks!

On Jan 19, 2013, at 7:19 AM, George Bosilca <bosi...@icl.utk.edu> wrote:
> On Jan 19, 2013, at 15:44, Ralph Castain <r...@open-mpi.org> wrote:
>
>> I used your test code to confirm it also fails on our trunk - it looks like
>> someone got the reference count wrong when creating/destructing groups.
>
> No, the code is not MPI compliant.
>
> The culprit is line 254 in the test code, where Siegmar manually copied
> group_comm_world into group_worker. This is correct as long as you remember
> that group_worker is not directly an MPI-generated group, and as a result
> you are not allowed to free it.
>
> Now if you replace the assignment
>
>     group_worker = group_comm_world
>
> by an MPI operation that creates a copy of the original group, such as
>
>     MPI_Comm_group (MPI_COMM_WORLD, &group_worker);
>
> your code becomes MPI valid and works without any issue in Open MPI.
>
>   George.
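Condensed into a standalone sketch, the difference George describes looks roughly
like this (grp_world and grp_work are illustrative names, not the ones used in
Siegmar's program; treat it as a sketch of the ownership rule, not as code taken
from the thread):

  /* Sketch only: contrast between aliasing a group handle and asking MPI
   * for its own copy of the group. */
  #include <mpi.h>

  int main (int argc, char *argv[])
  {
    MPI_Group grp_world, grp_work;

    MPI_Init (&argc, &argv);
    MPI_Comm_group (MPI_COMM_WORLD, &grp_world);

    /* Non-compliant pattern: a plain assignment only aliases the handle,
     * so grp_work does not refer to a group that MPI created for the
     * caller and must not be passed to MPI_Group_free:
     *
     *   grp_work = grp_world;
     */

    /* Compliant pattern: MPI returns a handle the caller owns and is
     * therefore allowed (and expected) to free. */
    MPI_Comm_group (MPI_COMM_WORLD, &grp_work);

    MPI_Group_free (&grp_world);
    MPI_Group_free (&grp_work);   /* valid: a separate MPI-created handle */

    MPI_Finalize ();
    return 0;
  }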
>
>> Afraid I'll have to defer to the authors of that code area...
>>
>>
>> On Jan 19, 2013, at 1:27 AM, Siegmar Gross
>> <siegmar.gr...@informatik.hs-fulda.de> wrote:
>>
>>> Hi
>>>
>>> I have installed openmpi-1.6.4rc2 and have the following problem.
>>>
>>> tyr strided_vector 110 ompi_info | grep "Open MPI:"
>>>                 Open MPI: 1.6.4rc2r27861
>>> tyr strided_vector 111 mpicc -showme
>>> gcc -I/usr/local/openmpi-1.6.4_64_gcc/include -fexceptions -pthread -m64
>>>   -L/usr/local/openmpi-1.6.4_64_gcc/lib64 -lmpi -lm -lkstat -llgrp -lsocket
>>>   -lnsl -lrt -lm
>>>
>>>
>>> tyr strided_vector 112 mpiexec -np 4 data_type_4
>>> Process 2 of 4 running on tyr.informatik.hs-fulda.de
>>> Process 0 of 4 running on tyr.informatik.hs-fulda.de
>>> Process 3 of 4 running on tyr.informatik.hs-fulda.de
>>> Process 1 of 4 running on tyr.informatik.hs-fulda.de
>>>
>>> original matrix:
>>>
>>>      1     2     3     4     5     6     7     8     9    10
>>>     11    12    13    14    15    16    17    18    19    20
>>>     21    22    23    24    25    26    27    28    29    30
>>>     31    32    33    34    35    36    37    38    39    40
>>>     41    42    43    44    45    46    47    48    49    50
>>>     51    52    53    54    55    56    57    58    59    60
>>>
>>> result matrix:
>>>   elements are sqared in columns:
>>>      0   1   2   6   7
>>>   elements are multiplied with 2 in columns:
>>>      3   4   5   8   9
>>>
>>>      1     4     9     8    10    12    49    64    18    20
>>>    121   144   169    28    30    32   289   324    38    40
>>>    441   484   529    48    50    52   729   784    58    60
>>>    961  1024  1089    68    70    72  1369  1444    78    80
>>>   1681  1764  1849    88    90    92  2209  2304    98   100
>>>   2601  2704  2809   108   110   112  3249  3364   118   120
>>>
>>> Assertion failed: OPAL_OBJ_MAGIC_ID ==
>>>   ((opal_object_t *) (comm->c_remote_group))->obj_magic_id,
>>>   file ../../openmpi-1.6.4rc2r27861/ompi/communicator/comm_init.c, line 412
>>> [tyr:18578] *** Process received signal ***
>>> [tyr:18578] Signal: Abort (6)
>>> [tyr:18578] Signal code:  (-1)
>>> Assertion failed: OPAL_OBJ_MAGIC_ID ==
>>>   ((opal_object_t *) (comm->c_remote_group))->obj_magic_id,
>>>   file ../../openmpi-1.6.4rc2r27861/ompi/communicator/comm_init.c, line 412
>>> /export2/prog/SunOS_sparc/openmpi-1.6.4_64_gcc/lib64/libmpi.so.1.0.7:opal_backtrace_print+0x20
>>> [tyr:18580] *** Process received signal ***
>>> /export2/prog/SunOS_sparc/openmpi-1.6.4_64_gcc/lib64/libmpi.so.1.0.7:0x2c1bc4
>>> [tyr:18580] Signal: Abort (6)
>>> [tyr:18580] Signal code:  (-1)
>>> /lib/sparcv9/libc.so.1:0xd88a4
>>> /lib/sparcv9/libc.so.1:0xcc418
>>> /lib/sparcv9/libc.so.1:0xcc624
>>> /lib/sparcv9/libc.so.1:__lwp_kill+0x8 [ Signal 6 (ABRT)]
>>> /lib/sparcv9/libc.so.1:abort+0xd0
>>> /lib/sparcv9/libc.so.1:_assert+0x74
>>> /export2/prog/SunOS_sparc/openmpi-1.6.4_64_gcc/lib64/libmpi.so.1.0.7:0xa4c58
>>> /export2/prog/SunOS_sparc/openmpi-1.6.4_64_gcc/lib64/libmpi.so.1.0.7:0xa2430
>>> /export2/prog/SunOS_sparc/openmpi-1.6.4_64_gcc/lib64/libmpi.so.1.0.7:ompi_comm_finalize+0x168
>>> /export2/prog/SunOS_sparc/openmpi-1.6.4_64_gcc/lib64/libmpi.so.1.0.7:ompi_mpi_finalize+0xa60
>>> /export2/prog/SunOS_sparc/openmpi-1.6.4_64_gcc/lib64/libmpi.so.1.0.7:MPI_Finalize+0x90
>>> /home/fd1026/SunOS/sparc/bin/data_type_4:main+0x588
>>> /home/fd1026/SunOS/sparc/bin/data_type_4:_start+0x7c
>>> [tyr:18578] *** End of error message ***
>>> ...
>>>
>>>
>>> Everything works fine with LAM-MPI (even in a heterogeneous environment
>>> with little-endian and big-endian machines), so it is probably an error
>>> in Open MPI (but you never know).
>>>
>>>
>>> tyr strided_vector 125 mpicc -showme
>>> gcc -I/usr/local/lam-6.5.9_64_gcc/include -L/usr/local/lam-6.5.9_64_gcc/lib
>>>   -llamf77mpi -lmpi -llam -lsocket -lnsl
>>> tyr strided_vector 126 lamboot -v hosts.lam-mpi
>>>
>>> LAM 6.5.9/MPI 2 C++ - Indiana University
>>>
>>> Executing hboot on n0 (tyr.informatik.hs-fulda.de - 2 CPUs)...
>>> Executing hboot on n1 (sunpc1.informatik.hs-fulda.de - 4 CPUs)...
>>> topology done
>>>
>>> tyr strided_vector 127 mpirun -v app_data_type_4.lam-mpi
>>> 22894 data_type_4 running on local
>>> 22895 data_type_4 running on n0 (o)
>>> 21998 data_type_4 running on n1
>>> 22896 data_type_4 running on n0 (o)
>>> Process 1 of 4 running on tyr.informatik.hs-fulda.de
>>> Process 3 of 4 running on tyr.informatik.hs-fulda.de
>>> Process 2 of 4 running on sunpc1
>>> Process 0 of 4 running on tyr.informatik.hs-fulda.de
>>>
>>> original matrix:
>>>
>>>      1     2     3     4     5     6     7     8     9    10
>>>     11    12    13    14    15    16    17    18    19    20
>>>     21    22    23    24    25    26    27    28    29    30
>>>     31    32    33    34    35    36    37    38    39    40
>>>     41    42    43    44    45    46    47    48    49    50
>>>     51    52    53    54    55    56    57    58    59    60
>>>
>>> result matrix:
>>>   elements are sqared in columns:
>>>      0   1   2   6   7
>>>   elements are multiplied with 2 in columns:
>>>      3   4   5   8   9
>>>
>>>      1     4     9     8    10    12    49    64    18    20
>>>    121   144   169    28    30    32   289   324    38    40
>>>    441   484   529    48    50    52   729   784    58    60
>>>    961  1024  1089    68    70    72  1369  1444    78    80
>>>   1681  1764  1849    88    90    92  2209  2304    98   100
>>>   2601  2704  2809   108   110   112  3249  3364   118   120
>>>
>>> tyr strided_vector 128 lamhalt
>>>
>>> LAM 6.5.9/MPI 2 C++ - Indiana University
>>>
>>>
>>> I would be grateful if somebody could fix the problem. Thank you
>>> very much for any help in advance.
>>>
>>>
>>> Kind regards
>>>
>>> Siegmar
>>>
>>>
>>> /* The program demonstrates how to set up and use a strided vector.
>>>  * The process with rank 0 creates a matrix. The columns of the
>>>  * matrix will then be distributed with a collective communication
>>>  * operation to all processes. Each process performs an operation on
>>>  * all column elements. Afterwards the results are collected in the
>>>  * source matrix, overwriting the original column elements.
>>>  *
>>>  * The program uses between one and n processes to change the values
>>>  * of the column elements if the matrix has n columns. If you start
>>>  * the program with one process, it has to work on all n columns
>>>  * alone, and if you start it with n processes, each process modifies
>>>  * the values of one column. Every process must know how many columns
>>>  * it has to modify so that it can allocate enough buffer space for
>>>  * its column block. Therefore the process with rank 0 computes the
>>>  * number of columns for each process in the array "num_columns" and
>>>  * distributes this array with MPI_Bcast to all processes. Each
>>>  * process can now allocate memory for its column block.
>>>  * There is still one task to do before the columns of the matrix
>>>  * can be distributed with MPI_Scatterv: the size of every column
>>>  * block and the offset of every column block must be computed and
>>>  * stored in the arrays "sr_counts" and "sr_disps".
>>>  *
>>>  * An MPI data type is defined by its size, its contents, and its
>>>  * extent. When multiple elements of the same size are used in a
>>>  * contiguous manner (e.g. in a "scatter" operation or an operation
>>>  * with "count" greater than one), the extent is used to compute
>>>  * where the next element will start. The extent for a derived data
>>>  * type is as big as the size of the derived data type, so that the
>>>  * first elements of the second structure will start after the last
>>>  * element of the first structure, i.e., you have to "resize" the new
>>>  * data type if you want to send it multiple times (count > 1) or to
>>>  * scatter/gather it to many processes. Restrict the extent of the
>>>  * derived data type for a strided vector in such a way that it looks
>>>  * like just one element if it is used with "count > 1" or in a
>>>  * scatter/gather operation.
>>>  *
>>>  * This version constructs a new column type (strided vector) with
>>>  * "MPI_Type_vector" and uses collective communication. The new
>>>  * data type knows the number of elements within one column and the
>>>  * spacing between two column elements. The program uses at most
>>>  * n processes if the matrix has n columns, i.e. depending on the
>>>  * number of processes each process receives between 1 and n columns.
>>>  * You can execute this program with an arbitrary number of processes
>>>  * because it creates its own group with "num_worker" (<= n)
>>>  * processes to perform the work if the matrix has n columns and the
>>>  * basic group contains too many processes.
>>>  *
>>>  *
>>>  * Compiling:
>>>  *   Store executable(s) into local directory.
>>>  *     mpicc -o <program name> <source code file name>
>>>  *
>>>  *   Store executable(s) into predefined directories.
>>>  *     make
>>>  *
>>>  *   Make program(s) automatically on all specified hosts. You must
>>>  *   edit the file "make_compile" and specify your host names before
>>>  *   you execute it.
>>>  *     make_compile
>>>  *
>>>  * Running:
>>>  *   LAM-MPI:
>>>  *     mpiexec -boot -np <number of processes> <program name>
>>>  *   or
>>>  *     mpiexec -boot \
>>>  *       -host <hostname> -np <number of processes> <program name> : \
>>>  *       -host <hostname> -np <number of processes> <program name>
>>>  *   or
>>>  *     mpiexec -boot [-v] -configfile <application file>
>>>  *   or
>>>  *     lamboot [-v] [<host file>]
>>>  *     mpiexec -np <number of processes> <program name>
>>>  *   or
>>>  *     mpiexec [-v] -configfile <application file>
>>>  *     lamhalt
>>>  *
>>>  *   OpenMPI:
>>>  *     "host1", "host2", and so on can all have the same name,
>>>  *     if you want to start a virtual computer with some virtual
>>>  *     cpu's on the local host. The name "localhost" is allowed
>>>  *     as well.
>>>  *
>>>  *     mpiexec -np <number of processes> <program name>
>>>  *   or
>>>  *     mpiexec --host <host1,host2,...> \
>>>  *       -np <number of processes> <program name>
>>>  *   or
>>>  *     mpiexec -hostfile <hostfile name> \
>>>  *       -np <number of processes> <program name>
>>>  *   or
>>>  *     mpiexec -app <application file>
>>>  *
>>>  * Cleaning:
>>>  *   local computer:
>>>  *     rm <program name>
>>>  *   or
>>>  *     make clean_all
>>>  *   on all specified computers (you must edit the file
>>>  *   "make_clean_all" and specify your host names before you
>>>  *   execute it).
>>>  *     make_clean_all
>>>  *
>>>  *
>>>  * File:   data_type_4.c
>>>  * Author: S. Gross
>>>  * Date:   30.08.2012
>>>  *
>>>  */
>>>
>>> #include <stdio.h>
>>> #include <stdlib.h>
>>> #include "mpi.h"
>>>
>>> #define P              6         /* # of rows                    */
>>> #define Q              10        /* # of columns                 */
>>> #define FACTOR         2         /* multiplicator for col. elem. */
>>> #define DEF_NUM_WORKER Q         /* # of workers, must be <= Q   */
>>>
>>> /* define macro to test the result of a "malloc" operation */
>>> #define TestEqualsNULL(val) \
>>>   if (val == NULL) \
>>>   { \
>>>     fprintf (stderr, "file: %s line %d: Couldn't allocate memory.\n", \
>>>              __FILE__, __LINE__); \
>>>     exit (EXIT_FAILURE); \
>>>   }
>>>
>>> /* define macro to determine the minimum of two values */
>>> #define MIN(a,b) ((a) < (b) ? (a) : (b))
>>>
>>>
>>> static void print_matrix (int p, int q, double **mat);
>>>
>>>
>>> int main (int argc, char *argv[])
>>> {
>>>   int          ntasks,             /* number of parallel tasks     */
>>>                mytid,              /* my task id                   */
>>>                namelen,            /* length of processor name     */
>>>                i, j,               /* loop variables               */
>>>                *num_columns,       /* # of columns in column block */
>>>                *sr_counts,         /* send/receive counts          */
>>>                *sr_disps,          /* send/receive displacements   */
>>>                tmp, tmp1;          /* temporary values             */
>>>   double       matrix[P][Q],
>>>                **col_block;        /* column block of matrix       */
>>>   char         processor_name[MPI_MAX_PROCESSOR_NAME];
>>>   MPI_Datatype column_t,           /* column type (strided vector) */
>>>                col_block_t,
>>>                tmp_column_t;       /* needed to resize the extent  */
>>>   MPI_Group    group_comm_world,   /* processes in "basic group"   */
>>>                group_worker,       /* processes in new groups      */
>>>                group_other;
>>>   MPI_Comm     COMM_WORKER,        /* communicators for new groups */
>>>                COMM_OTHER;
>>>   int          num_worker,         /* # of worker in "group_worker"*/
>>>                *group_w_mem,       /* array of worker members      */
>>>                group_w_ntasks,     /* # of tasks in "group_worker" */
>>>                group_o_ntasks,     /* # of tasks in "group_other"  */
>>>                group_w_mytid,      /* my task id in "group_worker" */
>>>                group_o_mytid,      /* my task id in "group_other"  */
>>>                *universe_size_ptr, /* ptr to # of "virtual cpu's"  */
>>>                universe_size_flag; /* true if available            */
>>>
>>>   MPI_Init (&argc, &argv);
>>>   MPI_Comm_rank (MPI_COMM_WORLD, &mytid);
>>>   MPI_Comm_size (MPI_COMM_WORLD, &ntasks);
>>>   /* Determine the correct number of processes for this program. If
>>>    * there are more than Q processes (i.e., more processes than
>>>    * columns) available, we split the "basic group" into two groups.
>>>    * This program uses a group "group_worker" to do the real work
>>>    * and a group "group_other" for the remaining processes of the
>>>    * "basic group". The latter have nothing to do and can terminate
>>>    * immediately. If there are less than or equal to Q processes
>>>    * available all processes belong to group "group_worker" and group
>>>    * "group_other" is empty. At first we find out which processes
>>>    * belong to the "basic group".
>>>    */
>>>   MPI_Comm_group (MPI_COMM_WORLD, &group_comm_world);
>>>   if (ntasks > Q)
>>>   {
>>>     /* There are too many processes, so that we must build a new group
>>>      * with "num_worker" processes. "num_worker" will be the minimum of
>>>      * DEF_NUM_WORKER and the "universe size" if it is supported by the
>>>      * MPI implementation. At first we must check if DEF_NUM_WORKER has
>>>      * a suitable value.
>>>      */
>>>     if (DEF_NUM_WORKER > Q)
>>>     {
>>>       if (mytid == 0)
>>>       {
>>>         fprintf (stderr, "\nError:\tInternal program error.\n"
>>>                  "\tConstant DEF_NUM_WORKER has value %d but must be\n"
>>>                  "\tlower than or equal to %d. Please change source\n"
>>>                  "\tcode and compile the program again.\n\n",
>>>                  DEF_NUM_WORKER, Q);
>>>       }
>>>       MPI_Group_free (&group_comm_world);
>>>       MPI_Finalize ();
>>>       exit (EXIT_FAILURE);
>>>     }
>>>     /* determine the universe size, set "num_worker" in an
>>>      * appropriate way, and allocate memory for the array containing
>>>      * the ranks of the members of the new group
>>>      */
>>>     MPI_Comm_get_attr (MPI_COMM_WORLD, MPI_UNIVERSE_SIZE,
>>>                        &universe_size_ptr, &universe_size_flag);
>>>     if ((universe_size_flag != 0) && (*universe_size_ptr > 0))
>>>     {
>>>       num_worker = MIN (DEF_NUM_WORKER, *universe_size_ptr);
>>>     }
>>>     else
>>>     {
>>>       num_worker = DEF_NUM_WORKER;
>>>     }
>>>     group_w_mem = (int *) malloc (num_worker * sizeof (int));
>>>     TestEqualsNULL (group_w_mem);  /* test if memory was available */
>>>     if (mytid == 0)
>>>     {
>>>       printf ("\nYou have started %d processes but I need at most "
>>>               "%d processes.\n"
>>>               "The universe contains %d \"virtual cpu's\" (\"0\" means "
>>>               "not supported).\n"
>>>               "I build a new worker group with %d processes. The "
>>>               "processes with\n"
>>>               "the following ranks in the basic group belong to "
>>>               "the new group:\n  ",
>>>               ntasks, Q, *universe_size_ptr, num_worker);
>>>     }
>>>     for (i = 0; i < num_worker; ++i)
>>>     {
>>>       /* fetch some ranks from the basic group for the new worker
>>>        * group, e.g. the last num_worker ranks to demonstrate that
>>>        * a process may have different ranks in different groups
>>>        */
>>>       group_w_mem[i] = (ntasks - num_worker) + i;
>>>       if (mytid == 0)
>>>       {
>>>         printf ("%d ", group_w_mem[i]);
>>>       }
>>>     }
>>>     if (mytid == 0)
>>>     {
>>>       printf ("\n\n");
>>>     }
>>>     /* Create group "group_worker" */
>>>     MPI_Group_incl (group_comm_world, num_worker, group_w_mem,
>>>                     &group_worker);
>>>     free (group_w_mem);
>>>   }
>>>   else
>>>   {
>>>     /* there are at most as many processes as columns in our matrix,
>>>      * i.e., we can use the "basic group"
>>>      */
>>>     group_worker = group_comm_world;
>>>   }
>>>   /* Create group "group_other" which demonstrates only how to use
>>>    * another group operation and which has nothing to do in this
>>>    * program.
>>>    */
>>>   MPI_Group_difference (group_comm_world, group_worker,
>>>                         &group_other);
>>>   MPI_Group_free (&group_comm_world);
>>>   /* Create communicators for both groups. The communicator is only
>>>    * defined for all processes of the group and it is undefined
>>>    * (MPI_COMM_NULL) for all other processes.
>>>    */
>>>   MPI_Comm_create (MPI_COMM_WORLD, group_worker, &COMM_WORKER);
>>>   MPI_Comm_create (MPI_COMM_WORLD, group_other, &COMM_OTHER);
>>>
>>>
>>>   /* =========================================================
>>>    * ======                                             ======
>>>    * ======   Supply work for all different groups.     ======
>>>    * ======                                             ======
>>>    * ======                                             ======
>>>    * ======   At first you must find out if a process   ======
>>>    * ======   belongs to a special group. You can use   ======
>>>    * ======   MPI_Group_rank for this purpose. It       ======
>>>    * ======   returns the rank of the calling process   ======
>>>    * ======   in the specified group or MPI_UNDEFINED   ======
>>>    * ======   if the calling process is not a member    ======
>>>    * ======   of the group.                             ======
>>>    * ======                                             ======
>>>    * =========================================================
>>>    */
>>>
>>>
>>>   /* =========================================================
>>>    * ======   This is the group "group_worker".         ======
>>>    * =========================================================
>>>    */
>>>   MPI_Group_rank (group_worker, &group_w_mytid);
>>>   if (group_w_mytid != MPI_UNDEFINED)
>>>   {
>>>     MPI_Comm_size (COMM_WORKER, &group_w_ntasks); /* # of processes */
>>>     /* Now let's start with the real work */
>>>     MPI_Get_processor_name (processor_name, &namelen);
>>>     /* With the next statement every process executing this code will
>>>      * print one line on the display. It may happen that the lines
>>>      * will get mixed up because the display is a critical section.
>>>      * In general only one process (mostly the process with rank 0)
>>>      * will print on the display and all other processes will send
>>>      * their messages to this process. Nevertheless for debugging
>>>      * purposes (or to demonstrate that it is possible) it may be
>>>      * useful if every process prints itself.
>>>      */
>>>     fprintf (stdout, "Process %d of %d running on %s\n",
>>>              group_w_mytid, group_w_ntasks, processor_name);
>>>     fflush (stdout);
>>>     MPI_Barrier (COMM_WORKER);  /* wait for all other processes */
>>>
>>>     /* Build the new type for a strided vector and resize the extent
>>>      * of the new datatype in such a way that the extent of the whole
>>>      * column looks like just one element so that the next column
>>>      * starts in matrix[0][i] in MPI_Scatterv/MPI_Gatherv.
>>>      */
>>>     MPI_Type_vector (P, 1, Q, MPI_DOUBLE, &tmp_column_t);
>>>     MPI_Type_create_resized (tmp_column_t, 0, sizeof (double),
>>>                              &column_t);
>>>     MPI_Type_commit (&column_t);
>>>     MPI_Type_free (&tmp_column_t);
>>>     if (group_w_mytid == 0)
>>>     {
>>>       tmp = 1;
>>>       for (i = 0; i < P; ++i)  /* initialize matrix */
>>>       {
>>>         for (j = 0; j < Q; ++j)
>>>         {
>>>           matrix[i][j] = tmp++;
>>>         }
>>>       }
>>>       printf ("\n\noriginal matrix:\n\n");
>>>       print_matrix (P, Q, (double **) matrix);
>>>     }
>>>     /* allocate memory for array containing the number of columns of
>>>      * a column block for each process
>>>      */
>>>     num_columns = (int *) malloc (group_w_ntasks * sizeof (int));
>>>     TestEqualsNULL (num_columns);  /* test if memory was available */
>>>
>>>     /* do an unnecessary initialization to make the GNU compiler happy
>>>      * so that you won't get a warning about the use of a possibly
>>>      * uninitialized variable
>>>      */
>>>     sr_counts = NULL;
>>>     sr_disps = NULL;
>>>     if (group_w_mytid == 0)
>>>     {
>>>       /* allocate memory for arrays containing the size and
>>>        * displacement of each column block
>>>        */
>>>       sr_counts = (int *) malloc (group_w_ntasks * sizeof (int));
>>>       TestEqualsNULL (sr_counts);
>>>       sr_disps = (int *) malloc (group_w_ntasks * sizeof (int));
>>>       TestEqualsNULL (sr_disps);
>>>       /* compute number of columns in column block for each process */
>>>       tmp = Q / group_w_ntasks;
>>>       for (i = 0; i < group_w_ntasks; ++i)
>>>       {
>>>         num_columns[i] = tmp;  /* number of columns */
>>>       }
>>>       for (i = 0; i < (Q % group_w_ntasks); ++i)  /* adjust size */
>>>       {
>>>         num_columns[i]++;
>>>       }
>>>       for (i = 0; i < group_w_ntasks; ++i)
>>>       {
>>>         /* nothing to do because "column_t" contains already all
>>>          * elements of a column, i.e., the "size" is equal to the
>>>          * number of columns in the block
>>>          */
>>>         sr_counts[i] = num_columns[i];  /* "size" of column-block */
>>>       }
>>>       sr_disps[0] = 0;  /* start of i-th column-block */
>>>       for (i = 1; i < group_w_ntasks; ++i)
>>>       {
>>>         sr_disps[i] = sr_disps[i - 1] + sr_counts[i - 1];
>>>       }
>>>     }
>>>     /* inform all processes about their column block sizes */
>>>     MPI_Bcast (num_columns, group_w_ntasks, MPI_INT, 0, COMM_WORKER);
>>>     /* allocate memory for a column block and define a new derived
>>>      * data type for the column block. This data type is possibly
>>>      * different for different processes if the number of processes
>>>      * isn't a factor of the row size of the original matrix. Don't
>>>      * forget to resize the extent of the new data type in such a
>>>      * way that the extent of the whole column looks like just one
>>>      * element so that the next column starts in col_block[0][i]
>>>      * in MPI_Scatterv/MPI_Gatherv.
>>>      */
>>>     col_block = (double **) malloc (P * num_columns[group_w_mytid] *
>>>                                     sizeof (double));
>>>     TestEqualsNULL (col_block);
>>>     MPI_Type_vector (P, 1, num_columns[group_w_mytid], MPI_DOUBLE,
>>>                      &tmp_column_t);
>>>     MPI_Type_create_resized (tmp_column_t, 0, sizeof (double),
>>>                              &col_block_t);
>>>     MPI_Type_commit (&col_block_t);
>>>     MPI_Type_free (&tmp_column_t);
>>>     /* send column block i of "matrix" to process i */
>>>     MPI_Scatterv (matrix, sr_counts, sr_disps, column_t,
>>>                   col_block, num_columns[group_w_mytid],
>>>                   col_block_t, 0, COMM_WORKER);
>>>     /* Modify column elements. The compiler doesn't know the structure
>>>      * of the column block matrix so that you have to do the index
>>>      * calculations for mat[i][j] yourself. In C a matrix is stored
>>>      * row-by-row so that the i-th row starts at location "i * q" if
>>>      * the matrix has "q" columns. Therefore the address of mat[i][j]
>>>      * can be expressed as "(double *) mat + i * q + j" and mat[i][j]
>>>      * itself as "*((double *) mat + i * q + j)".
>>>      */
>>>     for (i = 0; i < P; ++i)
>>>     {
>>>       for (j = 0; j < num_columns[group_w_mytid]; ++j)
>>>       {
>>>         if ((group_w_mytid % 2) == 0)
>>>         {
>>>           /* col_block[i][j] *= col_block[i][j] */
>>>
>>>           *((double *) col_block + i * num_columns[group_w_mytid] + j) *=
>>>             *((double *) col_block + i * num_columns[group_w_mytid] + j);
>>>         }
>>>         else
>>>         {
>>>           /* col_block[i][j] *= FACTOR */
>>>
>>>           *((double *) col_block + i * num_columns[group_w_mytid] + j) *=
>>>             FACTOR;
>>>         }
>>>       }
>>>     }
>>>     /* receive column-block i of "matrix" from process i */
>>>     MPI_Gatherv (col_block, num_columns[group_w_mytid], col_block_t,
>>>                  matrix, sr_counts, sr_disps, column_t,
>>>                  0, COMM_WORKER);
>>>     if (group_w_mytid == 0)
>>>     {
>>>       printf ("\n\nresult matrix:\n"
>>>               "  elements are sqared in columns:\n   ");
>>>       tmp = 0;
>>>       tmp1 = 0;
>>>       for (i = 0; i < group_w_ntasks; ++i)
>>>       {
>>>         tmp1 = tmp1 + num_columns[i];
>>>         if ((i % 2) == 0)
>>>         {
>>>           for (j = tmp; j < tmp1; ++j)
>>>           {
>>>             printf ("%4d", j);
>>>           }
>>>         }
>>>         tmp = tmp1;
>>>       }
>>>       printf ("\n  elements are multiplied with %d in columns:\n   ",
>>>               FACTOR);
>>>       tmp = 0;
>>>       tmp1 = 0;
>>>       for (i = 0; i < group_w_ntasks; ++i)
>>>       {
>>>         tmp1 = tmp1 + num_columns[i];
>>>         if ((i % 2) != 0)
>>>         {
>>>           for (j = tmp; j < tmp1; ++j)
>>>           {
>>>             printf ("%4d", j);
>>>           }
>>>         }
>>>         tmp = tmp1;
>>>       }
>>>       printf ("\n\n\n");
>>>       print_matrix (P, Q, (double **) matrix);
>>>       free (sr_counts);
>>>       free (sr_disps);
>>>     }
>>>     free (num_columns);
>>>     free (col_block);
>>>     MPI_Type_free (&column_t);
>>>     MPI_Type_free (&col_block_t);
>>>     MPI_Comm_free (&COMM_WORKER);
>>>   }
>>>
>>>
>>>   /* =========================================================
>>>    * ======   This is the group "group_other".          ======
>>>    * =========================================================
>>>    */
>>>   MPI_Group_rank (group_other, &group_o_mytid);
>>>   if (group_o_mytid != MPI_UNDEFINED)
>>>   {
>>>     /* Nothing to do (only to demonstrate how to divide work for
>>>      * different groups).
>>>      */
>>>     MPI_Comm_size (COMM_OTHER, &group_o_ntasks);
>>>     if (group_o_mytid == 0)
>>>     {
>>>       if (group_o_ntasks == 1)
>>>       {
>>>         printf ("\nGroup \"group_other\" contains %d process "
>>>                 "which has\n"
>>>                 "nothing to do.\n\n", group_o_ntasks);
>>>       }
>>>       else
>>>       {
>>>         printf ("\nGroup \"group_other\" contains %d processes "
>>>                 "which have\n"
>>>                 "nothing to do.\n\n", group_o_ntasks);
>>>       }
>>>     }
>>>     MPI_Comm_free (&COMM_OTHER);
>>>   }
>>>
>>>
>>>   /* =========================================================
>>>    * ======   all groups will reach this point          ======
>>>    * =========================================================
>>>    */
>>>   MPI_Group_free (&group_worker);
>>>   MPI_Group_free (&group_other);
>>>   MPI_Finalize ();
>>>   return EXIT_SUCCESS;
>>> }
>>>
>>>
>>> /* Print the values of an arbitrary 2D-matrix of "double" values. The
>>>  * compiler doesn't know the structure of the matrix so that you have
>>>  * to do the index calculations for mat[i][j] yourself. In C a matrix
>>>  * is stored row-by-row so that the i-th row starts at location
>>>  * "i * q" if the matrix has "q" columns. Therefore the address of
>>>  * mat[i][j] can be expressed as "(double *) mat + i * q + j" and
>>>  * mat[i][j] itself as "*((double *) mat + i * q + j)".
>>>  *
>>>  * input parameters:    p    number of rows
>>>  *                      q    number of columns
>>>  *                      mat  2D-matrix of "double" values
>>>  * output parameters:   none
>>>  * return value:        none
>>>  * side effects:        none
>>>  *
>>>  */
>>> void print_matrix (int p, int q, double **mat)
>>> {
>>>   int i, j;  /* loop variables */
>>>
>>>   for (i = 0; i < p; ++i)
>>>   {
>>>     for (j = 0; j < q; ++j)
>>>     {
>>>       printf ("%6g", *((double *) mat + i * q + j));
>>>     }
>>>     printf ("\n");
>>>   }
>>>   printf ("\n");
>>> }