[OMPI users] Bug report: Message queues debugging not working

2015-06-01 Thread Alejandro

Dear OpenMPI users/developers,

We are experiencing a problem when debugging the message queues:

Summary: Message queues debugging broken on recent OpenMPI versions.

Affected OpenMPI versions: 1.8.3, 1.8.4 and 1.8.5 (at least).
The debug message queue library is not returning any pending messages, even 
though some processes are blocked in MPI_Recv and should therefore have pending 
receives. This does not happen with previous versions of OpenMPI.
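
For reference, a minimal reproducer of that situation (not taken from the 
original report; the delay and the ranks involved are illustrative) could look 
like this: rank 1 blocks in MPI_Recv while rank 0 delays its send, so a 
debugger attached in the meantime should see one pending receive on rank 1.

#include <mpi.h>
#include <unistd.h>
#include <cstdio>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int payload = 42;
    if (rank == 0) {
        sleep(60);   // attach the debugger during this window
        MPI_Send(&payload, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        // This receive stays pending (and rank 1 stays blocked here) until
        // rank 0 finally sends, so it should appear in the message queues.
        MPI_Recv(&payload, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        std::printf("Rank 1 received %d\n", payload);
    }

    MPI_Finalize();
    return 0;
}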


Which message queue to iterate over is selected by the enum 'mqs_op_class', 
whose values are mqs_pending_sends, mqs_pending_receives and 
mqs_unexpected_messages.


Setting up the corresponding queue iterator (with 
"mqs_setup_operation_iterator") returns zero (== mqs_ok), but reading the list 
(with "mqs_next_operation") always returns a non-zero value, so no pending 
messages are found.
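
For context, the call sequence on the debugger side is roughly the sketch 
below (not a complete debugger plugin); the mqs_* types and functions come 
from the message-queue dumping interface implemented by the Open MPI debugger 
DLL, and the header name used here is only the one it is commonly shipped 
under.

#include "mpi_interface.h"   // assumed name of the MQD interface header
#include <cstdio>

// Count the pending operations of one class (e.g. mqs_pending_receives) for
// an already-initialised mqs_process handle.
static int count_pending(mqs_process *process, int op_class) {
    if (mqs_setup_operation_iterator(process, op_class) != mqs_ok) {
        return -1;   // iterator could not be set up
    }
    int count = 0;
    mqs_pending_operation op;
    // On the affected versions the first call already returns non-mqs_ok,
    // so the loop body is never entered and no pending messages are reported.
    while (mqs_next_operation(process, &op) == mqs_ok) {
        ++count;
    }
    return count;
}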



Should I raise a bug report for this?

Thank you,

Alejandro Palencia
Allinea Software


[OMPI users] OpenMPI4 passive RMA not working properly on some systems.

2022-12-12 Thread Alejandro Fernández Fraga via users
Hello,

I'm running an MPI program that uses passive RMA to access shared arrays.

On some systems this program does not behave as expected.
When running on several nodes, it still produces the correct results, but only 
the process with rank 0 (the one holding the shared arrays in its local memory) 
actually works on the shared arrays, which is not the intended behavior.
This happens with OpenMPI 4, in particular with OpenMPI 4.0.5 and OpenMPI 4.1.4.
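
For clarity, a condensed sketch of the access pattern being described is shown 
below. The attached sample is the authoritative version; in particular, the 
use of MPI_Fetch_and_op on a shared counter here is only an assumption about 
how each rank claims work.

#include <mpi.h>
#include <cstdio>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const long total_work = 2000;
    long *counter = nullptr;
    MPI_Win counter_win;

    // Rank 0 exposes a single counter; the other ranks expose an empty window.
    MPI_Aint win_size = (rank == 0) ? sizeof(long) : 0;
    MPI_Win_allocate(win_size, sizeof(long), MPI_INFO_NULL, MPI_COMM_WORLD,
                     &counter, &counter_win);
    if (rank == 0) *counter = 0;
    MPI_Barrier(MPI_COMM_WORLD);

    long claimed = 0, one = 1, local_count = 0;
    while (true) {
        // Passive-target epoch on rank 0: atomically fetch the current index
        // and increment it, so each position is claimed by exactly one rank.
        MPI_Win_lock(MPI_LOCK_SHARED, 0, 0, counter_win);
        MPI_Fetch_and_op(&one, &claimed, MPI_LONG, 0, 0, MPI_SUM, counter_win);
        MPI_Win_unlock(0, counter_win);
        if (claimed >= total_work) break;
        // ... process position 'claimed' via MPI_Get / MPI_Put on the
        // input and output windows (omitted in this sketch) ...
        ++local_count;
    }

    // With the expected behaviour every rank claims some positions; with the
    // misbehaviour described above only rank 0 ends up claiming work.
    std::printf("Rank %d claimed %ld positions\n", rank, local_count);
    MPI_Win_free(&counter_win);
    MPI_Finalize();
    return 0;
}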

However, when compiling and running with OpenMPI 3 (in particular OpenMPI 3.1.4), 
the program works as expected and all processes work on the shared structures.

In addition, when OpenMPI 4 is compiled to use verbs instead of UCX, the program 
also works as expected.
We have therefore concluded that there may be a problem related to the use of UCX 
in OpenMPI.
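
When comparing builds it can also help to confirm at run time which MPI 
library the binary actually linked against. The check below is not part of 
the attached sample; it only uses the standard MPI_Get_library_version call, 
which on Open MPI reports the release number.

#include <mpi.h>
#include <cstdio>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // Ask the library itself which version is in use, so runs with 3.1.4,
    // 4.0.5 and 4.1.4 can be told apart unambiguously.
    char version[MPI_MAX_LIBRARY_VERSION_STRING];
    int len = 0;
    MPI_Get_library_version(version, &len);
    if (rank == 0) std::printf("%s\n", version);

    MPI_Finalize();
    return 0;
}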

About the system I am working on:

 - Nodes on the system are connected through an InfiniBand FDR network.
 - I'm using g++ (GCC) 8.3.0 and the different versions of OpenMPI stated 
previously.

I attach sample code to help reproduce the undesired behavior.
I also include the output of the test program (1) when it behaves improperly and 
(2) when it behaves properly.

Can someone help me understand if there's a problem with the program or with 
OpenMPI and UCX?
Thanks a lot!

(1) Output when behaving improperly:
+--+
Rank: 0 ||| Position: 0
Rank: 0 ||| Position: 1
Rank: 0 ||| Position: 2
...
Rank: 0 ||| Position: 85
Rank: 0 ||| Position: 86
Rank: 0 ||| Position: 87
...
Rank: 0 ||| Position: 1997
Rank: 0 ||| Position: 1998
Rank: 0 ||| Position: 1999
*
*
*
*
* Small correctness check *
Position 0
  ||| Input value: 0
  ||| Output value:0.00
  ||| Expected output: 0.00
...
Position 1999
  ||| Input value: 1999
  ||| Output value:4997.50
  ||| Expected output: 4997.50
*
*
*
*
* Accesses per process data *
Process 0 accesses: 2000
Process 1 accesses: 0
Process 2 accesses: 0
Process 3 accesses: 0
Process 4 accesses: 0
Process 5 accesses: 0
Process 6 accesses: 0
Process 7 accesses: 0
+--+



(2) Output when behaving properly:
+--+
Rank: 0 ||| Position: 7
Rank: 0 ||| Position: 8
Rank: 0 ||| Position: 9
...
Rank: 3 ||| Position: 24
Rank: 4 ||| Position: 28
Rank: 7 ||| Position: 19
...
Rank: 3 ||| Position: 1976
Rank: 2 ||| Position: 1985
Rank: 6 ||| Position: 1994
*
*
*
*
* Small correctness check *
Position 0
  ||| Input value: 0
  ||| Output value:0.00
  ||| Expected output: 0.00
...
Position 1999
  ||| Input value: 1999
  ||| Output value:4997.50
  ||| Expected output: 4997.50
*
*
*
*
* Accesses per process data *
Process 0 accesses: 425
Process 1 accesses: 226
Process 2 accesses: 222
Process 3 accesses: 226
Process 4 accesses: 228
Process 5 accesses: 227
Process 6 accesses: 222
Process 7 accesses: 224
+--+
#include <iostream>
#include <cstdio>
#include <cstdlib>
#include <cstddef>

using std::cout;
using std::endl;

// MPI added
#include <mpi.h>

#define MPI_RANK_0 0

#define MULT_FACTOR 2.5

static void
process_data(int *input_buffer,
             double *output_buffer,
             size_t BLOCK_SIZE){

    for (size_t i = 0; i < BLOCK_SIZE; i++)
    {
        output_buffer[i] = (double) input_buffer[i] * MULT_FACTOR;
    }

}


int
main(int argc, char **argv) {

    int rank, number_of_processes;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &number_of_processes);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const size_t VECTOR_SIZE = 2000;
    const size_t MY_SIZE = rank ? 0 : VECTOR_SIZE;
    int * main_input_buffer;
    double * main_output_buffer;

    // Rank 0 has the input data
    if (rank == MPI_RANK_0){
        MPI_Alloc_mem(VECTOR_SIZE * sizeof(int), MPI_INFO_NULL, &main_input_buffer);
        MPI_Alloc_mem(VECTOR_SIZE * sizeof(double), MPI_INFO_NULL, &main_output_buffer);
        for (size_t i = 0; i < VECTOR_SIZE; i++){
            main_input_buffer[i] = (int)i;
        }
    }

    // We will create a shared index to access shared data on Rank 0
    // Also, we will share input and output buffers on P0
    size_t * main_buffer_index;
    MPI_Alloc_mem(1 * sizeof(size_t), MPI_INFO_NULL, &main_buffer_index);
    *main_buffer_index = 0;
    MPI_Barrier(MPI_COMM_WORLD);

    MPI_Win index_window, input_window, output_window;

    MPI_Win_create(main_buffer_index, 1 * sizeof(size_t), sizeof(size_t),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &index_window);

    MPI_Win_create(main_input_buffer,
                   MY_SIZE * sizeof(int),
                   sizeof(int),
                   MPI_INFO_NULL,