Issue 134451
Summary [OpenMP][OMPT] `initial_device_num` on `initialize` callback set incorrectly
Labels new issue
Assignees
Reporter Thyre
We recently released Score-P v9.0, the first version to support OpenMP target via the OpenMP Tools Interface (OMPT).

However, a user quickly noticed an issue, which boils down to a bug in LLVM.
When the OpenMP runtime initializes a connected tool, the following callback will be dispatched:

```c
typedef int (*ompt_initialize_t) (ompt_function_lookup_t lookup, int initial_device_num, ompt_data_t *tool_data);
```

The specification defines `initial_device_num` as "the value that a call to omp_get_initial_device would return". For that routine, in turn, "the value of the device number is the value of omp_initial_device or the value returned by the omp_get_num_devices routine".
The values LLVM actually passes do not match this definition: it always reports `initial_device_num = 0`. Since LLVM also does not implement `ompt_get_num_devices` correctly
https://github.com/llvm/llvm-project/blob/19e0233eb844e653a3108de411366bd0165cf3ec/openmp/runtime/src/ompt-general.cpp#L868
a tool cannot safely determine whether a device identifier passed to a target callback refers to the host.
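
To make the expectation concrete, here is a minimal sketch of a consistency check that a tool or test could apply to the reported value. The helper name `initial_device_num_is_plausible` is hypothetical, the value `-1` for `omp_initial_device` is an assumption based on what LLVM uses, and the device count is assumed to come from a trusted source such as a call to `omp_get_num_devices` in the application.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: omp_initial_device has the value -1, as LLVM uses today. */
#define ASSUMED_OMP_INITIAL_DEVICE ( -1 )

/* Hypothetical helper, not part of OMPT: returns true if `reported` is one of
 * the two values the quoted definition allows for the host device. */
static bool
initial_device_num_is_plausible( int reported, int num_devices )
{
    assert( num_devices >= 0 ); /* count of non-host devices */
    return reported == ASSUMED_OMP_INITIAL_DEVICE || reported == num_devices;
}
```

With a device count of 1, `initial_device_num_is_plausible( 0, 1 )` is false, whereas `-1` and `1` both pass.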

LLVM typically uses `-1` for the host device, but tools should not have to guess this value, since it may change in later versions. Having to guess also makes supporting different runtimes more complicated.
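
In practice this pushes tools toward runtime-specific special cases. The sketch below shows what such a workaround might look like. `tool_host_info_t`, `tool_is_host_device`, and the `trust_initial` flag are hypothetical names, and the `-1` fallback encodes the guess described above, not something a runtime guarantees.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption taken from this report: LLVM currently identifies the host as -1. */
#define TOOL_LLVM_HOST_DEVICE_GUESS ( -1 )

/* Hypothetical bookkeeping a tool could fill in during initialization. */
typedef struct
{
    int  initial_device_num; /* value received in the initialize callback */
    bool trust_initial;      /* false for runtimes known to report it wrongly */
} tool_host_info_t;

/* Decide whether `device_num` from a target callback refers to the host. */
static bool
tool_is_host_device( const tool_host_info_t* info, int device_num )
{
    assert( info != NULL );
    if ( info->trust_initial )
    {
        /* Conforming runtime: compare against the reported host device. */
        return device_num == info->initial_device_num;
    }
    /* Fallback: runtime-specific guess that may break in later versions. */
    return device_num == TOOL_LLVM_HOST_DEVICE_GUESS;
}
```

The `trust_initial` flag has to be set from knowledge about the specific runtime and version, which is exactly the kind of special-casing a correct `initial_device_num` would make unnecessary.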

---

To reproduce the issue:

```c
#include <omp.h>

#include <omp-tools.h>
#include <stdlib.h>
#include <assert.h>
#include <inttypes.h>
#include <string.h>

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdatomic.h>

/* MAIN */

int main( int argc, char** argv )
{
#pragma omp target
    {
        printf( "Hello from target region\n" );
    }
}

/* OMPT INTERFACE */

#if ( defined( __ppc64__ ) || defined( __powerpc64__ ) || defined( __PPC64__ ) )
#define OMPT_TOOL_CPU_RELAX ( ( void )0 )
#elif ( defined( __x86_64 ) || defined( __x86_64__ ) || defined( __amd64 ) || defined( _M_X64 ) )
#define OMPT_TOOL_CPU_RELAX __asm__ volatile ( "pause" )
#elif ( defined( __aarch64__ ) || defined( __ARM64__ ) || defined( _M_ARM64 ) )
#define OMPT_TOOL_CPU_RELAX __asm__ volatile ( "yield" )
#else
#define OMPT_TOOL_CPU_RELAX ( ( void )0 )
#endif


/* Test-and-set spin lock. Mutexes are of type atomic_flag. */
#define OMPT_TOOL_LOCK( MUTEX ) \
    do \
    { \
        while ( atomic_flag_test_and_set_explicit( &( MUTEX ), memory_order_acquire ) ) \
        { \
            OMPT_TOOL_CPU_RELAX; \
        } \
    } while ( 0 )
#define OMPT_TOOL_UNLOCK( MUTEX ) atomic_flag_clear_explicit( &( MUTEX ), memory_order_release )

/* Serialize output so that lines printed by different threads do not interleave. */
#define OMPT_TOOL_GUARDED_PRINTF( ... ) \
    do \
    { \
        OMPT_TOOL_LOCK( ompt_tool_printf_mutex ); \
        printf( __VA_ARGS__ ); \
        OMPT_TOOL_UNLOCK( ompt_tool_printf_mutex ); \
    } while ( 0 )


_Thread_local int32_t ompt_tool_tid          = -1; /* thread counter. >= 1 after thread_begin */
atomic_flag           ompt_tool_printf_mutex = ATOMIC_FLAG_INIT;


static int
my_initialize_tool( ompt_function_lookup_t lookup,
                    int                    initial_device_num,
                    ompt_data_t*           tool_data )
{
    OMPT_TOOL_GUARDED_PRINTF( "[%s] tid = %" PRId32 " | initial_device_num %d\n",
                              __FUNCTION__,
                              ompt_tool_tid,
                              initial_device_num );
    return 1; /* non-zero indicates success */
}

static void
my_finalize_tool( ompt_data_t* tool_data )
{
    OMPT_TOOL_GUARDED_PRINTF( "[%s] tid = %" PRId32 "\n",
                              __FUNCTION__,
                              ompt_tool_tid );
}

ompt_start_tool_result_t*
ompt_start_tool( unsigned int omp_version,
                 const char*  runtime_version )
{
    setbuf( stdout, NULL );
    /* omp_version is unsigned, so print it with %u */
    OMPT_TOOL_GUARDED_PRINTF( "[%s] tid = %" PRId32 " | omp_version %u | runtime_version = '%s'\n",
                              __FUNCTION__,
                              ompt_tool_tid,
                              omp_version,
                              runtime_version );
    static ompt_start_tool_result_t tool = { &my_initialize_tool,
                                             &my_finalize_tool,
                                             ompt_data_none };
    return &tool;
}
```

Building and running the reproducer on a system with at least one GPU (`-fopenmp-targets=x86_64` might work too) yields the following result:

```console
$ clang -fopenmp --offload-arch=sm_75 test.c
$ ./a.out
[ompt_start_tool] tid = -1 | omp_version 201611 | runtime_version = 'LLVM OMP version: 5.0.20140926'
[my_initialize_tool] tid = -1 | initial_device_num 0
Hello from target region
[my_finalize_tool] tid = -1
```

As one can see, `initial_device_num` is `0`. In my case, it should be either `-1` (the value of `omp_initial_device`) or `1` (the value returned by `omp_get_num_devices`).