On Tue, Oct 27, 2020 at 12:55:36PM +0000, Vitaly Slobodskoy wrote: > I'd like to propose instrumentation of GCC OpenMP runtime with ITT API > (https://github.com/intel/ittapi) like it was already done for LLVM > (https://github.com/llvm/llvm-project/tree/master/openmp/runtime/src/thirdparty/ittnotify) > to enable dedicated OpenMP support within the tools like Intel VTune > (https://software.intel.com/content/www/us/en/develop/documentation/vtune-cookbook/top/methodologies/openmp-code-analysis-method.html) > and others. This would finally enable "Serial Time", "Parallel Time", > "Imbalance Time" metrics and would allow performance tools to focus on serial > or parallel execution separately. > > In order to optimize OpenMP workloads, it is quite important to have a > dedicated performance analysis tool familiar with the OpenMP runtime > specifics. The typical OpenMP performance issues are: > - Not all the performance-critical code is parallel, serial execution > significantly affects scaling (Amdahl's law) > - Work balance is not good, not all the cores doing useful work > - Overhead on synchronization, scheduling, threads creation
So, first thing is that this brings in quite large code that clearly was written by somebody else, so the most important question is what is the upstream repo for it, what is the license, for the steering committee the question is if we can allow it in the GCC codebase or whether instead if the library is configured in certain way we just shouldn't require users to have that library installed instead. If it is included, we need a process of updates from the upstream repo being synced into GCC tree. > Performance analysis tool should be able to distinguish serial vs parallel > execution. Imbalance within the parallel region can hardly be calculated > without dedicated runtime support. > > ITT is a lightweight API for source-based instrumentation. Open-source part > is simply a set of APIs and single .c file for loading dynamic ITT library > (so-called ITT collector, can be easily created by anyone). In order to > enable tracing, target application needs to be launched under the > "INTEL_LIBITTNOTIFY64=<collector>" environment variable. Otherwise all the > ITT calls would do nothing without causing any noticeable runtime overhead. > > This proposal adds new "--disable-itt-instrumentation" configure option which > completely disables (removes) all the tracing. The tracing is ON by default. > OpenMP Imbalance time calculation is not included in this patch. Second thing, making this on by default is a very bad idea. Most people will not need it and it will just slow things down. Also, OpenMP 5.0 adds OMPT support which is exactly meant for tracing, wouldn't it be better to add OMPT support and then add an ITT plugin as one of the perhaps multiple users of OMPT? > diff --git a/libgomp/parallel.c b/libgomp/parallel.c > index 2fe4f573a32..bb42a71b2db 100644 > --- a/libgomp/parallel.c > +++ b/libgomp/parallel.c > @@ -28,6 +28,10 @@ > #include "libgomp.h" > #include <limits.h> > > +#ifdef ENABLE_ITT_INSTRUMENTATION > +#include "ittnotify.h" > +#endif > + > > /* Determine the number of threads to be launched for a PARALLEL construct. > This algorithm is explicitly described in OpenMP 3.0 section 2.4.1. > @@ -168,15 +172,50 @@ GOMP_parallel_end (void) > } > ialias (GOMP_parallel_end) > > +#ifdef ENABLE_ITT_INSTRUMENTATION > +static __itt_domain* s_gomp_parallel_domain = NULL; > +#endif Code in libgomp proper needs to follow the GCC Coding Conventions. So, e.g. space before * rather than after it. > +#ifdef ENABLE_ITT_INSTRUMENTATION > + if (__itt_frame_submit_v3_ptr) > + { { indented 2 spaces more than the if, and the body another 2. > + if (!s_gomp_parallel_domain) > + { > + s_gomp_parallel_domain = __itt_domain_create("$omp$parallel"); > + __itt_thread_set_name("OMP Master Thread"); Spaces before ( in calls. > + __itt_frame_submit_v3(s_gomp_parallel_domain, NULL, > + parallel_region_begin_ts, parallel_region_end_ts); The arguments should be aligned (i.e. parallel_region_begin_ts below s_gomp_parallel_domain). Also, all the itt specific variable names should have itt somewhere in it. > +#ifdef ENABLE_ITT_INSTRUMENTATION > + char thread_name[30]; > +#endif This isn't acceptable. Can't you instead just gomp_alloca the space for the string in the conditional block? > #if defined HAVE_TLS || defined USE_EMUTLS > thr = &gomp_tls_data; > @@ -95,6 +103,14 @@ gomp_thread_start (void *xdata) > > thr->ts.team->ordered_release[thr->ts.team_id] = &thr->release; > > +#ifdef ENABLE_ITT_INSTRUMENTATION > + if (__itt_frame_submit_v3_ptr) > + { > + snprintf(thread_name, 30, "OMP Worker Thread #%d", data->ts.team_id); > + __itt_thread_set_name(thread_name); > + } Also, this doesn't really deal with nested parallelism or host teams etc. Jakub