Hello Jakub, Thank you very much for giving me a detailed instruction. I spent yesterday studying the information over. I have few questions, and have also attached a work-in-progress patch to confirm that I am heading the right direction.
On Tue, Jul 28, 2020 at 6:26 AM Jakub Jelinek <ja...@redhat.com> wrote: > On Mon, Jul 27, 2020 at 01:59:01PM -0400, y2s1982 via Gcc-patches wrote: > > I do know you have said this several times, and I thought I understood > it, > > but it seems I am wrong each time. I just want to clarify my > understanding > > and what I had intended on doing on this and would like further > explanation > > on what you would like to see happen more specifically so that I may make > > less mistakes. > > > > My assumption in this patch was that, as the > ompd_callback_symbol_addr_fn_t > > callback function takes 2 context addresses and string query as input, > and > > ompd_address_t address as an output, I should give it: > > - the context addresses, which I assumed would contain a single data in > > compact form generated from the tool's side having size, offset, and > > address information, > > - and a string, which is basically a query that needs to be interpreted > > by the callback function to determine which part of the data it needs to > > use to generate to the returning pointer. > > The standard of course can't go into details on what exactly it is, because > it is heavily dependent on what object format is used, etc. > But for ELF, the intent is that the symbol addr callback does pretty much > what dlsym does, i.e. perform a (dynamic) symbol lookup like the dynamic > linker normally performs. It likely can't use the dynamic linker directly, > for running processes that would mean having to perform an inferior call to > the libdl library, and for core files there is no running process to do it > in, but it knows all the currently loaded libraries, their search order and > so can in that order go one library by one and search their dynamic symbol > tables (if you do nm -D on a library or readelf -Ws, you'll see what is in > there), relocate the addresses in there for shared libraries depending on > their load bias (difference between the addresses they are loaded at and > linked to) and return those. If filename is supplied, then it would > perform > the lookup primarily in the given shared library and only then fallback to > others. > > > I wasn't sure what role the filename input played in this. > > This seemed to fit what you meant by having a single compact data that > > contains all the information while resolving my ongoing mystery as to > what > > the callback was expecting to have as the string identifying the symbol. > To > > further clarify, I assumed the callback would do string manipulation and > > comparison to identify which information is being requested, refer to the > > single data contained in appropriate context, and return the address > > information. > > > > I am more than willing to scrap my bad idea for a better one. I am > > sincerely interested in learning better ways to implement and improve > > myself. I just need to know more specifics of the design, particularly: > > - where the compact data is stored (I assumed context, which means it > might > > not be defined in the library side, but should it be in the handle or > > global to library?), > > - how the information necessary for the compact data is gathered (if it > is > > done on the library side, should I just use the ompd_callback_sizeof_fn_t > > to obtain primitive sizes, and from there, just build the offset > > information for each variable members? How about variable address?) > > - how the ompd_callback_symbol_addr_fn_t callback function would likely > use > > it given its arguments (input of 2 contexts, 1 file name, 1 string to > > output 1 address), > > - what are the expected strings for the callback that would correspond to > > each variable (here, I assume, the types would be gomp_thread, > > gompd_thread_pool, gomp_team, etc.) or each member of said variables, > > (these, at least I expected, should be documented and agreed on between > > library and tool), > > among other things. > > So, there are multiple things you might need to know about the libgomp > shared library's internal ABI in order to implement the OMPD library that > can handle it. > One thing are addresses of internal variables. > > Say if OMPD wants to query the value of some ICV, those have different > scopes, some of them are per-process, others per-device, others per-thread, > others per-task. Say the cancel-var is per process and stored in > gomp_cancel_var which is a bool variable (in env.c). > If you look for that symbol: > readelf -Ws libgomp.so.1 | grep gomp_cancel_var > 478: 0000000000041704 1 OBJECT LOCAL DEFAULT 24 gomp_cancel_var > you'll see that there is such a symbol, but it is (intentionally) not > exported and not present in .dynsym, so the symbol lookup callback can't > find it. > So, if OMPD needs to get at the address of that, it again should look at > the single exported variable with all the data and find in it the address > of the variable. So does that mean the library should have a global variable storing the compact data whose symbol resides in the .map file? > Now, addresses are the only thing which need extra care, > it is something where one needs the help of the assembler and linker. > One way to do it is to have > void *gomp_internal_addresses[] = { > &gomp_cancel_var, > ... > }; > table and export that. It is a fallback for when the target can't do > something better, but it will require dynamic relocations, so if possible, > it would be better to use something like: > ptrdiff_t gomp_internal_addr_offsets[] = { > (char *)&gomp_cancel_var - (char *)&gomp_internal_addr_offsets[0], > ... > }; > which doesn't need dynamic relocations. All other details, sizeof > something, offsetof (something, field), how many times one needs to > dereference something, or e.g. for the ICVs also the information whether > the ICV lives in a per-process variable (which), or is e.g. at some offset > in the gomp_thread or where exactly, is something I'd highly suggest > to initially just hardcode in the ompd source with some specific comment > above it, so that you can then very quickly find everything that initially > non-portably is hardcoded in libgompd.so.1 and thus e.g. it can't support > -m32 vs. -m64, and only after you get a better picture of what exactly you > need you start putting it into the compact data section. > ICV was something I was going to take a more thorough look after thread/parallel/task handles were taken care of. I previously looked over how the non-OMPD portion was handled, and also looked over how the LLVM team had done it, but I would need to dig further. Just to clarify, though, should I rely on ICV for the variable addresses? Does this mean I should use the ICV functions to fetch a compact data value and then use the compact data directly to fetch necessary variable information? > So, say you start with libgompd.so.1 sources doing: > /* INFO */ > addr += offsetof (struct blah, field); > and once you have big part of OMPD implemented, you go through everything > you have such comments on, and instead create ompd-info-generate.c > to contain some macros that in the end will store the offsetof value in the > output and then on the libgompd.so.1 part read that back. > > Jakub > > I made some changes to how I was coding the thread section, but I would like to know if this is the direction suitable for now. I have coded ompd_get_thread_in_parallel as a proof-of-concept and have attached a patch below. If this looks good enough, I will code the rest of the thread section with this in mind. Cheers, Tony Sim
From bf44daf4295aecf5cbb637106aad45b44e11aa64 Mon Sep 17 00:00:00 2001 From: y2s1982 <y2s1...@gmail.com> Date: Wed, 29 Jul 2020 14:50:05 -0400 Subject: [PATCH] OMPD: proof-of-concept This patch includes ompd_get_thread_in_parallel as a proof-of-concept, based on discussion regarding memory handling. 2020-07-29 Tony Sim <y2s1...@gmail.com> libgomp/ChangeLog: * Makefile.am (libgompd_la_SOURCE): Add ompd-thread.c. * Makefile.in: Regenerate * libgompd.h (ompd_thread_handle_t): Add definition. (ompd_parallel_handle_t): Add definition. * ompd-thread.c: New file. --- libgomp/Makefile.am | 2 +- libgomp/Makefile.in | 5 +-- libgomp/libgompd.h | 13 +++++++ libgomp/ompd-thread.c | 79 +++++++++++++++++++++++++++++++++++++++++++ 4 files changed, 96 insertions(+), 3 deletions(-) create mode 100644 libgomp/ompd-thread.c diff --git a/libgomp/Makefile.am b/libgomp/Makefile.am index fe0a92122ea..0e6f2d1c830 100644 --- a/libgomp/Makefile.am +++ b/libgomp/Makefile.am @@ -90,7 +90,7 @@ libgomp_la_SOURCES = alloc.c atomic.c barrier.c critical.c env.c error.c \ oacc-mem.c oacc-async.c oacc-plugin.c oacc-cuda.c priority_queue.c \ affinity-fmt.c teams.c allocator.c oacc-profiling.c oacc-target.c -libgompd_la_SOURCES = ompd-lib.c ompd-proc.c +libgompd_la_SOURCES = ompd-lib.c ompd-proc.c ompd-thread.c include $(top_srcdir)/plugin/Makefrag.am diff --git a/libgomp/Makefile.in b/libgomp/Makefile.in index f74d5f3ac8e..137ce35dfc1 100644 --- a/libgomp/Makefile.in +++ b/libgomp/Makefile.in @@ -235,7 +235,7 @@ am_libgomp_la_OBJECTS = alloc.lo atomic.lo barrier.lo critical.lo \ $(am__objects_1) libgomp_la_OBJECTS = $(am_libgomp_la_OBJECTS) libgompd_la_LIBADD = -am_libgompd_la_OBJECTS = ompd-lib.lo ompd-proc.lo +am_libgompd_la_OBJECTS = ompd-lib.lo ompd-proc.lo ompd-thread.lo libgompd_la_OBJECTS = $(am_libgompd_la_OBJECTS) AM_V_P = $(am__v_P_@AM_V@) am__v_P_ = $(am__v_P_@AM_DEFAULT_V@) @@ -593,7 +593,7 @@ libgomp_la_SOURCES = alloc.c atomic.c barrier.c critical.c env.c \ oacc-async.c oacc-plugin.c oacc-cuda.c priority_queue.c \ affinity-fmt.c teams.c allocator.c oacc-profiling.c \ oacc-target.c $(am__append_4) -libgompd_la_SOURCES = ompd-lib.c ompd-proc.c +libgompd_la_SOURCES = ompd-lib.c ompd-proc.c ompd-thread.c # Nvidia PTX OpenACC plugin. @PLUGIN_NVPTX_TRUE@libgomp_plugin_nvptx_version_info = -version-info $(libtool_VERSION) @@ -819,6 +819,7 @@ distclean-compile: @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/oacc-target.Plo@am__quote@ @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/ompd-lib.Plo@am__quote@ @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/ompd-proc.Plo@am__quote@ +@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/ompd-thread.Plo@am__quote@ @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/ordered.Plo@am__quote@ @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/parallel.Plo@am__quote@ @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/priority_queue.Plo@am__quote@ diff --git a/libgomp/libgompd.h b/libgomp/libgompd.h index 495995e00d3..f6e8987c655 100644 --- a/libgomp/libgompd.h +++ b/libgomp/libgompd.h @@ -30,6 +30,7 @@ #define LIBGOMPD_H 1 #include "omp-tools.h" +#include <string.h> #define ompd_stringify(x) ompd_str2(x) #define ompd_str2(x) #x @@ -47,4 +48,16 @@ typedef struct _ompd_aspace_handle { ompd_size_t ref_count; } ompd_address_space_handle_t; +typedef struct _ompd_thread_handle { + ompd_parallel_handle_t *ph; + ompd_address_space_handle_t *ah; + ompd_thread_context_t *tcontext; + ompd_address_t *thread; +} ompd_thread_handle_t; + +typedef struct _ompd_parallel_handle { + ompd_address_space_handle_t *ah; + ompd_address_t *tpool; +} ompd_parallel_handle_t; + #endif /* LIBGOMPD_H */ diff --git a/libgomp/ompd-thread.c b/libgomp/ompd-thread.c new file mode 100644 index 00000000000..0c695d4b8b1 --- /dev/null +++ b/libgomp/ompd-thread.c @@ -0,0 +1,79 @@ +/* Copyright (C) 2020 Free Software Foundation, Inc. + Contributed by Yoosuk Sim <y2s1...@gmail.com>. + + This file is part of the GNU Offloading and Multi Processing Library + (libgomp). + + Libgomp is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY + WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS + FOR A PARTICULAR PURPOSE. See the GNU General Public License for + more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + <http://www.gnu.org/licenses/>. */ + +/* This file contains OMPD function definitions related to threads, as defined + in the section 5.5.5 in the OpenMP API Specification 5.0. */ + +#include "omp-tools.h" +#include "ompd-types.h" +#include "libgompd.h" +#include "libgomp.h" +#include <stddef.h> + +ompd_rc_t +ompd_get_thread_in_parallel (ompd_parallel_handle_t *ph, int thread_num, + ompd_thread_handle_t **th) +{ + ompd_rc_t ret; + int max_size = 0; + // TODO: Using ICV, obtain max_size from ompd-team-size-var + if ( thread_num < 0 || thread_num > max_size) + return ompd_rc_bad_input; + + ompd_device_type_sizes_t type_size; + ompd_address_t threadAddr; + ompd_address_t threadAddrPtr = *ph->tpool; + + // TODO: fix hardcoded offset + threadAddrPtr.address += offsetof (struct gomp_thread_pool, threads); + + ret = gompd_callbacks.sizeof_type (ph->ah->context, &type_size); + if (ret != ompd_rc_ok) + return ret; + + ret = gompd_callbacks.read_memory (ph->ah->context, NULL, &threadAddrPtr, + type_size.sizeof_pointer, + (void *) &threadAddr.address); + if (ret != ompd_rc_ok) + return ret; + threadAddr.address += thread_num * type_size.sizeof_pointer; + + ompd_thread_handle_t *p; + ret = gompd_callbacks.alloc_memory (sizeof (ompd_thread_handle_t), + (void *) p); + if (ret != ompd_rc_ok) + return ret; + ret = gompd_callbacks.alloc_memory (sizeof (ompd_address_t), + (void *) p->thread); + if (ret != ompd_rc_ok) + return ret; + + p->ph = ph; + p->ah = ph->ah; + *p->thread = threadAddr; + *th = p; + + return ompd_rc_ok; +} -- 2.27.0