Hello Jakub,

Thank you very much for giving me a detailed instruction. I spent yesterday
studying the information over. I have few questions, and have also attached
a work-in-progress patch to confirm that I am heading the right direction.

On Tue, Jul 28, 2020 at 6:26 AM Jakub Jelinek <ja...@redhat.com> wrote:

> On Mon, Jul 27, 2020 at 01:59:01PM -0400, y2s1982 via Gcc-patches wrote:
> > I do know you have said this several times, and I thought I understood
> it,
> > but it seems I am wrong each time. I just want to clarify my
> understanding
> > and what I had intended on doing on this and would like further
> explanation
> > on what you would like to see happen more specifically so that I may make
> > less mistakes.
> >
> > My assumption in this patch was that, as the
> ompd_callback_symbol_addr_fn_t
> > callback function takes 2 context addresses and string query as input,
> and
> > ompd_address_t address as an output, I should give it:
> >   - the context addresses, which I assumed would contain a single data in
> > compact form generated from the tool's side having size, offset, and
> > address information,
> >   - and a string, which is basically a query that needs to be interpreted
> > by the callback function to determine which part of the data it needs to
> > use to generate to the returning pointer.
>
> The standard of course can't go into details on what exactly it is, because
> it is heavily dependent on what object format is used, etc.
> But for ELF, the intent is that the symbol addr callback does pretty much
> what dlsym does, i.e. perform a (dynamic) symbol lookup like the dynamic
> linker normally performs.  It likely can't use the dynamic linker directly,
> for running processes that would mean having to perform an inferior call to
> the libdl library, and for core files there is no running process to do it
> in, but it knows all the currently loaded libraries, their search order and
> so can in that order go one library by one and search their dynamic symbol
> tables (if you do nm -D on a library or readelf -Ws, you'll see what is in
> there), relocate the addresses in there for shared libraries depending on
> their load bias (difference between the addresses they are loaded at and
> linked to) and return those.  If filename is supplied, then it would
> perform
> the lookup primarily in the given shared library and only then fallback to
> others.
>
> > I wasn't sure what role the filename input played in this.
> > This seemed to fit what you meant by having a single compact data that
> > contains all the information while resolving my ongoing mystery as to
> what
> > the callback was expecting to have as the string identifying the symbol.
> To
> > further clarify, I assumed the callback would do string manipulation and
> > comparison to identify which information is being requested, refer to the
> > single data contained in appropriate context, and return the address
> > information.
> >
> > I am more than willing to scrap my bad idea for a better one. I am
> > sincerely interested in learning better ways to implement and improve
> > myself. I just need to know more specifics of the design, particularly:
> > - where the compact data is stored (I assumed context, which means it
> might
> > not be defined in the library side, but should it be in the handle or
> > global to library?),
> > - how the information necessary for the compact data is gathered (if it
> is
> > done on the library side, should I just use the ompd_callback_sizeof_fn_t
> > to obtain primitive sizes, and from there, just build the offset
> > information for each variable members? How about variable address?)
> > - how the ompd_callback_symbol_addr_fn_t callback function would likely
> use
> > it given its arguments (input of 2 contexts, 1 file name, 1 string to
> > output 1 address),
> > - what are the expected strings for the callback that would correspond to
> > each variable (here, I assume, the types would be gomp_thread,
> > gompd_thread_pool, gomp_team, etc.) or each member of said variables,
> > (these, at least I expected, should be documented and agreed on between
> > library and tool),
> > among other things.
>
> So, there are multiple things you might need to know about the libgomp
> shared library's internal ABI in order to implement the OMPD library that
> can handle it.
> One thing are addresses of internal variables.
>
> Say if OMPD wants to query the value of some ICV, those have different
> scopes, some of them are per-process, others per-device, others per-thread,
> others per-task.  Say the cancel-var is per process and stored in
> gomp_cancel_var which is a bool variable (in env.c).
> If you look for that symbol:
> readelf -Ws libgomp.so.1 | grep gomp_cancel_var
>    478: 0000000000041704     1 OBJECT  LOCAL  DEFAULT   24 gomp_cancel_var
> you'll see that there is such a symbol, but it is (intentionally) not
> exported and not present in .dynsym, so the symbol lookup callback can't
> find it.
> So, if OMPD needs to get at the address of that, it again should look at
> the single exported variable with all the data and find in it the address
> of the variable.


So does that mean the library should have a global variable storing the
compact data whose symbol resides in the .map file?


> Now, addresses are the only thing which need extra care,
> it is something where one needs the help of the assembler and linker.
> One way to do it is to have
> void *gomp_internal_addresses[] = {
>   &gomp_cancel_var,
>   ...
> };
> table and export that.  It is a fallback for when the target can't do
> something better, but it will require dynamic relocations, so if possible,
> it would be better to use something like:
> ptrdiff_t gomp_internal_addr_offsets[] = {
>   (char *)&gomp_cancel_var - (char *)&gomp_internal_addr_offsets[0],
>   ...
> };
> which doesn't need dynamic relocations.  All other details, sizeof
> something, offsetof (something, field), how many times one needs to
> dereference something, or e.g. for the ICVs also the information whether
> the ICV lives in a per-process variable (which), or is e.g. at some offset
> in the gomp_thread or where exactly, is something I'd highly suggest
> to initially just hardcode in the ompd source with some specific comment
> above it, so that you can then very quickly find everything that initially
> non-portably is hardcoded in libgompd.so.1 and thus e.g. it can't support
> -m32 vs. -m64, and only after you get a better picture of what exactly you
> need you start putting it into the compact data section.
>

ICV was something I was going to take a more thorough look after
thread/parallel/task handles were taken care of.  I previously looked over
how the non-OMPD portion was handled, and also looked over how the LLVM
team had done it, but I would need to dig further. Just to clarify, though,
should I rely on ICV for the variable addresses? Does this mean I should
use the ICV functions to fetch a compact data value and then use the
compact data directly to fetch necessary variable information?


> So, say you start with libgompd.so.1 sources doing:
>           /* INFO */
>   addr += offsetof (struct blah, field);
> and once you have big part of OMPD implemented, you go through everything
> you have such comments on, and instead create ompd-info-generate.c
> to contain some macros that in the end will store the offsetof value in the
> output and then on the libgompd.so.1 part read that back.
>
>         Jakub
>
>
I made some changes to how I was coding the thread section, but I would
like to know if this is the direction suitable for now. I have coded
ompd_get_thread_in_parallel as a proof-of-concept and have attached a patch
below. If this looks good enough, I will code the rest of the thread
section with this in mind.

Cheers,

Tony Sim
From bf44daf4295aecf5cbb637106aad45b44e11aa64 Mon Sep 17 00:00:00 2001
From: y2s1982 <y2s1...@gmail.com>
Date: Wed, 29 Jul 2020 14:50:05 -0400
Subject: [PATCH] OMPD: proof-of-concept

This patch includes ompd_get_thread_in_parallel as a proof-of-concept,
based on discussion regarding memory handling.

2020-07-29  Tony Sim  <y2s1...@gmail.com>

libgomp/ChangeLog:

	* Makefile.am (libgompd_la_SOURCE): Add ompd-thread.c.
	* Makefile.in: Regenerate
	* libgompd.h (ompd_thread_handle_t): Add definition.
	(ompd_parallel_handle_t): Add definition.
	* ompd-thread.c: New file.

---
 libgomp/Makefile.am   |  2 +-
 libgomp/Makefile.in   |  5 +--
 libgomp/libgompd.h    | 13 +++++++
 libgomp/ompd-thread.c | 79 +++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 96 insertions(+), 3 deletions(-)
 create mode 100644 libgomp/ompd-thread.c

diff --git a/libgomp/Makefile.am b/libgomp/Makefile.am
index fe0a92122ea..0e6f2d1c830 100644
--- a/libgomp/Makefile.am
+++ b/libgomp/Makefile.am
@@ -90,7 +90,7 @@ libgomp_la_SOURCES = alloc.c atomic.c barrier.c critical.c env.c error.c \
 	oacc-mem.c oacc-async.c oacc-plugin.c oacc-cuda.c priority_queue.c \
 	affinity-fmt.c teams.c allocator.c oacc-profiling.c oacc-target.c
 
-libgompd_la_SOURCES = ompd-lib.c ompd-proc.c
+libgompd_la_SOURCES = ompd-lib.c ompd-proc.c ompd-thread.c
 
 include $(top_srcdir)/plugin/Makefrag.am
 
diff --git a/libgomp/Makefile.in b/libgomp/Makefile.in
index f74d5f3ac8e..137ce35dfc1 100644
--- a/libgomp/Makefile.in
+++ b/libgomp/Makefile.in
@@ -235,7 +235,7 @@ am_libgomp_la_OBJECTS = alloc.lo atomic.lo barrier.lo critical.lo \
 	$(am__objects_1)
 libgomp_la_OBJECTS = $(am_libgomp_la_OBJECTS)
 libgompd_la_LIBADD =
-am_libgompd_la_OBJECTS = ompd-lib.lo ompd-proc.lo
+am_libgompd_la_OBJECTS = ompd-lib.lo ompd-proc.lo ompd-thread.lo
 libgompd_la_OBJECTS = $(am_libgompd_la_OBJECTS)
 AM_V_P = $(am__v_P_@AM_V@)
 am__v_P_ = $(am__v_P_@AM_DEFAULT_V@)
@@ -593,7 +593,7 @@ libgomp_la_SOURCES = alloc.c atomic.c barrier.c critical.c env.c \
 	oacc-async.c oacc-plugin.c oacc-cuda.c priority_queue.c \
 	affinity-fmt.c teams.c allocator.c oacc-profiling.c \
 	oacc-target.c $(am__append_4)
-libgompd_la_SOURCES = ompd-lib.c ompd-proc.c
+libgompd_la_SOURCES = ompd-lib.c ompd-proc.c ompd-thread.c
 
 # Nvidia PTX OpenACC plugin.
 @PLUGIN_NVPTX_TRUE@libgomp_plugin_nvptx_version_info = -version-info $(libtool_VERSION)
@@ -819,6 +819,7 @@ distclean-compile:
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/oacc-target.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/ompd-lib.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/ompd-proc.Plo@am__quote@
+@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/ompd-thread.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/ordered.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/parallel.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/priority_queue.Plo@am__quote@
diff --git a/libgomp/libgompd.h b/libgomp/libgompd.h
index 495995e00d3..f6e8987c655 100644
--- a/libgomp/libgompd.h
+++ b/libgomp/libgompd.h
@@ -30,6 +30,7 @@
 #define LIBGOMPD_H 1
 
 #include "omp-tools.h"
+#include <string.h>
 
 #define ompd_stringify(x) ompd_str2(x)
 #define ompd_str2(x) #x
@@ -47,4 +48,16 @@ typedef struct _ompd_aspace_handle {
   ompd_size_t ref_count;
 } ompd_address_space_handle_t;
 
+typedef struct _ompd_thread_handle {
+  ompd_parallel_handle_t *ph;
+  ompd_address_space_handle_t *ah;
+  ompd_thread_context_t *tcontext;
+  ompd_address_t *thread;
+} ompd_thread_handle_t;
+
+typedef struct _ompd_parallel_handle {
+  ompd_address_space_handle_t *ah;
+  ompd_address_t *tpool;
+} ompd_parallel_handle_t;
+
 #endif /* LIBGOMPD_H */
diff --git a/libgomp/ompd-thread.c b/libgomp/ompd-thread.c
new file mode 100644
index 00000000000..0c695d4b8b1
--- /dev/null
+++ b/libgomp/ompd-thread.c
@@ -0,0 +1,79 @@
+/* Copyright (C) 2020 Free Software Foundation, Inc.
+   Contributed by Yoosuk Sim <y2s1...@gmail.com>.
+
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* This file contains OMPD function definitions related to threads, as defined
+   in the section 5.5.5 in the OpenMP API Specification 5.0.  */
+
+#include "omp-tools.h"
+#include "ompd-types.h"
+#include "libgompd.h"
+#include "libgomp.h"
+#include <stddef.h>
+
+ompd_rc_t
+ompd_get_thread_in_parallel (ompd_parallel_handle_t *ph, int thread_num,
+				      ompd_thread_handle_t **th)
+{
+  ompd_rc_t ret;
+  int max_size = 0;
+  // TODO: Using ICV, obtain max_size from ompd-team-size-var
+  if ( thread_num < 0 || thread_num > max_size)
+    return ompd_rc_bad_input;
+
+  ompd_device_type_sizes_t type_size;
+  ompd_address_t threadAddr;
+  ompd_address_t threadAddrPtr = *ph->tpool;
+
+  // TODO: fix hardcoded offset
+  threadAddrPtr.address += offsetof (struct gomp_thread_pool, threads);
+
+  ret = gompd_callbacks.sizeof_type (ph->ah->context, &type_size);
+  if (ret != ompd_rc_ok)
+    return ret;
+
+  ret = gompd_callbacks.read_memory (ph->ah->context, NULL, &threadAddrPtr,
+				     type_size.sizeof_pointer,
+				     (void *) &threadAddr.address);
+  if (ret != ompd_rc_ok)
+    return ret;
+  threadAddr.address += thread_num * type_size.sizeof_pointer;
+
+  ompd_thread_handle_t *p;
+  ret = gompd_callbacks.alloc_memory (sizeof (ompd_thread_handle_t),
+					      (void *) p);
+  if (ret != ompd_rc_ok)
+    return ret;
+  ret = gompd_callbacks.alloc_memory (sizeof (ompd_address_t),
+					      (void *) p->thread);
+  if (ret != ompd_rc_ok)
+    return ret;
+
+  p->ph = ph;
+  p->ah = ph->ah;
+  *p->thread = threadAddr;
+  *th = p;
+
+  return ompd_rc_ok;
+}
-- 
2.27.0

Reply via email to