Re: [PATCH 2/3 v3] src/readelf.c: Support concurrency for -w, --debug-dump

2025-06-05 Thread Mark Wielaard
Hi Aaron,

On Tue, May 27, 2025 at 10:04:20AM -0400, Aaron Merey wrote:
> Implement concurrent execution of print_debug_* functions during handling
> of -w, --debug-dump using libthread.a.

Do you intend to also enable this for other sections?

> A new `-C, --concurrency=NUM` command line option controls the maximum
> number of threads that may be used. This value defaults to the number of CPUs.
> 
> Job output is buffered and printed in the order that jobs were added to
> the queue. This helps preserve the existing order of stdout output. Full
> support for output buffering in print_debug_* functions is added in the
> next patch in this series.

I would have expected 3/3 "src/readelf.c: Add support for
print_debug_* output buffering" before this patch. Does this patch
really work correctly before that is done?

>   * src/readelf.c (default_concurrency): Function estimating the
>   maximum number of threads.
>   (parse_opt): Handle -C, --concurrency=NUM.
>   (do_job): Entry point function for threads.
>   (schedule_job): If thread safety is enabled, add job to the
>   job queue.  Otherwise just run the job from the main thread.
>   (print_debug): Pass print_debug_* function pointers and
>   args to schedule_job. Also call run_jobs if thread safety
>   is enabled.
> 
> Signed-off-by: Aaron Merey 
> 
> ---
> v3:
> Replace assert before run_jobs with a conditional.
> 
>  src/readelf.c | 153 --
>  1 file changed, 149 insertions(+), 4 deletions(-)
> 
> diff --git a/src/readelf.c b/src/readelf.c
> index b7dba390..e3aece8b 100644
> --- a/src/readelf.c
> +++ b/src/readelf.c
> @@ -57,6 +57,18 @@
>  
>  #include "../libdw/known-dwarf.h"
>  
> +#ifdef USE_LOCKS
> +#include "threadlib.h"
> +#endif
> +
> +#ifdef HAVE_SCHED_H
> +#include 
> +#endif
> +
> +#ifdef HAVE_SYS_RESOURCE_H
> +#include 
> +#endif

OK, I see the following in configure.ac:

dnl for debuginfod concurrency heuristics
AC_CHECK_HEADERS([sched.h])
AC_CHECK_FUNCS([sched_getaffinity])
AC_CHECK_HEADERS([sys/resource.h])

>  #ifdef __linux__
>  #define CORE_SIGILL  SIGILL
>  #define CORE_SIGBUS  SIGBUS
> @@ -150,6 +162,10 @@ static const struct argp_option options[] =
>  N_("Ignored for compatibility (lines always wide)"), 0 },
>{ "decompress", 'z', NULL, 0,
>  N_("Show compression information for compressed sections (when used with 
> -S); decompress section before dumping data (when used with -p or -x)"), 0 },
> +#ifdef USE_LOCKS
> +  { "concurrency", 'C', "NUM", 0,
> +N_("Set maximum number of threads. Defaults to the number of CPUs."), 0 
> },
> +#endif
>{ NULL, 0, NULL, 0, NULL, 0 }
>  };

OK.
  
> @@ -249,6 +265,11 @@ static bool print_decompress = false;
>  /* True if we want to show split compile units for debug_info skeletons.  */
>  static bool show_split_units = false;
>  
> +#if USE_LOCKS
> +/* Maximum number of threads.  */
> +static int max_threads = -1;
> +#endif
> +

OK. Could also have been zero since -C will set it to 1 or higher.

>  /* Select printing of debugging sections.  */
>  static enum section_e
>  {
> @@ -380,6 +401,43 @@ cleanup_list (struct section_argument *list)
>  }
>  }
>  
> +#ifdef USE_LOCKS
> +/* Estimate the maximum number of threads. This is normally
> +   #CPU.  Return value is guaranteed to be at least 1.  */
> +static int
> +default_concurrency (void)
> +{
> +  unsigned aff = 0;
> +#ifdef HAVE_SCHED_GETAFFINITY
> +  {
> +int ret;
> +cpu_set_t mask;
> +CPU_ZERO (&mask);
> +ret = sched_getaffinity (0, sizeof(mask), &mask);
> +if (ret == 0)
> +  aff = CPU_COUNT (&mask);
> +  }
> +#endif

Is this better/worse than using get_nprocs or sysconf
(_SC_NPROCESSORS_ONLN)?
https://www.gnu.org/software/libc/manual/html_node/Processor-Resources.html

Also from the sched_getaffinity man page it looks like cpu_set_t might
be too small and the call might fail on systems with > 1024
cores. Maybe this is a non-issue though?

> +  unsigned fn = 0;
> +#ifdef HAVE_GETRLIMIT
> +  {
> +struct rlimit rlim;
> +int rc = getrlimit (RLIMIT_NOFILE, &rlim);
> +if (rc == 0)
> +  fn = MAX ((rlim_t) 1, (rlim.rlim_cur - 100) / 2);
> +/* Conservatively estimate that at least 2 fds are used
> +   by each thread.  */
> +  }
> +#endif

Interesting. I wouldn't have thought about that. But makes sense. If
you check available file descriptors, would it also make sense to
check for available memory? And limit the number of jobs to mem/1024M?

> +  unsigned d = MIN (MAX (aff, 2U),
> + MAX (fn, 2U));
> +
> +  return d;
> +}
> +#endif
> +
>  int
>  main (int argc, char *argv[])
>  {
> @@ -403,6 +461,12 @@ main (int argc, char *argv[])
>/* Before we start tell the ELF library which version we are using.  */
>elf_version (EV_CURRENT);
>  
> +#ifdef USE_LOCKS
> +  /* If concurrency wasn't set by argp_parse, then set a default value.  */
> +  if (max_threads == -1)
> +max_threads = d

Re: [PATCH 1/3 v2] src: Add threadlib library for parallel job execution

2025-06-05 Thread Mark Wielaard
Hi Aaron,

On Thu, 2025-05-22 at 18:28 -0400, Aaron Merey wrote:
> Add new internal static library libthread.a that provides infrastructure
> for eu-* tools to run functions concurrently using pthreads.
> 
> threadlib.c manages per-job threads as well as per-job buffers for stdout
> output.  Output for each job is printed to stdout in the order that the
> jobs were added to the job queue.  This helps preserve the order of
> output when parallelization is added to an eu-* tool.
> 
> threadlib.h declares functions add_job and run_jobs. Jobs are added to
> a threadlib.c internal job queue using add_job. run_jobs concurrently
> executes jobs in parallel.

It isn't stated anywhere explicitly but I think this means all jobs
must be added first with add_job and then when run_jobs will be called
no new jobs can be added (concurrently)? I assumed this while
reviewing, because if add_job can be called after run_jobs has already
started by head would explode :} Could you document this and/or maybe
even add an assert somewhere?

> eu-readelf now links against libthread.a when elfutils is configured
> with --enable-thread-safety.
> 
>   * src/Makefile.am: libthread.a is compiled and and linked with
>   readelf when USE_LOCKS is defined.
>   * src/threadlib.c: New file. Manages job creation, concurrent
>   execution and output handling.
>   * src/threadlib.h: New file. Declares functions add_job and
>   run_jobs.
> 
> Signed-off-by: Aaron Merey 
> 
> ---
> v2:
> Remove unnecessary thread_LDADD variable in src/Makefile.am. Minor comment
> rewordings.
> 
>  src/Makefile.am |  16 +++
>  src/threadlib.c | 252 
>  src/threadlib.h |  34 +++
>  3 files changed, 302 insertions(+)
>  create mode 100644 src/threadlib.c
>  create mode 100644 src/threadlib.h
> 
> diff --git a/src/Makefile.am b/src/Makefile.am
> index 4a3fb957..90f2d55f 100644
> --- a/src/Makefile.am
> +++ b/src/Makefile.am
> @@ -52,6 +52,19 @@ libar.manifest: $(libar_a_OBJECTS)
>  MOSTLYCLEANFILES = *.gconv
>  CLEANFILES = $(bin_SCRIPTS) $(EXTRA_libar_a_DEPENDENCIES)
>  
> +if USE_LOCKS
> +noinst_LIBRARIES += libthread.a
> +
> +libthread_a_SOURCES = threadlib.c
> +
> +EXTRA_DIST += threadlib.h

I think this should be be added to noinst_HEADERS which isn't defined
yet in this Makefile.am, so it probably should be:

noinst_HEADERS = threadlib.h

Please double check with make distcheck and/or make rpmbuild

> +libthread.manifest: $(libthread_a_OBJECTS)
> + $(AM_V_GEN)echo $^ > $@
> +
> +CLEANFILES += $(EXTRA_libthread_a_DEPENDENCIES)
> +endif

EXTRA_libthread_a_DEPENDENCIES doesn't seem to be defined anywhere,
should it be defined first as:

EXTRA_libtreahd_a_DEPENDENCIES = libthread.manifest

?

> +
>  if BUILD_STATIC
>  libasm = ../libasm/libasm.a
>  libdw = ../libdw/libdw.a -lz $(zip_LIBS) $(libelf) -ldl -lpthread
> @@ -91,6 +104,9 @@ ar_no_Wstack_usage = yes
>  unstrip_no_Wstack_usage = yes
>  
>  readelf_LDADD = $(libdw) $(libebl) $(libelf) $(libeu) $(argp_LDADD)
> +if USE_LOCKS
> +readelf_LDADD += libthread.a
> +endif
>  nm_LDADD = $(libdw) $(libebl) $(libelf) $(libeu) $(argp_LDADD) 
> $(obstack_LIBS) \
>  $(demanglelib)
>  size_LDADD = $(libelf) $(libeu) $(argp_LDADD)

OK.

> diff --git a/src/threadlib.c b/src/threadlib.c
> new file mode 100644
> index ..ab158b69
> --- /dev/null
> +++ b/src/threadlib.c
> @@ -0,0 +1,252 @@
> +/* Functions for running jobs concurrently.
> +   Copyright (C) 2025 Red Hat, Inc.
> +   This file is part of elfutils.
> +
> +   This file is free software; you can redistribute it and/or modify
> +   it under the terms of the GNU General Public License as published by
> +   the Free Software Foundation; either version 3 of the License, or
> +   (at your option) any later version.
> +
> +   elfutils is distributed in the hope that it will be useful, but
> +   WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +   GNU General Public License for more details.
> +
> +   You should have received a copy of the GNU General Public License
> +   along with this program.  If not, see .  */
> +
> +#ifdef HAVE_CONFIG_H
> +# include 
> +#endif
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include "threadlib.h"
> +
> +/* Dynamic buffer for thread output.  */
> +typedef struct {
> +  size_t sizeloc;
> +  char *buf;
> +  FILE *file;
> +} output_stream_t;
> +
> +/* Allocate resources for STREAM.  */
> +static void
> +init_thread_output_stream (output_stream_t *stream)
> +{
> +  stream->buf = NULL;
> +  stream->sizeloc = 0;
> +  stream->file = open_memstream (&(stream->buf), &(stream->sizeloc));
> +
> +  if (stream->file == NULL)
> +error (1, 0, _("cannot open thread output stream"));
> +}
> +
> +/* Print and deallocate resources for STREAM.  */
> +static void
> +print_thread_output_stream (output_stream_t *stream)
> +{
>

[PATCH v2] doc: Update elf_begin.3

2025-06-05 Thread Aaron Merey
Signed-off-by: Aaron Merey 
---

v2: Address Mark's suggestions
https://patchwork.sourceware.org/project/elfutils/patch/20250603012245.411580-1-ame...@redhat.com/

doc/elf_begin.3 | 118 ++--
 1 file changed, 93 insertions(+), 25 deletions(-)

diff --git a/doc/elf_begin.3 b/doc/elf_begin.3
index 6a1d0c27..4d68562f 100644
--- a/doc/elf_begin.3
+++ b/doc/elf_begin.3
@@ -1,37 +1,105 @@
-.\" Modified Thu Sep 5 2017 by Ben Woodard 
-.\"
-.TH ELF_BEGIN 3 2017-09-05 "Libelf" "Libelf Programmer's Manual"
+.TH ELF_BEGIN 3 2025-06-02 "Libelf" "Libelf Programmer's Manual"
+
 .SH NAME
-elf_begin \- Return descriptor for ELF file.
-.nf
+elf_begin \- initialize an ELF descriptor
 .SH SYNOPSIS
-.B #include 
-.sp
-.BI "Elf *elf_begin (int " filedes ", Elf_Cmd " cmd ", Elf *" ref ");"
-.BI "Elf *elf_clone (int " filedes ", Elf_Cmd " cmd ");"
-.BI "int elf_end (Elf *" elf ");"
+.nf
+#include 
+
+Elf *elf_begin(int fildes, Elf_Cmd cmd, Elf *ref);
 .fi
 .SH DESCRIPTION
-The
-.BR elf_begin ()
+Initialize and return a handle to an ELF file for use with the elfutils
+\fBlibelf\fP library and related elfutils libraries such as \fBlibdw\fP.
+
+The returned \fBElf\fP descriptor must be released using \fBelf_end(3)\fP.
+
+\fBelf_version(3)\fP must be called before using any \fBlibelf\fP library
+including \fBelf_begin(3)\fP.
+
+.SH PARAMETERS
+.TP
+\fIfildes\fP
+A file descriptor referring to an ELF object. The descriptor should be open
+for reading, and optionally for writing, depending on the intended operation.
+If \fIref\fP is non-NULL, then \fIfildes\fP must either be -1 or be set to the
+same file descriptor as the one associated with \fIref\fP.
+.TP
+\fIcmd\fP
+Specifies the action to perform. Common values include:
+.RS
+.TP
+\fBELF_C_NULL\fP
+Return a NULL pointer instead of initializing an ELF descriptor.  Ignores
+\fIref\fP.
+.TP
+\fBELF_C_READ\fP
+Open an ELF descriptor for reading.
+.TP
+\fBELF_C_WRITE\fP
+Open an ELF descriptor for writing.  The descriptor initially refers to an
+empty file.
+.TP
+\fBELF_C_RDWR\fP
+Open an ELF descriptor for reading and writing.
+.TP
+\fBELF_C_READ_MMAP\fP
+Open an ELF descriptor for reading using mmap, if available.  The
+\fBELF_C_*_MMAP\fP commands are an elfutils libelf extension and may not be
+available in other libelf implementations.  Once the mmap size is set attempts
+to extend the size may fail.  Therefore, \fBELF_C_*_MMAP\fP commands tend to be
+more useful for in-place modifications or removal of data from an ELF
+descriptor.
+.TP
+\fBELF_C_WRITE_MMAP\fP
+Open an ELF descriptor for writing using mmap, if available.  The descriptor
+initially refers to an empty file.
+.TP
+\fBELF_C_RDWR_MMAP\fP
+Open an ELF descriptor for reading and writing using mmap, if available.
+.TP
+\fBELF_C_READ_MMAP_PRIVATE\fP
+Open an ELF descriptor for reading using mmap, if available.  This command
+invokes mmap with MAP_PRIVATE whereas the other \fB*_MMAP\fP commands invoke
+mmap with MAP_SHARED.  See \fBmmap(2)\fP for more information.
+.RE
+.TP
+\fIref\fP
+A reference to an existing \fBElf\fP descriptor. If not NULL, this will be
+used to duplicate a previous ELF descriptor.  The reference count associated
+with \fIref\fP will be incremented and \fBelf_end(3)\fP will need to be called
+an additional time to deallocate \fIref\fP.  \fIref\fP must have been opened
+with read/write permissions consistent with \fIcmd\fP.  If \fIref\fP is an AR
+file, then the ELF descriptor returned will be the first available object 
member
+of the archive (see \fBelf_next(3)\fP for more information).
+
 .SH RETURN VALUE
-.SH ERRORS
-elf_begin ELF_E_NO_VERSION ELF_E_INVALID_FILE ELF_E_INVALID_CMD ELF_E_NOMEM
-elf_clone ELF_E_NOMEM
-elf_end
+On success, \fBelf_begin()\fP returns a pointer to a new Elf descriptor.
+If \fIcmd\fP is \fBELF_C_NULL\fP then NULL is returned.  If \fIref\fP is
+non-NULL and isn't an AR file, then a copy of \fIref\fP is returned.  On
+failure, \fBelf_begin()\fP returns NULL and sets an internal error
+state that can be retrieved with \fBelf_errmsg(3)\fP.
+
+.SH SEE ALSO
+.BR mmap (2),
+.BR elf_clone (3),
+.BR elf_end (3),
+.BR elf_rand (3),
+.BR libelf (3),
+.BR elf (5)
+
 .SH ATTRIBUTES
-For an explanation of the terms used in this section, see
-.BR attributes (7).
 .TS
 allbox;
-lbw29 lb lb
+lbx lb lb
 l l l.
-Interface   Attribute   Value
+Interface  Attribute   Value
 T{
-.BR elf_begin (),
-.BR elf_clone (),
-.BR elf_end ()
-T}  Thread safety   MT-Safe
+.na
+.nh
+.BR elf_begin ()
+T} Thread safety   MT-Safe
 .TE
 
-.SH SEE ALSO
+.SH REPORTING BUGS
+Report bugs to  or 
https://sourceware.org/bugzilla/.
-- 
2.49.0



Re: [PATCH 1/3 v2] src: Add threadlib library for parallel job execution

2025-06-05 Thread Mark Wielaard
Hi Aaron,

Part 2...

On Thu, May 22, 2025 at 06:28:52PM -0400, Aaron Merey wrote:
> +typedef enum {
> +  /* pthread_create has not been called.  */
> +  NOT_STARTED,
> +
> +  /* pthread_create has been called.  */
> +  STARTED,
> +
> +  /* The thread has finished running the job but has not been joined.  */
> +  DONE,
> +
> +  /* pthread_join has been called.  */
> +  JOINED
> +} thread_state_t;

Do we have to think about the state of other possibly failed jobs?
Or does a failed job just call exit and that terminates everything?

> +struct job_t {
> +  /* A job consists of calling this function then printing any output
> + to stdout.  This function is run from thread_start_job, which also
> + initializes the FILE *.  */
> +  void *(*start_routine)(void *, FILE *);
> +
> +  /* Arg passed to start_routine.  */
> +  void *arg;
> +
> +  /* Thread to run start_routine.  */
> +  pthread_t thread;
> +
> +  /* See thread_state_t.  */
> +  _Atomic thread_state_t state;
> +
> +  /* Dynamic buffer for output generated during start_routine.
> + Contents will get printed to stdout when a job finishes.  */
> +  output_stream_t stream;
> +
> +  /* Next job in the linked list.  */
> +  struct job_t *next;
> +};

OK.

> +typedef struct {
> +  struct job_t *head;
> +  struct job_t *tail;
> +} job_queue_t;
> +
> +static job_queue_t jobs = { NULL, NULL };

OK. Maybe add what it means for head and/or tail to be NULL or not.
head is the first job to run next. tail is the last job to run?

> +void
> +add_job (void *(*start_routine)(void *, FILE *), void *arg)
> +{
> +  struct job_t *job = malloc (sizeof (struct job_t));
> +
> +  if (job == NULL)
> +error (1, 0, _("cannot create job"));
> +
> +  job->start_routine = start_routine;
> +  job->arg = arg;
> +  job->next = NULL;
> +  atomic_store (&job->state, NOT_STARTED);
> +
> +  /* Insert job into the job queue.  */
> +  if (jobs.head == NULL)
> +{
> +  assert (jobs.tail == NULL);
> +  jobs.head = job;
> +  jobs.tail = job;
> +}
> +  else
> +{
> +  assert (jobs.tail != NULL);
> +  jobs.tail->next = job;
> +  jobs.tail = job;
> +}
> +}

OK.

> +
> +/* Thread entry point.  */
> +static void *
> +thread_start_job (void *arg)
> +{
> +  struct job_t *job = (struct job_t *) arg;
> +
> +  init_thread_output_stream (&job->stream);
> +  job->start_routine (job->arg, job->stream.file);
> +  atomic_store (&job->state, DONE);
> +
> +  return NULL;
> +}

OK. Will be called when job->state has been set to STARTED.

> +/* Run all jobs that have been added to the job queue by add_job.  */
> +void
> +run_jobs (int max_threads)
> +{
> +  if (jobs.head == NULL)
> +{
> +  assert (jobs.tail == NULL);
> +  return;
> +}
> +  assert (jobs.tail != NULL);

OK. Immediately done if not jobs were ever added.

> +  /* jobs.tail was only needed to facilitate adding jobs.  */
> +  jobs.tail = NULL;
> +  int num_threads = 0;
> +
> +  /* Start no more than MAX_THREAD jobs.  */
> +  for (struct job_t *job = jobs.head;
> +   job != NULL && num_threads < max_threads;
> +   job = job->next)
> +{
> +  assert (job->start_routine != NULL);
> +  atomic_store (&job->state, STARTED);
> +
> +  if (pthread_create (&job->thread, NULL,
> +   thread_start_job, (void *) job) != 0)
> + error(1, 0, _("cannot create thread"));
> +  num_threads++;
> +}
> +
> +  int available_threads = max_threads - num_threads;
> +  assert (available_threads >= 0);
> +
> +  /* Iterate over the jobs until all have completed and all output has
> + been printed.  */
> +  while (jobs.head != NULL)
> +{
> +  /* Job output should be printed in the same order that the jobs
> +  were added.  Track whether there is at least one previous job
> + whose output hasn't been printed yet.  If true, then defer
> + printing for the current job.  */
> +  bool wait_to_print = false;
> +
> +  struct job_t *job = jobs.head;
> +  struct job_t *prev = NULL;
> +  while (job != NULL)
> + {
> +  /* Check whether this job should be started.  */
> +   if (atomic_load (&job->state) == NOT_STARTED)
> + {
> +   /* Start this job if there is an available thread.  */
> +   if (available_threads > 0)
> + {
> +   atomic_store (&job->state, STARTED);
> +   if (pthread_create (&job->thread, NULL,
> +   thread_start_job, (void *) job) != 0)
> + error (1, 0, _("cannot create thread"));
> +
> +   available_threads--;
> + }
> + }
> +
> +   /* Join thread if we haven't done so already.  */
> +   if (atomic_load (&job->state) == DONE)
> + {
> +   if (pthread_join (job->thread, NULL) != 0)
> + error (1, 0, _("cannot join thread"));
> +
> +   atomic_store (&job->state, JOINED);
> +   available_threads++;
> + }