Re: Storing package metadata in ELF objects

2021-05-14 Thread Guillem Jover
On Sat, 2021-04-10 at 13:38:31 +0100, Luca Boccassi wrote:
> On Sat, 2021-04-10 at 13:29 +0100, Luca Boccassi wrote:
> > After an initial discussion [0], recently we have been working on a new
specification [1] to encode rich package-level metadata inside ELF
> > objects, so that it can be included automatically in generated coredump
> > files. The prototype to parse this in systemd-coredump and store the
> > information in systemd-journal is ready for testing and merged
> > upstream. We are now seeking further comments/opinions/suggestions, as
> > we have a few months before the next release and thus there's plenty of
> > time to make incompatible changes to the format and implementation, if
> > required.

I've skimmed over the discussion at [0], and while having this data
seems like it might be "nice", I have to agree with the comments there
voicing that there does not really seem to be an actual need and the
overhead and additional work do not seem worth it, TBH, at least
in the Debian context.

> > The Fedora Wiki and the systemd.io document have more details, but to
make a long story short, a new .note.package section with a JSON
> > payload will be included in ELF objects, encoding various package-
> > build-time information like distro name&version, package name&version,
> > etc.
> > 
> > To summarize from the discussion, the main reasons why we believe this
> > is useful are as follows:
> > 
> > 1) minimal containers: the rpm database is not installed in the
> > containers. The information about build-ids needs to be stored
> > externally, so package name information is not available immediately,
> > but only after offline processing. The new note doesn't depend on the
> > rpm db in any way.

In the Debian context, the build-id data is going to be available
in the affected executables, and in debug symbols packages and the
Packages metaindices listing them, so there's no need for access to
any local dpkg database. Given that someone needing to install debug
packages will need access to those indices (either with outgoing network
access or with a repository mirror), these can be queried at that time.
Not to mention that AFAIR the Debian debug symbol servers make it
possible to query for specific build-ids.
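
For illustration, a minimal sketch of querying for a specific build-id
programmatically with the elfutils debuginfod client API (link with
-ldebuginfod; assumes DEBUGINFOD_URLS points at a server that indexes
the relevant packages, and the build-id below is only a placeholder):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <elfutils/debuginfod.h>

int
main (void)
{
  debuginfod_client *client = debuginfod_begin ();
  if (client == NULL)
    return 1;

  /* A build_id_len of 0 means the build-id is given as a hex string.  */
  const char *build_id = "0123456789abcdef0123456789abcdef01234567";
  char *path = NULL;
  int fd = debuginfod_find_debuginfo (client,
                                      (const unsigned char *) build_id,
                                      0, &path);
  if (fd >= 0)
    {
      printf ("debuginfo cached at %s\n", path);
      free (path);
      close (fd);
    }
  else
    fprintf (stderr, "lookup failed: %s\n", strerror (-fd));

  debuginfod_end (client);
  return 0;
}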

> > 2) handling of a core from a container, where the container and host
> > have different distros

How each distribution handles debug packages and debug symbols is
going to be different, so it seems there will be a need for explicit
handling of these, at which point the above mentioned querying can be
implemented as well, w/o the need to encode the packaging data inside
the executable.

> > 3) self-built and external packages: unless a lot of care is taken to
> > keep access to the debuginfo packages, this information may be lost.
> > The new note is available even if the repository metadata gets lost.
> > Users can easily provide equivalent information in a format that makes
> > sense in their own environment. It should work even when rpms and debs
> > and other formats are mixed, e.g. during container image creation.

I'm not sure I see the problem here. Either these self-built 3rd-party
packages are kept in repos that also provide the debug symbols
somewhere for all historically released versions, or these will not be
accessible anyway. If they are, they can just as well be located from
the Packages metaindices as described above, and even if the repository
metadata gets lost, as long as the debug symbol packages are present
(if they are not, what's the point anyway) the build-ids can always be
re-scanned from them, as they are part of the Build-Ids field in the
.deb control file.
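
For illustration, a minimal sketch of re-reading the build-id from an
ELF object with elfutils' libelf/libdwelf (link with -lelf -ldw; the
usage and error handling are illustrative):

#include <err.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <libelf.h>
#include <elfutils/libdwelf.h>

int
main (int argc, char **argv)
{
  if (argc != 2)
    errx (1, "usage: %s FILE", argv[0]);

  elf_version (EV_CURRENT);
  int fd = open (argv[1], O_RDONLY);
  if (fd < 0)
    err (1, "open %s", argv[1]);

  Elf *elf = elf_begin (fd, ELF_C_READ, NULL);
  if (elf == NULL)
    errx (1, "elf_begin: %s", elf_errmsg (-1));

  /* Returns the length of the GNU build-id note's payload, if present.  */
  const void *id;
  ssize_t len = dwelf_elf_gnu_build_id (elf, &id);
  if (len <= 0)
    errx (1, "no build-id found");

  for (ssize_t i = 0; i < len; i++)
    printf ("%02x", ((const unsigned char *) id)[i]);
  putchar ('\n');

  elf_end (elf);
  close (fd);
  return 0;
}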

> > Other than in Fedora, we are already making the required code changes
> > at Microsoft to use the same format&specification for internally-built
> > binaries, and for tools that parse core files and logs.
> > 
> > Tools for RPM and DEB (debhelper) integration are also available [3].

So, to conclude, I don't really see the point of this in the Debian
context. (Not to mention the problems with encoding binary versions
that might be wrong, and the busy work involved.)

> > [0] https://github.com/systemd/systemd/issues/18433
> > [1] https://systemd.io/COREDUMP_PACKAGE_METADATA/
> > [2] https://fedoraproject.org/wiki/Changes/Package_information_on_ELF_objects
> > [3] https://github.com/systemd/package-notes

Thanks,
Guillem


Re: [Bug debuginfod/27859] New: reused debuginfod_client objects don't clean out curl handles enough

2021-05-14 Thread Mark Wielaard
On Thu, May 13, 2021 at 01:26:42AM +, fche at redhat dot com via 
Elfutils-devel wrote:
> https://sourceware.org/bugzilla/show_bug.cgi?id=27859
>
> In a sequence of queries on the same debuginfod_client, as long as
> they are all successful, things are fine.  Once there is a 404 error
> however, this appears to latch, and subsequent requests give 404
> whether or not they were resolvable by upstream.

Makes sense that curl remembers 404 results. Does that mean we need to
refresh the curl handle when a request is made for a negative cached
entry and cache_miss_s expires?
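
For illustration, a rough sketch of the negative-cache expiry idea
behind the question (not the actual debuginfod-client.c logic; names
are illustrative):

#include <stdbool.h>
#include <sys/stat.h>
#include <time.h>

/* A cached 404 is recorded as an (empty) entry; it is only trusted for
   cache_miss_s seconds, after which the server should be asked again,
   with a freshly set up curl request.  */
static bool
negative_entry_expired (const char *path, time_t cache_miss_s)
{
  struct stat st;
  if (stat (path, &st) != 0)
    return true;                /* no entry at all: query the server */
  return time (NULL) - st.st_mtime >= cache_miss_s;
}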


Re: Storing package metadata in ELF objects

2021-05-14 Thread Luca Boccassi
On Fri, 2021-05-14 at 12:41 +0200, Guillem Jover wrote:
> On Sat, 2021-04-10 at 13:38:31 +0100, Luca Boccassi wrote:
> > On Sat, 2021-04-10 at 13:29 +0100, Luca Boccassi wrote:
> > > After an initial discussion [0], recently we have been working on a new
> > > specification [1] to encode rich package-level metadata inside ELF
> > > objects, so that it can be included automatically in generated coredump
> > > files. The prototype to parse this in systemd-coredump and store the
> > > information in systemd-journal is ready for testing and merged
> > > upstream. We are now seeking further comments/opinions/suggestions, as
> > > we have a few months before the next release and thus there's plenty of
> > > time to make incompatible changes to the format and implementation, if
> > > required.
> 
> I've skimmed over the discussion at [0], and while having this data
> seems like it might be "nice", I have to agree with the comments there
> voicing that there does not really seem to be an actual need and the
> overhead and additional work do not seem worth it, TBH, at least
> in the Debian context.

Hi Guillem, thanks for having a look, much appreciated!

Just to clarify, the need is there - this is not an experimental
exercise, but one borne out of an actual need&requirement, and it is
undergoing testing right now before deployment in a large-scale
production infrastructure.
Not _everybody_ will need it, and not everywhere - that's absolutely
fair, and discussing whether the overhead is worth it for something
that is not universally needed, but only in certain use cases, is
perfectly reasonable and welcome. I know Zbigniew is going to try and
get some raw numbers on the kind of overhead we are talking about,
which will hopefully help frame the discussion with more precision.

> > > The Fedora Wiki and the systemd.io document have more details, but to
> > > make a long story short, a new .note.package section with a JSON
> > > payload will be included in ELF objects, encoding various package-
> > > build-time information like distro name&version, package name&version,
> > > etc.
> > > 
> > > To summarize from the discussion, the main reasons why we believe this
> > > is useful are as follows:
> > > 
> > > 1) minimal containers: the rpm database is not installed in the
> > > containers. The information about build-ids needs to be stored
> > > externally, so package name information is not available immediately,
> > > but only after offline processing. The new note doesn't depend on the
> > > rpm db in any way.
> 
> In the Debian context, the build-ids data is going to be available
> in the affected executables, and in debug symbols packages and the
> Packages metaindices listing them, so there's no need for access to
> any local dpkg database. Given that someone needing to install debug
> packages will need access to those indices (either with outgoing network
> access or with a repository mirror), these can be queried at that time.
> Not to mention that AFAIR the Debian debug symbol servers make it
> possible to query for specific build-ids.

This is not strictly related to debug packages, though? In fact, on
systems where this could be of most use you explicitly do _not_ install
debug packages (or anything at all). Even if you wanted to, you could
not - core files are not handled inside the container, but outside.
And even if you were allowed to (which for many environments is not
the case), you can't install a Debian debug package on a CoreOS host
or a Mariner host or a Flatcar host.

> > > 2) handling of a core from a container, where the container and host
> > > have different distros
> 
> How each distribution handles debug packages and debug symbols is
> going to be different, so it seems there will be a need for explicit
> handling of these, at which point the above mentioned querying can be
> implemented as well, w/o the need to encode the packaging data inside
> the executable.

Again, matching to debug symbols is not the main goal here - build-id
already works for that. The main goal is to have useful metadata
immediately available on all occasions, regardless of where on the
host the core was generated, without reaching out to external
services, so that it is directly included and collated in the system
journal when the core file is handled.

With a common metadata definition, there's no need to query or
explicitly handle anything: if you use systemd-coredump built from the
main branch and handle a core file from different containers running
different distros, with binaries carrying this metadata in the ELF
file, it just works. This is tested, not theoretical.
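
For illustration, a minimal sketch of what embedding such a note from C
could look like, assuming the owner name "FDO" and note type 0xcafe1a7e
described in the systemd.io document; the JSON fields are placeholders,
and the actual package-notes tooling generates a linker script instead,
which also gives the section proper ELF note (SHT_NOTE/PT_NOTE)
semantics:

#include <stdint.h>

#define PACKAGE_JSON \
  "{\"type\":\"deb\",\"name\":\"hello\",\"version\":\"2.10-2\"}"

static const struct
{
  uint32_t namesz;      /* size of the owner name, including NUL */
  uint32_t descsz;      /* size of the JSON descriptor, including NUL */
  uint32_t type;        /* note type used for package metadata */
  char name[4];         /* "FDO" plus NUL */
  char desc[(sizeof (PACKAGE_JSON) + 3) / 4 * 4]; /* JSON, 4-byte padded */
} package_note
  __attribute__ ((used, section (".note.package"), aligned (4))) =
{
  .namesz = 4,
  .descsz = sizeof (PACKAGE_JSON),
  .type = 0xcafe1a7e,
  .name = "FDO",
  .desc = PACKAGE_JSON,
};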

> > > 3) self-built and external packages: unless a lot of care is taken to
> > > keep access to the debuginfo packages, this information may be lost.
> > > The new note is available even if the repository metadata gets lost.
> > > Users can easily provide equivalent information in a format that makes
> > > sense i

PR27859 PATCH: correct 404-latch in connection reuse

2021-05-14 Thread Frank Ch. Eigler via Elfutils-devel
Hi -

(I'll deploy this fix to one of the public servers to confirm it there.)


Author: Frank Ch. Eigler 
Date:   Fri May 14 18:37:30 2021 -0400

PR27859: correct 404-latch bug in debuginfod client reuse

PR27701 implemented curl handle reuse in debuginfod_client objects,
but with an unexpected bug.  Server responses returning an error
"latched" because the curl_easy handles for error cases weren't all
systematically removed from the curl multi handle.  This prevented
their proper re-addition the next time.

This version of the code simplifies matters by making only the
curl_multi handle long-lived.  This turns out to be enough, because it
can maintain a pool of long-lived http/https connections and related
data, and lend them out to short-lived curl_easy handles.  This mode
handles errors or hung downloads even better, because the easy handles
don't undergo complex state transitions between reuses.

A new test case confirms this correction via the federating debuginfod
instance (cleaning caches between subtests to make sure http* is being
used and reused).
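
For illustration, a minimal sketch of the libcurl pattern described
above - one long-lived multi handle lending its connection cache to
short-lived, per-request easy handles (URLs and structure here are
placeholders, not the actual debuginfod-client.c code):

#include <curl/curl.h>

static int
fetch_once (CURLM *multi, const char *url)
{
  CURL *eh = curl_easy_init ();  /* short-lived, per-request easy handle */
  if (eh == NULL)
    return -1;
  curl_easy_setopt (eh, CURLOPT_URL, url);
  curl_multi_add_handle (multi, eh);

  int still_running = 0;
  do
    {
      int numfds = 0;
      curl_multi_perform (multi, &still_running);
      curl_multi_wait (multi, NULL, 0, 1000, &numfds);
    }
  while (still_running);

  /* Always detach and free the easy handle, on success and error alike,
     so a failed response cannot latch into the next request.  */
  curl_multi_remove_handle (multi, eh);
  curl_easy_cleanup (eh);
  return 0;
}

int
main (void)
{
  curl_global_init (CURL_GLOBAL_DEFAULT);
  CURLM *multi = curl_multi_init ();  /* long-lived, reused across requests */
  fetch_once (multi, "https://debuginfod.example.com/metrics");
  fetch_once (multi, "https://debuginfod.example.com/metrics");
  curl_multi_cleanup (multi);
  curl_global_cleanup ();
  return 0;
}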

diff --git a/debuginfod/ChangeLog b/debuginfod/ChangeLog
index 249385b6a3f7..21407dc2e524 100644
--- a/debuginfod/ChangeLog
+++ b/debuginfod/ChangeLog
@@ -1,3 +1,11 @@
+2021-05-14  Frank Ch. Eigler 
+
+   PR27859
+   * debuginfod-client.c (debuginfod_client): Retain only
+   long-lived multi handle from PR27701 work.
+   (debuginfod_begin, debuginfod_end): ctor/dtor for surviving field only.
+   (debuginfod_query_server): Rework to reuse multi handle only.
+
 2021-04-19  Martin Liska  
 
* debuginfod-client.c (debuginfod_query_server): Use startswith.
diff --git a/debuginfod/debuginfod-client.c b/debuginfod/debuginfod-client.c
index cb51c2611796..ee7eda24df9f 100644
--- a/debuginfod/debuginfod-client.c
+++ b/debuginfod/debuginfod-client.c
@@ -119,9 +119,8 @@ struct debuginfod_client
   /* File descriptor to output any verbose messages if > 0.  */
   int verbose_fd;
 
-  /* Count DEBUGINFOD_URLS elements and corresponding curl handles. */
-  int num_urls;
-  CURL **server_handles;
+  /* Maintain a long-lived curl multi-handle, which keeps a
+ connection/tls/dns cache to recently seen servers. */
   CURLM *server_mhandle;
   
   /* Can contain all other context, like cache_path, server_urls,
@@ -541,12 +540,6 @@ debuginfod_query_server (debuginfod_client *c,
 
   /* Is there any server we can query?  If not, don't do any work,
  just return with ENOSYS.  Don't even access the cache.  */
-  if (c->num_urls == 0)
-{
-  rc = -ENOSYS;
-  goto out;
-}
-  
   urls_envvar = getenv(DEBUGINFOD_URLS_ENV_VAR);
   if (vfd >= 0)
 dprintf (vfd, "server urls \"%s\"\n",
@@ -770,13 +763,20 @@ debuginfod_query_server (debuginfod_client *c,
   goto out0;
 }
 
+  /* Count number of URLs.  */
+  int num_urls = 0;
+  for (int i = 0; server_urls[i] != '\0'; i++)
+    if (server_urls[i] != url_delim_char
+        && (i == 0 || server_urls[i - 1] == url_delim_char))
+      num_urls++;
+  
   CURLM *curlm = c->server_mhandle;
   assert (curlm != NULL);
   
   /* Tracks which handle should write to fd. Set to the first
  handle that is ready to write the target file to the cache.  */
   CURL *target_handle = NULL;
-  struct handle_data *data = malloc(sizeof(struct handle_data) * c->num_urls);
+  struct handle_data *data = malloc(sizeof(struct handle_data) * num_urls);
   if (data == NULL)
 {
   rc = -ENOMEM;
@@ -786,7 +786,7 @@ debuginfod_query_server (debuginfod_client *c,
   /* thereafter, goto out1 on error.  */
 
   /* Initialize handle_data with default values. */
-  for (int i = 0; i < c->num_urls; i++)
+  for (int i = 0; i < num_urls; i++)
 {
   data[i].handle = NULL;
   data[i].fd = -1;
@@ -797,23 +797,20 @@ debuginfod_query_server (debuginfod_client *c,
   char *server_url = strtok_r(server_urls, url_delim, &strtok_saveptr);
 
   /* Initialize each handle.  */
-  for (int i = 0; i < c->num_urls && server_url != NULL; i++)
+  for (int i = 0; i < num_urls && server_url != NULL; i++)
 {
   if (vfd >= 0)
dprintf (vfd, "init server %d %s\n", i, server_url);
 
   data[i].fd = fd;
   data[i].target_handle = &target_handle;
-  data[i].handle = c->server_handles[i];
-  assert (data[i].handle != NULL);
-  curl_easy_reset(data[i].handle); // esp. previously sent http headers
-  data[i].client = c;
-
+  data[i].handle = curl_easy_init();
   if (data[i].handle == NULL)
 {
   rc = -ENETUNREACH;
   goto out1;
 }
+  data[i].client = c;
 
   /* Build handle url. Tolerate both  http://foo:999  and
  http://foo:999/  forms */
@@ -869,7 +866,7 @@ debuginfod_query_server (debuginfod_client *c,
 
   /* Query servers in parallel.  */
   if (vfd >= 0)
-dprintf (vfd, "query %d urls in parallel\n", c->num_urls);
+dprintf (vfd, "query %d