On Thu, Nov 02, 2023 at 12:07:04AM +0100, Mark Wielaard wrote:
> Hi Omar,
> 
> On Wed, Sep 27, 2023 at 11:20:57AM -0700, Omar Sandoval wrote:
> > The .debug_cu_index and .debug_tu_index sections in DWARF package files
> > are basically hash tables mapping a unit's 8 byte signature to an offset
> > and size in each section used by that unit [1].  Add support for parsing
> > and doing lookups in the index sections.
> > 
> > We look up a unit in the index when we intern it and cache its hash
> > table row in Dwarf_CU.
> 
> This looks good. Thanks for the various testcases.
> 
> Do I understand correctly that binutils dwp only does the DWARF4 GNU
> extension and llvm-dwp only does the DWARF5 standardized format? Maybe
> we should create a eu-dwp that does both.

llvm-dwp can do both the DWARF4 GNU format and the DWARF5 format.
binutils dwp can only do DWARF4.

> It looks like you are very careful checking boundaries, but it would
> be bad to do some fuzzing on this.
> 
> > Then, a new function, dwarf_cu_dwp_section_info,
> > can be used to look up the section offsets and sizes for a unit.  This
> > will mostly be used internally in libdw, but it will also be needed in
> > static inline functions shared with eu-readelf.  Additionally, making it
> > public it makes dwp support much easier for external tools that do their
> > own low-level parsing of DWARF information, like drgn [2].
> 
> Although I am not against this new interface, I am not super
> enthousiastic about it.
> 
> There is one odd part, DW_SECT_TYPES should be defined in dwarf.h with
> the other DW_SECT constants, otherwise it cannot be used (unless you
> define it yourself to 2).

Yeah, I wasn't sure about adding it or not since it's not technically
defined in the DWARF5 standard. But if we can count on future DWARF
standards not reusing the value 2 (which we probably can), I agree that
it seems better to add it to dwarf.h.

> For eu-readelf we don't really need it being public, it already cheats
> a little and uses some (non-public) libdwP.h functions. That is
> actually not great either. So if there is a public function that is
> available that is actually preferred. But if it is just for
> eu-readelf, I don't think it is really needed. And it seems you
> haven't actually added the support to eu-readelf to print these
> offsets.

Right, eu-readelf only uses it indirectly via the static inline
functions modified in patch 13. It looks like eu-readelf dynamically
links to libdw (on Fedora, at least), so either some part of the dwp
implementation needs to be exported, or a big chunk of it needs to be
inlined in libdwP.h, right?

Either way, I'd still love to have this interface for drgn.

> It is fine to expose these offsets and sizes, but how exactly are you
> using them in drgn? It seems we don't have any other interfaces in
> libdw that you can then use these with.

Here's the branch I linked to in my cover letter:
https://github.com/osandov/drgn/tree/dwp. I need this interface for the
couple of places where drgn parses DWARF itself.

One of those places is in the global name indexing step drgn does at
startup. We do this by enumerating all CUs using libdw, then using our
own purpose-built DWARF parser to do a fast, parallelized scan.

Our bespoke parser needs to know the base of the abbreviation table and
the string offsets table, which is easy without dwp. This interface also
makes it easy with dwp. Without this interface, I'd need to reimplement
the CU index parser in drgn :(

> Can we split off this public interface from the rest of this patch?
> But then we also need to split off the tests. So maybe keep them together?

Yeah, I included the interface in this patch exactly so that I could
test the implementation in the same patch. I'm happy to split that up
however you'd prefer.

[snip]

> > +  Dwarf_Package_Index *index = malloc (sizeof (*index));
> > +  if (index == NULL)
> > +    {
> > +      __libdw_seterrno (DWARF_E_NOMEM);
> > +      return NULL;
> > +    }
> > +
> > +  index->dbg = dbg;
> > +  /* Set absent sections to UINT32_MAX.  */
> > +  memset (index->sections, 0xff, sizeof (index->sections));
> 
> OK, although I would have preferred a simple for loop and let the
> compiler optimize it.

Ok, I can fix that.

[snip]

> > +static int
> > +__libdw_dwp_section_info (Dwarf_Package_Index *index, uint32_t unit_row,
> > +                     unsigned int section, Dwarf_Off *offsetp,
> > +                     Dwarf_Off *sizep)
> > +{
> > +  if (index == NULL)
> > +    return -1;
> > +  if (unit_row == 0)
> > +    {
> > +      __libdw_seterrno (DWARF_E_INVALID_DWARF);
> > +      return -1;
> > +    }
> 
> OK, but why isn't the caller checking this?

Partially becase it simplified callers like __libdw_dwp_findcu_id to not
require a separate check, and partially just to be defensive.

[snip]

Thanks for taking a look!

Reply via email to