Hello Leonard,

while I'm in principle supportive of this idea, I think it's not going
to be as easy as you might imagine. There are currently at least two
mechanisms which rely on this crc UUID.

1. .gnu_debuglink separate file pointer
<https://sourceware.org/gdb/onlinedocs/gdb/Separate-Debug-Files.html>.
This is where the choice of the crc algorithm comes from.

In short, this mechanism for debug info location works like this: The
stripped file contains a .gnu_debuglink section.  The section contains
a file path and a crc checksum. After reading this section the
debugger is expected to look for the file at the given path, and then
compute it's checksum to verify it is indeed the correct file (hasn't
been modified).

In LLDB, this is implemented somewhat differently. First we have a
mechanism for assigning UUIDs to modules. This mechanism does one of
two things (I'm assuming here none of the files have proper build-ids
in them). If the file has a .gnu_debuglink section (the stripped
file), we fetch the CRC from there, and assign that as the build-id of
the file. If the file doesn't have this section, we **assume** it is
going to be a target of gnu_debuglink pointer in another file, compute
its CRC via the specified algorithm, and assign that as the build-id.

Second, we have a mechanism for locating the unstripped file. This
just fetches the path from gnu_debuglink, and compares the UUIDs of
the two modules. This works because of the UUID algorithm in the first
part.

Now, this is not a particularly smart way of doing things and it is
the cause of the most of your problems. However, it is completely
avoidable. Instead of piggy-backing on the UUID mechanism, we should
just change the search mechanism (the second part) to compute the crc
itself, and only after it successfully finds the file referenced by
the gnu_debuglink section. This way, we will only compute the crc only
when absolutely necessary (i.e. for your use case, never).


2. Currently, for remote debugging, we assume that each module has
some sort of a unique identifier which we can check to see whether we
have downloaded that file already (see qModuleInfo packet). I am not
sure what would happen if we suddenly just stopped computing a UUID
for these files, but we at the very least lose the ability to detect
changes to the remote files. For this item, I am not sure what would
be the best course of action. Maybe we should just start relying on
modification timestamp for these files?

On Sat, 4 Aug 2018 at 02:17, Leonard Mosescu <mose...@google.com> wrote:
>
> Greg, Mark,
>
> Looking at the code, LLDB falls back to a full file crc32 to create the 
> module UUID if the ELF build-id is missing. This works, in the sense that the 
> generated UUID does indeed identify the module.
>
> But there are a few problems with this approach:
>
> 1. First, runtime performance: a full file crc32 is a terribly inefficient 
> way to generate a temporary UUID that is basically just used to match a local 
> file to itself.
> - especially when some unstripped binaries can be very large. for example a 
> local chromium build produces a 5.3Gb chrome binary
> - the crc32 implementation is decent, but single-threaded
> - to add insult to the injury, it seems a small bug defeats the intention to 
> cache the hash value so it ends up being recalculated multiple times
>
> 2. The fake UUID is not going to match any external UUID that may be floating 
> around (and yet not properly embedded into the binary)
> - an example is Breakpad, which unfortunately also attempts to make up UUIDs 
> when the build-id is missing (something we'll hopefully fix soon)
>
> Is there a fundamental reason to calculate the full file crc32? If not I 
> propose to improve this based on the following observations:
>
> A. Model the reality more accurately: an ELF w/o a build-id doesn't really 
> have an UUID. So use a zero-length UUID in LLDB.
> B. The full file name should be enough to prove the identity of a local 
> module.
> C. If we try to match an external UUID (ex. from a minidump) with a local 
> file which does not have an UUID it may help to have an option to allow it to 
> match (off by default, and only if there's no better match)

I think we might have something already which could serve this
purpose. Eugene added a couple of months ago a mechanism to force-load
symbols for a given file regardless of the UUIDs
<https://reviews.llvm.org/D35607>. It requires an explicit "target
symbols add" command (which seems reasonable, as lldb has no way to
tell if it's doing things right). Would something like that work for
you?

cheers,
pl
_______________________________________________
lldb-dev mailing list
lldb-dev@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev

Reply via email to