Hello Leonard,

while I'm in principle supportive of this idea, I think it's not going to be as easy as you might imagine. There are currently at least two mechanisms which rely on this crc UUID.

1. .gnu_debuglink separate file pointer <https://sourceware.org/gdb/onlinedocs/gdb/Separate-Debug-Files.html>. This is where the choice of the crc algorithm comes from.

In short, this mechanism for debug info location works like this: the stripped file contains a .gnu_debuglink section. The section contains a file path and a crc checksum. After reading this section, the debugger is expected to look for the file at the given path, and then compute its checksum to verify that it is indeed the correct file (i.e., that it hasn't been modified).

In LLDB, this is implemented somewhat differently. First, we have a mechanism for assigning UUIDs to modules. This mechanism does one of two things (I'm assuming here that none of the files have proper build-ids in them). If the file has a .gnu_debuglink section (the stripped file), we fetch the CRC from there and assign that as the build-id of the file. If the file doesn't have this section, we **assume** it is going to be the target of a gnu_debuglink pointer in another file, compute its CRC via the specified algorithm, and assign that as the build-id.

Second, we have a mechanism for locating the unstripped file. This just fetches the path from gnu_debuglink and compares the UUIDs of the two modules. This works because of the UUID algorithm in the first part.

Now, this is not a particularly smart way of doing things, and it is the cause of most of your problems. However, it is completely avoidable. Instead of piggy-backing on the UUID mechanism, we should just change the search mechanism (the second part) to compute the crc itself, and only after it successfully finds the file referenced by the gnu_debuglink section. This way, we will compute the crc only when absolutely necessary (i.e. for your use case, never).

2. Currently, for remote debugging, we assume that each module has some sort of a unique identifier which we can check to see whether we have downloaded that file already (see qModuleInfo packet).
I am not sure what would happen if we suddenly just stopped computing a UUID for these files, but we would at the very least lose the ability to detect changes to the remote files. For this item, I am not sure what would be the best course of action. Maybe we should just start relying on the modification timestamp for these files?

On Sat, 4 Aug 2018 at 02:17, Leonard Mosescu <mose...@google.com> wrote:
>
> Greg, Mark,
>
> Looking at the code, LLDB falls back to a full file crc32 to create the
> module UUID if the ELF build-id is missing. This works, in the sense that the
> generated UUID does indeed identify the module.
>
> But there are a few problems with this approach:
>
> 1. First, runtime performance: a full file crc32 is a terribly inefficient
> way to generate a temporary UUID that is basically just used to match a local
> file to itself.
> - especially when some unstripped binaries can be very large. for example a
> local chromium build produces a 5.3Gb chrome binary
> - the crc32 implementation is decent, but single-threaded
> - to add insult to the injury, it seems a small bug defeats the intention to
> cache the hash value so it ends up being recalculated multiple times
>
> 2. The fake UUID is not going to match any external UUID that may be floating
> around (and yet not properly embedded into the binary)
> - an example is Breakpad, which unfortunately also attempts to make up UUIDs
> when the build-id is missing (something we'll hopefully fix soon)
>
> Is there a fundamental reason to calculate the full file crc32? If not I
> propose to improve this based on the following observations:
>
> A. Model the reality more accurately: an ELF w/o a build-id doesn't really
> have an UUID. So use a zero-length UUID in LLDB.
> B. The full file name should be enough to prove the identity of a local
> module.
> C. If we try to match an external UUID (ex.
> from a minidump) with a local
> file which does not have an UUID it may help to have an option to allow it to
> match (off by default, and only if there's no better match)

I think we might already have something which could serve this purpose. A couple of months ago, Eugene added a mechanism to force-load symbols for a given file regardless of the UUIDs <https://reviews.llvm.org/D35607>. It requires an explicit "target symbols add" command (which seems reasonable, as lldb has no way to tell if it's doing things right). Would something like that work for you?

cheers,
pl
_______________________________________________
lldb-dev mailing list
lldb-dev@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev
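
P.S. To make the suggestion in item 1 a bit more concrete, here is a rough sketch of a search step that computes the crc itself, and only after a candidate file has actually been found. It is written in Python rather than LLDB's C++ purely for brevity, and it is not LLDB code: the names file_crc32 and locate_debug_file, the (name, crc) tuple, and the search_dirs list are all made up for illustration. It also assumes the .gnu_debuglink checksum is the same CRC-32 that zlib implements, which I believe is the case but which should be checked against the GDB documentation linked above.

```python
import os
import zlib


def file_crc32(path, chunk_size=1 << 20):
    """CRC-32 of a whole file, read in chunks to keep memory use bounded."""
    crc = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            crc = zlib.crc32(chunk, crc)
    return crc & 0xFFFFFFFF


def locate_debug_file(debuglink, search_dirs):
    """Return the path of the matching debug file, or None.

    debuglink is a hypothetical (file_name, expected_crc) pair parsed from
    the .gnu_debuglink section, or None if the module has no such section.
    """
    if debuglink is None:
        # No .gnu_debuglink section: nothing to search for, no crc computed.
        return None
    name, expected_crc = debuglink
    for directory in search_dirs:
        candidate = os.path.join(directory, name)
        if not os.path.isfile(candidate):
            continue  # cheap existence check first
        # The expensive checksum runs only for files that actually exist.
        if file_crc32(candidate) == expected_crc:
            return candidate
    return None
```

With something along these lines, a module without a .gnu_debuglink section (such as the large unstripped chrome binary mentioned in the quoted mail) would never be checksummed, since the crc is computed only for a file that some other module's gnu_debuglink points at.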