> On Aug 3, 2018, at 6:17 PM, Leonard Mosescu <mose...@google.com> wrote: > > Greg, Mark, > > Looking at the code, LLDB falls back to a full file crc32 to create the > module UUID if the ELF build-id is missing. This works, in the sense that the > generated UUID does indeed identify the module. > > But there are a few problems with this approach: > > 1. First, runtime performance: a full file crc32 is a terribly inefficient > way to generate a temporary UUID that is basically just used to match a local > file to itself. > - especially when some unstripped binaries can be very large. for example a > local chromium build produces a 5.3Gb chrome binary > - the crc32 implementation is decent, but single-threaded > - to add insult to the injury, it seems a small bug defeats the intention to > cache the hash value so it ends up being recalculated multiple times > > 2. The fake UUID is not going to match any external UUID that may be floating > around (and yet not properly embedded into the binary) > - an example is Breakpad, which unfortunately also attempts to make up UUIDs > when the build-id is missing (something we'll hopefully fix soon) > > Is there a fundamental reason to calculate the full file crc32? If not I > propose to improve this based on the following observations: > > A. Model the reality more accurately: an ELF w/o a build-id doesn't really > have an UUID. So use a zero-length UUID in LLDB. > B. The full file name should be enough to prove the identity of a local > module. > C. If we try to match an external UUID (ex. from a minidump) with a local > file which does not have an UUID it may help to have an option to allow it to > match (off by default, and only if there's no better match) > > What do you think?
I am fine with all the above except some reservations about case C. No need to calculate something if it isn't useful. For case C it should be fine to never match as if a file has a UUID to begin with it typically isn't something that gets stripped in a stripped binary. So we should either have it or not. If breakpad does calculate a CRC32, then we need to know to ignore the UUID. The problem is we probably won't be able to tell what the UUID is: real from build ID, or from GNU debug info CRC, or CRC of entire file. So the minidump code will need to do something here. If a minidump has the linux auxv and memory map in them, then we might need to dig through the section information and deduce if a file matches or not based off the size of mapped program headers to further help with the matching. One other idea is to make a set of enumerations for the UUID type: class UUID { enum class Type { BuildID, // A build ID from the compiler or linker GNUDebugInfoCRC, // GNU debug info CRC MD5, // MD5 of entire file MD5NonDebug, // MD5 of the non debug info related bits CRC32, // CRC32 of entire file Other, // Anything else }; }; The eTypeMD5NonDebug is what apple does: it MD5 checksums only the parts of the file that don't change with debug info or any paths found in debug info or symbols tables. So if you build a binary in /tmp/a or in /private/var/local/foo, the UUID is the same if the binary is essentially the same (code, data, etc). Then we can make intelligent comparisons between UUID types. Might even be possible for a module to have more than 1 UUID then if a binary contains a eTypeBuildID and a eTypeGNUDebugInfoCRC. If a tool stores its UUIDs as a CRC32 or MD5, then those can be calculated on the fly. The GetUUID on lldb_private::Module might become: const lldb_private::UUID &Module::GetUUID(UUID::Type uuid_type); Thoughts? Greg > > Thanks, > Lemo. > _______________________________________________ lldb-dev mailing list lldb-dev@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev