> On Aug 3, 2018, at 6:17 PM, Leonard Mosescu <mose...@google.com> wrote:
> 
> Greg, Mark,
> 
> Looking at the code, LLDB falls back to a full file crc32 to create the 
> module UUID if the ELF build-id is missing. This works, in the sense that the 
> generated UUID does indeed identify the module. 
> 
> But there are a few problems with this approach:
> 
> 1. First, runtime performance: a full file crc32 is a terribly inefficient 
> way to generate a temporary UUID that is basically just used to match a local 
> file to itself.
> - especially when some unstripped binaries can be very large. for example a 
> local chromium build produces a 5.3Gb chrome binary 
> - the crc32 implementation is decent, but single-threaded
> - to add insult to the injury, it seems a small bug defeats the intention to 
> cache the hash value so it ends up being recalculated multiple times
> 
> 2. The fake UUID is not going to match any external UUID that may be floating 
> around (and yet not properly embedded into the binary)
> - an example is Breakpad, which unfortunately also attempts to make up UUIDs 
> when the build-id is missing (something we'll hopefully fix soon)
> 
> Is there a fundamental reason to calculate the full file crc32? If not I 
> propose to improve this based on the following observations:
> 
> A. Model the reality more accurately: an ELF w/o a build-id doesn't really 
> have an UUID. So use a zero-length UUID in LLDB.
> B. The full file name should be enough to prove the identity of a local 
> module.
> C. If we try to match an external UUID (ex. from a minidump) with a local 
> file which does not have an UUID it may help to have an option to allow it to 
> match (off by default, and only if there's no better match)
> 
> What do you think?

I am fine with all the above except some reservations about case C. No need to 
calculate something if it isn't useful. For case C it should be fine to never 
match as if a file has a UUID to begin with it typically isn't something that 
gets stripped in a stripped binary. So we should either have it or not. If 
breakpad does calculate a CRC32, then we need to know to ignore the UUID. The 
problem is we probably won't be able to tell what the UUID is: real from build 
ID, or from GNU debug info CRC, or CRC of entire file. So the minidump code 
will need to do something here. If a minidump has the linux auxv and memory map 
in them, then we might need to dig through the section information and deduce 
if a file matches or not based off the size of mapped program headers to 
further help with the matching. 

One other idea is to make a set of enumerations for the UUID type:

class UUID {
  enum class Type {
    BuildID,         // A build ID from the compiler or linker
    GNUDebugInfoCRC, // GNU debug info CRC
    MD5,             // MD5 of entire file
    MD5NonDebug,     // MD5 of the non debug info related bits
    CRC32,           // CRC32 of entire file
    Other,           // Anything else
  };
};

The eTypeMD5NonDebug is what apple does: it MD5 checksums only the parts of the 
file that don't change with debug info or any paths found in debug info or 
symbols tables. So if you build a binary in /tmp/a or in 
/private/var/local/foo, the UUID is the same if the binary is essentially the 
same (code, data, etc).

Then we can make intelligent comparisons between UUID types. Might even be 
possible for a module to have more than 1 UUID then if a binary contains a 
eTypeBuildID and a eTypeGNUDebugInfoCRC. If a tool stores its UUIDs as a CRC32 
or MD5, then those can be calculated on the fly. The GetUUID on 
lldb_private::Module might become:

const lldb_private::UUID &Module::GetUUID(UUID::Type uuid_type);

Thoughts?

Greg

> 
> Thanks,
> Lemo.
> 

_______________________________________________
lldb-dev mailing list
lldb-dev@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev

Reply via email to