Hi, Omar - Thanks. I wish this sort of amazing kludge weren't necessary, but given that it helps, so be it.
I'd like to commend you on the effort needed to match your code up with the stylistic idiosyncrasies of the debuginfod C++ code. It looks just like the other code.

My only reservation is the schema change. Reindexing some of our large repos takes WEEKS. Here's a possible way to avoid that:

- Preserve the current BUILDID schema id and tables as is.
- Add a new table for the intra-archive coordinates. Think of it like a cache. Index it by archive-file-name and content-file-name (source0, source1 IIRC). (A rough sketch follows below my sign-off.)
- During a fetch out of the archive-file-name, check whether the new table has a record for that file. If yes, cache hit: go straight to the xz extraction stuff, winner!
- If not, try the is_seekable() check on the archive. If it returns true, we have an archive that should be seekable, but it isn't in the intra-archive cache yet. So take this opportunity to index that archive (only), populating the cache table as the archive is being extracted. (No need to use the new cache data this time around, since we've just paid the cost of decompressing/reading the whole thing anyway.)
- We'd need to confirm that during grooming, a disappeared archive-file-name also drops the corresponding intra-archive rows.
- Heck, during grooming or scanning, the tool could even do the intra-archive coordinate caching preemptively if it's not already done, just to defeat the latency of doing it on demand.

What do you think?

- FChE
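
P.S. To make the cache-table idea a bit more concrete, here is a minimal sketch written against the sqlite3 C API that debuginfod already uses. The table name (seekable_archive_cache), its columns, and the seek_cache_lookup helper are all hypothetical, just for illustration; in the real thing the archive/entry keys would presumably be the existing source0/source1 strings, and the coordinate columns whatever the seekable-xz extraction actually needs.

    #include <sqlite3.h>
    #include <string>

    /* Hypothetical side table, kept entirely separate from the existing
       buildids tables so the schema id is untouched.  Keyed by
       (archive file, entry file); missing rows are just cache misses.  */
    static const char *SEEK_CACHE_DDL =
      "CREATE TABLE IF NOT EXISTS seekable_archive_cache ("
      "  archive TEXT NOT NULL,"     /* archive-file-name (source0) */
      "  entry   TEXT NOT NULL,"     /* content-file-name within it (source1) */
      "  offset  INTEGER NOT NULL,"  /* seek offset of the entry's block */
      "  size    INTEGER NOT NULL,"  /* uncompressed size of the entry */
      "  PRIMARY KEY (archive, entry)"
      ") WITHOUT ROWID;";

    /* Look up a cached intra-archive coordinate; true on a cache hit.  */
    static bool
    seek_cache_lookup (sqlite3 *db,
                       const std::string &archive, const std::string &entry,
                       sqlite3_int64 &offset, sqlite3_int64 &size)
    {
      sqlite3_stmt *st = nullptr;
      if (sqlite3_prepare_v2 (db,
                              "SELECT offset, size FROM seekable_archive_cache"
                              " WHERE archive = ? AND entry = ?;",
                              -1, &st, nullptr) != SQLITE_OK)
        return false;
      sqlite3_bind_text (st, 1, archive.c_str (), -1, SQLITE_TRANSIENT);
      sqlite3_bind_text (st, 2, entry.c_str (), -1, SQLITE_TRANSIENT);
      bool hit = (sqlite3_step (st) == SQLITE_ROW);
      if (hit)
        {
          offset = sqlite3_column_int64 (st, 0);
          size = sqlite3_column_int64 (st, 1);
        }
      sqlite3_finalize (st);
      return hit;
    }

The point being: this lives beside the existing tables rather than inside them, so the weeks' worth of already-indexed data stays valid, and a miss is never an error, just an opportunity to repopulate the cache during the extraction we were going to do anyway.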