Hi, Omar -

Thanks.  I wish this sort of amazing kludge weren't necessary, but
given that it helps, so be it.

I'd like to commend you on the effort needed to match your code up
with the stylistic idiosyncrasies of the debuginfod C++ code.  It
looks just like the other code.  My only reservation is the schema
change.  Reindexing some of our large repos takes WEEKS.  Here's a
possible way to avoid that:

- Preserve the current BUILDID schema id and tables as is.

- Add a new table for the intra-archive coordinates.  Think of it like a cache.
  Index it with archive-file-name and content-file-name (source0, source1 IIRC).
  (A rough DDL sketch follows this list.)

- During a fetch out of the archive-file-name, check whether the new
  table has a record for that file.  If yes, cache hit, go through to
  the xz extraction stuff, winner!

- If not, try the is_seekable() check on the archive.  If it is true, we have an
  archive that should be seekable, but we don't have it in the intra-archive
  cache.  So take this opportunity to index that archive (only), populating the
  cache table as the archive is being extracted.  (No need to use the new cache
  data then, since we've just paid the effort of decompressing/reading the whole
  thing already.)  See the lookup/fallback sketch after this list.

- Need to confirm that during grooming, a disappeared
  archive-file-name would also drop the corresponding intra-archive
  rows.  (Something like the groom-time DELETE sketched below.)

- Heck, during grooming or scanning, maybe the tool could preemptively
  do the intra-archive coordinate cache thing if it's not already
  done, just to defeat the latency of doing it on demand.
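
To make the new-table idea concrete, here is the rough sort of DDL I have in
mind, written the way the schema strings in debuginfod.cxx look.  All names
(table and columns) are placeholders I made up for illustration, not a proposal
for the final spelling, and the exact coordinate columns depend on what your
extraction code actually needs:

// Hypothetical DDL sketch for the intra-archive coordinate cache table.
// All names are placeholders; a real patch would pick names consistent
// with the existing buildids* tables.  Keyed by the archive file name id
// (source0) and the member file name id (source1).
static const char intra_archive_cache_ddl[] =
  "create table if not exists buildids_intra_archive_cache (\n"
  "        archive_id integer not null,   -- file-name id of the archive (source0)\n"
  "        content_id integer not null,   -- file-name id of the member (source1)\n"
  "        block_offset integer not null, -- compressed offset of the xz block holding the member\n"
  "        member_offset integer not null,-- member offset within the uncompressed stream\n"
  "        member_size integer not null,  -- member size in bytes\n"
  "        primary key (archive_id, content_id)\n"
  "        ) without rowid;\n";

Since this table lives beside the existing BUILDID tables rather than inside
them, the schema id stays put and no reindexing is triggered.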
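
And a sketch of the fetch-path logic, under the same assumptions: cache_lookup()
is an illustrative helper against the placeholder table above, and
extract_via_seek() / index_seekable_archive() / extract_sequentially() are
stand-ins for your extraction code, not existing debuginfod functions; only
is_seekable() refers to the patch under discussion:

// Illustrative only: look up a member's coordinates in the hypothetical
// cache table.  Returns true on a cache hit.
#include <sqlite3.h>
#include <cstdint>

struct member_coords { int64_t block_offset, member_offset, member_size; };

static bool
cache_lookup (sqlite3 *db, int64_t archive_id, int64_t content_id,
              member_coords &out)
{
  static const char q[] =
    "select block_offset, member_offset, member_size "
    "from buildids_intra_archive_cache "
    "where archive_id = ? and content_id = ?;";
  sqlite3_stmt *st = nullptr;
  if (sqlite3_prepare_v2 (db, q, -1, &st, nullptr) != SQLITE_OK)
    return false;
  sqlite3_bind_int64 (st, 1, archive_id);
  sqlite3_bind_int64 (st, 2, content_id);
  bool hit = (sqlite3_step (st) == SQLITE_ROW);
  if (hit)
    {
      out.block_offset  = sqlite3_column_int64 (st, 0);
      out.member_offset = sqlite3_column_int64 (st, 1);
      out.member_size   = sqlite3_column_int64 (st, 2);
    }
  sqlite3_finalize (st);
  return hit;
}

// The fetch path then reads roughly:
//
//   member_coords mc;
//   if (cache_lookup (db, archive_id, content_id, mc))
//     return extract_via_seek (archive_path, mc);            // cache hit: seek straight to the member
//   if (is_seekable (archive_path))
//     index_seekable_archive (db, archive_id, archive_path); // miss: populate the cache while we're at it
//   return extract_sequentially (archive_path, content_path); // and extract the slow way this once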
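
For the grooming side, something along these lines, again with placeholder
names; the subquery would of course be whatever liveness test the groom pass
already applies to archive file names:

// Hypothetical groom-time cleanup: drop cache rows for archives that
// have disappeared.  "buildids_files_seen" is a placeholder for whatever
// table the groom pass consults to decide which archives still exist.
static const char intra_archive_cache_groom[] =
  "delete from buildids_intra_archive_cache "
  "where archive_id not in (select id from buildids_files_seen);";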


What do you think?


- FChE
