On Thu, Jul 11, 2024 at 04:16:25PM -0400, Frank Ch. Eigler wrote:
> Hi, Omar -
> 
> Thanks.  I wish this sort of amazing kludge weren't necessary, but
> given that it helps, so be it.
> 
> I'd like to commend you on the effort needed to match your code up
> with the stylistic idiosyncrasies of the debuginfod c++ code.  It
> looks just like the other code.  My only reservation is the schema
> change.  Reindexing some of our large repos takes WEEKS.  Here's a
> possible way to avoid that:
> 
> - Preserve the current BUILDID schema id and tables as is.
> 
> - Add a new table for the intra-archive coordinates.  Think of it like a
>   cache.  Index it with archive-file-name and content-file-name (source0,
>   source1 IIRC).
> 
> - During a fetch out of the archive-file-name, check whether the new
>   table has a record for that file.  If yes, cache hit, go through to
>   the xz extraction stuff, winner!
> 
> - If not, try the is_seekable() check on the archive.  If it is true, we
>   have an archive that should be seekable, but we don't have it in the
>   intra-archive cache.  So take this opportunity to index that archive
>   (only), populate the cache table, as the archive is being extracted.
>   (No need to use the new cache data then, since we've just paid the
>   effort of decompressing/reading the whole thing already.)
> 
> - Need to confirm that during grooming, a disappeared
>   archive-file-name would also drop the corresponding intra-archive
>   rows.
> 
> - Heck, during grooming or scanning, maybe the tool could preemptively
>   do the intra-archive coordinate cache thing if it's not already
>   done, just to defeat the latency of doing it on demand.
> 
> 
> What do you think?

Hi, Frank,

I didn't realize how expensive reindexing could be; thank you for
pointing that out.  Your proposal makes sense to me, so I'll rework this.
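
For reference, here is a rough sketch of the flow as I read it, written
in Python with sqlite3 only to keep it short.  Everything named here is
hypothetical: the intra_archive_cache table, its columns, and the
is_seekable / extract_at / extract_all callbacks are stand-ins, not the
actual debuginfod schema or functions.

```python
import sqlite3


def open_cache(path=":memory:"):
    """Open a database holding only the (hypothetical) side table.

    The existing BUILDID tables are left alone; this table acts as a cache
    keyed by archive file name and member (content) file name.
    """
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS intra_archive_cache ("
        " archive TEXT NOT NULL,"
        " member TEXT NOT NULL,"
        " data_offset INTEGER NOT NULL,"
        " data_size INTEGER NOT NULL,"
        " PRIMARY KEY (archive, member))"
    )
    return db


def fetch(db, archive, member, is_seekable, extract_at, extract_all):
    """Return the bytes of `member` from `archive`.

    is_seekable(archive) -> bool
    extract_at(archive, offset, size) -> bytes      (seek-based fast path)
    extract_all(archive) -> iterable of (name, offset, size, data)
    All three callbacks are assumptions made for this sketch.
    """
    row = db.execute(
        "SELECT data_offset, data_size FROM intra_archive_cache"
        " WHERE archive = ? AND member = ?",
        (archive, member),
    ).fetchone()
    if row is not None:
        # Cache hit: go straight to the seek-based extraction.
        return extract_at(archive, row[0], row[1])

    # Cache miss: do the full extraction, and if the archive is seekable,
    # record every member's coordinates while we are reading it anyway.
    index_it = is_seekable(archive)
    wanted = None
    for name, offset, size, data in extract_all(archive):
        if index_it:
            db.execute(
                "INSERT OR REPLACE INTO intra_archive_cache VALUES (?, ?, ?, ?)",
                (archive, name, offset, size),
            )
        if name == member:
            wanted = data  # no need to use the new cache rows this time
    db.commit()
    return wanted


def groom(db, archive_exists):
    """Drop cache rows whose archive has disappeared."""
    rows = db.execute(
        "SELECT DISTINCT archive FROM intra_archive_cache"
    ).fetchall()
    for (archive,) in rows:
        if not archive_exists(archive):
            db.execute(
                "DELETE FROM intra_archive_cache WHERE archive = ?", (archive,)
            )
    db.commit()
```

The first fetch from a seekable archive pays for the full read and fills
the table; later fetches from the same archive take the seek path.  The
preemptive indexing during scanning or grooming would amount to running
the miss branch for archives that have no rows yet.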

Thanks,
Omar
