Hi Frank, On Fri, 2019-12-06 at 22:03 -0500, Frank Ch. Eigler wrote: > @@ -851,7 +867,11 @@ handle_buildid_r_match (int64_t b_mtime, > > > return 0; > > > } > > > > > > - string popen_cmd = string("rpm2cpio " + shell_escape(b_source0)); > > > + string archive_decoder = "/dev/null"; > > > + for (auto&& arch : scan_archives) > > > + if (string_endswith(b_source0, arch.first)) > > > + archive_decoder = arch.second; > > > + string popen_cmd = archive_decoder + " " + shell_escape(b_source0); > > > FILE* fp = popen (popen_cmd.c_str(), "r"); // "e" O_CLOEXEC? > > > if (fp == NULL) > > > throw libc_exception (errno, string("popen ") + popen_cmd); > > > > This seems a lot of work for dealing with non-archives. > > We don't do this for non-archives. This is in the path where an > archive record already matched in the database. > > > > +archive_classify (const string& rps, sqlite_ps& ps_upsert_buildids, > > > sqlite_ps& ps_upsert_files, > > > sqlite_ps& ps_upsert_de, sqlite_ps& ps_upsert_sref, > > > sqlite_ps& ps_upsert_sdef, > > > time_t mtime, > > > unsigned& fts_executable, unsigned& fts_debuginfo, > > > unsigned& fts_sref, unsigned& fts_sdef, > > > bool& fts_sref_complete_p) > > > { > > > - string popen_cmd = string("rpm2cpio " + shell_escape(rps)); > > > + string archive_decoder = "/dev/null"; > > > + for (auto&& arch : scan_archives) > > > + if (string_endswith(rps, arch.first)) > > > + archive_decoder = arch.second; > > > + string popen_cmd = archive_decoder + " " + shell_escape(rps); > > > FILE* fp = popen (popen_cmd.c_str(), "r"); // "e" O_CLOEXEC? > > > if (fp == NULL) > > > throw libc_exception (errno, string("popen ") + popen_cmd); > > > > Likewise as above. Can we skip the whole popen dance if nothing > > matches? > > If you check out the caller, this part is not even called if > the extension does not match.
I see, I missed that both functions are only called after first checking the archive type. I think it might be helpful/clearer if both methods would be called with the intended archive type then, also because that might make it simpler to... > > But it does feel like the errors, logs and metrics are a little > > generic (e.g. "cannot select all format"). > > The way in which specializing the format errors could help if > debuginfod were run against rpms that had a non-cpio payload, or debs > that had a non-tar payload. This means some sort of corruption, which > contravenes our "trustworthy data" assumption -- or upstream policy > change, which is nothing to worry about. > > If you think separate metrics for .deb vs .rpm archives might be > useful, can do. If it isn't too much work then I do think it would be useful/clearer if the logs/metrics reported debs and rpms separately. Thanks, Mark