Hi Timothy,

On Thu, 03 Feb 2022 at 10:46, Timothy Sample <samp...@ngyro.com> wrote:
>> But the question is whether Disarchive disassembles and preserves
>> external patches. Timothy?

[...]

> The bad news is that 0.75 is not there. At first I was going to
> apologize for the shortcomings of the sampling approach... until I
> realized you are trying to trick me! ;) Unless I’m misreading the Git
> history, that patch appeared and disappeared on core-updates and was
> never part of master.

Because of the good news, the same could be done for these patches, no?
For instance, one missing patch, as Maxime pointed out, is there:

https://github.com/archlinux/svntogit-packages/blob/155510dd18d2f290085f40d2a95a3701db4a224d/texlive-bin/repos/extra-x86_64/pdftex-poppler0.75.patch

And SWH contains it:

https://archive.softwareheritage.org/browse/revision/155510dd18d2f290085f40d2a95a3701db4a224d/?path=texlive-bin/repos/extra-x86_64/pdftex-poppler0.75.patch

Therefore, somehow, the “only” missing step is to disassemble this data
and add an entry to the database, no?

I am not sure what you mean by «was never part of master». After the
merge, what was core-updates and what was master is somehow
indistinguishable, no? Or are you walking only the first parent after a
merge commit, i.e., “git log --first-parent”? Well, Git history and
ordering quickly lead to a headache, as the git-log documentation
shows. :-) I think it is fine to simplify a “complex” history by
sampling and considering only a first-parent walk.

> The way the PoG script tracks down sources is pretty robust. It takes
> the derivation graph to be canonical, and only uses the graph of
> high-level objects (packages, origins, etc.) for extra info. I do my
> best to follow the links of the high-level objects, and then verify that
> I did a good job by lowering them and checking coverage against the set
> of derivations found by following the derivation graph. Since the
> derivation graph necessarily contains everything that matters, this is a
> good way to track down all the high-level objects that matter. See
> <https://git.ngyro.com/preservation-of-guix/tree/pog-inferior.scm#n113>
> for a rather scary looking procedure that finds the edges of the
> high-level object graph.

Cool! Thanks for explaining and pointing out how PoG does it.

> That being said, coverage is not perfect. The most obvious problem (to
> me) is the sampling approach. Surely there are sources that are missed
> by only examining one commit per week. This can be checked and fixed by
> using data from the Guix Data Service, which has data from essentially
> every Guix commit.

No, the Data Service and even Cuirass use a sampling approach too; they
do not process all the commits. Cuirass uses an «every 5 minutes»
approach; CI-savvy people, please correct me if I am mistaken. The Data
Service uses a «batch guix-commits» approach; more details in this
thread [1].

Well, the coverage is twofold, IMHO:

 1. preserve what is currently entering Guix;
 2. archive what was previously available in Guix.

About #1, the main mechanisms are sources.json, “guix lint”, and
updating disarchive-db (now done by CI). Whatever is missed should then
be fixed by #2.

About #2, it is hard to fix all the issues at once. One commit per week
already provides a good view for spotting some problems. Processing all
the commits just means burning more CPU; it seems “easy” once the
infrastructure is in place, no?

1: <https://yhetil.org/guix/863617oe1h....@gmail.com/>

Cheers,
simon
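
P.S. To check my own understanding of the derivation-graph part: is the
core of it something like the following (untested) sketch? It only uses
the public (guix derivations) API; the procedure name is mine, and the
real pog-inferior.scm obviously does much more.

(use-modules (guix derivations)
             (ice-9 match))

;; Walk DRV's input graph and collect the fixed-output derivations,
;; i.e. the downloaded sources (tarballs, patches, ...) that need
;; preserving.
(define (source-derivations drv)
  (let loop ((todo (list drv))
             (seen '())
             (sources '()))
    (match todo
      (() sources)
      ((drv . rest)
       (let ((file (derivation-file-name drv)))
         (if (member file seen)
             (loop rest seen sources)
             (loop (append (map derivation-input-derivation
                                (derivation-inputs drv))
                           rest)
                   (cons file seen)
                   (if (fixed-output-derivation? drv)
                       (cons drv sources)
                       sources))))))))

If that is roughly right, then the coverage question reduces to: which
commits do we pick to build such graphs from?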