Hi,
On 1/24/24 10:22, Ludovic Courtès wrote:
> The question boils down to: Git-LFS or Git Annex?
> [...]
> What’s your experience? What would you suggest?
I have a few times had a problem for which I thought Git LFS might be a
solution, and each time I have ended up ripping out Git LFS in
frustration before long.
I have not used Git Annex. I have looked into it a few times, but each
time I decided it was too complex or not quite suitable for my use-case
in some way. On the other hand, I have heard good things about it from
people who have used it: in particular, I believe Morgan Lemmer-Webber
(CC'ed) used it to manage a large set of art history images.
The main thing in this context that still isn't clear to me from my
reading so far is how sharing lists of remotes works with Git Annex. In
plain Git, remotes are part of the local state of a particular clone,
not distributed as part of the repository. For the objectives here,
though, a lot of the benefit would seem to be having many copies in
synchronized, possibly "special" remotes so that anyone trying to get
the videos would have plenty of ways to get them. I'm not sure to what
extent Git Annex does that out of the box.
I did see that Git Annex can use Git LFS as a "special remote".
There are also two other approaches I think would be worth at least
considering:
1. Just use Git
While the limitations of Git for storing large media files are well
known, I have found it to be good enough for several use-cases, and it
has the strong advantage of not requiring additional tools. My
impression is that a significant factor in people using Git LFS, in
particular, is the limit on repository size imposed by the popular
hosting providers. There are strategies within Git to avoid having to
download unwanted artifacts, including creating branches with unrelated
histories, shallow clones (e.g. --depth=1 --single-branch), partial
clones [1][2][3] (e.g. --filter=blob:none), and sparse checkouts [4][5],
with the latter two being fairly new features.
[1]: https://git-scm.com/docs/partial-clone
[2]: https://git-scm.com/docs/git-clone#Documentation/git-clone.txt---filterltfilter-specgt
[3]: https://git-scm.com/docs/git-rev-list#Documentation/git-rev-list.txt---filterltfilter-specgt
[4]: https://git-scm.com/docs/git-sparse-checkout
[5]: https://git-scm.com/docs/git-clone#Documentation/git-clone.txt---sparse
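To make the clone options above concrete, here is a rough sketch that can be run without network access. The throwaway repository, the talks/ and slides/ directories, and the file names are all hypothetical stand-ins for a real video repository, and the sparse-checkout commands assume a reasonably recent Git (2.25 or later).

```shell
#!/bin/sh
set -eu

# Hypothetical stand-in for the real video repository, built in a temp dir.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
src="$work/src"

commit() {
    git -C "$src" add .
    git -C "$src" -c user.name=Example -c user.email=example@example.org \
        commit -q -m "$1"
}

git init -q "$src"
mkdir "$src/talks" "$src/slides"
echo "talk A" > "$src/talks/a.txt"
echo "deck A" > "$src/slides/a.txt"
commit "first"
echo "talk B" > "$src/talks/b.txt"
commit "second"

# The "server" side has to agree to serve filtered and on-demand fetches.
git -C "$src" config uploadpack.allowFilter true
git -C "$src" config uploadpack.allowAnySHA1InWant true

# Shallow clone: only the newest commit of a single branch.
git clone -q --depth=1 --single-branch "file://$src" "$work/shallow"

# Partial clone plus sparse checkout: full history, but blobs are fetched
# lazily, and only talks/ is materialized in the working tree.
git clone -q --filter=blob:none --sparse "file://$src" "$work/partial"
git -C "$work/partial" sparse-checkout set talks

echo "commits in shallow clone: $(git -C "$work/shallow" rev-list --count HEAD)"
echo "directories in sparse checkout:"
ls "$work/partial"
```

Against a real hosting provider, only the two "git clone" lines and the "sparse-checkout set" line would be needed, and whether --filter is honoured depends on the server.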
2. Mirror URLs
Another approach would be just to make each video available at a few
URLs and have Guix origins with the list. If one of the available URLs
were the Internet Archive, it would have a high degree of assurance of
long-term preservation. I think the biggest downside is that this might
not help much with managing the collection of videos.
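As a rough illustration of the mirror idea outside of Guix, the sketch below tries a list of URLs in order and keeps the first download whose SHA-256 matches an expected value, which is roughly the combination of a URL list and a content hash described above. The function name fetch_from_mirrors is my own invention, and it assumes curl and sha256sum are installed.

```shell
#!/bin/sh

# fetch_from_mirrors EXPECTED_SHA256 OUTPUT URL...
# Hypothetical helper: try each URL in turn and keep the first download
# whose SHA-256 matches EXPECTED_SHA256. Returns non-zero if none match.
fetch_from_mirrors() {
    expected=$1
    output=$2
    shift 2
    for url in "$@"; do
        if curl -fsSL -o "$output" "$url" 2>/dev/null; then
            actual=$(sha256sum "$output" | cut -d ' ' -f 1)
            if [ "$actual" = "$expected" ]; then
                echo "fetched $output from $url"
                return 0
            fi
        fi
    done
    # No mirror produced a file with the expected hash.
    rm -f "$output"
    return 1
}

# Example call (the hash and URLs are placeholders, not real locations):
#   fetch_from_mirrors "<sha256 of the video>" talk.webm \
#       https://mirror-one.example.org/talk.webm \
#       https://mirror-two.example.org/talk.webm
```

Because the hash is checked after every download, it does not matter much which mirror ends up serving the file, only that at least one of them still has it.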
Philip