I'm waiting for a git update of trunk to complete, not having done it since
last week. The 1.8 GB download is taking a long time over a VPN.

Updating files: 100% (8518/8518), done.
Switched to branch 'trunk'
Your branch is up to date with 'apache/trunk'.
remote: Enumerating objects: 4142992, done.
remote: Counting objects: 100% (4142972/4142972), done.
remote: Compressing objects: 100% (503038/503038), done.
remote: Total 4142404 (delta 3583765), reused 4140936 (delta 3582453)
Receiving objects: 100% (4142404/4142404), 1.80 GiB | 6.36 MiB/s, done.
Resolving deltas:  42% (1505182/3583765)
...


We have too many branches and too many tags, which makes for big downloads
and slow clones, as well as complaints from git whenever I manually push
things to gitbox.apache.org.

I think we can and should clean up; that could be done as follows:


   1. Create a hadoop-source-archive repository,
   2. into which we push all of the current hadoop repository. This
   ensures all the history is preserved.
   3. Delete all the old release branches, where "old" is defined as,
   maybe, < 2.6?
   4. Delete feature branches which are merged or abandoned,
   5. and all the single-JIRA branches, which are the same thing; "MR-279"
   is a key example. ozone-* probably too.
   6. Do some tag pruning too. (Is there a way to do this with wildcards?
   I could use it locally...)

With an archive repo, all the old development history (branches off the
current release chain, plus tags) is still available, but the core repo is
much, much smaller.

What do people think?

If others are interested, I'll need some help carefully getting the
hadoop-source-archive repo up. We'd need to somehow get all of hadoop trunk
into it.
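
One way that could work is a mirror clone plus mirror push, which copies
every ref (all branches and all tags) and so preserves the full history.
A sketch using local stand-in repos; on the real infrastructure the two
paths would be the gitbox URLs, and write access to the new repo would be
needed:

```shell
set -e
work=$(mktemp -d)
cd "$work"
# Local stand-ins for the real repos (gitbox URLs would replace these):
git init -q hadoop
git -C hadoop -c user.email=dev@example.com -c user.name=dev \
    commit -q --allow-empty -m "some history"
git -C hadoop tag release-0.1
git init -q --bare hadoop-source-archive.git

# A --mirror clone copies every ref from the source repo...
git clone -q --mirror hadoop hadoop-mirror.git
# ...and a --mirror push recreates them all in the archive repo:
git -C hadoop-mirror.git push -q --mirror "$work/hadoop-source-archive.git"
```

After that, the archive holds every branch and tag the source had, and the
pruning can happen in the main repo without losing anything.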

Meanwhile, I will cull some merged feature branches.
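
For finding candidates, `git branch --merged trunk` lists branches whose
history is fully contained in trunk, so they are safe to delete. A sketch
in a throwaway repo, using MR-279 as the example branch from above:

```shell
set -e
cd "$(mktemp -d)"
git init -q repo && cd repo
git checkout -q -b trunk
git -c user.email=dev@example.com -c user.name=dev \
    commit -q --allow-empty -m base
# A stand-in for a feature branch already merged back into trunk:
git branch MR-279

# Branches fully merged into trunk are safe cull candidates:
git branch --merged trunk | grep -v ' trunk$'

# Delete locally; on the shared repo this would be
#   git push origin --delete MR-279
git branch -d MR-279
```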

Steve
