Re: [DISCUSS] pruning surplus branches/tags from the main hadoop repo

Ayush Saxena Wed, 29 May 2024 07:12:31 -0700

+1 for the proposal. The first step might be to just drop the
unnecessary branches, which are either from dependabot or were
accidentally created in the main repo rather than in someone's fork.
There are many of them, and I don't think we need them in the
archived repo either.


Regarding (3): if you mean just branches, that should be OK. Maybe
let's not touch the release tags for now, IMO.

Regarding the wildcard (glob) pattern for tags, I tried this locally and it works:
```
ayushsaxena@ayushsaxena hadoop % git tag --delete  `git tag --list 'ozone*'`
Deleted tag 'ozone-0.2.1-alpha-RC0' (was 90b070452bc7)
Deleted tag 'ozone-0.3.0-alpha' (was e9921ebf7e8d)
Deleted tag 'ozone-0.3.0-alpha-RC0' (was 3fbd1f15b894)
Deleted tag 'ozone-0.3.0-alpha-RC1' (was cdad29240e52)
Deleted tag 'ozone-0.4.0-alpha-RC0' (was 07fd26ef6d8c)
Deleted tag 'ozone-0.4.0-alpha-RC1' (was c4f9a20bbe55)
Deleted tag 'ozone-0.4.0-alpha-RC2' (was 6860c595ed19)
Deleted tag 'ozone-0.4.1-alpha' (was 687173ff4be4)
Deleted tag 'ozone-0.4.1-alpha-RC0' (was 9062dac447c8)
ayushsaxena@ayushsaxena hadoop %
```
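To make the same idea a bit safer to try, here is a minimal sketch that
lists the matching tags first and only deletes them when asked to. The
function name `prune_tags` and the `DRY_RUN` variable are hypothetical,
made up for this example; only `git tag --list` and `git tag --delete`
are real git commands. It touches local tags only.

```shell
#!/bin/sh
# Hypothetical helper: delete LOCAL tags matching a glob pattern.
# With DRY_RUN=1 (the default here) it only prints what it would delete.
prune_tags() {
    pattern="$1"

    # Collect matching tag names, one per line.
    tags=$(git tag --list "$pattern")
    if [ -z "$tags" ]; then
        echo "no tags match '$pattern'"
        return 0
    fi

    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Preview only: nothing is deleted.
        echo "$tags" | sed 's/^/would delete: /'
    else
        # Git tag names cannot contain whitespace, so xargs splitting is safe.
        echo "$tags" | xargs git tag --delete
    fi
}
```

Usage would be `prune_tags 'ozone*'` to preview, then
`DRY_RUN=0 prune_tags 'ozone*'` to actually delete. Tags on the remote
are not affected; removing those needs a separate push per tag, e.g.
`git push <remote> --delete <tag>`.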

-Ayush

On Wed, 29 May 2024 at 18:07, Steve Loughran
<ste...@cloudera.com.invalid> wrote:
>
> I'm waiting for a git update of trunk to complete, not having done it since
> last week. The 1.8 GB download is taking a long time over a VPN.
>
> Updating files: 100% (8518/8518), done.
> Switched to branch 'trunk'
> Your branch is up to date with 'apache/trunk'.
> remote: Enumerating objects: 4142992, done.
> remote: Counting objects: 100% (4142972/4142972), done.
> remote: Compressing objects: 100% (503038/503038), done.
> ^Receiving objects:  11% (483073/4142404), 204.18 MiB | 7.05 MiB/s
> remote: Total 4142404 (delta 3583765), reused 4140936 (delta 3582453)
> Receiving objects: 100% (4142404/4142404), 1.80 GiB | 6.36 MiB/s, done.
> Resolving deltas:  42% (1505182/3583765)
> ...
>
>
> We have too many branches and too many tags, which makes for big downloads
> and slow clones, as well as complaints from git whenever I manually push
> things to gitbox.apache.org.
>
> I think we can/should clean up, which can be done as
>
>
>    1. Create a hadoop-source-archive repository,
>    2. into which we add all of the current hadoop-repository. This ensures
>    all the history is preserved.
>    3. Delete all the old release branches, where old is defined as, maybe <
>    2.6?
>    4. feature branches which are merged/abandoned
>    5. all the single JIRA branches which are the same thing, "MR-279" being
>    a key example. ozone-* probably too.
>    6. Do some tag pruning too. (Is there a way to do this with wildcards? I
>    could use it locally...)
>
> With an archive repo, all the old development history for branches off the
> current release chain + tags are still available, but the core repo is
> much, much smaller.
>
> What do people think?
>
> If others are interested, I'll need some help carefully getting the
> hadoop-source-archive repo up. We'd need to somehow get all of hadoop trunk
> into it.
>
> Meanwhile, I will cull some merged feature branches.
>
> Steve

---------------------------------------------------------------------
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org
