On Thu, 30 May 2024 at 03:47, Xiaoqiao He <hexiaoq...@apache.org> wrote:

> Strong +1. One concern: how do we define 'unnecessary branches', i.e. how
> do we distinguish the branches someone created by accident? I tried to go
> through some of them but didn't find one obvious rule. Thanks.
>


If we take a snapshot into an archive repository, then nothing will be lost
(actually, a GitHub fork should let us do this, so someone could perhaps just
log in as hadoop yetus and fork it there).
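
A minimal sketch of taking that snapshot, written as a hypothetical `archive_snapshot` shell function. The function name and the archive URL are placeholders, and I assume ASF INFRA would have to create the archive repo before anyone can push to it. The function mirror-clones the source, then pushes every branch and tag into the archive. It pushes `refs/heads/*` and `refs/tags/*` explicitly instead of using `git push --mirror`, because a mirror clone taken from GitHub also carries the read-only `refs/pull/*` refs, which GitHub rejects.

```shell
#!/bin/sh
# Hypothetical helper: copy every branch and tag of SRC into DEST.
# Usage: archive_snapshot SRC_URL DEST_URL
archive_snapshot() {
  src=$1
  dest=$2
  work=$(mktemp -d) || return 1
  # A mirror clone fetches all refs of the source repository.
  if ! git clone --quiet --mirror "$src" "$work/repo.git"; then
    rm -rf "$work"
    return 1
  fi
  # Push only branches and tags; GitHub's refs/pull/* would be rejected.
  git -C "$work/repo.git" push --quiet "$dest" \
    'refs/heads/*:refs/heads/*' 'refs/tags/*:refs/tags/*'
  status=$?
  rm -rf "$work"
  return "$status"
}
```

Usage would be something like `archive_snapshot https://github.com/apache/hadoop.git <archive-repo-url>`, run by someone with push rights to the archive.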

Then we'd just need to work out:

   1. which feature branches are merged; e.g. MR-279 is actually YARN itself.
   2. which feature branches are abandoned.
   3. which release branches we can retire (all of 0.x, 1.x, early 2.x).
   Maybe every branch not getting active maintenance, leaving only the
   release tags?
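
A rough first pass at (1) and (2), written as a hypothetical `classify_branches` function. It assumes the ASF remote is called `apache`. For each remote-tracking branch, oldest first, it prints whether the branch tip is already an ancestor of trunk, along with the date of its last commit. Squash-merged branches such as MR-279 will still show up as "unmerged", so the date column (and a human) has to do the rest. This does not give the single obvious rule Xiaoqiao asked for; it only produces a list to go through.

```shell
#!/bin/sh
# Hypothetical helper: list remote-tracking branches, oldest first, as
#   <merged|unmerged> <last-commit-date> <remote>/<branch>
# Usage: classify_branches REMOTE BASE   (e.g. apache apache/trunk)
classify_branches() {
  remote=$1
  base=$2
  git for-each-ref --sort=committerdate \
      --format='%(committerdate:short) %(refname)' \
      "refs/remotes/$remote/" |
  while read -r cdate ref; do
    # Skip the symbolic refs/remotes/<remote>/HEAD entry.
    case $ref in */HEAD) continue ;; esac
    # "merged" only if the branch tip is already an ancestor of BASE;
    # squash-merged branches therefore still report "unmerged".
    if git merge-base --is-ancestor "$ref" "$base"; then
      state=merged
    else
      state=unmerged
    fi
    echo "$state $cdate ${ref#refs/remotes/}"
  done
}
```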


Keeping all the tags will still leave us with a large repo, but removing
feature branches could help, as the final merge was inevitably a squashed
merge: the intermediate chain of commits can be purged, provided there
aren't any tags pointing at them.
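
To check that condition before deleting a feature branch, one option is to count the commits reachable from the branch but not from any tag or any other branch. Below is a minimal sketch as a hypothetical `orphaned_commit_count` helper. It takes the full ref name and the namespace holding the other branches (e.g. `refs/remotes/apache/`), and ignores local branches in your own clone. A result of 0 means nothing would become unreachable. Fresh clones stop downloading those commits once the ref is gone, but when the objects are actually removed from the server depends on its garbage collection.

```shell
#!/bin/sh
# Hypothetical helper: count the commits that only BRANCH_REF can reach,
# i.e. those not reachable from any tag or from any other ref under NS.
# Usage: orphaned_commit_count BRANCH_REF NS
#   e.g. orphaned_commit_count refs/remotes/apache/MR-279 refs/remotes/apache/
orphaned_commit_count() {
  branch_ref=$1
  ns=$2
  # Emit "^<ref>" exclusions for every tag and every other branch, append
  # the branch itself, and let rev-list count what is left.
  git for-each-ref --format='%(refname)' refs/tags/ "$ns" |
    grep -vxF "$branch_ref" |
    sed 's/^/^/' |
    { cat; echo "$branch_ref"; } |
    git rev-list --count --stdin
}
```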




> Best Regards,
> - He Xiaoqiao
>
>
>
> On Wed, May 29, 2024 at 10:11 PM Ayush Saxena <ayush...@gmail.com> wrote:
>
>> +1 for the proposal. The first step might be to just drop the
>> unnecessary branches, which either come from dependabot or were
>> accidentally created in the main repo rather than in someone's fork.
>> There are many of them, and I don't think we need them in the archived
>> repo either.
>>
>> Regarding (3): if you mean just branches, that should be OK. IMO, let's
>> not touch the release tags for now.
>>
>> Regarding the wildcards, I tried this locally and it works:
>> ```
>> ayushsaxena@ayushsaxena hadoop % git tag --delete `git tag --list 'ozone*'`
>> Deleted tag 'ozone-0.2.1-alpha-RC0' (was 90b070452bc7)
>> Deleted tag 'ozone-0.3.0-alpha' (was e9921ebf7e8d)
>> Deleted tag 'ozone-0.3.0-alpha-RC0' (was 3fbd1f15b894)
>> Deleted tag 'ozone-0.3.0-alpha-RC1' (was cdad29240e52)
>> Deleted tag 'ozone-0.4.0-alpha-RC0' (was 07fd26ef6d8c)
>> Deleted tag 'ozone-0.4.0-alpha-RC1' (was c4f9a20bbe55)
>> Deleted tag 'ozone-0.4.0-alpha-RC2' (was 6860c595ed19)
>> Deleted tag 'ozone-0.4.1-alpha' (was 687173ff4be4)
>> Deleted tag 'ozone-0.4.1-alpha-RC0' (was 9062dac447c8)
>> ayushsaxena@ayushsaxena hadoop %
>>
>> ```
>>
>> -Ayush
>>
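
Ayush's `git tag --delete` only removes the tags from the local clone. To drop them from the shared repo as well, the deletions have to be pushed. Below is a minimal sketch as a hypothetical `delete_tags` function; the remote name (e.g. `apache`) is an assumption. It pushes fully qualified `refs/tags/...` names so a branch with the same name is never touched. It should only be run once the archive copy exists.

```shell
#!/bin/sh
# Hypothetical helper: delete tags matching a glob on REMOTE, then locally.
# Usage: delete_tags REMOTE PATTERN   (e.g. delete_tags apache 'ozone*')
delete_tags() {
  remote=$1
  pattern=$2
  tags=$(git tag --list "$pattern")
  [ -n "$tags" ] || return 0
  # One fully qualified ref per tag; tag names cannot contain spaces or
  # glob characters, so the unquoted expansion is safe here.
  git push --quiet --delete "$remote" $(printf 'refs/tags/%s\n' $tags) ||
    return 1
  git tag --delete $tags >/dev/null
}
```
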
>> On Wed, 29 May 2024 at 18:07, Steve Loughran
>> <ste...@cloudera.com.invalid> wrote:
>> >
>> > I'm waiting for a git update of trunk to complete, not having done it
>> > since last week. The 1.8 GB download is taking a long time over a VPN.
>> >
>> > Updating files: 100% (8518/8518), done.
>> > Switched to branch 'trunk'
>> > Your branch is up to date with 'apache/trunk'.
>> > remote: Enumerating objects: 4142992, done.
>> > remote: Counting objects: 100% (4142972/4142972), done.
>> > remote: Compressing objects: 100% (503038/503038), done.
>> > Receiving objects:  11% (483073/4142404), 204.18 MiB | 7.05 MiB/s
>> > remote: Total 4142404 (delta 3583765), reused 4140936 (delta 3582453)
>> > Receiving objects: 100% (4142404/4142404), 1.80 GiB | 6.36 MiB/s, done.
>> > Resolving deltas:  42% (1505182/3583765)
>> > ...
>> >
>> >
>> > We have too many branches and too many tags, which makes for big
>> > downloads and slow clones, as well as complaints from git whenever I
>> > manually push things to gitbox.apache.org.
>> >
>> > I think we can, and should, clean up. This could be done as follows:
>> >
>> >
>> >    1. Create a hadoop-source-archive repository,
>> >    2. into which we add all of the current hadoop repository. This
>> >    ensures all the history is preserved.
>> >    3. Delete all the old release branches, where "old" is defined as,
>> >    maybe, < 2.6?
>> >    4. Delete feature branches which are merged or abandoned.
>> >    5. Delete all the single-JIRA branches, which are the same thing;
>> >    "MR-279" is a key example, and ozone-* probably too.
>> >    6. Do some tag pruning too. (Is there a way to do this with
>> >    wildcards? I could use it locally...)
>> >
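
For (3), if "old" means anything below 2.6, here is a sketch that picks out those release branches by version number rather than by glob, written as a hypothetical `old_release_branches` function. It assumes release branches are named `branch-<major>[.<minor>...]`, optionally followed by a suffix such as `-beta`. It only prints the names, so the list can be reviewed before anything is deleted.

```shell
#!/bin/sh
# Hypothetical helper: print release branches on REMOTE older than
# MAJOR.MINOR, assuming names like branch-0.20, branch-1, branch-2.1-beta.
# Usage: old_release_branches REMOTE MAJOR MINOR   (e.g. apache 2 6)
#
# awk splits "branch-2.1-beta" into branch, 2, 1, beta and compares the
# numeric parts; a missing minor version (branch-1) counts as 0.
old_release_branches() {
  remote=$1
  max_major=$2
  max_minor=$3
  git for-each-ref --format='%(refname)' "refs/remotes/$remote/" |
    sed -n "s|^refs/remotes/$remote/\(branch-[0-9].*\)\$|\1|p" |
    awk -F'[-.]' -v M="$max_major" -v m="$max_minor" '
      BEGIN { M += 0; m += 0 }
      {
        major = $2 + 0
        minor = ($3 == "") ? 0 : $3 + 0
        if (major < M || (major == M && minor < m)) print
      }'
}
```

Each printed name could then be removed with `git push --delete apache refs/heads/<name>` once the archive is in place.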
>> > With an archive repo, all the old development history for branches off
>> > the current release chain, plus the tags, is still available, but the
>> > core repo is much, much smaller.
>> >
>> > What do people think?
>> >
>> > If others are interested, I'll need some help carefully getting the
>> > hadoop-source-archive repo up. We'd need to somehow get all of hadoop
>> > trunk into it.
>> >
>> > Meanwhile, I will cull some merged feature branches.
>> >
>> > Steve
>>
