I'm waiting for a git update of trunk to complete, not having done it since last week. The 1.8 GB download is taking a long time over a VPN.
Updating files: 100% (8518/8518), done.
Switched to branch 'trunk'
Your branch is up to date with 'apache/trunk'.
remote: Enumerating objects: 4142992, done.
remote: Counting objects: 100% (4142972/4142972), done.
remote: Compressing objects: 100% (503038/503038), done.
Receiving objects: 11% (483073/4142404), 204.18 MiB | 7.05 MiB/s
remote: Total 4142404 (delta 3583765), reused 4140936 (delta 3582453)
Receiving objects: 100% (4142404/4142404), 1.80 GiB | 6.36 MiB/s, done.
Resolving deltas: 42% (1505182/3583765)
...

We have too many branches and too many tags, which makes for big downloads and slow clones, as well as complaints from git whenever I manually push things to gitbox.apache.org.

I think we can and should clean up, which could be done as follows:

1. Create a hadoop-source-archive repository,
2. into which we add all of the current hadoop repository. This ensures all the history is preserved.
3. Delete all the old release branches, where "old" is defined as, maybe, < 2.6?
4. Delete the feature branches which are merged or abandoned.
5. Delete all the single-JIRA branches, which are the same thing; "MR-279" is a key example. The ozone-* branches probably qualify too.
6. Do some tag pruning too. (Is there a way to do this with wildcards? I could use it locally...)

With an archive repo, all the old development history for branches off the current release chain, plus the tags, is still available, but the core repo is much, much smaller.

What do people think?

If others are interested, I'll need some help carefully getting the hadoop-source-archive repo up; we'd need to somehow get all of hadoop trunk into it. Meanwhile, I will cull some merged feature branches.

Steve
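For steps 1-2, git's `--mirror` mode copies every ref (all branches and tags) in one go, so nothing is lost when refs are later deleted from the main repo. A minimal local sketch, using throwaway stand-in repos rather than the real gitbox setup (`source.git` and its branch/tag names are hypothetical):

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"

# Throwaway stand-in for the current hadoop repo (hypothetical content).
git init -q source.git
cd source.git
git checkout -q -b trunk
git -c user.name=t -c user.email=t@example.com commit -q --allow-empty -m 'initial commit'
git branch MR-279            # an old single-JIRA branch
git tag release-0.20.0       # an old release tag
cd ..

# Steps 1-2: a mirror clone copies *all* refs into the archive repo.
git clone -q --mirror source.git hadoop-source-archive.git

# Every branch and tag survives in the archive:
git --git-dir=hadoop-source-archive.git for-each-ref --format='%(refname:short)'
```

Pushing that mirror to a new empty repo on gitbox (`git push --mirror <url>`) would populate the archive in one step.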
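Steps 3-5 are then one command per branch: `git push --delete` removes a branch from the shared remote. A sketch against a local bare repo standing in for gitbox (repo and branch names hypothetical):

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"

# Bare repo standing in for gitbox.apache.org (hypothetical stand-in).
git init -q --bare central.git
git clone -q central.git work 2>/dev/null
cd work
git -c user.name=t -c user.email=t@example.com commit -q --allow-empty -m 'initial commit'
git branch MR-279               # a merged single-JIRA branch to cull
git push -q origin --all

# Steps 3-5: delete the obsolete branch on the shared remote
# (only once it is preserved in the archive repo).
git push -q origin --delete MR-279

# Locally, -d refuses to delete a branch that isn't merged into HEAD,
# which is a useful safety check when culling "merged" branches.
git branch -d MR-279
```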
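On the wildcard question in step 6: `git tag -l` accepts glob patterns, so a pattern's matches can be piped into `git tag -d`. A local sketch with made-up tag names:

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q repo && cd repo
git -c user.name=t -c user.email=t@example.com commit -q --allow-empty -m 'initial commit'
git tag ozone-0.3.0 && git tag ozone-0.4.0 && git tag rel/release-3.3.0

# Step 6: -l takes a glob, so wildcard deletion is a pipe away.
git tag -l 'ozone-*' | xargs git tag -d

git tag -l    # only rel/release-3.3.0 remains
```

The same pattern should work against a remote, e.g. `git tag -l 'ozone-*' | xargs -n1 git push origin --delete`, though that is worth dry-running first by substituting `echo` for the push.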