jlebar added a subscriber: jlebar.
jlebar added a comment.

I'm sure you all have thought about this more than I have, and I apologize if 
this has been brought up before because I haven't been following the thread 
closely.  But I am not convinced by this document that using subrepositories 
beats using a single git repo.

I see two reasons here for using subrepos as opposed to one big repository.

1. Subrepos mirror our current scheme.
2. Subrepos let people check out only the bits of llvm that they want.

I don't find either of these particularly compelling, compared to the 
advantages of one-big-repo (discussed below).  Taking them in turn:

1. Although subrepos would mirror our current scheme, it's going to be 
different *enough* that existing tools are going to have to change either way.  
In particular, the svn view of the master repository is not going to be useful 
for anything.  I tried `svn checkout 
https://github.com/chapuni/llvm-project-submodule`, and the result was 
essentially an empty repository.

2. It's true that subrepos let people check out only the bits that they want.  
But disk space and bandwidth are very cheap today, and LLVM is not as large as 
one might think.  My copy of https://github.com/llvm-project/llvm-project, 
which includes *everything*, is 2.5G, whereas my copy of just llvm is 626M.

  Given that a release build of llvm and clang is ~3.5G, a 2.5G source checkout 
doesn't seem at all unreasonable to me.

  If it's really problematic, you can do a shallow checkout, which would take 
the contains-everything repo from 2.5G to 1.3G.  Moreover if it's *really* a 
problem, you can mirror the subdir of llvm that you care about.  Maybe the LLVM 
project could maintain said mirrors for some of the small subrepos that are 
often used independently.
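
  As a rough sketch of the shallow-checkout idea: the script below builds a 
throwaway local repository (every path and name in it is hypothetical, 
standing in for the real project so that it runs offline) and compares a full 
clone against a `--depth 1` clone.

```shell
#!/bin/sh
# Sketch: compare a full clone with a shallow (--depth 1) clone.
# A throwaway local repository stands in for the real project; all
# paths and names below are made up for illustration.
set -e
work=$(mktemp -d)

# Build a tiny stand-in "upstream" repository with three commits.
git init -q "$work/upstream"
for n in 1 2 3; do
  echo "change $n" > "$work/upstream/file.txt"
  git -C "$work/upstream" add file.txt
  git -C "$work/upstream" -c user.name=demo -c user.email=demo@example.com \
      commit -q -m "commit $n"
done

# Full clone: fetches the whole history.
git clone -q "file://$work/upstream" "$work/full"
# Shallow clone: fetches only the most recent commit.
# (A file:// URL is used because --depth is ignored for plain local paths.)
git clone -q --depth 1 "file://$work/upstream" "$work/shallow"

# Count the commits reachable from HEAD in each clone.
full_count=$(git -C "$work/full" rev-list --count HEAD)
shallow_count=$(git -C "$work/shallow" rev-list --count HEAD)
echo "full clone commits:    $full_count"
echo "shallow clone commits: $shallow_count"
```

  Against the real repository the same idea would be `git clone --depth 1` 
with the project URL; how much space it saves depends on the history, so the 
numbers above are the ones I measured, not something this toy script reproduces.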

So what's the advantage of using one big repository?  The simple answer is: 
Have you ever *tried* using git submodules?  :)

Submodules make everything more complicated.  Here's an example that I hope 
proves the point.  Suppose you want to commit your current work and switch to a 
new clean branch off head.  You make some changes there, then come back to your 
current work.  And let's assume that all of your changes are to clang only.

  # Commit current work, switch to a clean branch off head, then switch back.
  
  # One big repo: 
  $ git commit  # on old-branch
  $ git fetch
  $ git checkout -b new-branch origin/master
  # Hack hack hack...
  $ git commit
  $ git checkout old-branch
  
  # Submodules, attempt 1:
  $ cd clang
  $ git commit  # on old-branch
  $ git fetch
  $ git checkout -b new-branch origin/master
  # Also have to update llvm...
  $ cd ../llvm
  $ git fetch
  $ git checkout origin/master
  $ cd ../clang
  # Hack hack hack
  $ git commit
  
  # Now we're ready to switch back to old-branch, but...it's not going to work.
  # When we committed our old branch, we didn't save the state of our llvm
  # checkout.  So in particular we don't know which revision to roll it back to.
  
  # Let's try again.
  # Submodules, attempt 2:
  $ cd clang
  $ git commit  # on old-branch
  $ cd ..
  $ git checkout -b old-branch # in master repo
  $ git commit
  
  # Now we have two branches called "old-branch": one in the master repo, and
  # one in the clang submodule.  Now let's fetch head.
  
  $ git fetch  # in master repo
  $ git checkout -b new-branch origin/master
  $ git submodule update
  $ cd clang
  $ git checkout -b new-branch
  # Hack hack hack
  $ git commit  # in submodule
  $ cd ..
  $ git commit  # in master repo
  
  # Now we're ready to switch back.
  
  $ git checkout old-branch  # in master repo
  $ git submodule update

For those keeping track at home, this is 5 git commands with the big repo, and 
15 commands (11 of them git commands) in the submodules world, counting only 
attempt 2.

Above we assumed that all of our changes were only to clang.  If we're making 
changes to both llvm and clang (say), the one-big-repo workflow remains 
identical, but the submodules workflow becomes even more complicated.
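
To illustrate the one-big-repo half of that claim, here is a minimal sketch 
in which a single commit touches both llvm/ and clang/.  The repository and 
file names are invented for the example; only the git commands are real.

```shell
#!/bin/sh
# Sketch: in a single repository, one commit can span llvm/ and clang/.
# The repository layout and file names are made up for illustration.
set -e
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Initial import with one file per project directory.
mkdir llvm clang
echo "llvm v1" > llvm/a.txt
echo "clang v1" > clang/b.txt
git add llvm clang
git -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "initial import"

# Change both projects, then record them together in one commit.
echo "llvm v2" > llvm/a.txt
echo "clang v2" > clang/b.txt
git -c user.name=demo -c user.email=demo@example.com \
    commit -q -a -m "change llvm and clang together"

# List the paths touched by that single commit.
changed=$(git diff-tree --no-commit-id --name-only -r HEAD)
echo "$changed"
```

With submodules the same change needs a commit in each submodule plus a 
further commit in the master repo to record the new submodule revisions.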

I'm sure people who are better at git than I am can golf the above commands, but 
I'll suggest that I'm an above-average git user, so this is probably a 
lower-than-average estimate for the number of git commands (particularly `git 
help` :).  git is hard enough as-is; using submodules like this is asking a lot.

Similarly, I'm sure much of this can be scripted, but...seriously?  :)
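
To give a sense of what such scripting might look like, here is a minimal 
sketch of one helper.  The name `commit_everywhere` is hypothetical (it is not 
an existing tool), and it covers only the "commit in the submodules, then in 
the master repo" step; it does nothing about creating or switching matching 
branches, which is the harder part.

```shell
# Hypothetical helper: commit pending changes to tracked files in every
# submodule, then record the new submodule revisions in the master repo.
# Run it from the top level of the master repo.
commit_everywhere() {
  msg=$1
  # In each submodule: if anything differs from HEAD, commit it.
  # The message is passed through the environment so that quotes in it
  # survive the extra shell that `git submodule foreach` starts.
  COMMIT_MSG="$msg" git submodule --quiet foreach \
    'git diff --quiet HEAD || git commit -q -a -m "$COMMIT_MSG"'
  # In the master repo: -a stages the updated submodule pointers
  # (and any other modified tracked files) before committing.
  git commit -q -a -m "$msg"
}
```

Usage would be something like `commit_everywhere "my change"`.  Note that it 
ignores untracked files and fails if the master repo has nothing to commit, 
so even this small piece needs care before anyone relies on it.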

Sorry for the wall of text.  tl;dr: One big repo doesn't actually cost that 
much, and that cost is dwarfed by the cost to humans of using submodules as 
proposed.


https://reviews.llvm.org/D22463



_______________________________________________
lldb-commits mailing list
lldb-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits
