On Thu, Apr 12, 2012 at 11:28 AM, Chiheng Xu <chiheng...@gmail.com> wrote:
>
> The reason why GCC's code is very hard to hack on is not simple. In
> part, this is because GCC uses a very old, extremely hard to understand
> build system. In part, this is because GCC developers are more focused
> on fixing bugs or adding new features than on refactoring GCC's code
> itself. For example, with a .c file that is 15 years old, people tend
> to keep fixing its bugs in ways that make it uglier and uglier, rather
> than rewrite it.
>
> But I think the big reason is that GCC tends to have extremely large
> .c files, typically > 6000 LOC. If you look at LLVM, there are rarely
> source files that are > 2000 LOC; typical LLVM source files have
> 1000~2000 LOC. Just splitting a 6000 LOC source file into several
> smaller files or file sections of 1000 LOC can improve the code
> significantly. Why has this not been done before? GCC developers'
> reluctance to refactor their code may be the reason. And as a .c file
> grows, it becomes even harder to refactor. Thinking in C++ can help you
> write smaller code (C or C++) that is easier to understand and easier
> to maintain, with high cohesion and low coupling.
>
> And I think the file names of GCC's source could also be made more
> friendly to newbies; using some notion of FQN (fully qualified name)
> may be good.
I think one of the reasons is a tooling deficiency: at least Subversion
(which we use) is not able to track code motion, so when you dig through
the revision history you need more intermediate steps and, more
importantly, you have to rely on second-level information (like the
ChangeLog entry) to tell where a function was moved from.

Still, some refactoring happens (I think mostly trying to remove APIs,
which is important).

But yes, I think we never renamed files ... I suppose when we start
moving things into sub-directories that would be a good time to re-think
names. At least Subversion can handle file renames just fine ;)

Yes, files are too big, but splitting them is not easy unless you can
figure out a hierarchy that you can expose. The largest file is
dwarf2out.c with 22825 lines, but the average is more like 2000 (just
looking at gcc/*.c files). There are only 23 files bigger than 6000
lines (out of 356), so the situation is not as bad as you paint it. But
yes, looking at the filenames hardly tells you about their contents
anymore.

Richard.
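The figures above (largest file, average size, number of files over 6000 lines) come from a quick look at gcc/*.c in a 2012 tree. A minimal Python sketch for redoing that kind of survey on a checkout might look like the following. The directory name `gcc`, the `*.c` pattern and the 6000-line threshold are assumptions taken from this thread, and the helper names `line_counts` and `summarize` are hypothetical. Results will differ from the numbers quoted depending on which revision you run it against.

```python
#!/usr/bin/env python3
"""Rough survey of source-file sizes in one directory (a sketch, not a tool GCC ships)."""
import sys
from pathlib import Path


def line_counts(directory, pattern="*.c"):
    """Return {file name: line count} for regular files matching pattern.

    Non-recursive, like looking at gcc/*.c. A final line without a trailing
    newline is still counted, so this can differ from `wc -l` by one.
    """
    counts = {}
    for path in sorted(Path(directory).glob(pattern)):
        if path.is_file():
            with path.open("rb") as f:  # binary mode: no decoding errors
                counts[path.name] = sum(1 for _ in f)
    return counts


def summarize(counts, threshold=6000):
    """Return (largest file, its line count, mean line count, files over threshold)."""
    if not counts:
        raise ValueError("no files to summarize")
    largest = max(counts, key=counts.get)
    mean = sum(counts.values()) / len(counts)
    over = sum(1 for n in counts.values() if n > threshold)
    return largest, counts[largest], mean, over


def main(argv):
    # Assumed default: run from the top of a GCC source checkout.
    directory = argv[1] if len(argv) > 1 else "gcc"
    counts = line_counts(directory)
    if not counts:
        print(f"no *.c files found in {directory!r}")
        return
    largest, size, mean, over = summarize(counts)
    print(f"{len(counts)} files, mean {mean:.0f} lines")
    print(f"largest: {largest} ({size} lines)")
    print(f"{over} files over 6000 lines")


if __name__ == "__main__":
    main(sys.argv)
```

Running it as `python3 survey.py gcc` (the script name is arbitrary) prints the file count, the mean size, the largest file and how many files exceed the threshold. The mean is used because that is the figure quoted above; a few very large files such as dwarf2out.c pull it upward, so a median would be a reasonable addition.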