It took a couple of days of struggling, but I have succeeded in getting the GCC repo to load into reposurgeon on a 64GB machine. In only 6 hours. :-)
In the process I found a couple of optimizations to reposurgeon that dramatically increased its read speed and somewhat reduced its maximum working set. But the real win was switching to PyPy rather than CPython as the Python interpreter - it turns out this is exactly the kind of job load for which their JIT compilation shines.

You have a crapload of cvs2svn artifacts in the early history - redundant D/M pairs generated while making Subversion tag commits. That in itself is quite usual. But fully 50% of the load time (three hours!) is spent optimizing these out, which is a degree of severity I've never seen before. That's a solved problem now.

There's a fair amount of surgery to be done still. You have 151 mid-branch deletealls. This usually indicates that a Subversion tag or branch was created by mistake, and someone later tried to undo the error by deleting the tag/branch directory before recreating it with a copy operation. *Usually* the right thing is to reroot the portion of the branch forward of the delete and discard the commits before it, but these cases will need to be checked by hand.

But now that the initial load has succeeded, the rest is just hard work, as opposed to can-it-be-done-at-all? territory. And, as previously noted, I am now authorized to concentrate on it until it's done.

Actually my project manager and the senior devs on the NTPsec team are following this work with lively interest and making constructive suggestions. Moving that project's history out of Bitkeeper was an epic, too, and as a result most of them are at least somewhat familiar with this class of problem and find it interesting.

Now synced to r255661.

--
<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>

My work is funded by the Internet Civil Engineering Institute: https://icei.org
Please visit their site and donate: the civilization you save might be your own.