On Mon, 2018-07-09 at 06:16 -0400, Eric S. Raymond wrote:
> Janus Weil <ja...@gcc.gnu.org>:
> > > The bad news is that my last test run overran the memory capacity
> > > of the 64GB Great Beast. I shall have to find some way of reducing
> > > the working set, as 128GB DDR4 memory is hideously expensive.
> >
> > Or maybe you could use a machine from the GCC compile farm?
> >
> > According to https://gcc.gnu.org/wiki/CompileFarm, there are three
> > machines with at least 128GB available (gcc111, gcc112, gcc119).
>
> The Great Beast is a semi-custom PC optimized for doing graph theory
> on working sets gigabytes wide - its design emphasis is on the best
> possible memory caching. If I dropped back to a conventional machine
> the test times would go up by 50% (benchmarked, that's not a guess),
> and they're already bad enough to make test cycles very painful.
>
> I just saw elapsed time 8h30m36.292s for the current test - I had it
> down to 6h at one point, but the runtimes scale badly with increasing
> repo size; there is intrinsically O(n**2) stuff going on.
>
> My first evasive maneuver is therefore to run tests with my browser
> shut down. That's working. I used to do that before I switched from
> CPython to PyPy, which runs faster and has a lower per-object
> footprint. Now it's mandatory again. It tells me I need to get the
> conversion finished before the number of commits gets much higher.
I wonder if one approach would be to tune PyPy for the problem?

I was going to check that you've read:
https://pypy.org/performance.html
but I see you've already contributed text to it :)

For CPU, does PyPy's JIT get a chance to kick in and turn the hot
loops into machine code, or is it stuck interpreting bytecode for the
most part?

For RAM, is there a way to make PyPy use the RAM more efficiently when
storing the objects? (PyPy already has a number of tricks it uses to
store things more efficiently, and it's possible, though hard, to teach
it new ones.)

(A couple of rough sketches of what I mean by both of the above are at
the bottom of this mail.)

This is possibly self-serving, as I vaguely know them from my days in
the Python community, but note that the PyPy lead developers have a
consulting gig offering paid help with Python and PyPy performance
issues: https://baroquesoftware.com/ (though I don't know who would pay
for that for the GCC repo conversion).

Hope this is constructive.
Dave

> More memory would avoid OOM but not run the tests faster. More cores
> wouldn't help due to Python's GIL problem - many of reposurgeon's
> central algorithms are intrinsically serial, anyway. Higher
> single-processor speed could help a lot, but there plainly isn't
> anything in COTS hardware that beats a Xeon 3 cranking 3.5GHz by
> much. (The hardware wizard who built the Beast thinks he might be
> able to crank me up to 3.7GHz later this year, but that hardware
> hasn't shipped yet.)
>
> The one technical change that might help is moving reposurgeon from
> Python to Go - I might hope for as much as a 10x drop in runtimes
> from that and a somewhat smaller decrease in working set.
> Unfortunately, while the move is theoretically possible (I've scoped
> the job), it too would be very hard and take a long time. It's 14KLOC
> of the most algorithmically dense Python you are ever likely to
> encounter, with dependencies on Python libraries sans Go equivalents
> that might double the LOC; only the fact that I built a *really good*
> regression- and unit-test suite in self-defense keeps it anywhere
> near practical.
>
> (Before you ask: at the time I started reposurgeon in 2010, there
> wasn't any really production-ready language that might have been a
> better fit than Python. I did look. OO languages with GC and compiled
> speed are still pretty thin on the ground.)
>
> The truth is we're near the bleeding edge of what conventional tools
> and hardware can handle gracefully. Most jobs with working sets as
> big as this one's do only comparatively dumb operations that can be
> parallelized and thrown on a GPU or supercomputer. Most jobs with the
> algorithmic complexity of repository surgery have *much* smaller
> working sets. The combination of both extrema is hard.
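P.S. To make the CPU question concrete: the lowest-effort check I know
of is to run one of the slow test cases with PyPy's JIT logging switched
on (PYPYLOG=jit-summary:jit.log) and see whether the time is going into
compiled loops at all, and from inside the program you can nudge the JIT
parameters via the pypyjit module. The sketch below is only
illustrative - the parameter values are guesses, not recommendations,
and I obviously don't know reposurgeon's internals:

    # Run as e.g.:  PYPYLOG=jit-summary:jit.log pypy the_testcase.py
    # The summary written at exit shows how much time went into
    # tracing/compiling versus actually running the generated code.
    try:
        import pypyjit  # only available when running under PyPy
        # Big, algorithmically dense functions can blow past the default
        # trace length, in which case the JIT gives up on them; raising
        # trace_limit (and lowering threshold so hot loops get compiled
        # sooner) is a cheap experiment.  The values below are made up.
        pypyjit.set_param("trace_limit=30000")
        pypyjit.set_param("threshold=200")
    except ImportError:
        pass  # running under CPython; nothing to tune here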
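P.P.S. And for the RAM question: the usual first-order trick, on CPython
or PyPy, is to pin down the attribute set of whichever classes exist in
the millions (commit/blob-style event objects, I'm guessing) with
__slots__, and to hold big homogeneous sequences in array objects rather
than lists of boxed values. The class below is a made-up stand-in, not
anything from reposurgeon, and whether it buys much on top of PyPy's own
compact instance maps would need measuring:

    from array import array

    class Commit(object):
        # Hypothetical stand-in for a many-instance event object.
        # __slots__ fixes the set of fields so each instance carries no
        # per-instance __dict__; over millions of objects that adds up.
        __slots__ = ("mark", "committer", "comment", "parent_marks")

        def __init__(self, mark, committer, comment, parent_marks):
            self.mark = mark
            self.committer = committer
            self.comment = comment
            # Store parent links as a compact array of integer marks
            # instead of a list of full object references.
            self.parent_marks = array("l", parent_marks)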