On Mon, 2018-07-09 at 06:16 -0400, Eric S. Raymond wrote:
> Janus Weil <ja...@gcc.gnu.org>:
> > > The bad news is that my last test run overran the memory capacity
> > > of the 64GB Great Beast.  I shall have to find some way of
> > > reducing the working set, as 128GB DDR4 memory is hideously
> > > expensive.
> > 
> > Or maybe you could use a machine from the GCC compile farm?
> > 
> > According to https://gcc.gnu.org/wiki/CompileFarm, there are three
> > machines with at least 128GB available (gcc111, gcc112, gcc119).
> 
> The Great Beast is a semi-custom PC optimized for doing graph theory
> on working sets gigabytes wide - its design emphasis is on the best
> possible memory caching. If I dropped back to a conventional machine
> the test times would go up by 50% (benchmarked, that's not a guess),
> and they're already bad enough to make test cycles very painful.
> I just saw elapsed time 8h30m36.292s for the current test - I had it
> down to 6h at one point, but the runtimes scale badly with increasing
> repo size; there is intrinsically O(n**2) stuff going on.
> 
> My first evasive maneuver is therefore to run tests with my browser
> shut down.  That's working.  I used to do that before I switched from
> C-Python to PyPy, which runs faster and has a lower per-object
> footprint.  Now it's mandatory again.  Tells me I need to get the
> conversion finished before the number of commits gets much higher.

I wonder if one approach would be to tune PyPy for the problem?

I was going to check that you've read:
  https://pypy.org/performance.html
but I see you've already contributed text to it :)

For CPU, does PyPy's JIT get a chance to kick in and turn the hot loops
into machine code, or is it stuck interpreting bytecode for the most
part?
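
One crude way to check, sketched below - hot_loop() is just a
placeholder standing in for a real reposurgeon inner loop: call the
same function repeatedly under PyPy and see whether the later passes
get much cheaper than the first.  If they do, the tracing JIT is
compiling the loop; if every pass costs about the same, the hot path
is probably staying interpreted.

  import time

  def hot_loop(n):
      # placeholder workload; substitute a real reposurgeon inner loop
      total = 0
      for i in range(n):
          total += i % 7
      return total

  for attempt in range(5):
      start = time.time()
      hot_loop(10000000)
      print("pass %d: %.3fs" % (attempt, time.time() - start))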

For RAM, is there a way to make PyPy make more efficient use of the RAM
to store the objects?  (PyPy already has a number of tricks it uses to
store things more efficiently, and it's possible, though hard, to teach
it new ones)
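
One standard trick, sketched here with a made-up Commit class since I
don't know reposurgeon's actual object layout: give the types that
exist in the millions a __slots__ declaration.  On CPython that drops
the per-instance __dict__; PyPy already compacts instances with its
"maps", so the saving there is smaller, but pinning the attribute set
down still helps it keep objects small.

  class Commit(object):
      # Hypothetical example class - the attribute names are invented.
      # __slots__ replaces the per-instance __dict__ with fixed storage.
      __slots__ = ('mark', 'parents', 'comment', 'timestamp')

      def __init__(self, mark, parents, comment, timestamp):
          self.mark = mark
          self.parents = parents
          self.comment = comment
          self.timestamp = timestamp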

This is possibly self-serving, as I vaguely know them from my days in
the Python community, but note that the PyPy lead developers offer
paid consulting on Python and PyPy performance issues:
  https://baroquesoftware.com/
(though I don't know who would pay for that for the GCC repo
conversion)

Hope this is constructive.
Dave



> More memory would avoid OOM but not run the tests faster.  More cores
> wouldn't help due to Python's GIL problem - many of reposurgeon's
> central algorithms are intrinsically serial, anyway.  Higher
> single-processor speed could help a lot, but there plain isn't
> anything in COTS hardware that beats a Xeon 3 cranking 3.5GHz by
> much.  (The hardware wizard who built the Beast thinks he might be
> able to crank me up to 3.7GHz later this year, but that hardware
> hasn't shipped yet.)
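
(Purely as an illustration of the GIL point - nothing reposurgeon-
specific - a toy script like this shows two CPU-bound threads taking
about as long as doing the work twice serially, on CPython and PyPy
alike:

  import threading, time

  def burn(n):
      # pure-Python CPU-bound work; only one thread runs bytecode at a time
      total = 0
      for i in range(n):
          total += i * i
      return total

  N = 5000000

  start = time.time()
  burn(N); burn(N)
  print("serial:   %.2fs" % (time.time() - start))

  start = time.time()
  threads = [threading.Thread(target=burn, args=(N,)) for _ in range(2)]
  for t in threads:
      t.start()
  for t in threads:
      t.join()
  print("threaded: %.2fs" % (time.time() - start))

so more cores really wouldn't buy anything for the serial parts.)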

> The one technical change that might help is moving reposurgeon from
> Python to Go - I might hope for as much as a 10x drop in runtimes from
> that, and a somewhat smaller decrease in working set.  Unfortunately,
> while the move is theoretically possible (I've scoped the job), that
> too would be very hard and take a long time.  It's 14KLOC of the most
> algorithmically dense Python you are ever likely to encounter, with
> dependencies on Python libraries sans Go equivalents that might
> double the LOC; only the fact that I built a *really good* regression-
> and unit-test suite in self-defense keeps it anywhere near practical.
> 
> (Before you ask, at the time I started reposurgeon in 2010 there
> wasn't any really production-ready language that might have been a
> better fit than Python. I did look. OO languages with GC and compiled
> speed are still pretty thin on the ground.)
> 
> The truth is we're near the bleeding edge of what conventional tools
> and hardware can handle gracefully.  Most jobs with working sets as
> big as this one's do only comparatively dumb operations that can be
> parallelized and thrown on a GPU or supercomputer.  Most jobs with
> the algorithmic complexity of repository surgery have *much* smaller
> working sets.  The combination of both extrema is hard.
